https://www.postgresql.org/docs/10/app-pgbasebackup.html
https://www.postgresql.org/docs/9.4/app-pgbasebackup.html
http://www.postgresql.org/docs/9.1/interactive/warm-standby.html#STREAMING-REPLICATION
http://brandonkonkle.com/blog/2010/oct/20/postgres-9-streaming-replication-and-django-balanc/
http://eggie5.com/15-setting-up-pg9-streaming-replication
http://blog.railsupgrade.com/2011/02/streaming-replication-in-postgresql-91.html
http://bradmontgomery.net/blog/streaming-replication-in-postgresql-91/
http://tektastic.com/2011/11/why-i-like-postgresql.html
http://packetcloud.net/2010/12/09/setting-up-streaming-replication-in-postgresql-9-0/
There is 2 ways to fix a posgresql slave
First way : rsync (quicker than the other, but less safe)
second way : pg_backup
Check the version of the master and the slave in production
su - postgres pg_lsclusters
for this script we have the postgresql datadir in /data but it could be in /var/lib please check before doing something wrong
On master and Slave :
apt-get install rsync
put this script replication_pgsql.sh in the $homedir of the postgres user master
#!/bin/sh datadir='/data/postgresql/9.6/main/' slave='pgsql-stats-prod-ro-mk-001' time /usr/bin/rsync \ --rsh="ssh -2 -l postgres" \ --verbose \ --progress \ --stats \ --archive \ --delete \ /data/postgresql/9.6/main/ $slave:/data/postgresql/9.6/main/ time /usr/bin/rsync \ --rsh="ssh -2 -l postgres" \ --verbose \ --progress \ --stats \ --archive \ --delete \ /data/postgresql/walshipping/9.6/main/ $slave:/data/postgresql/walshipping/9.6/main/
We need that the listen directive to be on stars : postgresql.conf
listen_addresses = '*'
wal logs need to be active on both master and slave
wal_level = hot_standby max_wal_senders = 5 wal_keep_segments = 32
Alternative postgresql 9.6 and > :
wal_level = replica
Slave need to be allowed to replicate on the master : pg_hba.conf
host replication postgres 10.191.1.0/24 trust
Reload on the master to take in count the new pg_hba.conf
pg_ctlcluster 9.6 main reload
Postgresql on the slave need to be off
/etc/init.d/postgresql stop
On the master
su - postgres psql SELECT pg_start_backup('checkpoint1');
on the slave we clean the datadir WARNING TOU CAN LOOSE DATA IF YOU FAILED HERE
rm -rf /data/postgresql/9.6/main/*
On the Master
su - postgres sh replication_pgsql.sh
When the rsync is finish you can come back on the psql master shell and say :
SELECT pg_stop_backup();
On slave you should be able to launch the postgresql
/etc/init.t/postgresql start
If all is ok you should see a streaming process son of postgresql master process on slave
postgres 25526 0.0 0.1 420428 90528 ? S 07:40 0:00 /usr/lib/postgresql/9.6/bin/postgres -D /data/postgresql/9.6/main -c config_file=/etc/postgre postgres 25527 10.3 0.2 423116 153848 ? Ss 07:40 54:46 \_ postgres: 9.6/main: startup process recovering 0000000300008D1B00000056 postgres 25544 0.3 0.2 421604 140280 ? Ss 07:40 1:52 \_ postgres: 9.6/main: checkpointer process postgres 25545 1.0 0.2 420568 137724 ? Ss 07:40 5:34 \_ postgres: 9.6/main: writer process postgres 27059 0.0 0.0 143056 3844 ? Ss 10:57 0:05 \_ postgres: 9.6/main: stats collector process postgres 27060 1.3 0.0 425248 11064 ? Ss 10:57 4:24 \_ postgres: 9.6/main: wal receiver process streaming 8D1B/562937C0
On slave you have to clean the datadir (see how to do that in rsync procedure)
rm -rf /data/postgresql/9.6/main/*
Postgresql >= 9.6 and after on the slave in postgres user :
pg_basebackup -h hostname_of_the_master -D /data/postgresql/9.6/main/ -R -P -U postgres --xlog-method=stream
Postgresql >= 10 and after on the slave in postgres user :
pg_basebackup -h hostname_of_the_master -D /data/postgresql/9.6/main/ -R -P -U postgres --wal-method=stream
on the master force a checkpoint for the backup to launch
su - postgres psql CHECKPOINT;
The backup of the master data will be deployed on the slave (should take a lot of time) Be sure to have enough space on the slave to host all the datadir space !
When the pg_backup restored is finish you'll get a very long restoration on the slave :
postgres: 9.6/main: startup process recovering 0000000300008D1B00000056
Once the recovering is finish you'll get a streaming replication from the slave :
postgres 25526 0.0 0.1 420428 90528 ? S 07:40 0:00 /usr/lib/postgresql/9.6/bin/postgres -D /data/postgresql/9.6/main -c config_file=/etc/postgre postgres 25527 9.9 0.2 423116 153848 ? Ss 07:40 55:00 \_ postgres: 9.6/main: startup process recovering 0000000300008D1B00000087 postgres 25544 0.3 0.2 421604 140280 ? Ss 07:40 1:54 \_ postgres: 9.6/main: checkpointer process postgres 25545 1.0 0.2 420568 137724 ? Ss 07:40 5:51 \_ postgres: 9.6/main: writer process postgres 27059 0.0 0.0 143056 3844 ? Ss 10:57 0:05 \_ postgres: 9.6/main: stats collector process postgres 27060 1.2 0.0 425248 11064 ? Ds 10:57 4:37 \_ postgres: 9.6/main: wal receiver process streaming 8D1B/87ECCEF8 postgres 29494 0.0 0.0 421716 13680 ? Ss 16:33 0:00 \_ postgres: 9.6/main: postgres postgres [local] idle
On master :
postgres=# **SELECT * from pg_stat_replication;** pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state -------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+--------------+---------+---------------+----------------+----------------+-----------------+---------------+------------ 20985 | 10 | postgres | walreceiver | 10.24.0.xx | | 57756 | 2019-04-17 10:57:22.725532+02 | | catchup | 8D18/47A00000 | 8D18/474E0000 | 8D18/47000000 | 8D18/46FFFA60 | 0 | async (1 row)
check the binary position on master and diff with the slave :
postgres=# SELECT pg_current_xlog_location(); pg_current_xlog_location -------------------------- 8D1B/6EC44000 (1 row)
On slave :
postgres=# **SELECT pg_last_xlog_receive_location();** pg_last_xlog_receive_location ------------------------------- 8D1B/654D0A70 (1 row)
SELECT pg_current_xlog_location(); --- on master SELECT pg_last_xlog_receive_location(); --- on slave SELECT pg_last_xlog_replay_location(); --- on slave
#!/bin/sh # si la replication a plus de 600s entre deux checkpoint alors elle est cassé timeout=600 control='/usr/lib/postgresql/9.xx/bin/pg_controldata' datadir='/var/lib/postgresql/9.xx/main/' last_check=`$control $datadir | grep 'Time of latest checkpoint' | awk '{print $5,$6,$7,$8,$9}' | date +%s` now=`date +%s` if [ `expr $now - $last_check` -gt $timeout ]; then echo Replication cassé else echo replication ok fi
to know timelag in pgsql replication
SELECT now() - pg_last_xact_replay_timestamp() AS time_lag;