PREAMBLE
It goes without saying that if you want to set up replication then you need a master and a slave postgresql server. There is a bit of configuration to do in order to get it running.
This how to was originally written for older versions of Postgresql but has been updated to reflect the significant changes in later versions. It has been tested against version 15.
WARNING: Before continuing with this tutorial you should ensure that you can connect to both servers using SSL as replicating over the internet without ssl is a little bit insane. See here
We are using sha-256 encryption for passwords (which is now the default) so the first thing we want to do is edit postgresql.conf on the master and check that password_encryption is set correctly.
vi /etc/postgresql/15/main/postgresql.conf
NOTE: The directory path will change depending on the version.
Look for the “password_encryption” directive and change it to “scram-sha-256”. This line may be commented out, if it is then uncomment it.
password_encryption = scram-sha-256 # md5 or scram-sha-256
If you have already set a password for your replication user that is not sha-256 encoded then you will have to set it again or you will not be able to authenticate. The reason quoted below is explained in a stack overflow question here.
Each user password hash is saved in the table pg_authid. It includes the hashing algorithm that is used to transform the password to its hash. When setting the password_encryption in postgresql.conf, you are setting the default encryption, i.e. the one used when creating a user or when (re)setting your password. The table pg_authid is not updated. When changing pg_hba.conf, you are saying to accept only passwords hashed using the given method. The table pg_authid is not updated.
MASTER
On the master the first thing we need to do is create a user for replication purposes so start a postgresql shell and add a user.
CREATE ROLE replicate LOGIN REPLICATION ENCRYPTED PASSWORD 'mySecretPassword';
While we are in the postgresql shell we also want to create a slot. For some reason we create a slot using a SLEECT statement, not sure why, but it is unimportant in any event.
SELECT * FROM pg_create_physical_replication_slot('replicate');
The slot does not have to have the same name as the user, we just use “replicate” in our example for the sake of simplicity. NOTE: It is not necessary to create a slot as postgresql will create a temporary slot if one isn’t explicitly created.
Now we can exit our postgresql client shell and move on to editing our pg_hba.conf file, usually located in /etc/postgresql/11/main
# TYPE DATABASE USER ADDRESS METHOD ... hostssl replication replicate xxx.xxx.xxx.xxx/xx scram-sha-256 ....
IMPORTANT: Replace xxx.xxx.xxx.xxx/xx with the ip and subnet mask of the slave.
The line highlighted above will allow the user replicate to use the built in role ‘replication’ from the slave. The type is set to ‘hostssl‘ to force the use of ssl encryption and ‘scram-sha-256‘ is the method used to decrypt the password.
Next you need to edit the postgresql.conf file to tell the server that it is acting as a master server.
#vi /etc/postgresql/11/main/postgresql.conf
edit the file as follows, uncommenting anything that is commented out.
listen_addresses = 'localhost,xxx.xxx.xxx.xxx' wal_level = replica max_wal_senders = 10 wal_keep_size = 1024MB
Note that xxx.xxx.xxx.xxx is the address of the interface you want to listen on, you can add several addresses here if you are listening on the WAN port and the LAN port for example, albeit if this is the case you may just wish to use a wildcard instead.
listen_address = '*'
Once you have done that you can restart the master and move on to configuring the slave.
#service postgresql restart
SLAVE
Before you configure the slave you need to create an ssl certificate using the Certificate Authority that you should have already created on the master (this is a prerequisite).
Once you have generated your certificate you need to copy the .key and .crt files that you have generated across to the slave along with the postgresCA.crt file. These should be in the /pgsql folder.
Make sure that the postgres user owns them and that they are accessible and that the .key file has limited permissions.
chown postgres:postgres /pgsql/*.crt chown postgres:postgres /pgsql/*.key chmod 400 /pgsql/*.key
Next stop the slave server
#service postgresql stop
Now you can configure the slave in a similar manner to the master. Firstly edit postgresql.conf.
WARNING: If you have a recovery.conf file in your data directory you need to get rid of it or postgresql will not start. More information provided here https://www.cybertec-postgresql.com/en/recovery-conf-is-gone-in-postgresql-v12/ (Acknowledgement).
Find the following directives in the file and set/uncomment them paying particular attention to the primary_conninfo line where you put the connection string.
NOTE that the text between the ‘ ‘ after the primary_conninfo directive is all on the same line.
You will need to change the host, password, sslcert and sslkey directives to reflect your configuration.
primary_conninfo = 'host=XXX.XXX.XXX.XXX port=5432 user=replicate password=mysecretpassword sslmode=require sslcert=/pgsql/slave.crt sslkey=/pgsql/slave.key sslrootcert=/pgsql/postgresCA.crt' primary_slot_name = 'replicate' # replication slot on sending server #promote_trigger_file = '' # file name whose presence ends recovery hot_standby = on # "off" disallows queries during recovery # (change requires restart) #max_standby_archive_delay = 30s # max delay before canceling queries # when reading WAL from archive; # -1 allows indefinite delay #max_standby_streaming_delay = 30s # max delay before canceling queries # when reading streaming WAL; # -1 allows indefinite delay wal_receiver_create_temp_slot = on # create temp slot if primary_slot_name # is not set wal_receiver_status_interval = 10s # send replies at least this often # 0 disables hot_standby_feedback = on # send info from standby to prevent # query conflicts #wal_receiver_timeout = 60s # time that receiver waits for # communication from primary # in milliseconds; 0 disables wal_retrieve_retry_interval = 5s # time to wait before retrying to # retrieve WAL after a failed attempt #recovery_min_apply_delay = 0 # minimum delay for applying changes during recovery
Next we need to remove any existing data and copy the folders from the master
rm -R /pgsql/15 pg_basebackup -v -h xxx.xxx.xxx.xxx -U replicate -P -D /pgsql/data --slot replicate chown -R postgres:postgres /pgsql/15
where xxx.xxx.xxx.xxx is the ip address of the master. If you have set up postgresql according to this site then the data directory will be /pgsql. Note you may also need to set the permissions if you are not logged in as the postgres user.
You will be prompted for the ‘replicate‘ user’s password. If all is well then the script should terminate with something like
177156/177156 kB (100%), 1/1 tablespace
The slave’s data folder should now be populated with a copy of the master.
NOTE: If you have issues performing this task then you need to resolve them before proceeding.
Now in order for our slave to know it is acting as a standby server we need to create an empty text file in the data directory.
touch /pgsql/15/standby.signal chown postgres:postgres /pgsql/15/standby.signal
You should now be able to start the server and all should work as it should.
systemctl start postgrtesql
Troubleshooting
The log files in /var/log/postgresql on the master and the slave may help resolve problems.
On the master you can issue the following in the psql client
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn FROM pg_stat_replication;
which should return a record. The state should be “streaming”
Most of the problems are caused by permissions so review permissions on the data directory and for the ssl certificates.
Double check the connection string paying attention to the names of the certificates and make sure that they match the files stored in /pgsql/
On the slave from the psql client run
\x SELECT * FROM pg_stat_wal_receiver;
Again there should be a record for the master.
Check the password for the replication user is encoded with sha-256
SELECT rolpassword from pg_authid where rolname = 'replicate';
rollpassword should begin with SCRAM-SHA-256
If there is no result then check that the user exists.
NOTE: One of my databases has lots of roles that cause \du to bring up reams of gibberish. For this reason I use this lookup query instead:
SELECT usename AS role_name, CASE WHEN usesuper AND usecreatedb THEN CAST('superuser, create database' AS pg_catalog.text) WHEN usesuper THEN CAST('superuser' AS pg_catalog.text) WHEN usecreatedb THEN CAST('create database' AS pg_catalog.text) ELSE CAST('' AS pg_catalog.text) END role_attributes FROM pg_catalog.pg_user ORDER BY role_name desc;
WAL files not being deleted
For some reason the WAL files used for replication do not always get deleted and simply build up until you run out of space on the partition.
To get rid of them manually from psql -U postgres run
postgres=# SELECT slot_name, lpad((pg_control_checkpoint()).timeline_id::text, 8, '0') || lpad(split_part(restart_lsn::text, '/', 1), 8, '0') || lpad(substr(split_part(restart_lsn::text, '/', 2), 1, 2), 8, '0') AS wal_file FROM pg_replication_slots; slot_name | wal_file -----------+-------------------------- postgres2 | 0000000100000000000000B0 (1 row) postgres=# SELECT pg_drop_replication_slot('postgres2'); pg_drop_replication_slot -------------------------- (1 row)
and the shortly after most of the WAL files will have disappeared. Change “postgres2” in the second statement to whatever “slot_name” is returned in the first.
Acknowledgement: https://stackoverflow.com/questions/49539938/postgres-wal-file-not-getting-deleted
Recent Comments