Zero to PostgreSQL streaming replication in 10 mins

I’ve found a number of articles on the web for setting up streaming replication in PostgreSQL, but none of them seemed to actually put everything together that I needed, and none of them seemed to use the new pg_basebackup in PostgreSQL 9.1 and later. So with that in mind, here are a set of steps you can use to set up streaming replication, over the internet if you wish, using an encrypted SSL connection. We’re not going to set up log archiving – we’re going to rely solely on the streaming replication for now.

I’m assuming you have a master server set up on Ubuntu 10.04 or 12.04, running PostgreSQL 9.2.x, and you have a new slave server set up on the same OS and pg version. The IP of the master is 1.2.3.4, and the IP of the slave is 5.6.7.8.

First, create the replication user on the master:

Note that we are using REPLICATION permissions, rather than creating a superuser.

Next, configure the master for streaming replication. Edit postgresql.conf (on Ubuntu, this is at /etc/postgresql/9.2/main/postgresql.conf):

We’re configuring 8 WAL segments here; each is 16MB. If you expect your database to have more than 128MB of changes in the time it will take to make a copy of it across the network to your slave, or in the time you expect your slave to be down for maintenance or something, then consider increasing those values.

Then edit the access control on the master to allow the connection from the slave; in pg_hba.conf (/etc/postgresql/9.2/main/pg_hba.conf on Ubuntu):

In this case, 5.6.7.8 is the IP address of the slave that will be connecting for replication, and hostssl means this host can only connect via SSL.

You’ll need to restart the master after making the above changes.

Now on to the slave. In the slave’s postgresql.conf, add the following:

Then restart the slave. No changes are required in the slave’s pg_hba.conf specifically to support replication. You’ll still need to make whatever entries you need in order to connect to it from your application and run queries, if you wish.

That’s all the initial setup we need to do. After you’ve done the above configuration, running the following script on the slave will copy the database over and begin replication (1.2.3.4 is the IP of the master):

That script will stop the slave, delete the old slave cluster directory, run pg_basebackup connecting to the master to copy over the databases (you’ll see the progress as it goes), create a new recovery.conf file, and start the slave. If you look at the logs after this script completes, you should hopefully see messages about it having reached a consistent recovery state.

Be careful – that script is going to delete the old database cluster on your slave, so make sure to read through it and understand what it’s doing.

At this point, you can try writing data to the master database, and you should see it mirrored over to the slave. To check the replication status, you can run the following on the master:

If you found this guide handy, you might also find the PostgreSQL 9 High Availability Cookbook useful as well!

79 thoughts on “Zero to PostgreSQL streaming replication in 10 mins

    1. KajMagnus

      It worked great with 9.1 for me too :-) when I fixed a 9.1 streaming replication setup that had broken.
      Basically there were just two thing to do: the pg_basebackup command, and then copying back my old recovery.conf.

      Reply
  1. daniel

    Thanks for this article, very useful. Is creating the trigger file, then reverting back to master a simple task?

    Reply
    1. gregr Post author

      Well, creating the trigger file is easy enough. Reverting back to master, I believe, would require generally the same steps in reverse – set up the old master as a slave, synchronize, and fail over.

      Reply
    1. gregr Post author

      Yeah…I’m not sure why it was doing that. I just switched to a different plugin for displaying gists – hopefully this one will work better.

      Reply
  2. Marios Zindilis

    This also works on CentOS, with the minor difference that the “main” directory is named “data”. Mind that “pg_basebackup” will overwrite postgresql.conf on CentOS, so the changes you made prior to “pg_basebackup” will be lost, unless you backup your slave’s posgresql.conf and restore it afterwards. Other that than, everything works, and thanks for all your effort in writing this!

    Reply
    1. gregr Post author

      With the configuration in the article, at least on Ubuntu, fail over is pretty trivial:

      touch /tmp/postgresql.trigger

      is all it takes. That will terminate the replication, configure the slave to be writable, and will move your recovery.conf to recovery.done – at that point you should be good to go.

      Reply
  3. arunnitt

    when i access slave db server, it says “the database system is starting up”. but replication is work. how can i check data in slave db server?

    =>psql -U postgres
    psql: FATAL: the database system is starting up

    Reply
      1. mike

        Daniel & arunnitt – I’m not sure if you were able to resolve your issue, but I had the same problem. I solved this by moving my postgresql.conf and pg_hba.conf files out of the directory which gets removed in the step “Cleaning up old cluster directory”, and then moving them back before starting the server. Otherwise your slave will startup with whatever configurations are in your master’s data directory. In my case, the configurations in my master’s data directory were not actually even being used and were incorrect (i didn’t set up the master myself).

      2. Ryan Stelly

        Hello! This is occurring because the behavior of database after executing the given script makes the slave machine essentially just a back up. It is still getting the streamed XLOGs, but you can’t access it from a client in this state. If you want to instead use the slave (in a read-only manner) you need to change the hot_standby option in your postgresql.conf to on. You should be able to psql into it then.

    1. John

      The answer is that, in postgresql.conf, it needs to be wal_level = ‘hot_standby’ and hot_standby = ‘on’. They must be single-quoted. It took me far too long to discover this, as most blog posts on the subject do not quote their config values.

      Reply
      1. gregr Post author

        John, that must be specific to a specific OS or postgres version; I can say with absolute certainty that quotes are not required on Ubuntu 12.04 with PostgreSQL 9.3.4.

  4. Martin

    Very useful tutorial, thanks! It works on Postgresql 9.3 x64.

    One thing – it might be worth mentioning that you need to switch on SSL if you use the SSL connection (on Windows that’s not automatic…).

    Reply
  5. yunus

    My OS is Centos 5.7 and postgres-9.3. After running the backup script upon making changes related to Centos mentioned in this forum, the postgres on Slave fails to start with the following error.

    >LOG: invalid checkpoint record
    FATAL: could not locate required checkpoint record
    HINT: If you are not restoring from a backup, try removing the file “/var/lib/pgsql/9.3/data/backup_label”.
    LOG: startup process (PID 1646) exited with exit code 1
    LOG: aborting startup due to startup process failure

    Can you anyone please advise how to fix this error.

    Reply
    1. Phil Hord

      This happens because your backup took long enough that you’ve missed a few wal files before it finished. For example, when your backup starts the master is on WAL file A. The slave gets a backup_label that tells him to recover from point “A” in the WAL history. By the time the backup finishes, though, the master has moved on to WAL file E. The slave wants to begin recovery at point A, but when streams from the master he can only find WAL “E”.

      One solution is to use “-X stream” or “-X fetch” in your pg_basebackup command. You may also need to increase the wal_keep_segments setting on your master to ensure enough segments are kept around to span the amount of time it takes to perform the backup.

      Reply
  6. Jossafat Barón

    Hi!
    I tried this, but on the script’s step (pg_basebackup) I had to modify the postgres.conf because I had an error, changing archive_mode=on, then the script asked for an archive_command, so on the same postgresql.conf I put this: archive_command = ‘cp %p /opt/pgsql91/archive/%f’ but it launched an error: ‘Verify that your archive_command is running correctly’, what am I doing wrong? I think that my archive_command is wrong, also if I skip the archive_command the error is the same.

    Reply
    1. Ironchunk

      Has the folder of your archive owner and group labled as ‘postgres’? If not the archive_command can’t write on folders that not contain owner/group ‘postgres’.

      Reply
  7. rcpeters

    I’m wondering about
    trigger_file = ‘/tmp/postgresql.trigger’

    All files in /tmp get deleted on restart, so if the slave get rebooted for any reason it will come back up thinking it’s a master. Am I interpreting the documentation incorrectly?

    Reply
    1. John

      Absence of the trigger file doesn’t cause any action. The trigger file is supposed to be absent on the slave until you need to do a failover. At which point you create the trigger file on the slave (or some monitoring software creates the trigger file), then the slave becomes the new master. Once it becomes the master it stops caring about the trigger file, so it doesn’t matter if the trigger file disappears at some later time after the promotion.

      Reply
  8. gadekarr

    Great Work !! I used the same steps with PG 9.3 on CentOS 6.3, it worked perfectly. I just changed PG data path and used Postgres user instead of replicator. Thanks a lot for sharing your knowledge.

    Reply
  9. Divya

    This was of great help. Thanks!! Small changes that I had to do was switching SSL on in master, taking a copy of .conf files on slave before removing cluster and replacing it later, and renaming folder name ‘main’ to ‘data’ as I am using CentOS.

    Reply
  10. Adam

    Just wanted to say thank you for this guide, much more succinct than others. First time setting up replication with Postgres, so it’s appreciated.

    Reply
  11. Mobina

    Slave server not able to start. In the logfile showing

    startup process (PID 3825) exited with exit code 1
    LOG: aborting startup due to startup process failure
    LOG: database system was shut down at 2014-05-22 19:41:28 IST
    LOG: entering standby mode
    WARNING: WAL was generated with wal_level=minimal, data may be missing
    HINT: This happens if you temporarily set wal_level=minimal without taking a new base backup.
    FATAL: hot standby is not possible because wal_level was not set to “hot_standby” on the master server
    HINT: Either set wal_level to “hot_standby” on the master, or turn off hot_standby here.

    Please help me in this regard.

    Reply
    1. Ganesh Sayee Ramakrishnan

      Hi Mobina – Were you able to get this resolved? I am stuck with the same problem. My master configuration has wal_archive = ‘hot_standby’. Even then, the slave fails to start up with the exact error you mentioned in your post.

      Reply
  12. Alex

    Trying to get Postgresql 9.3 replication going on Mac OS X 10.9. After making the changes you outline in your article, when starting Postgresql I receive the error message:

    FATAL: could not load server certificate file “server.crt”: No such file or directory

    How does one generate a server certificate file, and where does it get installed?

    Reply
    1. bn0ir

      /var/lib/postgresql/9.x/main/server.crt and /var/lib/postgresql/9.x/main/server.key are symlinks to /etc/ssl/certs/ssl-cert-snakeoil.pem and /etc/ssl/private/ssl-cert-snakeoil.key you can create them and restart server.

      Reply
    2. bitdivine

      I changed the end of start_replication.sh to this: (It creates a self signed cert, thanks to shahjapan http://ubuntuforums.org/showthread.php?t=735020 and the postgre docs he links to. Depending on your setup you may want to get your key countersigned by a proper authority.)
      echo Setting up the ssl cert:

      pushd /var/lib/postgresql/9.1/main/
      openssl req -new -text -out server.req
      openssl rsa -in privkey.pem -out server.key
      rm privkey.pem
      openssl req -x509 -in server.req -text -key server.key -out server.crt
      chmod og-rwx server.key
      chown postgres:postgres server.*
      chmod 600 server.*
      popd

      echo Starting PostgreSQL
      sudo service postgresql start

      Reply
  13. Alex

    Was able to make the SSL cert. That works. But Postgres complains:

    LOG: invalid IP mask “md5”: nodename nor servname provided, or not known
    CONTEXT: line 114 of configuration file “/opt/local/var/db/postgresql93/defaultdb/pg_hba.conf”
    FATAL: could not load pg_hba.conf

    Line 114 of the pg_hba.conf file is this:

    hostssl replication replicator xxx.xxx.xxx.xxx md5

    …where the xxx’s represent the master’s IP address. Don’t know what it’s not liking…

    Reply
  14. Alex

    Got everything working, yeah! But I can’t access the database, boo!

    $ psql
    psql: FATAL: the database system is starting up
    FATAL: the database system is starting up

    Okay, so the log file says this:

    2014-05-22 13:55:04.349 PDT [ ] LOG: database system was shut down in recovery at 2014-05-22 13:53:58 PDT
    2014-05-22 13:55:04.366 PDT [ ] LOG: entering standby mode
    2014-05-22 13:55:04.449 PDT [ ] LOG: redo starts at 0/70028B0
    2014-05-22 13:55:04.449 PDT [ ] LOG: consistent recovery state reached at 0/7002950
    2014-05-22 13:55:04.449 PDT [ ] LOG: record with zero length at 0/7002950
    2014-05-22 13:55:04.851 PDT [ ] LOG: started streaming WAL from primary at 0/7000000 on timeline 1
    2014-05-22 13:57:48.622 PDT [alex ::1(49156)] FATAL: the database system is starting up
    2014-05-22 13:57:48.623 PDT [alex ::1(49157)] FATAL: the database system is starting up

    …and ps shows this:

    110 ?? S 0:00.03 /opt/local/lib/postgresql93/bin/postgres -D /opt/local/var/db/postgresql93/defaultdb
    134 ?? Ss 0:00.00 postgres: logger process
    145 ?? Ss 0:00.01 postgres: startup process recovering 000000010000000000000007
    146 ?? Ss 0:00.00 postgres: checkpointer process
    147 ?? Ss 0:00.02 postgres: writer process
    148 ?? Ss 0:00.13 postgres: wal receiver process streaming 0/70029F0

    So what could the problem be?

    Reply
  15. Hiram Pérez

    Works very well on Debian 7 (master and slave)! Thx.

    Just one detail: In Debian you don’t have sudo by default, so you need run the commands in the script one by one (I recommend use su -l postgres, for example, to use one terminal).

    Reply
  16. Pingback: Screen6 – Postgresql on AWS with puppet: High Availability

  17. Georgi Denchev

    Just to throw my 2 cents – may help someone:
    1. My setup was with Ubuntu 14.04 and postgresql-9.3
    2. DO NOT make my mistake – I have initially created the recovery.conf in /etc/postgresql/… – you need to create the file in /var/lib/postgresql/…
    3. DO NOT start the SLAVE without having a proper recovery.conf in place – the replication breaks if you add the file later and re-start the database on SLAVE

    HTH,
    Georgi

    Reply
    1. Ivan G

      Thank you so much for your 3rd tip.
      After some flawless tests using docker containers, I tried to setup a slave on a development machine and by mistake put the recovery.conf in /var/lib/postgresql instead of ./main and of course getting the infamous “record with zero length at xxxx”.

      Reply
  18. AT

    Thanks for writing this nice tutorial.

    Can you guide what configuration would be required to achive cascading replication? Can we transform this master-slave setup to master-slave-slave setup?

    Reply
  19. Francesc Gonzàlez

    Congratulations! after many hours searching for this info, even in the Postgresql site, just in 5 minutes my SL 6.4 Postgresql 9.2.10 is replicating in streaming mode!

    Thank you very much!

    Reply
  20. iq9

    Great article. Succinct and accurate.

    In 9.4.1, I had to include the CIDR Mask Length in the “replicator” line in pg_hba.conf on the Master, or Postgres wouldn’t launch:

    5.6.7.8/32

    Not sure if that’s true of all PG versions. Can only speak for 9.4.1.

    Reply
  21. Richard Cross

    Watch out on Linux distributions that put everything under a “data” directory, including your postgresql.conf, as the first action in the shell script is to delete the equivalent data directory. Found this out the hard way :-/

    Reply
  22. Pingback: PostgreSQL database replication

  23. Shylu

    Hi Greg! I have tried all the steps mentioned above, however while running the shell script on my slave, I’m getting the below errors:

    pg_basebackup: could not connect to server: could not connect to server: Connection refused
    Is the server running on host “192.168.1.40” and accepting
    TCP/IP connections on port 5432?

    * Error: Port conflict: another instance is already running on /var/run/postgresql with port 5432

    Please help me out here, as I’m quite new to PostgreSQL. I’m unsure how to proceed next.
    Thanks in advance.

    Reply
  24. Bob

    In your second code block you have

    listen_address = # make sure we’re listening as appropriate

    shouldn’t it be

    listen_address = ‘localhost,1.2.3.4’ # make sure we’re listening as appropriate

    Reply
  25. Juan Jose Morales

    the crontab affect the replication, why i make a replication and work good about 6 hour them loss conextion the, i make some uptades , i try to restar the slave and dont start the service but the master work fine

    Reply
  26. Ravish

    Hi,

    I am trying to create a HA Pgsql server. Is there a way to automate the running of pg_basebackup command on the recovered server to copy latest data from running server.

    Regards

    Reply
  27. ZM-73

    I’ve used this procedure several times on Ubuntu {12.04|14.04|16.04} with PostgreSQL {9.2|9.6}. Works great every time. Had a prod DB that I set up (Ubuntu 14.04/PostgreSQL 9.2) when it stopped replicating (because the devs did something after I had it running perfectly, they didn’t bother telling me). I ran the script to re-set up the slave and its been fine since. Took all of 5 seconds to run the scrip & watch it sync (tiny, but important DB), then another ~half minute to log into the master and check repl status, and finally a couple of minutes to bask in success. I really should take a few more minutes to lock the devs out…

    Coming up on four years later, this is still as valid today with PgSQL 9.6 (and IPv6) as it was in early 2013.

    Thank you for taking the time to put this together.

    Reply
  28. yhluo

    Awesome. However, I don’t understand why you update postgresql.conf on slave in slave_postgresql.conf, doesn’t this file would be removed by “sudo -u postgres rm -rf /var/lib/postgresql/9.2/main” in start_replication.sh later? It seems that updating postgresql.conf on slave before pg_basebackup is meaningless. Should we update postgresql.conf in start_replication.sh, maybe just before creating recovery.conf ?
    I have write a short script to set up postgresql streaming replication basing on your blog:
    https://github.com/ictlyh/Utils/blob/master/shell/setup_pg_stream_replicate.sh

    Reply
  29. mrglitch01

    Someone please help! I have followed all the steps but in the end I am unable to psql into the slave and check whether data is replicating from the master. I get error as follows:

    psql FATAL: the database system is starting up

    Please, does anyone have a solution to this? I have tried changing the postgresql.conf file on slave to hot_standby=’on’ and wal_level=’hot_standby’ but it still wont work.

    Reply
  30. Mehman

    Hi guys,
    I try replication with postgresql 9.5 on Red Hat 7.2. when I try this part

    sudo -u postgres pg_basebackup -h 1.2.3.4 -D /var/lib/postgresql/9.2/main -U replicator -v -P
    (of course changed for redhat and version for RHEL)

    there is permission error although it has permission and all configurations are ok.

    Besides this I google a lot about main path and don’t understand where it is in my postgresql installed host.

    in a lot of source the path is in that format /var/lib/postgresql/9.x/main

    when I installed postresql 9.5 on my host I didn’t install postgresql packages when installing RHEL. I install postgresql after RHEL installation to specific directory (from source not with yum command) as /usr/local/pgsql/

    As a result postgresql installed successfully and database works.

    I don’t understand clearly …/main parts equal to what in my configuration path?
    I guess it is equal to /usr/local/pgsql/data in my configuration. Am I right???

    The error is like that when try to use pg_basebackup command

    [postgres@slave ~]$ pg_basebackup -h 192.168.1.108 -D /usr/local/pgsql/data -U replica -v -P
    pg_basebackup: could not access directory “/usr/local/pgsql/data”: Permission denied

    The IP here is master server IP.
    And I mv ../data/ to backup location on standby (slave) before replication

    Please HELP ME?? It is urgent

    Reply
  31. voodooKobra

    Suggestion: replace the `rm -rf` command with a rename (e.g. `mv main main-backup`). I accidentally deleted the wrong cluster (master server rather than replica server).

    Reply
  32. alan

    FATAL: incorrect checksum in control file.

    You may see this, as I did, if you try to replicate from a 32 bit to a 64 bit machine. You can’t.
    https://www.postgresql.org/message-id/13614.1094787412%40sss.pgh.pa.us

    Also be careful if the slave is CentOS all the paths are different and the config files are not in /etc/postgres they are in the data directory which you will delete when you convert this line:

    sudo -u postgres rm -rf /var/lib/postgresql/9.2/main
    to
    sudo -u postgres rm -rf /var/lib/pgsql/9.3/data

    I had to rebuild postgresql.conf and pg_hba.conf and the certs:
    https://www.howtoforge.com/postgresql-ssl-certificates

    Anyway its an excellent tutorial I it worked for me when I used it previously between two debian servers, but not under today’s circumstances.

    Reply
  33. Simba

    Version 9.6 postgres requires /32 in pg_hba.conf (or appropriate mask), checkpoint_segments is deprecated and will cause a crash on startup, and wal_level should be ‘replicate’ instead of hot_standby.

    Besides those things, this guide worked fine on Debian 9.

    Reply

Leave a Reply to Amin Cancel reply