Zero to PostgreSQL streaming replication in 10 mins

I’ve found a number of articles on the web for setting up streaming replication in PostgreSQL, but none of them seemed to actually put everything together that I needed, and none of them seemed to use the new pg_basebackup in PostgreSQL 9.1 and later. So with that in mind, here are a set of steps you can use to set up streaming replication, over the internet if you wish, using an encrypted SSL connection. We’re not going to set up log archiving – we’re going to rely solely on the streaming replication for now.

I’m assuming you have a master server set up on Ubuntu 10.04 or 12.04, running PostgreSQL 9.2.x, and you have a new slave server set up on the same OS and pg version. The IP of the master is 1.2.3.4, and the IP of the slave is 5.6.7.8.

First, create the replication user on the master:

Note that we are using REPLICATION permissions, rather than creating a superuser.

Next, configure the master for streaming replication. Edit postgresql.conf (on Ubuntu, this is at /etc/postgresql/9.2/main/postgresql.conf):

We’re configuring 8 WAL segments here; each is 16MB. If you expect your database to have more than 128MB of changes in the time it will take to make a copy of it across the network to your slave, or in the time you expect your slave to be down for maintenance or something, then consider increasing those values.

Then edit the access control on the master to allow the connection from the slave; in pg_hba.conf (/etc/postgresql/9.2/main/pg_hba.conf on Ubuntu):

In this case, 5.6.7.8 is the IP address of the slave that will be connecting for replication, and hostssl means this host can only connect via SSL.

You’ll need to restart the master after making the above changes.

Now on to the slave. In the slave’s postgresql.conf, add the following:

Then restart the slave. No changes are required in the slave’s pg_hba.conf specifically to support replication. You’ll still need to make whatever entries you need in order to connect to it from your application and run queries, if you wish.

That’s all the initial setup we need to do. After you’ve done the above configuration, running the following script on the slave will copy the database over and begin replication (1.2.3.4 is the IP of the master):

That script will stop the slave, delete the old slave cluster directory, run pg_basebackup connecting to the master to copy over the databases (you’ll see the progress as it goes), create a new recovery.conf file, and start the slave. If you look at the logs after this script completes, you should hopefully see messages about it having reached a consistent recovery state.

Be careful – that script is going to delete the old database cluster on your slave, so make sure to read through it and understand what it’s doing.

At this point, you can try writing data to the master database, and you should see it mirrored over to the slave. To check the replication status, you can run the following on the master:

45 thoughts on “Zero to PostgreSQL streaming replication in 10 mins

    1. KajMagnus

      It worked great with 9.1 for me too :-) when I fixed a 9.1 streaming replication setup that had broken.
      Basically there were just two thing to do: the pg_basebackup command, and then copying back my old recovery.conf.

      Reply
  1. daniel

    Thanks for this article, very useful. Is creating the trigger file, then reverting back to master a simple task?

    Reply
    1. gregr Post author

      Well, creating the trigger file is easy enough. Reverting back to master, I believe, would require generally the same steps in reverse – set up the old master as a slave, synchronize, and fail over.

      Reply
    1. gregr Post author

      Yeah…I’m not sure why it was doing that. I just switched to a different plugin for displaying gists – hopefully this one will work better.

      Reply
  2. Marios Zindilis

    This also works on CentOS, with the minor difference that the “main” directory is named “data”. Mind that “pg_basebackup” will overwrite postgresql.conf on CentOS, so the changes you made prior to “pg_basebackup” will be lost, unless you backup your slave’s posgresql.conf and restore it afterwards. Other that than, everything works, and thanks for all your effort in writing this!

    Reply
    1. gregr Post author

      With the configuration in the article, at least on Ubuntu, fail over is pretty trivial:

      touch /tmp/postgresql.trigger

      is all it takes. That will terminate the replication, configure the slave to be writable, and will move your recovery.conf to recovery.done – at that point you should be good to go.

      Reply
  3. arunnitt

    when i access slave db server, it says “the database system is starting up”. but replication is work. how can i check data in slave db server?

    =>psql -U postgres
    psql: FATAL: the database system is starting up

    Reply
      1. mike

        Daniel & arunnitt – I’m not sure if you were able to resolve your issue, but I had the same problem. I solved this by moving my postgresql.conf and pg_hba.conf files out of the directory which gets removed in the step “Cleaning up old cluster directory”, and then moving them back before starting the server. Otherwise your slave will startup with whatever configurations are in your master’s data directory. In my case, the configurations in my master’s data directory were not actually even being used and were incorrect (i didn’t set up the master myself).

      2. Ryan Stelly

        Hello! This is occurring because the behavior of database after executing the given script makes the slave machine essentially just a back up. It is still getting the streamed XLOGs, but you can’t access it from a client in this state. If you want to instead use the slave (in a read-only manner) you need to change the hot_standby option in your postgresql.conf to on. You should be able to psql into it then.

    1. John

      The answer is that, in postgresql.conf, it needs to be wal_level = ‘hot_standby’ and hot_standby = ‘on’. They must be single-quoted. It took me far too long to discover this, as most blog posts on the subject do not quote their config values.

      Reply
      1. gregr Post author

        John, that must be specific to a specific OS or postgres version; I can say with absolute certainty that quotes are not required on Ubuntu 12.04 with PostgreSQL 9.3.4.

  4. Martin

    Very useful tutorial, thanks! It works on Postgresql 9.3 x64.

    One thing – it might be worth mentioning that you need to switch on SSL if you use the SSL connection (on Windows that’s not automatic…).

    Reply
  5. yunus

    My OS is Centos 5.7 and postgres-9.3. After running the backup script upon making changes related to Centos mentioned in this forum, the postgres on Slave fails to start with the following error.

    >LOG: invalid checkpoint record
    FATAL: could not locate required checkpoint record
    HINT: If you are not restoring from a backup, try removing the file “/var/lib/pgsql/9.3/data/backup_label”.
    LOG: startup process (PID 1646) exited with exit code 1
    LOG: aborting startup due to startup process failure

    Can you anyone please advise how to fix this error.

    Reply
  6. Jossafat Barón

    Hi!
    I tried this, but on the script’s step (pg_basebackup) I had to modify the postgres.conf because I had an error, changing archive_mode=on, then the script asked for an archive_command, so on the same postgresql.conf I put this: archive_command = ‘cp %p /opt/pgsql91/archive/%f’ but it launched an error: ‘Verify that your archive_command is running correctly’, what am I doing wrong? I think that my archive_command is wrong, also if I skip the archive_command the error is the same.

    Reply
    1. Ironchunk

      Has the folder of your archive owner and group labled as ‘postgres’? If not the archive_command can’t write on folders that not contain owner/group ‘postgres’.

      Reply
  7. rcpeters

    I’m wondering about
    trigger_file = ‘/tmp/postgresql.trigger’

    All files in /tmp get deleted on restart, so if the slave get rebooted for any reason it will come back up thinking it’s a master. Am I interpreting the documentation incorrectly?

    Reply
    1. John

      Absence of the trigger file doesn’t cause any action. The trigger file is supposed to be absent on the slave until you need to do a failover. At which point you create the trigger file on the slave (or some monitoring software creates the trigger file), then the slave becomes the new master. Once it becomes the master it stops caring about the trigger file, so it doesn’t matter if the trigger file disappears at some later time after the promotion.

      Reply
  8. gadekarr

    Great Work !! I used the same steps with PG 9.3 on CentOS 6.3, it worked perfectly. I just changed PG data path and used Postgres user instead of replicator. Thanks a lot for sharing your knowledge.

    Reply
  9. Divya

    This was of great help. Thanks!! Small changes that I had to do was switching SSL on in master, taking a copy of .conf files on slave before removing cluster and replacing it later, and renaming folder name ‘main’ to ‘data’ as I am using CentOS.

    Reply
  10. Adam

    Just wanted to say thank you for this guide, much more succinct than others. First time setting up replication with Postgres, so it’s appreciated.

    Reply
  11. Mobina

    Slave server not able to start. In the logfile showing

    startup process (PID 3825) exited with exit code 1
    LOG: aborting startup due to startup process failure
    LOG: database system was shut down at 2014-05-22 19:41:28 IST
    LOG: entering standby mode
    WARNING: WAL was generated with wal_level=minimal, data may be missing
    HINT: This happens if you temporarily set wal_level=minimal without taking a new base backup.
    FATAL: hot standby is not possible because wal_level was not set to “hot_standby” on the master server
    HINT: Either set wal_level to “hot_standby” on the master, or turn off hot_standby here.

    Please help me in this regard.

    Reply
  12. Alex

    Trying to get Postgresql 9.3 replication going on Mac OS X 10.9. After making the changes you outline in your article, when starting Postgresql I receive the error message:

    FATAL: could not load server certificate file “server.crt”: No such file or directory

    How does one generate a server certificate file, and where does it get installed?

    Reply
    1. bn0ir

      /var/lib/postgresql/9.x/main/server.crt and /var/lib/postgresql/9.x/main/server.key are symlinks to /etc/ssl/certs/ssl-cert-snakeoil.pem and /etc/ssl/private/ssl-cert-snakeoil.key you can create them and restart server.

      Reply
  13. Alex

    Was able to make the SSL cert. That works. But Postgres complains:

    LOG: invalid IP mask “md5″: nodename nor servname provided, or not known
    CONTEXT: line 114 of configuration file “/opt/local/var/db/postgresql93/defaultdb/pg_hba.conf”
    FATAL: could not load pg_hba.conf

    Line 114 of the pg_hba.conf file is this:

    hostssl replication replicator xxx.xxx.xxx.xxx md5

    …where the xxx’s represent the master’s IP address. Don’t know what it’s not liking…

    Reply
  14. Alex

    Got everything working, yeah! But I can’t access the database, boo!

    $ psql
    psql: FATAL: the database system is starting up
    FATAL: the database system is starting up

    Okay, so the log file says this:

    2014-05-22 13:55:04.349 PDT [ ] LOG: database system was shut down in recovery at 2014-05-22 13:53:58 PDT
    2014-05-22 13:55:04.366 PDT [ ] LOG: entering standby mode
    2014-05-22 13:55:04.449 PDT [ ] LOG: redo starts at 0/70028B0
    2014-05-22 13:55:04.449 PDT [ ] LOG: consistent recovery state reached at 0/7002950
    2014-05-22 13:55:04.449 PDT [ ] LOG: record with zero length at 0/7002950
    2014-05-22 13:55:04.851 PDT [ ] LOG: started streaming WAL from primary at 0/7000000 on timeline 1
    2014-05-22 13:57:48.622 PDT [alex ::1(49156)] FATAL: the database system is starting up
    2014-05-22 13:57:48.623 PDT [alex ::1(49157)] FATAL: the database system is starting up

    …and ps shows this:

    110 ?? S 0:00.03 /opt/local/lib/postgresql93/bin/postgres -D /opt/local/var/db/postgresql93/defaultdb
    134 ?? Ss 0:00.00 postgres: logger process
    145 ?? Ss 0:00.01 postgres: startup process recovering 000000010000000000000007
    146 ?? Ss 0:00.00 postgres: checkpointer process
    147 ?? Ss 0:00.02 postgres: writer process
    148 ?? Ss 0:00.13 postgres: wal receiver process streaming 0/70029F0

    So what could the problem be?

    Reply
  15. Hiram Pérez

    Works very well on Debian 7 (master and slave)! Thx.

    Just one detail: In Debian you don’t have sudo by default, so you need run the commands in the script one by one (I recommend use su -l postgres, for example, to use one terminal).

    Reply
  16. Pingback: Screen6 – Postgresql on AWS with puppet: High Availability

Leave a Reply