Zero to PostgreSQL streaming replication in 10 mins

I’ve found a number of articles on the web for setting up streaming replication in PostgreSQL, but none of them seemed to actually put everything together that I needed, and none of them seemed to use the new pg_basebackup in PostgreSQL 9.1 and later. So with that in mind, here are a set of steps you can use to set up streaming replication, over the internet if you wish, using an encrypted SSL connection. We’re not going to set up log archiving – we’re going to rely solely on the streaming replication for now.

I’m assuming you have a master server set up on Ubuntu 10.04 or 12.04, running PostgreSQL 9.2.x, and you have a new slave server set up on the same OS and pg version. The IP of the master is 1.2.3.4, and the IP of the slave is 5.6.7.8.

First, create the replication user on the master:

Note that we are using REPLICATION permissions, rather than creating a superuser.

Next, configure the master for streaming replication. Edit postgresql.conf (on Ubuntu, this is at /etc/postgresql/9.2/main/postgresql.conf):

We’re configuring 8 WAL segments here; each is 16MB. If you expect your database to have more than 128MB of changes in the time it will take to make a copy of it across the network to your slave, or in the time you expect your slave to be down for maintenance or something, then consider increasing those values.

Then edit the access control on the master to allow the connection from the slave; in pg_hba.conf (/etc/postgresql/9.2/main/pg_hba.conf on Ubuntu):

In this case, 5.6.7.8 is the IP address of the slave that will be connecting for replication, and hostssl means this host can only connect via SSL.

You’ll need to restart the master after making the above changes.

Now on to the slave. In the slave’s postgresql.conf, add the following:

Then restart the slave. No changes are required in the slave’s pg_hba.conf specifically to support replication. You’ll still need to make whatever entries you need in order to connect to it from your application and run queries, if you wish.

That’s all the initial setup we need to do. After you’ve done the above configuration, running the following script on the slave will copy the database over and begin replication (1.2.3.4 is the IP of the master):

That script will stop the slave, delete the old slave cluster directory, run pg_basebackup connecting to the master to copy over the databases (you’ll see the progress as it goes), create a new recovery.conf file, and start the slave. If you look at the logs after this script completes, you should hopefully see messages about it having reached a consistent recovery state.

Be careful – that script is going to delete the old database cluster on your slave, so make sure to read through it and understand what it’s doing.

At this point, you can try writing data to the master database, and you should see it mirrored over to the slave. To check the replication status, you can run the following on the master:

33 thoughts on “Zero to PostgreSQL streaming replication in 10 mins

    1. KajMagnus

      It worked great with 9.1 for me too :-) when I fixed a 9.1 streaming replication setup that had broken.
      Basically there were just two thing to do: the pg_basebackup command, and then copying back my old recovery.conf.

      Reply
  1. daniel

    Thanks for this article, very useful. Is creating the trigger file, then reverting back to master a simple task?

    Reply
    1. gregr Post author

      Well, creating the trigger file is easy enough. Reverting back to master, I believe, would require generally the same steps in reverse – set up the old master as a slave, synchronize, and fail over.

      Reply
    1. gregr Post author

      Yeah…I’m not sure why it was doing that. I just switched to a different plugin for displaying gists – hopefully this one will work better.

      Reply
  2. Marios Zindilis

    This also works on CentOS, with the minor difference that the “main” directory is named “data”. Mind that “pg_basebackup” will overwrite postgresql.conf on CentOS, so the changes you made prior to “pg_basebackup” will be lost, unless you backup your slave’s posgresql.conf and restore it afterwards. Other that than, everything works, and thanks for all your effort in writing this!

    Reply
    1. gregr Post author

      With the configuration in the article, at least on Ubuntu, fail over is pretty trivial:

      touch /tmp/postgresql.trigger

      is all it takes. That will terminate the replication, configure the slave to be writable, and will move your recovery.conf to recovery.done – at that point you should be good to go.

      Reply
  3. arunnitt

    when i access slave db server, it says “the database system is starting up”. but replication is work. how can i check data in slave db server?

    =>psql -U postgres
    psql: FATAL: the database system is starting up

    Reply
      1. mike

        Daniel & arunnitt – I’m not sure if you were able to resolve your issue, but I had the same problem. I solved this by moving my postgresql.conf and pg_hba.conf files out of the directory which gets removed in the step “Cleaning up old cluster directory”, and then moving them back before starting the server. Otherwise your slave will startup with whatever configurations are in your master’s data directory. In my case, the configurations in my master’s data directory were not actually even being used and were incorrect (i didn’t set up the master myself).

      2. Ryan Stelly

        Hello! This is occurring because the behavior of database after executing the given script makes the slave machine essentially just a back up. It is still getting the streamed XLOGs, but you can’t access it from a client in this state. If you want to instead use the slave (in a read-only manner) you need to change the hot_standby option in your postgresql.conf to on. You should be able to psql into it then.

  4. Martin

    Very useful tutorial, thanks! It works on Postgresql 9.3 x64.

    One thing – it might be worth mentioning that you need to switch on SSL if you use the SSL connection (on Windows that’s not automatic…).

    Reply
  5. yunus

    My OS is Centos 5.7 and postgres-9.3. After running the backup script upon making changes related to Centos mentioned in this forum, the postgres on Slave fails to start with the following error.

    >LOG: invalid checkpoint record
    FATAL: could not locate required checkpoint record
    HINT: If you are not restoring from a backup, try removing the file “/var/lib/pgsql/9.3/data/backup_label”.
    LOG: startup process (PID 1646) exited with exit code 1
    LOG: aborting startup due to startup process failure

    Can you anyone please advise how to fix this error.

    Reply
  6. Jossafat Barón

    Hi!
    I tried this, but on the script’s step (pg_basebackup) I had to modify the postgres.conf because I had an error, changing archive_mode=on, then the script asked for an archive_command, so on the same postgresql.conf I put this: archive_command = ‘cp %p /opt/pgsql91/archive/%f’ but it launched an error: ‘Verify that your archive_command is running correctly’, what am I doing wrong? I think that my archive_command is wrong, also if I skip the archive_command the error is the same.

    Reply
  7. rcpeters

    I’m wondering about
    trigger_file = ‘/tmp/postgresql.trigger’

    All files in /tmp get deleted on restart, so if the slave get rebooted for any reason it will come back up thinking it’s a master. Am I interpreting the documentation incorrectly?

    Reply
    1. John

      Absence of the trigger file doesn’t cause any action. The trigger file is supposed to be absent on the slave until you need to do a failover. At which point you create the trigger file on the slave (or some monitoring software creates the trigger file), then the slave becomes the new master. Once it becomes the master it stops caring about the trigger file, so it doesn’t matter if the trigger file disappears at some later time after the promotion.

      Reply
  8. gadekarr

    Great Work !! I used the same steps with PG 9.3 on CentOS 6.3, it worked perfectly. I just changed PG data path and used Postgres user instead of replicator. Thanks a lot for sharing your knowledge.

    Reply
  9. Divya

    This was of great help. Thanks!! Small changes that I had to do was switching SSL on in master, taking a copy of .conf files on slave before removing cluster and replacing it later, and renaming folder name ‘main’ to ‘data’ as I am using CentOS.

    Reply

Leave a Reply