Chain reaction with logical replication

Working with logical replication and a cluster of nodes heavily linked among each other can result in particularly disturbing results. If a logical change is received on a node that itself regenerates additional changes, it is really easy to finish with an uncontrolable chain reaction.

For example, here is an example with two nodes, running locally and listening to ports 5432 and 5433. Both nodes are using the background worker [receiver_raw] (https://github.com/michaelpq/pg_plugins/tree/master/receiver_raw) able to receive changes generated by [decoder_raw] (https://github.com/michaelpq/pg_plugins/tree/master/decoder_raw) on a given replication slot.

Both nodes have many settings in common in postgresql.conf:

wal_level = logical
shared_preload_libraries = 'receiver_raw'
max_replication_slots = 1
log_line_prefix = '%t'
receiver_raw.database = 'postgres'
receiver_raw.slot_name = 'slot'

receiver_raw.database refers to the database on which changes received from remote are applied locally, aka the database the background worker connect to. receiver_raw.slot_name is the name of the slot from which the logical changes are decoded and fetched.

However there is a subtile difference for receiver_raw.conn_string, which is the connection string used by the background worker to connect to the node generating the changes. Here it is for the first node:

receiver_raw.conn_string = 'replication=database dbname=postgres port=5433'

And for the second node:

receiver_raw.conn_string = 'replication=database dbname=postgres port=5432'

With this configuration each node connects to each other… By default receiver_raw naps 100ms after a process loop (processing a single batch of changes). And of course both nodes need to have a replication slot created with name “slot”. Let’s use at the same time a simple schema.

=# SELECT slot_name FROM pg_create_logical_replication_slot('slot', 'decoder_raw');
 slot_name 
-----------
 slot
(1 row)
=# CREATE TABLE aa (a int);
CREATE TABLE

Then using \watch with a psql client, you can see the evolution of the number of records on this table, after running on it a simple INSERT query.

=# INSERT INTO aa VALUES (1);
INSERT 0 1
=# SELECT count(*) FROM aa;
 count 
-------
     2
(1 row)
=# \watch 1
Watch every 1s	Thu Jun  5 22:24:47 2014
 count 
-------
     8
(1 row)
[... etc ...]

With an average of 6-8 INSERT queries run per second on a single node, this can blow up your disk quickly, because of the queries of course getting repeated, but as well because of the replication slots that need to keep more and more data as process loop sleeps for a customized amount of time. This gets even worse if more and more new queries are run. So the moral of the story is simple: don’t do that! Then, control the origin of the data to stop the hemoragy created by continuously-applied changes in loop.

Michael Paquier - PostgreSQL committer

Chain reaction with logical replication

Search

Social

RSS Feeds

Sponsor