Skip to main content
Version: 2.10.1

Replication

caution

Memgraph 2.9 introduced a new configuration flag --replication-restore-state-on-startup which is false by default.

If you want instances to remember their role and configuration in a replication cluster upon restart, the --replication-restore-state-on-startup needs to be set to true when first initializing the instances and remain true throughout the instances' lifetime.

When reinstating a cluster it is advised to first initialize the MAIN instance, then the REPLICA instances.

When distributing data across several instances, Memgraph uses replication to provide a satisfying ratio of the following properties, known from the CAP theorem:

  1. Consistency (C) - every node has the same view of data at a given point in time
  2. Availability (A) - all clients can find a replica of the data, even in the case of a partial node failure
  3. Partition tolerance (P) - the system continues to work as expected despite a partial network failure

In the replication process, the data is replicated from one storage (MAIN instance) to another (REPLICA instances).

info

From version 2.4 it is no longer possible to specify a timeout when registering a sync replica. To mimic this behavior in higher releases, please use ASYNC replication instead.

Related - How
to Related - Under the
Hood Related - Blog
Post

Data replication implementation basics

In Memgraph, all instances are MAIN upon starting. When creating a replication cluster, one instance has to be chosen as the MAIN instance. The rest of the instances have to be demoted to REPLICA roles and have a port defined using a Cypher query.

If you want instances to remember their role and configuration in a replication cluster upon restart, they need to be initialized with the --replication-restore-state-on-startup set to true and remain true throughout the instances' lifetime. Otherwise and by default, restarted instances will start as MAIN instances disconnected from any replication cluster.

Once demoted to REPLICA instances, they will no longer accept write queries. In order to start the replication, each REPLICA instance needs to be registered from the MAIN instance by setting a replication mode (SYNC or ASYNC) and specifying the REPLICA instance's socket address.

The replication mode defines the terms by which the MAIN instance can commit the changes to the database, thus modifying the system to prioritize either consistency or availability:

  • SYNC - After committing a transaction, the MAIN instance will communicate the changes to all REPLICA instances running in SYNC mode and wait until it receives a response or information that a timeout is reached. SYNC mode ensures consistency and partition tolerance (CP), but not availability for writes. If the primary database has multiple replicas, the system is highly available for reads. But, when a replica fails, the MAIN instance can't process the write due to the nature of synchronous replication.

  • ASYNC - The MAIN instance will commit a transaction without receiving confirmation from REPLICA instances that they have received the same transaction. ASYNC mode ensures system availability and partition tolerance (AP), while data can only be eventually consistent.

Once the REPLICA instances are registered, data storage of the MAIN instance is replicated and synchronized using transaction timestamps and durability files (snapshot files and WALs). Memgraph does not support replication of authentication configurations, query and authentication modules, and audit logs.

By using the timestamp, the MAIN instance knows the current state of the REPLICA. If the REPLICA is not synchronized with the MAIN instance, the MAIN instance sends the correct data for synchronization kept as deltas within WAL files. Deltas are the smallest possible updates of the database, but they carry enough information to synchronize the data on a REPLICA. Memgraph stores only remove actions as deltas, for example, REMOVE key:value ON node_id.

If the REPLICA is so far behind the MAIN instance that the synchronization using WAL files and deltas within it is impossible, Memgraph will use snapshots to synchronize the REPLICA to the state of the MAIN instance.

Running multiple instances

When running multiple instances, each on its own machine, run Memgraph as you usually would.

If you are exploring replication and running multiple instances on one machine, you can run Memgraph with Docker. Check Docker run options for Memgraph images to set up ports and volumes properly, if necessary.

Assigning roles

Each Memgraph instance has the role of the MAIN instance when it is first started.

Also, by default, each crashed instance restarts as a MAIN instance disconnected from any replication cluster. To change this behavior, set the --replication-restore-state-on-startup to true when first initializing the instance.

Assigning the REPLICA role

Once you decide what instance will be the MAIN instance, all the other instances that will serve as REPLICA instances need to be demoted and have the port set using the following query:

SET REPLICATION ROLE TO REPLICA WITH PORT <port_number>;

If you set the port of each REPLICA instance to 10000, it will be easier to register replicas later on because the query for registering replicas uses port 10000 as the default one.

Otherwise, you can use any unassigned port between 1000 and 10000.

Assigning the MAIN role

The replication cluster should only have one MAIN instance in order to avoid errors in the replication system. If the original MAIN instance fails, you can promote a REPLICA instance to be the new MAIN instance by running the following query:

SET REPLICATION ROLE TO MAIN;

If the original instance was still alive when you promoted a new MAIN, you need to resolve any conflicts and manage replication manually.

If you demote the new MAIN instance back to the REPLICA role, it will not retrieve its original function. You need to drop it from the MAIN and register it again.

If the crashed MAIN instance goes back online once a new MAIN is already assigned, it cannot reclaim its previous role. It can be cleaned and demoted to become a REPLICA instance of the new MAIN instance.

Checking the assigned role

To check the replication role of an instance, run the following query:

SHOW REPLICATION ROLE;

Registering REPLICA instances

Once all the nodes in the cluster are assigned with appropriate roles, you can enable replication in the MAIN instance by registering REPLICA instances, setting a replication mode (SYNC and ASYNC), and specifying the REPLICA instance's socket address. Memgraph doesn't support chaining REPLICA instances, that is, a REPLICA instance cannot be replicated on another REPLICA instance.

If you want to register a REPLICA instance with a SYNC replication mode, run the following query:

REGISTER REPLICA name SYNC TO <socket_address>;

If you want to register a REPLICA instance with an ASYNC replication mode, run the following query:

REGISTER REPLICA name ASYNC TO <socket_address>;

The socket address must be a string value as follows:

"IP_ADDRESS:PORT_NUMBER"

where IP_ADDRESS is a valid IP address, and PORT_NUMBER is a valid port number, for example:

"172.17.0.4:10050"

The default value of the PORT_NUMBER is 10000, so if you set REPLICA roles using that port, you can define the socket address specifying only the valid IP address:

"IP_ADDRESS"

Example of a <socket_address> using only IP_ADDRESS:

"172.17.0.5"

When a REPLICA instance is registered, it will start replication in ASYNC mode until it synchronizes to the current state of the database. Upon synchronization, REPLICA instances will either continue working in the ASYNC mode or reset to SYNC mode.

Listing all registered REPLICA instances

You can check all the registered REPLICA instances and their details by running the following query:

SHOW REPLICAS;

Dropping a REPLICA instance

To drop a replica, run the following query:

DROP REPLICA <name>;

MAIN and REPLICA synchronization

By comparing timestamps, the MAIN instance knows when a REPLICA instance is not synchronized and is missing some earlier transactions. The REPLICA instance is then set into a RECOVERY state, where it remains until it is fully synchronized with the MAIN instance.

The missing data changes can be sent as snapshots or WAL files. Snapshot files represent an image of the current state of the database and are much larger than the WAL files, which only contain the changes, deltas. Because of the difference in file size, Memgraph favors the WAL files.

While the REPLICA instance is in the RECOVERY state, the MAIN instance calculates the optimal synchronization path based on the REPLICA instance's timestamp and the current state of the durability files while keeping the overall size of the files necessary for synchronization to a minimum.