Storage memory usage

Estimating Memgraph's storage memory usage is not entirely straightforward because it depends on many variables, but it can be done quite accurately. The example below walks through the basic reasoning.

If you want to estimate the storage memory usage, use the following formula:

$$\texttt{StorageRAMUsage} = \texttt{NumberOfVertices} \times 260\text{B} + \texttt{NumberOfEdges} \times 180\text{B}$$

Let's test this formula on the Marvel Comic Universe Social Network dataset, which is also available as a dataset inside Memgraph Lab and contains 21,723 vertices and 682,943 edges.

According to the formula, storage memory usage should be:

$$\begin{aligned} \texttt{StorageRAMUsage} &= 21,723 \times 260\text{B} + 682,943 \times 180\text{B} \\ &= 5,647,980\text{B} + 122,929,740\text{B} \\ &= 128,577,720\text{B} \approx 125\text{MB} \end{aligned}$$

Now, let's run an empty Memgraph instance on an x86 Ubuntu machine. It consumes ~75MB of RAM due to baseline runtime overhead. Once the dataset is loaded, RAM usage rises to ~260MB. Memory usage primarily consists of storage memory and query execution memory. After executing the FREE MEMORY query to force a cleanup of query execution memory, RAM usage drops to ~200MB. Subtracting the ~75MB baseline runtime overhead from that ~200MB leaves ~125MB of storage memory usage, which matches the formula's estimate.
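
If you want to run the numbers yourself, here is a minimal Python sketch of the formula (the 260B and 180B constants are the rough per-object estimates from above, not exact sizes):

```python
# Rough per-object estimates from the formula above, not exact sizes.
VERTEX_BYTES = 260
EDGE_BYTES = 180

def estimate_storage_ram(num_vertices: int, num_edges: int) -> int:
    """Estimate Memgraph's storage memory usage in bytes."""
    return num_vertices * VERTEX_BYTES + num_edges * EDGE_BYTES

# Marvel Comic Universe Social Network dataset:
print(estimate_storage_ram(21_723, 682_943))  # 128577720 bytes, ~125MB
```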

The calculation in detail

Let's dive deeper into the memory usage values. Because Memgraph works on the x86 architecture, the calculations below are based on x86 Linux memory usage.

tip

For the latest and most precise memory layout, clone the Memgraph repository and use a tool such as pahole to discover the exact structure sizes.

Each Vertex and Edge object has a pointer to a Delta object. The Delta object stores all changes made to its Vertex or Edge, which is why the memory usage of a Vertex or Edge grows with the Delta objects it points to. If there are few updates, there are also few Delta objects, because the latest data is stored in the object itself. But if the database handles many concurrent operations, many Delta objects are created. Delta objects are kept in memory as long as they are needed, and sometimes a bit longer, due to inefficiencies in the internal garbage collector.

Delta memory layout

Each Delta object takes up at least 104B.

Vertex memory layout

Each Vertex object has at least 112B + 104B for the Delta object, in total, a minimum of 216B.

Each additional label takes 8B.

Keep in mind that three labels take as much space as four labels, and five to seven labels take as much space as eight labels, etc., due to the dynamic memory allocation.
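
To illustrate, here is a small Python sketch of that doubling behavior (a model of the allocation strategy described above, not Memgraph's actual allocator):

```python
# Sketch of the doubling behavior described above: assuming label storage
# grows in powers of two, 3 labels cost as much as 4, and 5-7 as much as 8.
def label_storage_bytes(num_labels: int) -> int:
    capacity = 1
    while capacity < num_labels:
        capacity *= 2  # dynamic allocation doubles the capacity
    return capacity * 8  # 8B per label slot

for n in range(1, 9):
    print(n, label_storage_bytes(n))  # 3 and 4 -> 32B; 5 through 8 -> 64B
```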

Edge memory layout

Each Edge object has at least 40B + 104B for the Delta object, in total, a minimum of 144B.

SkipList memory layout

Each object (Vertex, Edge) is placed inside a data structure called a SkipList. The SkipList has an additional overhead in terms of SkipListNode structure and next_pointers. Each SkipListNode has an additional 8B element overhead and another 8B for each of the next_pointers.

It is impossible to know the exact number of next_pointers (and consequently the total size) upfront, but it is never more than double the number of objects, because the number of pointers per node follows a binomial distribution (take a look at the source code for details).
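
To see why the expected overhead is about two next_pointers (16B) per node, here is a small Python simulation, assuming each additional pointer level appears with probability 1/2 (the real height distribution is defined in Memgraph's SkipList source):

```python
import random

# Sketch: if each additional next_pointer appears with probability 1/2,
# the expected number of pointers per SkipListNode is 1 + 1/2 + 1/4 + ... ~= 2,
# i.e. ~16B of pointer overhead per node on average (2 x 8B).
def tower_height(max_height: int = 32) -> int:
    height = 1
    while height < max_height and random.random() < 0.5:
        height += 1
    return height

samples = [tower_height() for _ in range(100_000)]
print(sum(samples) / len(samples))  # ~2.0 next_pointers per node on average
```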

Index memory layout

Each LabelIndex::Entry object has exactly 16B.

Depending on the actual value stored, each LabelPropertyIndex::Entry has at least 72B.

Objects of both types are placed into the SkipList.

Each index object in total

  • SkipListNode<LabelIndex::Entry> object has 24B.
  • SkipListNode<LabelPropertyIndex::Entry> has at least 80B.
  • Each SkipListNode has an additional 16B because of the next_pointers.
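
Putting those numbers together in a short Python sketch:

```python
# Per-entry index overheads assembled from the numbers above.
LABEL_INDEX_NODE = 24       # SkipListNode<LabelIndex::Entry>
LABEL_PROP_INDEX_NODE = 80  # SkipListNode<LabelPropertyIndex::Entry>, minimum
NEXT_POINTERS = 16          # average next_pointers overhead per node

print(LABEL_INDEX_NODE + NEXT_POINTERS)       # 40B per label index entry
print(LABEL_PROP_INDEX_NODE + NEXT_POINTERS)  # 96B+ per label-property entry
```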

Properties

All properties use 1B of basic metadata: the type, the size of the property ID, and either the size of the payload (in the case of NULL and BOOLEAN values) or the size of the payload size indicator for other types (that is, how big the stored value is; for example, integers can take 1B, 2B, 4B or 8B depending on their value).

Properties then take up another byte for the property ID, which means each property takes up at least 2B. After those 2B, some properties (for example, STRING values) store additional metadata. Lastly, all properties store the value itself. So the layout of each property is:

$$\texttt{propertySize} = \texttt{basicMetadata} + \texttt{propertyID} + [\texttt{additionalMetadata}] + \texttt{value}$$

| Value type | Size | Note |
| --- | --- | --- |
| NULL | 1B + 1B | The value is written in the first byte of the basic metadata. |
| BOOL | 1B + 1B | The value is written in the first byte of the basic metadata. |
| INT | 1B + 1B + 1B, 2B, 4B or 8B | Basic metadata, property ID and the value, depending on the size of the integer. |
| DOUBLE | 1B + 1B + 8B | Basic metadata, property ID and the value. |
| STRING | 1B + 1B + 1B + min 1B | Basic metadata, property ID, additional metadata and the value, depending on the size of the string; 1 ASCII character takes up 1B. |
| LIST | 1B + 1B + 1B + min 1B | Basic metadata, property ID, additional metadata; the total size depends on the number and size of the values in the list. |
| MAP | 1B + 1B + 1B + min 1B | Basic metadata, property ID, additional metadata; the total size depends on the number and size of the values in the map. |
| TEMPORAL_DATA | 1B + 1B + 1B + min 1B + min 1B | Basic metadata, property ID, additional metadata, seconds and microseconds. The value of the seconds and microseconds is at least 1B each, but usually 4B due to the large values they store. |
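
As a rough illustration, here is a Python sketch that estimates lower-bound sizes for a couple of property types based on the table above (actual sizes depend on the encoder):

```python
# Rough lower-bound property sizes derived from the table above.
def string_property_bytes(value: str) -> int:
    # basic metadata + property ID + additional metadata + 1B per ASCII character
    return 1 + 1 + 1 + len(value)

def int_property_bytes(value: int) -> int:
    # basic metadata + property ID + a 1B, 2B, 4B or 8B payload
    for size in (1, 2, 4, 8):
        if -(1 << (8 * size - 1)) <= value < (1 << (8 * size - 1)):
            return 1 + 1 + size
    raise ValueError("integer too large for an 8B payload")

print(string_property_bytes("Spider-Man"))  # 13B, the average assumed below
print(int_property_bytes(1999))             # 4B: metadata + 2B payload
```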

Marvel dataset use case

The Marvel dataset consists of Hero, Comic and ComicSeries labels, which are indexed. There are also three label-property indices - on the name property of Hero and Comic vertices, and on the title property of ComicSeries vertices. The ComicSeries vertices also have the publishYear property.

There are 6,487 Hero and 12,661 Comic vertices with the name property, or 19,148 vertices in total. To calculate how much storage those vertices and their properties occupy, we are going to use the following formula:

$$\texttt{NumberOfVertices} \times (\texttt{Vertex} + \texttt{properties} + \texttt{SkipListNode} + \texttt{next\_pointers} + \texttt{Delta})$$

Let's assume each name property on average takes $3\text{B} + 10\text{B} = 13\text{B}$ (3B of metadata plus 10 characters at 1B each, since each name is on average 10 characters long). Once the average values are included, the calculation is:

$$19,148 \times (112\text{B} + 13\text{B} + 16\text{B} + 16\text{B} + 104\text{B}) = 19,148 \times 261\text{B} = 4,997,628\text{B}$$

The remaining 2,584 vertices are the ComicSeries vertices with the title and publishYear properties. Let's assume that the title property is approximately the same length as the name property. The publishYear property is a list of integers. The average length of the publishYear list is 2.17; let's round it up to 3 elements. Since the year is an integer, 2B per integer is more than enough, plus 2B of metadata per element. Therefore, each list occupies $3 \times (2\text{B} + 2\text{B}) = 12\text{B}$. Using the same formula as above, but being careful to include both the title and publishYear properties, the calculation is:

$$2,584 \times (112\text{B} + 13\text{B} + 12\text{B} + 16\text{B} + 16\text{B} + 104\text{B}) = 2,584 \times 273\text{B} = 705,432\text{B}$$

In total, $5,703,060\text{B}$ to store vertices.

The edges don't have any properties on them, so the formula is as follows:

$$\texttt{NumberOfEdges} \times (\texttt{Edge} + \texttt{SkipListNode} + \texttt{next\_pointers} + \texttt{Delta})$$

There are 682,943 edges in the Marvel dataset. Hence, we have:

$$682,943 \times (40\text{B} + 16\text{B} + 16\text{B} + 104\text{B}) = 682,943 \times 176\text{B} = 120,197,968\text{B}$$

Next, Hero, Comic and ComicSeries labels have label indices. To calculate how much space they take up, use the following formula:

$$\texttt{NumberOfLabelIndices} \times \texttt{NumberOfVertices} \times (\texttt{SkipListNode<LabelIndex::Entry>} + \texttt{next\_pointers})$$

Since there are three label indices, we have the following calculation:

$$3 \times 21,723 \times (24\text{B} + 16\text{B}) = 65,169 \times 40\text{B} = 2,606,760\text{B}$$

For label-property indices, the indexed property needs to be taken into account. The name property is indexed on Hero and Comic vertices, while the title property is indexed on ComicSeries vertices. We already assumed that the title property is approximately the same length as the name property.

Here is the formula:

$$\texttt{NumberOfLabelPropertyIndices} \times \texttt{NumberOfVertices} \times (\texttt{SkipListNode<LabelPropertyIndex::Entry>} + \texttt{property} + \texttt{next\_pointers})$$

When the appropriate values are included, the calculation is:

$$3 \times 21,723 \times (80\text{B} + 13\text{B} + 16\text{B}) = 65,169 \times 109\text{B} = 7,103,421\text{B}$$

Now let's sum up everything we calculated:

$$5,703,060\text{B} + 120,197,968\text{B} + 2,606,760\text{B} + 7,103,421\text{B} = 135,611,209\text{B} \approx 130\text{MB}$$
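
For reference, here is a Python sketch that reproduces the entire calculation using the constants and counts from this section:

```python
# Reproduces the Marvel dataset estimate step by step, using the constants
# from this page. All values are approximations.
DELTA = 104
VERTEX = 112
EDGE = 40
SKIPLIST_NODE = 16
NEXT_POINTERS = 16
NAME_PROP = 13          # assumed average 10-character string property
PUBLISH_YEAR_PROP = 12  # assumed 3-element list of small integers

hero_comic = 19_148 * (VERTEX + NAME_PROP + SKIPLIST_NODE + NEXT_POINTERS + DELTA)
comic_series = 2_584 * (
    VERTEX + NAME_PROP + PUBLISH_YEAR_PROP + SKIPLIST_NODE + NEXT_POINTERS + DELTA
)
vertices = hero_comic + comic_series                              # 5,703,060B
edges = 682_943 * (EDGE + SKIPLIST_NODE + NEXT_POINTERS + DELTA)  # 120,197,968B
label_index = 3 * 21_723 * (24 + NEXT_POINTERS)                   # 2,606,760B
label_property_index = 3 * 21_723 * (80 + NAME_PROP + NEXT_POINTERS)  # 7,103,421B

total = vertices + edges + label_index + label_property_index
print(total)  # 135,611,209B, ~130MB
```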

Bear in mind that the actual number can vary because objects can have a higher overhead due to additional data.

Query execution memory usage

Query execution also uses RAM. In some cases, intermediate results are aggregated to return valid query results, and query execution memory can end up using a large amount of RAM. Keep in mind that query execution memory grows monotonically during execution and is freed once the query execution is done. A general rule of thumb is to have double the RAM of what the actual dataset occupies.
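
To keep an eye on memory usage at runtime, you can also query the instance directly, for example with the SHOW STORAGE INFO query (the exact fields it returns may vary between versions):

```cypher
SHOW STORAGE INFO;
```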

Configuration options to reduce memory usage

Here are several tips on how you can reduce memory usage and increase scalability:

  1. Consider removing a label index by executing DROP INDEX ON :Label;
  2. Consider removing a label-property index by executing DROP INDEX ON :Label(property);
  3. If you don't have properties on relationships, disable them in the configuration file by setting the --storage-properties-on-edges flag to false (see the sketch below). This can significantly reduce memory usage because Edge objects will effectively not be created and all information will be inlined under Vertex objects. You can disable properties on relationships even on a non-empty database, as long as the existing relationships have no properties. If you need help with adapting the configuration to your needs, check out the how-to guide on changing configuration settings.
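
For example, the relevant line in the Memgraph configuration file (commonly /etc/memgraph/memgraph.conf, though the exact path depends on your installation) would look like this:

```
--storage-properties-on-edges=false
```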

You can also check our reference guide for information about controlling memory usage, and you can inspect and profile your queries to devise a plan for their optimization.