Skip to main content
Version: 2.2.0

Indexing

Introduction

A database index is a data structure used to improve the speed of data retrieval within a database at the cost of additional writes and storage space for maintaining the index data structure.

If you have a deep understanding of your data model and use-case, you can decide which data to index and significantly improve the efficiency of data retrieval.

Index types

At Memgraph, we support two types of indexes:

  • label index
  • label-property index

Memgraph will NOT automatically index labeled data. If you want to optimize queries that fetch nodes by label you need to perform the indexing.

CREATE INDEX ON :Person;

Retrieving nodes using this query is now much more efficient:

MATCH (n :Person) RETURN n;

You can also create indexes on data with a specific combination of label and property, hence the name label-property index.

For example, if you are storing information about people and you often retrieve their age, it might be beneficial to create an index on nodes labeled as :Person with a property named age by using the following language construct:

CREATE INDEX ON :Person(age);

After creating that index, queries will be more efficient because Memgraph's query engine will not have to fetch each :Person node and check whether the property exists. Even if all nodes labeled as :Person have an age property, creating an index can still be beneficial because entries within the index are sorted by property value. By creating a label-property index, queries similar to this one will be more efficient:

MATCH (n :Person {age: 42}) RETURN n;

Indexing is also beneficial for simple queries that filter data with a WHERE clause:

MATCH (n) WHERE n:Person AND n.age = 42 RETURN n;

Bear in mind that WHERE filters could contain arbitrarily complex expressions and index based retrieval might not be used. Nevertheless, we are continually improving our index usage recognition algorithms.

Filtering based on less than or greater than comparisons will also use index based retrieval:

MATCH (n) WHERE n:PERSON and n.age < 18 RETURN n;

Information about available indexes can be retrieved using the following syntax:

SHOW INDEX INFO;

This query will return all the labels and label-property pairs that Memgraph currently indexes.

You can delete created indexes by using the following syntax:

DROP INDEX ON :Label(property);

This query instructs all active transactions to abort as soon as possible. Once all transaction have finished, it will drop the index.

Underlying implementation

The central part of our index data structure is a highly-concurrent skip list. Skip lists are probabilistic data structures that allow fast search within an ordered sequence of elements. The structure itself is built in layers where the bottom layer is an ordinary linked list that preserves the order. Each higher level can be imagined as a highway for layers below.

The implementation details behind skip list operations are well documented in the literature and are out of scope for this article. Nevertheless, we believe that it is important for more advanced users to understand the following implications of this data structure (n denotes the current number of elements in a skip list):

  • Average insertion time is O(log(n))
  • Average deletion time is O(log(n))
  • Average search time is O(log(n))
  • Average memory consumption is O(n)