Analyze graph
The ANALYZE GRAPH query checks and calculates certain properties of a graph so
that the database can choose a better index or MERGE transaction.
Before the introduction of the ANALYZE GRAPH query, the database would choose
an index solely based on the number of indexed nodes. With the number of nodes
as the only criterion, the database would in some cases choose a non-optimal
index. Once ANALYZE GRAPH is run, Memgraph analyzes the distribution of
property values and can select a better label-property index: the one with
the smallest average property value group size.
The average group size of a property's values directly represents the database's expected number of hits, which can be used to estimate the query's cost. When two indexes have the same average group size, the chi-squared statistic is used to measure how close the distribution of property-value group sizes is to the uniform distribution. The index with the distribution closest to uniform is selected.
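The selection rule described above can be sketched in a few lines of Python. This is an illustration only, not Memgraph's implementation: the function names `group_stats` and `pick_index`, the sample index names, and the use of the standard Pearson chi-squared formula against a uniform expectation are all assumptions made for the example.

```python
from collections import Counter


def group_stats(values):
    """Return (num_groups, avg_group_size, chi_squared) for one indexed property.

    `values` is the list of property values found on the indexed nodes.
    """
    if not values:
        return 0, 0.0, 0.0
    counts = Counter(values)                 # group size per distinct value
    num_groups = len(counts)
    avg_group_size = len(values) / num_groups
    # Pearson chi-squared against a uniform distribution, where every group
    # is expected to contain avg_group_size nodes (assumed formula).
    chi_squared = sum(
        (size - avg_group_size) ** 2 / avg_group_size for size in counts.values()
    )
    return num_groups, avg_group_size, chi_squared


def pick_index(stats_by_index):
    """Pick the index with the smallest average group size.

    Ties are broken by the smaller chi-squared value, i.e. the distribution
    closest to uniform.
    """
    return min(
        stats_by_index,
        key=lambda name: (stats_by_index[name][1], stats_by_index[name][2]),
    )


# Hypothetical indexes: same average group size, different distributions.
uniform = group_stats(["a", "a", "b", "b", "c", "c"])
skewed = group_stats(["x", "x", "x", "x", "y", "z"])
best = pick_index({":Person(city)": uniform, ":Person(team)": skewed})
```

Both sample properties have three groups with an average size of 2, so the tie is resolved by the chi-squared value, which is 0 for the evenly distributed property.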
Upon running the ANALYZE GRAPH query, Memgraph also checks the node degree of
every indexed node and calculates the average degree. With these values,
Memgraph can choose a better MERGE expansion and improve performance. It's
always better to perform a MERGE by expanding from the node that has a lower
degree than the connecting node.
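A minimal sketch of how the average degree could guide the expansion choice is shown below. The helper names and the sample labels are hypothetical, and the reasoning that a lower degree means fewer relationships to inspect is an interpretation of the text above, not a description of Memgraph's internals.

```python
def average_degree(degrees):
    """Average number of relationships per indexed node."""
    return sum(degrees) / len(degrees) if degrees else 0.0


def merge_expansion_start(avg_degree_by_label, left, right):
    """Return the label to expand from: the side with the lower average degree.

    Expanding from the lower-degree side presumably leaves fewer
    relationships to inspect when checking whether the pattern exists.
    """
    if avg_degree_by_label[left] <= avg_degree_by_label[right]:
        return left
    return right


# Hypothetical statistics: users have few relationships, hubs have many.
stats = {
    "User": average_degree([1, 2, 3]),
    "Hub": average_degree([400, 500, 600]),
}
start = merge_expansion_start(stats, "Hub", "User")
```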
The ANALYZE GRAPH; command should be run only once, after all indexes have been
created and nodes inserted in the database. In rare situations when one property
is set on many more nodes than another property, choosing an index based on
average group size and uniform distribution would be misleading. That's why,
when one label-property index contains at least 10 times fewer nodes than the
other, the database always selects the index with fewer nodes.
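The interaction between the node-count rule and the group-size statistics can be sketched as follows. The dictionary layout, the `choose_index` name, and the `ratio` parameter are assumptions made for illustration; only the 10x threshold and the ordering of the criteria come from the text above.

```python
def choose_index(a, b, ratio=10):
    """Choose between two label-property indexes described by statistic dicts.

    Each dict is assumed to hold: name, num_nodes, avg_group_size, chi_squared.
    """
    # Node-count rule: an index with at least `ratio` times fewer nodes wins
    # regardless of the other statistics.
    if a["num_nodes"] * ratio <= b["num_nodes"]:
        return a
    if b["num_nodes"] * ratio <= a["num_nodes"]:
        return b
    # Otherwise prefer the smaller average group size, then the distribution
    # closest to uniform (smaller chi-squared value).
    return min((a, b), key=lambda s: (s["avg_group_size"], s["chi_squared"]))


# Hypothetical statistics for two indexes.
small = {"name": "small", "num_nodes": 100, "avg_group_size": 50.0, "chi_squared": 9.0}
big = {"name": "big", "num_nodes": 5000, "avg_group_size": 2.0, "chi_squared": 0.0}
chosen = choose_index(small, big)
```

In this example the index named `small` is chosen even though its average group size is larger, because it contains 50 times fewer nodes.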
Calculate the statistic
Run the following query to calculate the statistics:
ANALYZE GRAPH;
The query iterates over all label and label-property indices in the database, calculates the average group size, chi-squared statistic and average degree for each one, and then returns the following output:
label | property | num estimation nodes | num groups | avg group size | chi-squared value | avg degree |
---|---|---|---|---|---|---|
index's label | index's property | number of nodes used for estimation | number of distinct values the property contains | average group size of property's values | value of the chi-squared statistic | average degree of the indexed nodes |
Once the necessary information is obtained, Memgraph can choose the optimal
index and MERGE expansion. If you don't want to run the analysis on all labels,
you can specify which labels to use by adding the labels to the query:
ANALYZE GRAPH ON LABELS :Label1, :Label2;
Delete statistic
If you want the database to ignore information about the average group size, the chi-squared statistic and the average degree, the existing statistic can be deleted by running:
ANALYZE GRAPH DELETE STATISTICS;
The results will contain all label-property indices whose statistics were successfully deleted:
label | property |
---|---|
index's label | index's property |
To delete the statistics only for specific labels, use the ON LABELS construct:
ANALYZE GRAPH ON LABELS :Label1 DELETE STATISTICS;
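The four query forms shown on this page follow one pattern, which a small helper can assemble. `analyze_graph_query` is a hypothetical convenience function, not part of any Memgraph client library; it only builds the query text, which would then be sent through whichever client you already use.

```python
def analyze_graph_query(labels=None, delete=False):
    """Build an ANALYZE GRAPH query string.

    labels: optional list of label names without the leading colon.
    delete: when True, build the DELETE STATISTICS variant.
    """
    parts = ["ANALYZE GRAPH"]
    if labels:
        parts.append("ON LABELS " + ", ".join(":" + label for label in labels))
    if delete:
        parts.append("DELETE STATISTICS")
    return " ".join(parts) + ";"


queries = [
    analyze_graph_query(),
    analyze_graph_query(["Label1", "Label2"]),
    analyze_graph_query(delete=True),
    analyze_graph_query(["Label1"], delete=True),
]
```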