Version: 2.2.1

Transformation modules

To connect Memgraph to a data stream, Memgraph needs to know how to transform the incoming messages so that it can consume them correctly. This is done with a transformation module.

To create a transformation module, you need to:

  1. Create a Python file or a shared library file (the module)
  2. Save the file into Memgraph's query_modules directory (default: /usr/lib/memgraph/query_modules)
  3. Load the file into Memgraph, either automatically on startup or by running a CALL mg.load("module_name"); or CALL mg.load_all(); query

Creating a transformation module

Memgraph supports user-defined transformation procedures written in C and Python that act on data received from a streaming engine. These transformation procedures are grouped into a module, called a transformation module, which is loaded into Memgraph either on startup or later on. A transformation module consists of a transformation, a query procedure, or both.
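As a rough illustration of what a Python transformation module might look like, here is a minimal sketch. It assumes JSON message payloads of the hypothetical form {"id": ..., "name": ...} and a hypothetical :Person label. The mgp names used here (mgp.transformation, mgp.Messages, total_messages(), message_at(), payload(), mgp.Record, mgp.Map) are recalled from the Python API and should be checked against the API reference for your Memgraph version. The mgp import is guarded only so that the pure helper can be tested outside Memgraph.

```python
import json

try:
    import mgp  # provided by Memgraph at runtime; not available outside the server
except ImportError:
    mgp = None  # allows testing payload_to_query() without Memgraph


def payload_to_query(payload: bytes):
    """Turn one JSON message payload into a (query, parameters) pair.

    The message schema and the :Person label are hypothetical examples.
    """
    data = json.loads(payload.decode("utf-8"))
    query = "MERGE (p:Person {id: $id}) SET p.name = $name"
    return query, {"id": data["id"], "name": data["name"]}


if mgp is not None:

    @mgp.transformation
    def transform(messages: mgp.Messages) -> mgp.Record(query=str, parameters=mgp.Map):
        """Emit one Cypher query record per message in the incoming batch."""
        records = []
        for i in range(messages.total_messages()):
            message = messages.message_at(i)
            query, parameters = payload_to_query(message.payload())
            records.append(mgp.Record(query=query, parameters=parameters))
        return records
```

Keeping the payload parsing in a plain function is a design choice for this sketch: it can be unit tested on any machine, while the decorated wrapper only runs inside Memgraph.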

Currently, we support transformations for Kafka, Pulsar and Redpanda streams.

API references are available for the C API and the Python API.

Check out our how-to guides on implementing a typical transformation module if you are using Kafka or Pulsar.

Loading modules on startup

Memgraph attempts to load the modules from all *.so and *.py files it finds in the default (/usr/lib/memgraph/query_modules) directory. The *.so modules are written using the C API and the *.py modules are written using the Python API. Each file corresponds to one module. Names of these files will be mapped to module names. For example, hello.so will be mapped to the hello module and a py_hello.py script will be mapped to the py_hello module.
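The file-name-to-module-name mapping described above can be sketched in a few lines of Python. This is only an illustration of the naming rule, not Memgraph's actual loader; the function name discover_module_names is hypothetical.

```python
from pathlib import Path


def discover_module_names(directory):
    """Map each *.so or *.py file in `directory` to a module name (the file stem)."""
    names = {}
    for path in sorted(Path(directory).iterdir()):
        # Only shared libraries (C API) and Python scripts (Python API) count.
        if path.is_file() and path.suffix in {".so", ".py"}:
            names[path.stem] = path.name
    return names
```

For a directory containing hello.so and py_hello.py, the returned keys would be hello and py_hello, matching the module names given in the text.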

If you want to change the directory in which Memgraph searches for transformation modules, just change or extend the --query-modules-directory flag in the main configuration file (/etc/memgraph/memgraph.conf) or supply it as a command-line parameter (e.g., when using Docker).

caution

If you are using the Memgraph Platform image, pass configuration flags within the MEMGRAPH environment variable (e.g., docker run -e MEMGRAPH="--bolt-port=7687" memgraph/memgraph-platform). If you are using any other image, pass them as arguments after the image name (e.g., memgraph/memgraph-mage --bolt-port=7687 --query-modules-directory=path/path).
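The difference between the two ways of passing flags can be made concrete with a small helper that builds the docker run argument list. This is a sketch under the assumption that the image name alone decides which form applies; the function name docker_run_args is hypothetical.

```python
def docker_run_args(image, flags):
    """Build a `docker run` argument list that passes Memgraph flags correctly."""
    if image.startswith("memgraph/memgraph-platform"):
        # Platform image: flags travel inside the MEMGRAPH environment variable.
        return ["docker", "run", "-e", "MEMGRAPH=" + " ".join(flags), image]
    # Any other image: flags follow the image name as ordinary arguments.
    return ["docker", "run", image, *flags]
```

Because the result is an argument list rather than a shell string, it can be handed to subprocess.run without extra quoting.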

Transfer a transformation module into a Docker container

If you are running Memgraph with Docker, you need to copy the transformation module file from your local directory into the Docker container, where Memgraph can access it.

1. Open a new terminal and find the CONTAINER ID of the Memgraph Docker container:

docker ps

2. Copy a file from your current directory to the container with the command:

docker cp ./trans_module.py <CONTAINER ID>:/usr/lib/memgraph/query_modules/trans_module.py

The file is now inside your Docker container.
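If you copy modules often, the docker cp step can be scripted. The sketch below builds the same command as above and runs it with subprocess; the helper names are hypothetical, and the target directory is the default query modules path mentioned earlier.

```python
import subprocess
from pathlib import Path

QUERY_MODULES_DIR = "/usr/lib/memgraph/query_modules"


def docker_cp_command(local_file, container_id, target_dir=QUERY_MODULES_DIR):
    """Build the `docker cp` argument list for one module file."""
    name = Path(local_file).name
    return ["docker", "cp", str(local_file), f"{container_id}:{target_dir}/{name}"]


def copy_module(local_file, container_id):
    """Run the copy; raises CalledProcessError if docker reports a failure."""
    subprocess.run(docker_cp_command(local_file, container_id), check=True)
```

Calling copy_module("./trans_module.py", "<CONTAINER ID>") with the ID reported by docker ps would perform the copy shown in step 2.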

Utility procedures for transformations

Query procedures that give you more insight into modules and transformations are available in our utility mg query module. For transformations, this module offers:

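+------------------------------------------+--------------------------------------+
| Procedure                                | Description                          |
+------------------------------------------+--------------------------------------+
| mg.transformations() :: (name :: STRING) | Lists all transformation procedures. |
| mg.load(module_name :: STRING) :: ()     | Loads or reloads the given module.   |
| mg.load_all() :: ()                      | Loads or reloads all modules.        |
+------------------------------------------+--------------------------------------+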

For example, you can invoke mg.transformations() from mgconsole or Memgraph Lab with the following command:

CALL mg.transformations() YIELD *;

This will yield the following result:

+-------------------+--------------------------------------------+-------------+
| name              | path                                       | is_editable |
+-------------------+--------------------------------------------+-------------+
| "batch.transform" | "/usr/lib/memgraph/query_modules/batch.py" | true        |
+-------------------+--------------------------------------------+-------------+

The result shows that Memgraph has already loaded the user-defined transformation transform from the batch module.

To load a module (for example, one named hello) that wasn't loaded on startup, most likely because it was added to Memgraph's directory while Memgraph was already running, you can invoke:

CALL mg.load("hello");

If you wish to reload an existing module, say the hello module above, use the same procedure:

CALL mg.load("hello");

To reload all existing modules and load any newly added ones, use:

CALL mg.load_all();
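When these calls are issued from a script, it can help to build the statement text in one place. The helper below is a small sketch: the function name load_statement is hypothetical, and restricting module names to identifier-like strings is an assumption made here so the name can be embedded in the query safely, not a documented Memgraph rule.

```python
import re


def load_statement(module_name=None):
    """Return the Cypher statement that (re)loads one module, or all modules."""
    if module_name is None:
        return "CALL mg.load_all();"
    # Assumption: module names mirror file stems such as hello or py_hello.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", module_name):
        raise ValueError(f"unexpected module name: {module_name!r}")
    return f'CALL mg.load("{module_name}");'
```

The returned string can then be sent to Memgraph through mgconsole, Memgraph Lab, or whichever client you already use.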