Simple yet powerful transformations for streaming ETL
DataCater offers Python transforms as a lightweight means of implementing any data preparation requirement in streaming ETL pipelines. Python transforms can be developed in DataCater's Pipeline Designer or defined in the declarative YAML format. They are regular Python functions that have access to the entire data record and can apply any transformation to the data before returning it.
The following Python transform replaces a placeholder in a string attribute with the content of another attribute:
def transform(value, row):
    return value.replace("###name###", row["name"])
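Streaming data often contains missing or non-string values, so a more defensive variant of the same transform may be useful. The following is a minimal sketch, assuming the transform(value, row) signature shown above; the fallback behavior (passing non-string values through, using an empty string for a missing name) and the sample record are assumptions for illustration, not documented DataCater behavior.

```python
def transform(value, row):
    # Assumption: non-string values (e.g. None) are passed through unchanged.
    if not isinstance(value, str):
        return value
    # Assumption: a missing or empty "name" attribute is replaced by "".
    name = row.get("name") or ""
    return value.replace("###name###", str(name))


# Hypothetical record, used only to try the function outside of DataCater.
record = {"name": "Ada", "greeting": "Hello ###name###!"}
print(transform(record["greeting"], record))
```

Because a transform is a regular Python function, it can be called with a plain dictionary like this before it is added to a pipeline.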
Interactive previews in the Pipeline Designer
DataCater's Pipeline Designer offers full support for previewing the results of Python transforms. This not only helps engineers detect accidental side effects as early as possible and debug the performance of Python transforms, but also enables non-technical users to observe the output of Python functions, interactively validate their behavior, and even combine them with no-code functions.
Use the Python (modules) you know
DataCater ships the upstream version of CPython 3. If you already have basic knowledge of Python programming, you can start building custom transformations right away. DataCater offers access to all core features of Python and also makes the modules of the Python Standard Library available, for instance, those for processing JSON or XML structures.
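As an illustration of using a Standard Library module inside a transform, the sketch below parses a JSON-encoded string attribute with the json module and extracts a single field. It assumes the transform(value, row) signature shown earlier; the "city" field and the choice to return None for unparsable input are hypothetical and only serve the example.

```python
import json


def transform(value, row):
    # Assumption: `value` holds a JSON-encoded object, e.g. '{"city": "Berlin"}'.
    try:
        payload = json.loads(value)
    except (TypeError, ValueError):
        # Non-string input raises TypeError; malformed JSON raises
        # json.JSONDecodeError, which is a subclass of ValueError.
        return None
    # Only JSON objects (Python dicts) carry named fields.
    if not isinstance(payload, dict):
        return None
    # "city" is a hypothetical field name chosen for this example.
    return payload.get("city")
```

Returning None for invalid input keeps the pipeline running on malformed records; raising an exception instead would be the stricter alternative.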