DataCater offers code-based (or user-defined) transformation functions as a powerful and flexible means of implementing custom data preparation logic.
Code-based transformations can be implemented in Python and applied to attributes of the data types boolean, double, float, int, long, or string.
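They are implemented as a Python function that takes two parameters:

- value: the value of the attribute that the function is applied to
- row: the entire record, which can be indexed by attribute name (for example, row["name"])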
The following code listing shows the structure of a code-based transformation function:

    def transform(value, row):
        return value
While you may choose arbitrary names for the parameters (default: value and row), the function itself must be called transform.
The returned value must be of the same type as the attribute that the transformation function is applied to.
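Before pasting a function into DataCater, it can help to check the type rule locally. The following is a minimal sketch, assuming that a record can be modeled as a Python dict; apply_transform is a hypothetical test helper, not part of DataCater's API. It applies transform to one attribute of every record and raises an error if the returned type differs from the input type.

```python
# Hypothetical local test harness: NOT part of DataCater's API.

def transform(value, row):
    # Example for an attribute of type int: clamp negative values to zero.
    return value if value >= 0 else 0


def apply_transform(records, attribute, fn):
    """Apply fn to one attribute of every record, checking the returned type."""
    result = []
    for record in records:
        original = record[attribute]
        new_value = fn(original, record)
        # The returned value must have the same type as the input value.
        if type(new_value) is not type(original):
            raise TypeError(
                f"transform returned {type(new_value).__name__}, "
                f"expected {type(original).__name__}"
            )
        # Copy the record so that the input list is left unchanged.
        result.append({**record, attribute: new_value})
    return result


records = [{"id": 1, "stock": 5}, {"id": 2, "stock": -3}]
print(apply_transform(records, "stock", transform))
# → [{'id': 1, 'stock': 5}, {'id': 2, 'stock': 0}]
```

A function that returned, say, str(value) for an int attribute would make the helper raise a TypeError, which mirrors the requirement stated above.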
The following example function, applied to an attribute of type string, replaces every occurrence of the placeholder ###name### with the value of the attribute name:
    # value is a string
    def transform(value, row):
        return value.replace("###name###", row["name"])
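If the attribute or the name column can be empty, a more defensive variant of the example above may be useful. This is a sketch under the assumption that empty values arrive as Python None and that row supports dict-style .get(); please verify how your pipeline represents missing values before relying on it.

```python
# value is a string (assumption: empty values arrive as None)
def transform(value, row):
    # Pass empty values through unchanged.
    if value is None:
        return None
    name = row.get("name")
    # Leave the placeholder untouched when no name is available.
    if name is None:
        return value
    # str() guards against a name attribute that is not a string.
    return value.replace("###name###", str(name))


print(transform("Hello ###name###!", {"name": "Ada"}))
# → Hello Ada!
```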
The current release of DataCater ships Python 3.7.3 (vanilla CPython).
At the moment, the following non-standard Python modules are available in code-based transformations:
The following example function uses the langdetect module to automatically detect the language of a string value:
    from langdetect import detect

    # value is a string
    def transform(value, row):
        return detect(value)
We use the following execution timeouts for code-based transformation functions:
In some cases, you may not want to execute all code in the pipeline designer to keep previews interactive and fast.
Using the environment variable DATACATER_ENVIRONMENT, you can distinguish between the pipeline designer and deployments: in the pipeline designer, it is set to preview, while in deployments, it is set to production.
The following code listing shows how to access the environment variable:
    import os

    env = os.environ['DATACATER_ENVIRONMENT']

    def transform(value, row):
        if env == "production":
            # execute complex code block
            return value
        else:
            # fast computations to keep previews interactive
            return value
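The listing above raises a KeyError if DATACATER_ENVIRONMENT is not set, for example when you test the function on your own machine. A minimal sketch of a variant that falls back to preview and reads the variable on every call is shown below; the fallback value and the helper expensive_enrichment are assumptions for illustration, not part of DataCater.

```python
import os


def current_environment():
    # Assumption: treat an unset variable as "preview",
    # e.g. when running the function outside DataCater.
    return os.environ.get("DATACATER_ENVIRONMENT", "preview")


def expensive_enrichment(value):
    # Hypothetical stand-in for costly work, such as calling an external service.
    return value.strip().title()


# value is a string
def transform(value, row):
    if current_environment() == "production":
        # Deployment: run the expensive code path.
        return expensive_enrichment(value)
    # Pipeline designer preview: return quickly to keep previews interactive.
    return value
```

Reading the variable inside the function instead of at import time costs a dictionary lookup per call, but it lets you switch environments in local tests without reloading the module.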
At the moment, code-based transformations have the following known limitations: