Data pipelines consist of a set of transformation steps. Each step can apply one transformation function to each attribute. Transformation functions can be restricted to certain attribute values by combining them with a filter.
For instance, to replace all missing values with a certain text value, one may apply the transformation Replace With Value with the filter Missing Value.
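As a rough illustration of how a filter restricts a transformation, the following plain-Python sketch applies a replacement only to values that match a filter. It is not DataCater's implementation: the names `is_missing`, `replace_with_value`, and `apply_step` are hypothetical, and treating both `None` and the empty string as missing is an assumption.

```python
def is_missing(value):
    # Assumption: a value counts as missing if it is None or an empty string.
    return value is None or value == ""


def replace_with_value(value, constant):
    # Ignores the old value and returns the constant instead.
    return constant


def apply_step(values, transformation, filter_fn, *args):
    # Transform only the values matched by the filter; pass the rest through.
    return [transformation(v, *args) if filter_fn(v) else v for v in values]


cleaned = apply_step(["a", None, "", "b"], replace_with_value, is_missing, "unknown")
```

Values that do not match the filter are left unchanged, which is what distinguishes a filtered transformation from an unconditional one.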
Adds a numeric attribute to another one.
Adds a constant value to a numeric attribute.
Appends an attribute of type string to another one.
Appends a constant value to an attribute of type string.
Capitalizes all words of a text value. This function converts the first character of each word to uppercase and the remaining characters to lowercase.
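The described behavior can be approximated in plain Python as follows. This is a sketch of the semantics only; `capitalize_words` is a hypothetical name, and splitting on single spaces is an assumption about what counts as a word boundary.

```python
def capitalize_words(text):
    # str.capitalize() uppercases the first character and lowercases the rest.
    # Splitting on single spaces preserves the original spacing when rejoining.
    return " ".join(word.capitalize() for word in text.split(" "))


result = capitalize_words("hELLO wORLD from dataCater")
```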
Casts all values from an attribute to a given data type.
Divides a numeric attribute by another one. When dividing by zero, the function will return NULL.
Divides a numeric attribute by a constant value. When dividing by zero, the function will return NULL.
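The division-by-zero rule for both divide transformations can be sketched like this, with Python's `None` standing in for NULL. The function name `divide` is hypothetical.

```python
def divide(value, divisor):
    # Division by zero yields None (NULL) instead of raising an error.
    if divisor == 0:
        return None
    return value / divisor


ok = divide(10, 4)
undefined = divide(10, 0)
```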
Drops an attribute from the data set.
Rounds a floating-point value downwards to the nearest integer.
Calculates the hash value of a text value using a selected hash function.
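A minimal sketch of hashing a text value with a selectable hash function, using Python's standard `hashlib` module. Which hash functions DataCater offers is not stated here, so the default of SHA-256, the UTF-8 encoding, and the name `hash_text` are assumptions.

```python
import hashlib


def hash_text(text, algorithm="sha256"):
    # hashlib.new() looks up the algorithm by name and raises ValueError
    # if the name is not supported.
    digest = hashlib.new(algorithm)
    digest.update(text.encode("utf-8"))  # assumption: values are hashed as UTF-8
    return digest.hexdigest()


hashed = hash_text("abc")
```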
Ignores a record for further processing. This transformation is often combined with a filter to ignore only selected records.
Divides a numeric attribute by another one and returns the remainder.
Divides a numeric attribute by a constant value and returns the remainder.
Multiplies a numeric attribute by another one.
Multiplies a numeric attribute by a constant value.
Prepends an attribute of type string to another one.
Prepends a constant value to an attribute of type string.
Removes the following punctuation and whitespace characters from a text value: \t, \n, !, ", ', #, $, %, &, (, ), *, +, ,, -, ., :, ;, <, =, >, ?, @, [, ], \, ^, _, {, |, }, ~.
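The effect of this transformation can be sketched with `str.translate`, using exactly the characters listed above. The names `REMOVED_CHARACTERS` and `remove_punctuation` are hypothetical, and the sketch is not taken from DataCater's source.

```python
# The characters listed above, written as a Python string literal.
REMOVED_CHARACTERS = "\t\n!\"'#$%&()*+,-.:;<=>?@[]\\^_{|}~"

# A translation table that maps each listed character to "delete".
_DELETE_TABLE = str.maketrans("", "", REMOVED_CHARACTERS)


def remove_punctuation(text):
    return text.translate(_DELETE_TABLE)


result = remove_punctuation("Hello, world! (a/b) #1")
```

Characters that are not in the list, such as the forward slash in this example, are kept.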
Renames an attribute.
Can be applied to strings holding valid JSON objects and renames keys at all levels in a possibly deeply-nested JSON structure.
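Renaming a key at every nesting level amounts to a recursive walk over the parsed JSON. The sketch below shows one way to do this with Python's `json` module; `rename_key` and `rename_json_key` are hypothetical names, and descending into arrays is an assumption about the intended behavior.

```python
import json


def rename_key(node, old_key, new_key):
    # Objects: rename matching keys and recurse into every value.
    if isinstance(node, dict):
        return {
            (new_key if key == old_key else key): rename_key(child, old_key, new_key)
            for key, child in node.items()
        }
    # Arrays: recurse into every element (assumed behavior).
    if isinstance(node, list):
        return [rename_key(item, old_key, new_key) for item in node]
    # Scalars are returned unchanged.
    return node


def rename_json_key(text, old_key, new_key):
    return json.dumps(rename_key(json.loads(text), old_key, new_key))


result = rename_json_key(
    '{"id": 1, "items": [{"id": 2, "meta": {"id": 3}}]}', "id", "key"
)
```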
Replaces values of an attribute with values of another one of the same type.
Replaces values of an attribute with a constant value.
Rounds a floating-point value upwards to the nearest integer.
Rounds a floating-point value with a given precision.
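For comparison, the three rounding-related transformations (floor, round upwards, round with precision) roughly correspond to the following Python calls. This is only an analogy: Python's built-in `round` rounds ties to the nearest even digit and is subject to floating-point representation, and DataCater's rounding mode may differ.

```python
import math


def floor_value(value):
    # Largest integer less than or equal to value.
    return math.floor(value)


def round_up(value):
    # Smallest integer greater than or equal to value.
    return math.ceil(value)


def round_with_precision(value, precision):
    # Keeps `precision` digits after the decimal point.
    return round(value, precision)


floored = floor_value(2.7)
ceiled = round_up(2.1)
rounded = round_with_precision(3.14159, 2)
```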
Subtracts a numeric attribute from another one.
Subtracts a constant value from a numeric attribute.
Tokenizes a text value using a given delimiter and turns it into an array of text values, where each entry is one token from the original text value. By default, this transformer uses whitespace as the delimiter.
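In Python terms, tokenization is close to `str.split`. The sketch below assumes that the default mode collapses runs of whitespace, which is how `str.split()` behaves without an argument; whether DataCater does the same is not stated here. The name `tokenize` is hypothetical.

```python
def tokenize(text, delimiter=None):
    # With delimiter=None, str.split() splits on runs of whitespace
    # and drops leading and trailing whitespace.
    return text.split(delimiter)


default_tokens = tokenize("a b  c")
comma_tokens = tokenize("a,b,c", ",")
```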
Transforms all characters of a text value to lowercase.
Transforms all characters of a text value to uppercase.
Removes all leading and trailing whitespace from a text value.
Truncates a text value after a given number of characters.
Truncates a text value after the given number of words. This transformer is implemented as follows: First, perform a whitespace tokenization. Second, consider only the first n tokens (n equals the number of words after which the original text value should be truncated). Third, join all considered words by whitespaces.
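The three steps described for word truncation map directly onto a few lines of Python. The function name `truncate_words` is hypothetical; the logic follows the description above.

```python
def truncate_words(text, n):
    # Step 1: whitespace tokenization.
    tokens = text.split()
    # Step 2: keep only the first n tokens.
    kept = tokens[:n]
    # Step 3: join the kept words with single spaces.
    return " ".join(kept)


result = truncate_words("the quick brown fox", 2)
```

A side effect of this approach is that repeated whitespace in the original value is normalized to single spaces, even when the value has fewer than n words.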
BigQuery returns repeated and nested fields as deeply-nested JSON objects. This transformation unwraps such structures into a more concise one.
Lets you implement a user-defined transformation function (UDTF) in Python.
UDTFs take the value of the attribute they are applied to and the entire row as parameters, can perform arbitrary operations, and return a value.
The returned value must be of the same type as the attribute that the UDTF is applied to. However, UDTFs can also return None, which DataCater treats as a NULL (or empty) value.
A UDTF has the following structure:
def transform(value, row):
    return value
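A slightly fuller sketch of a UDTF that follows the `transform(value, row)` structure. It assumes that `row` behaves like a dictionary keyed by attribute name and that the UDTF is applied to a string attribute; the attribute name `country` is purely illustrative.

```python
def transform(value, row):
    # Return None (treated as NULL) for missing or blank input.
    if value is None or value.strip() == "":
        return None
    # Hypothetical attribute "country"; row is assumed to be dict-like.
    country = row.get("country") or "unknown"
    # Return a string, matching the type of the attribute being transformed.
    return f"{value.strip()} ({country})"
```

Because the function returns either a string or None, it respects the rule that the result must have the same type as the transformed attribute.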
Please read code transformations to learn more about user-defined transformations.