No-Code Transformations

Data pipelines consist of a set of transformation steps. Each step can apply one transformation function to each attribute. Transformation functions can be restricted to certain attribute values by combining them with a filter.

For instance, to replace all missing values with a certain text value, one may apply the transformation Replace With Value with the filter Missing Value.
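
The way a filter restricts a transformation can be sketched in plain Python. This is only an illustration of the idea, not how the platform implements it: the names is_missing, replace_with_value, and apply_step are hypothetical, and a missing value is modeled as None.

```python
def is_missing(value):
    """Filter: matches missing values (modeled here as None)."""
    return value is None


def replace_with_value(constant):
    """Transformation: build a function that replaces any value with `constant`."""
    return lambda value: constant


def apply_step(records, attribute, transformation, filter_fn=None):
    """Apply `transformation` to `attribute` of every record the filter matches."""
    result = []
    for record in records:
        record = dict(record)  # copy so the input records stay untouched
        if filter_fn is None or filter_fn(record[attribute]):
            record[attribute] = transformation(record[attribute])
        result.append(record)
    return result


records = [{"city": "Berlin"}, {"city": None}]
cleaned = apply_step(records, "city", replace_with_value("unknown"), is_missing)
```

Only the second record is changed, because the filter does not match the first one.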


Add Attribute
Double Float Int Long

Adds a numeric attribute to another one.


Add Value
Double Float Int Long

Adds a constant value to a numeric attribute.


Append Attribute
String

Appends an attribute of type string to another one.


Append Value
String

Appends a constant value to an attribute of type string.


Capitalize
String

Capitalizes all words of a text value: the first character of each word is converted to uppercase and the remaining characters to lowercase.
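
A rough Python equivalent of this behavior is shown below. It assumes that words are separated by single space characters, which the description does not state explicitly; capitalize_words is a hypothetical name.

```python
def capitalize_words(value):
    """Uppercase the first character of each word and lowercase the rest."""
    # Splitting on a single space keeps runs of spaces intact when re-joining.
    return " ".join(word.capitalize() for word in value.split(" "))


example = capitalize_words("hELLO wORLD")
```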


Cast Data Type
All Data Types

Casts all values of an attribute to a given data type.


Divide By Attribute
Double Float Int Long

Divides a numeric attribute by another one. When dividing by zero, the function will return NULL.


Divide By Value
Double Float Int Long

Divides a numeric attribute by a constant value. When dividing by zero, the function will return NULL.
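
The division-by-zero rule of the two division transformations can be sketched as follows, with None standing in for NULL. The function name divide is hypothetical, and passing a missing input through as None is an assumption that the text above does not confirm.

```python
def divide(value, divisor):
    """Divide `value` by `divisor`, returning None (NULL) for a zero divisor."""
    # Assumption: a missing operand also yields None.
    if value is None or divisor is None:
        return None
    if divisor == 0:
        return None
    return value / divisor


quotient = divide(10, 4)
undefined = divide(10, 0)
```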


Drop Attribute
All Data Types

Drops an attribute from the data set.


Floor To Integer
Double Float

Rounds a floating-point value downwards to the nearest integer.


Hash
String

Calculates the hash value of a text value using a selected hash function.
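
As an illustration, a comparable function can be written with Python's hashlib module. Which hash functions are offered, the UTF-8 encoding, and the hexadecimal output format are all assumptions here; hash_value is a hypothetical name.

```python
import hashlib


def hash_value(value, algorithm="sha256"):
    """Return the hex digest of a text value for the chosen hash function."""
    if value is None:
        return None  # assumption: missing values stay missing
    # hashlib.new accepts the algorithm name and the bytes to hash.
    return hashlib.new(algorithm, value.encode("utf-8")).hexdigest()


digest = hash_value("abc", "sha256")
```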


Ignore Record
All Data Types

Ignores a record for further processing. This transformation is often combined with a filter to ignore only selected records.


Modulo Attribute
Double Float Int Long

Divides a numeric attribute by another one and returns the remainder.


Modulo Value
Double Float Int Long

Divides a numeric attribute by a constant value and returns the remainder.


Multiply By Attribute
Double Float Int Long

Multiplies a numeric attribute by another one.


Multiply By Value
Double Float Int Long

Multiplies a numeric attribute by a constant value.


Prepend Attribute
String

Prepends an attribute of type string to another one.


Prepend Value
String

Prepends a constant value to an attribute of type string.


Remove Punctuation
String

Removes the following punctuation characters from a text value: \t, \n, !, ", ', #, $, %, &, (, ), *, +, ,, -, ., :, ;, <, =, >, ?, @, [, ], \, ^, _, {, |, }, ~.
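
A minimal Python sketch of this transformation, using exactly the characters listed above, could look like the following. The names REMOVED_CHARACTERS and remove_punctuation are hypothetical.

```python
# The characters from the list above, in the same order.
REMOVED_CHARACTERS = "\t\n!\"'#$%&()*+,-.:;<=>?@[]\\^_{|}~"

# A translation table that maps every listed character to "delete".
_TABLE = str.maketrans("", "", REMOVED_CHARACTERS)


def remove_punctuation(value):
    """Strip the listed characters from a text value."""
    return value.translate(_TABLE)


stripped = remove_punctuation("Hello, World!")
```

Characters that are not in the list, such as the forward slash, are left untouched by this sketch.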


Rename Attribute
All Data Types

Renames an attribute.


Rename Key In JSON Structure
String

Renames keys at all levels of a possibly deeply nested JSON structure. This transformation can be applied to strings holding valid JSON objects.
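
Renaming a key at every nesting level is naturally expressed as a recursive walk over the parsed JSON. The sketch below shows one way to do this in Python; rename_key and rename_key_in_json are hypothetical names, and the handling of arrays is an assumption.

```python
import json


def rename_key(node, old_key, new_key):
    """Recursively rename `old_key` to `new_key` in dicts at any depth."""
    if isinstance(node, dict):
        return {
            (new_key if key == old_key else key): rename_key(child, old_key, new_key)
            for key, child in node.items()
        }
    if isinstance(node, list):
        # Assumption: objects inside arrays are renamed as well.
        return [rename_key(item, old_key, new_key) for item in node]
    return node  # scalars are returned unchanged


def rename_key_in_json(value, old_key, new_key):
    """Parse a JSON string, rename the key everywhere, and serialize it again."""
    return json.dumps(rename_key(json.loads(value), old_key, new_key))


renamed = rename_key_in_json('{"id": 1, "child": {"id": 2}}', "id", "key")
```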


Replace With Attribute
All Data Types

Replaces values of an attribute with values of another one of the same type.


Replace With Value
All Data Types

Replaces values of an attribute with a constant value.


Round Upwards To Integer
Double Float

Rounds a floating-point value upwards to the nearest integer.


Round With Precision
Double Float

Rounds a floating-point value with a given precision.


Subtract Attribute
Double Float Int Long

Subtracts a numeric attribute from another one.


Subtract Value
Double Float Int Long

Subtracts a constant value from a numeric attribute.


Tokenize
String

Tokenizes a text value using a given delimiter and turns it into an array of text values, where each entry is one token from the original text value. By default, this transformer uses whitespace as the delimiter.
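
In Python terms, this roughly corresponds to str.split. The sketch below assumes that consecutive whitespace characters count as a single delimiter in the default case, which the description does not specify; tokenize is a hypothetical name.

```python
def tokenize(value, delimiter=None):
    """Split a text value into a list of tokens."""
    # With delimiter=None, str.split splits on runs of whitespace.
    return value.split(delimiter)


default_tokens = tokenize("a b  c")
comma_tokens = tokenize("a,b,c", ",")
```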


Transform To Lowercase
String

Transforms all characters of a text value to a lowercase notation.


Transform To Uppercase
String

Transforms all characters of a text value to an uppercase notation.


Trim
String

Removes all leading and trailing whitespace from a text value.


Truncate Characters
String

Truncates a text value after a given number of characters.


Truncate Words
String

Truncates a text value after a given number of words. This transformer works in three steps: first, it tokenizes the text value on whitespace; second, it keeps only the first n tokens (where n is the number of words after which the text value should be truncated); third, it joins the remaining tokens with single spaces.
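
The three steps described above translate almost directly into Python. This is a sketch of the described procedure, not the platform's code; truncate_words is a hypothetical name.

```python
def truncate_words(value, n):
    """Keep only the first `n` whitespace-separated words of a text value."""
    tokens = value.split()      # step 1: whitespace tokenization
    kept = tokens[:n]           # step 2: consider only the first n tokens
    return " ".join(kept)       # step 3: join the kept words with spaces


shortened = truncate_words("The quick  brown fox", 2)
```

One consequence of this procedure is that repeated whitespace in the original value is collapsed to single spaces in the result.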


Unwrap Nested Record From Google BigQuery
String

BigQuery returns repeated and nested fields as deeply-nested JSON objects. This transformation unwraps such structures into a more concise one.


User-Defined Transformation
Boolean Double Float Int Long String

Allows you to implement a user-defined transformation function (UDTF) in Python.

UDTFs take the value of the attribute they are applied to and the entire row as parameters, can perform arbitrary operations, and return a value. The returned value must be of the same type as the attribute that the UDTF is applied to. However, UDTFs can also return None, which DataCater treats as a NULL (or empty) value.

Please see below the structure of a UDTF:

def transform(value, row):
  return value
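
As a slightly fuller illustration of this structure, the following sketch is a UDTF for a string attribute. It assumes that row behaves like a dictionary keyed by attribute name and that an attribute called country exists; neither is confirmed by the text above.

```python
def transform(value, row):
    # Return None (treated as NULL) when the value is missing or blank.
    if value is None or value.strip() == "":
        return None
    # Assumption: `row` is dict-like and "country" is a hypothetical attribute.
    country = row.get("country")
    if country is None:
        return value
    # The result is a string, matching the type of the attribute.
    return f"{value} ({country})"
```

Calling transform("Berlin", {"country": "DE"}) returns "Berlin (DE)", while a blank value yields None.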

Please read code transformations to learn more about user-defined transformations.