No-Code Transformations

Data pipelines consist of a set of transformation steps. Each step can apply one transformation function to each attribute. Transformation functions can be restricted to certain attribute values by combining them with a filter.

For instance, to replace all missing values with a certain text value, one may apply the transformation Replace With Value with the filter Missing Value.
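The combination of a filter and a transformation can be pictured in plain Python. This is only an illustrative sketch, not DataCater's implementation: the function name `replace_missing` and the rule that `None` and the empty string count as missing are assumptions made for this example.

```python
def replace_missing(value, replacement):
    # Filter "Missing Value" (assumed here to mean None or an empty string):
    # the transformation only fires when the filter matches.
    if value is None or value == "":
        # Transformation "Replace With Value": substitute the constant.
        return replacement
    # Values that do not match the filter pass through unchanged.
    return value


values = ["Berlin", None, "", "Hamburg"]
cleaned = [replace_missing(v, "unknown") for v in values]
```

Here `cleaned` keeps "Berlin" and "Hamburg" and replaces the two missing entries with "unknown".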

Add Attribute

Double Float Int Long

Adds a numeric attribute to another one.

Add Value

Double Float Int Long

Adds a constant value to a numeric attribute.

Append Attribute

String

Appends an attribute of type string to another one.

Append Value

String

Appends a constant value to an attribute of type string.

Capitalize

String

Capitalizes all words of a text value. This function transforms the first character of each word to an uppercase notation and the rest to a lowercase notation.
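A rough Python equivalent of the described behavior is sketched below. The function name `capitalize_words` is hypothetical, and splitting on single spaces is an assumption; the source does not say how word boundaries are detected.

```python
def capitalize_words(text):
    # Split on single spaces so the original spacing is preserved.
    words = text.split(" ")
    # Uppercase the first character of each word, lowercase the rest.
    return " ".join(word[:1].upper() + word[1:].lower() for word in words)


result = capitalize_words("hELLO wORLD")  # "Hello World"
```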

Cast Data Type

All Data Types

Casts all values from an attribute to a given data type.

Divide By Attribute

Double Float Int Long

Divides a numeric attribute by another one. When dividing by zero, the function will return NULL.

Divide By Value

Double Float Int Long

Divides a numeric attribute by a constant value. When dividing by zero, the function will return NULL.
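The division-by-zero rule of both division transformations can be sketched in Python as follows. The function name `safe_divide` is hypothetical, Python's `None` stands in for NULL, and returning NULL for a NULL input is an assumption. The source does not state the result type for Int and Long attributes, so this sketch simply uses Python's true division.

```python
def safe_divide(value, divisor):
    # Assumption: NULL inputs yield NULL.
    if value is None or divisor is None:
        return None
    # Dividing by zero returns NULL instead of raising an error.
    if divisor == 0:
        return None
    return value / divisor


half = safe_divide(10, 4)     # 2.5
nothing = safe_divide(10, 0)  # None
```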

Drop Attribute

All Data Types

Drops an attribute from the data set.

Floor To Integer

Double Float

Floors a floating-point value, i.e., rounds it down to the nearest integer.

Hash

String

Calculates the hash value of a text value using a selected hash function.
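As an illustration, hashing a text value with a selectable function could look like this in Python, using the standard library's `hashlib`. The source does not list which hash functions DataCater offers; the default `sha256`, the UTF-8 encoding, and the hexadecimal output format are assumptions of this sketch.

```python
import hashlib


def hash_text(text, algorithm="sha256"):
    # Assumption: the text is encoded as UTF-8 before hashing.
    digest = hashlib.new(algorithm, text.encode("utf-8"))
    # Assumption: the result is returned as a hexadecimal string.
    return digest.hexdigest()


hashed = hash_text("abc")
```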

Ignore Record

All Data Types

Excludes a record from further processing. This transformation is often combined with a filter to exclude only selected records.

Modulo Attribute

Double Float Int Long

Divides a numeric attribute by another one and returns the remainder.

Modulo Value

Double Float Int Long

Divides a numeric attribute by a constant value and returns the remainder.

Multiply By Attribute

Double Float Int Long

Multiplies a numeric attribute by another one.

Multiply By Value

Double Float Int Long

Multiplies a numeric attribute by a constant value.

Prepend Attribute

String

Prepends an attribute of type string to another one.

Prepend Value

String

Prepends a constant value to an attribute of type string.

Remove punctuation

String

Removes the following punctuation characters from a text value: \t, \n, !, ", ', #, $, %, &, (, ), *, +, ,, -, ., :, ;, <, =, >, ?, @, [, ], \, ^, _, {, |, }, ~.
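The character list above translates directly into a Python sketch based on `str.translate`. The names `PUNCTUATION` and `remove_punctuation` are hypothetical; only the set of removed characters comes from the list above.

```python
# The 32 characters listed above, including tab and newline.
PUNCTUATION = "\t\n!\"'#$%&()*+,-.:;<=>?@[]\\^_{|}~"

# Translation table that maps every listed character to "delete".
_REMOVE_TABLE = str.maketrans("", "", PUNCTUATION)


def remove_punctuation(text):
    # Characters that are not in the list (e.g. "/") are left untouched.
    return text.translate(_REMOVE_TABLE)


result = remove_punctuation("Hello, world!")  # "Hello world"
```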

Rename Attribute

All Data Types

Renames an attribute.

Rename Key In JSON Structure

String

Renames keys at all levels of a possibly deeply nested JSON structure. This transformation can be applied to strings holding valid JSON objects.
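Renaming a key at every nesting level amounts to a recursive walk over the parsed JSON. The following Python sketch shows one way to do this; it is not DataCater's implementation, and the function names as well as the choice to descend into arrays are assumptions.

```python
import json


def rename_key(node, old_key, new_key):
    # Objects: rename matching keys and recurse into the values.
    if isinstance(node, dict):
        return {
            (new_key if key == old_key else key): rename_key(value, old_key, new_key)
            for key, value in node.items()
        }
    # Arrays: recurse into each element (assumption).
    if isinstance(node, list):
        return [rename_key(item, old_key, new_key) for item in node]
    # Scalars are returned unchanged.
    return node


def rename_key_in_json(text, old_key, new_key):
    # Parse the string, rename recursively, and serialize back to a string.
    return json.dumps(rename_key(json.loads(text), old_key, new_key))


source = '{"id": 1, "child": {"id": 2, "items": [{"id": 3}]}}'
renamed = rename_key_in_json(source, "id", "key")
```

In this example, all three occurrences of "id" are renamed to "key", regardless of depth.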

Replace With Attribute

All Data Types

Replaces values of an attribute with values of another one of the same type.

Replace With Value

All Data Types

Replaces values of an attribute with a constant value.

Round Upwards To Integer

Double Float

Rounds a floating-point value upwards to the nearest integer.

Round With Precision

Double Float

Rounds a floating-point value with a given precision.
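To make the difference between Floor To Integer, Round Upwards To Integer, and Round With Precision concrete, here is a small Python sketch using the standard library. The function names are hypothetical. Python's built-in `round` resolves ties to the nearest even digit; how DataCater handles ties is not stated in the source, so treat that detail as an assumption.

```python
import math


def floor_to_integer(value):
    # Largest integer that is less than or equal to the value.
    return math.floor(value)


def round_upwards_to_integer(value):
    # Smallest integer that is greater than or equal to the value.
    return math.ceil(value)


def round_with_precision(value, precision):
    # Keeps the given number of decimal places.
    return round(value, precision)


floored = floor_to_integer(2.7)             # 2
ceiled = round_upwards_to_integer(2.1)      # 3
rounded = round_with_precision(3.14159, 2)  # 3.14
```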

Subtract Attribute

Double Float Int Long

Subtracts a numeric attribute from another one.

Subtract Value

Double Float Int Long

Subtracts a constant value from a numeric attribute.

Tokenize

String

Tokenizes a text value using a given delimiter and turns it into an array of text values, where each entry is one token from the original text value. By default, this transformation uses whitespace as the delimiter.
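In Python terms, tokenization roughly corresponds to `str.split`. The sketch below uses a hypothetical function name, and the handling of repeated whitespace (collapsed into one delimiter here) is an assumption, since the source does not specify it.

```python
def tokenize(text, delimiter=None):
    # With delimiter=None, str.split splits on runs of whitespace
    # and drops empty tokens.
    return text.split(delimiter)


by_whitespace = tokenize("data pipelines  rock")  # ["data", "pipelines", "rock"]
by_comma = tokenize("a,b,c", ",")                 # ["a", "b", "c"]
```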

Transform To Lowercase

String

Transforms all characters of a text value to a lowercase notation.

Transform To Uppercase

String

Transforms all characters of a text value to an uppercase notation.

Trim

String

Removes all leading and trailing whitespace from a text value.

Truncate Characters

String

Truncates a text value after a given number of characters.

Truncate Words

String

Truncates a text value after a given number of words. This transformation is implemented as follows: First, it tokenizes the text value on whitespace. Second, it keeps only the first n tokens, where n is the number of words after which the original text value should be truncated. Third, it joins the kept tokens with whitespace.
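The three steps map onto a few lines of Python. This is a sketch with a hypothetical function name; joining with a single space character is an assumption.

```python
def truncate_words(text, n):
    # Step 1: whitespace tokenization.
    tokens = text.split()
    # Step 2: keep only the first n tokens.
    kept = tokens[:n]
    # Step 3: join the kept tokens with whitespace.
    return " ".join(kept)


shortened = truncate_words("the quick brown fox", 2)  # "the quick"
```

A side effect of this approach is that runs of whitespace in the original text are not preserved in the result.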

Unwrap Nested Record From Google BigQuery

String

BigQuery returns repeated and nested fields as deeply-nested JSON objects. This transformation unwraps such structures into a more concise one.

User-Defined Transformation

Boolean Double Float Int Long String

Allows you to implement a user-defined transformation function (UDTF) in Python.

UDTFs take the value of the attribute they are applied to and the entire row as parameters, can perform arbitrary operations, and return a value. The returned value must be of the same type as the attribute that the UDTF is applied to. As an exception, a UDTF may return None, which DataCater treats as a NULL (or empty) value.

A UDTF has the following structure:

def transform(value, row):
  return value

See Code Transformations to learn more about user-defined transformations.
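As a slightly larger illustration, the following UDTF combines the attribute value with another attribute of the same row. It is a sketch under assumptions: the source does not describe how `row` is accessed, so reading it like a dictionary keyed by attribute name is a guess, and the attribute name "country" is hypothetical.

```python
def transform(value, row):
    # Returning None yields a NULL (or empty) value.
    if value is None:
        return None
    # Assumption: row can be read like a dict keyed by attribute name.
    # "country" is a hypothetical attribute used only for illustration.
    country = row.get("country")
    if country is None:
        return value
    # The result is a string, matching the type of the attribute
    # that the UDTF is applied to.
    return f"{value} ({country.upper()})"


labeled = transform("Berlin", {"country": "de"})  # "Berlin (DE)"
```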
