- September 13th, 2022
- Upgrade Java Kubernetes client to version 15.0.1
- PostgreSQL source connector: Support array columns
- August 30th, 2022
- REST/HTTP source connector: Support offset-based pagination
- HubSpot sink connector: Support private apps
- PostgreSQL source connector: Allow overriding the SELECT statement used for snapshotting
- Python transforms: Provide line number for exceptions
- Reduce number of calls to Kubernetes API
- July 15th, 2022
- REST/HTTP sink connector: Support OAuth2 and time-restricted access tokens
- Kubernetes: Specify resource limits and requests for pipeline pods
- June 15th, 2022
- Introduce Streams resource
- HubSpot sink connector: Support custom objects
- Support flat file uploads of up to 250MB
- REST/HTTP source connector: Support arrays as records
- Kubernetes: Run platform service as Stateful Set for HA
- Assign pipeline deployments to the correct node pool if a node selector is configured
- RSS source connector: Fix full re-sync by producing correct tombstone record
- May 9th, 2022
- Google Drive source connector: Support JSON and Google Sheet files
- REST/HTTP sink connector: Support PATCH verb
- Remove key sourceSchema from YAML descriptions
- Attach YAML descriptions to pipeline deployments
- Add Python module beautifulsoup4 to Python transforms
- April 5th, 2022
- Allow syncing DELETE events to the REST sink
- Support time(stamp) values with precision
- Kubernetes deployment improvements
- Profile integers in JSON sources as longs
- Autocreate BigQuery schema with attributes from joined data source
- Get rid of ghost indices created by Elasticsearch sink
- March 8th, 2022
- Introduce self-service sign ups
- When fetching table or column names fails for a data source or data sink, show the exact error message in the frontend.
- Allow creating new pipelines from the detail page of a data source
- Remove projects from top-level navigation
- Show welcome guide to new users
- Fix sampling of headerless CSV files in the data source connectors for AWS S3, FTP/SFTP, Google Cloud Storage, and Google Drive.
- February 7th, 2022
- Allow exporting pipelines in declarative YAML format
- Move main navigation to the top
- Rely only on the Kafka Connect API for monitoring health of data
- Google Cloud Storage source connector: Extend CSV parsing options
- Fix inconsistency in casting strings to time/timestamp objects in the pipeline designer's preview
- Propagate updates in data source configs to join connectors of consuming pipelines
- January 7th, 2022
- Allow duplicating pipelines with a single click.
- Improve debugging of failed Python transformations by reporting, in the deployment log, the pipeline step and attribute where the transformation failed.
- Unify health checks of data sources/sinks and the associated Kafka Connect connectors.
- Allow configuring SSL/TLS-related settings of the mailer through environment variables of the platform container.
- Allow manually recreating source connectors from the UI.
- RSS source connector: Support the enclosures tag.
- REST/HTTP source connector: Support HTTP request headers with comma-separated lists as values.
- Reduce sensitivity when detecting the health of the connectors to reduce alert fatigue.
- Rename pipeline steps to transformation steps for consistent naming.
- SFTP source connector: Support password-based authentication.
- FTP/SFTP source connector: Support headerless CSV files.
- PostgreSQL source connector: Support timestamp/time values with time zone information.
- PostgreSQL source connector: Fix out-of-memory errors occasionally happening when extracting data from big tables.
- Google Cloud BigQuery source connector: Escape data set and table name.
- Fix bug in detecting health of connectors: When a connector throws an exception while a connection test is being performed, it can be considered as failed.
- FTP/SFTP source connector: Fix bug in extracting attribute names from CSV header row.
- December 1st, 2021
- PostgreSQL source connector: Add support for connecting via a JDBC driver. This allows us to support PostgreSQL installations that do not offer logical replication.
- REST/HTTP source connector: Allow treating empty strings as NULL values.
- FTP source connector: Fix connection issue.
- Google Drive source connector and Google Cloud Storage source connector: Fix bug in detecting primary keys, when auto-generating attribute names.
- November 5th, 2021
- In the configuration forms of source and sink connectors, explain the reasons for failed connection tests.
- For the FTP/SFTP source connector, support full re-syncs.
- For the MySQL source connector, allow manually specifying the primary key column.
- For the REST source connector, allow specifying the timeout of HTTP requests.
- Fix UI bug in managing notification settings of project members.
- For the CSV source connector, fix a bug in specifying the number of to-be-skipped lines.
- October 11th, 2021
- For the BigQuery sink connector, support automatic creation of tables.
- No longer require re-entering the password when performing a test connection for an existing data source or
- For the REST source connector, support UNIX timestamps for capturing change information.
- For the REST source connector, use 1970-01-01T00:00:00Z as default value for the initial
- In DataCater Self-Managed, allow configuring Kafka-related settings via the following environment variables: KAFKA_TOPICS_CLEANUP_POLICY,
- September 1st, 2021
- Advanced validation of data source and data sink configs.
- Support sending health notifications to Slack channels (only available for projects).
- Bump maximum event size from 1MB to 3MB.
- Improve management of project settings. For instance, project admins can now manage the individual notification settings of project
- Support automated handling of time-based access tokens for the REST endpoint data source.
- Allow dropping primary key columns in pipelines.
- Redirect to project page after deleting project resources, like pipelines.
- August 2nd, 2021
- When creating or editing a connector, show the link to its
- For flat files, automatically create the attribute __datacater_file_name and fill it with the name of the flat file.
- Truncate log messages longer than 5,000 characters.
- Enforce UTF-8 encoding for FTP/SFTP source connector.
- For the REST data source, allow extracting records from a JSON object, in addition to a JSON array. Treat all object keys as
- Fix encoding bug in CSV/JSON/XML data sinks.
- Escape HTML tags in the deployment logs.
- July 2nd, 2021
- Support timestamp formats without timezones for REST data
- Allow using MySQL and PostgreSQL sinks in append-only mode (configurable via the configuration option insert mode).
- Show build time for deployments.
- Set default retention for Kafka topics to 1 day or 100MB.
- Install Python module feedparser.
- Treat NUMERIC and DECIMAL fields as doubles for the JDBC-based MySQL source connector.
- June 4th, 2021
- Allow resetting the offset of the sink connector on the
- Provide the environment variable DATACATER_ENVIRONMENT to user-defined transformation functions; it holds either preview (Pipeline Designer) or production (Deployment).
- In the AX Semantics sink connector, perform a commit of the sink connector offset after each processed record to prevent timeout issues.
- Allow specifying multiple collection IDs as a comma-separated list for the AX Semantics sink connector.
- Add the Python modules geopy and Shapely.
- Automatically create the attribute __datacater_file_name for flat file
sources, which contains the name of the uploaded CSV, JSON, or
- Show a maximum of 1,500 characters in the cells of the pipeline designer to prevent performance issues with very long text values.
- Provide the timestamp variables now, today, tomorrow, dayAfterTomorrow, yesterday, and dayBeforeYesterday in the config options of the REST endpoint source connector.
- Fix bug in re-uploading flat files in the pipeline
- May 3rd, 2021
- Use HTML layouts for notification mails.
- Allow resetting offsets of pipelines, which will skip all unprocessed events in the Kafka source topic of the
- Update Debezium-based connectors (MySQL source, PostgreSQL source) to version 1.5.
- Add the non-standard Python module pytz to the UDF runner.
- Show only the last 200 lines of the deployment logs, by
- Validate CRON expressions provided in the config for the REST source
- April 5th, 2021
- Update user-defined transformations to Python 3.7.3 and pre-install the following non-standard Python modules: langdetect, nested-lookup, nltk, numpy, requests, requests-cache, and spacy.
- Allow users to unsubscribe from notifications in projects.
- Support specifying sync intervals as CRON expressions for the REST source connector.
- Show the lag of the pipeline and the sink connector on the
- The lag of the pipeline equals the number of records that have been extracted by the source connector but have not yet been processed by the pipeline.
- The lag of the sink connector equals the number of records that have been processed by the pipeline but have not yet been published by the sink connector.
- Write errors of deployments to stderr.
- Allow filtering attributes of the data sink while mapping a pipeline to a data sink.
- Simplify the parsing of XML files. We recommend using Python's xml.etree.ElementTree module, available in the user-defined transformations, for parsing deeply nested XML structures.
- Support DELETE as HTTP method in the REST sink connector.
- Skip unparseable DDL statements in the MySQL source connector.
- Using three double quotes (""") in user-defined transformations leads to failures in creating
- Deleting pipelines might lead to a sign out of the user.
- Deployment logs are not always correctly reset when switching between pipelines.
- Data sinks can be changed while a deployment is running.
- March 2nd, 2021
- Allow managing pipelines, data sources, and data sinks in projects. Projects enable collaboration in teams and allow users to share resources with colleagues.
- Projects can be created by DataCater admin users in the admin UI. When adding members to a project, one may choose between the following three roles: a Viewer gets read access to all project resources; an Editor gets, in addition to read access, write access to all project resources, but can neither delete resources nor manage the project; an Administrator can, in addition to the permissions of the Editor, delete resources, manage project memberships, and administrate the project.
- Simplify parsing of JSON data sources: Parse JSON arrays and objects as strings.
- Include the ID of a deployment in the name of the container to ease pipeline-level monitoring. Containers are named
  where ID equals the ID of the deployment.
- February 1st, 2021
- Improve navigation of the Pipeline Designer.
- User-defined transformations can take the whole record, provided as a Python dictionary, as a second parameter:

      def transform(value, row):
          return value.replace("###name###", row["name"])

- Add failure reason to notification e-mails about failed connectors to speed up debugging.
- Allow configuring whether source or sink connectors shall be automatically restarted in case of failures. This configuration option is enabled by default and can be changed by editing the respective data source or data sink.
- Fix bug in applying Replace with attribute transformation to date, time, and timestamp values.
- January 4th, 2021
- New transformation functions:
- Improve interactivity of creating pipelines.
- Improve monitoring of Kafka Connect connectors.
- Send notification via e-mail when a pipeline source or sink
- Fix bug in deleting pipeline sink connectors.
- December 1st, 2020
- Allow editing data sources and data sinks without manually re-entering the password.
- Trim hostnames, database names, and table names of data sources and data sinks to sanitize user input.
- MySQL source connector: Do not monitor the schemas of tables other than the monitored one. As a consequence, when changing the table name of a MySQL data source, the data processing of all consuming pipelines must be manually reset to re-fetch the schema of the new table.
- Show a dedicated error page when accessing unavailable resources, e.g., data sources or pipelines.
- November 2nd, 2020
- Add support for left outer and inner joins.
- When starting the DataCater application, automatically restart all pipeline containers that are still marked as running. This is helpful in situations where the DataCater application is restarted after not having been shut down gracefully (e.g., power outage).
- Automatically drop used PostgreSQL replication slots once they are no longer needed, i.e., when deleting the
- Allow naming deployments.
- Add widget showing running pipelines to start page.
- Allow providing custom primary keys for flat file sources (CSV, JSON, and XML).
- Publish attributes of type timestamp with millisecond precision to data sinks.
- Fix bug in parsing primary keys from MySQL: Columns with a uniqueness constraint were falsely detected as primary keys.
- October 1st, 2020
- If the profiling of a data sink, which is performed when assigning a data sink to a pipeline, fails, show the failure message in the UI.
- Support configuration of the logical replication plugin to be used for the PostgreSQL source
- Support configuration of the server timezone for the MySQL data source connector and the MySQL data sink
- Show deletions of data sources, data sinks, and pipelines in the activity stream.
- Do not empty the data sink when resetting the data
  In most cases, this does not change anything, because we use upserts and simply overwrite already-processed records.
  If you made changes to the primary key between the first execution of a data pipeline and the reset of the data processing, you may need to manually remove data from the data sink before resetting the
- Show the processing status of ingested CSV and JSON files.
- Manually create Kafka topics for data pipelines to support pipeline-level settings for Kafka configuration options, such as the replication
- Provide more information in logs of running data
- Fix bug in retrieving the schema from a BigQuery table, where the schema was edited after the initial creation.
- Fix bug in processing date, time, and timestamp fields with
- September 1st, 2020
- Allow user-defined transformation functions to take another attribute as a parameter.
- Use chunked transfer encoding for serving flat file sinks, which strongly improves the handling of large data sets.
- Return empty files when trying to download empty flat file
- Allow retrieving available tables for MySQL data source, PostgreSQL data source, MySQL data sink, PostgreSQL data sink, and BigQuery data sink.
- Improve connection test for PostgreSQL sink.
- Improve handling of date, time, and timestamp
- Allow more characters in attribute names: numbers, whitespaces, and German umlauts.
- Show error message when building, starting, or stopping a
- Validate attribute names.
- Show the current release name of DataCater in navigation.
- Fix bug in persisting data sink mapping.
- Fix bug in reassigning data sink connectors to pipeline.
- August 3rd, 2020
- Improve internal management of Kafka Connect connectors.
- Improve layout of the admin interface for managing user
- Remove dependency on Elasticsearch for the storage of sample data.
- Add support for managing flat files as regular data
- Improve visual feedback for successful uploads of flat
- Move management of deployments to Pipeline Designer.
- Replace calls to window.alert() with modern modal
- Add health check for data sources and data sinks.
- Fix bug in loading sample records after creating new
- July 1st, 2020
- Initial release of DataCater! 🥳
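
As an illustration of two entries above on user-defined transformations (February 1st, 2021: the whole record is passed as a second parameter; June 4th, 2021: the environment variable DATACATER_ENVIRONMENT holds either preview or production), here is a minimal sketch. It is not taken from the DataCater documentation. It assumes the variable can be read through Python's `os.environ` inside the transformation; the `"preview"` fallback and the `[preview]` marker are hypothetical and only serve the example.

```python
import os


def transform(value, row):
    # Assumption: DATACATER_ENVIRONMENT is exposed as a regular process
    # environment variable. The "preview" fallback is hypothetical and only
    # keeps this sketch runnable outside of DataCater.
    environment = os.environ.get("DATACATER_ENVIRONMENT", "preview")

    # Same substitution as in the changelog example: fill the placeholder
    # with the "name" attribute of the record (row is a Python dictionary).
    result = value.replace("###name###", row["name"])

    # Hypothetical behavior: mark values while working in the Pipeline
    # Designer so that preview output is easy to tell apart.
    if environment == "preview":
        return "[preview] " + result
    return result
```

Called with `transform("Hello ###name###", {"name": "Ada"})`, the function returns the substituted string, with the marker prepended only when the variable is set to preview (or is unset, given the fallback used here).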