Changelog


2021.12 - December 1st, 2021

New features:

New connectors:

Bug fixes:


2021.11 - November 5th, 2021

New features:

New connectors:

Bug fixes:

  • Fix UI bug in managing notification settings of project members.
  • For the CSV source connector, fix a bug in specifying the number of lines to skip.

2021.10 - October 11th, 2021

New features:

  • For the BigQuery sink connector, support automatic creation of tables.
  • No longer require re-entering the password when performing a test connection for an existing data source or data sink.
  • For the REST source connector, support UNIX timestamps for capturing change information.
  • For the REST source connector, use 1970-01-01T00:00:00Z as the default value for the initial timestamp offset.
  • In DataCater Self-Managed, allow configuring Kafka-related settings via the following environment variables: KAFKA_TOPICS_CLEANUP_POLICY, KAFKA_TOPICS_PARTITIONS, KAFKA_TOPICS_REPLICATIONFACTOR, KAFKA_TOPICS_RETENTION_BYTES, and KAFKA_TOPICS_RETENTION_MS.
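
As an illustration of the two REST source connector changes above: a UNIX timestamp counts the seconds since the epoch, and the default initial offset 1970-01-01T00:00:00Z is that epoch itself. The following Python sketch only demonstrates this relationship; the helper name to_iso_utc is hypothetical and not part of DataCater.

```python
from datetime import datetime, timezone


def to_iso_utc(unix_seconds):
    """Convert a UNIX timestamp (seconds since the epoch) to an ISO 8601 UTC string.

    Hypothetical helper for illustration; not a DataCater function.
    """
    moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


# A UNIX timestamp of 0 corresponds to the default initial offset.
print(to_iso_utc(0))  # prints 1970-01-01T00:00:00Z
```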

New connectors:


2021.09 - September 1st, 2021

New features:

  • Advanced validation of data source and data sink configs.
  • Support sending health notifications to Slack channels (only available for projects).
  • Bump maximum event size from 1MB to 3MB.
  • Improve management of project settings. For instance, project admins can now manage the individual notification settings of project members.
  • Support automated handling of time-based access tokens for the REST endpoint data source.
  • Allow dropping primary key columns in pipelines.
  • Redirect to project page after deleting project resources, like pipelines.

2021.08 - August 2nd, 2021

New features:

  • When creating or editing a connector, show the link to its documentation.
  • For flat files, automatically create the attribute __datacater_file_name and fill it with the name of the flat file.
  • Truncate log messages longer than 5,000 characters.
  • Enforce UTF-8 encoding for FTP/SFTP source connector.
  • For the REST data source, allow extracting records from a JSON object, in addition to a JSON array. Treat all object keys as separate records.
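
A minimal Python sketch of how records might be extracted from either a JSON array or a JSON object. It assumes that the value stored under each object key becomes one record; whether the connector also retains the key is not stated here. The function name extract_records is hypothetical.

```python
import json


def extract_records(payload):
    """Return a list of records from a JSON array or a JSON object.

    Assumption for illustration: for a JSON object, the value of each
    key is treated as a separate record.
    """
    data = json.loads(payload)
    if isinstance(data, list):
        # A JSON array already is a list of records.
        return data
    if isinstance(data, dict):
        # One record per object key.
        return list(data.values())
    raise ValueError("Expected a JSON array or a JSON object")


print(extract_records('{"a": {"id": 1}, "b": {"id": 2}}'))
```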

New connectors:

Bug fixes:

  • Fix encoding bug in CSV/JSON/XML data sinks.
  • Escape HTML tags in the deployment logs.

2021.07 - July 2nd, 2021

New features:

  • Support timestamp formats without timezones for REST data sources.
  • Allow using MySQL and PostgreSQL sinks in append-only mode (configurable via the configuration option insert mode).
  • Show build time for deployments.
  • Set default retention for Kafka topics to 1 day or 100MB.
  • Install Python module feedparser.

New connectors:

Bug fixes:

  • Treat NUMERIC and DECIMAL fields as doubles for the JDBC-based MySQL source connector.

2021.06 - June 4th, 2021

New features:

  • Allow resetting the offset of the sink connector on the Deployments page.
  • Provide the environment variable DATACATER_ENVIRONMENT to user-defined transformation functions, which holds either preview (Pipeline Designer) or production (Deployment).
  • In the AX Semantics sink connector, perform a commit of the sink connector offset after each processed record to prevent timeout issues.
  • Allow specifying multiple collection IDs as a comma-separated list for the AX Semantics sink connector.
  • Add the Python modules geopy and Shapely.
  • Automatically create the attribute __datacater_file_name for flat file sources, which contains the name of the uploaded CSV, JSON, or XML file.
  • Show a maximum of 1,500 characters in the cells of the pipeline designer to prevent performance issues with very long text values.
  • Provide the timestamp variables now, today, tomorrow, dayAfterTomorrow, yesterday, and dayBeforeYesterday in the config options of the REST endpoint source connector.
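
The DATACATER_ENVIRONMENT variable mentioned above could be used to make a user-defined transformation behave differently in previews and deployments. The sketch below assumes the variable is readable through os.environ and falls back to preview when it is unset; both the fallback and the "[preview]" marker are illustrative choices, not documented behavior.

```python
import os


def transform(value, row):
    # Assumption: the variable is exposed via the process environment and
    # holds "preview" (Pipeline Designer) or "production" (Deployment).
    environment = os.environ.get("DATACATER_ENVIRONMENT", "preview")
    if environment == "production":
        return value
    # Mark preview output so it is easy to tell apart (illustrative only).
    return "[preview] " + str(value)
```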

Bug fixes:

  • Fix bug in re-uploading flat files in the pipeline designer.

2021.05 - May 3rd, 2021

New features:

  • Use HTML layouts for notification mails.
  • Allow resetting the offsets of pipelines, which skips all unprocessed events in the Kafka source topic of the pipeline.
  • Update Debezium-based connectors (MySQL source, PostgreSQL source) to version 1.5.
  • Add the non-standard Python module pytz to the UDF runner.
  • By default, show only the last 200 lines of the deployment logs.
  • Validate CRON expressions provided in the config for the REST source connector.

New connectors:


2021.04 - April 5th, 2021

New features:

  • Update user-defined transformations to Python 3.7.3 and pre-install the following non-standard Python modules: langdetect, nested-lookup, nltk, numpy, requests, requests-cache, and spacy.
  • Allow users to unsubscribe from notifications in projects.
  • Support specifying sync intervals as CRON expressions for the REST source connector.
  • Show the lag of the pipeline and the sink connector on the deployments page:
    • The lag of the pipeline equals the number of records that have been extracted by the source connector but have not yet been processed by the pipeline.
    • The lag of the sink connector equals the number of records that have been processed by the pipeline but have not yet been published by the sink connector.
  • Write errors of deployments to stderr.
  • Allow filtering attributes of the data sink while mapping a pipeline to a data sink.
  • Simplify the parsing of XML files. We recommend using Python's xml.etree.ElementTree module, available in user-defined transformations, for parsing deeply nested XML structures.
  • Support DELETE as HTTP method in the REST sink connector.
  • Skip unparseable DDL statements in the MySQL source connector.
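
For the XML recommendation above, a user-defined transformation might use xml.etree.ElementTree roughly as follows. This is a sketch under assumptions: the attribute value is taken to hold an XML fragment as a string, and the element names (customer, address, city) are hypothetical.

```python
import xml.etree.ElementTree as ET


def transform(value, row):
    # Assumption: value holds an XML fragment as a string, e.g.
    # <customer><address><city>Berlin</city></address></customer>
    root = ET.fromstring(value)
    # Look up a nested element relative to the root element.
    city = root.find("./address/city")
    # Return None when the nested element is missing.
    return city.text if city is not None else None
```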

Bug fixes:

  • Fix failures in creating deployments when user-defined transformations contain three double quotes (""").
  • Fix bug where deleting pipelines might sign out the user.
  • Fix bug where deployment logs were not always correctly reset when switching between pipelines.
  • Prevent data sinks from being changed while a deployment is running.

2021.03 - March 2nd, 2021

New connectors:

New features:

  • Allow managing pipelines, data sources, and data sinks in projects. Projects enable collaboration in teams and allow users to share resources with colleagues. Projects can be created by DataCater admin users in the admin UI. When adding members to a project, one may choose between the following three roles:
    • A Viewer gets read access to all project resources,
    • an Editor gets, in addition to read access, write access to all project resources, but can neither delete resources nor manage the project,
    • an Administrator can, in addition to the permissions of the Editor, delete resources, manage project memberships, and administer the project.
  • Simplify parsing of JSON data sources: Parse JSON arrays and objects as strings.
  • Include the ID of a deployment in the name of the container to ease pipeline-level monitoring. Containers are named datacater_deployment-ID, where ID equals the ID of the deployment.

2021.02 - February 1st, 2021

New features:

  • Improve navigation of the Pipeline Designer.
  • User-defined transformations can take the whole record, provided as a Python dictionary, as a second parameter:
    def transform(value, row):
      return value.replace("###name###", row["name"])
    
  • Add failure reason to notification e-mails about failed connectors to speed up debugging.
  • Allow configuring whether source or sink connectors are automatically restarted in case of failures. This configuration option is enabled by default and can be changed by editing the respective data source or data sink.

Bug fixes:

  • Fix bug in applying Replace with attribute transformation to date, time, and timestamp values.

2021.01 - January 4th, 2021

New connectors:

New transformation functions:

New features:

  • Improve interactivity of creating pipelines.
  • Improve monitoring of Kafka Connect connectors.
  • Send notification via e-mail when a pipeline source or sink connector fails.

Bug fixes:

  • Fix bug in deleting pipeline sink connectors.

2020.12 - December 1st, 2020

New features:

  • Allow editing data sources and data sinks without manually re-entering the password.
  • Trim hostnames, database names, table names of data sources and data sinks to sanitize user input.
  • MySQL source connector: Do not monitor the schemas of tables other than the monitored one. As a consequence, when changing the table name of a MySQL data source, the data processing of all consuming pipelines must be manually reset for re-fetching the schema of the new table.
  • Show a dedicated error page when accessing unavailable resources, e.g., data sources or pipelines.

2020.11 - November 2nd, 2020

New connectors:

New features:

  • Add support for left outer and inner joins.
  • When starting the DataCater application, automatically restart all pipeline containers that are still marked as running. This is helpful when the DataCater application is restarted after not having been shut down gracefully (e.g., after a power outage).
  • Automatically drop used PostgreSQL replication slots once they are no longer needed, i.e., when deleting the pipeline.
  • Allow naming deployments.
  • Add widget showing running pipelines to start page.
  • Allow providing custom primary keys for flat file sources (CSV, JSON, and XML).
  • Publish attributes of type timestamp with milliseconds precision to data sinks.

Bug fixes:

  • Fix bug in parsing primary keys from MySQL: Columns with a uniqueness constraint were falsely detected as primary keys.

2020.10 - October 1st, 2020

New connectors:

New features:

  • If the profiling of a data sink, which is performed when assigning a data sink to a pipeline, fails, show the failure message in the UI.
  • Support configuration of the logical replication plugin to be used for the PostgreSQL source connector.
  • Support configuration of the server timezone for the MySQL data source connector and the MySQL data sink connector.
  • Show deletions of data sources, data sinks, and pipelines in the activity stream.
  • Do not empty the data sink when resetting the data processing. In most cases, this does not change anything, because we use upserts and simply overwrite already-processed records. If you made changes to the primary key between the first execution of a data pipeline and the reset of the data processing, you may need to manually remove data from the data sink before resetting the processing.
  • Show the processing status of ingested CSV and JSON files.
  • Manually create Kafka topics for data pipelines to support pipeline-level settings for Kafka configuration options, such as the replication factor.
  • Provide more information in logs of running data pipelines.
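
The note above on resetting the data processing relies on upsert semantics. The sketch below models a data sink as a Python dictionary keyed by the primary key to show why reprocessing is usually harmless, and why a changed primary key can leave stale records behind. The function upsert and the sample record are hypothetical and do not reflect how DataCater implements its sinks.

```python
def upsert(sink, record, key):
    """Insert the record, or overwrite the existing one with the same key value."""
    sink[record[key]] = record


sink = {}
record = {"id": 1, "email": "a@example.com"}

# First execution, then a reset: the same record is written twice
# and simply overwrites itself.
upsert(sink, record, key="id")
upsert(sink, record, key="id")
print(len(sink))  # prints 1

# If the primary key was changed to "email" before the reset, the record
# stored under the old key is not overwritten and must be removed manually.
upsert(sink, record, key="email")
print(len(sink))  # prints 2
```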

Bug fixes:

  • Fix bug in retrieving the schema from a BigQuery table, where the schema was edited after the initial creation.
  • Fix bug in processing date, time, and timestamp fields with the PostgreSQL sink connector.

2020.09 - September 1st, 2020

New connectors:

New features:

  • Allow user-defined transformation functions to take another attribute as a parameter.
  • Use chunked transfer encoding for serving flat file sinks, which strongly improves the handling of large data sets.
  • Return empty files when trying to download empty flat file sinks.
  • Allow retrieving available tables for MySQL data source, PostgreSQL data source, MySQL data sink, PostgreSQL data sink, and BigQuery data sink.
  • Improve connection test for PostgreSQL sink.
  • Improve handling of date, time, and timestamp attributes.
  • Allow more characters in attribute names: numbers, whitespaces, and German umlauts.
  • Show error message when building, starting, or stopping a deployment fails.
  • Validate attribute names.
  • Show the current release name of DataCater in navigation.

Bug fixes:

  • Fix bug in persisting data sink mapping.
  • Fix bug in reassigning data sink connectors to pipeline.

2020.08 - August 3rd, 2020

New features:

  • Improve internal management of Kafka Connect connectors.
  • Improve layout of the admin interface for managing user accounts.
  • Remove dependency on Elasticsearch for the storage of sample data.
  • Add support for managing flat files as regular data sources.
  • Improve visual feedback for successful uploads of flat files.
  • Move management of deployments to Pipeline Designer.
  • Replace calls to window.alert() with modern modal dialogs.
  • Add health check for data sources and data sinks.

Bug fixes:

  • Fix bug in loading sample records after creating new pipelines.

2020.07 - July 1st, 2020

Initial release of DataCater! 🥳