Extract changes from data sources in real time
DataCater's data source connectors support change data capture (CDC). After performing an initial full snapshot, they monitor data sources for change events (INSERTs, UPDATEs, and DELETEs). Once change events are detected, the CDC connectors extract them and pass them to the pipelines consuming the data source. CDC enables data pipelines to process changes from data sources in real time and stream them to data sinks, keeping downstream applications and data systems up to date.
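To make the snapshot-then-events flow concrete, here is a minimal Python sketch of how a consumer might keep a copy of a data source current. It is an illustration under stated assumptions, not DataCater's API: the `ChangeEvent` type and the `apply_events` function are hypothetical names, and the sink is a plain in-memory dictionary.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional

Row = Dict[str, Any]


@dataclass
class ChangeEvent:
    """Hypothetical change event; not a DataCater type."""
    op: str                     # "INSERT", "UPDATE", or "DELETE"
    key: Any                    # primary key of the affected record
    row: Optional[Row] = None   # new record state; None for DELETE


def apply_events(snapshot: Dict[Any, Row], events: Iterable[ChangeEvent]) -> Dict[Any, Row]:
    """Start from the initial full snapshot, then replay change events in order."""
    sink = {key: dict(row) for key, row in snapshot.items()}  # copy, leave snapshot intact
    for event in events:
        if event.op in ("INSERT", "UPDATE"):
            if event.row is None:
                raise ValueError(f"{event.op} event for key {event.key!r} has no row")
            sink[event.key] = dict(event.row)
        elif event.op == "DELETE":
            sink.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")
    return sink


# Initial full snapshot of the source, followed by three change events.
snapshot = {1: {"name": "Ada"}, 2: {"name": "Bob"}}
events = [
    ChangeEvent("INSERT", 3, {"name": "Cy"}),
    ChangeEvent("UPDATE", 1, {"name": "Ada L."}),
    ChangeEvent("DELETE", 2),
]
sink = apply_events(snapshot, events)
```

After replaying the events, `sink` holds records 1 and 3 only, mirroring the state of the source without re-reading it in full.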
From database systems to web APIs and object stores
While CDC has traditionally been used to extract data from database systems via replication logs, DataCater goes one step further and offers CDC connectors for most data systems. Our generic REST/HTTP source connector can extract data change events from almost any web API. Connectors for object stores, such as AWS S3 or Google Cloud Storage, can detect changes in flat files (e.g., CSV or JSON) stored in buckets of interest.
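For sources without a replication log, one common way to derive change events is to compare two consecutive polls of the same resource. The Python sketch below illustrates that general technique only; it is an assumption for illustration and does not describe how DataCater's connectors are implemented. The function names and the `id` key field are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Any]
Change = Tuple[str, Any, Optional[Record]]  # (operation, key, new record state)


def index_by(records: List[Record], key_field: str = "id") -> Dict[Any, Record]:
    """Index a list of records (e.g. a parsed JSON response) by a key field."""
    return {record[key_field]: record for record in records}


def diff_snapshots(previous: Dict[Any, Record], current: Dict[Any, Record]) -> List[Change]:
    """Derive INSERT, UPDATE, and DELETE events from two polls of the same resource."""
    changes: List[Change] = []
    for key, record in current.items():
        if key not in previous:
            changes.append(("INSERT", key, record))
        elif previous[key] != record:
            changes.append(("UPDATE", key, record))
    for key in previous:
        if key not in current:
            changes.append(("DELETE", key, None))
    return changes


# Two consecutive polls of a hypothetical web API endpoint.
first_poll = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]
second_poll = [{"id": 1, "status": "closed"}, {"id": 3, "status": "open"}]

changes = diff_snapshots(index_by(first_poll), index_by(second_poll))
```

Here record 1 yields an UPDATE, record 3 an INSERT, and record 2 a DELETE. The same comparison applies to rows parsed from a CSV or JSON file in a bucket, provided the records carry a stable key.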
Create historical changelogs without changing data sources
The data change events provided by CDC connectors can be used to build historical changelogs of data source systems. Since CDC connectors typically don't require any changes to the consumed data source system, change data capture is a great way to extend legacy applications with a historical changelog (or audit log).
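As a rough illustration of what such a changelog makes possible, the Python sketch below stores change events in an append-only list and reconstructs the state of the source at a given point in time. The `LogEntry` structure, its integer timestamps, and the `state_as_of` function are assumptions made for this example, not part of DataCater.

```python
from typing import Any, Dict, List, NamedTuple, Optional

Row = Dict[str, Any]


class LogEntry(NamedTuple):
    """Hypothetical changelog entry built from a change event."""
    ts: int              # event timestamp (any monotonically increasing value)
    op: str              # "INSERT", "UPDATE", or "DELETE"
    key: Any             # primary key of the affected record
    row: Optional[Row]   # record state after the change; None for DELETE


def state_as_of(changelog: List[LogEntry], ts: int) -> Dict[Any, Row]:
    """Replay all entries up to and including `ts` to rebuild the state at that time."""
    state: Dict[Any, Row] = {}
    for entry in sorted(changelog, key=lambda e: e.ts):
        if entry.ts > ts:
            break
        if entry.op == "DELETE":
            state.pop(entry.key, None)
        else:
            state[entry.key] = entry.row
    return state


# The log is only ever appended to; the source system itself is never modified.
changelog = [
    LogEntry(1, "INSERT", "a", {"price": 10}),
    LogEntry(2, "UPDATE", "a", {"price": 12}),
    LogEntry(3, "DELETE", "a", None),
]

state_at_2 = state_as_of(changelog, 2)   # record "a" with its updated price
state_at_3 = state_as_of(changelog, 3)   # record "a" has been deleted
audit_trail = [entry for entry in changelog if entry.key == "a"]
```

Because every change is kept, the same log serves both as an audit trail for a single record and as a way to answer "what did the data look like at time T?".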
Does DataCater support bulk loads, too?
Typically, change data capture connectors are much more efficient than bulk loads and help reduce the load on data systems. In rare cases, however, a full bulk load may be the preferred data extraction technique. In addition to change data capture, most of DataCater's data source connectors also support full bulk loads.