Enhance your data pipelines by applying cloud-native principles.
Declarative data pipelines
Declarative data pipelines allow for more reliable, resilient, and reproducible deployments, as well as faster iterations during development. DataCater offers a YAML-based representation of data pipelines, heavily inspired by Kubernetes' custom resource definition files, which can be exported, imported, and edited through our API or the Pipeline Designer.
The following code listing shows an example pipeline in DataCater's YAML format:
name: Hash emails
metadata:
  stream-in: f21a4bc1-6f24-4a38-a17f-94e5ad0cca2a
  stream-out: 4d36940f-ff87-48ee-8e90-9561c3bea628
spec:
  steps:
    - kind: Field
      fields:
        email:
          transform:
            key: hash
Running data pipelines as containers
DataCater deploys immutable revisions of streaming data pipelines as non-privileged containers on Kubernetes. DataCater relies on containers for pipeline execution for multiple reasons. First, containers allow us to apply the Self-Containment Principle and isolate data pipelines from other components, services, and the rest of our platform - an important trait when running a data pipeline platform at scale. Second, containers allow DataCater to define resource requests and limits at the pipeline level and enable elastic scaling of streaming data pipelines depending on the current load.
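To make this concrete, the following is a minimal sketch of what a Kubernetes Deployment for a single, immutable pipeline revision could look like. The names, labels, image reference, and resource values are illustrative assumptions, not DataCater's actual manifests:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pipeline-hash-emails-rev-3                # hypothetical name for one immutable pipeline revision
  labels:
    app: pipeline-hash-emails
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pipeline-hash-emails
  template:
    metadata:
      labels:
        app: pipeline-hash-emails
    spec:
      containers:
        - name: pipeline
          image: registry.example.com/pipelines/hash-emails:rev-3   # hypothetical image
          securityContext:
            runAsNonRoot: true                    # run the pipeline as a non-privileged container
            allowPrivilegeEscalation: false
          resources:
            requests:                             # per-pipeline resource requests ...
              cpu: "250m"
              memory: 256Mi
            limits:                               # ... and limits
              cpu: "1"
              memory: 512Mi

Elastic scaling could then be layered on top of such a Deployment, for example with a HorizontalPodAutoscaler that adjusts the replica count based on the current load.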
By running data pipelines as containers, DataCater Self-Managed can easily integrate with existing tools for monitoring and logging and make pipeline logs available for investigation outside of the DataCater platform.
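As a sketch of such an integration, a Prometheus setup that honors the widespread scrape-annotation convention could discover pipeline metrics simply through annotations on the pod template of the Deployment sketched above; the port and path below are assumptions for illustration:

# Hypothetical additions to the pod template metadata of the Deployment above.
metadata:
  annotations:
    prometheus.io/scrape: "true"   # opt the pipeline pod into metrics scraping
    prometheus.io/port: "8080"     # assumed metrics port exposed by the pipeline container
    prometheus.io/path: /metrics   # assumed metrics endpoint

Similarly, if the pipeline containers write their logs to stdout and stderr, as is conventional for containerized workloads, any cluster-wide log collector that is already deployed (for example a Fluent Bit or Fluentd DaemonSet) can forward them to an external logging backend without DataCater-specific configuration.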