Cloud-Native Data Pipelines

Over the last decade, the adoption of cloud computing has grown rapidly. The shift from monolithic applications to microservices and on to cloud-native software has changed the responsibilities and capabilities in application development.

Historically, the development and the operation of software were treated as two different crafts. With the rise of cloud technologies such as Kubernetes, the software industry merged these skill sets, allowing one role, DevOps, to take care of both tasks. DevOps tools were created to meet the new requirements of a "You build it, you run it" culture. The principles at the heart of this successful shift include containers as deployment units, declarative runtime descriptions, continuous integration and deployment, observability, and elastic scalability.

Data workers were largely untouched by this shift, and we still see a clear separation between the development and the operation of data-driven applications.

In this whitepaper, we outline the advantages of applying cloud-native computing principles to the building of data pipelines and to the workflows of data engineers, data scientists, and ML engineers. We aim to enable data pipeline developers to operate their data pipelines efficiently and effectively. To this end, we describe how cloud-native principles apply to data workflows and where we see shortcomings in the current landscape of tools.

We present the benefits of containers for running data pipelines, demonstrate the improved transparency that the Sidecar pattern provides, and discuss ways to enable declarative descriptions of data pipelines. The closing chapters on DataOps and Scalability summarize the impact of cloud-native principles and tooling on data workflows and how data processing must adapt to leverage the full potential of the cloud-native movement.

By Hakan Lofcali