The Data Pipelines Blog

We regularly publish articles about all things data. Keep up to date with what we're working on!

Most Popular Articles

CDC with PostgreSQL

CDC with MySQL

Batch vs. Streaming

Latest Articles

Using PostgreSQL RLS with Hibernate Reactive

This guide explains how PostgreSQL Row-Level Security can be used to securely implement multitenant applications with Hibernate Reactive and the Quarkus framework.

April 05

DataCater 2023.2 is here

We are happy to announce DataCater 2023.2, our newest open-core release. It introduces the Config resource and incorporates a lot of user feedback.

March 28

Change Data Capture 101: The complete guide

Everything you need to know to get started with applying change data capture to database systems and APIs.

March 07

Optimizing Apache Kafka® for High Throughput

This guide explores the most important consumer and producer properties of Apache Kafka for achieving a high throughput.

February 21

How to mask data in Redpanda with Python and DataCater

This tutorial walks you through developing custom data masking functions in Python and applying them to Redpanda topics.

February 13

Three Use Cases for Getting Started With Apache Kafka

This article introduces three use cases for getting started with Apache Kafka: log analytics, change data capture, and data validation.

January 30

Using Python vs. SQL for Data Pipelines

Learn how to decide between Python and SQL for building data pipelines.

January 18

Introducing DataCater 2023.1: Your fast lane to Streaming ETL

Build better streaming data pipelines faster with our newest open-core release, 2023.1.

January 16

A Beginner’s Guide to User-Defined Functions in ksqlDB

Learn everything you need to know to get started with developing and using UDFs in ksqlDB.

December 23

Using Apache Kafka in development and test environments

Learn how to set up Apache Kafka for your development and test environments.

December 15

Using Change Data Capture with Google Cloud SQL for MySQL

Use DataCater for streaming change events from Google Cloud SQL for MySQL to your data sinks.

December 02

Data Streaming with Python

We compare three different tools for streaming data with Apache Kafka and Python: kafka-python, Faust, and DataCater.

November 24

Querying change data capture events with cloud data warehouses

Learn how to build consistent snapshots of CDC events that were captured from transactional database systems.

November 03

The core of DataCater is now source-available and free

We are announcing that the core of DataCater, the real-time ETL platform, is now free and source-available.

October 04

How DataCater helps AX Semantics’ Clients with Data Enrichment

Learn how the streaming data platform DataCater enables the clients of AX Semantics to enrich their data in real time.

September 08

Capturing Changes in Real-Time from Google Cloud SQL PostgreSQL

Use DataCater to stream change events from Google Cloud SQL for PostgreSQL to your data sinks.

August 30

5-Minute Introduction To Streaming Data Pipelines

Everything you need to know about the concepts of streaming data pipelines.

August 02

DataCater Partner Hour Recap

On May 17, Robert Bråkenhielm from Resultify showed us how they use DataCater in their projects.

May 30

Connecting Applications to Userlist with DataCater

Use DataCater's plug-and-play CDC connectors to connect your app with Userlist in real time and without writing code.

May 09

How to extract data change events from HubSpot's CRM API

Learn how to apply change data capture to extract data changes from HubSpot.

March 31

10 Useful Python Transforms for your Streaming Data Pipeline

Learn how to transform your data with a few lines of Python code.

March 28

DataCater introduces support for declarative data pipelines

Learn how you can declare streaming data pipelines in YAML.

February 16

Community Meetup #3: Streaming and Batching with DataCater and dbt

Read our short recap of the third DataCater community meetup.

February 14

Building Real-Time ETL Pipelines with Apache Kafka

Learn how to use Apache Kafka to implement streaming ETL.

February 11

Why Digital Agencies should think Data First

Digital agencies adapt to their customers’ needs; that’s in their DNA. But is following those needs enough, or do agencies need to anticipate them?

February 10

Unlocking Streaming Data Pipelines on Google Cloud Platform

Learn how DataCater runs streaming data pipelines on GCP.

January 25

DevOps and DataOps

The ultimate goal of DataOps is to reduce the time needed for developing and deploying data pipelines.

January 18

Declarative Data Pipelines

Declarative data pipelines allow for more reliable, resilient, and reproducible deployments and faster iterations in development.

January 13

Sidecars: Observability for Cloud-Native Data Pipelines

Learn how to unlock non-intrusive observability for cloud-native data pipelines.

January 11

The Data Literacy Guide

Get a complete introduction to data literacy.

January 06

Data Pipeline Runtime Consistency with Containers

This article applies the principles of container-based application design to building and deploying data pipelines in the cloud era.

January 03

Cloud-Native Data Pipelines

Accelerate your data development by adopting cloud-native principles.

December 16

Recap of our Community Meetup #2: Streaming Spatial Data

On November 30, we ran our second community meetup with a guest talk on streaming spatial data.

December 07

Unlocking Data Silos of Legacy Applications

This article shows how to apply change data capture to unlock data silos of legacy applications, without changing their code.

October 20

Recap of our first Community Meetup

On September 28, we ran our first community meetup with more than 25 participants. Here is our recap.

September 30

How to use Change Data Capture (CDC) with Elasticsearch

Learn how to extract changes from Elasticsearch in real time.

September 15

Data Pipelines in Content Automation

What is content automation and how does it benefit from data pipelines?

September 06

PostgreSQL Change Data Capture (CDC): The Complete Guide

This guide helps you get started with CDC on the PostgreSQL database system.

September 02

MySQL Change Data Capture (CDC): The Complete Guide

Everything you need to know to get started using change data capture with MySQL.

August 25

Why we run Data Pipelines as Containers

Five reasons why we deploy data pipelines as containers: ease of integration, security, scalability, immutability, and robustness.

August 19

Overcoming the Hurdles in Data Democratization

Data expert Wouter Neef from Data Booster describes five common challenges in data democratization and how to overcome them.

August 16

How to use Change Data Capture with Web APIs

Improve the efficiency and freshness of your data processing by extracting change events from Web APIs instead of performing bulk loads.

August 04

Let's make event streaming a commodity

Build the rock-solid foundation for your next-generation real-time business intelligence.

June 24

Troubleshooting The Performance of Streaming Data Pipelines

Get to know two essential performance indicators: pipeline lag and sink connector lag.

April 08

Introducing Projects to DataCater

DataCater introduces projects, which let data teams prepare and integrate data collaboratively.

March 02

Meetup: Say Goodbye To Serving Outdated Content

Learn how to keep data and content up to date without manual effort and completely automate your content production.

February 20

Batch vs. Streaming Data Pipelines

A comparison between event-based streaming data pipelines and their batch-based counterparts.

August 11

Under the Hood of DataCater

An introduction to the building blocks that make up DataCater, the platform for continuous data preparation.

July 10

Everything you need to know about Change Data Capture

Learn how to turn data stores into streams of change events.

June 22