This article shows how to apply change data capture to unlock data silos of legacy applications, without changing their code.
The business of almost every company is, at least partially, powered by outdated software applications. Introducing new features to such legacy applications is a cumbersome and risky operation, which is why most engineering departments prefer to never change a running (legacy) system.
In this article, we would like to show you how to unlock data silos of legacy applications without changing their code. We will use change data capture (CDC) to extend an existing application with an audit log, which in turn enables us to track all changes of the application data over time.
Let’s first define the context. Our agency dkd Internet Service GmbH exists for more than twenty years. Shortly after founding the company, we realized that we needed an application for managing projects, tracking time, and other daily tasks. At the time, there wasn’t the incredible range of productivity tools available as today, which is why we developed our own agency management software.
Despite being an internal software product without any external customer, our application still had to be maintained and developed. New features are comparatively expensive to implement since we were the only user (or customer) of the application. Since our agency software is getting on in years, we don’t want to add new features any longer. However, replacing this immensely important software is not a trivial task.
We often missed a versioning feature for data records and thus the logging of all activities within the software. This is often referred to as an audit log. At any point in time, we want to be able to understand how our data has changed over time. Did Freya work 30 or 40 hours a week in January?
We have been using a version control system for software development for a long time. Is there perhaps something comparable for data?
The idea was in the room to expand the code of our almost twenty-year-old software with the new functionality, the audit log. That would probably not only take a lot of time but also introduce possible new sources of error into the software. Fortunately, when we were developing our agency software, we had already decided to store the application data in a database system, MySQL. What if the software could be expanded to include versioning of the data without applying any changes to the software? Could we perhaps start directly with the MySQL database used? We, therefore, started to look for an external approach to implement the required feature.
In our research, we stumbled upon Change Data Capture (CDC). CDC enables us to capture all changes that are applied to a database and further process them. While CDC connectors are typically complicated to set up, by using the data pipeline platform DataCater, CDC can be used with little effort in a plug-and-play manner. DataCater provides CDC connectors for most data systems, including MySQL, and enables users to create connectors and build data pipelines within a few minutes. DataCater operates data pipelines in a streaming mode: Data from our agency software (or the underlying MySQL database) are not only available on a fine granularity but can also be processed in real-time. For example, we may anonymize data on the fly before loading them into a downstream data storage.
In our setup, we store the audit log in a separate MySQL database. We used the schema from the main database but extended it with a timestamp column holding the time of the change event.
Based on this setup, we now have access to all changes that have been made to individual data records over time.
DataCater enabled us to quickly and easily extend our legacy application agency software with an audit log. CDC enabled us to retrofit change tracking in our agency software without making any direct adjustments to the software code or spending a lot of money.
Since the versioning data records generated are all stored in MySQL, they can be accessed directly via SQL. However, this requires knowledge of SQL and is therefore reserved for experienced developers. Since we primarily use the programming language R for data analysis at dkd, we developed a custom R package to access the captured changes. The package can instantly return the requested information from all captured changes. Questions such as To which team did Otto belong in January? can be answered via a simple query.
The solution described in this article enabled us to add an important feature to a significantly outdated application with little investment of time and money - and without any risk of impairing the application itself. The results surprised us positively, which is why we highly recommend considering CDC for implementing audit logs or similar use cases.