Data Pipelines in Content Automation

What is content automation and how does it benefit from data pipelines?

By Stefan Sprenger

Data pipelines connect data systems and automate data preparation. They extract data from data sources, transform the data, and load the transformed data into data sinks.
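To make the extract-transform-load flow concrete, here is a minimal sketch in Python. The "source" is a few raw JSON records standing in for a real data source, and the "sink" is a plain list standing in for a real data sink; none of this reflects any specific product's API.

```python
import json
from typing import Iterable, Iterator

# Raw records standing in for a real data source (e.g., a database or PIM system).
RAW_SOURCE = [
    '{"sku": "D-001", "name": "  navy summer dress "}',
    '{"sku": "D-002", "name": "red evening gown"}',
]

def extract(raw_records: Iterable[str]) -> Iterator[dict]:
    """Parse raw records coming from the data source."""
    for raw in raw_records:
        yield json.loads(raw)

def transform(record: dict) -> dict:
    """Prepare each record, e.g., tidy up the product name."""
    record["name"] = record["name"].strip().title()
    return record

def load(records: Iterable[dict], sink: list) -> None:
    """Write the transformed records into the data sink."""
    sink.extend(records)

sink: list = []
load((transform(r) for r in extract(RAW_SOURCE)), sink)
print(sink)
```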

Modern streaming data pipelines automate this process completely: they extract changes from data sources in real time, keeping data sinks always up to date, whereas traditional implementations extract data only every few days or weeks. While the most prominent use cases for data pipelines are analytics, data science, and traditional ETL processes, they are also very useful in other scenarios, such as content automation.

In this article, we interview Jan Kaiser, managing director of our integration partner Xanevo, about the use of data pipelines in content automation.

Q1: What is content automation and how does it benefit from data pipelines?

Jan Kaiser: Content automation uses artificial intelligence (AI) to turn data into language. Example use cases are automatically generating SEO-optimized product descriptions in e-commerce or automating the writing of news reports. Today, many established products offer content automation, e.g., AX Semantics.

Automatically generating content heavily depends on the availability of structured data. Let’s say we want to create product descriptions for women’s dresses and our product data is accessible through a PIM (Product Information Management) system.

Assuming we have a perfectly groomed database, we still need to transfer our product data from one system to another in order to make it available for automated content writing using NLG (Natural Language Generation).

One way to do that would be manual data exports and imports, or perhaps a proprietary script. Either way, these are tedious processes that do not scale and are hard to maintain. Instead, we want to rely on data pipelines to transfer data from source systems to our NLG engine.

Q2: Data pipelines can be used not only for automating the transfer of data but also for preparing it. Why is data preparation important for content automation?

Jan Kaiser: Great question. Let me describe a few scenarios that many companies out there are likely struggling with.

If the data at hand is:

  • not structured well enough,
  • not in a format that we can process in a meaningful way,
  • or not standardized and cleansed,

we need to prepare it and apply transformations before transferring it to another system.

With the help of data pipelines, we can apply those transformations “along the way”. That’s a huge advantage because we don’t need to keep separate copies of our source data.
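As a rough illustration of what “along the way” means, the following Python sketch applies transformation steps lazily while records travel from source to sink, so no prepared copy of the data is materialized. Function and field names are illustrative and not part of any specific product’s API.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator

Transform = Callable[[dict], dict]

def apply_along_the_way(records: Iterable[dict], steps: list) -> Iterator[dict]:
    """Lazily apply every transformation step while records are in transit."""
    for record in records:
        yield reduce(lambda rec, step: step(rec), steps, record)

steps: list = [
    # Normalize the color attribute and handle missing values.
    lambda r: {**r, "color": (r.get("color") or "unknown").lower()},
    # Reformat the price into a numeric value, or None if it is empty.
    lambda r: {**r, "price_eur": float(r["price"].replace(",", ".")) if r.get("price") else None},
]

source = [{"color": "NAVY BLUE", "price": "49,90"}, {"color": None, "price": ""}]
for prepared in apply_along_the_way(source, steps):
    print(prepared)
```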

Q3: Can you name the most common data preparation tasks in content automation?

Jan Kaiser: Let me keep this short, as many readers can probably relate in several ways:

  1. Normalization of data attributes
  2. Dealing with missing values or different formats for empty values
  3. Dropping unneeded attributes or renaming attributes that use technical abbreviations
  4. Reformatting or reordering values
  5. Aggregation of several attributes or splitting an attribute into new ones

I could probably go on, but you get the gist: there is a lot to deal with before automating content or using data for any other use case, such as training machine learning models.
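To make a few of these steps tangible, here is a short sketch using pandas on a made-up dress dataset; the column names and values are hypothetical, and a real pipeline would apply equivalent transformations per record rather than in a notebook.

```python
import pandas as pd

df = pd.DataFrame({
    "art_no": ["D-001", "D-002"],
    "col": ["Navy Blue", None],          # technical abbreviation, missing value
    "size_range": ["34-44", "36-42"],    # one attribute to be split
    "price": ["49,90 EUR", "59,90 EUR"], # value that needs reformatting
})

# 3. Rename attributes that use technical abbreviations.
df = df.rename(columns={"art_no": "article_number", "col": "color"})

# 1./2. Normalize the color attribute and fill missing values.
df["color"] = df["color"].fillna("unknown").str.lower()

# 4. Reformat the price into a numeric value.
df["price_eur"] = (
    df["price"].str.replace(" EUR", "", regex=False).str.replace(",", ".").astype(float)
)

# 5. Split one attribute into new ones.
split_sizes = df["size_range"].str.split("-", expand=True).astype(int)
df["size_min"], df["size_max"] = split_sizes[0], split_sizes[1]

# 3. Drop the attributes we no longer need.
df = df.drop(columns=["price", "size_range"])
print(df)
```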

Q4: Most use cases apply data pipelines for sending data from data sources, such as a PIM system, to content automation tools, like AX Semantics. Can they also be useful for sending generated texts to the place of usage, e.g., an online shop or content management system?

Jan Kaiser: Actually, yes. Most commonly, our clients want to have their content delivered into their PIM system because they already have a workflow for updating their online shop.

Some clients don’t have these workflows in place and need help sending data to their CMS. For instance, we have implemented this to deliver generated product descriptions, meta descriptions, and titles into a Shopify store.
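For readers curious what such a delivery step can look like, here is a hedged sketch that pushes a generated description into a Shopify product via the REST Admin API. The shop domain, access token, product ID, and API version are placeholders, and the exact delivery mechanism will differ per setup.

```python
import requests

SHOP = "example-store.myshopify.com"       # placeholder shop domain
TOKEN = "<admin-api-access-token>"         # placeholder access token
PRODUCT_ID = 1234567890                    # placeholder product ID
API_VERSION = "2023-10"                    # placeholder API version

def deliver_description(product_id: int, description_html: str) -> None:
    """Update the product's description (body_html) in Shopify."""
    url = f"https://{SHOP}/admin/api/{API_VERSION}/products/{product_id}.json"
    resp = requests.put(
        url,
        headers={"X-Shopify-Access-Token": TOKEN, "Content-Type": "application/json"},
        json={"product": {"id": product_id, "body_html": description_html}},
    )
    resp.raise_for_status()

deliver_description(PRODUCT_ID, "<p>A flowing summer dress in navy blue ...</p>")
```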

Another use case that might be even more interesting is the delivery of product data to sales partners, assuming the client is a supplier (with or without their own e-commerce store). Let’s say you have six sales partners that also list your dresses in their online shops, and you provide them with the whole digital package (product attributes, pictures, descriptions, titles).

You don’t want them to have identical product descriptions in all of their stores, do you? At least not from an SEO perspective. Using DataCater in combination with AX Semantics, we can create six different texts for each article and deliver them in six different formats to six different data sinks: the sales partners’ PIM systems.
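Conceptually, that fan-out looks roughly like the sketch below: each partner gets its own text variant and format. The partner endpoints, the variant generation, and the delivery call are all hypothetical stand-ins, not DataCater or AX Semantics APIs.

```python
# One entry per sales partner, with its preferred format and target endpoint.
partners = {
    "partner_a": {"format": "json", "endpoint": "https://pim.partner-a.example/api"},
    "partner_b": {"format": "csv", "endpoint": "https://pim.partner-b.example/api"},
}

def generate_variant(article: dict, partner: str) -> str:
    """Stand-in for an NLG call that produces a partner-specific description."""
    return f"{article['name']}: unique description for {partner}"

def deliver(endpoint: str, payload: dict, fmt: str) -> None:
    """Stand-in for writing the payload to the partner's PIM in its format."""
    print(f"sending {fmt} payload to {endpoint}: {payload}")

article = {"sku": "D-001", "name": "Navy summer dress"}
for partner, config in partners.items():
    text = generate_variant(article, partner)
    deliver(config["endpoint"], {**article, "description": text}, config["format"])
```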

Q5: How can users of content automation best start with applying data pipelines to their use cases?

Jan Kaiser: Great question. Let me give a few insights into how to approach the practical part.

First of all, I would like to put some emphasis on the “when” instead of the “how”. You should ask yourself how much your time is worth, since the cost of data pipelines is fairly manageable. If pipelines can save valuable resources, use them.

Regarding the “how”, I strongly recommend simply getting started. DataCater offers a very user-friendly solution: add a pipeline, a data source, and a data sink, and get going immediately. Check out the 14-day free trial. If you have traditionally been working with bulk exports and imports, you should be able to replicate that process in an automated way.

Of course, you can always approach me as well, since Xanevo provides services for content automation and data pipeline setups.

Q6: How does Xanevo help content automation users in adopting data pipelines?

Jan Kaiser: From a technical perspective there are four steps in adopting data pipelines:

  1. Setting up the infrastructure
  2. Connecting the data source
  3. Connecting the data sink
  4. Applying data transformations within a pipeline

As for the first step, there is always a quick-start option: using DataCater Cloud. If you are looking to run DataCater within your own infrastructure, I am sure that your team will gladly assist.

Xanevo helps with steps two to four, connecting your systems and automating your workflows. That means you only need to take care of the initial setup and licensing; everything else will be handled by my team.
