DataCater allows users to aggregate all data processed by a data pipeline into an XML file, which can be downloaded from the web UI.
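The generated XML file is structured as follows: a root records element contains one record element per processed row, and each record contains one attribute element per pipeline attribute. Every attribute element carries the attributes name, isKey, and type; list values are stored as nested item elements. In the example below, the count attribute of the root element matches the number of records.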
The following example XML file was created by a pipeline with three attributes: id (type: Integer, primary key), name (type: String), and categories (type: List[String]):
<?xml version="1.0" encoding="UTF-8"?>
<records count="3" type="array">
  <record>
    <attribute name="id" isKey="true" type="int">1</attribute>
    <attribute name="name" isKey="false" type="string">Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems</attribute>
    <attribute name="categories" isKey="false" type="array_string">
      <item type="string">computer science</item>
      <item type="string">data</item>
    </attribute>
  </record>
  <record>
    <attribute name="id" isKey="true" type="int">2</attribute>
    <attribute name="name" isKey="false" type="string">Functional Programming in Scala</attribute>
    <attribute name="categories" isKey="false" type="array_string">
      <item type="string">computer science</item>
      <item type="string">programming</item>
    </attribute>
  </record>
  <record>
    <attribute name="id" isKey="true" type="int">3</attribute>
    <attribute name="name" isKey="false" type="string">Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems</attribute>
    <attribute name="categories" isKey="false" type="array_string">
      <item type="string">computer science</item>
      <item type="string">data</item>
      <item type="string">machine learning</item>
    </attribute>
  </record>
</records>
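
A downloaded file of this shape can be read back with any standard XML parser. The following is a minimal sketch in Python using only the standard library. The function names parse_records, parse_attribute, and convert are hypothetical helpers, not part of DataCater. The sketch assumes that only the type values seen in the example above (int, string, array_string) occur, and it returns any other type as raw text. The embedded sample is a shortened copy of the example file.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example export above (two records instead of three).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<records count="2" type="array">
  <record>
    <attribute name="id" isKey="true" type="int">1</attribute>
    <attribute name="name" isKey="false" type="string">Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems</attribute>
    <attribute name="categories" isKey="false" type="array_string">
      <item type="string">computer science</item>
      <item type="string">data</item>
    </attribute>
  </record>
  <record>
    <attribute name="id" isKey="true" type="int">2</attribute>
    <attribute name="name" isKey="false" type="string">Functional Programming in Scala</attribute>
    <attribute name="categories" isKey="false" type="array_string">
      <item type="string">computer science</item>
      <item type="string">programming</item>
    </attribute>
  </record>
</records>
"""


def convert(text, declared_type):
    """Convert element text according to its declared type.

    Assumption: only "int" needs conversion; every other type is
    returned as plain text.
    """
    if text is None:
        return None
    if declared_type == "int":
        return int(text)
    return text


def parse_attribute(attribute):
    """Return the Python value of one <attribute> element."""
    declared_type = attribute.get("type", "string")
    if declared_type.startswith("array_"):
        # List values are stored as nested <item> elements.
        return [
            convert(item.text, item.get("type", "string"))
            for item in attribute.findall("item")
        ]
    return convert(attribute.text, declared_type)


def parse_records(xml_bytes):
    """Parse an export into a list of dicts, one dict per <record>."""
    root = ET.fromstring(xml_bytes)
    return [
        {
            attribute.get("name"): parse_attribute(attribute)
            for attribute in record.findall("attribute")
        }
        for record in root.findall("record")
    ]


# Pass bytes so the parser honours the encoding named in the XML declaration.
records = parse_records(SAMPLE.encode("utf-8"))
```

To read a downloaded file instead of the embedded sample, the same function can be called with the file's bytes, for example parse_records(open(path, "rb").read()), where path is the location of the download. The isKey flag is ignored in this sketch; it could be read with attribute.get("isKey") if the primary key is needed.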