XML Files

DataCater allows users to aggregate all data processed by a data pipeline into an XML file, which can be downloaded from the web UI.


Structure

The generated XML file is structured as follows:

  • The root node of the XML file is called records. It has two node attributes: count (the number of records) and type (always set to array).
  • For each record processed by the pipeline, the root node has one child node called record.
  • For each attribute of the pipeline, a record node has one child node called attribute. Each attribute node holds the value of the corresponding attribute in its inner text. In addition, it has the following node attributes: isKey (true if the attribute is a primary key, false otherwise), name (name of the attribute), and type (data type of the attribute).
  • Attributes holding arrays (or lists) of values contain a set of item nodes instead of inner text. Each item node stores its data type in the node attribute type and holds its value in the inner text.

The following example XML file was created by a pipeline with three attributes: id (type: Integer, primary key), name (type: String), and categories (type: List[String]). In the file, these data types appear as int, string, and array_string:

<?xml version="1.0" encoding="UTF-8"?>
<records count="3" type="array">
  <record>
    <attribute name="id" isKey="true" type="int">1</attribute>
    <attribute name="name" isKey="false" type="string">Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems</attribute>
    <attribute name="categories" isKey="false" type="array_string">
      <item type="string">computer science</item>
      <item type="string">data</item>
    </attribute>
  </record>
  <record>
    <attribute name="id" isKey="true" type="int">2</attribute>
    <attribute name="name" isKey="false" type="string">Functional Programming in Scala</attribute>
    <attribute name="categories" isKey="false" type="array_string">
      <item type="string">computer science</item>
      <item type="string">programming</item>
    </attribute>
  </record>
  <record>
    <attribute name="id" isKey="true" type="int">3</attribute>
    <attribute name="name" isKey="false" type="string">Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems</attribute>
    <attribute name="categories" isKey="false" type="array_string">
      <item type="string">computer science</item>
      <item type="string">data</item>
      <item type="string">machine learning</item>
    </attribute>
  </record>
</records>
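
To illustrate how a downloaded file could be consumed, here is a minimal Python sketch that reads the structure described above into a list of dictionaries using only the standard library. It is not part of DataCater; the names parse_records, cast, CASTS, and SAMPLE are hypothetical. It assumes that list-valued attributes can be recognized by their item child nodes or by a type starting with array_, and it only converts the type names that appear in the example (int and string), leaving any other type as plain text.

```python
import xml.etree.ElementTree as ET

# Converters for the type names seen in the example file. Other type names
# are not covered here, so their values are kept as plain strings (assumption).
CASTS = {"int": int, "string": str}


def cast(text, type_name):
    """Convert the inner text of a node according to its type attribute."""
    if text is None:
        return None
    return CASTS.get(type_name, str)(text)


def parse_records(xml_bytes):
    """Turn a records document into a list of {attribute name: value} dicts."""
    root = ET.fromstring(xml_bytes)
    records = []
    for record in root.findall("record"):
        row = {}
        for attribute in record.findall("attribute"):
            name = attribute.get("name")
            type_name = attribute.get("type", "")
            items = attribute.findall("item")
            if items or type_name.startswith("array_"):
                # List-valued attribute: read one value per item node.
                row[name] = [cast(item.text, item.get("type")) for item in items]
            else:
                row[name] = cast(attribute.text, type_name)
        records.append(row)
    return records


# Shortened sample that follows the structure shown above.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<records count="1" type="array">
  <record>
    <attribute name="id" isKey="true" type="int">2</attribute>
    <attribute name="name" isKey="false" type="string">Functional Programming in Scala</attribute>
    <attribute name="categories" isKey="false" type="array_string">
      <item type="string">computer science</item>
      <item type="string">programming</item>
    </attribute>
  </record>
</records>
"""

if __name__ == "__main__":
    for row in parse_records(SAMPLE):
        print(row)
```

For a file on disk, the same function can be called with the file's bytes, for example parse_records(open(path, "rb").read()), where path is the location of the downloaded file. Comparing len(parse_records(...)) with the count attribute of the root node is a simple way to check that the file was read completely.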