JSON Files

DataCater supports JavaScript Object Notation (JSON) files as a data source. Users can upload JSON files using the UI of DataCater. DataCater takes care of parsing the JSON files and publishing the extracted records to a data pipeline.

When parsing a JSON file, DataCater tries to extract an array of JSON objects. Each JSON object is published as a record to the consuming data pipelines. By default, DataCater assumes that the array is located at the root level of the JSON file. If the array is located elsewhere, you can provide a pointer to its location within the hierarchy of the JSON file (see the Configuration section).

Once a data pipeline has been built for a certain JSON file, users can upload additional files with the same structure and attributes.


Requirements

DataCater extracts records from a JSON file by parsing an array of JSON objects. DataCater assumes that each entry of the array represents one record. DataCater automatically generates a data source schema based on the keys of the JSON objects using a predefined mapping of JSON data types to DataCater data types.

DataCater tolerates records that do not define all keys (or attributes). However, if different records use different data types for the same key (or attribute), DataCater cannot generate a valid schema and prevents the JSON file from being imported.

The following listing shows a valid JSON structure, which will cause DataCater to generate a schema with three attributes (name: String, age: Long, email: String):

[
  {
    "name":  "Pete",
    "age":   25
  },
  {
    "name":  "Julia",
    "age":   37,
    "email": "pete@datacater.io"
  }
]

The following listing shows a structure that cannot be used with DataCater, because the records use different data types for the key (or attribute) age (a number in the first record, a string in the second):

[
  {
    "name":  "Pete",
    "age":   25
  },
  {
    "name":  "Julia",
    "age":   "37",
    "email": "pete@datacater.io"
  }
]
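The schema rules described above can be illustrated with a short Python sketch. It is not DataCater's implementation: the function names json_type_name and infer_schema are hypothetical, and mapping every integer to Long and every non-integer number to Double is an assumption made for illustration (the Data Types section lists Double, Float, Int, and Long as possible targets). The sketch tolerates missing keys and rejects a key that appears with two different types.

```python
def json_type_name(value):
    """Return an illustrative DataCater type name for a parsed JSON value."""
    # bool must be tested before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, int):
        return "Long"    # assumption: all integers become Long
    if isinstance(value, float):
        return "Double"  # assumption: all other numbers become Double
    # Strings, arrays, and objects map to String (see the Data Types section).
    return "String"


def infer_schema(records):
    """Build a key -> type mapping, failing on conflicting types for one key."""
    schema = {}
    for index, record in enumerate(records):
        for key, value in record.items():
            if value is None:
                continue  # assumption: null values do not contribute a type
            type_name = json_type_name(value)
            known = schema.setdefault(key, type_name)
            if known != type_name:
                raise ValueError(
                    f"record {index}: key '{key}' is {type_name}, expected {known}"
                )
    return schema
```

Applied to the first listing, infer_schema yields the three attributes name, age, and email. Applied to the second listing, it raises a ValueError for the key age.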

Configuration

This source connector supports the following configuration options:

Pointer to list in JSON file

A JSON pointer to the location in the hierarchy of the JSON file that holds the JSON array with the records.

The pointer starts with a slash (/) and contains the names of the attributes leading to the location of the array within the hierarchy, delimited by a slash (/).

By default, DataCater assumes that the JSON array is located at the root level of the JSON file, in which case you do not have to provide any pointer:

[
  {
    "name": "Pete",
    "age":  25
  },
  {
    "name": "Julia",
    "age":  37
  }
]

For the following nested JSON structure, you may use the pointer /records/hits to guide DataCater to the array holding the records:

{
  "docs": 500,
  "records": {
    "hits": [
      {
        "name": "Pete",
        "age":  25
      },
      {
        "name": "Julia",
        "age":  37
      }
    ]
  }
}
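To make the pointer semantics concrete, here is a minimal Python sketch of how a pointer such as /records/hits could be resolved against a parsed JSON file. The function name resolve_pointer is hypothetical, and the sketch only follows attribute names delimited by slashes, as described above. It does not claim to match DataCater's internal parser.

```python
import json


def resolve_pointer(document, pointer):
    """Follow a slash-delimited pointer through nested JSON objects."""
    if pointer == "":
        # No pointer given: the array is expected at the root level.
        return document
    if not pointer.startswith("/"):
        raise ValueError("a pointer must start with a slash (/)")
    current = document
    for name in pointer[1:].split("/"):
        if not isinstance(current, dict) or name not in current:
            raise KeyError(f"attribute '{name}' not found while resolving {pointer}")
        current = current[name]
    return current


raw = """
{
  "docs": 500,
  "records": {
    "hits": [
      {"name": "Pete", "age": 25},
      {"name": "Julia", "age": 37}
    ]
  }
}
"""

records = resolve_pointer(json.loads(raw), "/records/hits")
for record in records:
    print(record["name"], record["age"])
```

An empty pointer returns the document unchanged, which corresponds to the default case of an array at the root level.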

Data Types

When importing JSON files, DataCater performs the following mapping between JSON data types and DataCater data types.

JSON data type    DataCater data type
--------------    ---------------------------
Array             String
Boolean           Boolean
Number            Double, Float, Int, or Long
Object            String
String            String

DataCater automatically extends the set of attributes with the attribute __datacater_file_name and fills it with the name of the uploaded JSON file.
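The following Python sketch illustrates what a record might look like after import, given the mapping and the added file name attribute described above. The function name prepare_record is hypothetical, and representing arrays and objects as their JSON text is an assumption: the documentation only states that these types map to String, not how the string is produced.

```python
import json

FILE_NAME_ATTRIBUTE = "__datacater_file_name"


def prepare_record(record, file_name):
    """Return a copy of the record with nested values as strings and the file name added."""
    prepared = {}
    for key, value in record.items():
        if isinstance(value, (list, dict)):
            # Assumption: arrays and objects are stored as their JSON text.
            prepared[key] = json.dumps(value)
        else:
            prepared[key] = value
    prepared[FILE_NAME_ATTRIBUTE] = file_name
    return prepared


record = {"name": "Pete", "age": 25, "tags": ["a", "b"]}
print(prepare_record(record, "people.json"))
```

The input record is left unmodified, so the same parsed data can be reused for other purposes.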