DataCater supports comma-separated values (CSV) files as a data source. Users can upload CSV files using the UI of DataCater. DataCater takes care of parsing the CSV files and publishing the extracted records to a data pipeline.
Once a data pipeline has been built for a certain CSV file, users can upload additional files of the same structure, i.e., files with the same number of values (or columns), which are then instantly processed by the pipeline.
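The structural requirement can be illustrated with a minimal sketch: a follow-up upload qualifies for an existing pipeline when it has the same number of values (or columns) as the file the pipeline was built for. The helper name and the acceptance logic below are illustrative, not DataCater's actual implementation.

```python
import csv
import io

def column_count(csv_text, delimiter=","):
    # Number of values in the first row of a CSV document.
    first_row = next(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    return len(first_row)

original = "id,name,email\n1,Alice,a@example.com\n"
follow_up = "id,name,email\n2,Bob,b@example.com\n"

# A follow-up upload can be processed by the existing pipeline
# only if its structure matches the original file.
assert column_count(follow_up) == column_count(original)
```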
When uploading a CSV file to create a new data source, DataCater tries to automatically detect parser settings, such as the delimiter character. If needed, the configuration used for parsing the CSV file can be adjusted (see below).
This source connector supports the following configuration options:
- The delimiter character that is used to separate values (or columns) from each other (default: ,).
- The delimiter characters that are used to separate rows (or records) from each other (default: \r\n).
- Whether the first row of the uploaded CSV files holds the names of the attributes (or columns); if so, it is skipped when reading in the data (default: yes).
- The number of leading rows to skip before the data part begins (default: 0). Skipping leading rows is often necessary for CSV files that were generated by a spreadsheet application.
- The character used to escape special characters in the CSV file (leave empty if none is used, which is the default).
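The effect of these options can be sketched with Python's standard csv module. The setting names below are illustrative, not DataCater's actual configuration keys; the row delimiter is omitted because Python's reader recognizes line endings on its own.

```python
import csv
import io

# Hypothetical settings mirroring the options above.
settings = {
    "column_delimiter": ";",
    "skip_header_row": True,   # first data-part row holds attribute names
    "skip_rows": 1,            # leading rows before the data part
    "escape_character": "\\",
}

# A file with one spreadsheet-generated leading row, then a header row.
raw = "exported by a spreadsheet app\nid;name\n1;Alice\n2;Bob\n"

reader = csv.reader(
    io.StringIO(raw),
    delimiter=settings["column_delimiter"],
    escapechar=settings["escape_character"] or None,
)

rows = list(reader)
# Drop the leading rows before the data part, then the header row.
rows = rows[settings["skip_rows"]:]
if settings["skip_header_row"]:
    rows = rows[1:]

print(rows)  # [['1', 'Alice'], ['2', 'Bob']]
```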
DataCater imports all columns of a CSV file as attributes of type string.
DataCater automatically extends the set of attributes with the attribute __datacater_file_name and fills it with the name of the uploaded CSV file.
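Taken together, the two points above determine the shape of an extracted record: every column value arrives as a string, plus one extra attribute carrying the file name. The record below is a hypothetical example, with made-up column names and file name.

```python
# Hypothetical record published by the CSV source connector.
record = {
    "id": "1",            # imported as a string, even though it looks numeric
    "name": "Alice",
    "__datacater_file_name": "customers.csv",  # added by DataCater
}

# All attribute values, including the file name, are strings.
assert all(isinstance(value, str) for value in record.values())
```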