Azure Blob Storage

Use change data capture to stream files from Azure Blob Storage containers to any data sink and transform them on the way.


Change Data Capture

At startup, the connector extracts data from all (matching) files in the given container. After this initial sync, it watches the container for new files and syncs only files that it has not yet processed.


Configuration

This source connector supports the following configuration options:

Container name

Name of the Blob Storage container.

Storage account name

Name of the Blob Storage account.

SAS Token

The shared access signature (SAS) token to be used for authenticating with Azure Blob Storage.
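
The container name, storage account name, and SAS token together identify a container and authorize access to it. As a rough illustration (not DataCater's implementation), the following Python sketch builds the URL of the Azure Blob Storage List Blobs REST operation from these three values. The function name and the placeholder values are hypothetical, and the sketch assumes the SAS token grants list permission on the container.

```python
from urllib.parse import quote


def list_blobs_url(account: str, container: str, sas_token: str) -> str:
    """Build a List Blobs request URL for a container, authorized by a SAS token."""
    # A SAS token is a query string; tolerate a leading "?" if one was copied along.
    sas = sas_token.lstrip("?")
    base = f"https://{account}.blob.core.windows.net/{quote(container)}"
    return f"{base}?restype=container&comp=list&{sas}"


# Placeholder values for illustration only.
print(list_blobs_url("mystorageaccount", "my-container", "?sv=<version>&sig=<signature>"))
```

Requesting such a URL is one way to check that the three values fit together before entering them into the connector configuration.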

File name filter

Regular expression applied to the names of the files in the Blob Storage container. Only files whose names match the regular expression are extracted. Default value: .* (matches all file names).
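
To get a feeling for how a filter selects files, the following Python sketch applies a regular expression to a list of file names. It assumes that the whole file name must match the expression; whether the connector requires a full match or accepts a partial match is not stated here, so test your expression against real file names. The function name and the sample file names are hypothetical.

```python
import re


def matching_files(file_names, pattern=".*"):
    """Return the file names that match the given regular expression."""
    regex = re.compile(pattern)
    # Assumption: the entire name must match (fullmatch), not just a part of it.
    return [name for name in file_names if regex.fullmatch(name)]


names = ["orders_2023.csv", "orders_2024.csv", "readme.txt"]
print(matching_files(names, r"orders_\d{4}\.csv"))  # ['orders_2023.csv', 'orders_2024.csv']
print(matching_files(names))                        # default .* keeps all three names
```

Note that the dot before csv is escaped. An unescaped dot matches any character, so orders_\d{4}.csv would also accept a name such as orders_2023xcsv.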

File format

The format of the extracted files. At the moment, this connector only supports CSV files.

CSV delimiter value

Only available for the file format CSV. Character used for separating columns (default: ,).

CSV quote character

Only available for the file format CSV. Character used for quotes (default: ").

CSV quote escape character

Only available for the file format CSV. Character used for escaping quotes (default: ").

CSV line separator

Only available for the file format CSV. String used for separating multiple lines (default: \n).

CSV comment character

Only available for the file format CSV. Character used for comments. It must appear at the beginning of a line (default: #).
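
The interplay of the delimiter, quote, quote escape, line separator, and comment options can be sketched with Python's csv module. This is an approximation for experimenting with the settings, not DataCater's parser: the function name is hypothetical, and, unlike a full CSV parser, the sketch splits on the line separator first and therefore does not support quoted values that contain the line separator.

```python
import csv


def parse_csv(text, delimiter=",", quote='"', escape='"',
              line_separator="\n", comment="#"):
    """Parse CSV text into a list of rows (lists of strings)."""
    # Drop empty lines and lines that start with the comment character.
    lines = [
        line for line in text.split(line_separator)
        if line and not (comment and line.startswith(comment))
    ]
    if escape == quote:
        # Default setting: a quote inside a quoted value is written twice ("").
        reader = csv.reader(lines, delimiter=delimiter, quotechar=quote,
                            doublequote=True)
    else:
        reader = csv.reader(lines, delimiter=delimiter, quotechar=quote,
                            doublequote=False, escapechar=escape)
    return list(reader)


sample = 'id;name\n# exported nightly\n1;"Smith; ""Jo"""\n'
print(parse_csv(sample, delimiter=";"))
# [['id', 'name'], ['1', 'Smith; "Jo"']]
```

With the default quote escape character, the doubled quotes around Jo are read as literal quotes, and the semicolon inside the quoted value is not treated as a column boundary.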

Generate attribute names from CSV header row

Only available for the file format CSV. Whether to use the first row of the CSV file as the source of attribute names. If this option is set to false, DataCater generates attribute names based on the index of each attribute and names them column_1, column_2, etc.
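
The effect of this option can be illustrated with a small Python sketch that turns parsed rows into records. The function and parameter names are hypothetical; only the column_1, column_2 naming scheme is taken from the description above.

```python
def to_records(rows, use_header_row=True):
    """Turn rows (lists of strings) into records keyed by attribute name."""
    if not rows:
        return []
    if use_header_row:
        # The first row provides the attribute names and is not emitted as data.
        names, data = rows[0], rows[1:]
    else:
        # Generated names are 1-based: column_1, column_2, ...
        names = [f"column_{i}" for i in range(1, len(rows[0]) + 1)]
        data = rows
    return [dict(zip(names, row)) for row in data]


rows = [["id", "name"], ["1", "Ada"]]
print(to_records(rows))
# [{'id': '1', 'name': 'Ada'}]
print(to_records(rows, use_header_row=False))
# [{'column_1': 'id', 'column_2': 'name'}, {'column_1': '1', 'column_2': 'Ada'}]
```

When the option is set to false for a file that does contain a header row, the header values show up as a regular record, as in the second output.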

Primary key column

Name of the attribute that uniquely identifies records, similar to a primary key in a database system.

Sync interval (s)

The interval in seconds between synchronizations of the Blob Storage container and DataCater (default: 120). During each synchronization, DataCater consumes only those files from the Blob Storage container that the pipeline has not yet processed, which provides a basic form of change data capture.
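
Conceptually, such a sync loop can be sketched as follows in Python. This is a simplified model under stated assumptions, not DataCater's implementation: it assumes that files are tracked by name only, so a file that is modified after it was processed would not be picked up again. All function and parameter names are hypothetical.

```python
import time


def sync_once(list_files, process_file, processed):
    """Process every listed file that has not been seen before; return the new names."""
    new_files = [name for name in list_files() if name not in processed]
    for name in new_files:
        process_file(name)
        processed.add(name)
    return new_files


def run(list_files, process_file, interval_seconds=120, max_rounds=None):
    """Repeat sync_once, sleeping interval_seconds after each round."""
    processed = set()
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        sync_once(list_files, process_file, processed)
        rounds += 1
        time.sleep(interval_seconds)
    return processed


# In-memory stand-in for a container that receives a new file between two rounds.
snapshots = iter([["a.csv"], ["a.csv", "b.csv"]])
handled = []
run(lambda: next(snapshots), handled.append, interval_seconds=0, max_rounds=2)
print(handled)  # ['a.csv', 'b.csv']
```

In this model, a shorter interval reduces the delay between the upload of a file and its processing, at the cost of listing the container more often.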


Data Types

DataCater imports all columns of a CSV file as attributes of type String.

DataCater automatically extends the set of attributes with the attribute __datacater_file_name and fills it with the name of the file.
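
The shape of a resulting record can be pictured with the following Python sketch. The function name and the sample values are hypothetical; only the String typing and the __datacater_file_name attribute are taken from the description above.

```python
def to_attributes(row, attribute_names, file_name):
    """Build a record in which every CSV value is kept as a string."""
    record = {name: str(value) for name, value in zip(attribute_names, row)}
    # The name of the source file is added as an extra attribute.
    record["__datacater_file_name"] = file_name
    return record


record = to_attributes(["42", "3.14"], ["id", "price"], "orders.csv")
print(record)
# {'id': '42', 'price': '3.14', '__datacater_file_name': 'orders.csv'}

# Numeric values arrive as strings and must be cast before doing arithmetic.
print(int(record["id"]) + 1)  # 43
```

Because all attributes arrive as strings, any conversion to numbers, dates, or booleans has to happen in a later transformation step.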