Use change data capture to sync flat files from FTP/SFTP servers to any data sink and transform them on the way.
At startup, the connector extracts data from all matching files in the given directory. After this initial sync, it watches the directory for new or updated files and syncs only those changes.
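The difference between the initial sync and later syncs can be illustrated with a small sketch. How DataCater detects changes internally is not documented here; the sketch assumes that a file counts as changed when its modification time or size differs from the previous poll. The `Snapshot` type and the `changed_files` function are hypothetical names used only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot type: file name -> (modification time, size in bytes).
Snapshot = Dict[str, Tuple[float, int]]


def changed_files(previous: Snapshot, current: Snapshot) -> List[str]:
    """Return the names of files that are new or whose mtime/size changed."""
    return sorted(
        name for name, stat in current.items()
        if previous.get(name) != stat
    )


# Initial sync: nothing has been seen yet, so every file is extracted.
first_poll = {"a.csv": (100.0, 10), "b.csv": (100.0, 20)}
initial = changed_files({}, first_poll)

# Later poll: a.csv is unchanged, b.csv was updated, c.csv is new.
second_poll = {"a.csv": (100.0, 10), "b.csv": (200.0, 25), "c.csv": (200.0, 5)}
incremental = changed_files(first_poll, second_poll)
```

In this sketch, `initial` contains both files, while `incremental` contains only `b.csv` and `c.csv`.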
This source connector supports the following configuration options:
Choose between FTP and SFTP.
The hostname or IP address of the FTP/SFTP server.
The port of the FTP/SFTP server. By default, FTP uses port 21 and SFTP uses port 22.
Only available for the protocol SFTP. Choose between password-based and key-based authentication.
The username to use for authenticating with the FTP/SFTP server.
Only available for the protocol FTP. The password to use for authenticating with the FTP server.
Only available for the protocol SFTP. The SSH key to use for authenticating with the SFTP server.
The SSH key must be provided in the RSA format. OpenSSH keys must first be converted to RSA before they are provided to DataCater.
You can choose between two approaches to define when DataCater extracts data from the FTP/SFTP server: a fixed interval in seconds or a CRON expression.
Depending on the option Sync mode, you can either specify the number of seconds or the CRON expression. By default, DataCater extracts data every hour, i.e., the default values are 3600 (seconds) and 0 */1 * * ? (CRON expression).
The directory on the FTP/SFTP server, from which DataCater should extract files.
Regular expression applied to files from the working directory. Only files with a name matching the regular expression will be extracted. Default value: .* (matches all file names).
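To try out a pattern before configuring it, a short sketch like the following can help. It assumes that the expression must match the entire file name; whether the connector matches the full name or a substring, and which regular expression dialect it uses, is not stated here. The function name `matching_files` is hypothetical.

```python
import re


def matching_files(names, pattern=".*"):
    """Keep only the file names that the pattern matches completely."""
    regex = re.compile(pattern)
    return [name for name in names if regex.fullmatch(name)]


files = ["orders_2021.csv", "orders.xml", "readme.txt"]

# The default pattern keeps every file.
all_files = matching_files(files)

# A stricter pattern keeps only CSV files whose name starts with "orders_".
csv_orders = matching_files(files, r"orders_.*\.csv")
```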
The format of the extracted files. Choose between CSV and XML.
Only available for the file type CSV. The character that delimits different columns (default: ,).
Only available for the file type CSV. Whether to use the first row of the CSV file for extracting attribute names or not. If this option is set to false, DataCater will generate attribute names based on the index of the attribute, and name them column_1, column_2, etc.
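The effect of the delimiter and header options can be sketched as follows. This is not DataCater's implementation, only an illustration of the naming rule described above; the function `read_csv_records` and its parameters are hypothetical.

```python
import csv
import io


def read_csv_records(text, delimiter=",", has_header=True):
    """Parse CSV text into a list of records (attribute name -> string value)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    if has_header:
        # The first row provides the attribute names.
        names, data = rows[0], rows[1:]
    else:
        # Generate names from the 1-based column index: column_1, column_2, ...
        names = [f"column_{i}" for i in range(1, len(rows[0]) + 1)]
        data = rows
    return [dict(zip(names, row)) for row in data]


with_header = read_csv_records("id;name\n1;Ada\n", delimiter=";")
without_header = read_csv_records("1,Ada\n2,Bob\n", has_header=False)
```

With the header row enabled, the first record uses the attributes `id` and `name`; with it disabled, the same data would be exposed as `column_1` and `column_2`.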
Only available for the file type XML. The XPath to the node holding the record nodes (default: /*).
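The following sketch illustrates what "the node holding the record nodes" means: the XPath selects a parent node, and each of its children is treated as one record. It handles only simple absolute paths made of element names (for example /export/orders) or the default /*, not full XPath. How DataCater maps the children of a record node to attributes is an assumption here, and `extract_records` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def extract_records(xml_text, parent_xpath="/*"):
    """Return one dict per child of the node selected by parent_xpath."""
    root = ET.fromstring(xml_text)
    steps = [step for step in parent_xpath.split("/") if step]
    # The first step must select the root element (by name or wildcard).
    if not steps or steps[0] not in ("*", root.tag):
        return []
    # ElementTree resolves the remaining steps relative to the root.
    rest = "/".join(steps[1:])
    parent = root.find(rest) if rest else root
    if parent is None:
        return []
    # Assumption: each child element of a record becomes a string attribute.
    return [
        {field.tag: (field.text or "") for field in record}
        for record in parent
    ]


flat = "<orders><order><id>1</id><total>9.50</total></order></orders>"
nested = "<export><meta>x</meta><orders><order><id>7</id></order></orders></export>"

default_records = extract_records(flat)
nested_records = extract_records(nested, "/export/orders")
```

With the default /*, the root element itself holds the records; for the nested document, the path has to point at the `orders` element.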
Name of the attribute that can act as a primary key. Please make sure that this attribute is never NULL.
DataCater imports all columns of a CSV or XML file as attributes of type string.
DataCater automatically extends the set of attributes with the attribute __datacater_file_name and fills it with the name of the file.
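As a rough illustration of the resulting record shape, the sketch below converts all values to strings and adds the file name attribute. The function `to_attributes` is hypothetical, and whether the file name includes the directory path is not specified here; the sketch assumes the bare file name.

```python
def to_attributes(record, file_name):
    """Return the record with string values plus the file name attribute."""
    attributes = {key: str(value) for key, value in record.items()}
    # Assumption: only the file name, without its directory, is stored.
    attributes["__datacater_file_name"] = file_name
    return attributes


row = to_attributes({"id": 1, "total": 9.5}, "orders_2021.csv")
```

Because every value arrives as a string, numeric or date attributes would need to be cast in a later transformation step if typed values are required.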