FTP/SFTP

Use change data capture to sync flat files from FTP/SFTP servers to any data sink and transform them along the way.

Change Data Capture

At startup, the connector extracts data from all matching files in the configured working directory. After this initial sync, it watches the directory for new or updated files and syncs only those changes.
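One way to picture this change detection is to compare two successive directory listings. The following Python sketch assumes that a file counts as new or updated when its name has not been seen before or when its size or modification time differs from the previous listing. DataCater's actual detection logic is not described here, and `changed_files` and the snapshot layout are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot layout: file name -> (size in bytes, modification time as Unix timestamp).
Snapshot = Dict[str, Tuple[int, float]]


def changed_files(previous: Snapshot, current: Snapshot) -> List[str]:
    """Return files that are new or whose size or modification time changed.

    Hypothetical helper for illustration; not DataCater's implementation.
    Files that disappeared are ignored, since only new or updated files are synced.
    """
    changes = []
    for name, metadata in sorted(current.items()):
        # Relevant if the file was not seen before or its metadata differs.
        if previous.get(name) != metadata:
            changes.append(name)
    return changes


# Initial sync: an empty previous snapshot marks every file as changed.
first_listing = {"a.csv": (120, 1700000000.0), "b.csv": (80, 1700000100.0)}
print(changed_files({}, first_listing))  # ['a.csv', 'b.csv']

# Later poll: b.csv was modified and c.csv is new; a.csv is unchanged.
second_listing = {
    "a.csv": (120, 1700000000.0),
    "b.csv": (95, 1700003600.0),
    "c.csv": (40, 1700003700.0),
}
print(changed_files(first_listing, second_listing))  # ['b.csv', 'c.csv']
```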

Configuration

This source connector supports the following configuration options:

Protocol

Choose between FTP and SFTP.

Hostname or IP

The hostname or IP address of the FTP/SFTP server.

Port

The port of the FTP/SFTP server. By default, FTP uses port 21 and SFTP uses port 22.
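The protocol, hostname, and port options can be modeled as a small settings object that falls back to the documented default ports. This is a hypothetical sketch: the `ServerSettings` class and its field names are illustrative and not part of DataCater's API.

```python
from dataclasses import dataclass
from typing import Optional

# Default ports as documented: 21 for FTP, 22 for SFTP.
DEFAULT_PORTS = {"FTP": 21, "SFTP": 22}


@dataclass
class ServerSettings:
    """Hypothetical model of the Protocol, Hostname or IP, and Port options."""

    protocol: str
    host: str
    port: Optional[int] = None

    def __post_init__(self) -> None:
        self.protocol = self.protocol.upper()
        if self.protocol not in DEFAULT_PORTS:
            raise ValueError(f"unsupported protocol: {self.protocol}")
        # Fall back to the protocol's default port when none is given.
        if self.port is None:
            self.port = DEFAULT_PORTS[self.protocol]


print(ServerSettings("sftp", "files.example.com").port)   # 22
print(ServerSettings("FTP", "10.0.0.5", port=2121).port)  # 2121
```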

Authentication method

Only available for the protocol SFTP. Choose between password-based and key-based authentication.

Username

The username to use for authenticating with the FTP/SFTP server.

Password

Only available for the protocol FTP and for SFTP with password-based authentication. The password to use for authenticating with the FTP/SFTP server.

Private SSH key

Only available for the protocol SFTP with key-based authentication. The private SSH key to use for authenticating with the SFTP server.

The SSH key must be provided in RSA format. Keys in the OpenSSH format must first be converted to RSA before you provide them to DataCater.
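The sketch below checks the authentication settings before a connection attempt. It assumes that "RSA format" refers to a PEM-encoded key whose first line is `-----BEGIN RSA PRIVATE KEY-----`, that keys in the newer OpenSSH format start with `-----BEGIN OPENSSH PRIVATE KEY-----`, that the method values are "password" and "key", and that FTP and password-based SFTP both require a password. `check_private_key` and `check_auth` are hypothetical helpers. An existing OpenSSH-format RSA key can typically be rewritten in place as PEM with `ssh-keygen -p -m PEM -f <key file>`.

```python
from typing import Optional

# Hypothetical helpers; not part of DataCater's API.
RSA_PEM_HEADER = "-----BEGIN RSA PRIVATE KEY-----"  # assumed meaning of "RSA format"
OPENSSH_HEADER = "-----BEGIN OPENSSH PRIVATE KEY-----"


def check_private_key(key_text: str) -> None:
    """Raise ValueError unless the key starts with a PEM RSA header."""
    lines = key_text.strip().splitlines()
    first_line = lines[0].strip() if lines else ""
    if first_line == OPENSSH_HEADER:
        raise ValueError("OpenSSH key detected: convert it to RSA (PEM) first")
    if first_line != RSA_PEM_HEADER:
        raise ValueError("expected an RSA private key in PEM format")


def check_auth(
    protocol: str,
    method: str = "password",
    password: Optional[str] = None,
    private_key: Optional[str] = None,
) -> None:
    """Check that the credentials needed for the chosen protocol and method exist."""
    if protocol.upper() == "SFTP" and method == "key":
        if not private_key:
            raise ValueError("key-based SFTP authentication needs a private SSH key")
        check_private_key(private_key)
    elif not password:
        # Assumption: FTP and password-based SFTP both need a password.
        raise ValueError("password-based authentication needs a password")


pem_key = "-----BEGIN RSA PRIVATE KEY-----\n...\n-----END RSA PRIVATE KEY-----\n"
check_auth("SFTP", "key", private_key=pem_key)  # passes without raising
check_auth("FTP", password="secret")            # passes without raising
```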

Sync mode

You can choose between two approaches for defining when DataCater extracts data from the FTP/SFTP server:

- Sync in fixed second intervals: DataCater extracts data every X seconds, where X is defined by the configuration option Sync interval.
- CRON expression: DataCater extracts data at the times given by the CRON expression specified in the configuration option Sync interval.

Sync interval

Depending on the option Sync mode, specify either the number of seconds between two syncs or a CRON expression. By default, DataCater extracts data every hour; the default values are 3600 (seconds) and 0 */1 * * ? (CRON expression).
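In the fixed-interval mode, the upcoming sync times follow directly from the interval. The hypothetical helper below assumes that syncs are spaced exactly one interval apart, starting one interval after a given start time. It does not cover the CRON mode, whose evaluation depends on the cron dialect DataCater uses.

```python
from datetime import datetime, timedelta
from typing import List


def fixed_interval_schedule(
    start: datetime, interval_seconds: int = 3600, count: int = 3
) -> List[datetime]:
    """Return the next `count` sync times after `start` (hypothetical helper).

    The default of 3600 seconds mirrors the documented default sync interval.
    """
    if interval_seconds <= 0:
        raise ValueError("sync interval must be a positive number of seconds")
    step = timedelta(seconds=interval_seconds)
    return [start + step * i for i in range(1, count + 1)]


for run in fixed_interval_schedule(datetime(2024, 1, 1, 8, 30)):
    print(run.isoformat())
# 2024-01-01T09:30:00
# 2024-01-01T10:30:00
# 2024-01-01T11:30:00
```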

Working directory

The directory on the FTP/SFTP server from which DataCater extracts files.

File name filter

A regular expression applied to the names of the files in the working directory. Only files whose names match the regular expression are extracted. Default value: .* (matches all file names).
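The filter below is a sketch that assumes the regular expression must match the whole file name, as Python's `re.fullmatch` does. If DataCater instead searches for the pattern anywhere in the name, a pattern such as `csv` would also match `orders.csv.bak`, so writing patterns that describe the complete file name is a safe habit either way. `filter_file_names` is a hypothetical helper.

```python
import re
from typing import Iterable, List


def filter_file_names(names: Iterable[str], pattern: str = ".*") -> List[str]:
    """Keep file names that the whole pattern matches (hypothetical helper).

    Assumption: full-match semantics; DataCater's matching mode may differ.
    """
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


listing = ["orders_2024-01.csv", "orders_2024-02.csv", "README.txt"]
print(filter_file_names(listing))
# ['orders_2024-01.csv', 'orders_2024-02.csv', 'README.txt']
print(filter_file_names(listing, r"orders_\d{4}-\d{2}\.csv"))
# ['orders_2024-01.csv', 'orders_2024-02.csv']
```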

File type

The format of the extracted files. Choose between CSV and XML.

CSV delimiter value

Only available for the file type CSV. The character that delimits different columns (default: ,).

Generate attribute names from CSV header row

Only available for the file type CSV. Whether to use the first row of the CSV file as the source of attribute names. If this option is set to false, DataCater generates attribute names from the position of each attribute: column_1, column_2, etc.
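The following sketch shows how the CSV delimiter and header options interact, using Python's standard `csv` module. It assumes every row has the same number of fields (surplus fields are dropped by `zip`); `parse_csv` is a hypothetical helper, not DataCater's parser. All values remain strings, in line with the data types described for this connector.

```python
import csv
import io
from typing import Dict, List


def parse_csv(text: str, delimiter: str = ",", header_row: bool = True) -> List[Dict[str, str]]:
    """Parse CSV text into string-valued records (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    if header_row:
        # The first row supplies the attribute names.
        names, data = rows[0], rows[1:]
    else:
        # Name attributes by 1-based position: column_1, column_2, ...
        names = [f"column_{i}" for i in range(1, len(rows[0]) + 1)]
        data = rows
    # zip pairs each value with its attribute name and drops surplus fields.
    return [dict(zip(names, row)) for row in data]


sample = "id;name\n1;Ada\n2;Grace\n"
print(parse_csv(sample, delimiter=";"))
# [{'id': '1', 'name': 'Ada'}, {'id': '2', 'name': 'Grace'}]
print(parse_csv(sample, delimiter=";", header_row=False)[0])
# {'column_1': 'id', 'column_2': 'name'}
```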

XPath root node

Only available for the file type XML. The XPath to the node holding the record nodes (default: /*).
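The sketch below illustrates how a root node path selects the records in an XML file: each child of the selected node becomes one record. Mapping each child element of a record to a string attribute is an assumption for illustration, and the helper supports only simple absolute paths made of element names and `*`, whereas full XPath (and whatever evaluator DataCater uses) supports far more. `extract_records` is a hypothetical helper built on Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def extract_records(xml_text: str, root_xpath: str = "/*") -> List[Dict[str, str]]:
    """Treat each child of the node at `root_xpath` as one record (hypothetical helper)."""
    document = ET.fromstring(xml_text)
    steps = [step for step in root_xpath.split("/") if step]
    # Only simple absolute paths of element names or "*" are supported here.
    if not steps or steps[0] not in ("*", document.tag):
        return []
    node = document
    for step in steps[1:]:
        node = node.find(step)  # first matching child element
        if node is None:
            return []
    # Assumption: each child element of a record becomes a string-valued attribute.
    return [{field.tag: (field.text or "") for field in record} for record in node]


catalog = (
    "<catalog>"
    "<book><title>Dune</title><year>1965</year></book>"
    "<book><title>Emma</title><year>1815</year></book>"
    "</catalog>"
)
print(extract_records(catalog))
# [{'title': 'Dune', 'year': '1965'}, {'title': 'Emma', 'year': '1815'}]
```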

Name of the attribute holding the record key

The name of the attribute that acts as a primary key. Make sure that this attribute is never NULL.
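A quick pre-flight check over sample records can reveal rows that would violate this requirement. The sketch below treats both a missing attribute and an empty string (as produced by an empty CSV field) as NULL, which is an assumption; `check_record_keys` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Optional


def check_record_keys(
    records: Iterable[Dict[str, Optional[str]]], key_attribute: str
) -> List[int]:
    """Return positions of records whose key is missing or empty (hypothetical helper)."""
    return [
        index
        for index, record in enumerate(records)
        # Assumption: a missing attribute, None, and "" all count as NULL.
        if record.get(key_attribute) in (None, "")
    ]


rows = [{"id": "1", "name": "Ada"}, {"id": "", "name": "Grace"}, {"name": "Alan"}]
print(check_record_keys(rows, "id"))  # [1, 2]
```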

Data Types

DataCater imports all columns of a CSV or XML file as attributes of type string.

DataCater automatically adds the attribute __datacater_file_name to each record and fills it with the name of the source file.
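As an illustration, the hypothetical helper below builds a record in the shape described above: every attribute value is a string, and `__datacater_file_name` holds the name of the source file. Whether DataCater stores a bare file name or a path is not specified here; the example assumes a bare name.

```python
from typing import Dict


def add_file_name(file_name: str, attributes: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the record extended with its source file name (hypothetical helper)."""
    record = dict(attributes)  # copy; all values stay strings, e.g. "42" rather than 42
    record["__datacater_file_name"] = file_name  # assumption: bare file name, no path
    return record


print(add_file_name("orders_2024-01.csv", {"id": "42", "amount": "19.99"}))
# {'id': '42', 'amount': '19.99', '__datacater_file_name': 'orders_2024-01.csv'}
```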