REST/HTTP

DataCater can stream data from REST/HTTP endpoints to any data sink and transform it along the way.

At the moment, we support only REST/HTTP endpoints that return data in the JSON format.


Configuration

This source connector supports the following configuration options:

General > URI

The URI of the REST endpoint, including the HTTP scheme, i.e., http:// or https://.

General > HTTP method

HTTP method to use for the request: GET or POST.

General > HTTP query parameters

HTTP query parameters to use in the request, given as an &-separated list of key=value pairs (for example, limit=100&sort=asc).
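
The following Python sketch shows how a value in this format splits into key/value pairs. The function parse_query_parameters is hypothetical and is not DataCater's parser. It assumes that keys and values are already URL-encoded where necessary, and it splits each pair on its first = only.

```python
from typing import Dict


def parse_query_parameters(value: str) -> Dict[str, str]:
    """Split an &-separated list of key=value pairs into a dict (hypothetical helper)."""
    params = {}
    for pair in value.split("&"):
        if not pair:
            continue  # tolerate empty segments, e.g. a trailing "&"
        # Split on the first "=" only, so values may contain further "=" signs
        key, _, val = pair.partition("=")
        params[key] = val
    return params


print(parse_query_parameters("limit=100&sort=asc"))
```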

General > HTTP request headers

HTTP headers to use in the request, given as a comma-separated list of Name:Value pairs (for example, Accept:application/json,X-Api-Key:secret).
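
Here is a similar sketch for the header format. It assumes that header values contain no commas and that whitespace around names and values can be dropped. parse_request_headers is a hypothetical helper, not DataCater code.

```python
from typing import Dict


def parse_request_headers(value: str) -> Dict[str, str]:
    """Split a comma-separated list of Name:Value pairs into a dict (hypothetical helper)."""
    headers = {}
    for pair in value.split(","):
        if not pair.strip():
            continue  # tolerate empty segments, e.g. a trailing comma
        # Split on the first ":" only, so values may contain further colons
        name, _, val = pair.partition(":")
        headers[name.strip()] = val.strip()
    return headers


print(parse_request_headers("Accept:application/json,X-Api-Key:secret"))
```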

General > HTTP request body

HTTP body to use in the request. Only available if the HTTP method POST is used.

General > JSON Pointer to records list

By default, DataCater expects the response of the REST endpoint to contain an array of JSON objects at its root level, with each object representing one record. For all other cases, you may provide a JSON pointer to the location of the array within a possibly deeply nested JSON structure.

If the JSON pointer points to a JSON object instead of a JSON array, DataCater extracts the values of that JSON object as records.

General > JSON Pointer to record

By default, DataCater extracts each complete JSON object of the records list as a record. For all other cases, you may provide a JSON pointer to the location of the record within each of these JSON objects.
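
The two pointer options can be illustrated with a small Python sketch. It resolves RFC 6901 JSON Pointers and then extracts the records. The helpers resolve_pointer and extract_records are hypothetical and only mirror the behavior described above; they are not DataCater's implementation. Error handling for missing paths is omitted.

```python
import json
from typing import Any, List


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Resolve an RFC 6901 JSON Pointer such as "/data/items"."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a JSON Pointer must be empty or start with '/'")
    node = document
    for token in pointer[1:].split("/"):
        # RFC 6901 unescaping: "~1" becomes "/" first, then "~0" becomes "~"
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def extract_records(response: Any, records_pointer: str = "", record_pointer: str = "") -> List[Any]:
    """Locate the records list, then the record inside each of its elements (hypothetical)."""
    container = resolve_pointer(response, records_pointer)
    # A JSON object instead of an array contributes its member values as records
    items = list(container.values()) if isinstance(container, dict) else container
    return [resolve_pointer(item, record_pointer) for item in items]


response = json.loads('{"data": {"items": [{"node": {"id": 1}}, {"node": {"id": 2}}]}}')
print(extract_records(response, "/data/items", "/node"))
```

In this example, /data/items locates the records list and /node selects the record inside each element. A records pointer that lands on a JSON object, such as /users in {"users": {"a": {...}, "b": {...}}}, yields the member values instead.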

General > Sync mode

You can choose between two approaches to define when DataCater extracts data from the REST endpoint:

  • Sync in fixed millisecond intervals: DataCater extracts data from the REST endpoint every X milliseconds (X is defined using the configuration option Sync interval).
  • CRON expression: DataCater extracts data from the REST endpoint at the times given by the CRON expression specified in the configuration option Sync interval.

General > Sync interval

Depending on the option Sync mode, you can either specify the number of milliseconds or the CRON expression. By default, DataCater extracts data every two minutes, i.e., the default values are 120000 (milliseconds) and */2 * * * * (CRON expression).
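
The following Python sketch contrasts the two defaults. It uses a simplified CRON evaluator that only understands a */N minute field; real CRON parsing is much richer. The function names are hypothetical. A fixed interval counts from the previous run. The CRON expression fires at wall-clock minutes divisible by N. Both defaults therefore trigger every two minutes, but not necessarily at the same instants.

```python
from datetime import datetime, timedelta

DEFAULT_INTERVAL_MS = 120_000  # default for fixed millisecond intervals (two minutes)


def next_fixed_interval_run(last_run: datetime, interval_ms: int = DEFAULT_INTERVAL_MS) -> datetime:
    """Fixed intervals are measured from the previous run (hypothetical helper)."""
    return last_run + timedelta(milliseconds=interval_ms)


def next_every_n_minutes_run(now: datetime, n: int = 2) -> datetime:
    """Next firing time of "*/n * * * *" after now (simplified: minute field only)."""
    candidate = now.replace(second=0, microsecond=0) + timedelta(minutes=1)
    while candidate.minute % n != 0:
        candidate += timedelta(minutes=1)
    return candidate


last_run = datetime(2024, 1, 1, 10, 0, 30)
print(next_fixed_interval_run(last_run))   # 2024-01-01 10:02:30
print(next_every_n_minutes_run(last_run))  # 2024-01-01 10:02:00
```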

General > Name of the attribute holding the primary key

Name of the attribute that uniquely identifies records. DataCater uses the primary key attribute to detect new records. Please make sure that this attribute does not hold NULL values.
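
The idea behind key-based detection of new records can be sketched in Python as follows. The sketch assumes that the keys seen so far are kept in an in-memory set; how DataCater persists this state is not described here. detect_new_records is a hypothetical name. The sketch also shows why NULL keys are a problem: a record without a key cannot be told apart from other records, so the sketch rejects it.

```python
from typing import Any, Dict, Iterable, List, Set

Record = Dict[str, Any]


def detect_new_records(records: Iterable[Record], key_attribute: str, seen_keys: Set[Any]) -> List[Record]:
    """Return records whose primary key has not been seen before (hypothetical helper)."""
    new_records = []
    for record in records:
        key = record.get(key_attribute)
        if key is None:
            # A NULL (or missing) key cannot identify the record
            raise ValueError(f"record has no value for {key_attribute!r}: {record}")
        if key not in seen_keys:
            seen_keys.add(key)  # assumption: in-memory state instead of persisted offsets
            new_records.append(record)
    return new_records


seen: Set[Any] = set()
print(detect_new_records([{"id": 1}, {"id": 2}], "id", seen))  # first sync: both records are new
print(detect_new_records([{"id": 2}, {"id": 3}], "id", seen))  # second sync: only id 3 is new
```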

Authentication > HTTP Basic Authentication username

Username to use if the REST endpoint uses HTTP Basic authentication.

Authentication > HTTP Basic Authentication password

Password to use if the REST endpoint uses HTTP Basic authentication.
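
HTTP Basic authentication, as standardized in RFC 7617, sends these two values as a Base64-encoded username:password pair in the Authorization header. The Python sketch below derives that header value. basic_auth_header is a hypothetical helper, and the example credentials are the ones used in RFC 7617.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value for HTTP Basic authentication."""
    # Assumption: UTF-8 for non-ASCII credentials; ASCII input encodes identically
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


print(basic_auth_header("Aladdin", "open sesame"))  # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

Base64 is an encoding, not encryption, so these credentials are best sent only to https:// endpoints.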

Authentication > Token exchange URI

If the REST endpoint uses time-based access tokens for authentication, you can specify a URI here. DataCater calls this URI to generate a new token whenever the REST endpoint returns HTTP status code 401.

Authentication > Token exchange HTTP method

HTTP method to use for requesting a new token.

Authentication > Token exchange HTTP request headers

HTTP request headers to use for requesting a new token.

Authentication > Token exchange HTTP request body

HTTP request body to use for requesting a new token. Only available if POST is used as the Token exchange HTTP method.

Authentication > JSON Pointer to token

If the token endpoint returns a JSON object instead of the plain token string, you may provide a JSON pointer to locate the token string within the JSON structure.
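
The token exchange options describe a retry flow that can be sketched in Python as follows. Several details are assumptions made for illustration: the fake transport function fake_send, the example URIs, the Bearer header layout, and retrying exactly once. None of this is DataCater's implementation. In the connector itself, the token would reach the request through the ${token} template variable.

```python
import json
from typing import Any, Callable, Dict, Tuple

# Hypothetical transport: (method, uri, headers) -> (status code, response body)
Send = Callable[[str, str, Dict[str, str]], Tuple[int, str]]


def extract_token(body: str, token_pointer: str) -> str:
    """Return the raw body, or the string found at an RFC 6901 JSON Pointer."""
    if not token_pointer:
        return body.strip()  # the token endpoint returned the plain token string
    node: Any = json.loads(body)
    for part in token_pointer[1:].split("/"):
        part = part.replace("~1", "/").replace("~0", "~")
        node = node[int(part)] if isinstance(node, list) else node[part]
    return str(node)


def fetch_with_token_exchange(
    send: Send,
    uri: str,
    build_headers: Callable[[str], Dict[str, str]],
    token_uri: str,
    token_pointer: str = "",
    token: str = "",
    token_method: str = "POST",
) -> Tuple[int, str, str]:
    """Call the endpoint; on HTTP 401, fetch a new token and retry once."""
    # GET for brevity; the configured HTTP method would be used instead
    status, body = send("GET", uri, build_headers(token))
    if status == 401:
        _, token_body = send(token_method, token_uri, {})
        token = extract_token(token_body, token_pointer)
        status, body = send("GET", uri, build_headers(token))
    return status, body, token


def fake_send(method: str, uri: str, headers: Dict[str, str]) -> Tuple[int, str]:
    """Stand-in server: issues "fresh-token" and only accepts that token."""
    if uri == "https://auth.example.com/token":
        return 200, '{"data": {"access_token": "fresh-token"}}'
    if headers.get("Authorization") == "Bearer fresh-token":
        return 200, "[]"
    return 401, ""


status, body, token = fetch_with_token_exchange(
    fake_send,
    "https://api.example.com/items",
    lambda t: {"Authorization": f"Bearer {t}"},
    "https://auth.example.com/token",
    token_pointer="/data/access_token",
    token="expired-token",
)
print(status, body, token)
```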

Change Data Capture > Name of the attribute holding the timestamp of the record's most recent update

Name of the attribute that holds the timestamp of the record's most recent update. DataCater uses the timestamp attribute to detect data changes.

Change Data Capture > Timestamp type

Formatted timestamp, epoch millisecond, or epoch second (default: formatted timestamp).

Change Data Capture > Format of timestamp values

The format used by the REST endpoint for timestamp values (default: yyyy-MM-DD'T'HH:mm:ss[.SSS]X). Only available when using formatted timestamp as timestamp type.
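
The three timestamp types can be normalized to a common representation before they are compared. The following Python sketch converts each type to epoch milliseconds. The type identifiers epoch_millisecond, epoch_second, and formatted are hypothetical labels for the options above. The formatted branch does not accept an arbitrary configured format. It only handles ISO 8601 values with a UTC offset, such as 2024-01-01T00:00:00Z.

```python
from datetime import datetime, timedelta, timezone
from typing import Union

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(value: Union[str, int], timestamp_type: str) -> int:
    """Normalize a timestamp value to epoch milliseconds (hypothetical type labels)."""
    if timestamp_type == "epoch_millisecond":
        return int(value)
    if timestamp_type == "epoch_second":
        return int(value) * 1000
    # "formatted": ISO 8601 with a UTC offset, with or without fractional seconds
    for fmt in ("%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%dT%H:%M:%S%z"):
        try:
            parsed = datetime.strptime(str(value), fmt)
        except ValueError:
            continue
        # Integer division of timedeltas avoids floating-point rounding
        return (parsed - EPOCH) // timedelta(milliseconds=1)
    raise ValueError(f"unsupported timestamp value: {value!r}")


print(to_epoch_millis("2024-01-01T00:00:00Z", "formatted"))
```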

Change Data Capture > Initial timestamp offset

Timestamp value at which the connector should start syncing. This option should be set when you use the ${offset.timestamp} variable in your request.
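
The interplay of the timestamp attribute and the offset can be sketched as follows. Records newer than the current offset count as changed. The largest timestamp seen becomes the offset for the next sync, and the first sync starts from the initial timestamp offset. This Python sketch assumes timestamps that are already normalized to epoch milliseconds. select_changed_records is a hypothetical name, not DataCater's API.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def select_changed_records(records: List[Record], timestamp_attribute: str, offset: int) -> Tuple[List[Record], int]:
    """Return records updated after the offset, plus the offset for the next sync."""
    changed = [r for r in records if r[timestamp_attribute] > offset]
    next_offset = max((r[timestamp_attribute] for r in changed), default=offset)
    return changed, next_offset


initial_offset = 150  # plays the role of "Initial timestamp offset" (epoch ms here)
records = [{"id": 1, "updated": 100}, {"id": 2, "updated": 250}, {"id": 3, "updated": 200}]
changed, offset = select_changed_records(records, "updated", initial_offset)
print([r["id"] for r in changed], offset)
```

The strict > comparison skips records whose timestamp equals the current offset, so it can miss updates that share that exact timestamp. A >= comparison avoids this at the cost of re-emitting the boundary records. This section does not say which variant DataCater uses.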

Advanced > Replace pattern in key values

Regular expression pattern that matches the parts of the primary key values that should be replaced.

Advanced > Replace with value

Value that replaces the matches of this pattern in the primary key values.
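
Together, these two options behave like a regular-expression substitution on the primary key, as in the following Python sketch. replace_in_keys is a hypothetical helper. The connector's regular expression dialect may differ from Python's re module, for example in how capture groups are referenced in the replacement value.

```python
import re
from typing import Any, Dict, List


def replace_in_keys(records: List[Dict[str, Any]], key_attribute: str, pattern: str, replacement: str) -> List[Dict[str, Any]]:
    """Apply a regex replacement to the primary key value of each record (hypothetical)."""
    compiled = re.compile(pattern)
    result = []
    for record in records:
        updated = dict(record)  # leave the input records untouched
        updated[key_attribute] = compiled.sub(replacement, str(record[key_attribute]))
        result.append(updated)
    return result


print(replace_in_keys([{"id": "user-0042"}, {"id": "user-0043"}], "id", r"^user-", ""))
```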

Advanced > HTTP request timeout (s)

Timeout of HTTP requests, specified in seconds (default: 120).

Advanced > Treat empty strings as NULL values

Whether to automatically convert empty strings to NULL values when extracting data (enabled by default).
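
Here is a minimal Python sketch of this conversion. It assumes that the conversion applies to top-level attribute values only and leaves whitespace-only strings untouched. empty_strings_to_null is a hypothetical name.

```python
from typing import Any, Dict


def empty_strings_to_null(record: Dict[str, Any]) -> Dict[str, Any]:
    """Replace empty-string attribute values with None (NULL)."""
    # Only exact "" is converted; whitespace-only strings and other values are kept
    return {key: (None if value == "" else value) for key, value in record.items()}


print(empty_strings_to_null({"name": "", "city": " ", "count": 0}))
```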


Request Templating

In the configuration fields HTTP query parameters, HTTP request headers, and HTTP request body, you can make use of the Freemarker template engine to build dynamic requests, which is especially useful when using change data capture.

We provide the following variables for the templating:

  • ${offset.timestamp}: Freemarker datetime variable holding the change timestamp of the most recent record.
  • ${now}: Freemarker datetime variable holding the request time.
  • ${today}: Freemarker datetime variable holding the request time with hours, minutes, and seconds set to 0 (beginning of day).
  • ${yesterday}: Freemarker datetime variable holding the request time minus 24 hours with hours, minutes, and seconds set to 0 (beginning of day).
  • ${dayBeforeYesterday}: Freemarker datetime variable holding the request time minus 48 hours with hours, minutes, and seconds set to 0 (beginning of day).
  • ${tomorrow}: Freemarker datetime variable holding the request time plus 24 hours with hours, minutes, and seconds set to 0 (beginning of day).
  • ${dayAfterTomorrow}: Freemarker datetime variable holding the request time plus 48 hours with hours, minutes, and seconds set to 0 (beginning of day).
  • ${token}: The access token retrieved from the specified token endpoint.
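
The date variables above can be derived from the request time as in the following Python sketch. It only computes the values and does not render Freemarker templates. template_variables is a hypothetical helper. It uses naive datetimes, so daylight saving transitions are ignored, and it also sets microseconds to 0 as an assumption. ${offset.timestamp} and ${token} are omitted because they depend on connector state.

```python
from datetime import datetime, timedelta
from typing import Dict


def template_variables(request_time: datetime) -> Dict[str, datetime]:
    """Compute the date-related template variables for one request (hypothetical)."""
    # Assumption: microseconds are zeroed along with hours, minutes, and seconds
    start_of_day = request_time.replace(hour=0, minute=0, second=0, microsecond=0)
    # Without DST shifts, truncating "request time minus 24 hours" equals start_of_day - 1 day
    return {
        "now": request_time,
        "today": start_of_day,
        "yesterday": start_of_day - timedelta(days=1),
        "dayBeforeYesterday": start_of_day - timedelta(days=2),
        "tomorrow": start_of_day + timedelta(days=1),
        "dayAfterTomorrow": start_of_day + timedelta(days=2),
    }


for name, value in template_variables(datetime(2024, 3, 10, 15, 45, 12)).items():
    print(f"{name}: {value.isoformat()}")
```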

Data Types

When extracting data in the JSON format from REST endpoints, DataCater performs the following mapping between JSON data types and DataCater data types.

JSON data type    DataCater data type
Array             String
Boolean           Boolean
Number            Double, Float, Int, or Long
Object            String
String            String
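
The mapping can be sketched in Python as follows. The table does not say how DataCater chooses among Double, Float, Int, and Long. The sketch therefore makes three assumptions: integers map to Int or Long by a 32-bit/64-bit rule, all fractional numbers map to Double, and arrays and objects are serialized to their JSON text. map_json_value is a hypothetical name.

```python
import json
from typing import Any, Tuple

INT_MIN, INT_MAX = -2**31, 2**31 - 1


def map_json_value(value: Any) -> Tuple[str, Any]:
    """Return (DataCater type name, converted value) for a parsed JSON value (hypothetical)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int in Python
        return "Boolean", value
    if isinstance(value, int):
        # Assumption: 32-bit integers map to Int, larger ones to Long
        return ("Int" if INT_MIN <= value <= INT_MAX else "Long"), value
    if isinstance(value, float):
        # Assumption: fractional numbers map to Double (Float is never chosen here)
        return "Double", value
    if isinstance(value, (list, dict)):
        # Assumption: arrays and objects are stored as their JSON text
        return "String", json.dumps(value)
    if isinstance(value, str):
        return "String", value
    raise TypeError(f"no mapping listed for {value!r}")


record = json.loads('{"active": true, "count": 3, "price": 9.5, "tags": ["a", "b"], "name": "x"}')
print({key: map_json_value(value)[0] for key, value in record.items()})
```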