The source connector for PostgreSQL can extract data change events from a PostgreSQL database system in real-time and stream them to the DataCater platform.
The PostgreSQL source connector is available in two variants:
Logical replication is available since PostgreSQL version 9.4.
PostgreSQL implements logical replication on its own without requiring the installation of plugins since version 10. Prior PostgreSQL versions require the installation of a plugin, either decoderbufs or wal2json, to support logical replication.
Create a new PostgreSQL user with the required permissions:
psql> CREATE ROLE [username] SUPERUSER LOGIN PASSWORD '[password]';
Superusers of PostgreSQL have the required permissions assigned by default.
Add the following lines to the config file of your PostgreSQL installation, typically called postgresql.conf, to enable Logical Replication:
wal_level = logical max_wal_senders = 1 # set max_replication_slots to the estimated number of pipelines consuming this data source max_replication_slots = 5
Allow the DataCater platform to connect to your PostgreSQL installation by adding an entry to the PostgreSQL configuration file pg_hba.conf.
Create a new PostgreSQL user with read-only permissions.
Allow the DataCater platform to connect to your PostgreSQL installation by adding an entry to the PostgreSQL configuration file pg_hba.conf.
This source connector supports the following configuration options:
The hostname or IP address of the machine where the PostgreSQL database system is running.
The port under which PostgreSQL is available (default: 5432).
Whether to use SSL when connecting to the PostgreSQL database or not.
The username used for authenticating with PostgreSQL.
The password of the user used for authenticating with PostgreSQL.
The name of the database that should be used as a source.
The name of the database schema that should be used as a source (default: public).
The name of the database table that should be used as a source. You may retrieve the list of tables available in the given PostgreSQL database by clicking on Fetch table names.
Whether to automatically profile the primary key column using the PostgreSQL system tables pg_attribute and pg_index or not.
Only available if the automated detection of the primary key column is disabled. Name of the attribute that uniquely identifies records. DataCater uses the primary key attribute to detect new records. Please make sure that the column does not hold NULL values.
You may choose between the Logical Replication-based and JDBC-based connector variant.
Only available for the Logical Replication-based connector.
The logical replication plugin to be used to consume change events from PostgreSQL (default: pgoutput). </p>If you are using PostgreSQL 10 or newer, please use pgoutput.
If you are using PostgreSQL 9.4 or newer, please use decoderbufs, wal2json, or wal2json Streaming.
If you are using Amazon RDS for PostgreSQL, please use wal2json for Amazon RDS or wal2json Streaming for Amazon RDS.
Only available for the JDBC-based connector.
You may choose one of the following modes for change data capture:
Only available for the JDBC-based connector.
DataCater can use a timestamp column, which stores the time of the most recent update of a record, to detect record updates. Specifying the timestamp column is required when using TIMESTAMP/INCREMENTING as Change Data Capture mode.
Only available for the JDBC-based connector.
The interval in seconds between synchronizations of the PostgreSQL table and DataCater (default: 60).
Only available for the Logical replication-based connector.
Overwrites the SELECT statement used for the initial snapshot (default: SELECT * FROM schema_name.table_name;
).
List of columns that shall not be considered for the data extraction (default: empty).
The following table shows the mapping between PostgreSQL data types and the data types used by DataCater.
PostgreSQL data type | DataCater data type |
---|---|
BIGINT | Long |
BIGSERIAL | Long |
BIT | Boolean |
BIT VARYING | String |
BOOLEAN | Boolean |
BOX | String |
BYTEA | String |
CHAR | String |
CHARACTER VARYING | String |
CHARACTER | String |
CIDR | String |
CIRCLE | String |
CITEXT | String |
DATE | Date |
DATERANGE | String |
DECIMAL | Double |
DOUBLE PRECISION | Double |
ENUM | String |
INET | String |
INT4RANGE | String |
INT8RANGE | String |
INTEGER | Int |
INTERVAL | String |
JSON | String |
JSONB | String |
LINE | String |
LSEG | String |
LTREE | String |
MACADDR | String |
MONEY | Double |
NUMERIC | Double |
NUMRANGE | String |
PATH | String |
POINT | String |
POLYGON | String |
REAL | Float |
SMALLINT | Int |
SMALLSERIAL | Int |
SERIAL | Int |
TEXT | String |
TIME | Time |
TIMETZ | Time |
TIME WITH TIME ZONE | Time |
TIMESTAMP | Timestamp |
TIMESTAMP WITHOUT TIME ZONE | Timestamp |
TIMESTAMPTZ | Timestamp |
TIMESTAMP WITH TIME ZONE | Timestamp |
TSQUERY | String |
TSRANGE | String |
TSTZRANGE | String |
TSVECTOR | String |
TXID_SNAPSHOT | String |
UUID | String |
VARCHAR | String |
XML | String |