Use change data capture to stream data from Google Cloud BigQuery tables to any data sink and transform it on the way.
Before you start, please create a service account in Google Cloud and assign it the predefined IAM role BigQuery Data Viewer at the dataset level and the predefined IAM roles BigQuery Job User and BigQuery Resource Viewer at the project level.
This source connector supports the following configuration options:
The content of the JSON-based credentials file provided by Google Cloud for the service account. The service account must have been assigned the predefined IAM roles BigQuery Data Viewer (dataset level), BigQuery Job User (project level), and BigQuery Resource Viewer (project level).
The e-mail address of the service account. We try to automatically extract the e-mail address from the provided Google Cloud credentials.
The name of the BigQuery project. We try to automatically extract the name of the BigQuery project from the provided Google Cloud credentials.
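To illustrate the automatic extraction mentioned above: service account key files issued by Google Cloud are JSON documents that include the fields `client_email` and `project_id`. The following is a minimal sketch of how these values could be read from the credentials; the function name `extract_account_details` and the sample values are made up for illustration and do not reflect DataCater's actual implementation.

```python
import json


def extract_account_details(credentials_json: str) -> dict:
    """Read the service account e-mail and the project ID from a key file.

    Hypothetical helper: returns None for a field that is missing,
    in which case the value would have to be entered manually.
    """
    credentials = json.loads(credentials_json)
    return {
        "email": credentials.get("client_email"),
        "project": credentials.get("project_id"),
    }


# Made-up key file content, reduced to the fields relevant here.
sample = json.dumps({
    "type": "service_account",
    "project_id": "my-project",
    "client_email": "datacater@my-project.iam.gserviceaccount.com",
})

details = extract_account_details(sample)
print(details["email"])
print(details["project"])
```

If a field cannot be found in the credentials, the corresponding configuration option needs to be filled in by hand.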
The name of the BigQuery dataset.
The name of the BigQuery table (or view). You may retrieve the list of tables (and views) available in the given BigQuery project and dataset by clicking on Fetch table names.
You may choose one of the following modes for change data capture:
BigQuery does not natively support the concept of primary keys. Specifying a column that uniquely identifies a row in BigQuery allows DataCater to detect new records. Please make sure that this column is never NULL.
DataCater can use a timestamp column, which stores the time of the most recent update of a record, to detect record updates. Specifying the timestamp column is required when using TIMESTAMP or TIMESTAMP/INCREMENTING as Change Data Capture mode.
The interval in milliseconds between synchronizations of the BigQuery table and DataCater (default: 60000).
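The interplay of the unique column and the timestamp column can be sketched as follows. This is a simplified, in-memory illustration of timestamp-based change detection with a key as tie-breaker, not DataCater's actual implementation. It assumes that the values of the unique column increase over time; the function name `detect_changes` and the column names `id` and `updated_at` are hypothetical.

```python
from datetime import datetime


def detect_changes(rows, key_column, timestamp_column, last_timestamp, last_key):
    """Return the rows that changed since the previous synchronization.

    A row counts as changed if its timestamp is later than the stored
    timestamp, or equal to it while its key is larger than the stored key.
    """
    changed = [
        row for row in rows
        if (row[timestamp_column], row[key_column]) > (last_timestamp, last_key)
    ]
    # Process changes in a stable order so that the stored offset only moves forward.
    return sorted(changed, key=lambda r: (r[timestamp_column], r[key_column]))


rows = [
    {"id": 1, "updated_at": datetime(2023, 1, 1, 10, 0)},
    {"id": 2, "updated_at": datetime(2023, 1, 1, 11, 0)},
    {"id": 3, "updated_at": datetime(2023, 1, 1, 11, 0)},
    {"id": 4, "updated_at": datetime(2023, 1, 1, 12, 0)},
]

# Offset remembered from the previous synchronization.
changed = detect_changes(
    rows, "id", "updated_at",
    last_timestamp=datetime(2023, 1, 1, 11, 0),
    last_key=2,
)
print([row["id"] for row in changed])
```

In this sketch, a NULL value in either column would break the comparison, which is one reason why the unique column must never be NULL.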
The following table shows the mapping between BigQuery data types and the data types used by DataCater.
BigQuery data type | DataCater data type |
---|---|
ARRAY | String |
BOOLEAN | Boolean |
BYTES | String |
DATE | Date |
DATETIME | Timestamp |
FLOAT | Double |
INTEGER | Long |
NUMERIC | Double |
STRING | String |
STRUCT | String |
TIME | Time |
TIMESTAMP | Timestamp |
DataCater extracts the BigQuery data types ARRAY and STRUCT as JSON-formatted strings.
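Because ARRAY and STRUCT values arrive as JSON-formatted strings, they can be parsed back into native data structures when needed. The following sketch uses a made-up record; the attribute names and the exact serialization shown are assumptions for illustration.

```python
import json

# Hypothetical record as it might be extracted from BigQuery:
# "tags" is an ARRAY column, "address" is a STRUCT column.
record = {
    "id": 42,
    "tags": '["red", "green"]',
    "address": '{"city": "Berlin", "zip": "10115"}',
}

# Parse the JSON-formatted strings into a list and a dict.
tags = json.loads(record["tags"])
address = json.loads(record["address"])

print(tags[0])
print(address["city"])
```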
For licensing reasons, we are not allowed to ship the official BigQuery JAR file. Please download the JAR file manually and mount it into the folder /kafka/connect/kafka-connect-jdbc of the Kafka Connect containers. Thank you for your understanding.