How to extract data change events from HubSpot's CRM API

Learn how to apply change data capture for extracting data changes from HubSpot.

...
By Stefan Sprenger

HubSpot offers a suite of software products for inbound marketing, sales, and customer service. For instance, one of their products, HubSpot CRM, can be used for managing all kinds of relationships with customers, like contact people, companies, deals, or offers. While HubSpot offers powerful ways for managing the valuable data related to the business, the products, and the customers of a company, it lacks advanced support for analytics.

In this article, we show how to use DataCater’s REST/HTTP source connector for extracting data change events from your HubSpot CRM account. Using a pipeline, we could push the extracted events to data sinks, such as Google Cloud BigQuery, Snowflake, or another REST/HTTP API.

Authentication

The HubSpot API provides multiple means for authentication, like OAuth2 or API keys, which are all supported by our REST/HTTP source connector. For ease of configuration, we use API keys in this article.

Understanding HubSpot's search endpoint

For all CRM objects, HubSpot provides a search endpoint in their API that can be used to retrieve, sort, and filter records. For instance, the following POST request retrieves companies sorted by the field hs_lastmodifieddate in ascending order:

curl \
  -XPOST \
  -H'Content-Type:application/json' \
  -d'{"sorts": [{"propertyName": "hs_lastmodifieddate", "direction": "ASCENDING"}]}' \
  https://api.hubapi.com/crm/v3/objects/companies/search/?hapikey=$YOUR_API_KEY

By default, the search endpoint returns 10 records. We can bump this number up to 100 (HubSpot doesn’t allow more) by setting the parameter limit:

curl \
  -XPOST \
  -H'Content-Type:application/json' \
  -d'{"limit": 100, "sorts": [{"propertyName": "hs_lastmodifieddate", "direction": "ASCENDING"}]}' \
  https://api.hubapi.com/crm/v3/objects/companies/search/?hapikey=$YOUR_API_KEY

The option filterGroups allows us to select only records matching specific filters (or requirements). For instance, we might use it to fetch companies that have been modified in HubSpot (attribute hs_lastmodifieddate) after 2022/01/01 00:00:00 (UNIX timestamp 1640991600):

curl \
  -XPOST \
  -H'Content-Type:application/json' \
  -d'{"filterGroups":[{"filters":[{"propertyName": "hs_lastmodifieddate","operator": "GT","value":1640991600}]}], "limit": 100, "sorts": [{"propertyName": "hs_lastmodifieddate", "direction": "ASCENDING"}]}' \
  https://api.hubapi.com/crm/v3/objects/companies/search/?hapikey=$YOUR_API_KEY

By combining these different search options, we can (1) efficiently walk over all records stored in HubSpot and (2) implement change data capture by monitoring the HubSpot API for changes.

All we need to do is to recurringly execute an API request, which sorts companies by their change time (hs_lastmodifieddate), and update the value in the filter on the attribute hs_lastmodifieddate with each request such that a request searches for all companies, which have a value in hs_lastmodifieddate that is greater than the greatest value of the prior request.

Sounds complicated to set up? Try out our REST/HTTP connector.

Setting up DataCater's REST connector

Please create a new REST source connector and fill out the fields URI, HTTP method, HTTP request headers, and HTTP request body, accordingly to the curl commands shown in the prior section:

Configuration of the REST/HTTP source connector for HubSpot.
General configuration of the REST/HTTP source connector for HubSpot.

Note that we use the variable ${offset.timestamp?datetime.iso?long} in the HTTP request body. DataCater’s REST connector automatically fills it with the largest timestamp value it has processed. We also need to fill the fields JSON Pointer to records list with /results and Name of the attribute holding the record key with id to (1) point the connector to the JSON array in the API response that holds the records and (2) let the connector know which attribute holds unique record identifiers.

To finish the connector configuration, please navigate to the Change Data Capture tab and choose updatedAt as the Name of the attribute holding the timestamp of the record’s most recent update:

CDC configuration of the REST/HTTP source connector for HubSpot.
Change data capture-related configuration of the REST/HTTP source connector for HubSpot.

That’s it! You can save the data source, create a new pipeline, and start consuming change events from HubSpot in (near) real-time.

Summary

Using HubSpot’s search endpoint, we can perform powerful search queries that sort and filter CRM records. By combining the sorts and filterGroups options, we can walk over all records stored in HubSpot and monitor HubSpot for changes.

We used DataCater’s REST/HTTP source connector to implement change data capture with HubSpot without having to track the state of the data extraction, i.e., the timestamps, ourselves.