Using PostgreSQL RLS with Hibernate Reactive

Table of Content

Introduction
Implementing Multitenancy with PostgreSQL
Challenges When Using RLS with Hibernate Reactive
Implementing RLS in Quarkus with Hibernate Reactive
- Initialize RLS in PostgreSQL
- Modify Hibernate Reactive Calls to the Database
Summary

Introduction

In the context of Software-as-a-Service (SaaS) applications, multitenancy is a concept used to separate the data of different customers, also referred to as tenants. Multitenancy improves security by restricting the access of users to their own data. In the past, multitenancy was implemented mostly by either using a dedicated database instance per tenant or by controlling the data access within the application itself. In 2016, PostgreSQL introduced row-level security (RLS) as a core feature for natively implementing multitenancy by restricting data access based on the current user’s authorization. In this article, we demonstrate how to use RLS with Quarkus and Hibernate Reactive for implementing efficient and secure multitenant applications.

Implementing Multitenancy with PostgreSQL

One Database per Tenant

The naive approach to implementing multitenancy is scaling the number of databases up based on tenants served. All data for a particular tenant are stored in a separate and dedicated database to isolate them from other customers’ data. One of the advantages of the one-database-per-tenant approach is its flexibility: Each customer database can be scaled separately based on its individual needs. The flexibility comes at a high price: This approach does not share any resources, is very expensive to operate, and is the most complex to manage. Applications need to configure a unique connection pool for each customer (database), which causes extra overhead and complexity. Another drawback that comes with this approach is that using techniques like change data capture (CDC) can require more work than when storing all tenants in the same database.

One Schema per Tenant

Following the one-schema-per-tenant approach, a dedicated schema is created for each tenant. Compared to the one-database-per-tenant approach, this mode of operation offers less security since all of your data sit inside one database instance. Also, managing schema changes can become complicated since changes must be maintained consistently across all schemas. Both methods result in a large number of database objects, which can be more complex to manage. Implementing CDC can also require more work than when storing all tenants in the same schema.

Handling Multitenancy Inside the Application with a WHERE Clause Discriminator

Another way to implement multitenancy is by handling the logic inside the application and always querying the database using a WHERE clause discriminator. This method does not isolate the data inside the database but lets the application filter the data that the respective user is allowed to access. While implementing multitenancy inside the application is fairly simple, it comes with certain limitations. Compared to the two previous approaches, this method has fewer security measures and is prone to developer-induced issues. If a database query without a correctly-configured WHERE clause slips through a code review, it could result in the unauthorized access of sensitive user data. To mitigate this risk, a global wrapper could be used within the application to automatically add the WHERE clause to all database queries. Another security issue that could be encountered is that attackers get access to the entire database once the database credentials are breached whereas the other approaches allow the use of different database users for different tenants.

PostgreSQL Row-Level Security

PostgreSQL introduced its Row-Level Security (RLS) feature with the release of version 9.5. RLS allows to define policies that determine which users are allowed to view, update, and delete certain rows in a table based on which tenant they belong to. Policies can be defined for many different criteria, such as the user’s role, the user’s identifier, or the location from which the data is being queried.

RLS offers a few advantages compared to other methods outlined in this article:

Higher security: RLS offers high levels of security by ensuring only authorized users get access to specific rows.
Higher Compliance: Regulatory requirements can easily be met by enforcing access at a row level as opposed to enforcing access at a table level and more or less “filtering” the data the user shouldn’t have access to.
Flexibility: RLS offers higher flexibility with less overhead. You don’t need to manage multiple connection pools or implement multiple subscribers when using CDC and changes to the database don’t require making repetitive changes to multiple databases or schemas.
Maintenance Costs: RLS requires less maintenance after the implementation as it operates on a single data source. This, in turn, results in a lower cost of maintenance.

Challenges When Using RLS with Hibernate Reactive

At DataCater, we put a high value on being fast and responsive. Therefore, we try to adopt reactive practices wherever possible. We use Hibernate Reactive to ensure that our API calls are non-blocking for the best user experience possible. Unfortunately, at the time being, Hibernate Reactive does not support PostgreSQL RLS out of the box. This limitation is well known within the community, with some users avoiding Hibernate Reactive for certain use cases, or changing their data storage approach altogether. The following section describes how we overcame this hurdle and implemented multitenancy with PostgreSQL RLS and Hibernate Reactive using a custom solution. In a nutshell, we manually initialize RLS in PostgreSQL and inject the tenant information into all database queries executed by Hibernate Reactive.

Implementing RLS in Quarkus with Hibernate Reactive

Initialize RLS in PostgreSQL

By default, Hibernate generates the database schema as defined by the application at startup time. Unfortunately, there is no way to let Hibernate enable RLS in PostgreSQL. To address this issue, we switched from handling the database schema with Hibernate to using a custom SQL script. The SQL script enables RLS in PostgreSQL, defines a tenant discriminator, and populates the database schema. If you already have a PostgreSQL instance running you would need to manually enable RLS and add a discriminator column (called tenant_id in the subsequent examples) to your existing tables.

In the following example, we create a PostgreSQL table (here called deployments) with the columns we need for the application and the additional column, tenant_id, which will be used by RLS as the discriminator. We set the default value of the column tenant_id to the value contained in the PostgreSQL custom setting datacater.tenant. This is the setting our application will set on calls to the database, as explained in the Modify Hibernate Reactive Calls to the Database section.

CREATE TABLE deployments (
	id          UUID NOT NULL PRIMARY KEY,
	createdat   TIMESTAMP,
	name        VARCHAR(255),
	spec        JSONB,
	updatedat   TIMESTAMP,
	tenant_id   TEXT DEFAULT current_setting('datacater.tenant')::text NOT NULL
);

The PostgreSQL instance being used must have RLS activated for the tables it is to be used on. Here is an example showing how to enable RLS and create a policy that defines the access restriction strategy:

ALTER TABLE deployments ENABLE ROW LEVEL SECURITY;

CREATE POLICY rls_policy ON deployments
  USING (tenant_id = current_setting('datacater.tenant'::text));

Modify Hibernate Reactive Calls to the Database

To utilize PostgreSQL RLS with Hibernate Reactive and the Quarkus framework, it is necessary to manually add the tenant information to each database call. We developed a wrapper around the Mutiny.SessionFactory to globally configure the process. Our wrapper overrides Hibernate’s withSession() and withTransaction() methods to initially apply tenant information to the session and transaction queries, and then apply the defined work to the same worker thread. This provides a seamless developer experience and allows these methods to be used in the same manner as the original Hibernate functions.

public <T> Uni<T> withTransaction(BiFunction<Session, Transaction, Uni<T>> work) {
  String nativeQuery = String.format("SET datacater.tenant = %s", getPrincipal());

  return sf.withTransaction(
    (session, transaction) -> {
      var update = session.createNativeQuery(nativeQuery).executeUpdate();
      return update.onItem().transformToUni(ignore -> work.apply(session, transaction));
    }
  );
}

The SecurityIdentity containing the tenant data for the active user is injected into the class, allowing us to extract a unique name for the use as a tenant ID.

@Inject SecurityIdentity si;

private String getPrincipal() {
  String principalName = si.getPrincipal().getName();

  if (principalName == null || principalName.isEmpty()) {
    throw new UnauthorizedException();
  }

  LOGGER.info(String.format("Detected principal with name := %s.", principalName));

  return principalName;
}

Summary

Multitenancy is a crucial concept utilized by SaaS providers to manage customer data securely. The decision to choose the appropriate architecture primarily depends on the expected resource consumption of the software and the number of customers to be served. Although PostgreSQL RLS is an efficient way to secure data access, it comes with some drawbacks, such as Hibernate Reactive’s lack of support, which requires a custom solution. Overall, the proper implementation of multitenancy using PostgreSQL can enhance the security and scalability of SaaS applications, thereby improving the user experience.