Multi-Tenant Analytics

End-to-end guide for serving multiple customers from a shared Pinot cluster with resource and data isolation.

This playbook covers serving analytics to many customers (tenants) from a single Pinot cluster. The pattern is common in B2B SaaS products where each customer gets their own dashboard but the infrastructure is shared for cost efficiency. It combines Pinot's tenant tagging, workload isolation, and application-level row filtering to deliver per-customer data isolation and fair resource allocation.

When to use this pattern

Use this playbook when:

  • You operate a SaaS product where each customer should only see their own data.

  • You want to run a single Pinot cluster (lower operational overhead) rather than one cluster per customer.

  • Different customers have different query volumes, and you need to prevent a heavy-hitter from degrading others.

  • You need to enforce data isolation at the query layer (row-level security) alongside resource isolation at the infrastructure layer.

Architecture sketch

Customer A ──▶ App backend ──▶ Broker pool A ──▶ Servers (tenant A)
Customer B ──▶ App backend ──▶ Broker pool B ──▶ Servers (shared)
Customer C ──▶ App backend ──▶ Broker pool B ──▶ Servers (shared)

          (the app backend injects a tenantId
           filter into every query)

There are two complementary isolation layers:

  1. Infrastructure isolation via Pinot tenants: assign servers and brokers to named tenants so resource-hungry customers get dedicated compute.

  2. Data isolation via application-level row filtering: the application layer injects a WHERE tenantId = '<customer>' predicate into every query.

Data model

Store all tenants' data in one table with a tenantId dimension:
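A minimal sketch of what such a shared-table schema might look like, written as a Python dict and serialized to the JSON form a Pinot schema uses. Only tenantId comes from this guide; the schema name events and the columns eventType, amount, and eventTime are illustrative assumptions.

```python
import json

# Hypothetical shared-table schema. The keys (schemaName, dimensionFieldSpecs,
# metricFieldSpecs, dateTimeFieldSpecs) are Pinot schema keys; the table and
# column names other than tenantId are assumptions made for illustration.
schema = {
    "schemaName": "events",
    "dimensionFieldSpecs": [
        {"name": "tenantId", "dataType": "STRING"},   # one value per customer
        {"name": "eventType", "dataType": "STRING"},
    ],
    "metricFieldSpecs": [
        {"name": "amount", "dataType": "DOUBLE"},
    ],
    "dateTimeFieldSpecs": [
        {
            "name": "eventTime",
            "dataType": "LONG",
            "format": "1:MILLISECONDS:EPOCH",
            "granularity": "1:MILLISECONDS",
        }
    ],
}

# JSON text that could be saved to a schema file and uploaded to the controller.
schema_json = json.dumps(schema, indent=2)
```

Every row carries its tenantId, so one table can serve all customers while the filter described later keeps their rows apart.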

Table-per-tenant approach (for extreme isolation)

For customers with strict compliance requirements, create a separate table per tenant. This gives full isolation (separate segments, servers, retention) but increases operational complexity. Use this only when regulatory requirements demand it.

Table configuration

Why tenantId is the sorted column

When tenantId is the sorted column, all rows for a single tenant are physically adjacent in each segment. A WHERE tenantId = 'acme' filter can then resolve to one contiguous range of rows per segment instead of scanning the whole column, so only that tenant's rows are read. If a different column is a better sort key for your queries, use an inverted index on tenantId instead.
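The two indexing choices above could be expressed in a table config roughly as follows. This is a sketch under assumptions: the table name events and time column eventTime are hypothetical, the table is assumed to be OFFLINE, and how sorting is applied at ingestion depends on the table type, so check the indexing documentation for your Pinot version.

```python
def build_table_config(table_name: str, sort_by_tenant: bool = True) -> dict:
    """Return a table config dict with one of the two tenantId index layouts."""
    if sort_by_tenant:
        # Rows of one tenant are adjacent within each segment.
        index_config = {"sortedColumn": ["tenantId"]}
    else:
        # Time stays the sort key; tenantId lookups use an inverted index.
        index_config = {
            "sortedColumn": ["eventTime"],
            "invertedIndexColumns": ["tenantId"],
        }
    return {
        "tableName": table_name,
        "tableType": "OFFLINE",
        "segmentsConfig": {"timeColumnName": "eventTime", "replication": "1"},
        # DefaultTenant is Pinot's default tenant name; replace it once
        # dedicated tenants exist.
        "tenants": {"broker": "DefaultTenant", "server": "DefaultTenant"},
        "tableIndexConfig": index_config,
        "metadata": {},
    }
```

Calling build_table_config("events") gives the tenant-sorted layout, while build_table_config("events", sort_by_tenant=False) gives the time-sorted layout with an inverted index on tenantId.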

Infrastructure isolation with Pinot tenants

Assigning servers to tenants

Tag servers when adding them to the cluster:
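One way to tag an instance is through the controller REST API. The sketch below only builds the request URL and makes no network call. It assumes the controller exposes PUT /instances/{instanceName}/updateTags with a tags query parameter, and that tenant tags follow the <tenant>_OFFLINE, <tenant>_REALTIME, and <tenant>_BROKER naming convention; confirm both against your controller's Swagger UI. The controller address and instance name are hypothetical.

```python
from urllib.parse import quote, urlencode

# Tag suffixes per instance role, following Pinot's tenant tag naming.
_TAG_SUFFIXES = {"server": ["OFFLINE", "REALTIME"], "broker": ["BROKER"]}

def update_tags_url(controller: str, instance: str, tenant: str, role: str) -> str:
    """Build the URL for a PUT request that assigns tenant tags to an instance."""
    tags = ",".join(f"{tenant}_{suffix}" for suffix in _TAG_SUFFIXES[role])
    query = urlencode({"tags": tags})
    return f"{controller}/instances/{quote(instance)}/updateTags?{query}"

# Hypothetical controller address and instance name.
url = update_tags_url(
    "http://localhost:9000", "Server_pinot-server-0_8098", "PremiumTenant", "server"
)
```

Sending a PUT to that URL with any HTTP client would retag the server, after which tables assigned to PremiumTenant can be placed on it.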

Then assign high-value customers' tables to the PremiumTenant:

Other customers share a SharedTenant pool. See Tenant for setup details.
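A table is bound to a tenant through the tenants section of its table config. The helper below is a small sketch of how a provisioning script might pick that section per customer; the tier label "premium" is a hypothetical value, and using the same tenant name for brokers and servers is an assumption.

```python
def tenants_for(customer_tier: str) -> dict:
    """Return the table config 'tenants' section for a customer tier."""
    tenant = "PremiumTenant" if customer_tier == "premium" else "SharedTenant"
    return {"broker": tenant, "server": tenant}

def assign_tenants(table_config: dict, customer_tier: str) -> dict:
    """Return a copy of the table config bound to the tier's tenant."""
    updated = dict(table_config)
    updated["tenants"] = tenants_for(customer_tier)
    return updated
```

For example, assign_tenants({"tableName": "events_acme"}, "premium") returns a config whose tenants section points both roles at PremiumTenant, leaving the input dict untouched.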

Workload-based query isolation

For finer-grained control within a shared tenant, use workload-based query resource isolation to limit CPU and memory per workload class:

The application backend sets the workload class in the query option:

See Workload-Based Query Resource Isolation for configuration details.
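As a rough illustration of passing a query option from the backend, the sketch below builds the JSON body for the broker's POST /query/sql endpoint, which accepts options as a semicolon-separated queryOptions string. The option key used for the workload class (workloadName here) is an assumption and should be taken from the Workload-Based Query Resource Isolation page; the workload value is hypothetical.

```python
import json

# Assumed option key for the workload class; confirm the exact name in the docs.
WORKLOAD_OPTION_KEY = "workloadName"

def broker_request_body(sql: str, options: dict) -> str:
    """Serialize a broker query request with key=value;key=value options."""
    body = {"sql": sql}
    option_string = ";".join(f"{key}={value}" for key, value in options.items())
    if option_string:
        body["queryOptions"] = option_string
    return json.dumps(body)

body = broker_request_body(
    "SELECT COUNT(*) FROM events WHERE tenantId = 'acme'",
    {WORKLOAD_OPTION_KEY: "premium", "timeoutMs": 5000},
)
```

Because the backend builds the request, end users never choose their own workload class; it is derived from the authenticated customer.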

Broker-level query quotas

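Apply per-table query rate limits at the broker to prevent any single tenant from monopolizing query resources. The quota applies to the table as a whole, so it caps an individual tenant only when that tenant has its own table; on a shared table it bounds the combined query rate of every tenant in it.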

See Query Quotas.
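A per-table rate limit lives in the quota section of the table config as maxQueriesPerSecond. The helper below is a minimal sketch that adds it to an existing config dict; the table name and the limit of 100 queries per second are hypothetical.

```python
def with_query_quota(table_config: dict, max_qps: int) -> dict:
    """Return a copy of the table config with a broker query rate limit."""
    updated = dict(table_config)
    quota = dict(table_config.get("quota", {}))
    quota["maxQueriesPerSecond"] = str(max_qps)  # Pinot examples use a string
    updated["quota"] = quota
    return updated

limited = with_query_quota({"tableName": "events_acme"}, 100)
```

Existing quota keys are preserved, so the helper can be applied to configs that already set other limits.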

Data isolation (row-level security)

Pinot does not have built-in row-level security. The standard approach is to enforce tenant filtering in the application layer that sits between the user and the Pinot broker.

Implementation pattern

Your application backend should:

  1. Authenticate the user and determine their tenantId.

  2. Inject AND tenantId = '<tenantId>' into every SQL query before sending it to the Pinot broker.

  3. Never expose the Pinot broker directly to end users.

Example backend pseudocode:
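The steps above can be sketched as a small query builder. This is one possible shape, not a definitive implementation: it assumes tenant IDs are short slugs (letters, digits, underscore, hyphen), and that the select list, table name, extra filter, and suffix are written by backend code and never taken from user input. The function names are hypothetical.

```python
import re

# Assumption: tenant IDs are slugs. Adjust the pattern to match your own IDs.
_TENANT_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")

def tenant_predicate(tenant_id: str) -> str:
    """Return the tenant predicate, refusing IDs that could escape the literal."""
    if not isinstance(tenant_id, str) or not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError("invalid tenant id")
    return f"tenantId = '{tenant_id}'"

def scoped_query(select_list: str, table: str, tenant_id: str,
                 extra_filter: str = "", suffix: str = "") -> str:
    """Build a query whose WHERE clause always starts with the tenant predicate."""
    where = tenant_predicate(tenant_id)
    if extra_filter:
        # Parentheses keep an OR inside extra_filter from widening the scope.
        where += f" AND ({extra_filter})"
    sql = f"SELECT {select_list} FROM {table} WHERE {where}"
    return f"{sql} {suffix}".strip()
```

The tenant ID passed in should come from the authenticated session (step 1), never from a request parameter the user controls.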

Validating isolation

Write integration tests that:

  1. Query with tenant A's filter and verify no tenant B data is returned.

  2. Attempt queries without a tenant filter and verify they are rejected by your backend.

  3. Attempt SQL injection in the tenant ID field and verify it is blocked.
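Two helpers that such tests might lean on are sketched below, under stated assumptions. is_tenant_scoped is a coarse textual check that the expected predicate is present; it catches a forgotten filter but not every bypass (for example a trailing OR clause), so a SQL parser would be more robust. rows_from_other_tenants assumes result rows are dicts that include a tenantId key. All names are hypothetical.

```python
import re

def is_tenant_scoped(sql: str, tenant_id: str) -> bool:
    """Coarse check that the SQL text contains tenantId = '<tenant_id>'."""
    pattern = r"\btenantId\s*=\s*'" + re.escape(tenant_id) + r"'"
    return re.search(pattern, sql) is not None

def enforce_tenant_scope(sql: str, tenant_id: str) -> str:
    """Reject queries that are not scoped to the caller's tenant."""
    if not is_tenant_scoped(sql, tenant_id):
        raise PermissionError("query is not scoped to the caller's tenant")
    return sql

def rows_from_other_tenants(rows: list, tenant_id: str) -> list:
    """Return result rows that do not belong to the given tenant."""
    return [row for row in rows if row.get("tenantId") != tenant_id]
```

An integration test would assert that rows_from_other_tenants(result, "acme") is empty for tenant A's queries and that enforce_tenant_scope raises for any query sent without the filter.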

Query patterns

Per-tenant dashboard aggregation
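A per-tenant dashboard query typically combines the tenant filter with a time range and an aggregation. The sketch below builds one such query; the table events and the columns eventType, amount, and eventTime are hypothetical, and the alphanumeric check on the tenant ID is a simplifying assumption.

```python
def tenant_totals_sql(tenant_id: str, start_ms: int, end_ms: int) -> str:
    """Build an aggregation over one tenant's events in a time window."""
    if not tenant_id.isalnum():
        raise ValueError("invalid tenant id")
    return (
        "SELECT eventType, COUNT(*) AS events, SUM(amount) AS total "
        "FROM events "
        f"WHERE tenantId = '{tenant_id}' "
        f"AND eventTime BETWEEN {int(start_ms)} AND {int(end_ms)} "
        "GROUP BY eventType ORDER BY SUM(amount) DESC LIMIT 20"
    )

sql = tenant_totals_sql("acme", 1700000000000, 1700086400000)
```

Keeping the time bounds mandatory in the function signature makes it harder to ship a dashboard panel that scans a tenant's entire history.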

Cross-tenant admin query (internal analytics)

For your own internal dashboards, query without the tenant filter:

Restrict access to this query pattern to internal admin users only.
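A cross-tenant query of this kind might be gated as in the sketch below. The role name internal_admin, the table events, and the function name are hypothetical; how roles are resolved depends on your own authentication layer.

```python
# Hypothetical set of roles allowed to run unfiltered queries.
ADMIN_ROLES = {"internal_admin"}

def cross_tenant_usage_sql(caller_role: str) -> str:
    """Return an event count per tenant, for internal admin callers only."""
    if caller_role not in ADMIN_ROLES:
        raise PermissionError("cross-tenant queries are limited to internal admins")
    return (
        "SELECT tenantId, COUNT(*) AS events "
        "FROM events "
        "GROUP BY tenantId ORDER BY COUNT(*) DESC LIMIT 100"
    )
```

Routing every unfiltered query through a single gated function like this keeps the exception to the tenant filter explicit and auditable.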

Operational checklist

Before go-live

Monitoring

  • Per-tenant query rate: Track via broker metrics, broken down by the workload query option. Alert if any tenant exceeds its quota.

  • Query latency by tenant: A spike for one tenant without an overall spike indicates that tenant's query pattern changed (e.g., missing time filter).

  • Segment size per tenant: If one tenant dominates the data volume, consider moving them to a dedicated tenant/server pool.

Common pitfalls

  • Pitfall: Tenant filter missing on one API endpoint. Fix: Add a centralized query middleware that rejects any query without tenantId in the WHERE clause.

  • Pitfall: One tenant's heavy queries slow everyone. Fix: Use workload isolation and per-table query quotas. Move the heavy tenant to a dedicated server tenant.

  • Pitfall: Sorted column on tenantId hurts time-range queries. Fix: Use an inverted index on tenantId instead, and keep time as the sorted column if time-range performance is more critical.

  • Pitfall: Tenant data leaks via JOIN queries. Fix: If using multi-stage queries with JOINs, ensure the tenant filter is applied to both sides of the JOIN.
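To illustrate the JOIN case, the sketch below builds a query that filters both joined tables on the same tenant. The tables orders and customers and their columns are hypothetical, and the alphanumeric check on the tenant ID is a simplifying assumption.

```python
def tenant_join_sql(tenant_id: str) -> str:
    """Build a JOIN in which each side carries its own tenant filter."""
    if not tenant_id.isalnum():
        raise ValueError("invalid tenant id")
    return (
        "SELECT o.orderId, c.customerName "
        "FROM orders o JOIN customers c ON o.customerId = c.customerId "
        f"WHERE o.tenantId = '{tenant_id}' AND c.tenantId = '{tenant_id}'"
    )

join_sql = tenant_join_sql("acme")
```

Filtering only orders would still let a customers row from another tenant match on customerId, which is why the predicate appears once per table.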
