
Segment Management

Assign, distribute, maintain, compact, and repair segments across your Pinot cluster.

Purpose

Segments are the fundamental storage and query unit in Apache Pinot. Every table is divided into segments, and the way those segments are assigned to servers, maintained over time, and compacted directly affects query performance, storage cost, and operational resilience. This section covers the full segment lifecycle, from initial assignment through ongoing maintenance tasks.

Segment assignment and placement

Decide how segments land on servers and how servers are selected for a table.

Page
What it covers

Balanced, replica-group, and partitioned replica-group assignment strategies

Tag-based isolation, replica-group instance partitioning, pool-based assignment, and mirroring across tables
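
As a rough illustration of replica-group assignment, the sketch below builds the table-config fragments that are usually involved. The field names (`instanceAssignmentConfigMap`, `tagPoolConfig`, `replicaGroupPartitionConfig`, `instanceSelectorType`) are written from memory of the Pinot table config and should be checked against the assignment pages for your version. The helper name, the tag, and the counts are hypothetical.

```python
import json


def replica_group_assignment(server_tag, num_replica_groups, instances_per_group):
    """Build table-config fragments for replica-group assignment (illustrative helper)."""
    if num_replica_groups < 1 or instances_per_group < 1:
        raise ValueError("replica-group counts must be positive")
    return {
        # Which servers are eligible, and how they are grouped.
        "instanceAssignmentConfigMap": {
            "OFFLINE": {
                "tagPoolConfig": {"tag": server_tag},
                "replicaGroupPartitionConfig": {
                    "replicaGroupBased": True,
                    "numReplicaGroups": num_replica_groups,
                    "numInstancesPerReplicaGroup": instances_per_group,
                },
            }
        },
        # Ask brokers to route each query within one replica group.
        "routing": {"instanceSelectorType": "replicaGroup"},
    }


# Hypothetical values: 2 replica groups of 3 servers each.
fragment = replica_group_assignment("DefaultTenant_OFFLINE", 2, 3)
print(json.dumps(fragment, indent=2))
```

In this sketch the number of replica groups would normally match the table's replication factor, since each group holds one full copy of the data.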

Segment lifecycle and repair

Understand the operations available when segments need to be reset, reloaded, refreshed, or repaired.

Page
What it covers

Decision guide for choosing between reset, reload, refresh, rebalance, force commit, purge, and rollback

Step-by-step instructions for reloading segments via the Controller API or Admin Console
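
For orientation, a reload through the Controller API amounts to a single POST request. The sketch below only builds the request URL and does not send anything. It assumes the `/segments/{tableName}/reload` endpoint with `type` and `forceDownload` query parameters; the controller address and table name are placeholders, and the helper name is hypothetical. Confirm the exact parameters in your controller's Swagger UI before relying on them.

```python
from urllib.parse import quote, urlencode


def reload_all_segments_url(controller, table, table_type="OFFLINE", force_download=False):
    """Return the URL for a table-wide reload request (to be sent as a POST)."""
    if table_type not in ("OFFLINE", "REALTIME"):
        raise ValueError("table_type must be OFFLINE or REALTIME")
    params = {
        "type": table_type,
        # Assumed flag: re-download segments from deep store instead of reusing local copies.
        "forceDownload": str(force_download).lower(),
    }
    path = f"/segments/{quote(table, safe='')}/reload"
    return f"{controller.rstrip('/')}{path}?{urlencode(params)}"


# Placeholder controller address and table name.
url = reload_all_segments_url("http://localhost:9000", "myTable")
print("POST", url)
```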

Rebalance

Redistribute segments after capacity changes, config updates, or tenant modifications.

Page
What it covers

When and why to rebalance -- servers, brokers, and tenants

Server rebalance API, parameters, and operational guidance

Worked examples for common rebalance situations

Broker rebalance after adding or removing broker instances

Rebalance all tables belonging to a tenant after tagging changes
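
A common pattern is to issue a dry-run rebalance first, inspect the proposed assignment, and only then run the real one. The sketch below builds both request URLs without sending them. It assumes the `/tables/{tableName}/rebalance` endpoint and the `type`, `dryRun`, and `minAvailableReplicas` query parameters; the helper name, controller address, and table name are hypothetical, and the full parameter list is on the server rebalance page.

```python
from urllib.parse import quote, urlencode


def rebalance_url(controller, table, table_type, dry_run=True, min_available_replicas=1):
    """Return the URL for a table rebalance request (to be sent as a POST)."""
    if table_type not in ("OFFLINE", "REALTIME"):
        raise ValueError("table_type must be OFFLINE or REALTIME")
    params = {
        "type": table_type,
        # With dryRun=true the controller is expected to report the target assignment only.
        "dryRun": str(dry_run).lower(),
        # Replicas that should stay online for each segment while segments move.
        "minAvailableReplicas": min_available_replicas,
    }
    path = f"/tables/{quote(table, safe='')}/rebalance"
    return f"{controller.rstrip('/')}{path}?{urlencode(params)}"


# Step 1: preview the change. Step 2: apply it once the preview looks right.
preview = rebalance_url("http://localhost:9000", "myTable", "OFFLINE", dry_run=True)
apply = rebalance_url("http://localhost:9000", "myTable", "OFFLINE", dry_run=False)
print("POST", preview)
print("POST", apply)
```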

Tiered storage

Move older or less-queried data to cheaper storage tiers while keeping recent data on fast disks.

Page
What it covers

Overview of tiered storage strategies

Use tag overrides to move completed segments to a different server tier

Configure multiple data directories on a single server to span storage devices
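
To make the idea concrete, here is a minimal sketch of a time-based tier definition expressed as a table-config fragment. The `tierConfigs` field names are as I recall them from the Pinot table config and should be verified against the tiered storage pages. The tier name, age threshold, and server tag are hypothetical.

```python
import json


def time_based_tier(name, segment_age, server_tag):
    """Describe a tier that receives segments older than `segment_age` (illustrative helper)."""
    return {
        "name": name,
        "segmentSelectorType": "time",  # select segments by age
        "segmentAge": segment_age,      # e.g. "30d"
        "storageType": "pinot_server",  # the tier is a set of tagged Pinot servers
        "serverTag": server_tag,        # servers carrying this tag host the tier
    }


# Hypothetical tier: segments older than 30 days move to servers tagged cold_OFFLINE.
tier_fragment = {"tierConfigs": [time_based_tier("coldTier", "30d", "cold_OFFLINE")]}
print(json.dumps(tier_fragment, indent=2))
```

Segments typically move to the new tier only when the table is next rebalanced or relocated, so a config change alone does not shift data.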

Minion tasks for segment maintenance

Automate compaction, merging, purging, and ingestion using Pinot Minion.

Page
What it covers

Automatically move data from real-time tables to offline tables (RealtimeToOfflineSegmentsTask)

Merge small segments into larger time-aligned segments with optional rollup aggregation

Batch ingestion via Minion -- read files, build segments, push to the cluster

Automatically rebuild segments when the table config or schema changes

Remove or modify records for compliance or data-quality reasons

Reclaim space by removing invalidated records from upsert-enabled tables

Merge small segments while compacting -- reduces segment count in upsert tables

Alternative merge-compact task for upsert tables
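
Minion tasks are enabled per table through the `task` section of the table config. As a hedged example, the sketch below builds a merge task entry. It assumes the `MergeRollupTask` task type and its level-prefixed keys (`mergeType`, `bucketTimePeriod`, `bufferTimePeriod`); the helper name and the `1day` level with its periods are illustrative, so check the task's own page for the supported options.

```python
import json

ALLOWED_MERGE_TYPES = ("concat", "rollup", "dedup")


def merge_rollup_task_config(level="1day", bucket="1d", buffer="1d", merge_type="concat"):
    """Build the `task` fragment that schedules a merge for one merge level."""
    if merge_type not in ALLOWED_MERGE_TYPES:
        raise ValueError(f"merge_type must be one of {ALLOWED_MERGE_TYPES}")
    return {
        "task": {
            "taskTypeConfigsMap": {
                "MergeRollupTask": {
                    # Task config values are plain strings, keyed by merge level.
                    f"{level}.mergeType": merge_type,
                    f"{level}.bucketTimePeriod": bucket,  # time window covered by each merged segment
                    f"{level}.bufferTimePeriod": buffer,  # leave the most recent data untouched
                }
            }
        }
    }


task_fragment = merge_rollup_task_config()
print(json.dumps(task_fragment, indent=2))
```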

Consistent push and rollback

Guarantee atomicity when replacing offline segments and quickly revert a bad data push.

Page
What it covers

Segment lineage protocol for atomic push and one-click rollback of offline table refreshes

When to use what

Goal
Recommended action

Newly added servers have no segments

Run a rebalance

Segment stuck in ERROR state

Reset the segment, then reload if data is corrupt

Schema or index config changed

Reload all segments, or schedule a RefreshSegmentTask for full rebuild

Too many small segments

Schedule a Minion task that merges small segments into larger ones

Stale records in upsert table wasting space

Schedule a Minion task that removes invalidated records from the upsert table

Need to delete specific records (GDPR)

Schedule a PurgeTask

Bad offline push needs rollback

Roll back the push using the segment lineage protocol

Recent data needs fast disks; older data can live on HDDs

Configure tiered storage
