Segment Management
Assign, distribute, maintain, compact, and repair segments across your Pinot cluster.
Purpose
Segments are the fundamental storage and query unit in Apache Pinot. Every table is divided into segments, and how those segments are assigned to servers, maintained over time, and compacted directly affects query performance, storage cost, and operational resilience. This section covers the full segment lifecycle -- from initial assignment through ongoing maintenance tasks.
Segment assignment and placement
Decide how segments land on servers and how servers are selected for a table.
Balanced, replica-group, and partitioned replica-group assignment strategies
Tag-based isolation, replica-group instance partitioning, pool-based assignment, and mirroring across tables
Segment lifecycle and repair
Understand the operations available when segments need to be reset, reloaded, refreshed, or repaired.
Decision guide for choosing between reset, reload, refresh, rebalance, force commit, purge, and rollback
Step-by-step instructions for reloading segments via the Controller API or Admin Console
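As a rough illustration of the Controller API route, the sketch below builds (but does not send) a reload request for every segment of a table. The controller address and table name are placeholders, and the path and query parameters (`type`, `forceDownload`) reflect the Controller API as commonly documented; check them against the Swagger UI of your own cluster before relying on them.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

CONTROLLER = "http://localhost:9000"  # assumption: local controller address


def build_reload_request(table, table_type="OFFLINE", force_download=False):
    """Build, without sending, a POST that asks servers to reload a table's segments."""
    query = urlencode({
        "type": table_type,
        # Booleans are sent as lowercase strings in the query string.
        "forceDownload": str(force_download).lower(),
    })
    url = f"{CONTROLLER}/segments/{quote(table, safe='')}/reload?{query}"
    return Request(url, method="POST")


reload_req = build_reload_request("myTable")  # "myTable" is a hypothetical table
print(reload_req.get_method(), reload_req.full_url)
```

Passing the request to `urllib.request.urlopen` would submit it; the sketch stops short of that so it can be inspected safely first.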
Rebalance
Redistribute segments after capacity changes, config updates, or tenant modifications.
When and why to rebalance -- servers, brokers, and tenants
Server rebalance API, parameters, and operational guidance
Worked examples for common rebalance situations
Broker rebalance after adding or removing broker instances
Rebalance all tables belonging to a tenant after tagging changes
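A server rebalance is usually triggered through the Controller API. The sketch below only constructs the request, defaulting to a dry run so the proposed assignment can be reviewed before anything moves. The controller address and table name are placeholders, and the option names shown (`reassignInstances`, `minAvailableReplicas`) should be verified against the rebalance API reference for your Pinot version.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_rebalance_request(controller, table, table_type="OFFLINE",
                            dry_run=True, **options):
    """Build, without sending, a POST to the table rebalance endpoint."""
    params = {"type": table_type, "dryRun": str(dry_run).lower()}
    for key, value in options.items():
        # Render booleans as lowercase strings, everything else via str().
        params[key] = str(value).lower() if isinstance(value, bool) else str(value)
    return Request(f"{controller}/tables/{table}/rebalance?{urlencode(params)}",
                   method="POST")


rebalance_req = build_rebalance_request(
    "http://localhost:9000",  # assumption: local controller address
    "myTable",                # hypothetical table name
    reassignInstances=True,
    minAvailableReplicas=1,
)
print(rebalance_req.get_method(), rebalance_req.full_url)
```

Defaulting `dry_run` to true is a deliberate choice in this sketch: a real rebalance moves data, so the caller has to opt in explicitly with `dry_run=False`.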
Tiered storage
Move older or less-queried data to cheaper storage tiers while keeping recent data on fast disks.
Overview of tiered storage strategies
Use tag overrides to move completed segments to a different server tier
Configure multiple data directories on a single server to span storage devices
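To make the idea concrete, here is a minimal sketch that adds a time-based tier to a table config represented as a Python dictionary. The tier name, age threshold, and server tag are invented for illustration; the field names follow the `tierConfigs` layout described in the tiered storage documentation and should be checked against your Pinot version.

```python
import json


def with_tier(table_config, name, segment_age, server_tag):
    """Return a copy of table_config with one extra time-based tier appended."""
    updated = dict(table_config)
    tiers = list(updated.get("tierConfigs", []))
    tiers.append({
        "name": name,
        "segmentSelectorType": "time",  # select segments by age
        "segmentAge": segment_age,      # segments older than this move to the tier
        "storageType": "pinot_server",  # the tier is backed by tagged Pinot servers
        "serverTag": server_tag,        # tag carried by the servers in the tier
    })
    updated["tierConfigs"] = tiers
    return updated


base_config = {"tableName": "myTable", "tableType": "OFFLINE"}  # hypothetical table
tiered_config = with_tier(base_config, "coldTier", "30d", "cold_OFFLINE")
print(json.dumps(tiered_config, indent=2))
```

Segments typically relocate to the new tier only after the updated config is applied and the table is rebalanced or the periodic relocation job runs.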
Minion tasks for segment maintenance
Automate compaction, merging, purging, and ingestion using Pinot Minion.
Automatically move data from real-time tables to offline tables (RealtimeToOfflineSegmentsTask)
Merge small segments into larger time-aligned segments with optional rollup aggregation
Batch ingestion via Minion -- read files, build segments, push to the cluster
Automatically rebuild segments when the table config or schema changes
Remove or modify records for compliance or data-quality reasons
Reclaim space by removing invalidated records from upsert-enabled tables
Merge small segments while compacting -- reduces segment count in upsert tables
Alternative merge-compact task for upsert tables
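Minion tasks are generally enabled per table through the `task` section of the table config and can also be scheduled on demand through the controller. The sketch below shows one plausible MergeRollupTask configuration plus an unsent scheduling request. The merge level name `1day`, the periods, the controller address, and the table name are all illustrative assumptions; confirm the property keys against the task's own documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Fragment to merge into a table config; values here are illustrative only.
merge_task_config = {
    "task": {
        "taskTypeConfigsMap": {
            "MergeRollupTask": {
                "1day.mergeType": "concat",      # merge without rollup aggregation
                "1day.bucketTimePeriod": "1d",   # align merged segments to 1-day buckets
                "1day.bufferTimePeriod": "1d",   # leave the most recent day untouched
            }
        }
    }
}


def build_schedule_request(controller, task_type, table_with_type):
    """Build, without sending, a POST that asks the controller to schedule a task."""
    query = urlencode({"taskType": task_type, "tableName": table_with_type})
    return Request(f"{controller}/tasks/schedule?{query}", method="POST")


schedule_req = build_schedule_request(
    "http://localhost:9000",  # assumption: local controller address
    "MergeRollupTask",
    "myTable_OFFLINE",        # hypothetical table name with type suffix
)
print(json.dumps(merge_task_config, indent=2))
print(schedule_req.get_method(), schedule_req.full_url)
```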
Consistent push and rollback
Guarantee atomicity when replacing offline segments and quickly revert a bad data push.
Segment lineage protocol for atomic push and one-click rollback of offline table refreshes
When to use what
Newly added servers have no segments: run a rebalance.
Segment stuck in ERROR state: reset the segment, then reload if data is corrupt.
Schema or index config changed: reload all segments, or schedule a RefreshSegmentTask for a full rebuild.
Too many small segments: schedule a MergeRollupTask or UpsertCompactMergeTask.
Stale records in upsert table wasting space: schedule an UpsertCompactionTask.
Need to delete specific records (GDPR): schedule a PurgeTask.
Bad offline push needs rollback: revert the push through the segment lineage protocol (see Consistent push and rollback).
Recent data needs fast disks, old data can be on HDDs: configure tiered storage.

