# Offline Table Upsert

Pinot supports upsert on `OFFLINE` tables in builds that include [PR #17789](https://github.com/apache/pinot/pull/17789).

Use it for batch corrections, replays, and late-arriving records.

For a full overview of upsert features (comparison columns, delete columns, TTL, metadata management), see the main [Upsert](/build-with-pinot/ingestion/upsert-dedup/upsert.md) page. This page covers the OFFLINE-specific configuration and differences.

## How offline upsert works

Pinot keeps one row per primary key.

For duplicate keys, Pinot keeps the row with the greatest comparison value.

If you do not set `comparisonColumns`, Pinot uses the table time column.

Offline upsert replaces full rows.

It does not merge partial rows.
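
The rules above can be modeled in a few lines of Python. This is a minimal sketch of the semantics only, not Pinot's implementation. The function name `resolve_upserts` and the sample rows are hypothetical. This page does not say how ties between equal comparison values are resolved, so the tie rule used here (the later row wins) is an assumption.

```python
from typing import Any

Row = dict[str, Any]


def resolve_upserts(rows: list[Row], primary_key: str, comparison_column: str) -> dict[Any, Row]:
    """Keep one full row per primary key: the one with the greatest comparison value."""
    latest: dict[Any, Row] = {}
    for row in rows:
        key = row[primary_key]
        current = latest.get(key)
        # Assumption: on equal comparison values, the later row wins.
        if current is None or row[comparison_column] >= current[comparison_column]:
            # Full-row replacement: the old row is dropped, not merged.
            latest[key] = row
    return latest


# Hypothetical sample data, using event_time as the comparison column.
rows = [
    {"order_id": 1, "status": "NEW", "amount": 10.0, "event_time": 100},
    {"order_id": 1, "status": "SHIPPED", "amount": None, "event_time": 200},
    {"order_id": 2, "status": "NEW", "amount": 5.0, "event_time": 150},
    {"order_id": 1, "status": "LATE", "amount": 7.0, "event_time": 50},
]
result = resolve_upserts(rows, "order_id", "event_time")
```

In this sketch, `order_id` 1 resolves to the `SHIPPED` row. Its `amount` stays null because the winning row replaces the older one entirely, and the late-arriving row with `event_time` 50 is ignored.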

## Configure offline upsert

{% stepper %}
{% step %}

#### Define a primary key

Add `primaryKeyColumns` to the schema.

```json
{
  "schemaName": "orders",
  "primaryKeyColumns": ["order_id"]
}
```

{% endstep %}

{% step %}

#### Enable upsert on the offline table

Set `tableType` to `OFFLINE`.

Set `upsertConfig.mode` to `FULL`.

```json
{
  "tableName": "orders_OFFLINE",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "timeColumnName": "event_time",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "30",
    "replication": "3"
  },
  "upsertConfig": {
    "mode": "FULL",
    "comparisonColumns": ["event_time"]
  }
}
```
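
Before uploading these files, it can help to check locally that the schema and table config agree with the settings described in this step. The helper below is a hypothetical sketch, not a Pinot API, and it is not a substitute for whatever validation Pinot itself performs. It only checks the fields mentioned on this page.

```python
def check_offline_upsert(schema: dict, table_config: dict) -> list[str]:
    """Return a list of problems; an empty list means the basic settings line up."""
    problems = []
    if not schema.get("primaryKeyColumns"):
        problems.append("schema has no primaryKeyColumns")
    if table_config.get("tableType") != "OFFLINE":
        problems.append("tableType is not OFFLINE")
    upsert = table_config.get("upsertConfig") or {}
    if upsert.get("mode") != "FULL":
        problems.append("upsertConfig.mode is not FULL")
    # Without comparisonColumns, the table time column is used instead.
    time_column = (table_config.get("segmentsConfig") or {}).get("timeColumnName")
    if not upsert.get("comparisonColumns") and not time_column:
        problems.append("no comparisonColumns and no timeColumnName to fall back on")
    return problems


# The example schema and table config from this page, as Python dicts.
schema = {"schemaName": "orders", "primaryKeyColumns": ["order_id"]}
table_config = {
    "tableName": "orders_OFFLINE",
    "tableType": "OFFLINE",
    "segmentsConfig": {"timeColumnName": "event_time"},
    "upsertConfig": {"mode": "FULL", "comparisonColumns": ["event_time"]},
}
problems = check_offline_upsert(schema, table_config)
```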

{% endstep %}

{% step %}

#### Ingest or replace segments

Generate and upload offline segments as usual.

Pinot applies upsert semantics when it loads those segments.

Use append-style uploads for incremental corrections.

Use refresh-style uploads when replacing an existing batch.

{% endstep %}
{% endstepper %}
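
The difference between append-style and refresh-style uploads in the last step can be sketched with a toy model. This is an illustration under stated assumptions, not Pinot code: the table is modeled as a dict keyed by segment name, an upload under an existing name is assumed to replace that segment, and the segment names and the helpers `upload` and `visible_rows` are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def upload(segments: dict[str, list[Row]], name: str, rows: list[Row]) -> None:
    """Model an upload: a new name adds a segment, an existing name replaces it."""
    segments[name] = rows


def visible_rows(segments: dict[str, list[Row]], primary_key: str, comparison_column: str) -> dict[Any, Row]:
    """Keep the row with the greatest comparison value across all loaded segments."""
    latest: dict[Any, Row] = {}
    for rows in segments.values():
        for row in rows:
            key = row[primary_key]
            if key not in latest or row[comparison_column] >= latest[key][comparison_column]:
                latest[key] = row
    return latest


segments: dict[str, list[Row]] = {}

# Initial batch.
upload(segments, "orders_batch_1", [
    {"order_id": 1, "status": "NEW", "event_time": 100},
    {"order_id": 2, "status": "NEW", "event_time": 100},
])

# Append-style correction: a new segment name, so the original batch stays.
upload(segments, "orders_batch_1_fix", [
    {"order_id": 1, "status": "CANCELLED", "event_time": 200},
])
after_append = visible_rows(segments, "order_id", "event_time")

# Refresh-style replacement: same segment name, so the old batch is swapped out.
upload(segments, "orders_batch_1", [
    {"order_id": 1, "status": "NEW", "event_time": 100},
])
after_refresh = visible_rows(segments, "order_id", "event_time")
```

In this model, the appended correction overrides `order_id` 1 while `order_id` 2 stays visible. The refresh then removes `order_id` 2, because the rebuilt batch no longer contains it.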

## When to use it

Use offline upsert when:

* updates arrive in files
* you apply daily corrections
* you run backfills
* you replay snapshots into offline segments

## Differences from real-time upsert

Offline upsert does not consume a stream. It does not require low-level consumers, and it does not depend on stream partitioning.

It fits batch ingestion and segment replacement workflows.

For stream-based updates, use [Stream ingestion with Upsert](/build-with-pinot/ingestion/upsert-dedup/upsert.md).

## Operational notes

Changing the primary key or the comparison columns requires a full rebuild.

A reload alone is not enough for these changes.

If you use a hybrid table, avoid overlapping offline and realtime time ranges.
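
A simple interval check can flag overlapping ranges before you upload a batch. This is a minimal sketch: the function `ranges_overlap` is hypothetical, the ranges are assumed to be inclusive `(start, end)` pairs in the same time unit (for example epoch milliseconds), and how you obtain the ranges for each table is outside the scope of this page.

```python
def ranges_overlap(offline: tuple[int, int], realtime: tuple[int, int]) -> bool:
    """True if two inclusive (start, end) time ranges share at least one instant."""
    offline_start, offline_end = offline
    realtime_start, realtime_end = realtime
    return offline_start <= realtime_end and realtime_start <= offline_end


# Hypothetical ranges: the offline batch ends right before realtime data begins.
disjoint = ranges_overlap((0, 100), (101, 200))
# Hypothetical ranges: both sides contain the instant 100.
touching = ranges_overlap((0, 100), (100, 200))
```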

## Related topics

* [Batch Ingestion](/build-with-pinot/ingestion/batch-ingestion/batch-ingestion.md)
* [Backfill Data](/build-with-pinot/ingestion/batch-ingestion/backfill-data.md)
* [Create and update a table configuration](/operate-pinot/deployment/setup-table.md)
* [Stream ingestion with Upsert](/build-with-pinot/ingestion/upsert-dedup/upsert.md)

