# Offline Table Upsert

Pinot supports upsert on `OFFLINE` tables in builds that include [PR #17789](https://github.com/apache/pinot/pull/17789).

Use it for batch corrections, replays, and late-arriving records.

For a full overview of upsert features (comparison columns, delete columns, TTL, metadata management), see the main [Upsert](https://docs.pinot.apache.org/build-with-pinot/ingestion/upsert-dedup/upsert) page. This page covers the OFFLINE-specific configuration and differences.

## How offline upsert works

Pinot keeps one row per primary key.

For duplicate keys, Pinot keeps the row with the greatest comparison value.

If you do not set `comparisonColumns`, Pinot uses the table time column.

Offline upsert replaces full rows.

It does not merge partial rows.

## Configure offline upsert

{% stepper %}
{% step %}

#### Define a primary key

Add `primaryKeyColumns` to the schema.

```json
{
  "schemaName": "orders",
  "primaryKeyColumns": ["order_id"]
}
```

{% endstep %}

{% step %}

#### Enable upsert on the offline table

Set `tableType` to `OFFLINE`.

Set `upsertConfig.mode` to `FULL`.

```json
{
  "tableName": "orders_OFFLINE",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "timeColumnName": "event_time",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "30",
    "replication": "3"
  },
  "upsertConfig": {
    "mode": "FULL",
    "comparisonColumns": ["event_time"]
  }
}
```

{% endstep %}

{% step %}

#### Ingest or replace segments

Generate and upload offline segments as usual.

Pinot applies upsert semantics when it loads those segments.

Use append-style uploads for incremental corrections.

Use refresh-style uploads when replacing an existing batch.
{% endstep %}
{% endstepper %}

## When to use it

Use offline upsert when updates arrive in files.

Use it for daily corrections.

Use it for backfills.

Use it for replaying snapshots into offline segments.

## Differences from real-time upsert

Offline upsert does not consume a stream.

It does not require low-level consumers.

It does not depend on stream partitioning.

It fits batch ingestion and segment replacement workflows.

For stream-based updates, use [Stream ingestion with Upsert](https://docs.pinot.apache.org/build-with-pinot/ingestion/upsert-dedup/upsert).

## Operational notes

Changing the primary key needs a full rebuild.

Changing comparison columns also needs a full rebuild.

Reload alone is not enough for these changes.

If you use a hybrid table, avoid overlapping offline and realtime time ranges.

## Related topics

* [Batch Ingestion](https://docs.pinot.apache.org/build-with-pinot/ingestion/batch-ingestion/batch-ingestion)
* [Backfill Data](https://docs.pinot.apache.org/build-with-pinot/ingestion/batch-ingestion/backfill-data)
* [Create and update a table configuration](https://docs.pinot.apache.org/operate-pinot/deployment/setup-table)
* [Stream ingestion with Upsert](https://docs.pinot.apache.org/build-with-pinot/ingestion/upsert-dedup/upsert)
