Batch Ingestion

Choose batch ingestion when Pinot should load prebuilt data from files, warehouses, or distributed processing jobs.

Batch ingestion builds Pinot segments outside the cluster and pushes them into Pinot after the data is already shaped. Use it when the data changes in larger chunks, when you need deterministic backfills, or when the pipeline already produces files or segment artifacts.

The most important design choice is not the framework, but the output contract: what the schema looks like, what the table expects, and where the segments land.
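That output contract starts with the Pinot schema the offline table expects. As a hedged illustration (the table name and field names here are hypothetical, not from this page), a minimal schema for an offline `events` table might look like:

```json
{
  "schemaName": "events",
  "dimensionFieldSpecs": [
    { "name": "userId", "dataType": "STRING" },
    { "name": "country", "dataType": "STRING" }
  ],
  "metricFieldSpecs": [
    { "name": "clicks", "dataType": "LONG" }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "ts",
      "dataType": "LONG",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
```

The batch job's output files must line up with these names and types before segment generation runs; mismatches here surface later as failed or empty segments.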

Common batch paths

Spark-based ingestion.

Hadoop-style distributed ingestion.

Backfill jobs for historical ranges.

Dimension tables and other specialized offline loads.

What to decide early

Decide on the input file format, the deep-storage target, and the segment push workflow before you optimize the job itself. Most batch ingestion problems come from mismatched assumptions at those boundaries, not from the processing framework in between.
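Those three decisions all surface in the ingestion job spec. The sketch below is a minimal standalone job spec under assumed inputs: the bucket paths, table name, and controller URI are placeholders, and the Parquet format is only an example choice.

```yaml
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
jobType: SegmentCreationAndTarPush        # segment push workflow
inputDirURI: 's3://my-bucket/events/raw/' # placeholder input location
includeFileNamePattern: 'glob:**/*.parquet'
outputDirURI: 's3://my-bucket/pinot-segments/events/'  # deep-storage target
overwriteOutput: true
recordReaderSpec:
  dataFormat: 'parquet'                   # input file format
  className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
tableSpec:
  tableName: 'events'                     # hypothetical table name
pinotClusterSpecs:
  - controllerURI: 'http://pinot-controller:9000'
```

A spec like this is typically launched with `pinot-admin.sh LaunchDataIngestionJob -jobSpecFile <path>`; switching `executionFrameworkSpec` to the Spark or Hadoop runner classes distributes segment generation without changing the boundary decisions above.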

Learn more

The original step-by-step batch docs live in Import Data and Data Ingestion Overview.

What this page covered

This page covered when to choose batch ingestion and the main design decisions that shape it.

Next step

Read Stream Ingestion if the source system is a live event stream.
