# Spark Pinot Connector Write Model

{% hint style="warning" %}
Caution: This feature is experimental and the API may change in future releases.
{% endhint %}

The Spark connector also has experimental support for writing Pinot segments from Spark DataFrames. The current implementation writes OFFLINE segment tar files to the destination path supplied via `.save(path)` (or the equivalent `path` option). The DataFrame schema should match the schema of the Pinot table. The example below uses `append` mode.

```scala
// Create sample data (assumes an active SparkSession named `spark`, as in spark-shell)
val data = Seq(
  ("ORD", "Florida", 1000, true, 1722025994),
  ("ORD", "Florida", 1000, false, 1722025994),
  ("ORD", "Florida", 1000, false, 1722025994),
  ("NYC", "New York", 20, true, 1722025994),
)

val airports = spark.createDataFrame(data)
  .toDF("airport", "state", "distance", "active", "ts")
  .repartition(2)

airports.write.format("pinot")
  .mode("append")
  .option("table", "airlineStats")
  .option("segmentNameFormat", "{table}_{startTime}_{endTime}_{partitionId:03}")
  .option("invertedIndexColumns", "airport")
  .option("noDictionaryColumns", "airport,state")
  .option("bloomFilterColumns", "airport")
  .option("rangeIndexColumns", "distance")
  .option("timeColumnName", "ts")
  .option("timeFormat", "EPOCH|SECONDS")
  .option("timeGranularity", "1:SECONDS")
  .save("myPath")
```

`.save("myPath")` provides the required `path` option automatically. The writer reads the options below and uses them to build Pinot segment metadata and indexes before pushing `tar.gz` segment files to the target filesystem.

### Connector Write Parameters

| Configuration          | Description                                                                                                                                                                                                                                                       | Required               | Default Value                  |
| ---------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------- | ------------------------------ |
| `table`                | Pinot table name used for generated segment metadata and schema translation.                                                                                                                                                                                      | Yes                    | -                              |
| `path`                 | Destination directory for generated segment tar files. Calling `.save("...")` sets this option automatically. The current implementation pushes to local filesystems and HDFS.                                                                                    | Yes                    | -                              |
| `segmentNameFormat`    | Segment name template. Supports `{table}`, `{partitionId}`, `{startTime}`, and `{endTime}` placeholders, plus zero-padding such as `{partitionId:03}`. `startTime` and `endTime` are populated from the minimum and maximum values of a numeric `timeColumnName`. | No                     | `<tableName>-{partitionId:03}` |
| `invertedIndexColumns` | Comma-separated list of columns that should use Pinot inverted indexes in generated segments.                                                                                                                                                                     | No                     | Empty                          |
| `noDictionaryColumns`  | Comma-separated list of columns that should disable dictionary encoding in generated segments.                                                                                                                                                                    | No                     | Empty                          |
| `bloomFilterColumns`   | Comma-separated list of columns that should use Pinot bloom filters in generated segments.                                                                                                                                                                        | No                     | Empty                          |
| `rangeIndexColumns`    | Comma-separated list of columns that should use Pinot range indexes in generated segments.                                                                                                                                                                        | No                     | Empty                          |
| `timeColumnName`       | Pinot time column name to emit in the generated schema and segment metadata. When set, `timeFormat` and `timeGranularity` must also be set.                                                                                                                       | No                     | None                           |
| `timeFormat`           | Pinot date-time format for `timeColumnName`, for example `EPOCH\|SECONDS`. | Conditionally required | None                           |
| `timeGranularity`      | Pinot granularity for `timeColumnName`, for example `1:SECONDS`.                                                                                                                                                                                                  | Conditionally required | None                           |
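
To make the `segmentNameFormat` placeholders concrete, the following is a minimal sketch of how such a template could expand. It is not the connector's implementation: `SegmentNameSketch` and `render` are hypothetical names, and the sketch assumes that `{name:0N}` means "left-pad the value with zeros to width N", as the zero-padding description in the table suggests.

```scala
import scala.util.matching.Regex

// Hypothetical helper for illustration only; not part of the Pinot connector.
object SegmentNameSketch {
  // Matches {name} or {name:0N}, for example {table} or {partitionId:03}.
  private val Placeholder: Regex = """\{(\w+)(?::0(\d+))?\}""".r

  def render(format: String, values: Map[String, String]): String =
    Placeholder.replaceAllIn(format, m => {
      val replaced = values.get(m.group(1)) match {
        // Unknown placeholders are left untouched (an assumption of this sketch).
        case None => m.matched
        case Some(raw) =>
          Option(m.group(2)) match {
            // Left-pad with zeros up to the requested width.
            case Some(width) => ("0" * (width.toInt - raw.length)) + raw
            case None        => raw
          }
      }
      Regex.quoteReplacement(replaced)
    })
}
```

With the template from the example above and illustrative values (`table = airlineStats`, `startTime = endTime = 1722025994`, `partitionId = 1`), `render` returns `airlineStats_1722025994_1722025994_001`.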

### Validation And Behavior Notes

* `table` and `path` are required. The writer rejects requests that omit either option.
* `segmentNameFormat` cannot be an empty string.
* If `timeColumnName` is set, both `timeFormat` and `timeGranularity` must also be set.
* `tableType` is not part of the current write option contract. The writer builds OFFLINE segments regardless of any `tableType` option passed to Spark.
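
The rules above can be mirrored in a small pre-flight check that runs before a write is submitted. This is a sketch under stated assumptions, not the connector's own validation code: `WriteOptionsCheck` and `validate` are hypothetical names, and the input map is assumed to already contain `path` (which `.save("...")` would otherwise supply).

```scala
// Hypothetical pre-flight check for illustration only; the connector performs
// its own validation, and its error messages may differ.
object WriteOptionsCheck {
  def validate(options: Map[String, String]): List[String] = {
    // `table` and `path` are required.
    val missingRequired = List("table", "path")
      .filterNot(options.contains)
      .map(key => s"missing required option '$key'")

    // `segmentNameFormat` may be omitted, but must not be an empty string.
    val emptyFormat =
      if (options.get("segmentNameFormat").exists(_.isEmpty))
        List("'segmentNameFormat' cannot be empty")
      else Nil

    // `timeFormat` and `timeGranularity` are required once `timeColumnName` is set.
    val timeErrors =
      if (options.contains("timeColumnName"))
        List("timeFormat", "timeGranularity")
          .filterNot(options.contains)
          .map(key => s"'$key' is required when 'timeColumnName' is set")
      else Nil

    missingRequired ++ emptyFormat ++ timeErrors
  }
}
```

An empty result means the options satisfy the documented rules. For example, passing `table`, `path`, and `timeColumnName` without `timeFormat` or `timeGranularity` yields two messages, one for each missing time option.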

For more details, refer to the implementation at `org.apache.pinot.connector.spark.v3.datasource.PinotDataWriter` and `org.apache.pinot.connector.spark.common.PinotDataSourceWriteOptions`.

