Spark Pinot Connector Write Model

The Spark connector also has experimental support for writing Pinot segments from Spark DataFrames. Currently, only append mode is supported, and the schema of the DataFrame must match the schema of the Pinot table.

// create sample data (assumes an active SparkSession named `spark`, as in spark-shell)
val data = Seq(
  ("ORD", "Florida", 1000, true, 1722025994),
  ("ORD", "Florida", 1000, false, 1722025994),
  ("ORD", "Florida", 1000, false, 1722025994),
  ("NYC", "New York", 20, true, 1722025994),
)

val airports = spark.createDataFrame(data)
  .toDF("airport", "state", "distance", "active", "ts")
  .repartition(2)

// write the DataFrame as Pinot segments (append is the only supported mode)
airports.write.format("pinot")
  .mode("append")
  .option("table", "airlineStats")
  .option("tableType", "OFFLINE")
  .option("segmentNameFormat", "{table}_{partitionId:03}")
  .option("invertedIndexColumns", "airport")
  .option("noDictionaryColumns", "airport,state")
  .option("bloomFilterColumns", "airport")
  .option("timeColumnName", "ts")
  .save("myPath")
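The segmentNameFormat option above uses placeholders such as {table} and {partitionId:03}. The sketch below illustrates how such a template could expand into a segment name. It is not the connector's code: SegmentNameSketch and render are hypothetical names, and the reading of ":03" as "zero-pad to three characters" is an assumption; the actual behavior is defined in PinotDataWriter.

```scala
import scala.util.matching.Regex

object SegmentNameSketch {
  // Matches {name} or {name:NN}, e.g. {table} or {partitionId:03}.
  private val Placeholder: Regex = """\{(\w+)(?::(\d+))?\}""".r

  // Hypothetical helper, not part of the connector's API.
  // Assumption: a ":NN" suffix left-pads the value with zeros to NN characters.
  def render(format: String, values: Map[String, String]): String =
    Placeholder.replaceAllIn(format, m => {
      // Placeholders with no supplied value are left untouched.
      val raw = values.getOrElse(m.group(1), m.matched)
      val padded = Option(m.group(2)) match {
        case Some(width) => ("0" * (width.toInt - raw.length).max(0)) + raw
        case None        => raw
      }
      Regex.quoteReplacement(padded)
    })
}

// Example: one name per Spark partition of the DataFrame.
val names = (0 until 2).map { id =>
  SegmentNameSketch.render(
    "{table}_{partitionId:03}",
    Map("table" -> "airlineStats", "partitionId" -> id.toString))
}
```

Under these assumptions, the two partitions created by repartition(2) would map to airlineStats_000 and airlineStats_001. Including the partition id in the name keeps segments written by different Spark tasks from sharing a name.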

For more details, refer to the implementation at org.apache.pinot.connector.spark.v3.datasource.PinotDataWriter.
