Spark Pinot Connector Write Model

The Spark connector also has experimental support for writing Pinot segments from Spark DataFrames. Currently, only append mode is supported, and the schema of the DataFrame should match the schema of the Pinot table.
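Because the DataFrame schema is expected to match the table schema, it can help to compare column names before writing. The following is a minimal sketch and not part of the connector: `SchemaCheckSketch` and `diffColumns` are hypothetical names, the table's column list is assumed to be supplied by hand, and only names are compared, not data types.

```scala
// Hypothetical pre-write helper (not part of the Pinot connector).
object SchemaCheckSketch {
  // Returns a pair:
  //   _1: columns the Pinot table expects but the DataFrame lacks
  //   _2: columns the DataFrame has but the Pinot table does not
  def diffColumns(dfColumns: Seq[String],
                  tableColumns: Seq[String]): (Set[String], Set[String]) = {
    val df = dfColumns.toSet
    val table = tableColumns.toSet
    (table.diff(df), df.diff(table))
  }
}
```

With a DataFrame `df`, the first argument could be `df.columns.toSeq`; if either returned set is non-empty, the column names do not line up and the write is likely worth revisiting.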

// Create sample rows: (airport, state, distance, active, ts)
val data = Seq(
  ("ORD", "Florida", 1000, true, 1722025994),
  ("ORD", "Florida", 1000, false, 1722025994),
  ("ORD", "Florida", 1000, false, 1722025994),
  ("NYC", "New York", 20, true, 1722025994)
)

// Name the columns and split the data into two Spark partitions
val airports = spark.createDataFrame(data)
  .toDF("airport", "state", "distance", "active", "ts")
  .repartition(2)

// Write the DataFrame as Pinot segments (only append mode is supported)
airports.write.format("pinot")
  .mode("append")
  .option("table", "airlineStats")
  .option("tableType", "OFFLINE")
  .option("segmentNameFormat", "{table}_{partitionId:03}")
  .option("invertedIndexColumns", "airport")
  .option("noDictionaryColumns", "airport,state")
  .option("bloomFilterColumns", "airport")
  .option("timeColumnName", "ts")
  .save("myPath")
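To illustrate the `segmentNameFormat` option used above, here is a minimal sketch of how a template such as `{table}_{partitionId:03}` could be expanded. It assumes that `{name}` is a placeholder and that an optional `:03` suffix means "zero-pad the number to width 3"; `SegmentNameSketch` and `expand` are hypothetical names, and the connector's actual behavior should be checked in `PinotDataWriter`.

```scala
import scala.util.matching.Regex

// Hypothetical illustration of placeholder expansion (not the connector's code).
object SegmentNameSketch {
  // Matches {name} or {name:digits}, e.g. {table} or {partitionId:03}
  private val Placeholder: Regex = """\{(\w+)(?::(\d+))?\}""".r

  def expand(format: String, table: String, partitionId: Int): String =
    Placeholder.replaceAllIn(format, m => {
      val rendered = (m.group(1), Option(m.group(2))) match {
        case ("table", _)                => table
        // "03" becomes the format string "%03d": zero-padded, width 3
        case ("partitionId", Some(spec)) => s"%${spec}d".format(partitionId)
        case ("partitionId", None)       => partitionId.toString
        // Leave unknown placeholders untouched
        case _                           => m.matched
      }
      // Escape '$' and '\' so the result is inserted literally
      Regex.quoteReplacement(rendered)
    })
}
```

Under these assumptions, `SegmentNameSketch.expand("{table}_{partitionId:03}", "airlineStats", 1)` yields `airlineStats_001`, so a DataFrame with two partitions would produce one distinctly named segment per partition.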

For more details, refer to the implementation at org.apache.pinot.connector.spark.v3.datasource.PinotDataWriter.
