Apache Pinot Docs
0.4.0
The 0.4.0 release introduced a theta-sketch based distinct count function, an S3 filesystem plugin, a unified star-tree index implementation, the migration from TimeFieldSpec to DateTimeFieldSpec, and more.

Summary

The 0.4.0 release introduced several new features, including a theta-sketch based distinct count aggregation function, an S3 filesystem plugin, a unified star-tree index implementation, and the deprecation of TimeFieldSpec in favor of DateTimeFieldSpec. It also included miscellaneous refactoring, performance improvements, and bug fixes. See details below.

Notable New Features

  • Made DateTimeFieldSpecs mainstream and deprecated TimeFieldSpec (#2756)
    • Used time column from table config instead of schema (#5320)
    • Included dateTimeFieldSpec in schema columns of Pinot Query Console (#5392)
    • Used DATE_TIME as the primary time column for Pinot tables (#5399)
  • Supported range queries using indexes (#5240)
  • Supported complex aggregation functions
    • Supported Aggregation functions with multiple arguments (#5261)
    • Added an API in AggregationFunction to get compiled input expressions (#5339)
  • Added a simple PinotFS benchmark driver (#5160)
  • Supported default star-tree (#5147)
  • Added an initial implementation for theta-sketch based distinct count aggregation function (#5316)
    • One minor side effect: DataSchemaPruner won't work for DistinctCountThetaSketchAggregationFunction (#5382)
  • Added access control for the Pinot server segment download API (#5260)
  • Added Pinot S3 Filesystem Plugin (#5249)
  • Text search improvement
    • Pruned stop words for text index (#5297)
    • Used 8-byte offsets in chunk based raw index creator (#5285)
    • Derived num docs per chunk from max column value length for varbyte raw index creator (#5256)
    • Added inter segment tests for text search and fixed a bug for Lucene query parser creation (#5226)
    • Made text index query cache a configurable option (#5176)
    • Added Lucene DocId to PinotDocId cache to improve performance (#5177)
    • Removed the construction of second bitmap in text index reader to improve performance (#5199)
  • Tooling/usability improvement
    • Added template support for Pinot Ingestion Job Spec (#5341)
    • Allowed users to specify the ZooKeeper data dir and to skip cleanup during ZooKeeper shutdown (#5295)
    • Allowed configuring minion task timeout in the PinotTaskGenerator (#5317)
    • Updated JVM settings for scripts (#5127)
    • Added Stream GitHub events demo (#5189)
    • Moved docs link from GitBook to docs.pinot.apache.org (#5193)
  • Re-implemented ORCRecordReader (#5267)
  • Evaluated schema transform expressions during ingestion (#5238)
  • Handled count distinct query in selection list (#5223)
  • Enabled async processing in the Pinot broker query API (#5229)
  • Supported bootstrap mode for table rebalance (#5224)
  • Supported order-by on BYTES column (#5213)
  • Added Nightly publish to binary (#5190)
  • Shuffled the segments when rebalancing the table to avoid creating hotspot servers (#5197)
  • Supported inbuilt transform functions (#5312)
    • Added date time transform functions (#5326)
  • Deepstore by-pass in LLC: introduced segment uploader (#5277, #5314)
  • APIs Additions/Changes
    • Added a new server API for downloading segments
      • GET /segments/{tableNameWithType}/{segmentName}
  • Upgraded Helix to 0.9.7 (#5411)
  • Added support to execute functions during query compilation (#5406)
  • Other notable refactoring
    • Moved table config into pinot-spi (#5194)
    • Cleaned up integration tests. Standardized the creation of schema, table config and segments (#5385)
    • Added jsonExtractScalar function to extract a field from a JSON object (#4597)
    • Added template support for Pinot Ingestion Job Spec (#5372)
    • Cleaned up AggregationFunctionContext (#5364)
    • Optimized real-time range predicate when cardinality is high (#5331)
    • Made PinotOutputFormat use table config and schema to create segments (#5350)
    • Tracked unavailable segments in InstanceSelector (#5337)
    • Added a new best effort segment uploader with bounded upload time (#5314)
    • In SegmentPurger, used table config to generate the segment (#5325)
    • Decoupled schema from RecordReader and StreamMessageDecoder (#5309)
    • Implemented ARRAYLENGTH UDF for multi-valued columns (#5301)
    • Improved GroupBy query performance (#5291)
    • Optimized ExpressionFilterOperator (#5132)
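
The new server segment download endpoint listed above, GET /segments/{tableNameWithType}/{segmentName}, takes both identifiers as path parameters. The following is a minimal sketch of how a client might build that URL. The class name, the base URL `http://localhost:8097`, and the example table and segment names are assumptions for illustration only, not values taken from the release notes. Because access control was added to this API (#5260), a real request may also need credentials, which this sketch does not cover.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Pinot: builds the URI for
// GET /segments/{tableNameWithType}/{segmentName}.
final class SegmentDownloadUri {
  private SegmentDownloadUri() {}

  // serverBaseUrl is an assumption; use the address of your own server's API.
  static URI build(String serverBaseUrl, String tableNameWithType, String segmentName) {
    // Drop a trailing slash so the path does not contain "//".
    String base = serverBaseUrl.endsWith("/")
        ? serverBaseUrl.substring(0, serverBaseUrl.length() - 1)
        : serverBaseUrl;
    return URI.create(base + "/segments/" + encode(tableNameWithType) + "/" + encode(segmentName));
  }

  private static String encode(String pathSegment) {
    // URLEncoder produces form encoding, so replace '+' with "%20" for use in a path.
    return URLEncoder.encode(pathSegment, StandardCharsets.UTF_8).replace("+", "%20");
  }

  public static void main(String[] args) {
    // Example names are made up for illustration.
    System.out.println(build("http://localhost:8097", "myTable_OFFLINE", "myTable_2020-05-01_0"));
  }
}
```

The resulting URI can be passed to any HTTP client to issue the GET request. Encoding each path parameter separately keeps segment names with unusual characters from altering the path structure.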

Major Bug Fixes

  • Stopped releasing the PinotDataBuffer when closing the index (#5400)
  • Handled a no-arg function in query parsing and expression tree (#5375)
  • Fixed compatibility issues during rolling upgrade due to unknown json fields (#5376)
  • Fixed missing error message from pinot-admin command (#5305)
  • Fixed HDFS copy logic (#5218)
  • Fixed Spark ingestion issue (#5216)
  • Fixed the capacity of the DistinctTable (#5204)
  • Fixed various links in the Pinot website

Work in Progress

  • Upsert: support overriding data in the real-time table (#4261)
    • Add Pinot upsert features to pinot-common (#5175)
  • Enhancements for theta-sketch, e.g. multi-value aggregation support, complex predicates, and performance tuning

Backward Incompatible Changes

  • TableConfig no longer supports deserialization from a JSON string that contains nested JSON strings (i.e. no \" inside the JSON) (#5194)
  • The following APIs are changed in AggregationFunction (use TransformExpressionTree instead of String as the key of blockValSetMap) (#5371):
    void aggregate(int length, AggregationResultHolder aggregationResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
    void aggregateGroupBySV(int length, int[] groupKeyArray, GroupByResultHolder groupByResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
    void aggregateGroupByMV(int length, int[][] groupKeysArray, GroupByResultHolder groupByResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
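
For custom aggregation functions, the practical effect of the signature change above is that values can no longer be looked up in blockValSetMap by a column-name String; the lookup key is now an expression object. The sketch below illustrates that migration with stand-in types only: `ExprKey` is a hypothetical substitute for TransformExpressionTree, a plain `long[]` stands in for BlockValSet, and the column name `clicks` is made up. It does not use or reproduce the real Pinot classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Illustration only: none of these types are Pinot classes.
final class BlockValSetKeySketch {
  private BlockValSetKeySketch() {}

  // Hypothetical stand-in for TransformExpressionTree, identified by its expression text.
  static final class ExprKey {
    final String expression;

    ExprKey(String expression) {
      this.expression = Objects.requireNonNull(expression);
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof ExprKey && ((ExprKey) o).expression.equals(expression);
    }

    @Override
    public int hashCode() {
      return expression.hashCode();
    }
  }

  // Old style: the block of values is looked up by column name.
  static long sumByName(int length, Map<String, long[]> blockValSetMap, String column) {
    long[] values = blockValSetMap.get(column);
    long sum = 0;
    for (int i = 0; i < length; i++) {
      sum += values[i];
    }
    return sum;
  }

  // New style: the same aggregation, but the map is keyed by an expression object.
  static long sumByExpression(int length, Map<ExprKey, long[]> blockValSetMap, ExprKey expression) {
    long[] values = blockValSetMap.get(expression);
    long sum = 0;
    for (int i = 0; i < length; i++) {
      sum += values[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    long[] block = {3, 4, 5};

    Map<String, long[]> oldMap = new HashMap<>();
    oldMap.put("clicks", block);

    Map<ExprKey, long[]> newMap = new HashMap<>();
    newMap.put(new ExprKey("clicks"), block);

    System.out.println(sumByName(3, oldMap, "clicks"));
    System.out.println(sumByExpression(3, newMap, new ExprKey("clicks")));
  }
}
```

In this sketch the key type implements `equals` and `hashCode` so that a freshly constructed key finds the entry stored under an equal key. Keying by expression rather than by name is what lets one map hold inputs that are transforms of columns, not just bare columns.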
Last modified 2yr ago