# 0.4.0

### Summary

0.4.0 release introduced various new features, including the theta-sketch based distinct count aggregation function, an S3 filesystem plugin, a unified star-tree index implementation, deprecation of TimeFieldSpec in favor of DateTimeFieldSpec, etc. Miscellaneous refactoring, performance improvement and bug fixes were also included in this release. See details below.

### Notable New Features

* Made DateTimeFieldSpecs mainstream and deprecated TimeFieldSpec (#2756)
  * Used time column from table config instead of schema (#5320)
  * Included dateTimeFieldSpec in schema columns of Pinot Query Console #5392
  * Used DATE\_TIME as the primary time column for Pinot tables (#5399)
* Supported range queries using indexes (#5240)
* Supported complex aggregation functions
  * Supported Aggregation functions with multiple arguments (#5261)
  * Added api in AggregationFunction to get compiled input expressions (#5339)
* Added a simple PinotFS benchmark driver (#5160)
* Supported default star-tree (#5147)
* Added an initial implementation for theta-sketch based distinct count aggregation function (#5316)
  * One minor side effect: DataSchemaPruner won't work for DistinctCountThetaSketchAggregatinoFunction (#5382)
* Added access control for Pinot server segment download api (#5260)
* Added Pinot S3 Filesystem Plugin (#5249)
* Text search improvement
  * Pruned stop words for text index (#5297)
  * Used 8byte offsets in chunk based raw index creator (#5285)
  * Derived num docs per chunk from max column value length for varbyte raw index creator (#5256)
  * Added inter segment tests for text search and fixed a bug for Lucene query parser creation (#5226)
  * Made text index query cache a configurable option (#5176)
  * Added Lucene DocId to PinotDocId cache to improve performance (#5177)
  * Removed the construction of second bitmap in text index reader to improve performance (#5199)
* Tooling/usability improvement
  * Added template support for Pinot Ingestion Job Spec (#5341)
  * Allowed user to specify zk data dir and don't do clean up during zk shutdown (#5295)
  * Allowed configuring minion task timeout in the PinotTaskGenerator (#5317)
  * Update JVM settings for scripts (#5127)
  * Added Stream github events demo (#5189)
  * Moved docs link from gitbook to docs.pinot.apache.org (#5193)
* Re-implemented ORCRecordReader (#5267)
* Evaluated schema transform expressions during ingestion (#5238)
* Handled count distinct query in selection list (#5223)
* Enabled async processing in pinot broker query api (#5229)
* Supported bootstrap mode for table rebalance (#5224)
* Supported order-by on BYTES column (#5213)
* Added Nightly publish to binary (#5190)
* Shuffled the segments when rebalancing the table to avoid creating hotspot servers (#5197)
* Supported inbuilt transform functions (#5312)
  * Added date time transform functions (#5326)
* Deepstore by-pass in LLC: introduced segment uploader (#5277, #5314)
* APIs Additions/Changes
  * Added a new server api for download of segments
    * **/GET** /segments/{tableNameWithType}/{segmentName}
* Upgraded helix to 0.9.7 (#5411)
* Added support to execute functions during query compilation (#5406)
* Other notable refactoring
  * Moved table config into pinot-spi (#5194)
  * Cleaned up integration tests. Standardized the creation of schema, table config and segments (#5385)
  * Added jsonExtractScalar function to extract field from json object (#4597)
  * Added template support for Pinot Ingestion Job Spec #5372
  * Cleaned up AggregationFunctionContext (#5364)
  * Optimized real-time range predicate when cardinality is high (#5331)
  * Made PinotOutputFormat use table config and schema to create segments (#5350)
  * Tracked unavailable segments in InstanceSelector (#5337)
  * Added a new best effort segment uploader with bounded upload time (#5314)
  * In SegmentPurger, used table config to generate the segment (#5325)
  * Decoupled schema from RecordReader and StreamMessageDecoder (#5309)
  * Implemented ARRAYLENGTH UDF for multi-valued columns (#5301)
  * Improved GroupBy query performance (#5291)
  * Optimized ExpressionFilterOperator (#5132)

### Major Bug Fixes

* Do not release the PinotDataBuffer when closing the index (#5400)
* Handled a no-arg function in query parsing and expression tree (#5375)
* Fixed compatibility issues during rolling upgrade due to unknown json fields (#5376)
* Fixed missing error message from pinot-admin command (#5305)
* Fixed HDFS copy logic (#5218)
* Fixed spark ingestion issue (#5216)
* Fixed the capacity of the DistinctTable (#5204)
* Fixed various links in the Pinot website

### Work in Progress

* Upsert: support overriding data in the real-time table (#4261).
  * Add pinot upsert features to pinot common (#5175)
* Enhancements for theta-sketch, e.g. multiValue aggregation support, complex predicates, performance tuning, etc

### Backward Incompatible Changes

* TableConfig no longer support de-serialization from json string of nested json string (i.e. no `\"` inside the json) (#5194)
* The following APIs are changed in AggregationFunction (use TransformExpressionTree instead of String as the key of blockValSetMap) (#5371):

  ```
  void aggregate(int length, AggregationResultHolder aggregationResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
  void aggregateGroupBySV(int length, int[] groupKeyArray, GroupByResultHolder groupByResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
  void aggregateGroupByMV(int length, int[][] groupKeysArray, GroupByResultHolder groupByResultHolder, Map<TransformExpressionTree, BlockValSet> blockValSetMap);
  ```
