string types. As seen in colA, dictionary encoding saves a significant amount of space when values are duplicated. On the other hand, colB has no duplicated data, so dictionary encoding will not compress much when a column contains many unique values. For the string type, we pick the length of the longest value and use it as the length for the dictionary’s fixed-length value array. In this case, the padding overhead can be high if the column has a large number of unique values.
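The trade-off described above can be illustrated with a small sketch. This is not Pinot's implementation: the helper names and the sample values for colA and colB are made up for illustration, the dictionary is assumed to be sorted, and the size of the per-row dictionary ids is ignored.

```python
def dictionary_encode(values):
    """Return (sorted unique values, per-row dictionary ids)."""
    dictionary = sorted(set(values))  # assumption: dictionary kept in sorted order
    index = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [index[value] for value in values]


def fixed_width_dictionary_bytes(dictionary):
    """Bytes used if every entry is padded to the longest value's length."""
    if not dictionary:
        return 0
    width = max(len(value.encode("utf-8")) for value in dictionary)
    return width * len(dictionary)


def raw_bytes(values):
    """Bytes used by storing every value as-is, with no dictionary."""
    return sum(len(value.encode("utf-8")) for value in values)


# Hypothetical sample data: col_a repeats values, col_b is all unique.
col_a = ["apple", "apple", "banana", "apple", "banana", "apple"]
col_b = ["a", "bb", "ccc", "dddd", "eeeee", "a-much-longer-unique-value"]

for name, column in (("colA", col_a), ("colB", col_b)):
    dictionary, ids = dictionary_encode(column)
    print(name, "raw:", raw_bytes(column),
          "dictionary:", fixed_width_dictionary_bytes(dictionary))
```

With this sample data, the colA dictionary needs 12 bytes against 32 bytes of raw values, while the colB dictionary needs 156 bytes against 41 bytes of raw values, because every entry is padded to the 26-byte longest value.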
sortedColumn when generating the segment internally. For offline push, the input data needs to be sorted before running the Pinot segment conversion and push job.
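As a rough sketch of the offline case, the input can be pre-sorted on the sorted column before it is handed to the segment conversion job. The helper below is hypothetical and not part of Pinot; it assumes CSV input with a header row and sorts values as strings, so a numeric column would need a numeric sort key instead.

```python
import csv
import io


def sort_rows_by_column(csv_text, sorted_column):
    """Return the CSV text with data rows ordered by `sorted_column`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Assumption: string comparison is the desired ordering for this column.
    rows = sorted(reader, key=lambda row: row[sorted_column])
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames or [],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical input: memberId plays the role of the sorted column.
unsorted_csv = "memberId,views\n30,5\n10,7\n20,1\n"
print(sort_rows_by_column(unsorted_csv, "memberId"))
```

The sorted output is what would then be passed to the segment conversion and push job.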