colA, dictionary encoding saved a significant amount of space for duplicated values.
colB has no duplicated data. Dictionary encoding will not compress the data much in this case, because the column contains many unique values.
For the string type, we pick the length of the longest value and use it as the length for the dictionary’s fixed-length value array. The padding overhead can be high if there are a large number of unique values for a column.
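The trade-off can be illustrated with a rough size estimate. The sketch below is a simplified model, not the actual on-disk format: it assumes every dictionary entry is padded to the longest value, that each row stores a bit-packed dictionary id, and that raw storage costs just the value bytes (offsets for variable-length values are ignored). The function names and sample columns are hypothetical.

```python
import math


def dictionary_encoded_size(values):
    """Approximate bytes for a fixed-width dictionary plus bit-packed ids."""
    encoded = [v.encode("utf-8") for v in values]
    unique = set(encoded)
    # Assumption: every dictionary entry is padded to the longest value.
    entry_width = max(len(v) for v in unique)
    dictionary_bytes = entry_width * len(unique)
    # Assumption: each row stores an id with just enough bits for the cardinality.
    bits_per_id = max(1, math.ceil(math.log2(len(unique))))
    forward_index_bytes = math.ceil(bits_per_id * len(encoded) / 8)
    return dictionary_bytes + forward_index_bytes


def raw_size(values):
    """Approximate bytes when values are stored as-is (offsets ignored)."""
    return sum(len(v.encode("utf-8")) for v in values)


# Low cardinality: two distinct values repeated over 1000 rows.
col_a = ["apple", "banana"] * 500
# High cardinality: 1000 distinct values, one of them much longer than the rest.
col_b = [f"id-{i}" for i in range(999)] + ["a-very-long-unique-identifier-value"]

for name, col in (("col_a", col_a), ("col_b", col_b)):
    print(name, "dictionary:", dictionary_encoded_size(col), "raw:", raw_size(col))
```

Under these assumptions the low-cardinality column shrinks substantially, while the high-cardinality column grows, because each of its 1000 dictionary entries is padded to the length of the single longest value.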
sortedColumn when generating segments - you don't need to pre-sort the data.