Histograms for strongly skewed columns

On a recent OTN thread, I learned a nice trick by J. Lewis that allows to circumvent certain problems with histograms.

Histograms were designed to solve the problem of estimating cardinality for skewed columns (i.e. where some values occur much more frequently than the others). For columns with low number of distinct values (NDV) Oracle collects a frequency histogram, which can be thought of as a set of two one-dimensional arrays:  one containing all possible values, the other containing their frequency (i.e. how many rows have this value). However, if sample size is small, then Oracle can miss rare values, and they won’t be reflected in the histogram. As a result, the cardinality estimates for those values will be wrong (depending on version Oracle will either set it to either 1 or to half of the frequency for the rarest value found).  A detailed explanation of the issues with examples can be found in blog posts by J. Lewis and R. Geist.

Continue reading “Histograms for strongly skewed columns”