This post has already been read 1347 times!

SQL Server’s query optimizer uses distribution statistics to determine how it’s going to satisfy your SQL query. These statistics represent the distribution of the data within a column, or columns. The Query Optimizer uses them to estimate how many rows will be returned from a query plan.

Distribution statistics are created automatically when you create an index. If you have enabled the automatic creation of statistics (the default setting of the AUTO_CREATE_STATISTICS database setting ) you’ll also get statistics created any time a column is referenced in a query as part of a filtering clause or JOIN criteria.

Density

Density is the easiest of the two to understand. Density is a ratio that shows just how many unique values there are within a given column, or set of columns. The formula is quite easy:

Density = 1 / Number of distinct values for column(s)

 	
SELECT 1.0 / COUNT(DISTINCT MyColumn)
FROM dbo.MyTable;

You can see the density for compound columns too. You just have to modify the query to get a distinct listing of your columns first:

 

SELECT 1.0 /  COUNT(*)
FROM (SELECT DISTINCT FirstColumn,
	SecondColumn
FROM dbo.MyTable) AS DistinctRows;

Data Distribution

The data distribution represents a statistical analysis of the kind of data that is in the first column available for statistics. That’s right, even with a compound index, you only get a single column of data for data distribution.

The rows represent the way the data is distributed within the column by showing a pieces of data describing that distribution:

RANGE_HI_KEY This is the top value of the step represented by this row within the histogram.
RANGE_ROWS This number shows the number of rows within the step that are greater than the previous top value and the current top value, but not equal to either.
EQ_ROWS This number shows the number of rows within the step that are greater than the previous top value and the current top value, but not equal to either.
DISTINCT_RANGE_ROWS These are the distinct count of rows within a step. If all the rows are unique, then the RANGE_ROWS and the DISTINCT_RANGE_ROWS will be equal.
AVG_RANGE_ROWS This represents the average number of rows equal to a key value within the step.

 

DBCC SHOW_STATISTICS

To see the current status of your statistics you use the DBCC statement SHOW_STATISTICS. The output is in three parts:

  • Header: which contains meta-data about the set of statistics
  • Density: Which shows the density values for the column or columns that define the set of statistics
  • Histogram: The table that defines the histogram laid out above

You can pull individual pieces of this data out by modifying the DBCC statement.

Source of this article : https://www.simple-talk.com/sql/learn-sql-server/statistics-in-sql-server/

Leave a Reply

Post Navigation