What are SQL Server Statistics?
Statistics store information about the distribution of data in your tables. The optimizer uses this metadata to estimate the selectivity of a given statement, and create the optimal query plan. To manage SQL Server efficiently, you really need a good understanding of the statistics. In this post I will just provide some information that may help you to better understand, and manage your statistics
Statistics are automatically created on each of the index columns, when an index is created, and they are created automatically by the optimizer, when AUTO_CREATE_STATISTICS is enabled. Auto-created stats names include the column name and object ID in hexadecimal format, like this: _WA_Sys_00000002_79FD19BE
Statistics can also be created manually, via the CREATE STATISTICS statement... though I, personally, have never had a need. Generally speaking, the optimizer typically does quite fine.
That's how they are created, but where are they stored? Statistics are stored in the sysindexes table. We can also find additional information within the sys.stats and sys.indexes metadata views.
By default, the statistics are updated automatically. This occurs when 20% of the data in a table changes. It's important to know that this threshold can sometimes be a little high on the larger tables. For this reason, it's not a bad idea to create a custom routine specific to the larger tables alone. Or, even to just update the stats manually on those larger tables, on a periodic basis.
How do you know if your statistics are being updated? Run this statement to see if AUTO UPDATE is enabled:
EXEC sp_helpdb YourDBName
The status line in the result-set should include 'IsAutoUpdateStatistics'. If it doesn't, your statistics are not being updated.... ouch. You'll want to correct that with this next statement, but remember -- if stats haven't been updated for a while, this 1st run can be pretty timely, and invasive. Be sure to do this after hours:
USE YourDBName;
-- Enable Auto Update of Statistics
ALTER DATABASE YourDBName
SET AUTO_UPDATE_STATISTICS ON;
GO
-- Update Statistics for whole database
EXEC sp_updatestats
GO
So, how does SQL Server actually use the statistics? The optimizer uses this metadata to estimate how many rows each operator in the query plan will operate on. That estimation is used to calculate the cost of each operator, and identify the cheapest query plan for the statement you are running. Outdated, inaccurate or missing statistics can dramatically impact this process. If the optimizer cannot accurately estimate this cost, you are going to see substantial performance degradation.
Performance not quite what it should be? Well... when were your statistics last updated? Real easy to check:
SELECT
OBJECT_NAME(object_id) [Table],
STATS_DATE(object_id, stats_id) [LastUpdated],
name [Statistic]
FROM
sys.stats
WHERE
OBJECT_NAME(object_id) NOT LIKE 'sys%'
ORDER BY
STATS_DATE(object_id, stats_id) DESC
TIP: Keep that one in your quick list. Somebody says 'why is it so slow?'.... start by
checking the stats.
Alright, so how do you read your stats? I've listed several different methods below. Run them on your system, take a look at your output. This should help to better familiarize yourself with the statistics overall.
-- using sys.partitions and sys.indexes
SELECTOBJECT_NAME(p.object_id) [Table],
p.index_id [IndexID],
i.name [Index],
p.rows [#Rows]
FROM
sys.partitions p INNER JOIN sys.indexes i
ON p.object_id = i.object_id
AND p.index_id = i.index_id
WHERE
p.object_id = OBJECT_ID(N'Production.WorkOrder');
-- using sys.dm_db_index_physical_stats
SELECT
OBJECT_NAME(ips.object_id) [Table],
ips.index_id [IndexID],
i.name [Index],
ips.record_count [#Rows]
FROM
sys.dm_db_index_physical_stats(DB_ID(N'AdventureWorks2012'),OBJECT_ID(N'Production.WorkOrder'),NULL,NULL,'DETAILED') ips JOIN sys.indexes i
ON ips.object_id = i.object_id
AND ips.index_id = i.index_id
WHERE
ips.index_level = 0;
-- using sysindexes, sysobjects, sysusers, and information_schema.tables
SELECT
schemas.name [Schema],
tbls.name [Table],
i.name [Index],
i.id [TableID],
i.indid [IndexID],
i.GroupID,
i.rowmodctr [ModifiedRows],
(SELECT MAX(rowcnt)
FROM sysindexes i2
WHERE i.id = i2.id
AND i2.indid < 2) [RowCnt],
CONVERT(DECIMAL(18,8),CONVERT(DECIMAL(18,8),i.rowmodctr) /
CONVERT(DECIMAL(18,8),
(SELECT MAX(rowcnt) FROM sysindexes i2 WHERE i.id = i2.id AND i2.indid < 2) ) )
[ModifiedPct],
stats_date( i.id, i.indid ) [LastStatsUpdate],
'False' [Processed]
FROM
sysindexes i INNER JOIN sysobjects tbls
ON i.id = tbls.id INNER JOIN sysusers schemas
ON tbls.uid = schemas.uid INNER JOIN information_schema.tables tl
ON tbls.name = tl.table_name
AND schemas.name = tl.table_schema
AND tl.table_type = 'BASE TABLE'
WHERE
0 < i.indid
AND i.indid < 255
AND table_schema <> 'sys'
AND i.rowmodctr <> 0
AND (SELECT MAX(rowcnt) FROM sysindexes i2 WHERE i.id = i2.id AND i2.indid < 2) > 0
-- using sys.stats and sys.dm_db_stats_properties
SELECT
st.object_id [TableID],
OBJECT_NAME(st.object_id) [Table],
st.name [Index],
STATS_DATE(st.object_id, st.stats_id) [LastUpdated],
modification_counter [RowsModified]
FROM
sys.stats st CROSS APPLY sys.dm_db_stats_properties(st.object_id, st.stats_id) sp
WHERE
STATS_DATE(st.object_id, st.stats_id) <= DATEADD(DAY,-1,GETDATE())
AND modification_counter > 0
AND OBJECTPROPERTY(st.object_id, 'IsUserTable' ) = 1
That's it for now. It's pretty simple -- if your statistics are out of date, or missing, the optimizer is not able to make optimal query plan choices. This will impede query performance. In some cases very dramatically. I have given you a lot of choices for viewing your statistics, and I have even sampled an actual statistic update. I encourage you to take a look at Ola Hallengren's database maintenance solution, which includes a good piece for maintaining SQL Server statistics. See here:
Another good reference is at Brent Ozar's site, the sp_Blitz script. Among many other things, this procedure will tell you where your auto stats updates are disabled. I encourage you to take a look at this page, but -- don't stop with the statistics. You can use sp_Blitz to perform a very fast, yet hugely informative health check on your systems. Check it out.
Lastly, take a look here for more in depth information regarding the statistics:
http://msdn.microsoft.com/en-us/library/ms190397.aspx