Archive for the ‘Statistics’ Category

Should you update statistics after index rebuild?

Pop Quiz Hot Shot: Should you update statistics after index rebuild?

The short answer is NO! This is because index rebuild will update the statistics with a full scan. If you update the statistics manually after the index rebuild, you will be updating it using the default, which is a sampled scan. This is a common mistake!

When not explained correctly, this concept can even throw the very seasoned or senior DBAs into loops of confusions.  Here, I will redirect you to one of the most trusted sources and authorities for SQL Server, Paul Randal. In his posting “Rebuilding Indexes and Updating Statistics,” Paul writes:

The problem I’ve been seeing is people rebuilding indexes and then updating statistics. So why is this a problem? Well, it depends :-)

  • If your default for updating statistics is to do a sampled scan, then having rebuild update the statistics with a full scan (as a side effect), and then proceeding to manually update them again with the default sampled scan, means that after both operations you’re left with sampled statistics. You’ve wasted resources doing the sampled scan AND you’ve lost the ‘free’ full-scan statistics that the index rebuild did for you.
  • If your default is to do a full scan, then you don’t lose out on having the best statistics, but you do waste resources by unnecessarily updating statistics a second time.

So what’s the solution?

The simple answer is not to update statistics on indexes that have just been rebuilt.

The more complicated answer is to:

  1. Have a list of indexes (F) that you know will cause workload performance problems if they get fragmented
  2. Have a list of indexes (S) that you know will cause workload performance problems if the statistics don’t get regularly updated
  3. Evaluate fragmentation for list F, and choose to reorganize, rebuild, or do nothing
  4. For all indexes in list S that were not rebuilt in step 3, update statistics

What can you do? Well, you can start writing your customized script for index and update statistics maintenance or you can take a look at Ola Hallengren’s “SQL Server Index and Statistics Maintenance” script. I would opt for the latter. Why reinvent the wheels or sacrifice the precious time you could spent on other also very interesting topics of SQL Server such as performance tuning or deep dive research into SQL Server Replication?

REFERENCES:

Rebuilding Indexes and Updating Statistics
http://www.sqlskills.com/blogs/paul/search-engine-qa-10-rebuilding-indexes-and-updating-statistics/

SQL Server Index and Statistics Maintenance
http://ola.hallengren.com/sql-server-index-and-statistics-maintenance.html

When and how to update statistics?

SQL Server collects statistical information about indexes and data columns stored in the databases. The statistics are utilized by the SQL Server Query Optimizer internally to select the most efficient plan for data retrieval and updates.

Statistics varies in 3 different types: index statistics, column statistics, and user defined statistics. Index statistics are created when an index is created. Column statistics are created by the SQL Server Query Optimizer and they start with _WA_Sys_. User defined statistics are created by the database administrators using the CREATE STATISTICS statements.

There are 3 different database options that can control the statistics operations. AUTO_CREATE_STATISTICS, AUTO_UPDATE_STATISTICS, and AUTO_UPDATE_STATISTICS_ASYNC.

  • AUTO_CREATE_STATISTICS: This option is on by default. When set to on, the query optimizer creates statistics on data columns to improve estimates for the query plans for the SQL Server engine. These statistics are the ones that have the names start with _WA_Sys.
  • AUTO_UPDATE_STATISTICS: This option is on by default. When set to on, the query optimizer will update the statistics when they are out-of-date when they are used by queries.
  • AUTO_UPDATE_STATISTICS_ASYNC: This option is off by default. ASYNC is the abbreviation for asynchronous. An easy way to remember this option is that when this option is off, synchronous takes effect. If the statistics are out-of-date, the query optimizer will update the statistics and then compile and execute query. Whereas, if asynchronous is on, the optimizer will use the existing statistics even if they are out-of-date.

To find out if these options are on or off, you can run the following query:

-- querying the auto create, auto update, auto update stats async on options
select name as database_name
,is_auto_create_stats_on
,is_auto_update_stats_on
,is_auto_update_stats_async_on
from sys.databases

Updating statistics will help queries compile with up-to-date statistics. But, it will cause queries to recompile. Query recompile takes times. Time is money. As much as you hate the answer to your question of “When should I update statistics manually?” The correct answer is it depends on your application or system. Usually, you can consider updating statistics when your queries are running slow, not enough inserts or updates to trigger the auto update stats, and after the index reorganize maintenance. If the same query response time is slow as compared to yesterday or last week, you can update the statistics before moving on your checklist of troubleshooting steps. When the auto update stats database option is left on by default, it will update statistics when there are 500 rows plus 20 percent changes in the table rows. If there are not enough changes, then a manual or scheduled update statistics operation will be needed. Index rebuild will automatically update statistics as it will drop and re-create the index. But, index reorganization will not. You can schedule a SQL Server Agent job to update the statistics.

To view the last date the statistics were updated for all tables in all databases for a SQL Server instance, you can run the following query:

-- Querying the statistics info for all tables in all databases

-- create a temp table to store the statistics info for all tables in all databases
IF OBJECT_ID('TEMPDB..#Stats') IS NOT NULL
DROP TABLE #Stats

CREATE TABLE #Stats
(
	ID int identity (1,1)
	,Database_Name VARCHAR(123)
	,Table_Name NVARCHAR(MAX)
	,stats_name NVARCHAR(MAX)
	,stats_lastupdate_date_time DATETIME
)

-- load the statistics info into the temp table
INSERT INTO #Stats
EXEC sp_MSforeachdb 'USE ?
IF DB_NAME() NOT IN(''master'',''msdb'',''tempdb'',''model'')
SELECT db_name() as database_name
	,tb.name AS table_name
	,st.name AS stats_name
    ,STATS_DATE(st.object_id, st.stats_id) AS statistics_update_date
FROM sys.stats st
	JOIN sys.tables tb
		ON st.object_id = tb.object_id
WHERE tb.type = ''U''
ORDER BY db_name(), STATS_DATE(st.object_id, st.stats_id) desc
'
-- query the statistics info temp table
SELECT st.Database_Name
,st.Table_Name
,st.stats_name
,st.stats_lastupdate_date_time
FROM #Stats st

You can run the sp_updatestats system stored proc to update the statistics on all user-defined and internal tables in the current database. You can schedule a SQL Server Agent job to run on a specific time interval to automate this process. To update statistics on all user databases, you can use the sp_msforeachdb stored proc as in the following:

EXECUTE sp_msforeachdb 'USE [?]
IF DB_NAME() NOT IN(''master'',''msdb'',''tempdb'',''model'')
EXEC sp_updatestats'

Enjoy and have a Happy New Year!

REFERENCES:
UPDATE STATISTICS
http://msdn.microsoft.com/en-us/library/ms187348(v=sql.105).aspx

sp_updatestats
http://msdn.microsoft.com/en-us/library/ms173804(v=sql.105).aspx

Using Statistics to Improve Query Performance
http://msdn.microsoft.com/en-us/library/ms190397(v=sql.105).aspx

STATS_DATE()
http://msdn.microsoft.com/en-us/library/ms190330(v=sql.105).aspx