Why not rebuild all indexes at once?
On my SQL Master Academy trainings for novice and experienced DBA we discuss a lot why and how to perform index rebuilds. And there are constantly a couple of questions about index rebuilds:
- Do I need index rebuild at all?
- Why not rebuild all the indexes of the database as one task?
Let’s discuss those and give some advices and scripts to use.
Do I need index rebuild at all?
We perform index rebuilds because indexes fragment over time AND because eventually that causes performance problems (or could potentially cause performance problem). And this is essential here – both conditions should be true in order to decide for index rebuild.
Once created a BTree structure doesn’t stay permanent, it becomes fragmented during the modifications. Inserts, updates and deletes fragment indexes in two ways: logically and physically. Logical fragmentation means that object pages are not as contingent as they should be according to their ordering, i.e. the pages are out of order. Physical fragmentation means that there are empty spaces inside the pages (the Swiss cheese pages). The main reason leaf level index pages are fragmented is because of page splits occurring on inserts and some updates (which is a topic for a separate post). Deletes are causing physical fragmentations.
How the fragmentation affects performance?
Index fragmentation affects IO performance when reading the object. It causes:
- Reading more pages than usual because of additional waste space inside pages. Index become larger, takes more time to read the same amount of records and more memory to store increased amount of pages
- Converting to random IO instead of sequential and not able to use read-ahead reads optimizations is a common result of logical fragmentation. If the disk has to jump to different sectors in order to read pages instead of read them sequentially as a block then the performance suffers.
Do we have to prove a performance problem then in order to plan for index rebuilds?
Well that is an arguable question. Sometimes it is hard to prove it because the index fragmentation could add to specific performance issue rather than be the primary reason of performance problem. Most of the times rather than proofing that the performance problem is because of index fragmentation, I would go directly with setting up very precise filtered rebuilds, only those that are needed, because fragmentation of those indexes could potentially cause performance problem and I would like to avoid them beforehand. My advice is usually to make the right configurations and the right maintenance for the database in order to easily isolate further problems in performance.
Why not rebuild all the indexes of the database as one task?
For sure it is easy to just use Rebuild Index task in the Maintenance plan rather then write code and rebuild just specific indexes. Why make our day harder then?
There are important reasons for that.
Rebuilding index even offline is an operation that takes a lot of resources including:
- CPU. The high CPU usage is typical for index rebuild operations. From SQL Server 2005 Enterprise edition you can use MAXDOP for index rebuilds, but reducing the MAXDOP will not lower the overall resource needed, just the contention.
- IO operations and database space because of temporary space:
- Temporary space is minimal in case of offline clustered index rebuild (when no metadata changes). In this case no sort is performed.
- Non-clustered index rebuild needs temporary space for SORT operations in TEMPDB (SORT_IN_TEMPDB option is ON) or in User DB (SORT_IN_TEMPDB is OFF).
- In case of online clustered index rebuild an additional temporary space is needed for mapping index in tempdb (SORT_IN_TEMPDB is ON) or in user db (SORT_IN_TEMPDB is OFF) and for the version store (always in tempdb)
- Creation of clustered index either online or offline always performs SORT and need space for that either in TempDB or in User DB depending on the SORT_IN_TEMPDB setting.
- Online Non-clustered index create/rebuild doesn’t need temporary space.
- The amount of space for SORT, mapping index depends on the size of the index key i.e the number and the size of the index key fields.
- You can find detailed information about index space requirements in the whitepaper: TEMPDB Capacity Planning and Concurrency Considerations for Index Create and Rebuild
- Memory – every page that is processed is stored in memory buffer first
- Transaction Log – index rebuild is an intensive log operation, there are huge number of the log records generated during index rebuilds especially when the database is in full recovery model. It is a minimal log operation in SIMPLE and BULK_LOGGED recovery models. The operation of an index rebuild is transactional and log cannot be truncated until the operation completes. That is why it is important to perform a log backup (FULL recovery model) before index rebuilds and to be sure that the log is configured properly and has enough space to grow (http://msdn.microsoft.com/en-us/library/ms184246.aspx)
Beside amount of resources for index rebuilds, the respective object is not accessible during the operation. Online index rebuilds could be a lot more resource and time consuming.
Filtered Index rebuild
That is why it is important to filter the scope of the index rebuild every time. Here I will give you an example of script for filtered index rebuild that you can use in your maintenance tasks. We don’t care about heaps, also about tables that are rarely used or not used at all, or about tables that are small as number of records/pages. So this is our first and very important filtering. We also need to filter only those indexes that have fragmentation.
Here is the script logic:
- Perform index object filtering for rebuild into a temp table
- Using sys.dm_db_index_physical_stats find leaf level of indexes (index_level=0) for large objects (recordcount filter) with fragmentation metrics reaching the threshold of 30%
- Find index names joining with sys.indexes
- Find object names using object_name() function
- Load those object_id(s) and index_id(s) in a temp table
- Find schema name joining the temp table with information_schema.tables view
- Build a cursor with respective ALTER INDEX REBUILD statement
CREATE PROCEDURE [dbo].[FilteredIndexRebuild]
SELECT ps.object_id, object_name(ps.object_id) AS Tablename, ps.index_id, idx.name AS idxname
FROM sys.dm_db_index_physical_stats(db_ID(), NULL, NULL, NULL, 'detailed') ps
JOIN sys.indexes idx
ON ps.object_id=idx.object_id AND ps.index_id=idx.index_id
AND ps.index_id BETWEEN 1 AND 255
DECLARE @tablename VARCHAR(255)
DECLARE @idxname VARCHAR(255)
DECLARE @table_schema VARCHAR(255)
DECLARE @cmd NVARCHAR(500)
DECLARE IndexListCursor cursor fast_forward FOR
SELECT i.Tablename, i.idxname, s.TABLE_SCHEMA FROM
#idx i JOIN INFORMATION_SCHEMA.TABLES s
FETCH NEXT FROM IndexListCursor INTO @tablename, @idxname, @table_schema
WHILE @@FETCH_STATUS = 0
SET @cmd = 'ALTER INDEX ' + @idxname + ' on '+@table_schema+'.'+@tablename + ' rebuild'
FETCH NEXT FROM IndexListCursor INTO @tablename, @idxname, @table_schema
DROP TABLE #idx
Some warnings on this code: The DMF sys.dm_db_index_physical_stats is very IO intensive especially when working in DETAILED mode, so expect it to take time. Better test the DMF alone when your database is not having an intensive workload. Beside that the DMF has very slow responce time in DETAILED mode if other parameters are NULLs (as in code here) in SQL Server 2008 RTM. That is why I highly recommend you to test the DMF alone first before implementing the filtered indexes procedure. If you encounter slow responce time of the DMF with DETAILED mode and NULLs parameters, then you hav two options:
- run the code and change the mode to LIMITED (you dont have page density col in this mode, so just remove it from the procedure and rebuild indexes based only on logical fragmentation)
- use cursor to pass the table names to the DMF to avoid using NULL for all tables. Take into account that if you decide to use MS_ForEachTable it uses a cursor behind the scenes (and it is undocumented proc).
If you want to additionally filter indexes based on their usage before the rebuild cursor execution, then you could use sys.dm_index_usage_stats and user_updates filed. Here is the query:
SELECT object_id, index_id
WHERE database_id=db_id() AND user_updates>0
BUT take into account that the sys.dm_index_usage_stats returns data cumulated from the last SQL Server start!
Don’t forget to test the scripts for your environment and before implementation!
Good luck and thanks for reading! 🙂