Be careful when scheduling an online index rebuild

Online index rebuild is a feature available from SQL Server 2005 Enterprise Edition. While it does sound useful and attractive to use you should be careful considering it as part of your database maintenance plan. If you use it as a regular index rebuild operation then you could end up with serious locking and blocking and you cannot be sure and define the approx time of index rebuilds, it could be different every time.

In one of my customers the online index rebuild of the database took between 2 and 12 hours! So we decide to refine the task and go for offline index rebuilds for most of the indexes.

Why is that happening and what is behind the online index rebuilds.

The process of rebuilding an index in general is a process of re-creating the B-tree index structure. When it is offline, then the table for which the index is rebuild is not available even for reading. This is because the process holds an exclusive lock on the table or indexed view during the operation.

During the online index rebuild you can read the table most of the time during the operation. And the important word here is not just reading but MOST of the time, and not during the WHOLE time of rebuilding. There are periods of time (how short they are it depends) while you cannot access the table.

The beginning of the operation holds a S(hared) lock of the table, which generally complements to other DML operations (other reads in particular). At this moment you can encounter locking if other sessions are updating (such holding X lock on keys/pages and IX lock on table) the table data. Then your index rebuild will wait to start until your session completes the updates. This is the first period you can encounter blocking.

During the build operation everything seems fine as we are allowed to perform DML operations to the objects without lock waits and conflicts (the green arrow on the scheme). Then at the end of the index operation it is trying to hold the SCH_M (schema modification lock) in order to “replace” the structure and complete the process. This lock is not compatible with any other locks, even S lock, meaning that during this period the table is actually offline. The rebuild operation will wait until all the sessions and their transactions complete and release locks in order to obtain its SCH_M lock. It could take hours. And vice versa, your session will wait if the rebuild index currently holds SCH_M lock, this period is generally short.

As you can see the online index operation is partially online. It takes more time when your update workload is heavy. The main reduction in online index operations performance and duration are in creating an index due to additional sort operation, and in rebuilding a cluster index due to maintaining a temporary mapping index, and the larger size of the clustered index.

The last but not least is taking into account the transaction log growth (depending on the recovery model) during the index rebuilds, as well as overall increasing of the resource usage (increasing memory consumption, IO , CPU and tempdb usage) during the online index operations. During test that Microsoft published the Log-to-Data Ratio using FULL recovery model while performing rebuilding a clustered index is 1.34 comparing to 0.14 in BULK_LOGGED or SIMPLE recovery model.

All that leads to the following recommendations:

If possible always try to define the maintenance window for the database
Select critical tables that are fragmented and define offline index rebuilds for them. Those are tables having high level of updates or reads. You can use sys.dm_db_index_physical stats in order to gain the average fragmentation and page density level. Test if their offline index rebuilds fit in the maintenance window as a duration. Generally you don’t have to pay attention to heaps and to small tables.
Perform the remaining rebuilds online if needed
Manage Transaction log effectively. Leave enough space to grow. Configure properly as initial size and growth settings, don’t leave defaults (see default fragmentaion on my blogpost http://blogs.technet.com/b/magi/archive/2009/05/22/fragmentation-the-database-performance-killer.aspx). If your database is in FULL recovery model, then take a log backup before index maintenance operation.

If you want to go deeper in how online index operations work and to read the test cases I highly recommend you this whitepaper: http://technet.microsoft.com/en-us/library/cc966402.aspx

Good Luck and thanks for reading! 🙂