SQL Server 2014 Managed Lock Priority for Online Index Rebuild, PART 1
Can I have predictable Online Index Rebuilds?
As I already wrote in my post “Be careful with scheduling online index rebuilds” the problem with online index rebuild (except the need of huge amount of resources) is unpredictability and blocking. In my customer’s case the online index rebuild take between 2 and 12 hours.
It is because the online index rebuild (OIR) acquires an exclusive table Sch-M lock during the DDL operation impacting the database workload running concurrently and using the affected table. In case of OIR two locks are required, a table S-lock in the beginning of the DDL operation and a Sch-M lock at the end. In order to execute the DDL statement for OIR, all active blocking transactions running on a particular table must be completed. Conversely, when the DDL for OIR is executed first, this will block all new transactions that require locks on the affected table. Although the duration of the lock for OIR is very short, waiting for all open transactions on a given table to complete and blocking the new transactions to start, may significantly affect the throughput, causing workload slow down or timeout, and significantly limiting an access to the underlying table(s). This has an impact for 24X7 Mission Critical workloads that focus on maximizing throughput and availability with short or non-existent maintenance window.
The new option in SQL Server 2014 called Managed Lock Priority seems to provide at least first steps in getting to the solution of the problem (or creating additional problems?). The new feature allows a DBA to manage the S-lock/Sch-M lock for OIR. The DBA now have the ability to specify one of different manageability options to handle the DDL locks of OIR:
- Enforce the OIR by killing all blockers immediately or after a specified wait time (MAX_DURATION =n [minutes])
- Wait for blockers and after the wait time (MAX_DURATION) expires place the lock in the regular lock queue ( as it does today)
- Wait for blockers and after the wait time expires (MAX_DURATION) exit the OIR without any action
In all 3 cases if during the MAX DURATION wait time there are no blocking activities, the OIR lock will be executed immediately and the DDL statement will be completed without any waits.
As this sounds like a solution of the problem, “killing blockers” i.e. user sessions and transactions, doesn’t look to me like a very friendly action, especially for a 24*7 mission critical workload, which is an usual candidate for OIR. At least I wouldn’t be so exited about that without knowing what exactly is happening in every scenario and without being sure that I can have all the processes information as transparent as possible. Killing (exiting) a resource intensive online index rebuild on the other hand could take forever and create additional overhead. As this new feature brings more questions than answers it really grabbed my attention. So I decided to dive into it and to try to define when and how to use it.
Lets start with the first option of giving a precedence of OIR over user transactions i.e. killing all transactions that prevent OIR from start/finish. I am going to write about the other options in next post (look for part 2 soon).
I am starting a new transaction on the AdventureWorks2012DW database (session 51):
Another session (56) is trying to perform an online index rebuild operation and is waiting 1min at low priority for the user transaction to complete. After reaching the MAX_DURATION of 1 min, the session 56 rolls back the user transaction (because it is logged as a member of sysadmin role) and kills the user session 51, which is then disconnected. It completes the OIR operation after that. The new option works just as expected and described in the SQL Engine blog.
So at the end we have a “kill process” message in the SQL Server Error Log, preceded by two messages – the staring of an ALTER INDEX REBUILD with a wait_with_low_priority option, and the ABORT_AFTER_WAIT = BLOCKERS message notifying that the max duration has expired and the sessions holding lock on the specific object id will be killed:
The blocking session IDs have different numbers in some screenshots because of reconnecting sessions during the tests.
That’s all you have without any additional configurations. Not bad at all. Let’s try to find out the surroundings. How the locks look like during the wait_at_low_priority operation?
There are two new lock statuses in addition to GRANT and WAIT. Those are LOW_PRIORITY_WAIT and ABORT_BLOCKERS. The LOW_PRIORITY_WAIT appears when the ALTER INDEX with Manage lock priority starts and we can see it for a while because there are blocking transactions. When the MAX DURATION expires but there are still locks preventing the process to continue, the lock transforms to ABORT_BLOCKERS, meaning that the process which holds it is going to kill the blocking user transactions:
We have new wait types showing the detailed lock wait that is occurring during the Priority OIR operation. We can see them by querying the sys.dm_os_waiting_tasks during the MAX_DURATION time (see row 2).
The LCK_M_IS_LOW_PRIORITY shows that a session holds a LCK_M lock on the object on which a low priority index rebuild operation (during its MAX_DURATION time) is trying to obtain an IS lock, which on the other hand is a first lock of the OIR on its path of obtaining the S lock. So the rule of reading the wait type – it shows the blocking lock of the user transaction on the first place, then the alter index lock type that is waiting with on a low priority.
There are about 42 new wait types available in the sys.dm_os_wait_stats cumulating every single lock wait during the whole Managed lock OIR operation. Depending on those types you can easy diagnose at what stage of the OIR operation most waits occur.
If we have waits then we could have blocking in case we want to define and monitor it. And if you setup the Blocked Process Report event either using tracing or xEvents you will see the event raising:
It is interesting that we don’t see the blocking has something connected to the OIR Lock Priority wait as part of the MAX_DURATION time interval. I assume that they just haven’t make the changes to the event yet. But it is definitely true that we can use the standard ways of detection of blocking in order to monitor the Manage Lock waits.
That’s great. But imagine that we have a scheduled OIR as part of a job and we want to know how the operation completes, how many transactions has been rolled back or some other info.
How we could get that info?
One way to get that is to parse the error log, using the respective procedures as shown below, load it to a table and have a list of all the killed processes during the OIR operation:
Or to perform a monitoring based on locks and waits using the sys.dm_tran_locks and sys.dm_exec_waiting_tasks, joined with sys.dm_exec_sessions so we can record the activity during the OIR operation. I would really like to have smarter way of performing that.
OK, let’s see some more detailed info and what we can use further in our monitoring. We have tree new xEvents:
- ddl_with_wait_at_low_priority – it raises an event when a Manage Lock Priority operation is starts
- lock_request_priority_state– it shows the lock status change of the OIR process during the whole operation
- process_killed_by_abort_blockers – it raises an event when a process has been killed by the Managed lock operation
If I create an xEvent session choosing all tree new events with some additional actions as sql_text (I am using the xEvent Session Wizard, and the Watch Live Data GUI), I can perfectly track the OIR lock acquiring process:
The ddl_with_wait_at_low_priority starts the OIR (1). The lock_request_priority_state starts with Sch_S lock while trying to read the table schema (2), then obtaining the IS (Intent Shared) lock on the table (3 and 4). You can see that the time interval between 3 and 4 is exactly the MAX_DURATION time of 1 minute. If the low priority Period expires (4) and the process still cannot obtain the IS lock, (because in this case the object has an X lock held by a transaction), the Abort Blockers state of the event is raising (5) and the process 51 is killed (6). After that the process obtains consequentially all the needed locks – IS (7), S (8) and Sch_M (9). Note that there are only Low Priority Period Started event states, no Expired events, as the process just continues without any more waits. If the OIR process meets another user transaction while trying to put the Sch_M lock, then a new MAX_DURATION time interval starts, and we will see a Low Priority Period Expired after the time expires, probably yet another Abort blocker and a killed process.
The whole info is pretty fine if I want to dive into the transaction that is performing the OIR and to understand the lock acquiring during the operation. But I still think that the it is more important to have an information not just which session ID has been killed but what this session has been doing, how much time it has been active before the OIR killed it, what statement has been broken, the host name, the user name, the app name, etc. There is no obvious way of getting that information. The problem is that you know the session ID after it has been killed, but you have to obtain the data about it just before the killing occurs, probably during the ABORT_BLOCKERS wait type. The first think that comes to my mind is to use the event notification – the obvious way of not just catching an event but take an action like perform some querying of sys.dm_exec_sessions, sys.dm_exec_connections, or whatever I would like to see as an info. BUT there is no event notifications (yet?) relevant to those new Manage Lock events. That’s really something that is missing, and I really hope that they will manage to add some events in the next CTP or most important in the RTM.
While searching for other less obvious ways of catching that information, I am adding an xEvents to my existing xEvent session in order to try to find transactions that are changing their status near the time of process killing. I managed to achieve some good results with the sql_transaction event, with additional sql_text action:
Here along with the previous monitoring of the lock acquiring process I have all the transactions that are performing by both sessions and some system level transactions that are building the new index structure. You can easy track them in the result above. They are really interesting to see and explore as they present a detailed view of the online rebuild index operation, but it is kind of out of scope and I realize that the post becomes long enough. The important information that I was going to catch about the user transaction is shown on the row below the process_killed_by_abort_blockers event. It is the transaction of the session_id 55, with a transaction_id 343974 (the number is different because of couple of tests that I performed). So that is what I was looking for!
Obviously its time to precise the info, to define an xEvent Session that will be more useful to implement in a production instance and will have only the necessary information. Remember, I wanted to be able to catch all the killed sessions by the Managed Lock Priority OIR and all the important details about them, so when I choose to run the OIR by a scheduled job, I want to have a “report” of what happened with my user transactions and which of them were affected.
First of all I need only two events in my xEvent Session, because they give me all the info I need: the process_killed_by_abort_blockers and the sql_transaction event.
I need additional actions as: client_app_name, client_host_name, database_id (for sql_transaction), session_id, sql_text and username. Of course you can choose different actions or you can add more if you like, but take into account that they add some extra weight in the xEvent session.
I need to have a very strict filtering on sql_transactions event, so I am limiting the event escalation only for the the specific database, only to user sessions (>50) and only to transactions with status Rollback:
In order to have a definite prove that an sql_transaction event of a Rollback transaction is a result of a process_killed_by_abort_blockers event (beside the match of session IDs) I enabled the causality tracking (Track how events are related to one another check box). When causality tracking is enabled, each event that is fired has a unique activity ID across the system. When one task causes work to be done on another, the activity ID of the parent is sent to the child task. The child task outputs the parent’s activity ID the first time it fires an event.
I define the asynchronous file target as it is more appropriate option for continuous offline monitoring and analysis.
Here is the xEvent session DDL:
CREATE EVENT SESSION [KilledByOIR] ON SERVER
ADD EVENT sqlserver.process_killed_by_abort_blockers(
WHERE ([sqlserver].[database_id]=(6) AND [sqlserver].[session_id]>=(50))),
ADD EVENT sqlserver.sql_transaction(
WHERE ([package0].[equal_uint64]([sqlserver].[database_id],(6)) AND [package0].[greater_than_equal_uint64]([sqlserver].[session_id],(50)) AND [transaction_state]=(2)))
ADD TARGET package0.event_file(SET filename=N'KilledByOIR') --this will create the file in default Log folder location
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=ON,STARTUP_STATE=OFF)
When I simulate the OIR with an user session again, I can see only event couples of process_killed_by_abort_bockers and the respective sql_transaction that has been killed.
The attach_activity_id_xfer_quid col value of the sql_transaction has the same guid as its abort_blockers process above it as shown by the causality tracking, the session_id is the same as the killed_process_id. So it is clear now which process has been killed which session and all the session details are also available.
Here is the xQuery for reading the file target:
n.VALUE('(@name)', 'varchar(50)') AS event_name,
DATEADD(hh, DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP),
n.VALUE('(@timestamp)', 'datetime2')) AS [TIMESTAMP],
n.VALUE('(action[@name="sql_text"]/value)', 'varchar(max)') AS [sql_text],
n.VALUE('(action[@name="attach_activity_id"]/value)','uniqueidentifier') AS [action_id],
n.VALUE('(action[@name="attach_activity_id_xfer"]/value)','uniqueidentifier') AS [action_xfer],
n.VALUE('(data[@name="object_id"]/value)', 'bigint') AS [object_id],
n.VALUE('(data[@name="transaction_id"]/value)', 'int') AS [transaction_id],
n.VALUE('(data[@name="killed_process_id"]/value)','bigint') AS [killed_process_id],
n.VALUE('(data[@name="database_id"]/value)','int') AS [database_id],
n.VALUE('(action[@name="client_app_name"]/value)','nvarchar(4000)') AS [client_app_name],
n.VALUE('(action[@name="client_hostname"]/value)','nvarchar(4000)') AS [client_hostname],
n.VALUE('(data[@name="duration"]/value)','int') AS [duration],
n.VALUE('(action[@name="session_id"]/value)','int') AS [session_id],
n.VALUE('(data[@name="transaction_state"]/value)','nvarchar(4000)') AS [transaction_state],
n.VALUE('(action[@name="username"]/value)', 'nvarchar(4000)') AS [username]
SELECT CAST(event_data AS xml) AS event_data
FROM sys.fn_xe_file_target_read_file('C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Log\KilledByOIR*.xel',
CROSS apply event_data.nodes('event') AS q(n)
The option of killing blockers and enforcing OIR gives you the most predictability of the index maintenance process, but at the highest price and with having the highest affect on the workload. You still have some level of unpredictability, because you don’t know which transactions will be rolled back, you can’t define precedence for specific transactions, some transaction could take longer to rollback.
Use the option when you know you have the less intensive workload, at the time it is no so critical and in case you can afford breaking user sessions. Here are some additional recommendations:
- use OIR only for indexes on objects that are really critical, for others perform offline index rebuild wherever possible
- Never rebuild all the indexes of the database using maintenance plan rebuild index task, use filtered index rebuild instead of that
- If you decide to use the OIR precedence option, then implement the proposed xEvent session in order to have information of what has been killed during the ALTER INDEX operation. It’s always important to
- ensure that transactions have retry logic
Some questions and expectations will remain for the next CTP and the final RTM release. Event notifications are probably the most important of them.