UPDATE with .WRITE on a varchar(max) column leads to time-outs

The following interesting case comes from our SQL User Group member Ilian Yovchev, who works for the David Holding company.

They migrated their text columns to the varchar(max) data type, in line with the SQL Server roadmap and the deprecation of the text type. They use UPDATE with the .WRITE clause to insert data into those new columns.

But they faced time-outs during some updates and query executions against the table with the varchar(max) column. They traced those time-outs and found that the cause was a synchronous statistics update running on the varchar(max) column. Strange, isn't it? Well, not exactly: a synchronous stats update (Auto Update Statistics Asynchronously = OFF at the database level) can indeed lead to a query time-out.

Let's look at the specifics of this case. At the database level, Auto Update Statistics is ON, Auto Create Statistics is ON, and asynchronous stats update is OFF; no query references the varchar(max) column in its WHERE clause. Here is the scenario:
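You can confirm these database settings by querying the sys.databases catalog view (a quick check; run it on the server hosting the database in question):

```sql
-- Check the statistics-related settings for the current database
SELECT name,
       is_auto_create_stats_on,
       is_auto_update_stats_on,
       is_auto_update_stats_async_on
FROM sys.databases
WHERE name = DB_NAME();
```

In the scenario described here, the first two flags are 1 and the async flag is 0.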

CREATE TABLE dbo.[E_MAILS]
(
[MailID] [INT] IDENTITY(1,1) NOT NULL,
[DateCreated] datetime NOT NULL,
[DateSent] datetime NULL,
[FROM] nvarchar(250) NOT NULL,
[TO] nvarchar(250) NOT NULL,
[DATA] nvarchar(MAX) NULL,
[RetryCount] [INT] NULL,
[STATUS] nvarchar(MAX) NULL,
[DateLastRetry] datetime NULL,

CONSTRAINT [PK_E_MAILS] PRIMARY KEY([MailID])
)
GO

-- Insert record
INSERT INTO dbo.[E_MAILS] ([DateCreated], [FROM], [TO])
VALUES (GETDATE(), N'abc@example.com', N'abc@example.com')

-- Get record ID
DECLARE @varMailID INT
SELECT @varMailID = SCOPE_IDENTITY()

-- Set Data to an empty string (UPDATE ... .WRITE does not work on NULL values)
UPDATE dbo.[E_MAILS]
SET [DATA] = N''
WHERE ([MailID] = @varMailID)

Statistics are created just as expected for the PK column, which is the clustered index key:
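To see which statistics exist on the table at this point, you can list them from the catalog views (a minimal check; auto-created statistics have names starting with _WA_Sys):

```sql
-- List all statistics objects on dbo.E_MAILS and their leading columns
SELECT s.name AS stats_name,
       c.name AS column_name,
       s.auto_created
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
    ON sc.object_id = s.object_id AND sc.stats_id = s.stats_id
JOIN sys.columns AS c
    ON c.object_id = sc.object_id AND c.column_id = sc.column_id
WHERE s.object_id = OBJECT_ID(N'dbo.E_MAILS');
```

At this stage only the PK_E_MAILS statistic on MailID should be listed.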

Now let's insert text data into the DATA column:

-- Insert text

DECLARE @varMailID INT
-- SCOPE_IDENTITY() returns NULL in a new batch, so pick up the last inserted ID instead
SELECT @varMailID = MAX([MailID]) FROM dbo.[E_MAILS]

-- Append text to Data
UPDATE dbo.[E_MAILS]
SET [DATA].WRITE(N'abcdef', 0, NULL)
WHERE ([MailID] = @varMailID)

Here are the statistics created automatically after the last UPDATE statement:

You didn't expect these statistics to be created, especially after an UPDATE statement? Neither did I! Let's check the details. There is not much to say about these statistics yet.

I inserted some rows and checked again after the statistics were automatically updated upon reaching their threshold:

It is definitely a string-summary (String Index) statistic on the DATA column.
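You can inspect the auto-created statistic yourself with DBCC SHOW_STATISTICS (the _WA_Sys name below is illustrative; auto-created names embed hex object and column ids, so look up the real name with sp_helpstats first):

```sql
-- List the statistics on the table, then dump the auto-created one
-- (replace the statistics name with the one sp_helpstats reports on your system)
EXEC sp_helpstats N'dbo.E_MAILS', 'ALL';
DBCC SHOW_STATISTICS (N'dbo.E_MAILS', N'_WA_Sys_00000006_0519C6AF');
```

The header of the output shows "String Index = YES" for a string-summary statistic.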

Actually, the problem is not the statistics themselves, although it is really unexpected to have them created at all, since the column is not listed in the WHERE clause of the UPDATE statement. The problem comes when the statistics have to be updated, synchronously by default because of the database setting. Then the overall query times out: the query tries to execute, but the stats update threshold has been reached, the plan is invalidated, the stats update starts and takes a while to complete, and the query execution times out. The annoying part is that these statistics most probably will not be used by 90% of queries, even those referencing this column in their WHERE clause. So it is not worth having them.

The problem appears on SQL Server versions prior to 2012. If you try this on SQL Server 2012, the statistics on the varchar(max) column are not created automatically.

There is a similar Connect item:

http://connect.microsoft.com/SQLServer/feedback/details/683303/string-summary-slow-statistics-creation-on-varbinary-max-column

The difference is that in our case the statistics are produced on a varchar(max) column, and by an UPDATE statement that does not reference the column in its WHERE clause.

The workaround is one of the following:

  1. Set asynchronous stats update to ON at the database level. You will probably say: is it worthwhile/correct to switch a database-level setting because of one statistic not behaving correctly!? And you will be right! But this option is actually a good one for overall database performance. There will probably be some cases when you would like a plan based on the latest stats, but they are comparatively fewer than the cases when synchronous stats updates could potentially harm your queries and lead to time-outs. You can at least monitor time-outs, if any, and decide for yourself. Besides that, even if you have no time-outs at all, synchronous stats updates could still cause some delays in query execution. So I suggest you at least try it and measure your database performance with this option set to ON. I am not going to go into further details here, as I am preparing another comprehensive post on statistics (it becomes longer and longer every day, and I really have to post it before it becomes unreadable 🙂).
  2. The second path is to control stats updates at a lower granularity level: per statistic, not per database. We have an option to do that using the NORECOMPUTE clause of UPDATE STATISTICS or CREATE STATISTICS. To do that effectively, there must be a way to exclude the specific statistic from automatic stats updating (synchronous or asynchronous, it doesn't matter). As Paul White proposed, a good way of doing this is to create the statistic manually beforehand, instead of waiting for SQL Server to create it, thus keeping control over all of its parameters:
CREATE STATISTICS stats_Data
ON dbo.E_MAILS (DATA)
WITH NORECOMPUTE;

In this case it means deleting the _WA statistic first and then creating stats_Data. With a statistic on the DATA column already in place, SQL Server will not create another one, and because we specified NORECOMPUTE, it will not be updated regardless of whether synchronous or asynchronous update is set at the database level. If we want it to stay up to date, we can, for example, schedule UPDATE STATISTICS on the table.
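Put together, the per-statistic workaround could look like this (the _WA_Sys name is illustrative; check the actual auto-created name on your system with sp_helpstats first):

```sql
-- Drop the auto-created statistic (name is hypothetical - look yours up first)
DROP STATISTICS dbo.E_MAILS._WA_Sys_00000006_0519C6AF;

-- Recreate it manually with NORECOMPUTE, excluding it from automatic updates
CREATE STATISTICS stats_Data
ON dbo.E_MAILS (DATA)
WITH NORECOMPUTE;

-- If the statistic still matters, refresh it on your own schedule, e.g. in a job.
-- Keep NORECOMPUTE here, otherwise the manual update re-enables auto-updating.
UPDATE STATISTICS dbo.E_MAILS stats_Data WITH FULLSCAN, NORECOMPUTE;
```

Note that running UPDATE STATISTICS without repeating NORECOMPUTE would turn automatic updates back on for this statistic.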

Actually, both workaround options are effective whenever we have such a time-out problem with specific statistics or specific queries, not just when updating a varchar(max) column. If you have queries that time out because of synchronous stats updates (trace them in order to be sure), then the first option is actually better. If a specific query causes the problem, the second approach could work, and if the statistics are important, you can take manual control and perform a scheduled update.
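For reference, the first option is a single database-level switch (shown here against a hypothetical database named MailDB):

```sql
-- Let queries compile against the existing statistics while
-- the stats update runs in the background
ALTER DATABASE MailDB
SET AUTO_UPDATE_STATISTICS_ASYNC ON;
```

With this set, a query whose plan depends on stale statistics proceeds with the old plan instead of waiting for the update to finish.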

That’s all folks! Thanks for reading 🙂