Find and tune inefficient query executions
Or a way to find cardinality errors in your cached execution plans
If SQL Server cannot accurately predict the number of rows flowing through a query plan operator, a discrepancy appears between the estimated and the actual number of rows in the query plan. The Query Processor (QP) creates a query plan and chooses a specific operator because it expects the query to return a certain number of rows, but when the plan actually executes, far more (or far fewer) rows go through the operator. This is called a cardinality error, because SQL Server makes an error in estimating the cardinality of an operator. In almost every case the result of a cardinality error is that the query executes with a suboptimal execution plan. That, of course, means the query has performance problems and delays in some executions.
For example: if we look for Amazon orders for a customer in Varna, we expect to find 100 rows out of 10,000 in the orders table. On the other hand, if we use the same plan (the plan is cached and re-used, which is normally the goal) to find the orders for customers in Chicago, there will certainly be many more of them, say 20,000. On the first execution, the QP uses a Nested Loop Join and a Key Lookup (bookmark operator) to produce those 100 rows. But the next execution re-uses the same operators, which results in a tremendous number of logical IOs and a potentially slow execution.
The Hash and Sort warning events which I blogged about are directly connected to cardinality errors. They are the result of specific cases in which the “expensive” hash and sort operators are used in the query plan.
The main reasons for cardinality errors are:
- statistics issues
- bad parameter sniffing
Using specific programming constructs (variables, comparing columns of the same table in the WHERE clause, and others described at http://msdn.microsoft.com/en-us/library/ms181034(v=sql.105).aspx ) also leads to cardinality assumptions and errors. But the common root cause is the data distribution in the tables.
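As an illustration of the variable case, here is a minimal sketch (the Sales.Orders table and City column are hypothetical): with a local variable, the optimizer cannot see the actual value at compile time and estimates from average density instead of the histogram.

```sql
-- Hypothetical table Sales.Orders(City) used only for illustration
DECLARE @City NVARCHAR(30) = N'Chicago';

-- The optimizer does not know the value of @City at compile time,
-- so it estimates cardinality from the average density of the City
-- column, not from the histogram step for 'Chicago' -
-- a classic source of cardinality errors.
SELECT * FROM Sales.Orders WHERE City = @City;
```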
We can easily find a cardinality error problem when we analyze a specific query plan through SSMS. We just need to execute the query with different parameters, show the actual execution plan, and monitor the query plan operators – their estimated vs. actual number of rows. Here is an example:
CREATE PROCEDURE SalesByCustomer (@CustomerID INT)
AS
SELECT * FROM [Sales].[SalesOrderHeader]
WHERE CustomerID = @CustomerID;
Let’s test different executions of the stored procedure. The first execution compiles it and returns 2 rows:
SET STATISTICS IO ON
EXEC SalesByCustomer 11005 --Key Lookup, 2 rows, Logical reads 8
The second execution re-uses the procedure’s plan, but processes 10,000 rows instead of 2. As a result of re-using the same execution plan with the bookmark operator, it generates 30K logical reads – far more than a table scan would need, since the table has only 689 data pages.
EXEC SalesByCustomer 2935 -- Key Lookup, 10001 rows, logical reads 30032
Here is the cardinality error pattern in the query plan operators, visible as the discrepancy between the estimated and actual number of rows (2.15789 vs. 10001):
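If you want to reproduce the two cases in either order, you can mark the procedure for recompilation between the tests so the next execution builds a fresh plan (a sketch using the procedure name from the example above):

```sql
-- Mark the cached plan of the procedure as invalid; the next EXEC
-- recompiles it, without clearing the whole plan cache
EXEC sp_recompile N'SalesByCustomer';
```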
Of course, we also have options for capturing such problems beforehand. We can catch Hash and Sort warnings if we have enabled those trace events in advance, or we can use Extended Events sessions, Event Notifications, etc. All of these need to be set up in advance in order to capture the workload execution for a specific time period.
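As a sketch of the Extended Events approach (assuming SQL Server 2012, where the hash_warning and sort_warning events are available; the session and file names here are arbitrary):

```sql
-- Capture hash and sort warnings - both are symptoms of cardinality errors
CREATE EVENT SESSION CardinalityWarnings ON SERVER
ADD EVENT sqlserver.hash_warning,
ADD EVENT sqlserver.sort_warning
ADD TARGET package0.event_file (SET filename = N'CardinalityWarnings.xel');
GO
ALTER EVENT SESSION CardinalityWarnings ON SERVER STATE = START;
```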
But is there a way to find out whether the currently cached queries have been executed inefficiently, with potential cardinality error problems?
We don’t have a direct way to find this out, as there is no cardinality error event. All we have are indirect ways of searching for a pattern of execution discrepancies.
The common pattern of a cardinality error is: there are (big) differences in the number of rows and in resource costs between different executions of the same query plan. So we need to find and compare the min and max values of the resources spent on executions per query plan (query hash).
The same method can be used not only for analyzing currently cached queries, of course, but also when we capture a workload for further analysis. When I speak about cached queries, I am referring to the beloved sys.dm_exec_query_stats DMV, especially in SQL Server 2012, where the DMV has some very useful new columns like total, min, and max number of rows. If we find big differences between min and max IOs, min and max CPU costs, and min and max number of rows, then those queries are candidates for further monitoring and optimization, because they are facing cardinality errors.
Using the example stored procedure, we can see that this specific object exhibits the cardinality error pattern. Here is the code for querying sys.dm_exec_query_stats:
WITH TopQueries (QueryHash, QueryText, PlansNumber, SumExecCount, SumTotalWorkerTime,
    MinWorkerTime, MaxWorkerTime, TotalReads, MinReads, MaxReads,
    TotalElapsedTime, MinElapsedTime, MaxElapsedTime, MinRows, MaxRows)
AS
(
SELECT qs.query_hash AS QueryHash, txt.text AS QueryText,
    SUM(qs.plan_generation_num) AS PlansNumber,
    SUM(qs.execution_count) AS SumExecCount,
    SUM(qs.total_worker_time) AS SumTotalWorkerTime, -- in order to find the most expensive queries
    MIN(qs.min_worker_time) AS MinWorkerTime,
    MAX(qs.max_worker_time) AS MaxWorkerTime,
    SUM(qs.total_logical_reads) AS TotalReads,
    MIN(qs.min_logical_reads) AS MinReads,
    MAX(qs.max_logical_reads) AS MaxReads,
    SUM(qs.total_elapsed_time) AS TotalElapsedTime,
    MIN(qs.min_elapsed_time) AS MinElapsedTime,
    MAX(qs.max_elapsed_time) AS MaxElapsedTime,
    MIN(qs.min_rows) AS MinRows,
    MAX(qs.max_rows) AS MaxRows
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS txt
WHERE qs.execution_count > 1
GROUP BY qs.query_hash, txt.text
)
SELECT QueryHash, QueryText, PlansNumber, SumExecCount, SumTotalWorkerTime,
    MinWorkerTime, MaxWorkerTime, MaxWorkerTime - MinWorkerTime AS DiffWorkerTime,
    MinReads, MaxReads, MaxReads - MinReads AS DiffNumberOfReads,
    MinElapsedTime, MaxElapsedTime, MaxElapsedTime - MinElapsedTime AS DiffElapsedTime,
    MinRows, MaxRows, MaxRows - MinRows AS DiffNumberOfRows
FROM TopQueries
ORDER BY TotalReads DESC;
Here are the results in our case:
The differences are really high, and we can definitely say that this query plan has experienced cardinality problems in some of its executions. You can see how high the Max values are compared to the Min values.
You can run this CTE on SQL Server 2008 and 2008 R2 as well – just remove the row-count columns (min_rows and max_rows exist only as of SQL Server 2012).
What should we do next, once we have isolated statements with potential cardinality errors? Here are some recommendations:
- Take care of regular statistics updates for the tables referenced by the statements. In our example the statistics themselves are not the problem, but out-of-date statistics can lead to the same symptoms (part of another post coming soon)
- Use hints or plan guides for those procedures. The RECOMPILE hint is the common solution. It can be set either:
- in the CREATE PROC body, which guarantees that the procedure will always be recompiled
- in EXEC … WITH RECOMPILE, which applies to a specific execution; the plan produced for that execution is not cached
- as a statement-level option, OPTION(RECOMPILE); the procedure plan stays cached, and only the statement inside is recompiled on each execution
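The recommendations above can be sketched in T-SQL, using the procedure and table from the earlier example (the FULLSCAN sampling option is just an illustration; pick what fits your table size):

```sql
-- Recommendation 1: keep statistics fresh for the referenced tables
UPDATE STATISTICS Sales.SalesOrderHeader WITH FULLSCAN;

-- Recommendation 2: the RECOMPILE hint in its three variants

-- a) In the procedure body: the procedure is recompiled on every execution
ALTER PROCEDURE SalesByCustomer (@CustomerID INT)
WITH RECOMPILE
AS
SELECT * FROM [Sales].[SalesOrderHeader]
WHERE CustomerID = @CustomerID;

-- b) At execution time: only this execution is recompiled,
--    and the plan it produces is not cached
EXEC SalesByCustomer 2935 WITH RECOMPILE;

-- c) At statement level (shown commented out, as it would replace
--    variant a): the procedure plan stays cached, but the statement
--    is recompiled on every execution
-- SELECT * FROM [Sales].[SalesOrderHeader]
-- WHERE CustomerID = @CustomerID
-- OPTION (RECOMPILE);
```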
If you can’t change the code, you can use a plan guide to force the hint. Here is an example:
EXEC sp_create_plan_guide
@name = N'ResolveCardinalityErrorOfSalesByCustomer',
@stmt = N'SELECT * FROM [Sales].[SalesOrderHeader]
WHERE CustomerID = @CustomerID',
@type = N'OBJECT',
@module_or_batch = N'SalesByCustomer',
@params = NULL,
@hints = N'OPTION (RECOMPILE)';
You can see the plan guide object in Object Explorer, under the database node (Programmability > Plan Guides):
When a plan is generated using a plan guide, you can see that in the plan properties.
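If you later fix the code itself, the plan guide can be disabled or removed with sp_control_plan_guide:

```sql
-- Disable the plan guide without dropping it
EXEC sp_control_plan_guide N'DISABLE', N'ResolveCardinalityErrorOfSalesByCustomer';

-- Or remove it completely
EXEC sp_control_plan_guide N'DROP', N'ResolveCardinalityErrorOfSalesByCustomer';
```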
Now, every time the statement is found in the SalesByCustomer procedure, it will be recompiled, producing an optimal plan for every execution.
That’s all folks. Thanks for reading! 🙂