What causes ad-hoc to run with bad plan and its resolution

Category: sql server transactsql

Question

Daizy Desai on Wed, 16 Jul 2014 16:31:25


One of our Company Application funcnalities uses the ad-hoc query that returns top first row from the result sets .There has been several times since this ad-hoc query ran with a bad execution in last 8 months .There is no specifice time and date or day when this query run with bad execution plan ,however whenever this query ran with a bad execution plan on SQL ,that happed always early in the morning , before 6am .The setting of database parameterization is set to forced .There is SQL maintenance reindexing job that runs each night at 2am that changes the recovedry model of our production database from Full to simple and changing back from simple to full after the reindexing operation .That causes the SQL plans to flush each night .The next day morning , when the application ad-hoc query runs on SQL first time , SQL generates a new plan for that query . Some strange reason , SQL genarates a good plan for that ad-hoc query most of the times ,but sometime this query picks up a bad plan .Whenever it happen,the application function becomes exetremly slow and it seems that the rest of the fuctionalities work fine meantime .

Our application vender is aware of this issue and this issue will be fixed in the application next release .Developer is converting the ad-hoc query into stored procedure to fix this issue .Application new relase will not available for next 2-3 months ,meantime we still have to deal with the ad-hoc query bad execution plan issue . Usually ,when the ad-hoc query run with the bad execution plan , I retrieve the query bad plan and just remove the plan for that particular query from cache .Once the bad plan is removed from cache , the perfomance gets back to normal .

As my understanding, this issue is related to parameter sniffing .but anyone thinks there are other factors responsible for this issue such as the network problem , the resource intensive report ran or something else .

Has anyone have step by step troubleshooting steps that could help me with my investigation and what areas  do I need to look into in order to investigate what could caused the ad-hoc query to pick up bad plan ?

Any help will be apprecited .

Daizy

 

Replies

Erland Sommarskog on Wed, 16 Jul 2014 22:08:24


First I like to clear up some misconceptions:

There is SQL maintenance reindexing job that runs each night at 2am that changes the recovedry model of our production database from Full to simple and changing back from simple to full after the reindexing operation .That causes the SQL plans to flush each night .

The change of the recovery model does not flush the cache. But reindexing the table which produces new statistics does.

Changing from recovery model from full to simple and back, sounds like hazardous to me. Does a full backup run after the recovery model has been restored? Else you are in fact running in simple recovery all day, and you can't do recovery to point in time. Rather than switching to simple recovery you should switch to bulk_logged recovery, so that you don't break the log chain.

So over to your problem. Apparently, the first time the query runs in the morning, it runs with atypical parameters; this is not uncommon. You get a plan which is good for this initial run, but not for regular execution. The fact that you run with forced parameterisation aggrevates the situation; It could in fact be a different query that runs, but which is parameterised to the same string.

Re-writing the query into a stored procedure may or may not help. If the same procedure is used in the morning, you will get the same effect.

If you look at the XML for the bad plan, you can see the values the plan was sniffed for at the very end of the XML. Maybe that can give you a clue where this query come from. If this initial query can be changed, you may be able to resolve the problem. Changed here means here as little as adding a space, so that it hashes to a different value.

Daizy Desai on Thu, 17 Jul 2014 21:50:00


Thanks Erland for your reply.

When we upgraded our Production Database to SQL server 2008R2 enterprise Edition. We ran into the transaction log file to grow too large during the reindexing job activities at night .To prevent the transaction log file to grow big , we had decided to change the recovery model of database from FULL to Simple before the reindexing job starts and the recovery model of Database  is changed from Simple to Full after the reindexing job finishes . We also have the differential backup job setup right after the recovery model change back to full so the transaction log backup chain does not break.

 Changing the database recovery model does flush the SQL plans for specific Database. Please review this link http://sqlblogcasts.com/blogs/steveh/archive/2010/09/02/changing-recovery-model-removes-plans-from-procedure-cache-for-a-database.aspx

Back to my problem ,The ad-hoc query triggers by application using sp_executesql command .Since the SQL cache flushes each night , SQL generate a new plans when this query run first time the next day in the morning .Whenever we had an issue with this query with bad execution plan, I found the creation time of plan for this query always for the same day and time is early in the morning .

I have pretty much all required info regarding this query , but what other areas or investigation do I need to do for the query parameter sniffing issue ?

As Erland said  adding space in query ,I don’t know adding space means just add additional space in the query so it hashes a different values .Please advice .

Thanks ,

Daizy

Erland Sommarskog on Thu, 17 Jul 2014 22:39:11


When we upgraded our Production Database to SQL server 2008R2 enterprise Edition. We ran into the transaction log file to grow too large during the reindexing job activities at night .To prevent the transaction log file to grow big , we had decided to change the recovery model of database from FULL to Simple before the reindexing job starts and the recovery model of Database is changed from Simple to Full after the reindexing job finishes . We also have the differential backup job setup right after the recovery model change back to full so the transaction log backup chain does not break.

The transaction log chain breaks when you set the recovery to simple.

Yes, if you take a diff backup, you can apply transaction logs from that point if the diff backup is good. If the diff backup is bad, you will lose all updates after you set recovery to simple. And, yes, when you hardware starts to deteoriate, you should not be surprise that the last backup is not good.

Again, please stop setting recovery to simple, unless you have a very relaxed RPO in your SLA. Use bulk-logged recovery instead.

By the way, you do take a t-log backup before you change the recovery mode, don't you?

 Changing the database recovery model does flush the SQL plans for specific Database. Please review this linkhttp://sqlblogcasts.com/blogs/steveh/archive/2010/09/02/changing-recovery-model-removes-plans-from-procedure-cache-for-a-database.aspx

Hmmm. I don't have the time to verify this right now. Anyway, reindexing is good enough to flush the cache of most plans.

I have pretty much all required info regarding this query , but what other areas or investigation do I need to do for the query parameter sniffing issue ?

Have you researched what values the initial plan is sniffed for?

As Erland said  adding space in query ,I don?t know adding space means just add additional space in the query so it hashes a different values .Please advice .

Obviously, you need access to the application code. The other alternative is to set up an early morning job that runs after this first execution and which forces a recompilation; either by evicting this specific plan or more brutally runs sp_recompile on one of the tables in the query.

Jingyang Li on Fri, 18 Jul 2014 03:33:59


When you switch recovery mode from full to simple, you are done when you finishing switch. But when you change to FULL, you need to take a full backup to initial your backup chain with t-log and/or diff log backups.

Daizy Desai on Fri, 18 Jul 2014 20:48:43


Thanks Erland for your reply .

I start to using bulk-logged recovery .I do not have Transaction back before recovery chnage . Do I still need to take transaction log backup ever though if I change the recovery to Bulk -logged ?? 

Are there any ways to make sure the differential backup is good or bad ? I have restored Full, diffrential and all t-log backups on test server without any issue . This is enough to ensure the backups are good ??

I do have a list of parameter values whenever the query initial plab was sniffed . I checked the exection plan for this query using the parameter sniffed values and it seems that it picks up expensive plan if the plan is first time is created for that query .

Setup automated job means using the plan guide option .Right ?

Thanks

Daizy

Erland Sommarskog on Fri, 18 Jul 2014 21:36:33


I start to using bulk-logged recovery .I do not have Transaction back before recovery chnage . Do I still need to take transaction log backup ever though if I change the recovery to Bulk -logged ?? 

If you want to be able to recover to a point in time before you switch to bulk_logged recovery, yes.

As long as you don't perform any minimally logged operations, there is no difference between bulk_logged and full recovery. But if a transaction-log backup includes a minimally logged operation, that particular backup can only be applied in whole; you cannot stop at any point in the backup, not even one that comes before the minimally logged operation. (Reindexing is one such as example.)

Are there any ways to make sure the differential backup is good or bad ? I have restored Full, diffrential and all t-log backups on test server without any issue . This is enough to ensure the backups are good ??

That's a good plan. Running DBCC CHECKDB is the ultimate test. Depending on time, you may not want to do this every day, but let's say once a week.

Setup automated job means using the plan guide option .Right ?

That is not what I had in mind, but a plan guide may in fact be an excellent idea. I have some text about this here:
http://www.sommarskog.se/query-plan-mysteries.html#planguides

Shanky_621 on Fri, 18 Jul 2014 22:59:47


Rather than switching to simple recovery you should switch to bulk_logged recovery, so that you don't break the log chain.

Erland this is correct but you still loose point in time recovery if in bulk logged recovery model you perform bulk logged operation so NO changing to bilk logged recovery model is not an option here

Daizy,

You should not rebuild indexes using Maintenance plan instead create your own intelligent script or may be use Ola Hallengren script.

Online index rebuild produces less logs than Offline I did a test to prove this

http://social.technet.microsoft.com/wiki/contents/articles/24420.curious-case-of-logging-in-online-and-offline-index-rebuild-in-full-recovery-model.aspx

You should be patched to latest Service pack and CU because Online index rebuild Might increase database size after rebuild as a bug it added 14 bytes extra but if you are on latest SP no worries



Please mark this reply as answer if it solved your issue or vote as helpful if it helped so that other forum members can benefit from it.

My TechNet Wiki Articles