What it took to stump the optimizer

Sometime ago I was working with a Microsoft support technician on a ticket related to some database issues on SCOM. The technician wanted to run a stored procedure to clear out some of the backlog we were seeing with a batch job.

When he ran the procedure (via Query Analyser) – we encountered an error as below. Basically an error that is wee bit rare, the optimizer telling us that it is stumped and the query is too big for it to process. The technician moved on to other ways of resolving the issue as fixing this problem involved touching some procedures that came with the product and was beyond the scope of our call. But after we were done, I was intrigued to find out what caused the optimizer to give up.scomoptimiser

I went to Line 177 of the specified stored procedure and found a statement that looked pretty small on the outside:

DELETE dbo.[CategoryOverride]

FROM dbo.[CategoryOverride] CO

JOIN #BaseManagedEntitiesToPurge BMETP

ON CO.[InstanceContext] = BMETP.[BaseManagedEntityId]

The table CategoryOverride had about 300 rows..and  on checking what went into the temp table – it wasn’t all that large either, only around 100 rows. Looking at the estimated plan from the optimizer gave me nothing – a perfectly ordinary plan with an equijoin on a clustered index. So what was causing the problem?

I looked to see if the table CategoryOverride had any triggers on it. None. Then I went on to looking at keys. The table had one foreign key defined on it.

scomo1

I decided to script this key and see what it showed me.

ALTER TABLE [dbo].[CategoryOverride] WITH CHECK ADD CONSTRAINT [FK_CategoryOverride_ManagementPack] FOREIGN KEY([ManagementPackId])

REFERENCES [dbo].[ManagementPack] ([ManagementPackId])

ON DELETE CASCADE

GO

There was my culprit – an ON DELETE CASCADE. So many other tables is this touching? A quick look at the dependancies on this table showed me no less than 14. For one record to be deleted – there were dependancies on 14 tables. I picked one record on the main table and counted how many dependant records were there to clean up. It was about 6000. So the delete statement was far from a benign straightforward delete – it had a huge amount of data to clean up and the optimizer could not generate a plan across 14 tables some of them having more cascade deletes on them.

Moral of the story – never assume the simplicity of a statement based on looking at it, or what the query optimizer tells you. The truth may be much deeper. If I had the choice of fixing it I would remove the cascade delete and go on to cleaning up data step-by-step in each of the dependant tables.