tim laqua dot com Thoughts and Code from Tim Laqua

23May/093

Maintaining a Type 1 Slowly Changing Dimension (SCD) using T-SQL

A few days ago, one of our SSIS packages that maintained a Type 1 Slowly Changing Dimension (SCD) of about 1 million rows crept up to 15 minutes of runtime. Now this doesn't sound too bad, but this is part of our hourly batches, so 15 minutes is 25% of our entire processing window. The package was using the Slowly Changing Dimension Wizard transformation - we were doing the standard OLEDB Source (which basically represented how the SCD "should" look) and then sending it to the SCD transform and letting it figure out what needed to be inserted and updated. One option was to switch to lookups instead of the SCD wizard to speed things up, maybe even some fancy checksum voodoo for the updates (see http://blog.stevienova.com/2008/11/22/ssis-slowly-changing-dimensions-with-checksum/ for an example). Then after thinking about it a little more - why are we sending a million rows down the pipeline every hour? We know only a small percentage of these are new - and another small percentage needs to be updated. Well, we can just write a quick SQL query to get us just those sets and the package would be much more efficient!

Wait a tick - why would we give the rows to SSIS if all it is going to do insert one set and update the other? Let's just do it all in T-SQL:

23May/090

Parsing QueryString (GET) Variables From a URL using T-SQL

First of all - I know, don't do this. The application that put the URL in the column in the first place is MUCH better at handling URLs and ideally, you would just add columns for the GET variables you're after and have the application put those in. Parsing them out after the fact from a VARCHAR field is insane.

So now we're here and we need to do that thing that we said we shouldn't do. My approach is to use a Table Valued Function that only returns one column - the value of the variable or NULL if it can't find the variable in the querystring.

15May/090

Quick Analysis of Cached Query Plans in SQL Server 2005

I run this one pretty frequently when we need to figure out what procs are killing a complex ETL process and what exactly about them is making the server cry. So basically, if it's on the development server, I'll do a DBCC FREEPROCCACHE and a DBCC DROPCLEANBUFFERS, run the entire set of ETLs, then run this query and then dig deeper in to the query plans that look suspect (high *Scan counts usually, sometimes lots of Hash Matches or Merge Joins). On a production server, the clearing of the proc cache and dropcleanbuffers can be problematic so I'll often just run the query after a scheduled ETL run. If you want to see the query plans mapped out visually, click on the query_plan value and SSMS will open up the XML. Then save that XML file as a .sqlplan file. Once you have that, close the XML and then open the .sqlplan file.

7Oct/082

Locating All Checked Out Excel Documents in SharePoint (WSS 3.0)

SELECT 
	DirName,
	LeafName,
	tp_Email, 
	CheckoutDate, 
	IsCheckoutToLocal
FROM     
	AllDocs d WITH(nolock) 
	INNER JOIN Webs w WITH(nolock) ON d.WebId = w.Id
	INNER JOIN Sites s WITH(nolock) ON w.SiteId = s.Id
	INNER JOIN UserInfo u WITH(nolock) ON (s.Id=u.tp_SiteID AND u.tp_ID=CheckoutUserId)
WHERE
	d.Type <> 1 
	AND (LeafName LIKE '%.xls' OR LeafName LIKE '%.xlsx') 
	AND (LeafName NOT LIKE '%template%')
	AND DeleteTRansactionId = CAST(0 AS VARBINARY)
	AND IsCurrentVersion = 1
	AND IsCheckoutToLocal = 1
ORDER BY
	tp_Email ASC
21Jul/082

Dynamic MDX queries in TSQL

Say you want to run the same MDX query for each row in a given rowset. I needed to do this for alerting purposes, where there were different alert thresholds for different attribute values in a given dimension attribute. After struggling with passing a variable to the query argument of the OPENROWSET command, I finally found the documentation that clearly stated that the query argument CAN'T be a variable. Or a concatination of a string and a variable. I still don't understand why... but the suggested workaround is to construct a giant TSQL string and run it using the EXEC command.

Ok - but how do we get the results of the query? Basically, the only way to do this is to create a temporary table in the current scope and do an INSERT INTO that temp table in your giant TSQL query. It all ends up looking something like this: