UPDATE 2012-03-16: Please also take a look at Slowly Changing Dimensions with MD5 Hashes in SSIS which we have determined to be fastest, most efficient approach to maintaining Type 1 dimensions.
A few days ago, one of our SSIS packages that maintained a Type 1 Slowly Changing Dimension (SCD) of about 1 million rows crept up to 15 minutes of runtime. Now this doesn't sound too bad, but this is part of our hourly batches, so 15 minutes is 25% of our entire processing window. The package was using the Slowly Changing Dimension Wizard transformation - we were doing the standard OLEDB Source (which basically represented how the SCD "should" look) and then sending it to the SCD transform and letting it figure out what needed to be inserted and updated. One option was to switch to lookups instead of the SCD wizard to speed things up, maybe even some fancy checksum voodoo for the updates (see http://blog.stevienova.com/2008/11/22/ssis-slowly-changing-dimensions-with-checksum/ for an example). Then after thinking about it a little more - why are we sending a million rows down the pipeline every hour? We know only a small percentage of these are new - and another small percentage needs to be updated. Well, we can just write a quick SQL query to get us just those sets and the package would be much more efficient!
Wait a tick - why would we give the rows to SSIS if all it is going to do insert one set and update the other? Let's just do it all in T-SQL:
First of all - I know, don't do this. The application that put the URL in the column in the first place is MUCH better at handling URLs and ideally, you would just add columns for the GET variables you're after and have the application put those in. Parsing them out after the fact from a VARCHAR field is insane.
So now we're here and we need to do that thing that we said we shouldn't do. My approach is to use a Table Valued Function that only returns one column - the value of the variable or NULL if it can't find the variable in the querystring.
I have always felt that the traditional approach to getting Agent Job status by creating a table and populating it with output from xp_sqlagent_enum_jobs was incredibly crude. So I poked around a bit and came up with a solution using a Table Valued Function that queries sysjobactivity to determine the status of a given job. The reason I use a Table Valued Function is because we have a few tables that contain a list of jobs that need to be started at certain points in a given process (and they may or may not already be running) - with the TVF, you can simply CROSS APPLY the function and bob's your uncle.
I run this one pretty frequently when we need to figure out what procs are killing a complex ETL process and what exactly about them is making the server cry. So basically, if it's on the development server, I'll do a
DBCC FREEPROCCACHE and a
DBCC DROPCLEANBUFFERS, run the entire set of ETLs, then run this query and then dig deeper in to the query plans that look suspect (high *Scan counts usually, sometimes lots of Hash Matches or Merge Joins). On a production server, the clearing of the proc cache and dropcleanbuffers can be problematic so I'll often just run the query after a scheduled ETL run. If you want to see the query plans mapped out visually, click on the query_plan value and SSMS will open up the XML. Then save that XML file as a .sqlplan file. Once you have that, close the XML and then open the .sqlplan file.