tim laqua dot com Thoughts and Code from Tim Laqua


Uncivil UNIONs

If you asked any SQL Server Developer or DBA what UNION does, they would tell you it combines two result sets. Then if you were to ask them if they knew it removed duplicates without the ALL argument, they would say yes. That's academic, but I dare you to go look at their and your code and see how often UNION without the ALL argument is used. UNION is evil. Why do I say that? Because DISTINCT is evil and blocking operations are evil so it follows that UNION is evil.

Would you be OK with the following query as the source for a fact table?

  SELECT DISTINCT [c1],[c2] FROM [t1]
  SELECT DISTINCT [c1],[c2] FROM [t2]
) a

No? Then why are you OK with this?

SELECT [c1],[c2] FROM [t1]
SELECT [c1],[c2] FROM [t2]

Personally, I tend to use UNION to combine data from two different sources. For example, if I wanted to combine the sales transactions from two JDE instances - the query might look like:

SELECT [c1],[c2] FROM [JDE1].[dbo].[F42119]
SELECT [c1],[c2] FROM [JDE2].[dbo].[F42119]

Each line here in a sales detail represents a line on an order, so assuming you selected all the PK columns in your query, a UNION won't remove any rows within either set, but when combined - it's possible the same PK could exist in both systems. Now you could split hairs and say it's highly unlikely the rest of the row would be identical, but that's not the point - it's wrong to ever DISTINCT a set of transactions intended for creating a fact table. Sure, in this particular case you could add a "SourceSystem" column that would stop that, but then you still have to accept that the UNION is blocking and expensive. Just because a DISTINCT doesn't modify results doesn't mean it didn't do any work - it actually did a great deal of work. Now consider that we're usually not selecting 2 columns, we're selecting 10-20. Do you want to wait for it to figure out all the rows are distinct when you already knew that? Further, if they're not distinct you probably screwed up a JOIN somewhere.

As far as coding standards regarding UNION go, I would suggest never using UNION without the ALL clause and doing your DISTINCT elsewhere if that's actually what you intended. Does that seem silly? Yes - but I would wager you won't ever use the 2nd one outside of building Dimensions.

SELECT [c1],[c2] FROM [JDE1].[dbo].[F42119]
SELECT [c1],[c2] FROM [JDE2].[dbo].[F42119]
  SELECT [c1],[c2] FROM [JDE1].[dbo].[F42119]
  SELECT [c1],[c2] FROM [JDE2].[dbo].[F42119]

And don't get me started on unioning subqueries with GROUP BY's - Why don't you just throw half your processers in the garbage, light tempdb on fire and put all your data in heaps.

Tagged as: , 3 Comments

SQL Schema Source Control 2.0

SQL Schema Source Control is, at it's core, just another database scripting app. Where this app differs from others is that the sole purpose of scripting out the database is to check it in to source control (sorry, it still only supports SVN).

I think this may have once gone under the name "SQL2SVN," but was rebranded before being checked in to Codeplex for the first time. This is a re-write of most of the application to be more modular and extensible in the future. Additional features found in this release are:

Tagged as: , Continue reading

Monitoring Failed Report Server Subscriptions

While we love to empower our users as much as possible, we still need to pay attention to them and what they're up to. The first place I usually see users running amok as Microsoft BI adoption grows in organization is the Report Subscriptions in SSRS. They go nuts with them. I was at one company where 50% of the report subscriptions went to users outside of the company - that means our customers were depending on these subscriptions! Which brings up an interesting licensing debacle... but that's another story. Needless to say, when people set up a subscription, they expect it to work. If it doesn't, we REALLY need to let someone know.

Here, we have a procedure to monitor for Reporting Services subscription failures. Specifically, email subscription failures. When a failed subscription is detected, an email is sent to both the subscription owner and a digest of failures is sent to the specified admin group.

Tagged as: , , Continue reading

Locating Rampant Database Growth

Every now and then you get a helpful alert from one of your database servers about disk space being low (you're monitoring that, right?), or a bunch of autogrowth alerts (you're monitoring that too, right?) - but what happens when you get these for a database that you don't expect growth in? Further, what happens when that database is growing rampantly (say like 1GB/hr in my case) and it's a canned database from a 3rd party product? This time it was the database that SolarWinds uses for collection - and apparently it was collecting a lot of something that it wasn't collecting before.

  1. you send out an email asking the end users of the system (IT in this case) if anything changed
  2. adjust so said system stops trying to fill up your drives

Unfortunately, there's often quite a few possibilities for "why" a database is growing - and when it's a canned product, you don't always have the best understanding of why it does what it does when it does it. As a Database Admin, you can help diagnose the problem by letting everyone know what exactly is growing:


Script sp_configure Output To Migrate Settings Between SQL Servers

Sometimes when setting up a new server, all you want is for it to work and be configured exactly like the old server. In some cases, the new server is almost identical in every way already. If you've worked much with SQL Server, you know very well that there are a billion switches and knobs and every time you do this, you forget at least one. I had a similar situation recently and figured I'd be lazy about it and just turned the output of sp_configure in to a script I could execute on the new server:



I think most people setup the drives on their Development servers to match their Production servers - this is so that restores go smoothly and files go where they're supposed to, things match up nicely, etc. Unfortunately, when you create a full backup all the backup file contains is the logical name of all the files - no physical paths. This means that if the database doesn't exist on the destination server yet, the engine has absolutely no idea where to put the files or what to name them.

The following script can be executed in the context of the source (original) database that was backed up to script out the MOVE statements if your plan is to put the files in the same place on the destination server.


Clone Analysis Services Partitions with PowerShell

Most of us with large Analysis Services cubes partition our cubes by month or year or some other time-based slice and we have all, at one point or another, developed some way to create partitions for new months on-demand. Often, the solution to this seems to be a C# console application or SSIS package using AMO to create a new partition based off an existing partition. The problem I see with this is that maintaining it requires opening up the project or package, making changes, re-compiling, deploying, testing, deploying to production, verifying, etc. It also requires that whoever is going to maintain it is comfortable with C#.

To simplify the maintenance and get rid of the "black box" factor that utility apps like this tend to have, I put together a PowerShell script to do the same thing and a stored procedure to call the script. Really, it doesn't matter what you use as you're most likely using an almost identical chunk of code to get your new partition created - my argument is that using PowerShell instead of C# or SSIS reduces the cost of maintenance, improves readability, and facilitates better understanding throughout your team.


Trending SQL Server Agent Job Duration by Hour

Earlier today I noticed a SQL Server Agent job taking a little longer than usual (or what I thought was longer than usual). Let's face it, we're not staring at the Job Activity monitor all day, so unless you've written a report to monitor job run times - on occasion you ask yourself "is that a normal run time for this thing?" The job I was curious about happened to be a job that runs throughout the day and should only have real work to do once or twice an hour - and it should run roughly the same amount of time on any given business day for a given hour (i.e. at 1:00 PM on any given business day, this thing should do the same amount of work).

So I came up with the following query to PIVOT the run duration on the hour the job executed:


Determining when RESTORE DATABASE command will complete (SQL Server 2008)

Ah, I see you just started restoring that 1TB monster and now everyone wants to know when it's going to be finished, where you're at in the process, etc. Fear not, Microsoft is very good at making up fictional numbers for us to use as rough estimates! I usually add 10-20% on top of these estimates just incase the database gremlins wander by to ruin your day again. Or incase you encounter "storage issues."

	 percent_complete AS [PercentComplete]
	,estimated_completion_time/1000.0/60.0 AS [RemainingMinutes]
	,total_elapsed_time/1000.0/60.0 AS [ElapsedMinutes]
	,(estimated_completion_time+total_elapsed_time)/1000.0/60.0 AS [TotalMinutes]
	,DATEADD(MILLISECOND, estimated_completion_time, GETDATE()) AS [EstimatedTimeOfCompletion]
	,st.text AS [CommandSQL]
from sys.dm_exec_requests r
	cross apply sys.dm_exec_sql_text(r.sql_handle) st

Determining how long a database will be IN RECOVERY (SQL Server 2008)

So, your MSSQL service crashed in the middle of a big transaction? Or you bumped the service while it was rolling back some gigantic schema change (like say a column add on a 800 million row table)? Well, as you prepare your resume in preparation for the fallout from this debockle, you can use the following query to see how much time you have left. Or, I should say, how much time it thinks you have left... which seems to swing wildly up and down... microsoft math ftw.

DECLARE @DBName VARCHAR(64) = 'databasename'
DECLARE @ErrorLog AS TABLE([LogDate] CHAR(24), [ProcessInfo] VARCHAR(64), [Text] VARCHAR(MAX))
exec sys.xp_readerrorlog 0, 1, 'Recovery of database', @DBName
	,SUBSTRING([Text], CHARINDEX(') is ', [Text]) + 4,CHARINDEX(' complete (', [Text]) - CHARINDEX(') is ', [Text]) - 4) AS PercentComplete
	,CAST(SUBSTRING([Text], CHARINDEX('approximately', [Text]) + 13,CHARINDEX(' seconds remain', [Text]) - CHARINDEX('approximately', [Text]) - 13) AS FLOAT)/60.0 AS MinutesRemaining
	,CAST(SUBSTRING([Text], CHARINDEX('approximately', [Text]) + 13,CHARINDEX(' seconds remain', [Text]) - CHARINDEX('approximately', [Text]) - 13) AS FLOAT)/60.0/60.0 AS HoursRemaining
FROM @ErrorLog ORDER BY [LogDate] DESC

Update - Sql Server 2012 uses a different stored procedure to read the error log:

DECLARE @DBName VARCHAR(64) = 'databasename'
DECLARE @ErrorLog AS TABLE([LogDate] CHAR(24), [ProcessInfo] VARCHAR(64), [TEXT] VARCHAR(MAX))
EXEC master..sp_readerrorlog 0, 1, 'Recovery of database', @DBName
	,SUBSTRING([TEXT], CHARINDEX(') is ', [TEXT]) + 4,CHARINDEX(' complete (', [TEXT]) - CHARINDEX(') is ', [TEXT]) - 4) AS PercentComplete
	,CAST(SUBSTRING([TEXT], CHARINDEX('approximately', [TEXT]) + 13,CHARINDEX(' seconds remain', [TEXT]) - CHARINDEX('approximately', [TEXT]) - 13) AS FLOAT)/60.0 AS MinutesRemaining
	,CAST(SUBSTRING([TEXT], CHARINDEX('approximately', [TEXT]) + 13,CHARINDEX(' seconds remain', [TEXT]) - CHARINDEX('approximately', [TEXT]) - 13) AS FLOAT)/60.0/60.0 AS HoursRemaining
FROM @ErrorLog ORDER BY [LogDate] DESC