tim laqua dot com Thoughts and Code from Tim Laqua

2Feb/123

Uncivil UNIONs

If you asked any SQL Server Developer or DBA what UNION does, they would tell you it combines two result sets. Then if you were to ask them if they knew it removed duplicates without the ALL argument, they would say yes. That's academic, but I dare you to go look at their and your code and see how often UNION without the ALL argument is used. UNION is evil. Why do I say that? Because DISTINCT is evil and blocking operations are evil so it follows that UNION is evil.

Would you be OK with the following query as the source for a fact table?

SELECT DISTINCT *
FROM
(
  SELECT DISTINCT [c1],[c2] FROM [t1]
  UNION ALL
  SELECT DISTINCT [c1],[c2] FROM [t2]
) a

No? Then why are you OK with this?

SELECT [c1],[c2] FROM [t1]
UNION
SELECT [c1],[c2] FROM [t2]

Personally, I tend to use UNION to combine data from two different sources. For example, if I wanted to combine the sales transactions from two JDE instances - the query might look like:

SELECT [c1],[c2] FROM [JDE1].[dbo].[F42119]
UNION
SELECT [c1],[c2] FROM [JDE2].[dbo].[F42119]

Each line here in a sales detail represents a line on an order, so assuming you selected all the PK columns in your query, a UNION won't remove any rows within either set, but when combined - it's possible the same PK could exist in both systems. Now you could split hairs and say it's highly unlikely the rest of the row would be identical, but that's not the point - it's wrong to ever DISTINCT a set of transactions intended for creating a fact table. Sure, in this particular case you could add a "SourceSystem" column that would stop that, but then you still have to accept that the UNION is blocking and expensive. Just because a DISTINCT doesn't modify results doesn't mean it didn't do any work - it actually did a great deal of work. Now consider that we're usually not selecting 2 columns, we're selecting 10-20. Do you want to wait for it to figure out all the rows are distinct when you already knew that? Further, if they're not distinct you probably screwed up a JOIN somewhere.

As far as coding standards regarding UNION go, I would suggest never using UNION without the ALL clause and doing your DISTINCT elsewhere if that's actually what you intended. Does that seem silly? Yes - but I would wager you won't ever use the 2nd one outside of building Dimensions.

SELECT [c1],[c2] FROM [JDE1].[dbo].[F42119]
UNION ALL
SELECT [c1],[c2] FROM [JDE2].[dbo].[F42119]
 
--OR
 
SELECT DISTINCT * FROM
(
  SELECT [c1],[c2] FROM [JDE1].[dbo].[F42119]
  UNION ALL
  SELECT [c1],[c2] FROM [JDE2].[dbo].[F42119]
)

And don't get me started on unioning subqueries with GROUP BY's - Why don't you just throw half your processers in the garbage, light tempdb on fire and put all your data in heaps.

Tagged as: , 3 Comments
11Jan/121

Hiding SSRS Schedule Jobs In SSMS

Everybody hates these things. If you're in the Activity Monitor or browsing via the SSMS tree view, these GUID jobs that represent SSRS subscriptions are really just none of our concern. Sure, I can admit that I've seen people manually fire these to re-send a given subscription, but you can just do that using the AddEvent proc in a query window. Personally - I don't want to see these... usually...

Connect to the database instance you want to filter the agent jobs out on
Browse to Databases > System Databases > msdb > Programmability > Stored Procedures > System Stored Procedures
Right-click on dbo.sp_help_category and select Modify...

At the top, change the definition of @where_clause to NVARCHAR(MAX)

19
  DECLARE @where_clause   NVARCHAR(MAX)

At the bottom, add in a few lines to append the @where_clause variable with a predicate that filters out the Report Server category when it's you from your workstation (so you can still see that category from another machine if you need to).

94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
  SELECT @cmd = @cmd + N'FROM msdb.dbo.syscategories '
 
  SET @where_clause += N'
  AND
	CASE
		WHEN 
			name = ''Report Server'' 
			AND (
				SELECT RTRIM(nt_username) + ''|'' + RTRIM(hostname) 
				FROM sys.sysprocesses 
				where spid = @@spid) = ''tlaqua|TIMSLAPTOP''  THEN 0
		ELSE 1
	END = 1 '
 
  -- Execute the query
  EXECUTE (@cmd + @where_clause + N'ORDER BY category_type, name')
 
  RETURN(@@ERROR) -- 0 means success
END

So, what on earth are we doing here? First, replace my nt_username with yours and replace my hostname with yours. From my less-than-exhaustive trial and error testing, it seems that when either the SSMS Jobs node is expanded or the Job Activity Monitor fetches jobs, two DB calls are made - one to fetch categories and another to fetch jobs. I tried filtering out the jobs portion originally and that yielded some errors. So I'm assuming it's trying to marry the categories and the jobs internally, and it expects there to be jobs for each category the first query returned. By not returning the Report Server category at all, the resulting merged list of jobs doesn't contain any jobs belonging to that category (logically an INNER JOIN).

Sure, this is a dirty hack, but I don't mind.

Update (2012-01-11)
Here is the predicate for hiding those jobs from ALL SSMS clients:

96
97
98
99
100
101
102
103
104
105
106
  SET @where_clause += N'
  AND
	CASE
		WHEN 
			name = ''Report Server'' 
			AND (
				SELECT program_name 
				FROM sys.sysprocesses 
				where spid = @@spid) = ''Microsoft SQL Server Management Studio''  THEN 0
		ELSE 1
	END = 1 '
Tagged as: , 1 Comment
18Nov/110

SSIS, BIDS and 64bit System DSNs: Who’s on first?

After an hour or so of changing settings, package configurations, protection levels, passwords, connection strings, etc - you might just still have yourself one of those dreaded red boxes (and a headache). And you thought inserting data using ODBC was easy...

You might even have yourself some logging that indicates an error like this:
[AS400 [19]] Error: ADO NET Destination has failed to acquire the connection {43E6AE37-24E8-46F6-8AB0-689DB6531167}. The connection may have been corrupted.

Assuming you are using a System DSN for your connection (this article is based on an ODBC DSN using the iSeries Access ODBC Driver to write to DB2, but it applies to any ODBC driver with 32bit and 64bit versions where a System DSN is used for the connection).

Before going on, you should understand and have checked the PackageProtectionLevel property in your package. To get any connection that requires a password to run in BIDS, you can't use DontSaveSensitive (unless you're using package configurations to enter the password or an explicit connection string). This is by far the most common reason connections fail to validate - but there's many resources out there already on that situation, so I won't go in to it.

Now that we know it SHOULD work - why doesn't it? In short, because you don't have a 64-bit version of the System DSN you selected. What sense does that make? Let me show you.

Tagged as: , Continue reading
19Oct/112

Pattern for Conditionally Sending SSRS Subscriptions

MailAlert

This is a pretty common request - send the report if there's an issue, don't send it if there's not an issue. This is a simple pattern for doing that using Data Driven Subscriptions. Here, we're not using the data driven subscription to dynamically populate recipient and parameter information, we're just using it to suppress the sending of the report if the report has no value.

To do this, we only need one thing - a query that returns one row if the report should be sent and zero rows if the report should not be sent. Here is an example - Let's say I want to send a report out if a JDE Address Book record with an AT1 of 'DC' has been updated after a certain julian date.

First, write a query to return rows meeting that criteria:

Tagged as: , Continue reading
3Jul/112

Consuming an Authenticated JSON Feed with SSIS

json

For this post, we'll look in to one option for consuming a JSON data feed from a service provider's API. In my case, the provider is AtTask - a web based project management service. This sort of assignment/request is pretty common as various corners of your organization simply want to use the best tool for the job. Far too infrequently does anyone really look at things like how it connects to your current systems. They say things like "we have a well documented API and a strong user community" - yeah, I've heard that a few thousand times. JSON is neat if you're using JQuery. It's horrible if you're trying to suck data down in to a SQL Server table. Which is what we're going to do.

First, we wander over to the AtTask API Documentation and figure out how to authenticate. Notice that every request needs a sessionID - and that sessionID is created by a request to the login url. This means we'll have to make at least two requests - one to nab a sessionID and another to actually get the data that we want.

Tagged as: , , Continue reading
8Jun/110

Revisiting Embedded Text Qualifiers and the SSIS Flat File Connection Manager

200px-Gnome-face-angry

To quickly summarize the problem at hand, the Flat File Connection Manager doesn't support embedded Text Qualifiers. The reason this is a big issue is because the most commonly used text qualifier is a double quote (") and that's also a commonly used punctuation mark. That means if you have a CSV to import and lines that looks like this:

"one","1","two","2","th""r""ee","3","asdf"
"""left","right""","mid""dle","w"",tf","w,""tf",",""omg","omg"","

You would expect to import

one 1 two 2 th"r"ee 3 asdf
"left right" mid"dle w",tf w,"tf ,"omg omg",

That's not what you get, you get an error. I've seen a few different approaches to working around this known issue:

What I'd like to explore in this post is using regular expressions to transform the entire file in to something that can be natively consumed by SSIS. There are two options for dealing with this:

  • Tokenizing the quotation marks found within the cells
  • Change the text qualifier

I chose to change the text qualifier because SSIS natively supports multiple character text qualifiers. This will allow me to pick something ridiculous that is incredibly unlikely to appear in the data naturally. Changing the text qualifier also means we won't have to add any extra transformations as we would if we tokenized the embeded quotation marks (to turn them back in to quotation marks).

20Apr/110

Reporting Services Source Control 1.0

rsscLogo

Reporting Services Source Control is an application to create copies of all the reports stored in your Report Manager and commit them to source control (SVN is the only supported source control in the 1.0 release). The need for this arises for two reasons:

  • First, we empower our users to create reports using Report Builder and those reports usually only exist the ReportServer database.
  • Second, Report Builder has matured so well that it is no longer very beneficial to create a Report Server project using Business Intelligence Development Studio (BIDS).

So now we have intellectual property generated by both end users and developers sitting in Report Manager with no version control whatsoever. Reporting Services Source Control addresses this by scripting out reports, subscriptions, and other properties and committing them to source control.

Tagged as: , , Continue reading
19Apr/110

Monitoring Failed Report Server Subscriptions

FinkCommander_in_use

While we love to empower our users as much as possible, we still need to pay attention to them and what they're up to. The first place I usually see users running amok as Microsoft BI adoption grows in organization is the Report Subscriptions. They go nuts with them. I was at one company where 50% of the report subscriptions went to users outside of the company - that means our customers were depending on these subscriptions! Which brings up an interesting licensing debacle... but that's another story. Needless to say, when people set up a subscription, they expect it to work. If it doesn't, we REALLY need to let someone know.

Here, we have a procedure to monitor for Reporting Services subscription failures. Specifically, email subscription failures. When a failed subscription is detected, an email is sent to both the subscription owner and a digest of failures is sent to the specified admin group.

Tagged as: , , Continue reading
18Apr/110

Parallel Cube Processor 1.0

codeplex-logo_3

Parallel Cube Processor (PCP) is an application designed to process multiple Analysis Services databases at the same time. I originally wrote this application with the assumption that it would be blazing fast and essentially warp space and time. As it turns out, SSAS appears to be going as fast as possible at all times, so giving it more stuff to do doesn't really make the whole shebang faster - just different. Play around with it, watch some performance counters - it's an interesting conundrum. The application can handle parallelism using two methods.

The first method is XMLAParallel - in this mode, the application constructs an XMLA command to process the specified number of databases inside of a Parallel element. Only one XMLA command is issued at a time so the applicatoin will not submit another command until the longest processing job completes in each batch.

Tagged as: , , Continue reading
15Apr/112

Dealing With Long Running SSAS Queries using Powershell

powershellFeatured

So, your Analysis Services instance keeps eating all the memory on your server despite proper memory configuration? This happens when queries get too big for their britches - SSAS memory limits are basically soft limits. Whatever an active query requires in terms of resources, SSAS will attempt to provide it. Even if it has to page the entire operating system out to disk to do it. You can mitigate this with good cube design, better aggregations, user training, etc - but it's bound to happen. When your cube gets big enough, one of your users will blow the entire server up.