tim laqua dot com - Thoughts and Code from Tim Laqua

11 Jun 2012

SQL 2012: XMLA Change Connection Command Missing

I'm fairly sure this has to do with the options I chose when I first opened SSMS, where I asked it to import my settings. Regardless of why it happened, my Change Connection command is gone from my context menu when editing XMLA files:

I use this quite a bit as we have many environments, and most of those environments also have staging environments. Here's how to manually add it back (and add it to any other context menus you want):

15 Mar 2012

Automating PowerPivot Data Refresh in Sharepoint 2010

Of course we want to do this, it's a fundamental requirement. Dear Sharepoint, please refresh my PowerPivot when the data is ready, not on some arbitrary schedule. Until this functionality is built-in, people will continue to hack away at it to figure out how to make this happen. Is this supported? Certainly not - so continue at your own risk.

So the method we'll be using here is simply to mimic what Sharepoint does when you check the "Also refresh as soon as possible" box on the schedule configuration page. To see what that is, open Profiler, connect it to whatever instance hosts your Sharepoint PowerPivot database, and filter TextData to "%Schedule%" (or filter to just the Sharepoint PowerPivot database - SP2010_PowerPivot_Service_Application in our case). Then open your test PowerPivot schedule configuration, check the box, click OK, wait for the schedule history to come back up, and stop the trace. Now you know you've got what you need, you just have to find it:

27 Feb 2012

Slowly Changing Dimensions with MD5 Hashes in SSIS

We recently moved away from the 3rd party Checksum component (and all 3rd party components) in SSIS, and I wanted to share the pattern we settled on for maintaining our Type 1 Slowly Changing Dimensions (SCDs). There were two things we wanted to address with the new pattern. First, our previous implementation wasn't performing as well as we needed it to, nor was it generating reliable checksums. Second, we wanted to get away from dependencies on custom assemblies in general. To illustrate the pattern, we're going to build an SCD package off the Adventure Works DW DimCustomer table and skip over the actual source of the business keys and attributes by selecting directly from the completed dimension for now.

First, we assume that our dimension already exists (and we were using some other checksum or MERGE to maintain it). We have to add a column to store the MD5 hash:

ALTER TABLE dbo.DimCustomer ADD
	MD5 VARCHAR(34) NOT NULL DEFAULT ''

Second, we need a staging table to store updated/changed rows. Script out the current dimension as a CREATE, remove all unneeded constraints and indexes, and create a staging table as a heap:

CREATE TABLE [dbo].[Staging_DimCustomer_UpdatedRows](
	[CustomerKey] [int] NOT NULL,
	[GeographyKey] [int] NULL,
	[CustomerAlternateKey] [nvarchar](15) NOT NULL,
	[Title] [nvarchar](8) NULL,
	[FirstName] [nvarchar](50) NULL,
	[MiddleName] [nvarchar](50) NULL,
	[LastName] [nvarchar](50) NULL,
	[NameStyle] [bit] NULL,
	[BirthDate] [datetime] NULL,
	[MaritalStatus] [nchar](1) NULL,
	[Suffix] [nvarchar](10) NULL,
	[Gender] [nvarchar](1) NULL,
	[EmailAddress] [nvarchar](50) NULL,
	[YearlyIncome] [money] NULL,
	[TotalChildren] [tinyint] NULL,
	[NumberChildrenAtHome] [tinyint] NULL,
	[EnglishEducation] [nvarchar](40) NULL,
	[SpanishEducation] [nvarchar](40) NULL,
	[FrenchEducation] [nvarchar](40) NULL,
	[EnglishOccupation] [nvarchar](100) NULL,
	[SpanishOccupation] [nvarchar](100) NULL,
	[FrenchOccupation] [nvarchar](100) NULL,
	[HouseOwnerFlag] [nchar](1) NULL,
	[NumberCarsOwned] [tinyint] NULL,
	[AddressLine1] [nvarchar](120) NULL,
	[AddressLine2] [nvarchar](120) NULL,
	[Phone] [nvarchar](20) NULL,
	[DateFirstPurchase] [datetime] NULL,
	[CommuteDistance] [nvarchar](15) NULL,
	[MD5] [varchar](34) NOT NULL)

Now on to SSIS - we will be building:

  1. Execute SQL Task to Truncate our Staging table(s)
  2. Data Flow Task to Insert new rows and Stage updated rows
     1. OLE DB Source to retrieve our source data
     2. Script Component to Generate Row Numbers
     3. Conditional Split to Evenly Distribute Rows
     4. Script Component to Generate MD5 Hashes
     5. Union All to Squish it all back together
     6. Lookup to get the existing MD5 Hash (if it exists)
     7. Conditional Split to separate Unchanged and Changed rows
     8. RowCount Transformation
     9. OLE DB Destination for Changed rows
     10. OLE DB Destination for New rows
  3. Execute SQL Task to Update changed rows (sketched below)

[Screenshot: Completed Control Flow]

[Screenshot: Completed Data Flow]
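A minimal sketch of the two Execute SQL Task statements - the truncate that runs before the data flow (item 1) and the set-based update that applies the staged changes afterwards (item 3). This assumes the update joins on the CustomerAlternateKey business key; only a few of the attribute columns are listed for brevity:

-- Item 1: clear the staging table before the data flow runs
TRUNCATE TABLE dbo.Staging_DimCustomer_UpdatedRows;

-- Item 3: apply staged changes back to the dimension after the data flow
UPDATE d
SET d.GeographyKey = s.GeographyKey,
    d.Title        = s.Title,
    d.FirstName    = s.FirstName,
    d.LastName     = s.LastName,
    -- ...remaining attribute columns...
    d.MD5          = s.MD5
FROM dbo.DimCustomer d
    INNER JOIN dbo.Staging_DimCustomer_UpdatedRows s
        ON s.CustomerAlternateKey = d.CustomerAlternateKey;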

    3 Feb 2012

    SSAS Cache Isn’t Making Cents

    I stole the pun from my Business Analyst, Mr. John Seiler 😉 Now on to my issue - when SSAS caches the value for my [Actual] measure, it seems to do so based on the results of the first query that requests that coordinate. In this particular cube there are bunches of tiny fractions, and depending on how you slice it, it aggregates a little differently. That's a fun problem in itself, but the part that drives me (and the Finance department) crazy is that if you go and slice on something OTHER than what that first query (the one that created the cache) sliced on, the values you see don't always add up to the "Grand Total" in Excel - a.k.a. the [All] member.

    These are the queries used for this test:

    Query A

    SELECT 
    	{Actual} ON 0
    FROM [GL] 
    CELL PROPERTIES VALUE

    Query B

    SELECT 
    	{Actual} ON 0,
    	NON EMPTY Hierarchize({DrilldownLevel({[Account].[Accounts].[All]},,,INCLUDE_CALC_MEMBERS)}) ON 1
    FROM [GL] 
    CELL PROPERTIES VALUE

    Query C

    SELECT 
    	{Actual} ON 0,
    	NON EMPTY Hierarchize({DrilldownLevel({[Date].[Fiscal].[All]},,,INCLUDE_CALC_MEMBERS)}) ON 1
    FROM [GL] 
    CELL PROPERTIES VALUE
    GO

    Results (Cache was cleared with ClearCache on the entire Database before each Series)

    [Results table screenshot: Series 1 / Series 2 / Series 3]

    So basically the Grand Total of this GL cube is a random number based on whoever sneaks the first query in after the cache is cleared (processing, sync, etc).

    And for all of you that think the MDX script breaks everything everywhere - I did comment out my entire MDX script before running these tests.

    2 Feb 2012

    Uncivil UNIONs

    If you asked any SQL Server developer or DBA what UNION does, they would tell you it combines two result sets. If you then asked whether they knew it removes duplicates when the ALL argument is omitted, they would say yes. That's academic - but I dare you to go look at their code (and your own) and see how often UNION without the ALL argument is used. UNION is evil. Why do I say that? Because DISTINCT is evil and blocking operations are evil, so it follows that UNION is evil.

    Would you be OK with the following query as the source for a fact table?

    SELECT DISTINCT *
    FROM
    (
      SELECT DISTINCT [c1],[c2] FROM [t1]
      UNION ALL
      SELECT DISTINCT [c1],[c2] FROM [t2]
    ) a

    No? Then why are you OK with this?

    SELECT [c1],[c2] FROM [t1]
    UNION
    SELECT [c1],[c2] FROM [t2]

    Personally, I tend to use UNION to combine data from two different sources. For example, if I wanted to combine the sales transactions from two JDE instances - the query might look like:

    SELECT [c1],[c2] FROM [JDE1].[dbo].[F42119]
    UNION
    SELECT [c1],[c2] FROM [JDE2].[dbo].[F42119]

    Each line in a sales detail file represents a line on an order, so assuming you selected all the PK columns in your query, a UNION won't remove any rows within either set - but when the sets are combined, it's possible the same PK could exist in both systems. Now you could split hairs and say it's highly unlikely the rest of the row would be identical, but that's not the point - it's wrong to ever DISTINCT a set of transactions intended for creating a fact table. Sure, in this particular case you could add a "SourceSystem" column that would prevent it (sketched below), but then you still have to accept that the UNION is blocking and expensive. Just because a DISTINCT doesn't modify the results doesn't mean it didn't do any work - it actually did a great deal of work. Now consider that we're usually not selecting 2 columns, we're selecting 10-20. Do you want to wait for it to figure out that all the rows are distinct when you already knew that? Further, if they're not distinct, you probably screwed up a JOIN somewhere.
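    If identical rows across the two instances are a real possibility, a sketch of that SourceSystem approach (the column name here is just illustrative) keeps the cheap, non-blocking UNION ALL and makes the rows unique by construction:

    SELECT 'JDE1' AS [SourceSystem], [c1],[c2] FROM [JDE1].[dbo].[F42119]
    UNION ALL
    SELECT 'JDE2' AS [SourceSystem], [c1],[c2] FROM [JDE2].[dbo].[F42119]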

    As far as coding standards regarding UNION go, I would suggest never using UNION without the ALL clause and doing your DISTINCT elsewhere if that's actually what you intended. Does that seem silly? Yes - but I would wager you won't ever use the 2nd one outside of building Dimensions.

    SELECT [c1],[c2] FROM [JDE1].[dbo].[F42119]
    UNION ALL
    SELECT [c1],[c2] FROM [JDE2].[dbo].[F42119]
     
    --OR
     
    SELECT DISTINCT * FROM
    (
      SELECT [c1],[c2] FROM [JDE1].[dbo].[F42119]
      UNION ALL
      SELECT [c1],[c2] FROM [JDE2].[dbo].[F42119]
    ) a

    And don't get me started on unioning subqueries with GROUP BYs - why don't you just throw half your processors in the garbage, light tempdb on fire, and put all your data in heaps.
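    (If you genuinely need to aggregate across the two sources, one sketch that avoids grouping inside each branch is to UNION ALL the raw rows and do a single GROUP BY over the derived table - assuming, of course, that a combined aggregate is actually what you want:)

    SELECT [c1], SUM([c2]) AS [c2]
    FROM
    (
      SELECT [c1],[c2] FROM [t1]
      UNION ALL
      SELECT [c1],[c2] FROM [t2]
    ) a
    GROUP BY [c1]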

    11 Jan 2012

    Hiding SSRS Schedule Jobs In SSMS

    Everybody hates these things. If you're in the Activity Monitor or browsing via the SSMS tree view, these GUID jobs that represent SSRS subscriptions are really just none of our concern. Sure, I can admit that I've seen people manually fire these to re-send a given subscription, but you can just do that using the AddEvent proc in a query window. Personally - I don't want to see these... usually...
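    (For reference, firing one manually from a query window is just a call to that proc - the same pattern as the ad-hoc scheduling script later in this archive; swap in the subscription's GUID:)

    DECLARE @id VARCHAR(260) = '<SubscriptionID>' -- GUID of the subscription to fire
    EXEC [ReportServer].dbo.AddEvent @EventType = 'TimedSubscription', @EventData = @id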

    1. Connect to the database instance you want to filter the agent jobs out on.
    2. Browse to Databases > System Databases > msdb > Programmability > Stored Procedures > System Stored Procedures.
    3. Right-click on dbo.sp_help_category and select Modify...

    At the top, change the definition of @where_clause to NVARCHAR(MAX)

      DECLARE @where_clause   NVARCHAR(max)

    At the bottom, add in a few lines to append the @where_clause variable with a predicate that filters out the Report Server category when it's you from your workstation (so you can still see that category from another machine if you need to).

      SELECT @cmd = @cmd + N'FROM msdb.dbo.syscategories '
     
      SET @where_clause += N'
      AND
    	CASE
    		WHEN 
    			name = ''Report Server'' 
    			AND (
    				SELECT RTRIM(nt_username) + ''|'' + RTRIM(hostname) 
    				FROM sys.sysprocesses 
    				where spid = @@spid) = ''tlaqua|TIMSLAPTOP''  THEN 0
    		ELSE 1
    	END = 1 '
     
      -- Execute the query
      EXECUTE (@cmd + @where_clause + N'ORDER BY category_type, name')
     
      RETURN(@@error) -- 0 means success
    END

    So, what on earth are we doing here? First, replace my nt_username with yours and replace my hostname with yours. From my less-than-exhaustive trial and error testing, it seems that when either the SSMS Jobs node is expanded or the Job Activity Monitor fetches jobs, two DB calls are made - one to fetch categories and another to fetch jobs. I tried filtering out the jobs portion originally and that yielded some errors. So I'm assuming it's trying to marry the categories and the jobs internally, and it expects there to be jobs for each category the first query returned. By not returning the Report Server category at all, the resulting merged list of jobs doesn't contain any jobs belonging to that category (logically an INNER JOIN).

    Sure, this is a dirty hack, but I don't mind.

    Update (2012-01-11)
    Here is the predicate for hiding those jobs from ALL SSMS clients:

      SET @where_clause += N'
      AND
    	CASE
    		WHEN 
    			name = ''Report Server'' 
    			AND (
    				SELECT program_name 
    				FROM sys.sysprocesses 
    				where spid = @@spid) = ''Microsoft SQL Server Management Studio''  THEN 0
    		ELSE 1
    	END = 1 '
    18 Nov 2011

    SSIS, BIDS and 64bit System DSNs: Who’s on first?

    After an hour or so of changing settings, package configurations, protection levels, passwords, connection strings, etc - you might just still have yourself one of those dreaded red boxes (and a headache). And you thought inserting data using ODBC was easy...

    You might even have yourself some logging that indicates an error like this:
    [AS400 [19]] Error: ADO NET Destination has failed to acquire the connection {43E6AE37-24E8-46F6-8AB0-689DB6531167}. The connection may have been corrupted.

    This article assumes you are using a System DSN for your connection (it's based on an ODBC DSN using the iSeries Access ODBC Driver to write to DB2, but it applies to any ODBC driver with 32-bit and 64-bit versions where a System DSN is used for the connection).

    Before going on, you should understand and have checked the PackageProtectionLevel property in your package. To get any connection that requires a password to run in BIDS, you can't use DontSaveSensitive (unless you're using package configurations to supply the password or an explicit connection string). This is by far the most common reason connections fail to validate - but there are many resources out there already on that situation, so I won't go into it.

    Now that we know it SHOULD work - why doesn't it? In short, because you don't have a 64-bit version of the System DSN you selected. What sense does that make? Let me show you.

    Continue reading...
    19 Oct 2011

    Monitoring and Starting Data Collector Sets with Powershell

    I figure it's about time I posted an update to my previous post on Monitoring, Starting, and Stopping Perfmon Performance Counter Logs using VBScript. In the years since that post, a few operating systems came out and several far more interesting languages happened by. So this update is in Powershell. You could do it in pretty much any language, but Powershell seems to be my weapon of choice lately.

    The basic premise here is that we want our Data Collector Sets to always be running. Seems like a reasonable request, but for various reasons, Data Collector Sets just... stop. This script should be run regularly (every 5 minutes?) to ensure that the sets keep collecting data. In the event that the script finds some work to do, it will send you an email.

    #======= START Config ========
    $emailServer = "1.2.3.4";
    $fromAddress = "you@yourcompany.com";
    $toAddress = "you@yourcompany.com";
    $ignorePattern = '^(RandomCollectorSet|OSTest)$';
    #======== END Config =========
     
    $ErrorActionPreference = "SilentlyContinue"
    $serverName = Get-Content env:computername;
    $emailBody = ""
     
    $collectors = New-Object -COM Pla.DataCollectorSetCollection;
    $collectors.GetDataCollectorSets($null, "Service\*");
    $collectors._NewEnum | ?{$_.Status -ne 1 -and $_.Name -notmatch $ignorePattern} | %{
    	$_.Start($true);
    	if($_.Status -eq 1) {
    		$emailBody += "Started $($_.Name)`n"
    	} else {
    		$emailBody += "Failed to start $($_.Name)`n"
    	}
    };
     
    if($emailBody -ne "") {
      $smtp = new-object Net.Mail.SmtpClient($emailServer);
      $smtp.Send(
        "$serverName <$fromAddress>", 
        $toAddress, 
        "Started Data Collector Set(s)", 
        $emailBody);
    }

    If you want the script to keep any of the non-user created collectors running, modify the namespace parameter of the GetDataCollectorSets method call.

    19 Oct 2011

    Pattern for Conditionally Sending SSRS Subscriptions

    This is a pretty common request - send the report if there's an issue, don't send it if there's not an issue. This is a simple pattern for doing that using Data Driven Subscriptions. Here, we're not using the data driven subscription to dynamically populate recipient and parameter information, we're just using it to suppress the sending of the report if the report has no value.

    To do this, we only need one thing - a query that returns one row if the report should be sent and zero rows if the report should not be sent. Here is an example - Let's say I want to send a report out if a JDE Address Book record with an AT1 of 'DC' has been updated after a certain julian date.

    First, write a query to return rows meeting that criteria:
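    As a rough sketch of the shape such a query takes (the F0101 Address Book file, its ABAT1 search-type and ABUPMJ updated-date columns, and the Julian cutoff below are illustrative assumptions on my part), it returns a single row only when something qualifies:

    SELECT 1 AS [SendReport]
    WHERE EXISTS (
        SELECT 1
        FROM [JDE].[dbo].[F0101]   -- Address Book master (assumed location)
        WHERE ABAT1 = 'DC'
          AND ABUPMJ > 112001      -- 'updated after' cutoff in JDE Julian (CYYDDD) format
    )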

    Continue reading...
    19 Oct 2011

    Script for Ad-Hoc SSRS TimedSubscription Scheduling

    Quick script to locate and execute SSRS timed subscriptions (useful for testing).

    1. Execute the top (uncommented) part and identify which subscription you want to schedule.
    2. Paste the SubscriptionID from that row into the @id variable in the lower section.
    3. Run the entire lower (commented) section.

    DECLARE @ReportName VARCHAR(255) = 'Report Name'
     
    select
          a.Path
          ,b.SubscriptionID 
          ,b.Description
          ,CAST(b.Parameters AS XML) AS [Parameters]
          ,b.LastRunTime
          ,b.LastStatus
          ,CAST(b.ExtensionSettings AS XML) AS [ExtensionSettings]
    from ReportServer.dbo.[Catalog] a
          INNER JOIN ReportServer.dbo.[Subscriptions] b
                ON b.Report_OID = a.ItemID
    where a.Name = @ReportName
     
    /*
    DECLARE 
           @id varchar(260) = ''
          ,@Type VARCHAR(32)                  = 'TimedSubscription'
     
    EXEC [ReportServer].dbo.AddEvent @EventType=@Type, @EventData=@id
     
    */