Consuming an Authenticated JSON Feed with SSIS

For this post, we'll look in to one option for consuming a JSON data feed from a service provider's API. In my case, the provider is AtTask - a web based project management service. This sort of assignment/request is pretty common as various corners of your organization simply want to use the best tool for the job. Far too infrequently does anyone really look at things like how it connects to your current systems. They say things like "we have a well documented API and a strong user community" - yeah, I've heard that a few thousand times. JSON is neat if you're using JQuery. It's horrible if you're trying to suck data down in to a SQL Server table. Which is what we're going to do.
First, we wander over to the AtTask API Documentation and figure out how to authenticate. Notice that every request needs a sessionID - and that sessionID is created by a request to the login url. This means we'll have to make at least two requests - one to nab a sessionID and another to actually get the data that we want.
First, create a new package and drop down all the basic objects we'll need:
- OLEDB Connection
- SQL Task: BEGIN TRANSACTION
- SQL Task: TRUNCATE (truncate target table)
- Data Flow: Populate table
- SQL Task: COMMIT TRANSACTION
- SQL Task: ROLLBACK TRANSACTION (failure predicates from truncate and data flow)
Because we're trying to maintain a single transaction through multiple executables, set the RetainSameConnection property of your OLEDB connection to True. You certainly could just TRUNCATE and then run the data flow, I just wanted to avoid having the end user(s) hit an empty table mid-load (the API I was working with was a tad slow at times).
Now that you have the control flow under... control... jump in to the Data flow where the interesting stuff happens. You'll need at least a Script Component Source and an OLEDB destination.
I created two variables to hold the URLs I needed to both login to the API and make the actual request

Open up the Script Component Source and add the variables to the component:

Then configure the output to match what you want to come out of the component:

Finally, edit the script and switch it to build for the .NET Framework 3.5:
Project Explorer > Properties

Then Add references that we'll need for this script:

Select Assemblies we'll need (System.Runtime.Serialization and System.ServiceModel.Web)

Make sure to Save All at this point! You want to make sure that the changes you made to the project get saved, not just the changes to the main.cs file.
And here's my completed script. The specifics of how you need to login and what the returned data feeds look like will vary, but the basic flow will likely remain the same.
using System; using System.Data; using Microsoft.SqlServer.Dts.Pipeline.Wrapper; using Microsoft.SqlServer.Dts.Runtime.Wrapper; using System.Xml; using System.Net; using System.IO; using System.Runtime.Serialization.Json; using System.Runtime.Serialization; using System.Text.RegularExpressions; [Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute] public class ScriptMain : UserComponent { public override void CreateNewOutputRows() { // loginURL should be a url that returns JSON containing a sessionID string loginURL = Variables.loginURL; // Get the sessionID HttpWebRequest request = (HttpWebRequest)WebRequest.Create(loginURL); request.Timeout = 300000; HttpWebResponse response = (HttpWebResponse)request.GetResponse(); StreamReader reader = new StreamReader(response.GetResponseStream()); string json = reader.ReadToEnd(); Match m = Regex.Match(json, "\"sessionID\": \"(.+?)\""); string sessionId = m.Groups[1].Value; // Need to support pagination, api has a 2000 item limit per req XmlNodeList items; int rowsPerPage = 1000; int page = 0; // Loop until we find the last page do { // The taskURL has 3 tokens: // {0} = The first record to return for the page we're requesting // {1} = The number of records to request // {2} = The sessionID string taskURL = string.Format( Variables.taskURL, page * rowsPerPage + 1, rowsPerPage, sessionId); request = (HttpWebRequest)WebRequest.Create(taskURL); response = (HttpWebResponse)request.GetResponse(); // Toss the JSON response in to XML so we can work with it better XmlDocument xd = new XmlDocument(); XmlDictionaryReader xr = JsonReaderWriterFactory.CreateJsonReader( response.GetResponseStream(), XmlDictionaryReaderQuotas.Max); xr.Read(); xd.LoadXml(xr.ReadOuterXml()); // Put each item returned in to a new Buffer row items = xd.DocumentElement.SelectNodes("/root/data/item"); foreach (XmlNode item in items) { TaskDataBuffer.AddRow(); TaskDataBuffer.ProjectId = item.SelectSingleNode("project/ID").Value; TaskDataBuffer.ProjectName = item.SelectSingleNode("project/name").Value; // This particular feed allows assignedTo to be empty/null if (item.SelectSingleNode("assignedTo/name") != null) { TaskDataBuffer.TaskAssignedToName = item.SelectSingleNode("assignedTo/name").Value; } else { TaskDataBuffer.TaskAssignedToName_IsNull = true; } TaskDataBuffer.TaskId = item.SelectSingleNode("ID").Value; TaskDataBuffer.TaskName = item.SelectSingleNode("name").Value; TaskDataBuffer.TaskPlannedCompletionDate = item.SelectSingleNode("plannedCompletionDate").Value; TaskDataBuffer.TaskPlannedStartDate = item.SelectSingleNode("plannedStartDate").Value; TaskDataBuffer.TaskProgressStatus = item.SelectSingleNode("progressStatus").Value; } page++; } while (items.Count == rowsPerPage); } }
Remember that each API you interface with will have it's own nuances to deal with and the above code is pretty specific for the AtTask API I was working with. The part I want to point out is after we do all the API busy work, we have that JSON feed in a stream, turn it in to XML, and then access the data using XPath selects.







December 20th, 2011 - 13:35
Great info, thanks!
How would you change your script if you had a JSON file instead of the HTTP feed?
December 23rd, 2011 - 20:06
well, the signature we’re using here for he JSON reader is (Stream, XmlDictionaryReaderQuotas). So instead of the response stream, you can simply jam a FileStream in there. Hook a stream to your file like so:
FileStream fileStream = new FileStream(@”c:\awesomeFile.json”, FileMode.Open);
then modify the jsonreader creation line to something like this:
XmlDictionaryReader xr = JsonReaderWriterFactory.CreateJsonReader(
fileStream,
XmlDictionaryReaderQuotas.Max);
then make sure to close the fileStream when everything’s all said and done.