tim laqua dot com Thoughts and Code from Tim Laqua

3 Jul 2011

Consuming an Authenticated JSON Feed with SSIS

For this post, we'll look into one option for consuming a JSON data feed from a service provider's API. In my case, the provider is AtTask - a web-based project management service. This sort of assignment/request is pretty common as various corners of your organization simply want to use the best tool for the job. Far too infrequently does anyone really look at things like how it connects to your current systems. They say things like "we have a well documented API and a strong user community" - yeah, I've heard that a few thousand times. JSON is neat if you're using jQuery. It's horrible if you're trying to suck data down into a SQL Server table. Which is what we're going to do.

First, we wander over to the AtTask API Documentation and figure out how to authenticate. Notice that every request needs a sessionID - and that sessionID is created by a request to the login URL. This means we'll have to make at least two requests - one to nab a sessionID and another to actually get the data that we want.
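That first hop can be sketched on its own - request the login URL, read the JSON body, and fish the sessionID out. The response body below is made up for illustration; the full script later in the post does the same thing against the live API:

```csharp
using System;
using System.Text.RegularExpressions;

class SessionIdSketch
{
    static void Main()
    {
        // Made-up login response body - the real one comes back from an
        // HttpWebRequest against the provider's login URL
        string json = "{\"data\": {\"sessionID\": \"abc123\", \"userID\": \"42\"}}";

        // Cheap-and-cheerful extraction; fine when you only need one field
        Match m = Regex.Match(json, "\"sessionID\": \"(.+?)\"");
        string sessionId = m.Success ? m.Groups[1].Value : null;
        Console.WriteLine(sessionId); // abc123
    }
}
```

Note the regex assumes a space after the colon, exactly as the provider happens to emit it. If the formatting isn't guaranteed, a proper JSON parse (like the one used for the task feed itself) is sturdier.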

First, create a new package and drop down all the basic objects we'll need:

  1. OLEDB Connection
  2. SQL Task: BEGIN TRANSACTION
  3. SQL Task: TRUNCATE (truncate target table)
  4. Data Flow: Populate table
  5. SQL Task: COMMIT TRANSACTION
  6. SQL Task: ROLLBACK TRANSACTION (failure predicates from truncate and data flow)

Suggested Control Flow

Because we're trying to maintain a single transaction through multiple executables, set the RetainSameConnection property of your OLEDB connection to True. You certainly could just TRUNCATE and then run the data flow; I just wanted to avoid having the end user(s) hit an empty table mid-load (the API I was working with was a tad slow at times).

Now that you have the control flow under... control... jump into the Data Flow, where the interesting stuff happens. You'll need at least a Script Component source and an OLEDB destination.

Suggested Data Flow

I created two variables to hold the URLs I needed to both log in to the API and make the actual request.
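For example (the endpoint and parameter names here are hypothetical stand-ins - check your provider's docs for the real ones), the task URL can carry {0}/{1}/{2} tokens so the script can fill in the paging values and the sessionID with string.Format:

```csharp
using System;

class UrlVariablesSketch
{
    static void Main()
    {
        // Hypothetical stand-ins for the two SSIS variables
        string loginURL =
            "https://example.attask-ondemand.com/attask/api/login?username=me&password=secret";
        string taskURL =
            "https://example.attask-ondemand.com/attask/api/task/search?$$FIRST={0}&$$LIMIT={1}&sessionID={2}";

        // {0} = first record to return, {1} = page size, {2} = sessionID
        Console.WriteLine(loginURL);
        Console.WriteLine(string.Format(taskURL, 1, 1000, "abc123"));
    }
}
```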

Open up the Script Component Source and add the variables to the component:

Then configure the output to match what you want to come out of the component:

Finally, edit the script and switch it to build for the .NET Framework 3.5:
Project Explorer > Properties

Select .NET 3.5

Then add the references we'll need for this script:

Select Assemblies we'll need (System.Runtime.Serialization and System.ServiceModel.Web)

Make sure to Save All at this point! You want to make sure that the changes you made to the project get saved, not just the changes to the main.cs file.

And here's my completed script. The specifics of how you need to login and what the returned data feeds look like will vary, but the basic flow will likely remain the same.

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.Xml;
using System.Net;
using System.IO;
using System.Runtime.Serialization.Json;
using System.Runtime.Serialization;
using System.Text.RegularExpressions;
 
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void CreateNewOutputRows()
    {
        // loginURL should be a url that returns JSON containing a sessionID
        string loginURL = Variables.loginURL;
 
        // Get the sessionID
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(loginURL);
        request.Timeout = 300000;
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();
        StreamReader reader = new StreamReader(response.GetResponseStream());
        string json = reader.ReadToEnd();
        Match m = Regex.Match(json, "\"sessionID\": \"(.+?)\"");
        string sessionId = m.Groups[1].Value;
 
        // Need to support pagination, api has a 2000 item limit per req
        XmlNodeList items;
        int rowsPerPage = 1000;
        int page = 0;
 
        // Loop until we find the last page
        do {
            // The taskURL has 3 tokens:
            // {0} = The first record to return for the page we're requesting
            // {1} = The number of records to request
            // {2} = The sessionID
            string taskURL = string.Format(
                Variables.taskURL, 
                page * rowsPerPage + 1, 
                rowsPerPage, 
                sessionId);
            request = (HttpWebRequest)WebRequest.Create(taskURL);
            response = (HttpWebResponse)request.GetResponse();
 
            // Toss the JSON response in to XML so we can work with it better
            XmlDocument xd = new XmlDocument();
            XmlDictionaryReader xr = JsonReaderWriterFactory.CreateJsonReader(
                response.GetResponseStream(), 
                XmlDictionaryReaderQuotas.Max);
            xr.Read();
            xd.LoadXml(xr.ReadOuterXml());
 
            // Put each item returned in to a new Buffer row
            items = xd.DocumentElement.SelectNodes("/root/data/item");
            foreach (XmlNode item in items)
            {
                TaskDataBuffer.AddRow();

                // Read element text with InnerText - XmlNode.Value is null
                // for element nodes, so it can't be used here
                TaskDataBuffer.ProjectId = 
                    item.SelectSingleNode("project/ID").InnerText;
                TaskDataBuffer.ProjectName = 
                    item.SelectSingleNode("project/name").InnerText;

                // This particular feed allows assignedTo to be empty/null
                if (item.SelectSingleNode("assignedTo/name") != null)
                {
                    TaskDataBuffer.TaskAssignedToName = 
                        item.SelectSingleNode("assignedTo/name").InnerText;
                }
                else
                {
                    TaskDataBuffer.TaskAssignedToName_IsNull = true;
                }

                TaskDataBuffer.TaskId = 
                    item.SelectSingleNode("ID").InnerText;
                TaskDataBuffer.TaskName = 
                    item.SelectSingleNode("name").InnerText;
                TaskDataBuffer.TaskPlannedCompletionDate = 
                    item.SelectSingleNode("plannedCompletionDate").InnerText;
                TaskDataBuffer.TaskPlannedStartDate = 
                    item.SelectSingleNode("plannedStartDate").InnerText;
                TaskDataBuffer.TaskProgressStatus = 
                    item.SelectSingleNode("progressStatus").InnerText;
            }
            page++;
        } while (items.Count == rowsPerPage);
    }
}
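A note on TaskDataBuffer: that class isn't something you write. The SSIS designer generates it from the script component's Output configuration - an Output named "Task Data" yields TaskDataBuffer, one write-only property per column, plus a <Column>_IsNull setter for nullable columns. Here's a simplified sketch of its shape (the real generated class writes to the data flow pipeline; this one fakes the buffer with an in-memory list so it runs outside SSIS):

```csharp
using System;
using System.Collections.Generic;

// Simplified sketch of the designer-generated buffer wrapper - NOT the
// actual generated code, just the shape of it
public class TaskDataBufferSketch
{
    private readonly List<Dictionary<string, object>> rows =
        new List<Dictionary<string, object>>();
    private Dictionary<string, object> current;

    public void AddRow()
    {
        current = new Dictionary<string, object>();
        rows.Add(current);
    }

    // One write-only property per Output column...
    public string ProjectId { set { current["ProjectId"] = value; } }
    public string TaskName { set { current["TaskName"] = value; } }

    // ...plus an _IsNull setter for nullable columns
    public bool TaskAssignedToName_IsNull { set { current["TaskAssignedToName"] = null; } }

    public int RowCount { get { return rows.Count; } }
}

class Demo
{
    static void Main()
    {
        var buffer = new TaskDataBufferSketch();
        buffer.AddRow();
        buffer.ProjectId = "P1";
        buffer.TaskName = "First task";
        buffer.TaskAssignedToName_IsNull = true;
        Console.WriteLine(buffer.RowCount); // 1
    }
}
```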

Remember that each API you interface with will have its own nuances to deal with, and the above code is pretty specific to the AtTask API I was working with. The part I want to point out is that after we do all the API busywork, we have that JSON feed in a stream, turn it into XML, and then access the data using XPath selects.
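To see that trick in isolation, here's a minimal console sketch (with a made-up JSON string shaped like the feed above) of pushing JSON through JsonReaderWriterFactory and querying it with XPath:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Runtime.Serialization.Json;

class JsonToXmlSketch
{
    static void Main()
    {
        // Made-up JSON shaped like the feed in the post: data -> item[]
        string json = "{\"data\": [ {\"ID\": \"1\", \"name\": \"First task\"}, "
                    + "{\"ID\": \"2\", \"name\": \"Second task\"} ]}";

        // JsonReaderWriterFactory exposes the JSON as XML: the document
        // element is <root>, and array entries become <item> elements
        XmlDocument xd = new XmlDocument();
        using (XmlDictionaryReader xr = JsonReaderWriterFactory.CreateJsonReader(
            new MemoryStream(Encoding.UTF8.GetBytes(json)),
            XmlDictionaryReaderQuotas.Max))
        {
            xr.Read();
            xd.LoadXml(xr.ReadOuterXml());
        }

        // Same XPath style the script component uses
        foreach (XmlNode item in xd.SelectNodes("/root/data/item"))
        {
            Console.WriteLine(item.SelectSingleNode("ID").InnerText
                + " - " + item.SelectSingleNode("name").InnerText);
        }
    }
}
```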

Comments (14) Trackbacks (1)
  1. Great info, thanks!

    How would you change your script if you had a JSON file instead of the HTTP feed?

    • well, the signature we're using here for the JSON reader is (Stream, XmlDictionaryReaderQuotas). So instead of the response stream, you can simply jam a FileStream in there. Hook a stream to your file like so:

      FileStream fileStream = new FileStream(@"c:\awesomeFile.json", FileMode.Open);

      then modify the jsonreader creation line to something like this:

      XmlDictionaryReader xr = JsonReaderWriterFactory.CreateJsonReader(
      fileStream,
      XmlDictionaryReaderQuotas.Max);

      then make sure to close the fileStream when everything’s all said and done.

      • Hi, I tried to do this with a file instead of the HTTP feed, but it doesn't work:

        XmlNodeList items;

        FileStream fileStream = new FileStream("D:\\SQL2008R2\\Data\\Lot\\file.json", FileMode.Open);

        // Toss the JSON response in to XML so we can work with it better
        XmlDocument xd = new XmlDocument();

        XmlDictionaryReader xr = JsonReaderWriterFactory.CreateJsonReader(fileStream, XmlDictionaryReaderQuotas.Max);
        xr.Read();
        xd.LoadXml(xr.ReadOuterXml());

        // Put each item returned in to a new Buffer row
        items = xd.DocumentElement.SelectNodes("/records");
        MessageBox.Show(items.Count.ToString());

        foreach (XmlNode item in items)
        {

        TaskDataBuffer.AddRow();

        TaskDataBuffer.number =
        item.SelectSingleNode("number").Value;
        TaskDataBuffer.Date =
        item.SelectSingleNode("date").Value;
        TaskDataBuffer.time =
        item.SelectSingleNode("time").Value;
        TaskDataBuffer.user =
        item.SelectSingleNode("user").Value;

        }
        MessageBox.Show("fin");

        }

        my json file

        {

        "records": [
        {
        "number": "123456",
        "created_on": "2014-01-14 15:40:20",
        "time": "1970-01-01 00:00:07",
        "user": "789456"
        }

  2. Hi Tim,

    I’m trying to do the same thing – extract AtTask data into SQL to visualize in SharePoint but am new to SSIS and haven’t done much database coding. What is the declaration for TaskDataBuffer?

    Thanks,
    -Haniel

    • That’s all automatically wired up by the SSIS designer (BIDS or SSDT). It’s based on how you configured the Output objects of the script component. Each Output will have its own Buffer.

  3. Genius, thanks for your info!

    How would you change your script if I want to read a SQL query to generate Json?

  4. Solid overview, Tim. Since I’m stuck in the world of SSIS 2008, have there been improvements in 2012 or 2014 that you’re aware of in new widgets to work with RESTful services? Maybe something that avoids some of the C#/VB coding?

  5. I am trying to incorporate your do while loop in my code to get more than 200 results (that’s the limit from my api). Can you help please?
    Here is my code:
    using System;
    using System.Data;
    using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
    using Microsoft.SqlServer.Dts.Runtime.Wrapper;
    using System.Net;
    using Microsoft.SqlServer.Dts.Runtime;
    using System.Windows.Forms;
    using System.IO;
    using System.Web.Script.Serialization;
    using System.Collections.Generic;

    [Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
    public class ScriptMain : UserComponent
    {

    public override void CreateNewOutputRows()
    {
    string wUrl = "https://api.pipelinedeals.com/api/v3/deals.json?api_key=xxxxxxxxxxxxxxxxxxxxxx";

    try
    {
    RootObject outPutMetrics = getWebServiceResult(wUrl);

    foreach (var metric in outPutMetrics.entries)
    {
    Output0Buffer.AddRow();
    Output0Buffer.closedtime = metric.closed_time;
    Output0Buffer.companyid = metric.company_id;
    Output0Buffer.createdat = metric.created_at;
    Output0Buffer.dealstageid = metric.deal_stage_id;
    Output0Buffer.expectedclosedate = metric.expected_close_date;
    Output0Buffer.expectedclosedateeventid = metric.expected_close_date_event_id;
    Output0Buffer.id = metric.id;
    Output0Buffer.importid = metric.import_id;
    Output0Buffer.isarchived = metric.is_archived;
    Output0Buffer.isexample = metric.is_example;
    Output0Buffer.name = metric.name;
    Output0Buffer.primarycontactid = metric.primary_contact_id;
    Output0Buffer.probability = metric.probability;
    Output0Buffer.sourceid = metric.source_id;
    Output0Buffer.status = metric.status;
    Output0Buffer.summary = metric.summary;
    Output0Buffer.updatedat = metric.updated_at;
    Output0Buffer.userid = metric.user_id;
    Output0Buffer.value = metric.value;
    Output0Buffer.valueincents = metric.value_in_cents;
    }

    }
    catch (Exception e)
    {
    failComponent(e.ToString());
    }

    }

    private RootObject getWebServiceResult(string wUrl)
    {

    HttpWebRequest httpWReq = (HttpWebRequest)WebRequest.Create("https://api.pipelinedeals.com/api/v3/deals.json?api_key=xxxxxxxxxxxxxxxxx");
    HttpWebResponse httpWResp = (HttpWebResponse)httpWReq.GetResponse();
    RootObject jsonResponse = null;

    try
    {

    if (httpWResp.StatusCode == HttpStatusCode.OK)
    {

    Stream responseStream = httpWResp.GetResponseStream();
    string jsonString;

    using (StreamReader reader = new StreamReader(responseStream))
    {
    jsonString = reader.ReadToEnd();
    reader.Close();
    }

    JavaScriptSerializer sr = new JavaScriptSerializer();
    jsonResponse = sr.Deserialize<RootObject>(jsonString);

    }

    else
    {
    failComponent(httpWResp.StatusCode.ToString());

    }
    }
    catch (Exception e)
    {
    failComponent(e.ToString());
    }
    return jsonResponse;

    }

    private void failComponent(string errorMsg)
    {
    bool fail = false;
    IDTSComponentMetaData100 compMetadata = this.ComponentMetaData;
    compMetadata.FireError(1, "Error Getting Data From Webservice!", errorMsg, "", 0, out fail);

    }
    }
    public class Entry
    {
    public string closed_time { get; set; }
    public string company_id { get; set; }
    public string created_at { get; set; }
    public string deal_stage_id { get; set; }
    public string expected_close_date { get; set; }
    public string expected_close_date_event_id { get; set; }
    public string id { get; set; }
    public string import_id { get; set; }
    public string is_archived { get; set; }
    public string is_example { get; set; }
    public string name { get; set; }
    public string primary_contact_id { get; set; }
    public string probability { get; set; }
    public string source_id { get; set; }
    public string status { get; set; }
    public string summary { get; set; }
    public string updated_at { get; set; }
    public string user_id { get; set; }
    public string value { get; set; }
    public string value_in_cents { get; set; }
    }

    public class Pagination
    {
    public int page { get; set; }
    public string page_var { get; set; }
    public int per_page { get; set; }
    public int pages { get; set; }
    public int total { get; set; }
    }

    public class RootObject
    {
    public List<Entry> entries { get; set; }
    public Pagination pagination { get; set; }
    }

  6. Can you share a copy of this SSIS package?

  7. Thanks for the code and I am using it for a similar situation to extract AtTask data to SQL. I have added required references and the code is using 3.5 Framework but still get error “The name TaskDataBuffer does not exist in the current context”. Any help in this regard would be appreciated. Thanks

  8. Hi,

    What is the format of the taskURL variable you defined? How are the tokens defined? My main confusion is that the sessionID is the first parameter that was determined, after which the other 2 parameters (rows per page and page) are defined.
    I have only one token, the sessionID, that needs to be passed in the taskURL. How should I define that variable initially and set the token?

    Thanks

  9. When I do the same in the script editor, it's not able to compile 'TaskDataBuffer'. Below is the error:
    Error 3 The name 'TaskDataBuffer' does not exist in the current context

    how to resolve this?

  10. Excellent tutorial!!! JSON is getting very popular due to its compact and easy to use format so recently started using it. Unfortunately SSIS doesn’t allow native JSON support… I cannot find any other component except this one
    http://zappysys.com/products/ssis-powerpack/ssis-json-file-source/

    Am I the only one thinking that way or is it like that?

