tim laqua dot com – Thoughts and Code from Tim Laqua

5 Jul 2012

Consuming Sharepoint Lists via OData with SSIS

We recently needed to bring data from a Sharepoint list into SSIS for use in an attribute lookup for a cube dimension. This seems like it should be pretty straightforward since SSRS does it natively now, but no - that wizardry hasn't made its way over to the SSIS team yet. As stated before, we don't care for third-party dependencies or external non-standard assemblies of any sort in our packages. That means, as usual, we'll be writing a script component to take care of getting our data out of Sharepoint.

First, navigate to the list you are interested in and export it as a data feed.


Save the file it creates and view it with a text editor. Locate the service URL.
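If it's not obvious which part of the file you need, the service URL is typically everything up to and including ListData.svc. For the example site used in the rest of this post, that would be:

    https://intranet.yourdomain.com/sites/site0/_vti_bin/ListData.svc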

  • Now start SQL Server Data Tools (SSDT, used to be BIDS)
  • Create a new SSIS project
  • Add a Data Flow Task
  • Add a Script Component configured as a Source
  • Add some Columns to your script source output
  • Edit the Script
  • The only method we need to keep is CreateNewOutputRows(); the rest you can delete or just ignore.
  • Add a new Service Reference
  • Paste your Service URL from earlier into the Address field and click Go
  • Enter a more meaningful Namespace
  • Add the Namespace of the service reference you just added to your script (just type "using SC" and intellisense will pop up your script component's namespace) and bring in System.Linq as well:
    using System;
    using System.Data;
    using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
    using Microsoft.SqlServer.Dts.Runtime.Wrapper;
    using SC_bf880e9df2b64705ab3207accf3c6b9a.SharepointOData;
    using System.Linq;
  • Finish implementing CreateNewOutputRows():
        public override void CreateNewOutputRows()
        {
            // Data context generated by the service reference, pointed at the list service URL
            SPTeamTestDataContext dc =
                new SPTeamTestDataContext(
                    new Uri("https://intranet.yourdomain.com/sites/site0/_vti_bin/ListData.svc"));

            // Authenticate to Sharepoint as the account running the package
            dc.Credentials = System.Net.CredentialCache.DefaultNetworkCredentials;

            // Project the list items down to just the columns we need
            var projects = from project in dc.ITProjects
                           select new
                           {
                               Phase = project.Phase.Value,
                               ProjectName = project.Project
                           };

            // Push each list item into the script component's output buffer
            foreach (var project in projects)
            {
                Output0Buffer.AddRow();
                Output0Buffer.C1 = project.Phase;
                Output0Buffer.C2 = project.ProjectName;
            }
        }
  • Dump all the output into the Trash (or a Row Count transformation, whatever you use for that sort of thing) and snap on a Data Viewer.
  • Finally, run the package:

Pagination

Now, depending on how your service is configured, you may not receive all of your rows. In our case, we only got 1000, which means the service is configured to return a maximum of 1000 rows per query. To work around this, you must paginate your requests:

        // Order by Id so the Skip/Take pages are stable from request to request
        var projects = from project in dc.ITProjects
                       orderby project.Id
                       select new
                       {
                           Phase = project.Phase.Value,
                           ProjectName = project.Project
                       };

        int recordCount = projects.Count();
        int page = 0;
        int pageSize = 100;

        // Request one page at a time until we've covered every record
        while (page * pageSize < recordCount)
        {
            foreach (var project in projects.Skip(page * pageSize).Take(pageSize))
            {
                Output0Buffer.AddRow();
                Output0Buffer.C1 = project.Phase;
                Output0Buffer.C2 = project.ProjectName;
            }
            page++;
        }
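For what it's worth, Skip and Take don't pull the whole list down and page it on the client - the WCF Data Services client translates them into $skip and $top parameters on the query string, so each pass through the loop is its own request against the service. A page request for the example list above would look roughly like this (shown for illustration; the exact URL the client generates may differ slightly):

    https://intranet.yourdomain.com/sites/site0/_vti_bin/ListData.svc/ITProjects?$orderby=Id&$skip=200&$top=100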
Comments
  1. Maybe I am thinking of something else, but in OData, isn’t there like a top or something where you can put in the number of rows you want? Or does it always limit it because of what SharePoint is doing? What if you made a view on the list that didn’t paginate? Could be SharePoint, as I have seen other OData services return many more rows.

    • Yes and no. The Skip and Take basically get translated to REST querystring parameters; that’s why it doesn’t return 1000 rows regardless of what C# shenanigans we do. Now if we could say “give me one million rows” (and we can) – what’s to say there’s not one million and one rows? I believe you’re alluding to some sort of $giveMeEverything parameter – no, I’m not sure that exists (and it probably shouldn’t if it does).

      Let me know if you find it 😉 It’d save a few lines of code.

  2. I am trying this IRL. Have a list pulling down pretty nicely. There is a field on the list though that is a “ContentType”-like sub-type… have you run across that, or do I need to Google 🙂

    • Take a look at the properties of that object in VS. Either browse it in the object browser or simply type a period after the name of the column and let intellisense show you the available public members as defined by the Service reference.
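      For example, lookup-style columns come through as a generated sub-type and the text usually hangs off one of its properties – the same way Phase is read as project.Phase.Value in the code above. A minimal sketch, assuming the generated member is called ContentType (check what intellisense actually shows for your list):

        // hypothetical member name - the service reference defines what's really there
        string contentType = project.ContentType.Value;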

  3. I’m following this tip to consume a Sharepoint list in SSRS, as it looks like a neat solution, but I’m having an issue with the CreateNewOutputRows() code. There is no reference to ‘SPTeamTestDataContext’ in your example – can you clarify?

    • The DataContext depends on your specific service reference. When you create the service reference, it will show you the name of the DataContext at the top of the tree in the Add Service Reference dialog.

  4. I need to pull data from folders in Sharepoint lists. Can you explain a way to do it?

  5. I’m trying to import a large list and I always get these errors. Do you know what they mean?
    The breakpoint tells me that the error is happening at “foreach”.

    at System.Data.Services.Client.DataServiceRequest.Execute[TElement](DataServiceContext context, QueryComponents queryComponents)
    at System.Data.Services.Client.DataServiceQuery`1.Execute()
    at System.Data.Services.Client.DataServiceQuery`1.GetEnumerator()
    at ScriptMain.CreateNewOutputRows()
    at UserComponent.PrimeOutput(Int32 Outputs, Int32[] OutputIDs, PipelineBuffer[] Buffers, OutputNameMap OutputMap)
    at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.PrimeOutput(Int32 outputs, Int32[] outputIDs, PipelineBuffer[] buffers)

  6. Hi.

    Great post.

    I’m using this for importing data from SP list into SQL database. I have set it up and it works as expected. My issue is that if I add a column to the SP list, I can’t reference this in the script. The script component does not seem to discover the new column in the SP list.

    All help appreciated.

    Thanks.

  7. Great solution! Thanks for sharing 🙂

  8. Is there any way to make the “Add Service Reference” dynamic? Can I read it from a config file/variable?

    In short, I want to deploy this solution to DEV, QA and PROD using an XML configuration file, without rebuilding the solution for each environment!

