tim laqua dot com Thoughts and Code from Tim Laqua

8Jun/1128

Revisiting Embedded Text Qualifiers and the SSIS Flat File Connection Manager

To quickly summarize the problem at hand, the Flat File Connection Manager doesn't support embedded Text Qualifiers. The reason this is a big issue is because the most commonly used text qualifier is a double quote (") and that's also a commonly used punctuation mark. That means if you have a CSV to import and lines that looks like this:

"one","1","two","2","th""r""ee","3","asdf"
"""left","right""","mid""dle","w"",tf","w,""tf",",""omg","omg"","

You would expect to import

one 1 two 2 th"r"ee 3 asdf
"left right" mid"dle w",tf w,"tf ,"omg omg",

That's not what you get, you get an error. I've seen a few different approaches to working around this known issue:

What I'd like to explore in this post is using regular expressions to transform the entire file in to something that can be natively consumed by SSIS. There are two options for dealing with this:

  • Tokenizing the quotation marks found within the cells
  • Change the text qualifier

I chose to change the text qualifier because SSIS natively supports multiple character text qualifiers. This will allow me to pick something ridiculous that is incredibly unlikely to appear in the data naturally. Changing the text qualifier also means we won't have to add any extra transformations as we would if we tokenized the embeded quotation marks (to turn them back in to quotation marks).