Saturday, May 15, 2010

Convert comma separated string to table column

I have seen a few questions asking how to transform a delimited values into columns, so I thought I would talk about it here. In most cases it is recommended to use a string parser function to split the string; however, today I want to talk about another method. This method takes advantage of the XML data type that was introduced in SQL Server 2005. What is nice about the XML data type is it preserves the document order. The document order is critical because it guarantees the string is kept in the same order when it is converted to XML. Let’s start by creating a sample table with a few rows of data.


DECLARE @t TABLE(
ProductId INT,
ProductName VARCHAR(25),
SupplierId INT,
Descr VARCHAR(50)
)

INSERT INTO @t VALUES (1,'Product1',1,'A1,10in,30in,2lbs');
INSERT INTO @t VALUES (2,'Product2',2,'T6,15in,30in,');
INSERT INTO @t VALUES (3,'Product3',1,'A2,1in,,0.5lbs');
Okay now we have our sample data, let’s talk about our data. The column “Descr” contains 4 attributes of the product. The attributes that we are storing are Type,Length,Height, and Weight. Anyone of these attributes can be null in the string but MUST be represented in the string. For example, “A1,10in,30in,” has a comma at the end because the weight is unknown and a string like “A1,10in,,1lbs” has a empty space for height. If we do not have this space, how could we ever determine which attributes are associated with a particular column? The answer is there would be no real way of knowing with this method or any other method.


Now that we have laid all the ground work, it is time to start building our query. We will start off by creating a CTE (common table expression) making sure to convert our delimited string into valid XML. Below is the first part of our CTE (this is not the complete code).


;WITH cte (ProductId, ProductName,SupplierId,Prod_Attributes)
AS
(
SELECT
[ProductId],
[ProductName],
[SupplierId],
CONVERT(XML,''
+ REPLACE([Descr],',', '
')
+ '
') AS Prod_Attributes
FROM @t
)
You may be asking yourself, what does this code do? This CTE code selects all the data from the table @t, but it also converts the delimited string into valid XML. How does it do that? Well let’s break down the code to figure it out.


This is the conversion code of interest:


CONVERT(XML,''
+ REPLACE([Descr],',', '
')
+ '
') AS Prod_Attributes
This code takes the input string and uses the replace function to insert the XML tags, so that it has the look and feel of valid xml. For example, the string “A1,10in,30in,5lbs” will be transformed into



A1
10in
30in
5lbs

Note: It should be noted that we are using a comma delimitation. If your delimitation is different, you will need to change the delimiter in the replace function. The delimiter is between the column and the closing XML tag. In the code following, REPLACE([Descr],',', ' . the delimiter is denoted in Red.


Now that the product description values are in a valid XML format, we can easily get the values by using a hardcoded singleton value, as shown below.


;WITH cte (ProductId, ProductName,SupplierId,Prod_Attributes)
AS
(
SELECT
[ProductId],
[ProductName],
[SupplierId],
CONVERT(XML,''
+ REPLACE([Descr],',', '
')
+ '
') AS Prod_Attributes
FROM @t
)
SELECT
[ProductID],
[SupplierId],
Prod_Attributes.value('/Product[1]/Attribute[1]','varchar(25)') AS [Type],
Prod_Attributes.value('/Product[1]/Attribute[2]','varchar(25)') AS [Length],
Prod_Attributes.value('/Product[1]/Attribute[3]','varchar(25)') AS [Height],
Prod_Attributes.value('/Product[1]/Attribute[4]','varchar(25)') AS [Weight]
FROM cte
The singleton of each attribute element is denoted by a hardcoded value 1-4. If you have more columns you will need to specify more singletons. Here are the results:






So there you have it. An easy to implement solution to “pivot” or transform delimited values into columns.


**************** Update ********************


I have reworked my code to work with XML special characters. XML special characters are quite problematic but Brad Schulz, http://bradsruminations.blogspot.com/, has come up with a great method to work around the encoding issues. Here is the modified code:


SELECT
[ProductID],
[SupplierId],
NewXML.value('/Product[1]/Attribute[1]','varchar(25)') AS [Type],
NewXML.value('/Product[1]/Attribute[2]','varchar(25)') AS [Length],
NewXML.value('/Product[1]/Attribute[3]','varchar(25)') AS [Height],
NewXML.value('/Product[1]/Attribute[4]','varchar(25)') AS [Weight]
FROM @t t1
CROSS APPLY (SELECT XMLEncoded=(SELECT Descr AS [*] FROM @t t2 WHERE t1.ProductId = t2.[ProductId] FOR XML PATH(''))) EncodeXML
CROSS APPLY (SELECT NewXML=CAST(''+REPLACE(XMLEncoded,',','')+'' AS XML)) CastXML