This post has already been read 4552 times!
I recently had to look at a problem that required converting the column values from a set of rows into a single comma separated string. On the face of it this is quite straightforward, but as with many things SQL Server there is more than one way to solve the problem. I’ve presented a few possible solutions here, no doubt there are more.
Moving on to problem, I'll create a table which can be used to demonstrate this (for the purposes of simplicity I’ve removed most of the columns).
Here is the table creation script :
and here is the data (as a SQL script) :
The requirement is to create a query that will concatenate the values in the ‘Txt’ column into a comma separated list. The query needs to return one row and one column so that the results pane in SQL Server Management Studio looks like this :
Solution 1 : Using a Recursive CTE
SQL Server 2005 opened up the possibility of using recursion within a SQL statement. Here is a recursive CTE that produces the required result :
If I run this query then I get the result I want : The first part of the CTE is the anchor that sets the initial condition and the second part (after the UNION ALL) is the recursive part. I won’t explain this further here, however there is an excellent article about this subject on MSDN titled 'Recursive Queries Using Common Table Expressions'
Solution 2 : Using PIVOT
The first instinct when confronted with a problem which involves converting rows to columns may be to use PIVOT, and indeed the following code uses the PIVOT operator. It returns the same result as that above :
I’ve used the PIVOT command to convert the rows to an equivalent number of columns and then concatenated the result together. Normally we would have some type of aggregation as the rows are pivoted to columns, hence the use of the MAX() here. Note that in the this example no aggregation is necessary so this could equally well be a MIN(). One thing that you may have noticed is that the code is specific for the number of rows in the table – the code would have to be modified if there was an extra row. This is a significant limitation of this approach.
Solution 3 : Using a WHILE loop or Cursor
In the procedural world the chosen solution would probably be some sort of looping construct. We can do the same thing in SQL Server using either a while loop or cursor. The following code uses a WHILE loop (this could be rewritten to use a cursor) :
Generally looping constructs in SQL are to be avoided as they can perform badly. However if the number of rows is small, as it is here, then this can be a useful approach. This solution is perhaps the easiest to understand and the most flexible, where performance isn’t an issue.
Solution 4 : Using SQL Concatenation
This solution is perhaps the most surprising, in that at first glance you might not expect it to work. The SQL below simply creates a variable then concatenates the value of the row. SQL iterates around each row in the table to produce the result. I adapted this solution from some code I found on a newsgroup a while ago, and was astonished that it worked. This behaviour is certainly unexpected, and also appears to be undocumented. As such it comes with a “health warning” as it maybe that Microsoft will remove the ability to do this in the future without warning. However for non-production code it’s a very simple solution.
Solution 5 : Using FOR XML PATH
Some of the XML statements introduced in SQL Server 2005 had to implement a means of looping around data in order to produce XML. This solution takes advantage of this, but strips out the XML specific parts to produce the comma separated list.
Solution 6 : Using the CLR
SQL Server 2005 introduced the ability to write logic using procedural code in C# or other dot net languages using the CLR. As the dot net framework provides richer capabilities for implementing logic such as string handling this is another possible way of implementing a solution.
I’ve given six possible solutions here. Each has its own merits and may be useful in differing circumstances. It’s likely that a real life requirement would be more complex than the simple example given here, so it wouldn’t be sensible to give a recommendation for which approach to adopt as this will vary according to the precise requirements. However the solution chosen is likely to be based on the performance required, code clarity and maintainability.