In previous post we saw how to load the csv files that have different number of columns.
Read here: Load CSV Files that have different numberof columns using SSIS
https://bageshkumarbagi-msbi.blogspot.com/2024/08/load-csv-files-that-have-different.html
In this post we
will see the approach to load the data by using the dynamic pivot. I would like
to say special thanks to Kapil sir (https://www.linkedin.com/in/kapil-gupta-b12418/)
and Om Shakya jee (https://www.linkedin.com/in/om-shakya-49759120/)
who provided the solution to scrub such type of files and load it into the
stage table after stage main table.
First of all using
the SSIS we will load the data into the stg_RowData
table. Below is row data.
Below is stage
table
First of all we
need to get the list of column from the tables.
DECLARE @columns NVARCHAR(max) = ''; SET @columns= (SELECT ',' + Quotename(col.NAME) FROM sys.objects obj JOIN sys.all_columns col ON col.object_id = obj.object_id WHERE obj.NAME = 'stg_survey' ORDER BY column_id FOR xml path('')) SET @columns = RIGHT(@columns, Len(@columns) - 1); PRINT @columns |
It will return the
list of column
Read: STRING_SPLITfunction in SQL Server 2016
https://bageshkumarbagi-msbi.blogspot.com/2019/07/stringsplit-function-in-sql-server-2016.html
Below dynamic query
to prepare the data and load it into the stage table.
DECLARE @columns NVARCHAR(max) = '', @sqlquery NVARCHAR(max) = ''; SET @columns= (SELECT ',' + Quotename(col.NAME) FROM sys.objects obj JOIN sys.all_columns col ON col.object_id = obj.object_id WHERE obj.NAME = 'stg_survey' ORDER BY column_id FOR xml path('')) SET @columns = RIGHT(@columns, Len(@columns) - 1); SET @sqlquery=' ;WITH CTE AS ( SELECT *
from stg_RowData ), CTE2
AS ( SELECT *, [value] AS
csvdatavalue FROM CTE CROSS APPLY STRING_SPLIT(RowData, '','', 1)
x join
(SELECT col.column_id,col.name from
sys.objects obj join
sys.all_columns col on col.object_id =
obj.object_id where obj.name =''stg_survey'') col
on col.column_id=x.ordinal ) INSERT
INTO stg_survey(' + @columns +') SELECT ' + @columns + ' FROM ( SELECT
id, csvdatavalue, name FROM CTE2 ) AS
toPivot PIVOT
( MAX(csvdatavalue) FOR [name] IN ( ' + @columns + ') ) AS piv;' PRINT @sqlquery |
This will create
the below dynamic query
;WITH cte AS (SELECT *
FROM stg_rowdata), cte2 AS (SELECT *, [value] AS csvdatavalue FROM cte CROSS apply String_split(rowdata, ',', 1) x JOIN (SELECT col.column_id, col.NAME FROM sys.objects obj
JOIN sys.all_columns col
ON col.object_id = obj.object_id WHERE obj.NAME = 'stg_survey') col ON col.column_id = x.ordinal) INSERT INTO stg_survey ([sno], [voter_name], [voter_add], [constituency_name],
[constituency_state],[pre_poll_date], [question_1],[answer_1], [question_2], [answer_2], [question_3], [answer_3], [question_4],[answer_4], [question_5], [answer_5]) SELECT [sno], [voter_name], [voter_add], [constituency_name],
[constituency_state],[pre_poll_date], [question_1],[answer_1], [question_2], [answer_2], [question_3], [answer_3], [question_4],[answer_4], [question_5], [answer_5] FROM (SELECT id, csvdatavalue, NAME FROM cte2) AS toPivot PIVOT ( Max(csvdatavalue) FOR [name] IN ([sno], [voter_name], [voter_add], [constituency_name],
[constituency_state],[pre_poll_date], [question_1],[answer_1], [question_2], [answer_2], [question_3], [answer_3], [question_4],[answer_4], [question_5], [answer_5]) ) AS piv; |
Read: Dynamic pivot in the sql server
https://bageshkumarbagi-msbi.blogspot.com/2020/06/dynamic-pivot-in-sql-server.html
To execute the dynamic query we are using the
EXECUTE sp_executesql |
Below is the final sql script
DECLARE @columns NVARCHAR(max) = '', @sqlquery NVARCHAR(max) = ''; SET @columns= (SELECT ',' + Quotename(col.NAME) FROM sys.objects obj JOIN sys.all_columns col ON col.object_id = obj.object_id WHERE obj.NAME = 'stg_survey' ORDER BY column_id FOR xml path('')) SET @columns = RIGHT(@columns, Len(@columns) - 1); SET @sqlquery=' ;WITH CTE AS ( SELECT *
from stg_RowData ), CTE2
AS ( SELECT *, [value] AS
csvdatavalue FROM CTE CROSS APPLY STRING_SPLIT(RowData, '','', 1)
x join
(SELECT col.column_id,col.name from
sys.objects obj join
sys.all_columns col on col.object_id =
obj.object_id where obj.name =''stg_survey'') col
on col.column_id=x.ordinal ) INSERT
INTO stg_survey(' + @columns +') SELECT ' + @columns + ' FROM ( SELECT
id, csvdatavalue, name FROM CTE2 ) AS
toPivot PIVOT
( MAX(csvdatavalue) FOR [name] IN ( ' + @columns + ') ) AS piv;' EXECUTE sp_executesql @sqlquery; |
Before running this script see the data into the table.
Now running this script.
See the records into the stage table. Thanks for reading!