Saturday, 7 September 2024

Load CSV Files that have different number of columns using SSIS and dynamic pivot

In previous post we saw how to load the csv files that have different number of columns.

Read here: Load CSV Files that have different numberof columns using SSIS

https://bageshkumarbagi-msbi.blogspot.com/2024/08/load-csv-files-that-have-different.html

In this post we will see the approach to load the data by using the dynamic pivot. I would like to say special thanks to Kapil sir (https://www.linkedin.com/in/kapil-gupta-b12418/) and Om Shakya jee (https://www.linkedin.com/in/om-shakya-49759120/) who provided the solution to scrub such type of files and load it into the stage table after stage main table.

First of all using the SSIS we will load the data into the stg_RowData table. Below is row data.

Below is stage table

First of all we need to get the list of column from the tables.

DECLARE @columns NVARCHAR(max) = '';

SET @columns= (SELECT ',' + Quotename(col.NAME)

               FROM   sys.objects obj

                      JOIN sys.all_columns col

                        ON col.object_id = obj.object_id

               WHERE  obj.NAME = 'stg_survey'

               ORDER  BY column_id

               FOR xml path(''))

SET @columns = RIGHT(@columns, Len(@columns) - 1);

PRINT @columns

It will return the list of column


Read: STRING_SPLITfunction in SQL Server 2016

https://bageshkumarbagi-msbi.blogspot.com/2019/07/stringsplit-function-in-sql-server-2016.html

Below dynamic query to prepare the data and load it into the stage table.

DECLARE @columns NVARCHAR(max) = '',

        @sqlquery NVARCHAR(max) = '';

SET @columns= (SELECT ',' + Quotename(col.NAME)

               FROM   sys.objects obj

                      JOIN sys.all_columns col

                        ON col.object_id = obj.object_id

               WHERE  obj.NAME = 'stg_survey'

               ORDER  BY column_id

               FOR xml path(''))

SET @columns = RIGHT(@columns, Len(@columns) - 1);

SET @sqlquery=' ;WITH CTE AS (  SELECT * from  stg_RowData  ),

  CTE2 AS (  SELECT *, [value] AS csvdatavalue   FROM CTE  CROSS APPLY STRING_SPLIT(RowData, '','', 1) x 

                                                                join (SELECT col.column_id,col.name  from sys.objects obj

                                                                join sys.all_columns col  on col.object_id = obj.object_id where obj.name =''stg_survey'')

                                                                col on col.column_id=x.ordinal )

                                                                INSERT INTO stg_survey(' + @columns +')

                                                                SELECT   ' + @columns + ' FROM (  SELECT id, csvdatavalue, name  FROM CTE2 ) AS toPivot

                                                                PIVOT (  MAX(csvdatavalue)  FOR [name] IN ( '

              + @columns + ') ) AS piv;'

 

PRINT @sqlquery

                   

This will create the below dynamic query

 

;WITH cte

     AS (SELECT *  FROM   stg_rowdata),

     cte2

     AS (SELECT *, [value] AS csvdatavalue

         FROM   cte

                CROSS apply String_split(rowdata, ',', 1) x

                JOIN (SELECT col.column_id, col.NAME

                      FROM   sys.objects obj

                             JOIN sys.all_columns col

                               ON col.object_id = obj.object_id

                      WHERE  obj.NAME = 'stg_survey') col

                  ON col.column_id = x.ordinal)

INSERT INTO stg_survey

            ([sno], [voter_name], [voter_add], [constituency_name],

             [constituency_state],[pre_poll_date], [question_1],[answer_1],

             [question_2], [answer_2], [question_3], [answer_3],

             [question_4],[answer_4], [question_5], [answer_5])

SELECT [sno], [voter_name], [voter_add], [constituency_name],

             [constituency_state],[pre_poll_date], [question_1],[answer_1],

             [question_2], [answer_2], [question_3], [answer_3],

             [question_4],[answer_4], [question_5], [answer_5]

FROM   (SELECT id,

               csvdatavalue,

               NAME

        FROM   cte2) AS toPivot

       PIVOT ( Max(csvdatavalue)

             FOR [name] IN ([sno], [voter_name], [voter_add], [constituency_name],

             [constituency_state],[pre_poll_date], [question_1],[answer_1],

             [question_2], [answer_2], [question_3], [answer_3],

             [question_4],[answer_4], [question_5], [answer_5])

) AS piv;

Read: Dynamic pivot in the sql server

https://bageshkumarbagi-msbi.blogspot.com/2020/06/dynamic-pivot-in-sql-server.html

To execute the dynamic  query we are using the

EXECUTE sp_executesql

 Below is the final sql script

DECLARE @columns NVARCHAR(max) = '',

        @sqlquery NVARCHAR(max) = '';

SET @columns= (SELECT ',' + Quotename(col.NAME)

               FROM   sys.objects obj

                      JOIN sys.all_columns col

                        ON col.object_id = obj.object_id

               WHERE  obj.NAME = 'stg_survey'

               ORDER  BY column_id

               FOR xml path(''))

SET @columns = RIGHT(@columns, Len(@columns) - 1);

SET @sqlquery=' ;WITH CTE AS (  SELECT * from  stg_RowData  ),

                                                                CTE2 AS (  SELECT *, [value] AS csvdatavalue   FROM CTE  CROSS APPLY STRING_SPLIT(RowData, '','', 1) x 

                                                                join (SELECT col.column_id,col.name  from sys.objects obj

                                                                join sys.all_columns col  on col.object_id = obj.object_id where obj.name =''stg_survey'')

                                                                col on col.column_id=x.ordinal )

                                                                INSERT INTO stg_survey(' + @columns +')

                                                                SELECT   ' + @columns + ' FROM (  SELECT id, csvdatavalue, name  FROM CTE2 ) AS toPivot

                                                                PIVOT (  MAX(csvdatavalue)  FOR [name] IN ( '

              + @columns + ') ) AS piv;'

   -- execute the dynamic SQL

EXECUTE sp_executesql @sqlquery;

Before running this script see the data into the table.      

Now running this script.


See the records into the stage table.

Thanks for reading!

Popular Posts