The column store index store the index in a column based format. A column store index is an index that was designed mainly for improving the query performance for workloads with very large amounts of data (eg. data warehouse fact tables). This type of index stores the index data in a column-based format rather than row-based as is done with traditional indexes.
A column store index is a type of index
used in databases, especially in data warehouses, to optimize query performance
for large datasets by storing data in a columnar format instead of the
traditional row-based format. It is highly efficient for queries that
aggregate, filter, or scan large volumes of data across multiple columns, such
as those used in analytical workloads.
In sql server data are stored in the 8 KB
pages. See the below
Sno |
Name |
Address |
Mobile |
1 |
Bagesh |
Noida |
888880XXXX |
2 |
Rajesh |
Pune |
999990NNNN |
3 |
Ganesh |
Patna |
888880PPPP |
4 |
Ramesh |
Chennai |
666660NNNN |
But when we will create column stored index data will be stored in the columnar format like below.
Creating column store index is taking much time because
it re structure and doing compress. It is bad for the transactional table. It
is the best for the OLAP system because in this system we are populating data
at nightly.
There are two
type of column stored index
Ø Cluster column store index
ü It must contains the all columns
ü This index stores the entire table in
a columnar format, effectively replacing the traditional row-based storage of
the table.
ü It is the only index on the table,
and the table’s data is physically stored as columns.
ü Great for data warehouses and
analytical queries where data is rarely modified but frequently read.
ü Row based index can be added on top
of it.
ü We cannot create filter index on it.
Syntax
CREATE CLUSTERED COLUMNSTORE INDEX ix_name ON Table_name;
Ø Non cluster column store index
ü This index stores a separate
columnar index on specific columns, while the rest of the table continues to be
stored in a row-based format.
ü It allows both row-based
transactional workloads (OLTP) and columnar queries (OLAP) to coexist.
ü We can create filter index on it.
ü Syntax
CREATE NONCLUSTERED
COLUMNSTORE INDEX ix_Indexname ON table (columns list);
Limitation of column
store index
Ø Column store index is not support
the below data type
ü Next
ü Text
ü Image
ü Nvarchar(max)
ü Rowversion
ü Timestamp
ü Sql variant
ü Xml
ü Unique identifier
Ø We can’t create this index on the
view.
Ø We can’t alter this index, whenever
we need to change the columns we need to drop and create the index.
Ø We can’t create trigger or cursor on
the table where we have used cluster index.
Let’s see the example
For the demo we are using “AdventureWorks2019”
database.
CREATE TABLE SalesOrderDetail_demo( SalesOrderDetailID
int IDENTITY(1,1) NOT NULL primary key, CarrierTrackingNumber
nvarchar(25) NULL, OrderQty smallint NOT NULL, ProductID int NOT NULL, SpecialOfferID
int NOT NULL, UnitPrice money NOT NULL, UnitPriceDiscount
money NOT NULL, LineTotal AS (isnull((UnitPrice*((1.0)-UnitPriceDiscount))*OrderQty,(0.0))) ) |
insert into SalesOrderDetail_demo( CarrierTrackingNumber,OrderQty ,ProductID,SpecialOfferID ,UnitPrice,UnitPriceDiscount) select CarrierTrackingNumber ,OrderQty,ProductID,SpecialOfferID ,UnitPrice,UnitPriceDiscount from Sales.SalesOrderDetail; go 100 |
We have create cluster index on the table.
Let see how much time will table the below
query
set statistics io on go set statistics time on go select ProductID ,Sum(OrderQty) as OrderQty,sum(SpecialOfferID) as SpecialOfferID ,sum(UnitPrice) as UnitPrice,sum(UnitPriceDiscount) as UnitPriceDiscount from SalesOrderDetail_demo group by ProductID order by ProductID |
Without any index
Taking time to execute this query.
Logical Read: 101270
IO Cost: 75.0172
Total execution time
Logical read : 101624
And IO cost
Now creating non cluster index on this table
create nonclustered index ix_non_SalesOrderDetail_demo on SalesOrderDetail_demo (ProductID) |
Logic Read : 101624
Now creating non cluster index on this table with includes column. Dropping the existing index and creating new one.
create nonclustered index ix_non_SalesOrderDetail_demo on SalesOrderDetail_demo (ProductID) include(OrderQty,SpecialOfferID,UnitPrice,UnitPriceDiscount) |
Logical Read: 54274
IO Cost: 40.1209
Now creating the non-cluster column store index
CREATE NONCLUSTERED COLUMNSTORE INDEX ix_SalesOrderDetail_demo_cl_clmn ON SalesOrderDetail_demo (ProductID,OrderQty,SpecialOfferID,UnitPrice,UnitPriceDiscount) |
Logical Read: 4577
IO Cost : .0342361
Cluster column store index
Dropping the all existing indexes and creating
cluster column store index.
CREATE CLUSTERED COLUMNSTORE INDEX ix_SalesOrderDetail_demo_cl_clmn ON SalesOrderDetail_demo |
Logical Read: 240
IO cost:.0342361
See the summary
Type of index |
Execution time |
Logical Read |
IO Cost |
Without any index |
14952ms |
101270 |
74.9994 |
Cluster index |
15117ms |
101624 |
74.9994 |
Non cluster index without
includes columns |
13577ms |
101624 |
74.994 |
Non cluster index with includes
columns |
9402ms |
54274 |
40.1209 |
Non cluster column stored
index |
650ms |
4577 |
.0342361 |
cluster column stored
index |
638ms |
240 |
.0342361 |
Here we can see the performance and we can say
that Column stored index is the best but it is only for the OLAP system. If in
that table we have insert, update or delete than it will be bad performance for
Insert, Update and Delete because each time it is re arranging the data.
Hope this simple demo help to understand about
column store index.
No comments:
Post a Comment
If you have any doubt, please let me know.