Microsoft Business Intelligence: Column Store Index in sql server

The column store index store the index in a column based format. A column store index is an index that was designed mainly for improving the query performance for workloads with very large amounts of data (eg. data warehouse fact tables). This type of index stores the index data in a column-based format rather than row-based as is done with traditional indexes.

A column store index is a type of index used in databases, especially in data warehouses, to optimize query performance for large datasets by storing data in a columnar format instead of the traditional row-based format. It is highly efficient for queries that aggregate, filter, or scan large volumes of data across multiple columns, such as those used in analytical workloads.

In sql server data are stored in the 8 KB pages. See the below

Sno	Name	Address	Mobile
1	Bagesh	Noida	888880XXXX
2	Rajesh	Pune	999990NNNN
3	Ganesh	Patna	888880PPPP
4	Ramesh	Chennai	666660NNNN

This data stored in the sql server as below

But when we will create column stored index data will be stored in the columnar format like below.

Creating column store index is taking much time because it re structure and doing compress. It is bad for the transactional table. It is the best for the OLAP system because in this system we are populating data at nightly.

There are two type of column stored index

Ø Cluster column store index

ü It must contains the all columns

ü This index stores the entire table in a columnar format, effectively replacing the traditional row-based storage of the table.

ü It is the only index on the table, and the table’s data is physically stored as columns.

ü Great for data warehouses and analytical queries where data is rarely modified but frequently read.

ü Row based index can be added on top of it.

ü We cannot create filter index on it.

Syntax

CREATE CLUSTERED COLUMNSTORE INDEX ix_name ON Table_name;

Ø Non cluster column store index

ü This index stores a separate columnar index on specific columns, while the rest of the table continues to be stored in a row-based format.

ü It allows both row-based transactional workloads (OLTP) and columnar queries (OLAP) to coexist.

ü We can create filter index on it.

ü Syntax

CREATE NONCLUSTERED COLUMNSTORE INDEX ix_Indexname ON table (columns list);

Limitation of column store index

Ø Column store index is not support the below data type

ü Next

ü Text

ü Image

ü Nvarchar(max)

ü Rowversion

ü Timestamp

ü Sql variant

ü Xml

ü Unique identifier

Ø We can’t create this index on the view.

Ø We can’t alter this index, whenever we need to change the columns we need to drop and create the index.

Ø We can’t create trigger or cursor on the table where we have used cluster index.

Let’s see the example

For the demo we are using “AdventureWorks2019” database.

CREATE TABLE SalesOrderDetail_demo(

SalesOrderDetailID int IDENTITY(1,1) NOT NULL primary key,

CarrierTrackingNumber nvarchar(25) NULL,

OrderQty smallint NOT NULL,

ProductID int NOT NULL,

SpecialOfferID int NOT NULL,

UnitPrice money NOT NULL,

UnitPriceDiscount money NOT NULL,

LineTotal AS (isnull((UnitPrice*((1.0)-UnitPriceDiscount))*OrderQty,(0.0)))

)

Now we are inserting around 1.2 CR records on it.

insert into SalesOrderDetail_demo(

CarrierTrackingNumber,OrderQty

,ProductID,SpecialOfferID

,UnitPrice,UnitPriceDiscount)

select

CarrierTrackingNumber

,OrderQty,ProductID,SpecialOfferID

,UnitPrice,UnitPriceDiscount

from Sales.SalesOrderDetail;

go 100

We have create cluster index on the table.

Let see how much time will table the below query

set statistics io on

set statistics time on

select

ProductID

,Sum(OrderQty) as OrderQty,sum(SpecialOfferID) as SpecialOfferID

,sum(UnitPrice) as UnitPrice,sum(UnitPriceDiscount) as UnitPriceDiscount

from SalesOrderDetail_demo

group by ProductID

order by ProductID

Without any index

Taking time to execute this query.

Logical Read: 101270

IO Cost: 75.0172

Creating cluster index

Total execution time

Logical read : 101624

And IO cost

Now creating non cluster index on this table

create nonclustered index ix_non_SalesOrderDetail_demo on SalesOrderDetail_demo (ProductID)

Execution time : 13577ms

Logic Read : 101624

IO cost : 74.9994

Now creating non cluster index on this table with includes column. Dropping the existing index and creating new one.

create nonclustered index ix_non_SalesOrderDetail_demo on SalesOrderDetail_demo (ProductID)

include(OrderQty,SpecialOfferID,UnitPrice,UnitPriceDiscount)

Execution time: 9402 ms

Logical Read: 54274

IO Cost: 40.1209

Now creating the non-cluster column store index

CREATE NONCLUSTERED COLUMNSTORE INDEX ix_SalesOrderDetail_demo_cl_clmn ON SalesOrderDetail_demo

(ProductID,OrderQty,SpecialOfferID,UnitPrice,UnitPriceDiscount)

Execution time: 650 ms

Logical Read: 4577

IO Cost : .0342361

Cluster column store index

Dropping the all existing indexes and creating cluster column store index.

CREATE CLUSTERED COLUMNSTORE INDEX ix_SalesOrderDetail_demo_cl_clmn ON SalesOrderDetail_demo

Execution time: 638ms

Logical Read: 240

IO cost:.0342361

See the summary

Type of index	Execution time	Logical Read	IO Cost
Without any index	14952ms	101270	74.9994
Cluster index	15117ms	101624	74.9994
Non cluster index without includes columns	13577ms	101624	74.994
Non cluster index with includes columns	9402ms	54274	40.1209
Non cluster column stored index	650ms	4577	.0342361
cluster column stored index	638ms	240	.0342361

Here we can see the performance and we can say that Column stored index is the best but it is only for the OLAP system. If in that table we have insert, update or delete than it will be bad performance for Insert, Update and Delete because each time it is re arranging the data.

Hope this simple demo help to understand about column store index.

Microsoft Business Intelligence

Pages

Sunday, 29 September 2024

Column Store Index in sql server

No comments:

Post a Comment

Popular Posts