Showing posts with label Index. Show all posts
Showing posts with label Index. Show all posts

Sunday 29 September 2024

Column Store Index in sql server

The column store index store the index in a column based format. A column store index is an index that was designed mainly for improving the query performance for workloads with very large amounts of data (eg. data warehouse fact tables).  This type of index stores the index data in a column-based format rather than row-based as is done with traditional indexes.

A column store index is a type of index used in databases, especially in data warehouses, to optimize query performance for large datasets by storing data in a columnar format instead of the traditional row-based format. It is highly efficient for queries that aggregate, filter, or scan large volumes of data across multiple columns, such as those used in analytical workloads.

In sql server data are stored in the 8 KB pages. See the below

Sno

Name

Address

Mobile

1

Bagesh

Noida

888880XXXX

2

Rajesh

Pune

999990NNNN

3

Ganesh

Patna

888880PPPP

4

Ramesh

Chennai

666660NNNN

 This data stored in the sql server as below

But when we will create column stored index data will be stored in the columnar format like below.   

Creating column store index is taking much time because it re structure and doing compress. It is bad for the transactional table. It is the best for the OLAP system because in this system we are populating data at nightly.

There are two type of column stored index

Ø  Cluster column store index

ü  It must contains the all columns

ü  This index stores the entire table in a columnar format, effectively replacing the traditional row-based storage of the table.

ü  It is the only index on the table, and the table’s data is physically stored as columns.

ü  Great for data warehouses and analytical queries where data is rarely modified but frequently read.

ü  Row based index can be added on top of it.

ü  We cannot create filter index on it.

  Syntax

CREATE CLUSTERED COLUMNSTORE INDEX ix_name ON Table_name;

Ø  Non cluster column store index

ü  This index stores a separate columnar index on specific columns, while the rest of the table continues to be stored in a row-based format.

ü  It allows both row-based transactional workloads (OLTP) and columnar queries (OLAP) to coexist.

ü  We can create filter index on it.

ü  Syntax

CREATE NONCLUSTERED COLUMNSTORE INDEX ix_Indexname ON table (columns list);

Limitation of column store index

Ø  Column store index is not support the below data type

ü  Next

ü  Text

ü  Image

ü  Nvarchar(max)

ü  Rowversion

ü  Timestamp

ü  Sql variant

ü  Xml

ü  Unique identifier

Ø  We can’t create this index on the view.

Ø  We can’t alter this index, whenever we need to change the columns we need to drop and create the index.

Ø  We can’t create trigger or cursor on the table where we have used cluster index.

Let’s see the example

For the demo we are using “AdventureWorks2019” database.

CREATE TABLE SalesOrderDetail_demo(               

                SalesOrderDetailID int IDENTITY(1,1) NOT NULL primary key,

                CarrierTrackingNumber nvarchar(25) NULL,

                OrderQty smallint NOT NULL,

                ProductID int NOT NULL,

                SpecialOfferID int NOT NULL,

                UnitPrice money NOT NULL,

                UnitPriceDiscount money NOT NULL,

                LineTotal  AS (isnull((UnitPrice*((1.0)-UnitPriceDiscount))*OrderQty,(0.0)))

                )

 Now we are inserting around 1.2 CR records on it.

insert into SalesOrderDetail_demo(

CarrierTrackingNumber,OrderQty

,ProductID,SpecialOfferID

,UnitPrice,UnitPriceDiscount)

select

CarrierTrackingNumber

,OrderQty,ProductID,SpecialOfferID

,UnitPrice,UnitPriceDiscount

from Sales.SalesOrderDetail;

go 100

We have create cluster index on the table.

Let see how much time will table the below query

set statistics io on

go

set statistics time on

go 

select

ProductID

,Sum(OrderQty) as OrderQty,sum(SpecialOfferID) as SpecialOfferID

,sum(UnitPrice) as UnitPrice,sum(UnitPriceDiscount) as UnitPriceDiscount

from SalesOrderDetail_demo

group by ProductID

order by ProductID

Without any index

Taking time to execute this query.

Logical Read: 101270

IO Cost: 75.0172

Creating cluster index

Total execution time


Logical read : 101624

And IO cost


Now creating non cluster index on this table

create nonclustered index ix_non_SalesOrderDetail_demo on SalesOrderDetail_demo (ProductID)

 Execution time : 13577ms

Logic Read : 101624

IO cost : 74.9994

Now creating non cluster index on this table with includes column. Dropping the existing index and creating new one.

create nonclustered index ix_non_SalesOrderDetail_demo on SalesOrderDetail_demo (ProductID)

include(OrderQty,SpecialOfferID,UnitPrice,UnitPriceDiscount)

 Execution time: 9402 ms


Logical Read: 54274

IO Cost: 40.1209

Now creating the non-cluster column store index

CREATE NONCLUSTERED COLUMNSTORE INDEX ix_SalesOrderDetail_demo_cl_clmn ON SalesOrderDetail_demo

(ProductID,OrderQty,SpecialOfferID,UnitPrice,UnitPriceDiscount)

 Execution time: 650 ms


Logical Read: 4577

IO Cost : .0342361

Cluster column store index

Dropping the all existing indexes and creating cluster column store index.

CREATE CLUSTERED COLUMNSTORE INDEX ix_SalesOrderDetail_demo_cl_clmn ON SalesOrderDetail_demo

 Execution time: 638ms


Logical Read: 240


IO cost:.0342361

See the summary

Type of index

Execution time

Logical Read

IO Cost

Without any index

14952ms

101270

74.9994

Cluster index

15117ms

101624

74.9994

Non cluster index without includes columns

13577ms

101624

74.994

Non cluster index with includes columns

9402ms

54274

40.1209

Non cluster column stored index

650ms

4577

.0342361

cluster column stored index

638ms

240

.0342361

Here we can see the performance and we can say that Column stored index is the best but it is only for the OLAP system. If in that table we have insert, update or delete than it will be bad performance for Insert, Update and Delete because each time it is re arranging the data.

Hope this simple demo help to understand about column store index.

Popular Posts