Sunday, 13 October 2024

Key Lookup in SQL server

A Lookup happen when a query uses a non-clustered index but the index does not include the all column needed for the query. Unlike a RID Lookup, which occurs in the heap table, but key lookup happens when the table has a cluster index. SQL server must go back to the cluster index (using the cluster key) to fetch the missing column that are not included in the non-cluster index.

See the example.

Creating the table

create table Key_Lookup_Test

(

Key_Lookup_Test_ID int identity(1,1) not null primary key,

col_num1 varchar(20),

col_num2 varchar(20),

col_num3 varchar(20)

)

Creating a non-clustered index on this table.

create nonclustered index IX_01_Key_Lookup_Test_col_num1

on Key_Lookup_Test(col_num1)

Inserting 10k Records.

declare @i int=1;

while @i<=10000

begin

insert into Key_Lookup_Test(col_num1,col_num2,col_num3)

select 'col_num1'+cast(@i as varchar(10)),

                                'col_num2'+cast(@i as varchar(10)),

                                'col_num3'+cast(@i as varchar(10))

set @i=@i+1;

end;

Records inserted successfully.

Let’s on the execution plan and run the below query.

select key_Lookup_Test_id,col_num1,col_num2,col_num3

from key_Lookup_Test where col_num1='col_num110';

See the result.

Key lookup happen.

See the execution time.

It took 42 MS .

Let’s drop the existing index and creating the new index with included columns.

drop index key_Lookup_Test.IX_01_key_Lookup_Test_col_num1

create nonclustered index IX_01_Key_Lookup_Test_col_num1

on key_Lookup_Test(col_num1)

include (key_Lookup_Test_id,col_num2,col_num3)

Now running the same query. 

Key lookup gone.

See the execution time.


It took 1 MS.

RID Lookup in SQL server

RID (Row Identifier) Lookup occurs when a query use a non-cluster index to find row and the index does not included all the required column from the query. The database engine has to go back to the table to fetch the missing column. This is happen in the heap table. A table does not have the primary key or clustered index. When we query a non-clustered index and need to retrieve column that are not part of index, SQL server uses the RID lookup for the corresponding rows in the heap. This lookup affect the performance.

See the below example

Here we are creating a head table.

create table RID_Lookup_Test

(

RID_Lookup_Test int identity(1,1),

col_num1 varchar(20),

col_num2 varchar(20),

col_num3 varchar(20)

)

 

Table created successfully.

Now creating the non-cluster index on the col_num1 column.

create nonclustered index IX_01_RID_Lookup_Test_col_num1

on RID_Lookup_Test(col_num1)

Non-cluster index is created.

Now Inserting 10k records in this table.

declare @i int=1;

while @i<=10000

begin

insert into RID_Lookup_Test(col_num1,col_num2,col_num3)

select 'col_num1'+cast(@i as varchar(10)),

                                'col_num2'+cast(@i as varchar(10)),

                                'col_num3'+cast(@i as varchar(10))

set @i=@i+1;

end;

 10k records inserted successfully.

Let’s on the execution plan and run the below query.

select RID_Lookup_Test,col_num1,col_num2,col_num3

from RID_Lookup_Test where col_num1='col_num110';

 

See the result.

See the execution time for this query.

It took 46 MS

RID look up is one cause of performance degrade.

To overcome from RID Lookup we need to include the columns which are used in the select query.

Here I am dropping the existing index and creating the new index which include the all column which are in the select query.

drop index RID_Lookup_Test.IX_01_RID_Lookup_Test_col_num1

create nonclustered index IX_01_RID_Lookup_Test_col_num1

on RID_Lookup_Test(col_num1)

include (RID_Lookup_Test,col_num2,col_num3)

 Now running the same query

See the execution plan 

RID Look gone.

See the execution time

 

It took 1 MS.

See the performance impact.

Select * vs select columns performance (why select * not recommended in SQL server)

There are plenty of reasons why selecting all columns from a table with the SELECT * operator is not a good idea. It increases network traffic by transmitting columns that the client application does not need. It also makes query performance tuning more complicated, and it introduces side effects when the table schema changes. Don’t use * in SELECT query, instead use only required column. This is one of the tips to optimize SELECT query. However, does this really give better performance in SQL Server query processing?

This is especially important with row-overflow and LOB storage, when one row can have data stored in multiple data pages. SQL Server needs to read all of those pages, which can significantly decrease the performance of queries.

Let’s see the below example.

Here we have create a table with 4 column 2 columns have normal data pages and 2 columns have LOB data pages.

create  table LOB_Read_Test

(

LOB_Read_Test_ID int not null identity(1,1) PRIMARY Key,

Col_varchar varchar(128) not null,

Col_varchar_Max varchar(max) null,

Col_varbinary varbinary(max) null

);

Now inserting 50k Records in this table.

declare @i int =1;

while @i<=50000

begin

insert into LOB_Read_Test(Col_varchar,Col_varchar_Max,Col_varbinary)

select 'Col_varchar ' + cast(@i as varchar(10)),

replicate('a',20000),

convert(varbinary(max),replicate(convert(varchar(max),'b'),20000))

set @i=@i+1;

end

50K Records inserted successfully. It may take approx. 2 min.

Let’s on the IO and TIME Statistics on to see the IO and execution time for the running query.

SET STATISTICS  IO ON

SET STATISTICS  Time ON

Running the select * and see the time taken by it.

 

To getting the 50k records it took 133347 MS. And reading LOB Logical read: 542497 and LOB physical Read: 174997.

Suppose in our application we want only first two column LOB_Read_Test_ID, col_varchar column only. In this case if we write select * it will be very expensive. In place of select * we will use the select columns name. See below.  

See here for the 50k records it took only 887MS and only have Logical read 688. There is no any LOB reads happen.

As we can see, the first select, which reads the LOB data and transmits it to the client, is a few orders of magnitude slower than the second select. One case where this becomes extremely important is with client applications, which use Object Relational Mapping (ORM) frameworks. Developers tend to reuse the same entity objects in different parts of an application. As a result, an application may load all attributes/columns even though it does not need all of them in many cases. 

Performance Comparison

SELECT *

Specifying Columns

I/O and Memory

Retrieves all columns, causing higher I/O and memory usage, even if only a few columns are needed.

 

Retrieves only necessary columns, reducing resource consumption.

 

Query Optimization

SQL Server must read the entire row structure, even for unnecessary columns, leading to inefficiencies.

 

Allows the optimizer to streamline the query plan and improve performance.

 

Network Bandwidth

Sends more data over the network, which can cause delays.

 

Less data compare to select * send  to network , due to this it is faster than select *

 

Lob Physical read in SQL server

LOB (Large object) Logical reads refers to the number of pages reads from the Disk because these pages are not in buffer pool. LOB may be TEXT, NTEXT, IMAGE, XML, VARBINARY, VARCHAR (MAX), NVARCHAR (MAX) and FILESTREAM. Physical reads refers to pages of regular data like int , varchar, nvarchar, bigint , datetime etc while LOB is refer to read the LOB pages from Disk.

LOB data types are generally store across the multiple pages in the database due to that we are reading the LOB column it required more I/O Operation. To avoid selecting LOB data unnecessarily in our query. By reducing the LOB reads we can improve the overall performance of our query.

See the example

Creating a new table which have LOB columns.

create table LOB_Read_Test

(

LOB_Read_Test_ID int not null identity(1,1) PRIMARY Key,

Col_varchar varchar(128) not null,

Col_varchar_Max varchar(max) null,

Col_varbinary varbinary(max) null

);

Table created successfully. 

Now Inserting 10k records in this table.

declare @i int =1;

while @i<=10000

begin

insert into LOB_Read_Test(Col_varchar,Col_varchar_Max,Col_varbinary)

select 'Col_varchar ' + cast(@i as varchar(10)),

replicate('a',12000),

convert(varbinary(max),replicate(convert(varchar(max),'b'),12000))

set @i=@i+1;

end

10K records inserted successfully.

Now Reading the data from this table.

 

SET STATISTICS  IO ON

select * from LOB_Read_Test

  


Popular Posts