Sunday, 13 October 2024

Select * vs select columns performance (why select * not recommended in SQL server)

There are plenty of reasons why selecting all columns from a table with the SELECT * operator is not a good idea. It increases network traffic by transmitting columns that the client application does not need. It also makes query performance tuning more complicated, and it introduces side effects when the table schema changes. Don’t use * in SELECT query, instead use only required column. This is one of the tips to optimize SELECT query. However, does this really give better performance in SQL Server query processing?

This is especially important with row-overflow and LOB storage, when one row can have data stored in multiple data pages. SQL Server needs to read all of those pages, which can significantly decrease the performance of queries.

Let’s see the below example.

Here we have create a table with 4 column 2 columns have normal data pages and 2 columns have LOB data pages.

create  table LOB_Read_Test

(

LOB_Read_Test_ID int not null identity(1,1) PRIMARY Key,

Col_varchar varchar(128) not null,

Col_varchar_Max varchar(max) null,

Col_varbinary varbinary(max) null

);

Now inserting 50k Records in this table.

declare @i int =1;

while @i<=50000

begin

insert into LOB_Read_Test(Col_varchar,Col_varchar_Max,Col_varbinary)

select 'Col_varchar ' + cast(@i as varchar(10)),

replicate('a',20000),

convert(varbinary(max),replicate(convert(varchar(max),'b'),20000))

set @i=@i+1;

end

50K Records inserted successfully. It may take approx. 2 min.

Let’s on the IO and TIME Statistics on to see the IO and execution time for the running query.

SET STATISTICS  IO ON

SET STATISTICS  Time ON

Running the select * and see the time taken by it.

 

To getting the 50k records it took 133347 MS. And reading LOB Logical read: 542497 and LOB physical Read: 174997.

Suppose in our application we want only first two column LOB_Read_Test_ID, col_varchar column only. In this case if we write select * it will be very expensive. In place of select * we will use the select columns name. See below.  

See here for the 50k records it took only 887MS and only have Logical read 688. There is no any LOB reads happen.

As we can see, the first select, which reads the LOB data and transmits it to the client, is a few orders of magnitude slower than the second select. One case where this becomes extremely important is with client applications, which use Object Relational Mapping (ORM) frameworks. Developers tend to reuse the same entity objects in different parts of an application. As a result, an application may load all attributes/columns even though it does not need all of them in many cases. 

Performance Comparison

SELECT *

Specifying Columns

I/O and Memory

Retrieves all columns, causing higher I/O and memory usage, even if only a few columns are needed.

 

Retrieves only necessary columns, reducing resource consumption.

 

Query Optimization

SQL Server must read the entire row structure, even for unnecessary columns, leading to inefficiencies.

 

Allows the optimizer to streamline the query plan and improve performance.

 

Network Bandwidth

Sends more data over the network, which can cause delays.

 

Less data compare to select * send  to network , due to this it is faster than select *

 

No comments:

Post a Comment

If you have any doubt, please let me know.

Popular Posts