There are plenty of reasons why selecting all columns from a table with the SELECT * operator is not a good idea. It increases network traffic by transmitting columns that the client application does not need. It also makes query performance tuning more complicated, and it introduces side effects when the table schema changes. Don’t use * in SELECT query, instead use only required column. This is one of the tips to optimize SELECT query. However, does this really give better performance in SQL Server query processing?
This is especially important with
row-overflow and LOB storage, when one row can have data stored in multiple
data pages. SQL Server needs to read all of those pages, which can
significantly decrease the performance of queries.
Let’s see the below example.
Here we have create a table with 4 column 2 columns have normal
data pages and 2 columns have LOB data pages.
create
table LOB_Read_Test ( LOB_Read_Test_ID int not null identity(1,1) PRIMARY Key, Col_varchar varchar(128) not null, Col_varchar_Max varchar(max) null, Col_varbinary varbinary(max) null ); |
Now inserting 50k Records in this table.
declare @i int =1; while @i<=50000 begin insert into LOB_Read_Test(Col_varchar,Col_varchar_Max,Col_varbinary) select 'Col_varchar ' + cast(@i as varchar(10)), replicate('a',20000), convert(varbinary(max),replicate(convert(varchar(max),'b'),20000)) set @i=@i+1; end |
50K Records inserted successfully. It may take approx. 2
min.
Let’s on the IO and TIME Statistics on to see the IO and
execution time for the running query.
SET STATISTICS IO ON SET STATISTICS Time ON |
Running the select * and see the time taken by it.
Suppose in our application we want only first two column LOB_Read_Test_ID, col_varchar column only. In this case if we write select * it will be very expensive. In place of select * we will use the select columns name. See below.
See here for the 50k records it took only 887MS and only
have Logical read 688. There is no any LOB reads happen.
As we can see, the first select, which reads the LOB data and transmits it to the client, is a few orders of magnitude slower than the second select. One case where this becomes extremely important is with client applications, which use Object Relational Mapping (ORM) frameworks. Developers tend to reuse the same entity objects in different parts of an application. As a result, an application may load all attributes/columns even though it does not need all of them in many cases.
Performance
Comparison |
SELECT
* |
Specifying
Columns |
I/O and Memory |
Retrieves all columns, causing higher I/O and memory
usage, even if only a few columns are needed. |
Retrieves only necessary columns, reducing resource
consumption. |
Query Optimization |
SQL Server must read the entire row structure, even for unnecessary
columns, leading to inefficiencies. |
Allows the optimizer to streamline the query plan and improve
performance. |
Network Bandwidth |
Sends more data over the network, which can cause
delays. |
Less data compare to select * send to network , due to this it is faster than
select * |
No comments:
Post a Comment
If you have any doubt, please let me know.