Tuesday, 17 November 2015

Deleting Duplicate Records from a table

It is one of the very important questions for sql server interview. How to delete the duplicate records from a table? Let’s learn how to achieve that. There is multiple ways to delete duplicate records.
Ø  Using Row Number function
Ø  Using sub query
Ø  Using self-join
Here I am using below table.
create table student
(
ID int Identity(1,1),
Fname varchar(50),
Lname varchar(50),
SSubject varchar(50),
Saddress varchar(100)
)

insert into student values('Bagesh','Kumar','Sql server','Pune')
insert into student values('Bagesh','Kumar','Sql server','Pune')
insert into student values('Rajesh','Kumar','Sql server','Chennai')
insert into student values('Rajesh','Kumar','Sql server','Chennai')
insert into student values('Mahesh','Kumar','Sql server','Pune')
insert into student values('Ganesh','Kumar','Sql server','Mumbai')
insert into student values('Anil','Kumar','Sql server','Mumbai')
insert into student values('Bagi','Kumar','Sql server','Pune')
insert into student values('Bagesh','Singh','Sql server','Pune')
insert into student values('Bagesh','Singh','Sql server','Pune')

In this table I have 10 records. Out of 10 records 3 records are duplicates.


Let’s start to learn how to delete duplicate records from table.

(A)   Using Row number function

select *,ROW_NUMBER() over
(PARTITION by Fname,Lname,ssubject,Saddress order by Fname,Lname,ssubject,Saddress)
 as RowNumber from student

 

Here I am using CTE to get the duplicate records.
with duplicate as
(
select *,ROW_NUMBER() over
(PARTITION by Fname,Lname,ssubject,Saddress
 order by Fname,Lname,ssubject,Saddress)
 as RowNumber from student
)
select * from duplicate where RowNumber>1

    
Here we get the duplicate records. Now we need to delete these records using below sql query.
with duplicate as
(
select *,ROW_NUMBER() over
(PARTITION by Fname,Lname,ssubject,Saddress
 order by Fname,Lname,ssubject,Saddress)
 as RowNumber from student
)
delete from duplicate where RowNumber>1
select * from student

  


(B)   Using sub query

Using below sql query we will get the duplicate records
select * from student where
ID NOT IN (select MAX(id) from student group by Fname,Lname,SSubject,saddress)

   


Now we need to delete these records using below sql query.
--using Sub query
DELETE from student
where ID NOT IN (select MAX(id) from student group by Fname,Lname,SSubject,saddress)



(C)   Using self-join

We can delete the duplicate records using self-join.
With the help of below sql query we will get the duplicate records.
--Using self join
select distinct a.Fname,a.Lname,a.SSubject,a.Saddress
from student a
JOIN student b
ON  a.Fname=b.Fname
AND a.Lname=b.Lname
AND a.SSubject=b.SSubject
AND a.Saddress=b.Saddress
and a.ID > b.ID


Now we need to delete these records.
--Using self join
DELETE from a
from student a
JOIN student b
ON  a.Fname=b.Fname
AND a.Lname=b.Lname
AND a.SSubject=b.SSubject
AND a.Saddress=b.Saddress
and a.ID > b.ID




Hope it will be help you. Thanks!!! 

2 comments:

If you have any doubt, please let me know.

Popular Posts