How to Remove Duplicate Rows in MySQL

January 22, 2025

1

Duplicate rows in a database can lead to inaccurate reporting, wasted storage, and degraded application performance. Removing these duplicates is essential for maintaining data integrity and ensuring efficient database operations. In this blog post, we’ll learn different methods to remove duplicate rows in MySQL.

Why Do Duplicate Rows Occur?

Duplicate rows can occur for various reasons:

Poor data import processes
Application logic issues
Manual data entry errors

Identifying Duplicate Rows

Before removing duplicates, it’s important to identify them. You can use the GROUP BY clause along with aggregate functions like COUNT to find duplicates. For example:

SELECT column1, column2, COUNT(*)
FROM your_table
GROUP BY column1, column2
HAVING COUNT(*) > 1;

This query groups rows by specific columns and identifies rows that occur more than once.

Methods to Remove Duplicate Rows

1. Using `DELETE` with a Subquery

You can delete duplicates by retaining only one row for each duplicate group. Here’s an example:

DELETE FROM your_table
WHERE id NOT IN (
SELECT MIN(id)
FROM your_table
GROUP BY column1, column2
);

MIN(id) ensures that the smallest id value is retained for each group of duplicates.
Replace column1 and column2 with the columns that define a duplicate.

2. Using a Temporary Table

Another approach is to create a temporary table to store unique rows and then replace the original table:

CREATE TEMPORARY TABLE temp_table AS
SELECT DISTINCT *
FROM your_table;

TRUNCATE TABLE your_table;

INSERT INTO your_table
SELECT * FROM temp_table;

DROP TABLE temp_table;

3. Using `ROW_NUMBER()` (MySQL 8.0+)

If you’re using MySQL 8.0 or later, you can leverage the ROW_NUMBER() window function to identify and delete duplicates:

WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS row_num
FROM your_table
)
DELETE FROM your_table
WHERE id IN (
SELECT id FROM cte WHERE row_num > 1
);

PARTITION BY specifies the columns that define duplicates.
ROW_NUMBER() assigns a unique number to each row within the duplicate groups.
Rows with row_num > 1 are considered duplicates and are deleted.

Best Practices

Backup Your Data: Always create a backup of your table before performing delete operations.
Test in a Non-Production Environment: Test your queries on a staging or development database first.
Indexing: Ensure that the relevant columns have proper indexes to optimize query performance.

Removing duplicate rows in MySQL is a straightforward process if approached carefully. By using methods like DELETE with subqueries, temporary tables, or the ROW_NUMBER() function, you can effectively clean your data and improve database performance. Always ensure you have a backup and test thoroughly to avoid accidental data loss.

Have questions or suggestions? Let us know in the comments below!

How to Remove Duplicate Rows in MySQL

Why Do Duplicate Rows Occur?

Identifying Duplicate Rows

Methods to Remove Duplicate Rows

1. Using `DELETE` with a Subquery

2. Using a Temporary Table

3. Using `ROW_NUMBER()` (MySQL 8.0+)

Best Practices

Alternative to iFrames with HTML5

Understanding Various Flags in Bash

What Is Yugoslavia Called Now?

Leave a ReplyCancel reply

Most Popular

What Does the Word “Wicked” Stand for in the Book Maze Runner?

What is the Difference Between Lexis and Grammar?

What Is the Scientific Name for Wild Turkeys?

What is Rubber Made of?

Recent Comments

Python | Counter Objects | elements()

How to Use PySpark

Can you explain the use of memset() in C with examples?

How to Remove Duplicate Rows in MySQL

Why Do Duplicate Rows Occur?

Identifying Duplicate Rows

Methods to Remove Duplicate Rows

1. Using DELETE with a Subquery

2. Using a Temporary Table

3. Using ROW_NUMBER() (MySQL 8.0+)

Best Practices

Related posts:

Leave a ReplyCancel reply

Most Popular

Recent Comments

1. Using `DELETE` with a Subquery

3. Using `ROW_NUMBER()` (MySQL 8.0+)