HOW TO REMOVE DUPLICATES IN GOOGLE SHEETS

Removing Duplicates Using the Default Function

Introduction to Removing Duplicates

Removing duplicates is one of the most important operations you can do when cleaning your data.

Every time we transfer data from one resource to another or merge different datasets, duplicates are bound to arise. Removing these text entries can lead to tremendous benefits like improved customer satisfaction, reduced labour costs and increased efficiency of your data management system.

In this post, we shall examine how to remove duplicates using two different methods. Please note that the images we have used are for illustration purposes only.

An illustration showing the process of removing of duplicates in Google Sheets.

Removing Duplicates Using the Default Function

A small spreadsheet section showing a list of fictional characters and their respective emails. It is being prepared for duplicate removal using the default Google Sheets function.

Google Sheets has an excellent function for removing duplicates and it can be great option for getting the job done quickly.

To begin, let us select our data as shown here.

The "Remove duplicates" function window in Google Sheets with default parameters selected.

In your spreadsheet menu, head to Data > Data clean-up > Remove duplicates. This will open a window where you will enter you will notice that the different columns are selected:

The "Remove duplicates" function window in Google Sheets with adjusted parameters selected.

Adjust the parameters to match what you want to do. In our example, we want to remove duplicates by Column B, so we shall make adjustments accordingly:

A section showing unique spreadsheet values in Column B.

When we click "Remove duplicates", Flookup will produce a dataset that is free of duplicates, as shown here.

Removing Duplicates Using Flookup

In some cases, you have a list of data that you merged from different sources and are absolutely sure that there are fuzzy duplicates in them. So, you scan through the list and find duplicates but, to your utter frustration, the duplicates are the kind that can be easily missed e.g. “Isaac Newton” vs “Netwon Issac”.

Using the default method discussed in the section before will not help with these kinds of duplicates. Therefore, to continue cleaning data, we shall use Flookup.

Let us assume that we want to remove duplicates by the column labelled "Client Name" in this image.

A sidebar that is used for removing duplicates by percentage similarity using Flookup.

In the table above, there are a couple of duplicates that have different spellings. By using Flookup, we can choose to remove them by percentage similarity or by sound similarity. In this example, we shall use the former.

Go to Extensions > Flookup Data Wrangler > Remove duplicates and click "By percentage". This will open the sidebar that is shown here.

A spreadsheet section showing unique spreadsheet values in column A.

Select Keep first identified duplicate value or Keep last identified duplicate value.
Leave the next option set to Compare data in a single column for this scenario.
Click “Map columns in selection” in order to get the current number columns in your selection.
Enter the "Left_column" index. In this case, we shall leave the value set at "1" since we aim to analyse the first column in our selection.
Adjust the "Threshold" value to 0.8.
Click "Remove duplicates".

All duplicates in the first column of the selection, with a percentage similarity of 0.8 between them, will be deleted.

HOW TO REMOVE DUPLICATES IN GOOGLE SHEETS

Introduction to Removing Duplicates

Removing Duplicates Using the Default Function

Removing Duplicates Using Flookup

Explore More