How to Highlight Fuzzy Duplicates in Google Sheets
Summary of Steps
Select a range.
Open the "Highlight duplicates" function from the spreadsheet menu.
Set the "Column index", the column to analyse for duplicates.
Set the threshold, the minimum percentage similarity at which values are considered duplicates.
Click "Highlight duplicates".
Highlighting duplicates is one of the best ways to get an overview of the state of your dataset in Google Sheets. Unfortunately, most of the data we work with contains partial matches, punctuation marks or other variations that throw off the traditional, exact-match methods you might already know or use.
The advantage of using Flookup is that all these pitfalls can be avoided. Here's how:
In this example, the column showing cities (column B) has potential duplicates. To highlight duplicate rows in the entire dataset based on this column, we start by selecting our range of interest and then heading to Add-ons > Highlight duplicates > By percentage.
In the next window we make two adjustments: the column we want to analyse for duplicates and the minimum percentage similarity that values must reach to count as duplicates. This window also displays the number of columns we have selected, which helps when setting the "Column index".
After running the function, duplicates are highlighted in yellow.
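To make the idea of a percentage threshold concrete, here is a minimal Python sketch of the same kind of check. It is not Flookup's implementation, which is not described here: it assumes a similarity score from Python's standard `difflib.SequenceMatcher`, a 1-based column index, and hypothetical helper names (`normalise`, `similarity_percent`, `fuzzy_duplicate_rows`) and sample rows invented for illustration.

```python
from difflib import SequenceMatcher
import string


def normalise(value):
    """Lowercase, strip punctuation and collapse spaces before comparing."""
    cleaned = str(value).lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())


def similarity_percent(a, b):
    """Similarity of two values as a percentage (assumed metric: difflib ratio)."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio() * 100


def fuzzy_duplicate_rows(rows, column_index, threshold_percent):
    """Return indices of rows whose value in the chosen column is at least
    threshold_percent similar to the value in some other row.

    column_index is 1-based here, which is an assumption of this sketch.
    """
    col = column_index - 1
    flagged = set()
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if similarity_percent(rows[i][col], rows[j][col]) >= threshold_percent:
                flagged.update((i, j))
    return sorted(flagged)


# Hypothetical sample data: names in column 1, cities in column 2.
rows = [
    ["Alice", "New York"],
    ["Bob", "new york."],
    ["Carol", "Nairobi"],
    ["Dan", "Nairobbi"],
    ["Eve", "Tokyo"],
]

print(fuzzy_duplicate_rows(rows, column_index=2, threshold_percent=80))
```

With a threshold of 80, the two spellings of each city are flagged together. Raising the threshold to 95 keeps only the "New York" pair, because "Nairobi" and "Nairobbi" score roughly 93 under this metric.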
We can also highlight duplicates by similarity in sound. To do this, we follow the same steps shown above, but this time we access the function located at Add-ons > Highlight duplicates > By sound:
A window, which also shows the number of columns selected, pops up prompting you to enter the column to analyse:
After running the function, all duplicates are highlighted in aqua.
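Matching by sound can be sketched in a similar way. The example below uses the classic American Soundex code as a stand-in, since the phonetic algorithm Flookup uses is not stated here. The function names, the 1-based column index and the sample rows are all assumptions made for illustration.

```python
from collections import defaultdict

# Soundex digit for each consonant; vowels, h, w and y have no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(text):
    """Return the four-character American Soundex code of text.

    Non-letters (spaces, punctuation) are ignored, so a multi-word value
    is encoded as one run of letters.
    """
    letters = [c for c in str(text).lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        if ch not in "hw":  # h and w do not separate repeated codes
            prev = code
    return (encoded + "000")[:4]


def sound_duplicate_rows(rows, column_index):
    """Return indices of rows sharing a Soundex code in the chosen column
    (1-based index, an assumption of this sketch)."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[soundex(row[column_index - 1])].append(i)
    return sorted(i for members in groups.values() if len(members) > 1 for i in members)


# Hypothetical sample data: names in column 1, cities in column 2.
rows = [
    ["Alice", "Nairobi"],
    ["Bob", "Nyrobi"],
    ["Carol", "Lagos"],
    ["Dan", "Laygoss"],
    ["Eve", "Tokyo"],
]

print(sound_duplicate_rows(rows, column_index=2))
```

Because Soundex keeps the first letter as written, spellings that sound alike but start with different letters (for example "Kairo" and "Cairo") would not be grouped by this sketch. A different phonetic algorithm may behave differently.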