How to Remove Fuzzy Duplicates in Google Sheets
Summary of Steps
Type =ULIST inside any cell.
Type or select a range as the colArray.
(Optional) Enter indexNum, which is the column in colArray to analyse for unique values.
(Optional) Adjust the threshold, the percentage above which values are considered duplicates and removed.
Press Enter or Return.
Let's look at an example showing a simple and intuitive method of removing fuzzy duplicates (deduping) in Google Sheets using an Flookup function called ULIST. Duplicates can be a extremely costly and, therefore, removing them is an excellent way to improve data quality.
Suppose that you have a list with names of individuals, the cities they live in and their respective salaries as shown below:
This list obviously contains duplicates under the column titled "City". In order to extract a list of individuals without the duplicated cities, we can use the ULIST formula as shown below:
And our results will look like this:
However, as we can see, there is a duplicate that has not been caught because it contains a spelling error; that duplicate is "New Yorke" vs "New York". In order to resolve this, we need to adjust the threshold parameter as follows:
And finally, we have a clean list containing unique values only: