How to Remove Fuzzy Duplicates in Google Sheets

Summary of Steps

    1. Type =ULIST inside any cell.

    2. Type or select a range as the colArray.

    3. (Optional) Enter indexNum, which is the column in colArray to analyse for unique values.

    4. (Optional) Adjust the threshold, the percentage above which values are considered duplicates and removed.

    5. Press Enter or Return.

Let's look at an example showing a simple and intuitive method of removing fuzzy duplicates (deduping) in Google Sheets using an Flookup function called ULIST. Duplicates can be a extremely costly and, therefore, removing them is an excellent way to improve data quality.

Suppose that you have a list with names of individuals, the cities they live in and their respective salaries as shown below:

Dedupe Google Sheets Add-on

This list obviously contains duplicates under the column titled "City". In order to extract a list of individuals without the duplicated cities, we can use the ULIST formula as shown below:

Dedupe Google Sheets Add-on

And our results will look like this:

Dedupe Google Sheets Add-on

However, as we can see, there is a duplicate that has not been caught because it contains a spelling error; that duplicate is "New Yorke" vs "New York". In order to resolve this, we need to adjust the threshold parameter as follows:

Dedupe Google Sheets Add-on

And finally, we have a clean list containing unique values only:

Dedupe Google Sheets Add-on

If you are a fan of Google's UNIQUE function, then you will love the power ULIST brings to your spreadsheet.