EXTRACTING UNIQUE DATA IN GOOGLE SHEETS

Introduction to Extracting Data

Creating a clean record for datasets like contact addresses, marketing leads, employee records, etc. is critical to operations of any business. It vastly improves efficiency and reduces costs that would otherwise be high if left ignored. Here, we shall look at two different ways we can remove duplicates from any list when faced with such a challenge.

Extracting Values Using UNIQUE by Google

A section of a spreadsheet showing a list of text entries with exact duplicates

Google provides a default function called UNIQUE that can be used to remove duplicates.

Let us assume you have a dataset like the one here.

A spreadsheet section showing two columns of text entries: One has duplicates and the other has only unique values

In the example above, Column A contains some duplicate text entries. To extract unique values from this list, type the following formula in Cell B2 and press ENTER: =UNIQUE($A$2:$A$15)

This will return a list of unique values as shown below:

A section of a spreadsheet showing a list of text entries with duplicates that a spelled differently

UNIQUE has two optional parameters i.e. "by_column" and "exactly_once". You can read more about them in the official documentation.

This function might be convenient and sufficient for some cases but, unfortunately, real-world data is rarely this well-behaved. Instead, you are more likely to come across something that looks like this:

If you take a close look at the table above, you will notice that I have adjusted the duplicates we had before so that, this time, they will have different spellings. In such cases, the UNIQUE function will fail to extract the duplicates.

Extracting Values Using ULIST by Flookup

A section of a spreadsheet showing a list of text entries with fuzzy duplicates in one column and unique values in the other column

Fortunately, Flookup uniquely suited to handle this kind of problem. To extract duplicates from this dataset, we shall use the ULIST function.

Type the following formula in Cell B2 and press ENTER: =ULIST($A$2:$A$15)

Your data should look like this afterwards.

In the example above, ULIST considered any text entries with a 0.6 level of similarity to other text entries in the list, to be duplicates and this is its default behaviour. We are, therefore, left with a list of unique values despite having strange spelling differences in our original list.

ULIST has two optional parameters i.e. "indexNum" and "threshold". The former allows you to determine which column to extract unique values from and the latter allows you to set the level of similarity between text entries, above which they will be considered duplicates. You can read more about them in the official documentation.

Please note that ULIST will work down the list vertically, so it will retain unique values based on which text entry appeared first.