HOW TO USE CUSTOM FUNCTIONS TO CLEAN DATA

NORMALIZE

=NORMALIZE(lookup_value, stop_array, threshold, "operation")

Use the NORMALIZE function to remove unwanted words, diacritics and punctuation marks from text entries. This improves the data cleaning process by removing parts of text entries or whole text entries that could negatively affect the accuracy of the fuzzy matching process. Moreover, with less data to process, the speed of the data cleaning process could be greatly improved.

NORMALIZE Parameters

Be careful not to mix unwanted words with punctuation marks in your "stop_array" input.

Your options are "text" to remove stop words, "punctuations" to remove punctation marks and "diacritics" to remove diacritical marks.

Removing diacritical marks is the default operation and, therefore, all you need to do is enter the range to process in your formula e.g. =NORMALIZE(A5)

FUZZYMATCH

=FUZZYMATCH(left_string, right_string)

Use the FUZZYMATCH function to calculate the percentage similarity between text entries and return the result in decimal form. 

You can compare single cells or ranges to one another e.g. A1 vs B1, A1 vs B1:B100 or A1:A100 vs B1:B100 but "left_string" should never hold more rows of data than "right_string".

FUZZYMATCH Parameters

FLOOKUP

=FLOOKUP(lookup_value, table_array, lookup_col, index_num, threshold)

Use the FLOOKUP function to search any column of a range of cells, and then return a value from a specified column of that range, paired with its respective percentage similarity. 

FLOOKUP Parameters

QLOOKUP

=QLOOKUP(lookup_value, table_array, lookup_col, index_num, threshold)

The QLOOKUP function is a variant of the FLOOKUP function, but it only returns the lookup result without the relative percentage similarity.

SOUNDMATCH

=SOUNDMATCH(lookup_value, table_array, lookup_col, index_num)

Use the SOUNDMATCH function to lookup values in any column and return values based on a refined version of the Soundex algorithm.

SOUNDMATCH Parameters

A "lookup_col" value of 1 searches the first column in "table_array", a "lookup_col" value of 2 searches second column in "table_array", and so on. Values must be greater than 0.

An "index_num" argument of 1 returns values in the first column in "table_array", an "index_num" of 2 returns values in the second column in "table_array", and so on. Values must be greater than 0.

ULIST

=ULIST(col_array, index_num, threshold, "operation")

You can use the ULIST function to return unique values from any list of text entries based on their relative percentage or sound similarities. If you like the UNIQUE function in Google Sheets, you will love this function.

ULIST Parameters

The values are by_percentage for extracting by percentage similarity and by_sound for extracting by sound similarity. The former is the default function, meaning that ULIST will choose it even if it is not specified in the formula.

Possible Error Messages

Explore More

The following are quick links to variants of each of the functions shown here. They offer a more performant data cleaning experience and are accessible via the spreadsheet menu: