HOW TO USE SPREADSHEET FUNCTIONS
Introduction to Spreadsheet Functions
These functions can be accessed or activated by typing specific formulas directly inside any spreadsheet cell.
They offer a significant advantage as they seamlessly integrate into your existing workflow. They are particularly useful for routine data cleaning tasks, enabling you to efficiently automate the same. Moreover, spreadsheet functions are universally accessible, whether you are using a phone, tablet, or PC ensuring that you work effectively on any platform.
IMPORTANT NOTE: If an optional parameter lies between two required parameters, simply leave them blank as shown below:
=MYFUNCTION(required, $, $, required, $)
In this example, I have used $ as a placeholder to identify positions where the optional parameters would be. In your actual formula, these should be blank.
NORMALIZE
=NORMALIZE(lookup_value, stop_array, threshold, operation)
Use the NORMALIZE function to remove unwanted words, diacritics and punctuation marks from text entries. This improves the data cleaning process by removing parts of text entries or whole text entries that could negatively affect the accuracy of the fuzzy matching process. Moreover, with less data to process, the speed of the data cleaning process could be greatly improved.
NORMALIZE Parameters
lookup_value: This refers to the text entries that are to be normalized. It can be a reference to one cell e.g. A1 or a range of multiple cells in a single column e.g. A1:A100.
stop_array: This optional parameter refers to the values that you wish to eliminate from the "lookup_value" entry. It could comprise a group of unwanted words that are separated by commas or punctuation marks that have no delimiters. These values should be contained within a single cell.
Be careful not to mix unwanted words with punctuation marks in your "stop_array" input.
threshold: This optional parameter refers to the minimum percentage similarity between parts of the "lookup_value" that you would like to remove and the values in the "stop_array". The default value is 1, which means that only exact matches will be removed from the "lookup_value" entry.
operation: This refers to the operation that you would like the function to run. Your options are "text" to remove stop words, "punctuations" to remove punctation marks and "diacritics" to remove diacritical marks. Removing diacritical marks is the default operation and, therefore, all you need to do is enter the range to process in your formula e.g. =NORMALIZE(A5)
FUZZYMATCH
=FUZZYMATCH(left_string, right_string)
Use the FUZZYMATCH function to calculate the percentage similarity between text entries and return the result in decimal form.
You can compare single cells or ranges to one another e.g. A1 vs B1, A1 vs B1:B100 or A1:A100 vs B1:B100 but "left_string" should never hold more rows of data than "right_string".
FUZZYMATCH Parameters
left_string: The primary text entry to be compared to "right_string".
right_string: The secondary text entry to be compared to "left_string".
FLOOKUP
=FLOOKUP(lookup_value, table_array, lookup_col, index_num, threshold, rank)
Use the FLOOKUP function to search any column of a range of cells, and then return a value from a specified column of that range, paired with its respective percentage similarity.
FLOOKUP Parameters
lookup_value: This determines the specific value or values you are looking for within any column of the "table_array" parameter. The "lookup_value" can be a reference to one cell e.g. A1 or a range covering multiple rows e.g. A1:A500.
table_array: This determines the range of cells that contains the data to be returned or merged. The values in the specified or target column of "table_array" are methodically compared to the values in "lookup_value". The "table_array" should be a reference to a range e.g. B1:E500.
lookup_col: This determines the index of the "table_array" column that this function will run through. A "lookup_col" value of 1 means the function will search the first column in "table_array", a value of 2 will search the second column, and so on. These values must always be greater than 0.
index_num: This determines the column index in the "table_array" from which data should be returned. If the "index_num" is 1, then values from the first column in the "table_array" are returned. If it is 2, then values from the second column are returned, and so on. These values must always be greater than 0.
threshold: This an optional parameter that determines whether this function should return exact or approximate matches. It represents the similarity level below which results will not be returned from "table_array". Valid values range from 0 for no filtering to 1 for exact matches and the default value is set to 0.6.
rank: This optional parameter determines the rank of the potential matches that you want to be returned from "table_array". The values can range from 1, which corresponds to the first possible match, to n, which corresponds to the nth possible match. The default value is 1, and the results are ranked in descending order based on their percentage similarity.
Derivatives of FLOOKUP
There are two derivatives of the FLOOKUP function that are designed to cater for special lookup and merge cases that the former does not do. Below is an overview of how they can be helpful:
QLOOKUP
=QLOOKUP(lookup_value, table_array, lookup_col, index_num, threshold)
The QLOOKUP function is also works like FLOOKUP, but it will only return the lookup result without the relative percentage similarity. It also only returns the best match in the dataset, therefore, has no option to return other matches for each lookup item.
MLOOKUP
=MLOOKUP(lookup_value, table_array, lookup_col, threshold, rank)
This function works exactly like FLOOKUP, but with one difference:
Whereas FLOOKUP returns a matched value from a cell, MLOOKUP will instead return the entire row of "table_array" in which the match is located. This makes it very useful in merging different lists or tables with multiple data points.
Therefore, when using MLOOKUP, you are advised to leave an appropriate number of columns for all the results to be displayed without overriding any of your pre-existing data.
SOUNDMATCH
=SOUNDMATCH(lookup_value, table_array, lookup_col, index_num, rank)
Use the SOUNDMATCH function to lookup values in any column and return values based on a refined version of the Soundex algorithm.
SOUNDMATCH Parameters
lookup_value: The value to search in the column of the "tableArray". The "lookupValue" argument can be a text entry, a single cell reference or a range of many cells.
table_array: The range of cells that contains data that will be returned or merged. The values in the user-specified column of "tableArray" are systematically compared to "lookupValue". These values can be text, numbers or anything else.
lookup_col: The column index in "tableArray" which will be searched by the function. A "lookupCol" value of 1 searches the first column in "tableArray", a "lookupCol" value of 2 searches second column in "tableArray", and so on. Values must be greater than 0.
index_num: The column index in the "tableArray" argument from which the matching value must be returned. An "indexNum" argument of 1 returns values in the first column in "tableArray", an "indexNum" of 2 returns values in the second column in "tableArray", and so on. Values must be greater than 0.
rank: An optional variable that indexes the rank of all the possible matches that you want returned from "tableArray". Values can range from 1, meaning the best possible match, to the nth possible match. The default value is 1 and matched results are sorted alphabetically, in ascending order.
ULIST
=ULIST(col_array, index_num, threshold, operation)
You can use the ULIST function to return unique values from any list of text entries based on their relative percentage or sound similarities. If you like the UNIQUE function in Google Sheets, you will love this function.
ULIST Parameters
col_array: This a range of one or more columns from which you intend to extract unique text entries and discard duplicates e.g. A1:C200.
index_num: This is an index number that represents the column in "col_array" that is going to be analysed for duplicates with the view of extracting unique values. If no input is made, the first column of "col_array" will be analysed.
threshold: This optional variable determines whether ULIST should extract exact or approximate matches. It represents the relative percentage similarity between any two text entries in the target column of "col_array", below which text entries are considered unique and above which text entries are considered to be duplicates. The default value is set at 0.6.
operation: This determines the method ULIST will use to extract unique values from your dataset. The values are by_percentage for extracting by percentage similarity and by_sound for extracting by sound similarity. The former is the default function, meaning that ULIST will choose it even if it is not specified in the formula.
Error Messages Explained
!SubscriptionInactive: You have not activated Flookup or you are using the wrong user account.
!IndexError: You have entered an index number that is less than 1 or greater than the number of columns available in the search table.
!RankError: You have entered a rank value that is less than 1.
!NoMatchFound: There are no matches available for the current parameters you have set.
!ThresholdError: You have entered a figure that is less than 0 or greater than 1.
Exceeded maximum execution time: This is a built-in Google Sheets error that tells us that the function in question has run for too long. You can resolve it by reducing the number of rows to process for each execution or entering ranges instead of single cells for the functions to process.
#NAME: This is a Google Sheets error that indicates that the function is not recognised. It typically resolves by itself, but you can quicken the process by clearing a portion of the affected formulas and using the "Undo" button or "ctrl + Z" to restore them.