Documentation and Tips

Before You Begin

Fuzzy matching algorithms are processor intensive because each value of interest must be compared to every value in the lookup table. To speed up the process in Flookup, try the following:

  • Use NORMALIZE to remove those unwanted words that negatively affect the accuracy of the results, and to reduce the workload.

  • Enter ranges (e.g. A1:A1000) as the lookupValue value in order to significantly improve speed and efficiency. Doing this also prevents the spreadsheet from hanging.

  • Set the threshold value to 1 in order to eliminate the "low-hanging fruit" during your first run. You can gradually lower it for your consequent runs.

  • Take advantage of the the Long Run Mode. Custom functions are given exactly 30 seconds to return results. However, in Long Run Mode, they can run for 6 minutes (for regular Google accounts) or 30 minutes (for Google Workspace accounts) before timing out. To access this feature, click on the menu items located under Flookup > Long Run Mode.

NORMALIZE

=NORMALIZE(lookupValue, [stopArray])

Use the NORMALIZE function to remove diacritical marks from strings before applying FUZZYMATCH or any of the lookup functions. We highly recommend that you begin by using this function before running any analysis on your dataset.

The lookupValue parameter can either be:

    • A single cell e.g. =NORMALIZE(A1) or,

    • A range of cells e.g. =NORMALIZE(A1:A30000).

You can also use the following function to remove unwanted words. This is useful if you have words that you would like Flookup to ignore during processing. To do this, you need to feed the stopArray parameter with an array of words you want to ignore as shown below:

  • =NORMALIZE(A1:A30000, B1:B10)

  • =NORMALIZE(A1:A30000, "company, https, ltd, limited, org")

The second argument can be a range containing the unwanted words (e.g. B1:B500) or a list of directly typed words.

Note

This second parameter is optional and, therefore, if you do not include it in your formula, the function will default to removing diacritical marks. Also note that, in this mode, NORMALIZE will return a new list of words, without the unwanted words and without modifying the original list.

FUZZYMATCH

=FUZZYMATCH(string1, string2)

Use the FUZZYMATCH function to calculate the percentage similarity between strings and return the result in decimal form.

FUZZYMATCH Parameters

    • string1 [Required]. The string to be compared to string2.

    • string2 [Required]. The string to be compared to string1.

Here are a few things to note about this function:

    • The two arguments can be interchanged without affecting the result.

    • Numbers are treated as strings.

    • Both parameters can also be ranges e.g. =FUZZYMATCH(A1:A20000, B1:B20000) OR =FUZZYMATCH(A1, B1:B20000).

    • FUZZYMATCH supports multiple languages.

Using the Long Run Mode

  1. Lookup range: Select the lookup range and click "Get selected range" for the first slot.

  2. Table to search: Select the table/range you want to analyse and then click "Get selected range" for the second slot.

  3. Indicate the column with values you want to compare to values from First range, by adjusting by adjust the "Target column" parameter.

  4. Click an empty cell where you want your results to be displayed.

  5. Click "Calculate percentage similarities".

Note

  • When the function times out, the results that have been processed up to that point, will be displayed. Keep an eye on the tooltip popups to track progress.

FLOOKUP

=FLOOKUP(lookupValue, tableArray, lookupCol, indexNum, [threshold], [rank])

Use the FLOOKUP function to search any column of a range of cells, and then return a value from a specified column of that range. For example, suppose you have two separate lists of names contained in ranges A2:A3000 and B2:D5000. In order to compare the text in both these columns you can simply type:

=FLOOKUP(A2, B2:D5000, 1, 3)

This formula searches for the value found in cell A2 in the first column of the range B2:D5000, and then returns the first match contained in the third column of that same range.

In order to retrieve a value and there is a chance that it might be spelled differently in B2:D5000, one can type:

=FLOOKUP(A2, B2:D5000, 1, 3, 0.5)

This formula searches for the value found in cell A2 in the first column of the range B2:D5000 but, unlike the first example, this time the algorithm matches the first string with a 0.5 percentage similarity minimum. This parameter is adjustable and it takes any value from 0 to 1.

Finally, unlike other solutions, Flookup is not limited to searching the left most column of a specified range. For example, one can simply search for A2 in the second column of B2:D5000 and return data from the first column by using the formula below:

=FLOOKUP(A2, B2:D5000, 2, 1)

... or search for A2 in the second column of B2:D5000 and return data from the third column like this:

=FLOOKUP(A2, B2:D5000, 2, 3)

Pro Tip

For increased efficiency you can write your formula as shown below:

=FLOOKUP(A2:A1000, B2:D1000, 2, 3)

This variation takes an array (range of many cells) as the lookupValue, thereby improving performance.

FLOOKUP Parameters

    • lookupValue [Required]. The value to search for in any column of the table or range. The lookupValue argument can be a string, concatenated strings, a single cell reference or a range of many cells.

    • tableArray [Required]. The range of cells (database) that contains the data. The values in the user-specified column of tableArray are systematically compared to lookupValue. These values can be text, numbers, or other values.

    • lookupCol [Required]. The column number in tableArray which must be searched by the algorithm. A lookupCol value of 1 searches the first column in tableArray, a lookupCol value of 2 searches second column in tableArray, and so on. Values must be greater than 0.

    • indexNum [Required]. The column number in the tableArray argument from which the matching value must be returned. An indexNum argument of 1 returns the value in the first column in tableArray, an indexNum of 2 returns the value in the second column in tableArray, and so on. Values must be greater than 0.

    • threshold [Optional]. A variable that specifies whether you want Flookup to find an exact match or an approximate match. It is the similarity level below which results WILL NOT be returned from tableArray. Values range from 0 (lowest possible match) to 1 (exact match). The default value is 0.6.

    • rank [Optional]. A variable that specifies the rank of all the possible matches, that you want returned from tableArray. Values range from 1 (first possible match) to n (the nth possible match). The default value is 1 and results are ranked in order of descending percentage similarity.

Note 1

  • FLOOKUP returns the lookup result and its respective percentage similarity. Please allow 2 columns for this to work.

Using the Long Run Mode

  1. Select the lookup range and click "Get selected range" for the first slot.

  2. Select the table/range you want to analyse and then click "Get selected range" for the second slot.

  3. Make appropriate changes to the other parameters as desired.

      • Column to search: A column you want to search in a specified table,

      • Column to retrieve: A column you want to return values from, in a specified table.

      • Minimum similarity: The minimum percentage similarity to return.

      • Maximum similarity: The maximum percentage similarity to return.

  4. Click an empty cell where you want your results to be displayed.

  5. Click "Get fuzzy matches".

Note 2

  • The different mode names describe what they do:

    1. In the default mode, Flookup returns the best match.

    2. In the alternative mode, Flookup returns all possible matches for each lookup item.

  • When the function times out, the results that have been processed up to that point, will be displayed. Keep an eye on the tooltip popups to track progress.

HFLOOKUP

=HFLOOKUP(lookupValue, tableArray, lookupRow, indexNum, [threshold], [rank])

Use the HFLOOKUP function to search any row of a range of cells, and then return a value from a specified row of that range. It works exactly like a transposed FLOOKUP function.

HFLOOKUP Parameters

    • lookupValue [Required]. The value to search for in any row of the table or range. The lookupValue argument can be a string, concatenated strings, a single cell reference or a range of many cells.

    • tableArray [Required]. The range of cells that you want to retrieve data from. The values in a user-specified row of tableArray are systematically compared to lookupValue. These values can be text, numbers, or other values.

    • lookupRow [Required]. The row number in tableArray which must be searched by the algorithm. A lookupRow value of 1 searches the first row in tableArray, a lookupRow value of 2 searches second row in tableArray, and so on. Values must be greater than 0.

    • indexNum [Required]. The row number in the tableArray argument from which the matching value must be returned. An indexNum argument of 1 returns the value in the first row in tableArray, an indexNum of 2 returns the value in the second row in tableArray, and so on. Values must be greater than 0.

    • threshold [Optional]. A variable that specifies whether you want HFLOOKUP to find an exact match or an approximate match. It is the similarity level below which results WILL NOT be returned from tableArray. Values range from 0 (lowest possible match) to 1 (exact match). The default value is 0.6.

    • rank [Optional]. A variable that specifies the rank of all the possible matches, that you want returned from tableArray. Values range from 1 (first possible match) to n (the nth possible match). The default value is 1 and results are ranked in order of descending percentage similarity.

SOUNDMATCH

=SOUNDMATCH(lookupValue, tableArray, lookupCol, indexNum, [rank])

Use the SOUNDMATCH function to perform lookup tasks and return values based on what their Soundex codes are.

Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. It encodes similar sounding words so that they can be matched regardless of differences in spelling. You can find out more about it here.

SOUNDMATCH also allows you to return alternative matches using the rank parameter. These matches range from 1 (best match) to n (the nth best match).

SOUNDMATCH Parameters

    • lookupValue [Required]. The value to search in the column of the table or range. The lookupValue argument can be a string, a single cell reference or a range of many cells.

    • tableArray [Required]. The range of cells (database) that contains the data. You can use a reference to a range. The values in the first column of tableArray are the values searched by lookupValue. These values can be text, numbers, or other values.

    • lookupCol [Required]. The column number in tableArray which must be searched by the algorithm. A lookupCol value of 1 searches the first column in tableArray, a lookupCol value of 2 searches second column in tableArray, and so on. Values must be greater than 0.

    • indexNum [Required]. The column number in the tableArray argument from which the matching value must be returned. An indexNum argument of 1 returns the value in the first column in tableArray, an indexNum of 2 returns the value in the second column in tableArray, and so on. Values must be greater than 0.

    • rank [Optional]. A variable that specifies the rank of all the possible matches, that you want returned from tableArray. Values range from 1 (first possible match) to n (the nth possible match). The default value is 1 and results are ranked alphabetically (ascending order).

Using the Long Run Mode

  1. Select the lookup range and click "Get selected range" for the first slot.

  2. Select the table/range you want to analyse and then click "Get selected range" for the second slot.

  3. Make appropriate changes to the other parameters as desired.

      • Column to search: A column you want to search in a specified table,

      • Column to retrieve: A column you want to return values from, in a specified table.

      • Click an empty cell where you want your results to be displayed.

  4. Click "Get fuzzy matches".

Note

  • When the function times out, the results that have been processed up to that point, will be displayed. Keep an eye on the tooltip popups to track progress.

ULIST

=ULIST(colArray, [indexNum], [threshold])

Use ULIST to remove duplicates and return unique values from a range that you have specified. This function does not modify the original range or values and they can either be exact matches or not.

ULIST Parameters

    • colArray [Required]. The range from which you want to return unique values.

    • indexNum [Optional]. The column index to analyse for unique values. The default value is 1.

    • threshold [Optional]. The minimum percentage similarity between the colArray values that are not unique. Therefore a threshold value of 0.6 means that ULIST will look for partial matches and then eliminate any values with a 60 percent similarity and above. The default value is 1.

Highlighting Duplicate Values

Flookup can be used to highlight duplicates in Google Sheets based on matches from a single column.

To highlight duplicate values using Flookup, go to Add-ons > Flookup > Highlight duplicates and either click By percentage or By sound.

How to Highlight Duplicates By Percentage Similarity

With this function, Flookup highlights duplicate values which have a specific minimum percentage similarity between them. The steps to highlight duplicates by this method are:

    1. Select a range.

    2. Find and click the menu item labelled "By percentage".

    3. A prompt is displayed requesting the user to enter the "Target column" and the "Threshold value". If no user input is made, values which have a 100% similarity match between them will be found and the first column of the selected range will be analysed.

All selected values with the specified percentage similarity between them will be highlighted in yellow.

How to Highlight Duplicates By Sound Similarity

With this function, Flookup highlights duplicate values based on sound similarity between the them. The steps to highlight duplicates by this method are:

    1. Select a range.

    2. Find and click the menu item labelled "By sound".

    3. A prompt is displayed requesting the user to enter the "Target column" . If no user input is made, the first column of the selected range will be analysed.

All selected values with similar sounds will be highlighted in aqua.

Note

  • For both highlight modes, you can choose to highlight every single duplicate occurrence or to skip the first/topmost duplicate occurrence.

  • The Target column is the column that will be analysed and the Threshold value is the level of similarity below which items will not be considered duplicates.

Removing Duplicate Values

Flookup can be used to remove duplicates in Google Sheets based on matches from a single column.

To remove duplicates using Flookup, go to Add-ons > Flookup > Remove duplicates and either click By percentage or By sound.

How to Remove Duplicates By Percentage Similarity

With this function, Flookup finds and deletes selected rows that contain duplicate values with a specific minimum percentage similarity between them. The steps to remove duplicates by this method are:

    1. Select values in a range.

    2. Find and click the menu item labelled "By percentage".

    3. A prompt is displayed requesting the user to enter the "Target column" and the "Threshold value". If no user input is made, values which have a hundred percent similarity match between them will be found and the first column of the selected range will be analysed.

All selected rows containing duplicate values, with the specified percentage similarity between them, will be deleted.

How to Remove Duplicates By Sound Similarity

With this function, Flookup deletes selected rows that contain duplicate values based on the sound similarity between them. The steps to remove duplicates by this method are:

    1. Select a range.

    2. Find and click the menu item labelled "By sound".

    3. A prompt is displayed requesting the user to enter the "Target column". If no user input is made, the first column of the selected range will be analysed.

All selected rows containing values with similar sounds will be deleted.

Note

  • The Target column is the column that will be analysed and the Threshold value is the level of similarity below which items will not be considered duplicates.