Documentation and Tips

Before You Begin

Fuzzy matching algorithms are processor intensive because each value of interest must be compared to every value in the lookup table. To speed up the process in Flookup, try the following:

  • Use NORMALIZE to remove those unwanted words that negatively affect the accuracy of the results, and to reduce the workload.

  • Enter ranges (e.g. A1:A1000) as the lookupValue value in order to significantly improve speed and efficiency. Doing this also prevents the spreadsheet from hanging.

  • Set the threshold value to 1 in order to eliminate the "low-hanging fruit" during your first run. You can gradually lower it for your subsequent runs.

  • Take advantage of the Long Run Mode. Custom functions are given exactly 30 seconds to return results. However, in Long Run Mode (LRM), they can run for 6 minutes (for regular Google accounts) or 30 minutes (for Google Workspace accounts) before timing out. To access this feature, click on the menu items located under Flookup > Long Run Mode.

NORMALIZE

=NORMALIZE(lookupValue, [stopArray])

Use the NORMALIZE function to remove diacritical marks from strings before applying FUZZYMATCH or any of the lookup functions. We highly recommend that you begin by using this function before running any analysis on your data.

The lookupValue parameter can either be:

    • A single cell e.g. =NORMALIZE(A1) or,

    • A range of cells e.g. =NORMALIZE(A1:A30000).

You can also use the following function to remove unwanted words. This is useful if you have words that you would like Flookup to ignore during processing. To do this, you need to feed the stopArray parameter with an array of words you want to ignore as shown below:

  • =NORMALIZE(A1:A30000, B1:B10)

  • =NORMALIZE(A1:A30000, "company, https, ltd, limited, org")

The second argument can be a range containing the unwanted words (e.g. B1:B10) or a list of directly typed words.

Note

  • NORMALIZE will create a new list without modifying the original list.

  • The second parameter is optional and, therefore, if you do not include it in your formula, the function will default to removing diacritical marks.

FUZZYMATCH

=FUZZYMATCH(string1, string2)

Use the FUZZYMATCH function to calculate the percentage similarity between strings and return the result in decimal form.

FUZZYMATCH Parameters

    • string1 [Required]. The string to be compared to string2.

    • string2 [Required]. The string to be compared to string1.

Here are a few things to note about this function:

    • Numbers are treated as strings.

    • The two arguments can be interchanged without affecting the result.

    • Both parameters can also be ranges e.g. =FUZZYMATCH(A1:A20000, B1:B20000) OR =FUZZYMATCH(A1, B1:B20000). The first function compares a cell in one column, to a cell in a second column which is in the same row and the second function compares a single cell in one column to all the cells in a second column.

    • FUZZYMATCH supports multiple languages.

Using the Long Run Mode

  1. Lookup range: Select the lookup range and click "Get selected range".

  2. Table to search: Select the table of one column or more you want to analyse and then click "Get selected range".

  3. Target column: Indicate the column in "Table to search", with values that you want to compare to values from "Lookup range".

  4. Click an empty cell where you want your results to be displayed.

  5. Click "Calculate percentage similarities".

Note

  • When the FUZZYMATCH LRM function times out, the results that have been processed up to that point will be displayed. Keep an eye on the tooltip popups to track progress.

  • This mode will compare cells on the same row to one another e.g. A1 to B1, C2 to D2 and so on.

  • The Lookup range cannot be longer than Table to search.

FLOOKUP

=FLOOKUP(lookupValue, tableArray, lookupCol, indexNum, [threshold], [rank])

Use the FLOOKUP function to search any column of a range of cells, and then return a value from a specified column of that range. For example, suppose you have two separate lists of names contained in ranges A2:A3000 and B2:D5000. In order to compare the text in both these columns you can simply type:

=FLOOKUP(A2, B2:D5000, 1, 3)

This formula searches for the value found in cell A2 in the first column of the range B2:D5000, and then returns the first match contained in the third column of that same range.

In order to retrieve a value and there is a chance that it might be spelled differently in B2:D5000, one can type:

=FLOOKUP(A2, B2:D5000, 1, 3, 0.5)

This formula searches for the value found in cell A2 in the first column of the range B2:D5000 but, unlike the first example, this time the algorithm matches the first string with a 0.5 percentage similarity minimum. This parameter is adjustable and it takes any value from 0 to 1.

Finally, unlike other solutions, Flookup is not limited to searching the left most column of a specified range. For example, one can simply search for A2 in the second column of B2:D5000 and return data from the first column by using the formula below:

=FLOOKUP(A2, B2:D5000, 2, 1)

... or search for A2 in the second column of B2:D5000 and return data from the third column like this:

=FLOOKUP(A2, B2:D5000, 2, 3)

Pro Tip

For increased efficiency you can write your formula as shown below:

=FLOOKUP(A2:A1000, B2:D1000, 2, 3)

This variation takes an array (range of many cells) as the lookupValue, thereby improving performance.

FLOOKUP Parameters

    • lookupValue [Required]. The value to search for in any column of the table or range. The lookupValue argument can be a string, concatenated strings, a single cell reference or a range of many cells.

    • tableArray [Required]. The range of cells (database) that contains the data. The values in the user-specified column of tableArray are systematically compared to lookupValue. These values can be text, numbers, or other values.

    • lookupCol [Required]. The column number in tableArray which must be searched by the algorithm. A lookupCol value of 1 searches the first column in tableArray, a lookupCol value of 2 searches second column in tableArray, and so on. Values must be greater than 0.

    • indexNum [Required]. The column number in the tableArray argument from which the matching value must be returned. An indexNum argument of 1 returns the value in the first column in tableArray, an indexNum of 2 returns the value in the second column in tableArray, and so on. Values must be greater than 0.

    • threshold [Optional]. A variable that specifies whether you want Flookup to find an exact match or an approximate match. It is the similarity level below which results WILL NOT be returned from tableArray. Values range from 0 (lowest possible match) to 1 (exact match). The default value is 0.6.

    • rank [Optional]. A variable that specifies the rank of all the possible matches, that you want returned from tableArray. Values range from 1 (first possible match) to n (the nth possible match). The default value is 1 and results are ranked in order of descending percentage similarity.

Note 1

  • In this mode, FLOOKUP returns the lookup result and its respective percentage similarity. Please allow 2 columns for this to work.

Using the Long Run Mode

  1. Lookup values: Select a single column with the values you want to search for in a separate table and click "Get selected range".

  2. Table to search: Select the table of one column or more that you want to analyse and then click "Get selected range".

  3. Make appropriate changes to the other parameters as desired:

      • Column to search: A column under "Table to search" that you want to search,

      • Column to retrieve: A column under "Table to search" that you want to return values from.

      • Minimum similarity: The minimum percentage similarity below which values should not be returned.

      • Maximum similarity: The maximum percentage similarity above which values should not be returned.

  4. Click an empty cell where you want your results to be displayed.

  5. Click "Get fuzzy matches".

Note 2

  • The FLOOKUP LRM has two different modes:

    1. Return the best match for each item: Flookup returns the best match for each value found in the "Lookup Values" range.

    2. Return all possible matches for each item: Flookup returns all possible matches for each value found in the "Lookup Values" range.

  • When the function times out, the results that have been processed up to that point, will be displayed. Keep an eye on the tooltip popups to track progress.

QLOOKUP

=QLOOKUP(lookupValue, tableArray, lookupCol, indexNum, [threshold])

Use the QLOOKUP function to search any column of a range of cells, and then return a value from a specified column of that range.

QLOOKUP Parameters

    • lookupValue [Required]. The value to search for in any column of the table or range. The lookupValue argument can be a string, concatenated strings, a single cell reference or a range of many cells.

    • tableArray [Required]. The range of cells (database) that contains the data. The values in the user-specified column of tableArray are systematically compared to lookupValue. These values can be text, numbers, or other values.

    • lookupCol [Required]. The column number in tableArray which must be searched by the algorithm. A lookupCol value of 1 searches the first column in tableArray, a lookupCol value of 2 searches second column in tableArray, and so on. Values must be greater than 0.

    • indexNum [Required]. The column number in the tableArray argument from which the matching value must be returned. An indexNum argument of 1 returns the value in the first column in tableArray, an indexNum of 2 returns the value in the second column in tableArray, and so on. Values must be greater than 0.

    • threshold [Optional]. A variable that specifies whether you want Flookup to find an exact match or an approximate match. It is the similarity level below which results WILL NOT be returned from tableArray. Values range from 0 (lowest possible match) to 1 (exact match). The default value is 0.6.

Note

  • QLOOKUP only return the lookup result, not the corresponding percentage similarity. This means you only need a single column to display results (unlike with FLOOKUP).

  • QLOOKUP only returns the best match. There is no option to return other results for each lookup item.

HFLOOKUP

=HFLOOKUP(lookupValue, tableArray, lookupRow, indexNum, [threshold], [rank])

Use the HFLOOKUP function to search any row of a range of cells, and then return a value from a specified row of that range. It works exactly like a transposed FLOOKUP function.

HFLOOKUP Parameters

    • lookupValue [Required]. The value to search for in any row of the table or range. The lookupValue argument can be a string, concatenated strings, a single cell reference or a range of many cells.

    • tableArray [Required]. The range of cells that you want to retrieve data from. The values in a user-specified row of tableArray are systematically compared to lookupValue. These values can be text, numbers, or other values.

    • lookupRow [Required]. The row number in tableArray which must be searched by the algorithm. A lookupRow value of 1 searches the first row in tableArray, a lookupRow value of 2 searches second row in tableArray, and so on. Values must be greater than 0.

    • indexNum [Required]. The row number in the tableArray argument from which the matching value must be returned. An indexNum argument of 1 returns the value in the first row in tableArray, an indexNum of 2 returns the value in the second row in tableArray, and so on. Values must be greater than 0.

    • threshold [Optional]. A variable that specifies whether you want HFLOOKUP to find an exact match or an approximate match. It is the similarity level below which results WILL NOT be returned from tableArray. Values range from 0 (lowest possible match) to 1 (exact match). The default value is 0.6.

    • rank [Optional]. A variable that specifies the rank of all the possible matches, that you want returned from tableArray. Values range from 1 (first possible match) to n (the nth possible match). The default value is 1 and results are ranked in order of descending percentage similarity.

SOUNDMATCH

=SOUNDMATCH(lookupValue, tableArray, lookupCol, indexNum, [rank])

Use the SOUNDMATCH function to perform lookup tasks and return values based on what their Soundex codes are.

Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. It encodes similar sounding words so that they can be matched regardless of differences in spelling. You can find out more about it here.

SOUNDMATCH also allows you to return alternative matches using the rank parameter. These matches range from 1 (best match) to n (the nth best match).

SOUNDMATCH Parameters

    • lookupValue [Required]. The value to search in the column of the table or range. The lookupValue argument can be a string, a single cell reference or a range of many cells.

    • tableArray [Required]. The range of cells (database) that contains the data. You can use a reference to a range. The values in the first column of tableArray are the values searched by lookupValue. These values can be text, numbers, or other values.

    • lookupCol [Required]. The column number in tableArray which must be searched by the algorithm. A lookupCol value of 1 searches the first column in tableArray, a lookupCol value of 2 searches second column in tableArray, and so on. Values must be greater than 0.

    • indexNum [Required]. The column number in the tableArray argument from which the matching value must be returned. An indexNum argument of 1 returns the value in the first column in tableArray, an indexNum of 2 returns the value in the second column in tableArray, and so on. Values must be greater than 0.

    • rank [Optional]. A variable that specifies the rank of all the possible matches, that you want returned from tableArray. Values range from 1 (first possible match) to n (the nth possible match). The default value is 1 and results are ranked alphabetically (ascending order).

Using the Long Run Mode

  1. Lookup values: Select a single column with the values you want to search for in a separate table and click "Get selected range".

  2. Table to search: Select the table of one column or more that you want to analyse and then click "Get selected range".

  3. Make appropriate changes to the other parameters as desired:

      • Column to search: A column under "Table to search" that you want to search.

      • Column to retrieve: A column under "Table to search" that you want to return values from.

      • Click an empty cell where you want your results to be displayed.

  4. Click "Get fuzzy matches".

Note

  • When the function times out, the results that have been processed up to that point, will be displayed. Keep an eye on the tooltip popups to track progress.

ULIST

=ULIST(colArray, [indexNum], [threshold])

Use ULIST to remove duplicates and return unique values from a range that you have specified. This function does not modify the original range or values and they can either be exact matches or not.

ULIST Parameters

    • colArray [Required]. The range from which you want to return unique values.

    • indexNum [Optional]. The column index to analyse for unique values. The default value is 1.

    • threshold [Optional]. The minimum percentage similarity between the colArray values that are not unique. Therefore a threshold value of 0.6 means that ULIST will look for partial matches and then eliminate any values with a 60 percent similarity and above. The default value is 1.

Highlighting Duplicate Values

Flookup can be used to highlight duplicates in Google Sheets based on matches from a single column.

To highlight duplicate values using Flookup, go to Add-ons > Flookup > Highlight duplicates and either click By percentage or By sound.

How to Highlight Duplicates By Percentage Similarity

With this function, Flookup highlights duplicate values which have a specific minimum percentage similarity between them. The steps to highlight duplicates by this method are:

    1. Select a range with one or more columns. This determines the number of columns you want to highlight.

    2. Find and click the menu item labelled "By percentage".

    3. In the resulting window, click the dropdown menu and select "Highlight all duplicates" or "Skip first occurrence", depending on how you want your results to appear.

    4. Enter the "Target column" and the "Threshold value". If no user input is made, values which have a 100% similarity match between them will be found and the first column of the selected range will be analysed.

The row containing values with the specified percentage similarity between them, will be highlighted in yellow.

How to Highlight Duplicates By Sound Similarity

With this function, Flookup highlights duplicate values based on sound similarity between the them. The steps to highlight duplicates by this method are:

  1. Select a range with one or more columns. This determines the number of columns you want to highlight.

  2. Find and click the menu item labelled "By sound".

  3. In the resulting window, click the dropdown menu and select "Highlight all duplicates" or "Skip first occurrence", depending on how you want your results to appear.

  4. A prompt is displayed requesting the user to enter the "Target column" . If no user input is made, the first column of the selected range will be analysed.

The row containing values with similar sounds will be highlighted in aqua.

Note

  • The function highlight modes can either highlight every single duplicate occurrence or to skip the first/topmost duplicate occurrence, respectively.

  • The Target column is the column that will be analysed and the Threshold value is the level of similarity below which items will not be considered duplicates.

  • The number of columns you select determines the number of cells that will be highlighted.

How To Highlight Duplicates Across Two Different Columns

You can also compare two different columns for percentage or sound similarity.

These steps are largely similar to the other modes above except that, for this task, you must tick the checkbox labelled "Compare two different columns" in the function window.

Here are the steps in detail:

  1. Select a range of more than one column.

  2. Find and click the menu item labelled "By percentage" or "By sound".

  3. In the resulting window, check the box labelled "Compare two different columns".

  4. Specify your Target column and the Secondary column. These are the two columns to compare.

  5. Adjust the "Threshold value" to match your purpose (if the option is available).

  6. Click "Highlight duplicates".

After this, the cells with duplicate data will be appropriately highlighted i.e. yellow for percentage similarity and aqua for sound similarity.

Removing Duplicate Values

Flookup can be used to remove duplicates in Google Sheets based on matches from a single column.

To remove duplicates using Flookup, go to Add-ons > Flookup > Remove duplicates and either click By percentage or By sound.

How to Remove Duplicates By Percentage Similarity

With this function, Flookup finds and deletes selected rows that contain duplicate values with a specific minimum percentage similarity between them. The steps to remove duplicates by this method are:

    1. Select values in a range.

    2. Find and click the menu item labelled "By percentage".

    3. A prompt is displayed requesting the user to enter the "Target column" and the "Threshold value". If no user input is made, values which have a hundred percent similarity match between them will be found and the first column of the selected range will be analysed.

All selected rows containing duplicate values, with the specified percentage similarity between them, will be deleted.

How to Remove Duplicates By Sound Similarity

With this function, Flookup deletes selected rows that contain duplicate values based on the sound similarity between them. The steps to remove duplicates by this method are:

    1. Select a range.

    2. Find and click the menu item labelled "By sound".

    3. A prompt is displayed requesting the user to enter the "Target column". If no user input is made, the first column of the selected range will be analysed.

All selected rows containing values with similar sounds will be deleted.

Note

  • The Target column is the column that will be analysed.

  • The Threshold value is the level of similarity below which items will not be considered duplicates.

How To Remove Duplicates Across Two Different Columns

You can also compare two different columns for percentage or sound similarity.

These steps are largely similar to the other modes above except that, for this task, you must tick the checkbox labelled "Compare two different columns" in the function window.

Here are the steps in detail:

  1. Select a range of more than one column.

  2. Find and click the menu item labelled "By percentage" or "By sound".

  3. In the resulting window, check the box labelled "Compare two different columns".

  4. Specify your Target column and the Secondary column. These are the two columns to compare.

  5. Adjust the "Threshold value" to match your purpose (if the option is available).

  6. Click "Remove duplicates".

After this, the cells with duplicate data will be appropriately highlighted i.e. yellow for percentage similarity and aqua for sound similarity.