CUSTOM FUNCTIONS FOR GOOGLE SHEETS
Quick Overview
Function | What it does | What it is best for |
---|---|---|
NORMALIZE | Cleans and standardizes text entries | Preparing data for analysis |
FUZZYSIM | Compares two text values for percentage similarity | Spotting near-duplicates between words or sentences |
FLOOKUP | Finds the closest match in a table by percentage similarity and returns the result plus its score or row position | Reconciling data by token variations |
SHEETMERGE | Finds the best match and returns the entire row in which the match is found | Enriching data by finding full matching records |
QLOOKUP | Finds the closest match in a table by percentage similarity and returns the result alone | Reconciling data by token variations |
HFLOOKUP | Finds the closest match in a row by percentage similarity and returns the result alone | Reconciling data organized horizontally |
SOUNDMATCH | Matches words by sound similarity | Reconciling data by phonetic variations |
DEDUPE | Removes duplicate rows by percentage or sound similarity | Remove duplicates using a non-destructive method |
ULIST | Extracts unique values by percentage or sound similarity | Enhanced deduplication |
TIP: When using these functions, use ranges as inputs e.g. A1:A100
instead of single cells e.g. A1
for efficient data processing.
How to Skip Optional Parameters
When using Flookup functions, you can skip optional parameters by leaving them blank, but you must still include the correct number of commas.
For example, if a function has four parameters and you want to skip the third, use two commas with nothing between them:
=FUNCTION(required1, required2, , required4)
If you want to skip all optional parameters at the end, you can simply omit them:
=FUNCTION(required1, required2)
NORMALIZE
=NORMALIZE(lookup_value, stop_array, threshold, "operation")
The NORMALIZE function for Google Sheets removes unwanted words, diacritics and punctuation from spreadsheet text entries. This process helps clean your data and improves the accuracy of fuzzy matching and data analysis.
Parameters
Parameter | Type | Description |
---|---|---|
lookup_value | Required | The cell or range containing the values to normalize. For example: A1 or A1:A100 . |
stop_array | Optional | A cell or range of words/punctuation to remove. If a single cell is used, items should be separated by spaces. |
threshold | Optional | Minimum similarity, from 0 to 1 , for removing items when using the "text" operation. Default is 0.8 . |
operation | Optional |
Type of content to remove. Options are: "text" (removes stop words), "punctuations" (removes punctuation), "diacritics" (default, removes diacritical marks), "domain" (extracts the domain from a URL), and
"path" (extracts the path from a URL).
|
Example of NORMALIZE
Suppose cell A2
contains the text "St. John's Café"
and you want to clean it for analysis. You could use functions like these ones.
- The formula returns
"St. John's Cafe"
. By default, it removes only diacritics: =NORMALIZE(A2) - The formula returns
"St Johns Café"
. This removes punctuation like the period and apostrophe: =NORMALIZE(A2, , , "punctuations")
FUZZYSIM
=FUZZYSIM(left_string, right_string, "compare_mode")
The FUZZYSIM function calculates the percentage similarity between two text entries using one of two advanced algorithms. This is useful for fuzzy matching and data comparison in Google Sheets.
Parameters
Parameter | Type | Description |
---|---|---|
left_string | Required | The main text value to compare against right_string . |
right_string | Required | The value to compare against left_string . |
compare_mode | Optional |
The comparison method to use. "by_word" (default) compares strings based on character pairs. "by_phrase" compares strings word-by-word, which is better for matching phrases with different word orders or minor variations.
|
Example of FUZZYSIM
Imagine you have "Apple Inc."
in cell A2
and "Apple Incorporated"
in cell B2
.
- The formula returns a similarity score, for example,
0.75
. This shows the two strings are closely related despite the different words: =FUZZYSIM(A2, B2) - To compare phrases with different word orders, use the
"by_phrase"
mode. For example, if cellA3
contains"big red car"
and cellB3
contains"this red car"
, the following formula will return0.63
because they are very similar despite the word order: =FUZZYSIM(A3, B3, "by_phrase")
FLOOKUP
=FLOOKUP(lookup_value, table_array, lookup_col, index_num, threshold, output_mode)
The FLOOKUP function searches a specified column for the best match and returns the corresponding value from your chosen return column. By default, it pairs the result with its similarity score, but you can also choose to see the row number of the match instead. It is ideal for detailed data reconciliation where you need to see the confidence or position of each match.
Parameters
Parameter | Type | Description |
---|---|---|
lookup_value | Required | Values to search for in table_array . You can use a single cell or a range, such as A1 or A1:A500 . |
table_array | Required | The range that contains the data to compare and return, for example B1:E500 . |
lookup_col | Required | The index of the column in table_array to search for matches. Use 1 for the first column, 2 for the second and so on. Value must be greater than 0 . |
index_num | Required | The index of the column in table_array from which to return values. Use 1 for the first column and so on. Value must be greater than 0 . |
threshold | Optional | Minimum similarity, from 0 to 1 , for a match. Default is 0.8 . |
output_mode | Optional | Determines what the result is paired with. Use "score" (default) to return the similarity score or "position" to return the 1-based row number of the match in table_array . |
Example of FLOOKUP
Let us say you have a list of company names in column A and a master list of companies and their industries in columns D and E. You want to find the industry for "Innovate Corp" (in cell A2
) from your master list (D2:E100
). The company names are in column D (the 1st column of the table array) and industries in column E (the 2nd).
The following formula looks for "Innovate Corp" in column D.
If it finds a close match like "Innovate Corporation", it will return the industry from column E along with the similarity score, for example: {"Tech", 0.92}
.
=FLOOKUP(A2, D2:E100, 1, 2, 0.8, "score")
SHEETMERGE
=SHEETMERGE(lookup_value, table_array, lookup_col, threshold, num_cols)
The SHEETMERGE function is designed for data enrichment. It finds the best fuzzy match for a lookup value in a table and returns the entire matching row, with the similarity score appended as the last column. This is perfect for when you need to pull all related information for a matched item from another dataset.
Parameters
Parameter | Type | Description |
---|---|---|
lookup_value | Required | Values to search for in table_array . You can use a single cell or a range e.g. A1 or A1:A200 . |
table_array | Required | The range that contains the data to compare with lookup_value data and return, for example B1:C200 . |
lookup_col | Required | The index of the column in table_array to search for matches. Use 1 for the first column, 2 for the second and so on. |
threshold | Optional | Minimum similarity for a match. Defaults to 0.8 . |
num_cols | Optional | The number of columns to return from the start of the matching row. If omitted, all columns from the row are returned. |
Example of SHEETMERGE
Suppose you have a simple list of product names in column A and a detailed product database in columns C2:E100
(Name, Category, Price). You want to pull all details for the product in A2
, which is "Widget Pro".
The formula below will search for "Widget Pro" in the first column of C2:E100
. If it finds "Widget Professional" as the best match, it will return the entire row for that product, plus the similarity score in a new last column, like: {"Widget Professional", "Electronics", "$99.99", 0.95}
.
=SHEETMERGE(A2, C2:E100, 1)
QLOOKUP
The QLOOKUP function is a streamlined version of FLOOKUP. It searches a column for the best match and returns a single corresponding value without the similarity score. It is perfect for when you need a quick, clean lookup result and don't need to see the score.
=QLOOKUP(lookup_value, table_array, lookup_col, index_num, threshold)
Parameters
Parameter | Type | Description |
---|---|---|
lookup_value | Required | Values to search for in table_array . You can use a single cell or a range e.g. A1 or A1:A200 . |
table_array | Required | The range that contains the data to compare with lookup_value data and return, for example B1:C200 . |
lookup_col | Required | The index of the column in table_array from which to return values. Use 1 for the first column and so on. It must be greater than 0 . |
index_num | Required | The column number in table_array to return a data from. |
threshold | Optional | Minimum similarity for a match. Defaults to 0.8 . |
Example of QLOOKUP
Using the same scenario as FLOOKUP
, you want to find the industry for "Innovate Corp" (in A2
) from your master list in D2:E100
, but you only need the industry name, not the score.
This formula will find the best match for "Innovate Corp" in column D and return only the corresponding industry from column E, for instance, "Tech"
.
=QLOOKUP(A2, D2:E100, 1, 2, 0.8)
HFLOOKUP
The HFLOOKUP function works like a transposed version of QLOOKUP. It searches a row for the best match and returns a single corresponding value from another row without the similarity score. It is ideal for horizontal lookups where your data is organized in rows instead of columns.
=HFLOOKUP(lookup_value, table_array, lookup_row, index_num, threshold)
Parameters
Parameter | Type | Description |
---|---|---|
lookup_value | Required | The value or horizontal range to search for. For multiple values, use a single row, e.g. A1:E1 . |
table_array | Required | The range that contains the data to compare with lookup_value data and return, for example A2:E10 . |
lookup_row | Required | The row number in table_array to search for matches. Use 1 for the first row, 2 for the second and so on. It must be greater than 0 . |
index_num | Required | The row number in table_array to return data from. |
threshold | Optional | Minimum similarity for a match. Defaults to 0.8 . |
Example of HFLOOKUP
Imagine your data is laid out horizontally. Row 1 (A1:E1
) contains product codes you want to look up. Your product data is in A2:E10
, with product codes in row 2 and prices in row 4. To find the price for the code in A1
:
This formula searches for the value of A1
in the first row of the table (which is row 2 of the sheet), and returns the corresponding value from the third row of the table (row 4 of the sheet).
=HFLOOKUP(A1, A2:E10, 1, 3)
SOUNDMATCH
=SOUNDMATCH(lookup_value, table_array, lookup_col, index_num)
The SOUNDMATCH function looks up values in a specified column and returns matches based on a refined version of the Soundex algorithm. This is useful for phonetic matching and cleaning inconsistent data in Google Sheets.
Parameters
Parameter | Type | Description |
---|---|---|
lookup_value | Required | Values to search for. You can use text, a cell reference or a range. |
table_array | Required | The range that contains the data to compare and return. |
lookup_col | Required | The index of the column in table_array to search for matches. Use 1 for the first column and so on. Value must be greater than 0 . |
index_num | Required | The index of the column in table_array from which to return values. Use 1 for the first column and so on. Value must be greater than 0 . |
Example of SOUNDMATCH
Suppose you are reconciling a list of names and you have "Jon" in cell A2
. Your master list of contacts is in C2:D100
, with names in column C and email addresses in column D. You want to find the email for "Jon".
This formula will search for names that sound like "Jon" in column C. If it finds "John" or "Jonathon", it will return the corresponding email address from column D, such as "john.doe@email.com"
.
=SOUNDMATCH(A2, C2:D100, 1, 2)
DEDUPE
=DEDUPE(data_range, key_column, keep_mode, match_type, threshold)
The DEDUPE function provides a powerful and flexible way to remove duplicate rows from a range. Unlike menu-based tools, it is non-destructive, returning a new, clean array of your data. You can define duplicates based on percentage similarity or sound similarity and choose whether to keep the first or last instance of a duplicate group.
Parameters
Parameter | Type | Description |
---|---|---|
data_range | Required | The range of data to process, for example A1:C100 . |
key_column | Optional | The column number within data_range to check for duplicates. Defaults to 1 . |
keep_mode | Optional | "first" to keep the first occurrence of a duplicate, "last" to keep the last. Defaults to "first" . |
match_type | Optional | "percentage" for fuzzy matching or "sound" for phonetic matching. Defaults to "percentage" . |
threshold | Optional | The similarity threshold for "percentage" mode. Defaults to 0.8 . |
Example of DEDUPE
You have a dataset in A2:C100
where column B contains company names that might be duplicates (e.g. "Global Tech" and "Global Tech, Inc."). You want to remove these fuzzy duplicates, keeping only the first entry found for each.
This formula processes the range, identifies duplicates in the second column based on an 85% similarity threshold, and returns a new table containing only the unique rows, keeping the first occurrence of each duplicate group.
=DEDUPE(A2:C100, 2, "first", "percentage", 0.85)
ULIST
=ULIST(col_array, index_num, threshold, "operation")
The ULIST function returns unique values from any list of text entries, based on percentage or sound similarity. If you like the UNIQUE function in Google Sheets, you will find this function even more flexible for deduplication and data cleaning.
Parameters
Parameter | Type | Description |
---|---|---|
col_array | Required | The range of one or more columns from which to extract unique values. For example: A1:C200 . |
index_num | Optional | The index of the column in col_array to analyze for duplicates.Default is 1 , meaning that the first column is processed. |
threshold | Optional | Minimum similarity, from 0 to 1 , for uniqueness. Default is 0.8 . |
operation | Optional | Choose by_percentage to extract by percentage similarity (default) or by_sound to extract by sound similarity. |
Example of ULIST
Imagine you have a list of cities in column A (A2:A100
), with slight variations like "New York", "New York City", and "NYC". You want to create a clean, unique list of these cities.
The formula will analyze column A and, based on an 80% similarity threshold, recognize that "New York", "New York City", and "NYC" are variations of the same entity. It will return a clean list with just one representative entry, such as "New York"
.
=ULIST(A2:A100, 1, 0.8, "by_percentage")
Possible Error Messages
- !Error: Invalid index value. The index number is less than 1 or greater than the number of columns in the search table.
- !Notice: No match found. No matches are available for the parameters you set. Try lowering the threshold or changing the operation.
- !Error: Invalid threshold value.The value is less than 0 or greater than 1.
- !Error: Invalid operation. You entered an operation that is not supported by this function.
- !Error: Try retyping the formula. There was a synchronising error. Retype the formula to force a refresh.
Explore Enhanced Menu Functions
Menu functions are available under
Extensions > Flookup Data Wrangler
in your spreadsheet menu.
These functions have a 6-minute timeout, compared to 30 seconds for custom functions, so you can clean more data at once. They also support advanced operations: