In this tutorial we are going to look at a few examples of how to use Flookup for Google Sheets. Flookup runs on a battle-tested algorithm that makes it the perfect solution for your data quality needs.
HOW TO FUZZY MATCH IN GOOGLE SHEETS
SUMMARY OF STEPS
=FLOOKUP(lookupValue, tableArray, lookupCol, indexNum, threshold, [rank])
- Type a range or type a value as the lookupValue.
- Select the tableArray.
- Enter the lookupCol which is the column in the tableArray to analyse.
- Enter indexNum, which is the column in tableArray from which results should be returned.
- Press Return or Enter.
One of the most popular functions of Flookup is the eponymous FLOOKUP and it can be used to to find matches or merge data. In this example, we have two tables (shown in the image below): one contains a list of individuals and their respective monthly salaries, and another contains list of individuals only:
We can use FLOOKUP to match names in the list titled "Work List" to the names in the list titled "Full Name", in order to return their respective salaries. By writing the formula like this
=FLOOKUP(D2,$A$2:$B$15,1,2), with the default percentage similarity of 0.6, we get the following results:
As you can see, even at a 0.6 level of similarity, some matches are not caught. In order to fix this, we simply adjust the threshold parameter downwards until all possible matches are found. Highlighted in yellow are the results we get at a 0.5 similarity level:
And these are the results at a 0.3 similarity level, at which point all possible matches have been found. The matches highlighted aqua are found at the lowest threshold of the batch:
NOTE: The initial parameter, the lookupValue, can either be a single string or concatenated text. For example, suppose we have extra data in column F e.g. cities lived in, we can rewrite the function we already used like this:
=FLOOKUP(D2&F2,$A$2:$B$15,1,2). This function combines the values in D2 and F2 into one string. This is especially useful for distinguishing similar looking strings that are not duplicates.
And that concludes our brief tutorial on how to fuzzy match with Flookup. Please note that the principles demonstrated here also hold true for the HFLOOKUP function.
HOW TO REMOVE DUPLICATES IN GOOGLE SHEETS
SUMMARY OF STEPS
=ULIST(colArray, [indexNum], [threshold])
- Type or select a range as the colArray.
- (Optional) Enter indexNum, which is the column in colArray to analyse for unique values.
- (Optional) Adjust the threshold, the percentage above which values are considered duplicates and removed.
Duplicates in data can be a constant headache for every professional. This article by HubSpot highlights the enormous costs and issues that organisations face because of not dealing with duplicates and poor data quality in general. Luckily, with Flookup, removing duplicates from your data is a simple and intuitive process.
Let's take a look at an example showing how to improve data quality by removing duplicates from it. In this example, we will be focusing on a function called ULIST.
Suppose that you have a list with names of individuals, the cities they live in and their respective salaries as shown below:
This list obviously contains duplicates under the column titled "city". In order to extract a list of individuals without the duplicated cities, we can use the ULIST formula as shown below:
And our results will look like this:
However, as we can see, there is a duplicate that has not been caught because it contains a spelling error; that duplicate is "New Yorke" vs "New York". In order to resolve this, we need to adjust the threshold parameter as follows:
And finally, we have a clean list with no duplicates:
If you are a fan of Google's UNIQUE function, then you will love the power ULIST brings to your spreadsheet.
HOW TO HIGHLIGHT DUPLICATES IN GOOGLE SHEETS
SUMMARY OF STEPS
- Select a range.
- Adjust the Column index, the column to analyse for duplicates.
- Adjust the threshold, the percentage above which values are considered duplicates.
- Click "Highlight duplicates".
Highlighting duplicates is one of the best ways to get an overview of the state of your data. Unfortunately, most of the data we work with contains partial matches, punctuation marks or are that variables which throw off traditional methods you might know of or use. However, with Flookup, you have the power to handle strings with such variations.
We can see, in the illustration above, that the column showing cities (column B) has potential duplicates. In order to highlight duplicate rows in the entire data set based on this column, we start by selecting our range of interest and then heading to Add-ons > Highlight duplicates > By percentage as shown below:
In the next window we make necessary adjustments, specifically the column we want to analyse for duplicates and the minimum level of percentage similarity that we desire. This window also displays the number of columns you have selected to help guide you with adjusting the "Column index":
After running the function, duplicates are highlighted in yellow:
We can also highlight duplicates by similarity in sound. To do this, we follow the same steps shown above, but this time we access the function located at Add-ons > Highlight duplicates > By sound:
A window, which also shows the number of columns selected, pops up prompting you to enter the column to analyse:
After running the function, all duplicates are highlighted in aqua:
HOW TO CALCULATE PERCENTAGE SIMILARITY IN GOOGLE SHEETS
SUMMARY OF STEPS
- Type any 2 strings or ranges as the first and second arguments, respectively.
- Press Return or Enter.
Another highly popular feature of Flookup is the FUZZYMATCH function. You can use this function to calculate the percentage similarity between any two strings, especially strings that follow name-like patterns. FUZZYMATCH allows you to compare strings in three modes: string vs string, string vs list and list vs list. The last two modes cut down the time taken to calculate percentage similarities from minutes to mere seconds.
The following example shows how to use FUZZYMATCH in string vs string mode:
The formula shown in the image above works through one row at a time and yields the following results:
This formula is sufficient for most cases, but sometimes you need more power and efficiency. In such cases, especially if you are working with long lists, using FUZZYMATCH in the list vs list mode is the better option:
The above formula works through all the rows in one fell swoop and returns results exactly the same as the first mode. The key difference here is that, whereas the string vs string might take minutes to complete long lists, this list vs list mode completes the same work in seconds.
Finally, in some cases, you might simply want to compare one string to multiple strings. To accomplish this, we use the third mode of FUZZYMATCH, and that is string vs list.
Say you have just one string in column A and a list of strings in column B. In order to compare the two columns, we can use a formula like this:
This formula will return an array of percentage similarities between the first and second arguments.
That concludes our text comparison tutorial. Be sure to check out the Documentation page for details about all Flookup functions.