MATCH AND MERGE DATA BY TEXT SIMILARITY
Introduction to Match and Merge
These functions find and return desired matches based on their respective similarities to other text entries in comparison.
FLOOKUP has four different lookup function modes and they each compare text entries by their percentage similarity.
SOUNDMATCH on the other hand, compares text entries based on how similar they sound when pronounced in English.
FLOOKUP
Head to Extensions > Flookup Data Wrangler > Lookup and merge > FLOOKUP in your spreadsheet menu.
Select the mode you want this function to run from the drop-down menu [See Explanation Below].
Primary range: Select a single column of data and click "Grab selected range".
Secondary range: Select a separate range of one or more columns and click "Grab selected range".
Lookup_Column: Enter the index of the column of values in "Secondary range" that you want to compare with values in "Primary range".
Return_column: Enter the index of the column of values in "Secondary range" from which you want values to be returned.
Threshold: Enter the minimum percentage similarity between the "Primary range" and "Secondary range" values, below which matches should not be returned. The default value is 0.6, meaning that matches with at least that level of similarity will be returned.
Click an empty cell where you would like your results to be displayed. This cell should be in any sheet inside that same spreadsheet.
Click the "Get fuzzy matches" button to finish.
FLOOKUP Modes Explained
Return best match only: This returns the best match for every lookup value.
Return best match and similarity: This returns the best match for every lookup value, paired with its respective similarity.
Return best match and row index: This returns the best match for every lookup value, paired with the row index in which it is located.
Return all matches for each item: This returns all possible matches for each lookup value in a comma separated format.
Merge data from rows with matches: This will return all the selected data from each row in which a match is found, as opposed to returning data from rows in a single column e.g. Assuming you selected range B2:E1000 and, while searching for matches in column C, a match for cell A2 is found in cell C50. In such a scenario, data from row B50:E50 will be returned.
Match long text entries: This compares text entries on a non-lexical level. It is suited for longer text entries e.g. sentences and paragraphs, and it returns the best match for every lookup value, paired with its respective similarity to the lookup value.
Key Points on FLOOKUP
Lookup_column takes only one column. If you select more than one column, only the left most column of that range will be analysed.
Resume_on_row is optional, and it is used to specify the row index of "Primary range" that you want to resume processing from.
This is especially useful if you are working on a large dataset and the function times out prematurely. If you do not provide any input, the function will start processing from the first row of "Primary range".
In order for FLOOKUP to return a match and its respective percentage similarity or its respective position in the Secondary range, you must leave two empty columns for the results to be displayed. If you forget to do this, your data will be overwritten.
For good results when comparing long text entries, be sure to correct any spelling errors in them as a preprocessing step. You can find the function under Tools > Spelling > Spell check. You also need to leave two empty columns in this mode.
If FLOOKUP times out, the results that have been processed up to that point will be displayed.
SOUNDMATCH
Head to Extensions > Flookup Data Wrangler > Lookup and merge > SOUNDMATCH in your spreadsheet menu.
Primary range: Select a single column of data and click "Grab selected range".
Secondary range: Select a separate range of one or more columns and then click "Grab selected range".
Lookup_column: Enter the index of the column of values in "Secondary range" that you want to compare with values in "Primary range".
Return_column: Enter the index of the column of key values in "Secondary range" from which you want values to be returned.
Click an empty cell where you would like your results to be displayed, inside that same spreadsheet.
Click the "Get fuzzy matches" button to finish.
Key Points on SOUNDMATCH
Lookup_column takes only one column. If you select more than one column, only the left most column of that range will be analysed.
When SOUNDMATCH times out, the results that have been processed up to that point, will be displayed.