HOW TO PREPARE YOUR DATASET
Optimise the Process
Fuzzy matching algorithms are generally processor intensive. Therefore, to speed up the process in Flookup and improve your overall success rate, try the following:
Use NORMALIZE to remove unwanted words or punctuation marks that negatively affect the accuracy of the results, and to reduce the workload.
Use the FUZZYMATCH function to get a feel for the scale Flookup uses to grade percentage similarities between text entries. All our lookup functions, except SOUNDMATCH, depend on this function.
Set the threshold value (i.e. the level of similarity) to 1 for your first run in order to clear the "low-hanging fruit". You can then lower it gradually in subsequent runs.
Enter ranges (e.g. A1:A1000) as the lookup value in order to significantly improve speed and efficiency. Doing this also prevents the spreadsheet from hanging.
Take advantage of the Long Run Mode. Custom functions are given a maximum of 30 seconds to return results. However, in Long Run Mode (LRM), they can run for 6 minutes before timing out. To access this feature, use the menu items located under Flookup > Long Run Mode.
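The advice to start with a threshold of 1 and lower it in later runs can be illustrated outside the spreadsheet. The following Python sketch is only an analogy: it uses the standard library's difflib as a stand-in similarity measure, which is an assumption and not the scale Flookup uses, and the names similarity and match_in_passes are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a score between 0 and 1 (a stand-in, not Flookup's own scale)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_in_passes(lookups, candidates, thresholds=(1.0, 0.9, 0.8)):
    """Match each lookup value to its best candidate, strictest threshold first."""
    matched = {}
    remaining = list(lookups)
    for threshold in thresholds:
        unmatched = []
        for value in remaining:
            # Pick the candidate with the highest similarity to this value.
            best = max(candidates, key=lambda c: similarity(value, c))
            if similarity(value, best) >= threshold:
                matched[value] = (best, threshold)
            else:
                unmatched.append(value)
        # Later, looser passes only process what is still unmatched.
        remaining = unmatched
    return matched, remaining


matched, remaining = match_in_passes(
    ["Acme Ltd", "Globex Corp", "Zzz"],
    ["Acme Ltd", "Globex Corporation", "Initech"],
    thresholds=(1.0, 0.85, 0.7),
)
```

Because exact matches are resolved in the first pass, each looser pass has fewer entries to compare, which is the reason for tackling the "low-hanging fruit" first.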
Use the NORMALIZE function to remove diacritical marks from text entries before applying FUZZYMATCH or any of the lookup functions. We highly recommend that you run this function first, before carrying out any further analysis on your data.
The lookupValue parameter can either be:
A single cell e.g. =NORMALIZE(A1) or,
A range of cells e.g. =NORMALIZE(A1:A3000)
You can also use NORMALIZE to remove unwanted words or punctuation marks. This is useful if your data contains words or punctuation marks that you would like Flookup to ignore during processing. To do this, supply the stopArray parameter with the words or punctuation marks you want to remove, as shown below:
=NORMALIZE(A1:A3000, "company, https, ltd, limited, org")
=NORMALIZE(A1:A3000, "-, &, .")
The second argument can be a range containing the unwanted words or punctuation marks (e.g. B1:B10) or a list of directly typed words or punctuation marks.
NORMALIZE creates a new list without modifying the original list. We therefore advise that you use this function in an empty column or sheet.
The stopArray parameter is optional. If you do not include it in your formula, the function will default to removing diacritical marks.
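For readers who want to see what this kind of clean-up involves, here is a minimal Python sketch of diacritic and stop-word removal. It is not Flookup's implementation: the function names, the use of a Python list for the stop items, and the whole-word, case-insensitive matching are all assumptions made for illustration.

```python
import re
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize(entries, stop_items=None):
    """Return a new cleaned list; the input list is left unchanged."""
    cleaned = []
    for entry in entries:
        text = strip_diacritics(entry)
        for item in stop_items or []:
            if item.isalnum():
                # Assumption: words are removed as whole words, ignoring case.
                pattern = rf"\b{re.escape(item)}\b"
                text = re.sub(pattern, "", text, flags=re.IGNORECASE)
            else:
                # Punctuation marks are removed wherever they occur.
                text = text.replace(item, "")
        # Collapse the extra spaces left behind by the removals.
        cleaned.append(" ".join(text.split()))
    return cleaned


names = ["Müller & Söhne Ltd.", "Café Zoë"]
print(normalize(names, ["ltd", "&", "."]))
```

As with the spreadsheet function, leaving out the second argument in this sketch removes diacritical marks and nothing else.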
Using the Long Run Mode
Use the FUZZYMATCH function to calculate the percentage similarity between text entries and return the result in decimal form.
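To make "percentage similarity in decimal form" concrete, the short Python sketch below scores two strings on a 0 to 1 scale, so that 0.6 reads as 60% similar. It relies on difflib from the Python standard library as a stand-in; the measure FUZZYMATCH actually uses is not described here, so its scores should be expected to differ. The name decimal_similarity is hypothetical.

```python
from difflib import SequenceMatcher


def decimal_similarity(a: str, b: str) -> float:
    """Return similarity as a decimal between 0 (no overlap) and 1 (identical)."""
    return round(SequenceMatcher(None, a, b).ratio(), 2)


print(decimal_similarity("flookup", "flookup"))  # identical strings score 1.0
print(decimal_similarity("night", "nacht"))      # partial overlap scores between 0 and 1
```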