HIGHLIGHTING DUPLICATES BY TEXT SIMILARITY
Introduction to Highlighting Duplicates
To highlight duplicate values in a single column with Flookup, go to Extensions > Flookup Data Wrangler > Highlight duplicates in your spreadsheet menu.
Highlighting Duplicates By Percentage or Sound Similarity
Click the menu item labelled "By percentage" or "By sound".
Select Highlight all duplicates or Skip first occurrence, depending on how you want your results to appear.
Select a range with one or more columns. The width of this range determines how many cells are highlighted on each row that contains a duplicate. For example, if you select the range A2:D500 and duplicates are found in cells B10, B20 and B50, then A10:D10, A20:D20 and A50:D50 will be highlighted.
Click "Map columns in selection" to index the columns in your selection.
Specify the Left_column index. If you leave this blank, the first column of the selected range will be analysed.
Specify the "Threshold" value. This step is only necessary when highlighting duplicates by percentage similarity. If you leave this blank, only exact matches will be highlighted.
Click "Highlight" to execute the function.
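The relationship between the selected range and the highlighted cells can be sketched in a few lines of Python. This is only an illustration of the A2:D500 example above, not Flookup's own code; the function names column_letter and highlight_ranges are hypothetical.

```python
def column_letter(index: int) -> str:
    """Convert a 1-based column index to spreadsheet letters (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def highlight_ranges(first_col: int, last_col: int, duplicate_rows: list[int]) -> list[str]:
    """Return the A1-style range highlighted on each row that holds a duplicate."""
    left = column_letter(first_col)
    right = column_letter(last_col)
    return [f"{left}{row}:{right}{row}" for row in duplicate_rows]


# Selection A2:D500 spans columns 1 (A) to 4 (D); duplicates were found in B10, B20 and B50.
print(highlight_ranges(1, 4, [10, 20, 50]))
# prints ['A10:D10', 'A20:D20', 'A50:D50']
```

Selecting a narrower range, such as B2:B500, would highlight only the cell in column B on each of those rows.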
Key Points on Highlighting Duplicates
Threshold values must be increased or reduced in increments of 0.05.
The number of columns you select determines the number of cells that will be highlighted in each row.
The Left_column value is the column index, in your selection, that will be analysed.
If you are highlighting duplicates "By percentage", then duplicates will be values in Left_column that have a level of similarity that is equal to or higher than the Threshold value.
When this function finishes running or times out, a message will be displayed indicating how many rows have been processed up to that point.
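To make the role of the Threshold value concrete, here is a minimal Python sketch. It assumes a stand-in similarity score from the standard library's difflib module, since the metric Flookup uses is not described here, and the function name find_duplicates and its skip_first parameter are hypothetical. The skip_first behaviour is one plausible reading of the "Skip first occurrence" option.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score between 0 and 1 (an assumption, not Flookup's metric)."""
    return SequenceMatcher(None, a, b).ratio()


def find_duplicates(values: list[str], threshold: float = 1.0, skip_first: bool = False) -> list[int]:
    """Return 0-based positions of values that match another value at or above the threshold."""
    flagged = set()
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if similarity(values[i], values[j]) >= threshold:
                flagged.add(j)  # the later value of a matching pair is always flagged
                if not skip_first:
                    flagged.add(i)  # the earlier value is flagged only when not skipping
    return sorted(flagged)


names = ["Acme Ltd", "Acme Ltd.", "Globex", "Acme Ltd"]
print(find_duplicates(names))                         # exact matches only: [0, 3]
print(find_duplicates(names, threshold=0.9))          # [0, 1, 3]
print(find_duplicates(names, 0.9, skip_first=True))   # [1, 3]
```

With the default threshold of 1.0 only identical values are flagged, which mirrors the behaviour described when no Threshold is entered.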
How To Highlight Duplicates Across Two Different Columns
To highlight duplicates across two different columns, simply follow these steps:
Click the menu item labelled "By percentage" or "By sound".
Select Highlight all duplicates or Skip first occurrence, depending on how you want your results to appear.
Click the option labelled Compare two different columns.
Select a range with two or more columns. This also defines the number of columns that will be highlighted on each row that contains duplicates.
Click "Map columns in selection" to index the columns in your selection. Do this too if "Right_column" is inactive.
Specify your Left_column and Right_column indexes. These are the two columns that will be compared with each other.
If you are running "By percentage", adjust the Threshold value to match your needs. Otherwise, skip this step.
Click "Highlight".
Key Points on Highlighting Duplicates Across Different Columns
Threshold values must be increased or reduced in increments of 0.05.
If you are highlighting duplicates "By percentage", then duplicates will be values in Left_column that exist in Right_column and have a level of similarity that is equal to or higher than the Threshold value.
When this function finishes running or times out, a message will be displayed indicating how many rows have been processed up to that point.
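The two-column comparison can be sketched in the same spirit. The Python example below again uses difflib as an assumed stand-in for the similarity measure, and the function name cross_column_duplicates is hypothetical; it flags each value in the left column that has a sufficiently similar value anywhere in the right column.

```python
from difflib import SequenceMatcher


def cross_column_duplicates(left: list[str], right: list[str], threshold: float = 1.0) -> list[int]:
    """Return 0-based positions in `left` whose value matches any value in `right`."""
    def score(a: str, b: str) -> float:
        # Stand-in similarity between 0 and 1 (an assumption, not Flookup's metric).
        return SequenceMatcher(None, a, b).ratio()

    return [
        i
        for i, value in enumerate(left)
        if any(score(value, other) >= threshold for other in right)
    ]


left_column = ["Jon Smith", "Mary Jones", "Ali Khan"]
right_column = ["John Smith", "Ali Khan"]
print(cross_column_duplicates(left_column, right_column))        # exact matches only: [2]
print(cross_column_duplicates(left_column, right_column, 0.9))   # [0, 2]
```

Lowering the threshold from 1.0 to 0.9 lets "Jon Smith" match "John Smith", while "Mary Jones" has no close counterpart and stays unflagged.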
How to View Duplicate Clusters
Once you have highlighted duplicates, you can view which of them are related to each other by following these steps:
In the second drop-down menu, select Trace duplicate clusters.
Specify your Trace_row index. This is the row for which you would like to see related duplicates.
Scroll to the bottom and click "Trace".
---
The duplicate cluster for the row index you specify will be highlighted in a distinct peach colour. To revert to the original highlight colour, simply use the "Undo" button.
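As a rough illustration of what tracing a cluster means, the Python sketch below collects the rows whose value is similar to the value on a chosen row. It is a sketch under stated assumptions: difflib stands in for the similarity measure, the function name trace_cluster is hypothetical, and excluding the traced row itself from the result is a choice made for this example.

```python
from difflib import SequenceMatcher


def trace_cluster(values: dict[int, str], trace_row: int, threshold: float = 1.0) -> list[int]:
    """Return the row numbers whose value matches the value on `trace_row`."""
    target = values[trace_row]
    return [
        row
        for row, value in values.items()
        # Stand-in similarity (an assumption); the traced row itself is left out.
        if row != trace_row and SequenceMatcher(None, target, value).ratio() >= threshold
    ]


rows = {10: "Acme Ltd", 20: "Acme Ltd.", 50: "Acme Ltd", 60: "Globex"}
print(trace_cluster(rows, trace_row=10, threshold=0.9))   # [20, 50]
```

Here row 10 plays the role of the Trace_row index, and rows 20 and 50 are the related duplicates that would be shown in the distinct colour.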