HIGHLIGHTING DUPLICATES BY TEXT SIMILARITY
Introduction to Highlighting Duplicates
To highlight duplicate values from a single column using Flookup, go to Extensions > Flookup Data Wrangler > Transformation functions > Highlight duplicates in your spreadsheet menu.
Highlighting Duplicates by Percentage or Sound Similarity
- Select the function to run
Click the menu item labelled "By percentage" or "By sound".
- Select the highlight mode
Select Highlight all duplicates or Skip first occurrence, depending on how you want your results to appear.
- Select the data range to analyse
Select a range with one or more columns. The width of the selection sets how many cells are highlighted on each row that contains a duplicate. For example, if you select range A2:D500 and duplicates are identified in cells B10, B20 and B50, then A10:D10, A20:D20 and A50:D50 will be highlighted.
- Index the selected data
Click Map columns in selection to map the columns in your selection.
- Specify the column of data to analyse
Specify the Left_column index. If you leave it blank, the first column of the selected range will be analysed.
- Specify the level of similarity
If you selected "By percentage" in the first step, specify the Threshold value. If you leave it blank, only exact matches will be highlighted.
- Highlight duplicates
Click Highlight to execute the function.
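The A2:D500 example above can be sketched in a few lines of Python. This only illustrates how the selection width maps to highlighted cells; it is not Flookup's code, and the function name `rows_to_highlight` is hypothetical.

```python
def rows_to_highlight(selection: str, duplicate_rows: list[int]) -> list[str]:
    """Return the A1-notation range that would be highlighted on each duplicate row.

    Assumes a simple selection such as "A2:D500" (hypothetical helper, not a Flookup API).
    """
    start, end = selection.split(":")
    # Keep only the column letters of each corner of the selection.
    first_col = "".join(c for c in start if c.isalpha())
    last_col = "".join(c for c in end if c.isalpha())
    return [f"{first_col}{row}:{last_col}{row}" for row in duplicate_rows]


print(rows_to_highlight("A2:D500", [10, 20, 50]))
# ['A10:D10', 'A20:D20', 'A50:D50']
```

Selecting a narrower range, such as B2:B500, would highlight only the cells in column B on those rows.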
Notes on Highlighting Data in a Single Column
- The number of columns you select determines the number of cells that will be highlighted in each row.
- The Left_column value is the index, within your selection, of the column that will be analysed.
- If you are highlighting duplicates "By percentage", then duplicates will be values in Left_column that have a level of similarity that is equal to or higher than the Threshold value.
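To make the Threshold rule concrete, here is a minimal sketch that flags values in one column whose similarity meets a threshold. It makes several assumptions: Python's `difflib.SequenceMatcher` stands in for whatever similarity measure Flookup uses, the threshold is expressed as a fraction between 0 and 1, and `find_duplicates` and `skip_first` are hypothetical names that mirror the two highlight modes.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Stand-in similarity score between 0 and 1 (assumption, not Flookup's measure).
    return SequenceMatcher(None, a, b).ratio()


def find_duplicates(values, threshold=1.0, skip_first=False):
    """Return 0-based indexes of values flagged as duplicates.

    threshold=1.0 mirrors the default of exact matches only.
    skip_first=True leaves the earlier value of each matching pair unflagged.
    """
    flagged = set()
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if similarity(values[i], values[j]) >= threshold:
                flagged.add(j)
                if not skip_first:
                    flagged.add(i)
    return sorted(flagged)


names = ["Acme Corp", "Globex", "Acme Corp.", "Initech"]
print(find_duplicates(names, threshold=0.9))                   # [0, 2]
print(find_duplicates(names, threshold=0.9, skip_first=True))  # [2]
print(find_duplicates(names))                                  # []
```

Lowering the threshold flags more loosely related values, so it is worth starting high and reducing it gradually.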
How to Highlight Duplicates Across Two Different Columns
- Select the function to run
Click the menu item labelled "By percentage" or "By sound".
- Select the highlight mode
Select Highlight all duplicates or Skip first occurrence, depending on how you want your results to appear.
- Select the comparison mode
Click the option labelled Compare two different columns.
- Select the data to compare
Select a range with two or more columns.
- Index the selected data
Click Map columns in selection to index the columns in your selection.
- Specify the column indexes to analyse
Specify your Left_column and Right_column indexes. These are the two columns that will be compared with each other.
- Set the level of similarity
If you selected "By percentage" in the first step, adjust the Threshold value to match your needs. Otherwise, skip this step.
- Highlight duplicates
Click Highlight to execute the function.
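The "By sound" option treats values that sound alike as duplicates. This page does not say which phonetic algorithm Flookup uses, so the sketch below uses classic American Soundex purely as a stand-in to show the idea; the function name `soundex` is our own.

```python
def soundex(word: str) -> str:
    """Classic American Soundex code, e.g. 'Robert' -> 'R163' (illustrative stand-in)."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W are skipped without resetting the previous code
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code after a vowel is kept
    return (result + "000")[:4]  # pad or truncate to one letter plus three digits


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Robert") == soundex("Rubin"))  # False
```

Under this stand-in, "Robert" and "Rupert" would count as sound duplicates because they share a code, while "Rubin" would not.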
Notes on Highlighting Data Across Two Columns
- If you are highlighting duplicates "By percentage", then duplicates will be values in Left_column that exist in Right_column and have a level of similarity that is equal to or higher than the Threshold value.
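The two-column rule can be sketched as follows: a value in the left column counts as a duplicate when at least one value in the right column is similar enough. As before, `difflib.SequenceMatcher` and a threshold between 0 and 1 are assumptions made for illustration, and `matches_in_right` is a hypothetical name.

```python
from difflib import SequenceMatcher


def matches_in_right(left, right, threshold=1.0):
    """Return 0-based indexes of left values that have a close-enough match in right."""
    hits = []
    for i, left_value in enumerate(left):
        # One sufficiently similar right-hand value is enough to flag the left value.
        if any(SequenceMatcher(None, left_value, right_value).ratio() >= threshold
               for right_value in right):
            hits.append(i)
    return hits


left_column = ["Jon Smith", "Mary Jones", "Ali Khan"]
right_column = ["John Smith", "Alan Turing"]
print(matches_in_right(left_column, right_column, threshold=0.9))  # [0]
```

Only "Jon Smith" is flagged here, because "John Smith" in the right column is the only value that clears the 0.9 threshold.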
How to View Duplicate Clusters
- In the second drop-down menu, select Trace duplicate clusters.
- Specify your Trace_row index. This is the row for which you would like to see related duplicates.
- Scroll to the bottom and click Trace.
Notes on Tracing Highlighted Data
- The duplicate cluster for the traced row will be highlighted in a distinct peach colour. To revert to the original highlight colour, use the Undo button.
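One way to picture a duplicate cluster is as the set of rows whose value is similar to the value on the traced row. The sketch below follows that reading; it is an assumption about what tracing shows, not a description of Flookup's internals, and `trace_cluster` is a hypothetical name that uses `difflib.SequenceMatcher` as a stand-in similarity measure.

```python
from difflib import SequenceMatcher


def trace_cluster(values, trace_row, threshold):
    """Return 0-based row indexes whose value is similar to the value at trace_row.

    The traced row itself is always part of its own cluster.
    """
    target = values[trace_row]
    return [i for i, value in enumerate(values)
            if SequenceMatcher(None, target, value).ratio() >= threshold]


companies = ["Acme Corp", "Globex", "Acme Corp.", "Initech", "Globex Inc"]
print(trace_cluster(companies, trace_row=0, threshold=0.9))  # [0, 2]
print(trace_cluster(companies, trace_row=1, threshold=0.7))  # [1, 4]
```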