HIGHLIGHTING DUPLICATES BY TEXT SIMILARITY

Introduction to Highlighting Duplicates

Improve your data quality in Google Sheets by identifying and highlighting duplicate or similar entries with Flookup. Whether you are working with a single column or need to compare multiple columns, Flookup lets you visually mark duplicates based on percentage similarity or sound-based matching. This helps you quickly spot redundancies and inconsistencies, leading to more accurate data analysis and cleaner spreadsheets. To begin highlighting duplicate values from a single column, navigate to Extensions > Flookup Data Wrangler > Highlight duplicates in your Google Sheets menu.

Highlighting Duplicates by Percentage or Sound Similarity

1. Select the function to run
   Click the menu item labelled "By percentage" or "By sound".
2. Select the highlight mode
   Select Highlight all duplicates or Skip first occurrence, depending on how you want your results to appear.
3. Select the data range to analyse
   Select a range with one or more columns. The width of the selection determines how many cells are highlighted on each row that contains a duplicate. For example, if you select range A2:D500 and duplicates are identified in cells B10, B20 and B50, then A10:D10, A20:D20 and A50:D50 will be highlighted.
4. Index the selected data
   Click Map columns in selection to map the columns in your selection.
5. Specify the column of data to analyse
   Specify the Left_column index. If you leave it blank, the first column of the selected range will be analysed.
6. Specify the level of similarity
   If you selected "By percentage" in step 1, specify the Threshold value. If you leave it blank, the default threshold of 0.8 will be used.
7. Highlight duplicates
   Click Highlight to execute the function.
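
To make the range example in the steps above concrete, here is a small Python sketch of how duplicate rows expand to the full width of the selection. It is only an illustration, not Flookup's own code; the function name rows_to_highlight is hypothetical.

```python
import re


def rows_to_highlight(selection: str, duplicate_rows: list[int]) -> list[str]:
    """Expand duplicate row numbers to the full width of an A1-style selection."""
    match = re.fullmatch(r"([A-Z]+)(\d+):([A-Z]+)(\d+)", selection)
    if match is None:
        raise ValueError(f"not an A1-style range: {selection!r}")
    first_col, first_row, last_col, last_row = match.groups()
    top, bottom = int(first_row), int(last_row)
    # Only rows inside the selection can be highlighted.
    return [
        f"{first_col}{row}:{last_col}{row}"
        for row in duplicate_rows
        if top <= row <= bottom
    ]


# Duplicates found on rows 10, 20 and 50 of the analysed column.
print(rows_to_highlight("A2:D500", [10, 20, 50]))
# ['A10:D10', 'A20:D20', 'A50:D50']
```

Selecting a narrower range, such as B2:B500, would highlight only the cells in column B.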

Notes on Highlighting Data in a Single Column

The number of columns you select determines the number of cells that will be highlighted in each row.
The Left_column value is the column index, in your selection, that will be analysed.
If you are highlighting duplicates "By percentage", then duplicates will be values in Left_column that have a level of similarity that is equal to or higher than the Threshold value.
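
The threshold rule and the two highlight modes can be sketched in Python as follows. This page does not describe how Flookup scores similarity, so the sketch assumes the standard library's difflib.SequenceMatcher ratio as a stand-in; the function names similarity and find_duplicates are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between 0 and 1 (stand-in scoring, an assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_duplicates(values, threshold=0.8, skip_first=False):
    """Return 0-based indexes of values whose similarity to another value
    is equal to or higher than the threshold."""
    flagged = set()
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if similarity(values[i], values[j]) >= threshold:
                flagged.add(j)
                if not skip_first:
                    # "Highlight all duplicates": mark the earlier value too.
                    flagged.add(i)
    return sorted(flagged)


column = ["Acme Corp", "Acme Corp.", "Globex", "Acme Corp"]
print(find_duplicates(column))                   # [0, 1, 3]
print(find_duplicates(column, skip_first=True))  # [1, 3]
```

With skip_first=True, a value is flagged only when it resembles an earlier value, so the first member of each group stays unmarked.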

How to Highlight Duplicates Across Two Different Columns

1. Select the function to run
   Click the menu item labelled "By percentage" or "By sound".
2. Select the highlight mode
   Select Highlight all duplicates or Skip first occurrence, depending on how you want your results to appear.
3. Select the comparison mode
   Click the option labelled Compare two different columns.
4. Select the data to compare
   Select a range with two or more columns.
5. Index the selected data
   Click Map columns in selection to index the columns in your selection.
6. Specify the column indexes to analyse
   Specify your Left_column and Right_column indexes. These are the two columns that will be compared with each other.
7. Set the level of similarity
   If you selected "By percentage" in step 1, adjust the Threshold value to match your needs. If you leave it blank, the default threshold of 0.8 will be used.
8. Highlight duplicates
   Click Highlight to execute the function.
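
For intuition about the "By sound" option, the Python sketch below groups values by a phonetic code. This page does not say which phonetic algorithm Flookup uses; classic American Soundex is assumed here purely for illustration, and the function name soundex is hypothetical.

```python
# Soundex digit for each consonant; vowels, h, w and y have no digit.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for letter in letters:
        CODES[letter] = str(digit)


def soundex(word: str) -> str:
    """Return the four-character American Soundex code for a word."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        # h and w do not separate two consonants that share a digit.
        if ch not in "hw":
            previous = digit
    return (code + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Smith"), soundex("Smyth"))    # S530 S530
```

Under this assumption, two values would count as duplicates "By sound" when their codes are equal, which is why no Threshold value is needed in that mode.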

Notes on Highlighting Data Across Two Columns

If you are highlighting duplicates "By percentage", then duplicates will be values in Left_column that exist in Right_column and have a level of similarity that is equal to or higher than the Threshold value.
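
The two-column rule can be sketched in Python in the same spirit. As before, difflib.SequenceMatcher is an assumed stand-in for Flookup's scoring, and the names similarity and matches_across_columns are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between 0 and 1 (stand-in scoring, an assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def matches_across_columns(left, right, threshold=0.8):
    """Return 0-based indexes of left values that have at least one right value
    with similarity equal to or higher than the threshold."""
    return [
        i
        for i, left_value in enumerate(left)
        if any(similarity(left_value, right_value) >= threshold for right_value in right)
    ]


left_column = ["Jon Smith", "Mary Jones", "Alan Turing"]
right_column = ["John Smith", "Grace Hopper", "Allan Turing"]
print(matches_across_columns(left_column, right_column))  # [0, 2]
```

Raising the threshold towards 1.0 restricts matches to near-identical values, while lowering it admits looser matches.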

How to View Duplicate Clusters

In the second drop-down menu, select Trace duplicate clusters.
Specify your Trace_row index. This is the row for which you would like to see related duplicates.
Scroll to the bottom and click Trace.

Notes on Tracing Highlighted Data

The duplicate clusters for any particular row index will be highlighted in a distinct peach colour. In order to revert to the original highlight colour, simply use the Undo button.
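
One way to picture a duplicate cluster is as the set of rows linked to the traced row by similar values. The Python sketch below assumes that a cluster includes rows reached through a chain of matches and that similarity is scored with difflib.SequenceMatcher; both are assumptions rather than documented Flookup behaviour, and the name trace_cluster is hypothetical.

```python
from collections import deque
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between 0 and 1 (stand-in scoring, an assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def trace_cluster(rows, trace_row, threshold=0.8):
    """Return the sorted row numbers linked to trace_row by a chain of similar values.

    rows maps a sheet row number to the value in the analysed column.
    """
    if trace_row not in rows:
        raise KeyError(f"row {trace_row} is not in the analysed range")
    seen = {trace_row}
    queue = deque([trace_row])
    while queue:
        current = queue.popleft()
        for row, value in rows.items():
            if row not in seen and similarity(rows[current], value) >= threshold:
                seen.add(row)
                queue.append(row)
    return sorted(seen)


rows = {10: "Acme Corp", 20: "Acme Corp.", 35: "Globex", 50: "ACME Corp"}
print(trace_cluster(rows, 20))  # [10, 20, 50]
print(trace_cluster(rows, 35))  # [35]
```

A row with no similar values, such as row 35 here, forms a cluster of one.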

For the Visual Learners

Labels might differ slightly but the steps are the same.