HIGHLIGHTING DUPLICATES BY TEXT SIMILARITY

Introduction to Highlighting Duplicates

To highlight duplicate values from a single column using Flookup, go to Extensions > Flookup Data Wrangler > Transformation functions > Highlight duplicates in your spreadsheet menu.


Highlighting Duplicates by Percentage or Sound Similarity

  1. Select the function to run
    Click the menu item labelled "By percentage" or "By sound".
  2. Select the highlight mode
    Select Highlight all duplicates or Skip first occurrence, depending on how you want your results to appear.
  3. Select the data range to analyse
    Select a range with one or more columns. The width of this range determines how many columns are highlighted on each row that contains a duplicate. For example, if you select range A2:D500 and duplicates are identified in cells B10, B20 and B50, then A10:D10, A20:D20 and A50:D50 will be highlighted.
  4. Index the selected data
    Click Map columns in selection to index the columns in your selection.
  5. Specify the column of data to analyse
    Specify the Left_column index. If you leave it blank, the first column of the selected range will be analysed.
  6. Specify the level of similarity
    If you selected "By percentage" in step #1, specify the Threshold value. If you leave it blank, only exact matches will be highlighted. Otherwise, skip this step.
  7. Highlight duplicates
    Click Highlight to execute the function.
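
The steps above can be sketched in Python to show how the threshold, the highlight mode and the selected range interact. This is only an illustration: Flookup's own similarity algorithm is not described here, so difflib's SequenceMatcher stands in for it, the threshold is assumed to be a ratio between 0 and 1, and the function names find_duplicates and ranges_to_highlight are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in metric: difflib's ratio (0.0 to 1.0), not Flookup's own algorithm.
    return SequenceMatcher(None, a, b).ratio()


def find_duplicates(values, threshold=1.0, skip_first=False):
    """Return 0-based positions of values that match another value in the list."""
    flagged = set()
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if similarity(values[i], values[j]) >= threshold:
                if not skip_first:
                    flagged.add(i)  # "Highlight all duplicates" also marks the earlier value
                flagged.add(j)      # the later value is always marked
    return sorted(flagged)


def ranges_to_highlight(positions, first_row, first_col, last_col):
    """Expand flagged positions to full-width row ranges such as 'A10:D10'."""
    return [
        f"{first_col}{first_row + p}:{last_col}{first_row + p}"
        for p in positions
    ]


names = ["Acme Ltd", "Acme Ltd.", "Globex", "Acme Ltd"]
print(find_duplicates(names))                                  # [0, 3]
print(find_duplicates(names, threshold=0.9, skip_first=True))  # [1, 3]
print(ranges_to_highlight([8, 18, 48], 2, "A", "D"))
# ['A10:D10', 'A20:D20', 'A50:D50']
```

With the default threshold of 1.0, only identical values are flagged, which mirrors the exact-match behaviour described in step 6. The last call reproduces the A2:D500 example from step 3.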

Notes on Highlighting Data in a Single Column


How to Highlight Duplicates Across Two Different Columns

  1. Select the function to run
    Click the menu item labelled "By percentage" or "By sound".
  2. Select the highlight mode
    Select Highlight all duplicates or Skip first occurrence, depending on how you want your results to appear.
  3. Select the comparison mode
    Click the option labelled Compare two different columns.
  4. Select the data to compare
    Select a range with two or more columns.
  5. Index the selected data
    Click Map columns in selection to index the columns in your current selection.
  6. Specify the column indexes to analyse
    Specify the Left_column and Right_column indexes. These are the two columns that will be compared with each other.
  7. Set the level of similarity
    If you selected "By percentage" in step #1, adjust the Threshold value to match your needs. Otherwise, skip this step.
  8. Highlight duplicates
    Click Highlight to execute the function.
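
As a rough illustration of the two-column comparison, the Python sketch below checks every value in the left column against every value in the right column and reports the pairs that meet the threshold. It rests on several assumptions: column indexes are taken to be 1-based positions within the selection, difflib's SequenceMatcher stands in for Flookup's similarity measure, the threshold is treated as a ratio between 0 and 1, and the function name compare_columns is hypothetical.

```python
from difflib import SequenceMatcher


def compare_columns(rows, left_column, right_column, threshold=1.0):
    """Return (left_position, right_position, score) for each matching pair.

    rows is a list of rows from the selected range; column indexes are
    assumed to be 1-based, counted from the first column of the selection.
    """
    left = [row[left_column - 1] for row in rows]
    right = [row[right_column - 1] for row in rows]
    matches = []
    for i, a in enumerate(left):
        for j, b in enumerate(right):
            # Stand-in metric, not Flookup's own algorithm.
            score = SequenceMatcher(None, a, b).ratio()
            if score >= threshold:
                matches.append((i, j, round(score, 2)))
    return matches


selection = [
    ["Acme Ltd", "x", "Globex Corp"],
    ["Globex", "y", "Acme Ltd."],
    ["Initech", "z", "Umbrella"],
]
for left_pos, right_pos, score in compare_columns(selection, 1, 3, threshold=0.9):
    print(left_pos, right_pos, score)
```

Lowering the threshold widens the set of matches. In this sample data, 0.9 pairs only "Acme Ltd" with "Acme Ltd.", while 0.7 also pairs "Globex" with "Globex Corp".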

Notes on Highlighting Data Across Two Columns


How to View Duplicate Clusters

  1. In the second drop-down menu, select Trace duplicate clusters.
  2. Specify your Trace_row index. This is the row for which you would like to see related duplicates.
  3. Scroll to the bottom and click Trace.
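
One way to picture a duplicate cluster is as the set of rows reachable from the traced row through a chain of pairwise matches. The Python sketch below follows that idea. It is an assumption about what a cluster means, not a description of how Flookup builds one; difflib's SequenceMatcher is again a stand-in metric, and the names trace_cluster and first_row are hypothetical.

```python
from difflib import SequenceMatcher


def trace_cluster(values, trace_row, threshold=1.0, first_row=1):
    """Return the sheet rows in the same duplicate cluster as trace_row.

    values holds the analysed column from top to bottom, and first_row is
    the sheet row number of values[0].
    """
    start = trace_row - first_row
    cluster = {start}
    pending = [start]
    while pending:
        current = pending.pop()
        for other, text in enumerate(values):
            if other in cluster:
                continue
            # Stand-in metric, not Flookup's own algorithm.
            if SequenceMatcher(None, values[current], text).ratio() >= threshold:
                cluster.add(other)
                pending.append(other)
    return sorted(position + first_row for position in cluster)


column = ["Acme Ltd", "Globex", "Acme Ltd.", "Acme Limited", "Initech"]
print(trace_cluster(column, trace_row=2, threshold=0.75, first_row=2))  # [2, 4, 5]
print(trace_cluster(column, trace_row=3, threshold=0.75, first_row=2))  # [3]
```

A row with no matches forms a cluster of one, as the second call shows.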

Notes on Tracing Highlighted Data


For the Visual Learners

Labels might differ slightly but the steps are the same.

