REMOVING DUPLICATES BY TEXT SIMILARITY

Introduction to Removing Duplicates

To remove duplicates from a single column using Flookup, go to Extensions > Flookup Data Wrangler > Transformation functions > Remove duplicates in your spreadsheet menu.


Removing Duplicates by Percentage or Sound Similarity

  1. Select the function to run
    Click the menu item labelled By percentage or By sound, depending on your needs.
  2. Select the mode to run
    Choose from the first drop-down menu:
    • Keep first unique value
    • Keep last unique value
  3. Select the text entries to analyse
    Select one or more columns. If you select a multi-column range (e.g. A2:D500) and a row is identified as a duplicate, every selected column in that row is removed.
  4. Index the selected data
    Click Map columns in selection to index your columns.
  5. Specify the column of data to analyse
    Enter the Left_column index. If left blank, the first column is analysed.
  6. Enter the level of similarity
    (Only for "By percentage") Enter the Threshold value. Higher values mean only close matches are considered duplicates; lower values are more permissive.
  7. Remove Duplicates
    Click the Remove duplicates button.
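
The effect of the Threshold value and the Keep first / Keep last modes can be illustrated with a short Python sketch. This is not Flookup's implementation: the scoring (Python's difflib), the 0 to 1 threshold scale, the use of Soundex for the "By sound" case, and all function names are assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Score two strings from 0.0 to 1.0 (a stand-in for a percentage match)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def soundex(word):
    """Return the classic four-character Soundex code, e.g. 'Robert' -> 'R163'."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit
    letters_only = [c for c in word.lower() if c.isalpha()]
    if not letters_only:
        return ""
    result = letters_only[0].upper()
    prev = codes.get(letters_only[0], "")
    for ch in letters_only[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":  # h and w do not separate letters with equal codes
            prev = code
    return (result + "000")[:4]


def by_percentage(threshold):
    """Build a matcher: two values match when their score reaches the threshold."""
    return lambda a, b: similarity(a, b) >= threshold


def by_sound(a, b):
    """Two values match when they share a Soundex code (no threshold needed)."""
    return soundex(a) == soundex(b)


def remove_duplicates(values, is_match, keep="first"):
    """Drop every value that matches a value already kept.

    keep="first" scans top to bottom; keep="last" scans bottom to top,
    so the last member of each group of duplicates survives.
    """
    ordered = list(values) if keep == "first" else list(reversed(values))
    kept = []
    for value in ordered:
        if not any(is_match(value, other) for other in kept):
            kept.append(value)
    return kept if keep == "first" else list(reversed(kept))


# Hypothetical sample data, not taken from Flookup.
names = ["Jon Smith", "John Smith", "Mary Jones", "Jon Smyth"]
first_kept = remove_duplicates(names, by_percentage(0.8), keep="first")
last_kept = remove_duplicates(names, by_percentage(0.8), keep="last")
sound_kept = remove_duplicates(["Robert", "Rupert", "Rubin"], by_sound)
```

In this sketch, raising the threshold towards 1.0 treats fewer entries as duplicates, while lowering it merges entries that are less alike.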

How to Remove Duplicates Across Two Different Columns

  1. Select the function to run
    Click By percentage or By sound.
  2. Select the mode to run
    Choose from the first drop-down menu:
    • Keep first unique value
    • Keep last unique value
  3. Select the comparison mode
    Select Compare two different columns from the second drop-down.
  4. Select the data to compare
    Select a range of text entries spanning two or more columns. The width of this selection determines how many columns are deleted for each duplicate row.
  5. Index the selected data
    Click Map columns in selection.
  6. Specify the column indexes to analyse
    Enter your Left_column and Right_column index.
  7. Set the level of similarity
    (Only for "By percentage") Adjust the Threshold value as needed.
  8. Remove duplicates
    Click Remove duplicates.

How to Remove Duplicate Rows

  1. Select the function to run
    Click By percentage or By sound.
  2. Select the mode to run
    Choose from the first drop-down menu:
    • Keep first identified duplicate value
    • Keep last identified duplicate value
  3. Select the comparison mode
    Select Compare data in selection by row from the second drop-down.
  4. Select the data to compare
    Select a data range of two or more columns to be analysed for duplicates.
  5. Index the selected data
    Click Map columns in selection.
  6. Set the level of similarity
    (Only for "By percentage") Adjust the Threshold value as needed.
  7. Remove duplicates
    Click Remove duplicates.
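
Comparing data by row can be pictured as scoring whole rows against each other rather than single cells. The sketch below is one way to do that in Python. It assumes that the cells of a row are joined into one string before scoring and that the threshold runs from 0 to 1. Neither detail is confirmed for Flookup, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def row_text(row):
    """Join the cells of a row into one normalised string for comparison."""
    return " ".join(str(cell).strip().lower() for cell in row)


def row_similarity(row_a, row_b):
    """Score two rows from 0.0 to 1.0 based on their joined text."""
    return SequenceMatcher(None, row_text(row_a), row_text(row_b)).ratio()


def remove_duplicate_rows(rows, threshold=0.8, keep="first"):
    """Keep one row from each group of rows that score at or above threshold."""
    ordered = list(rows) if keep == "first" else list(reversed(rows))
    kept = []
    for row in ordered:
        if not any(row_similarity(row, other) >= threshold for other in kept):
            kept.append(row)
    return kept if keep == "first" else list(reversed(kept))


# Hypothetical sample rows.
rows = [
    ["Acme Ltd", "London"],
    ["Acme Ltd.", "London"],
    ["Zenith Co", "Paris"],
]
kept_first = remove_duplicate_rows(rows, threshold=0.8, keep="first")
kept_last = remove_duplicate_rows(rows, threshold=0.8, keep="last")
```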

How to Remove Duplicates of Data in a Single Cell

  1. Click By percentage or By sound.
  2. Select Remove duplicates by cell value from the second drop-down.
  3. Click a single cell containing the content whose duplicates you wish to remove and click Grab selected cell.
  4. Select the data range to be analysed and click Map columns in selection.
  5. Change the Left_column value to specify the column index to remove duplicates from.
  6. (Only for "By percentage") Adjust the Threshold value as needed.
  7. Click Remove duplicates.
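
As a rough illustration of removing duplicates of one chosen cell value, the Python sketch below filters out every row whose cell in a given column is similar to a target string. The scoring method, the 0 to 1 threshold scale, the zero-based column index, and the function name are all assumptions. Whether Flookup keeps the row that holds the selected cell itself is not stated here, so the sketch simply drops every match.

```python
from difflib import SequenceMatcher


def remove_matches(rows, target, column=0, threshold=0.8):
    """Return the rows whose cell in `column` is NOT similar to `target`."""
    def score(text):
        return SequenceMatcher(None, text.lower(), target.lower()).ratio()

    return [row for row in rows if score(str(row[column])) < threshold]


# Hypothetical sample rows; "Jon Smith" plays the role of the grabbed cell.
rows = [
    ["Jon Smith", "a"],
    ["Mary Jones", "b"],
    ["John Smith", "c"],
]
remaining = remove_matches(rows, "Jon Smith", column=0, threshold=0.8)
```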

How to Roll Up Data from Duplicate Rows

  1. Click By percentage.
  2. Select Roll up data in selection by row from the second drop-down.
  3. Select the data range of two or more columns to be analysed for duplicates.
  4. Click Map columns in selection.
  5. Adjust the Threshold value as needed.
  6. Click Remove duplicates.
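
Rolling up means consolidating the data from rows judged to be duplicates into a single row instead of discarding it. The format of Flookup's roll-up output is not described here, so the Python sketch below rests on loudly stated assumptions: the first column identifies the record, rows are grouped when their first cells score at or above a 0 to 1 threshold, and the distinct values of the remaining columns are joined with ", ". All names are hypothetical.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold):
    """True when two strings score at or above the threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def roll_up(rows, threshold=0.8, sep=", "):
    """Merge rows whose first cells are similar into one row per group."""
    groups = []  # each group is a list of rows judged to be duplicates
    for row in rows:
        for group in groups:
            if similar(row[0], group[0][0], threshold):
                group.append(row)
                break
        else:
            groups.append([row])

    merged = []
    for group in groups:
        out = [group[0][0]]  # keep the first spelling of the key
        for col in range(1, len(group[0])):
            seen = []
            for r in group:
                # Collect non-empty, distinct values in their original order.
                if r[col] and r[col] not in seen:
                    seen.append(r[col])
            out.append(sep.join(seen))
        merged.append(out)
    return merged


# Hypothetical sample rows of equal length.
rows = [
    ["Jon Smith", "Sales"],
    ["John Smith", "Support"],
    ["Mary Jones", "Sales"],
]
rolled = roll_up(rows, threshold=0.8)
```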

How to Extract Unique Values

  1. Select the function to run
    Click By percentage or By sound.
  2. Select the mode to run
    Choose from the first drop-down menu:
    • Keep first identified duplicate value
    • Keep last identified duplicate value
  3. Select the comparison mode
    Select Compare data in selection by row from the second drop-down.
  4. Select the data to compare
    Select a data range of two or more columns to be analysed for duplicates.
  5. Index the selected data
    Click Map columns in selection.
  6. Set the level of similarity
    (Only for "By percentage") Adjust the Threshold value as needed. Higher values are stricter; lower values are more permissive.
  7. Remove duplicates
    Click Remove duplicates.

Notes on Removing Duplicates

