REMOVING DUPLICATES BY TEXT SIMILARITY

Introduction to Removing Duplicates

Flookup helps maintain data integrity in Google Sheets by removing duplicate entries. This guide covers how to deduplicate data based on exact matches, percentage similarity, or sound similarity.
You can apply these checks to single columns, across multiple columns, or to entire rows, with an option to roll up data from duplicates. To begin, navigate to Extensions > Flookup Data Wrangler > Remove duplicates in your Google Sheets menu.
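The three kinds of comparison above can be made concrete with a minimal Python sketch. This is not Flookup's implementation. The standard library's `difflib.SequenceMatcher.ratio()` stands in for the percentage score, American Soundex stands in for the sound comparison, and all function names are hypothetical. Treating an exact match as case-sensitive is also an assumption.

```python
from difflib import SequenceMatcher

# Soundex digit for each consonant; vowels, H, W and Y have no digit.
_SOUNDEX_CODES = {
    letter: digit
    for digit, letters in {
        "1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
        "4": "L", "5": "MN", "6": "R",
    }.items()
    for letter in letters
}


def soundex(word: str) -> str:
    """American Soundex code of one word, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _SOUNDEX_CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        # H and W do not separate equal digits; vowels and Y reset them.
        if c not in "HW":
            prev = digit
    return (code + "000")[:4]


def is_exact_duplicate(a: str, b: str) -> bool:
    # Assumption: "exact" means identical strings, including case.
    return a == b


def is_percentage_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    # Stand-in metric: case-insensitive SequenceMatcher ratio, 0 to 1.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def is_sound_duplicate(a: str, b: str) -> bool:
    # Stand-in metric: same Soundex code for each word, in order.
    return [soundex(w) for w in a.split()] == [soundex(w) for w in b.split()]
```

With these stand-ins, "Smith" and "Smyth" count as sound duplicates because both encode to S530, and "Jonathan Smith" and "Jonathon Smith" score about 0.93, which clears the default threshold of 0.8.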


Removing Duplicates by Percentage or Sound Similarity

  1. Select the function to run
    Click the menu item labelled By percentage or By sound, depending on your needs.
  2. Select the mode to run
    Choose from the first drop-down menu:
    • Keep first unique value
    • Keep last unique value
  3. Select the text entries to analyse
    Select one or more columns. If you select a range, for example A2:D500, and a duplicate is identified in a row, all columns in that row (A to D) are removed.
  4. Specify the column of data to analyse
    Enter the Left_column index. If left blank, the first column is analysed.
  5. Enter the level of similarity
    (Only for "By percentage") Enter the Threshold value. If left blank, the default threshold of 0.8 is used. Higher values mean only close matches are treated as duplicates; lower values are more permissive.
  6. Remove Duplicates
    Click the Remove duplicates button.
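The Keep first and Keep last modes can be sketched as a single pass over the selection. This is an illustration under stated assumptions, not the add-on's code: `remove_duplicates` and its parameters are hypothetical, Left_column is treated as a 1-based index into the selection, and `difflib`'s ratio stands in for Flookup's percentage score. A "By sound" version would swap the similarity test for a phonetic key comparison.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Stand-in for the percentage score: case-insensitive, 0 to 1.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def remove_duplicates(rows, left_column=1, threshold=0.8, keep="first"):
    """Return the surviving rows in their original order.

    Only the Left_column cell (1-based, assumed) is compared, but a
    duplicate removes its whole row, as in the A2:D500 example.
    Any keep value other than "first" behaves as keep last.
    """
    col = left_column - 1
    # Keep first: scan top to bottom. Keep last: scan bottom to top.
    order = range(len(rows)) if keep == "first" else reversed(range(len(rows)))
    kept = []
    for i in order:
        value = str(rows[i][col])
        # A row survives only if it is not similar to any row already kept.
        if all(similarity(value, str(rows[j][col])) < threshold for j in kept):
            kept.append(i)
    return [rows[i] for i in sorted(kept)]
```

Scanning bottom to top means the later of two similar rows survives under Keep last, while the returned rows stay in sheet order.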

How to Remove Duplicates Across Two Different Columns

  1. Select the function to run
    Click By percentage or By sound.
  2. Select the mode to run
    Choose from the first drop-down menu:
    • Keep first unique value
    • Keep last unique value
  3. Select the comparison mode
    Select Compare two different columns from the second drop-down menu.
  4. Select the data to compare
    Select the text entries in two or more columns. The width of this selection determines how many columns are deleted for each duplicate row.
  5. Specify the column indexes to analyse
    Enter the Left_column and Right_column indexes.
  6. Set the level of similarity
    (Only for "By percentage") Adjust the Threshold value as needed.
  7. Remove duplicates
    Click Remove duplicates.
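The steps above do not spell out exactly how the two columns are matched, so this sketch assumes one plausible reading: a row is removed when its Left_column value is similar to any value in the Right_column. It does not model the Keep first or Keep last modes, treats column indexes as 1-based, uses `difflib`'s ratio as a stand-in metric, and the function name is hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Stand-in for the percentage score: case-insensitive, 0 to 1.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def remove_cross_column_duplicates(rows, left_column, right_column, threshold=0.8):
    """Drop every row whose left value matches any value in the right column.

    The whole selected row is dropped, so the selection width decides
    how many cells go with each duplicate.
    """
    left, right = left_column - 1, right_column - 1
    right_values = [str(row[right]) for row in rows]
    return [
        row for row in rows
        if all(similarity(str(row[left]), v) < threshold for v in right_values)
    ]
```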

How to Remove Duplicate Rows

  1. Select the function to run
    Click By percentage or By sound.
  2. Select the mode to run
    Choose from the first drop-down menu:
    • Keep first identified duplicate value
    • Keep last identified duplicate value
  3. Select the comparison mode
    Select Compare data in selection by row from the second drop-down menu.
  4. Select the data to compare
    Select a data range of two or more columns to be analysed for duplicates.
  5. Set the level of similarity
    (Only for "By percentage") Adjust the Threshold value as needed.
  6. Remove duplicates
    Click Remove duplicates.
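For row comparison, one simple assumption is to treat each row as a single string of its joined cells and compare those strings against the threshold. The sketch below does that. How Flookup actually combines the cells of a row is not stated in this guide, `remove_duplicate_rows` is a hypothetical name, and `difflib`'s ratio is a stand-in metric.

```python
from difflib import SequenceMatcher


def row_text(row) -> str:
    # Assumption: a row is compared as its cells joined by single spaces.
    return " ".join(str(cell) for cell in row).lower()


def remove_duplicate_rows(rows, threshold=0.8, keep="first"):
    """Return surviving rows in sheet order; keep="last" keeps later rows."""
    order = range(len(rows)) if keep == "first" else reversed(range(len(rows)))
    kept = []
    for i in order:
        text = row_text(rows[i])
        # Keep the row only if no already-kept row is similar to it.
        if all(SequenceMatcher(None, text, row_text(rows[j])).ratio() < threshold
               for j in kept):
            kept.append(i)
    return [rows[i] for i in sorted(kept)]
```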

How to Remove Duplicates of Data in a Single Cell

  1. Click By percentage or By sound.
  2. Select Remove duplicates by cell value from the second drop-down menu.
  3. Click a single cell containing the value whose duplicates you want to remove, then click Grab selected cell.
  4. Set the Left_column value to the index of the column to remove duplicates from.
  5. (Only for "By percentage") Adjust the Threshold value as needed.
  6. Click Remove duplicates.
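Removing duplicates of a grabbed cell value amounts to comparing one target value against every cell in the chosen column. This sketch assumes the first matching row is kept as the original and later matches are removed. The function and parameter names are hypothetical, Left_column is treated as 1-based, and `difflib`'s ratio stands in for the percentage score.

```python
from difflib import SequenceMatcher


def remove_cell_value_duplicates(rows, cell_value, left_column=1, threshold=0.8):
    """Keep the first row matching cell_value; drop later matching rows."""
    col = left_column - 1
    target = str(cell_value).lower()
    result, seen_match = [], False
    for row in rows:
        ratio = SequenceMatcher(None, target, str(row[col]).lower()).ratio()
        is_match = ratio >= threshold
        if is_match and seen_match:
            continue  # a later duplicate of the grabbed value: drop the row
        seen_match = seen_match or is_match
        result.append(row)
    return result
```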

How to Roll Up Data from Duplicate Rows

  1. Click By percentage.
  2. Select Roll up data in selection by row from the second drop-down menu.
  3. Select a data range of two or more columns to be analysed for duplicates.
  4. Adjust the Threshold value as needed.
  5. Click Remove duplicates.
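Rolling up combines the data from similar rows instead of discarding it. The steps above do not say which column decides whether rows are duplicates or how values are combined, so this sketch assumes a hypothetical `key_column` (the first column by default, 1-based) and joins each column's distinct values with a comma. It illustrates the idea only, with `difflib`'s ratio as a stand-in metric.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Stand-in for the percentage score: case-insensitive, 0 to 1.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def roll_up(rows, threshold=0.8, key_column=1, sep=", "):
    """Merge rows whose key cells are similar into one row per group."""
    k = key_column - 1
    groups = []  # (key of the group's first row, member rows)
    for row in rows:
        key = str(row[k])
        for group_key, members in groups:
            if similarity(key, group_key) >= threshold:
                members.append(row)
                break
        else:  # no similar group found: start a new one
            groups.append((key, [row]))

    rolled = []
    for group_key, members in groups:
        merged = []
        for c in range(len(members[0])):
            if c == k:
                merged.append(group_key)  # first row's key represents the group
                continue
            distinct = []
            for m in members:
                value = str(m[c])
                if value and value not in distinct:  # skip blanks and repeats
                    distinct.append(value)
            merged.append(sep.join(distinct))
        rolled.append(merged)
    return rolled
```

Blank cells are skipped, so a rolled-up cell never contains stray separators.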

How to Extract Unique Values

  1. Select the function to run
    Click By percentage or By sound.
  2. Select the mode to run
    Choose from the first drop-down menu:
    • Keep first identified duplicate value
    • Keep last identified duplicate value
  3. Select the comparison mode
    Select Compare data in selection by row from the second drop-down menu.
  4. Select the data to compare
    Select a data range of two or more columns to be analysed for duplicates.
  5. Set the level of similarity
    (Only for "By percentage") Adjust the Threshold value as needed. Higher values are stricter; lower values are more permissive.
  6. Click Remove duplicates.
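Assuming extraction leaves the source data in place and returns the unique entries separately, the idea can be sketched by grouping similar values and returning one representative per group. The `exactly_once` flag, borrowed from the option of the same name in Google Sheets' built-in `UNIQUE` function, instead keeps only values with no similar partner. The function name and flag are hypothetical here, and `difflib`'s ratio is a stand-in metric.

```python
from difflib import SequenceMatcher


def extract_unique(values, threshold=0.8, exactly_once=False):
    """Return one representative per group of similar values.

    With exactly_once=True, return only values with no similar partner.
    The input list is not modified.
    """
    groups = []  # each group: values judged similar to its first member
    for v in values:
        for g in groups:
            if SequenceMatcher(None, v.lower(), g[0].lower()).ratio() >= threshold:
                g.append(v)
                break
        else:  # no similar group found: start a new one
            groups.append([v])
    return [g[0] for g in groups if not exactly_once or len(g) == 1]
```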


For the Visual Learners

The labels shown might differ slightly, but the steps are the same.

