REMOVING DUPLICATES BY TEXT SIMILARITY
Introduction to Removing Duplicates
Flookup helps maintain data integrity in Google Sheets by removing duplicate entries. This guide covers how to deduplicate data based on exact matches, percentage similarity, or sound.
You can apply these checks to single columns, across multiple columns, or to entire rows, with an option to roll up data from duplicates. To begin, navigate to Extensions > Flookup Data Wrangler > Remove duplicates in your Google Sheets menu.
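The three ways of deciding that two entries are duplicates (exact, percentage, sound) can be illustrated with a short Python sketch. This is not Flookup's implementation: the percentage score below comes from Python's difflib, the sound comparison uses a basic Soundex code, and all function names are hypothetical, chosen only to make the ideas concrete.

```python
from difflib import SequenceMatcher

# Soundex digit for each consonant; vowels, h, w and y have no digit.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def exact_match(a: str, b: str) -> bool:
    """Duplicates only if the two strings are identical."""
    return a == b


def percentage_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Duplicates if a 0..1 similarity score reaches the threshold.

    Assumption: difflib's ratio stands in for whatever score Flookup uses.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def soundex(word: str) -> str:
    """Basic Soundex: first letter plus up to three digits, zero padded."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        if ch not in "hw":  # h and w do not separate repeated codes
            prev = code
    return (encoded + "000")[:4]


def sound_match(a: str, b: str) -> bool:
    """Duplicates if both strings produce the same sound code."""
    return soundex(a) == soundex(b)
```

In this sketch, "Jon Smith" and "John Smith" are not an exact match but do pass the percentage check at 0.8, while "Smith" and "Smyth" share the same sound code.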
Removing Duplicates by Percentage or Sound Similarity
- Select the function to run
  Click the menu item labelled By percentage or By sound, depending on your needs.
- Select the mode to run
  Choose from the first drop down menu:
  - Keep first unique value
  - Keep last unique value
- Select the text entries to analyse
  Select one or more columns. If you select a range, for example A2:D500, and duplicates are identified on a row, all columns in that row will be removed.
- Specify the column of data to analyse
  Enter the Left_column index. If left blank, the first column is analysed.
- Enter the level of similarity
  (Only for "By percentage") Enter the Threshold value. If you leave it blank, the default threshold of 0.8 is used. Higher values mean only close matches are considered duplicates; lower values are more permissive.
- Remove duplicates
  Click the Remove duplicates button.
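The steps above can be sketched in Python to show how the mode, Left_column and Threshold settings interact. This is only an illustration under stated assumptions: the score comes from difflib rather than Flookup, Left_column is treated as a 1-based index, and `remove_duplicates` is a hypothetical name.

```python
from difflib import SequenceMatcher


def similarity(a, b) -> float:
    """Case-insensitive 0..1 score (assumption: a stand-in for Flookup's score)."""
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()


def remove_duplicates(rows, left_column=1, threshold=0.8, keep="first"):
    """Drop whole rows whose Left_column value is similar to one already kept.

    rows        -- list of rows, each a list of cell values
    left_column -- column to analyse (assumed 1-based; defaults to the first)
    threshold   -- minimum score for two values to count as duplicates
    keep        -- "first" or "last", mirroring the two modes in the menu
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    # Scanning in reverse makes the last occurrence the one that survives.
    ordered = rows if keep == "first" else list(reversed(rows))
    kept = []
    for row in ordered:
        value = row[left_column - 1]
        if not any(similarity(value, k[left_column - 1]) >= threshold for k in kept):
            kept.append(row)
    return kept if keep == "first" else list(reversed(kept))


data = [
    ["Acme Ltd", "London"],
    ["Acme Ltd.", "Paris"],
    ["Globex", "Berlin"],
]
print(remove_duplicates(data, keep="first"))
print(remove_duplicates(data, keep="last"))
```

With the sample data, "Acme Ltd" and "Acme Ltd." score above 0.8, so only one of those two rows survives, and the mode decides which one.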
How to Remove Duplicates Across Two Different Columns
- Select the function to run
  Click By percentage or By sound.
- Select the mode to run
  Choose from the first drop down menu:
  - Keep first unique value
  - Keep last unique value
- Select the comparison mode
  Select Compare two different columns from the second drop down.
- Select the data to compare
  Select the text entries in two or more columns. The width of the selection determines how many columns are deleted for each duplicate row.
- Specify the column indexes to analyse
  Enter your Left_column and Right_column index.
- Set the level of similarity
  (Only for "By percentage") Adjust the Threshold value as needed.
- Remove duplicates
  Click Remove duplicates.
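As a rough illustration of the two column comparison, the sketch below deletes every row whose Left_column value is similar to any value found in Right_column. It assumes 1-based column indexes and a difflib score, ignores the keep first/keep last mode, and uses hypothetical function names; Flookup's own behaviour may differ in the details.

```python
from difflib import SequenceMatcher


def similarity(a, b) -> float:
    """Case-insensitive 0..1 score (assumption: a stand-in for Flookup's score)."""
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()


def remove_cross_column_duplicates(rows, left_column=1, right_column=2, threshold=0.8):
    """Remove rows whose left value also appears (approximately) in the right column."""
    # Collect the non-empty values of the right column once.
    right_values = [row[right_column - 1] for row in rows if row[right_column - 1] != ""]
    result = []
    for row in rows:
        left_value = row[left_column - 1]
        if any(similarity(left_value, r) >= threshold for r in right_values):
            continue  # left value exists in the right column: drop the whole row
        result.append(row)
    return result


data = [
    ["Anna Lee", "Bob Ray"],
    ["Bob Ray", "Cara Poe"],
    ["Dan Fox", "Ana Lee"],
]
print(remove_cross_column_duplicates(data))
```

Here "Anna Lee" is close enough to "Ana Lee" and "Bob Ray" appears verbatim in the right column, so only the "Dan Fox" row remains.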
How to Remove Duplicate Rows
- Select the function to run
  Click By percentage or By sound.
- Select the mode to run
  Choose from the first drop down menu:
  - Keep first identified duplicate value
  - Keep last identified duplicate value
- Select the comparison mode
  Select Compare data in selection by row from the second drop down.
- Select the data to compare
  Select a data range of two or more columns to be analysed for duplicates.
- Set the level of similarity
  (Only for "By percentage") Adjust the Threshold value as needed.
- Remove duplicates
  Click Remove duplicates.
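Row comparison can be pictured as scoring entire rows against each other instead of a single column. The sketch below joins the cells of each row into one string before scoring; that joining step, the difflib score and the function names are all assumptions made for illustration, not a description of how Flookup compares rows.

```python
from difflib import SequenceMatcher


def row_text(row) -> str:
    """Flatten a row into one lowercase string for comparison (an assumption)."""
    return " ".join(str(cell).strip().lower() for cell in row)


def row_similarity(a, b) -> float:
    return SequenceMatcher(None, row_text(a), row_text(b)).ratio()


def remove_duplicate_rows(rows, threshold=0.8, keep="first"):
    """Keep one row from each set of similar rows: the first or the last seen."""
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    ordered = rows if keep == "first" else list(reversed(rows))
    kept = []
    for row in ordered:
        if not any(row_similarity(row, k) >= threshold for k in kept):
            kept.append(row)
    return kept if keep == "first" else list(reversed(kept))


data = [
    ["Acme Ltd", "London", "UK"],
    ["Acme Ltd.", "London", "UK"],
    ["Globex", "Berlin", "DE"],
]
print(remove_duplicate_rows(data))
```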
How to Remove Duplicates of Data in a Single Cell
- Click By percentage or By sound.
- Select Remove duplicates by cell value from the second drop down.
- Click a single cell containing the content whose duplicates you wish to remove and click Grab selected cell.
- Change the Left_column value to specify the column index to remove duplicates from.
- (Only for "By percentage") Adjust the Threshold value as needed.
- Click Remove duplicates.
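A minimal sketch of the single cell workflow follows. It assumes that the first row matching the grabbed cell's content is kept and later matches are removed; the guide does not state which copy survives, so treat that rule, the 1-based Left_column, the difflib score and the function names as assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b) -> float:
    """Case-insensitive 0..1 score (assumption: a stand-in for Flookup's score)."""
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()


def remove_duplicates_of_value(rows, target, left_column=1, threshold=0.8):
    """Remove repeated rows whose Left_column value is similar to `target`.

    Assumption: the first matching row is kept; every later match is dropped.
    Rows that do not resemble `target` are never touched.
    """
    result = []
    seen_match = False
    for row in rows:
        if similarity(row[left_column - 1], target) >= threshold:
            if seen_match:
                continue
            seen_match = True
        result.append(row)
    return result


data = [["Acme Ltd"], ["Globex"], ["Acme Ltd."], ["ACME LTD"], ["Globex"]]
print(remove_duplicates_of_value(data, target="Acme Ltd"))
```

Note that the repeated "Globex" row is left alone, because only entries resembling the chosen cell are considered.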
How to Roll Up Data from Duplicate Rows
- Click By percentage.
- Select Roll up data in selection by row from the second drop down.
- Select the data range of two or more columns to be analysed for duplicates.
- (Only for "By percentage") Adjust the Threshold value as needed.
- Click Remove duplicates.
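One possible reading of "rolling up" is that similar rows are merged into a single row that preserves the distinct values from each of them. The sketch below follows that reading; the merge rule (distinct values joined with a comma), the row comparison, the difflib score and the function names are all assumptions, and Flookup's actual output format may be different.

```python
from difflib import SequenceMatcher


def row_text(row) -> str:
    """Flatten a row into one lowercase string for comparison (an assumption)."""
    return " ".join(str(cell).strip().lower() for cell in row)


def row_similarity(a, b) -> float:
    return SequenceMatcher(None, row_text(a), row_text(b)).ratio()


def roll_up_rows(rows, threshold=0.8, separator=", "):
    """Merge similar rows; each merged cell lists the distinct values seen.

    Assumes every row has the same number of columns.
    """
    groups = []  # each group holds rows similar to the group's first row
    for row in rows:
        for group in groups:
            if row_similarity(row, group[0]) >= threshold:
                group.append(row)
                break
        else:
            groups.append([row])

    rolled = []
    for group in groups:
        merged = []
        for column in zip(*group):
            # dict.fromkeys removes repeats while preserving first-seen order.
            distinct = list(dict.fromkeys(str(c) for c in column if str(c) != ""))
            merged.append(separator.join(distinct))
        rolled.append(merged)
    return rolled


data = [
    ["Acme Ltd", "London", "100"],
    ["Acme Ltd.", "London", "250"],
    ["Globex", "Berlin", "75"],
]
print(roll_up_rows(data))
```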
How to Extract Unique Values
- Select the function to run
  Click By percentage or By sound.
- Select the mode to run
  Choose from the first drop down menu:
  - Keep first identified duplicate value
  - Keep last identified duplicate value
- Select the comparison mode
  Select Compare data in selection by row from the second drop down.
- Select the data to compare
  Select a data range of two or more columns to be analysed for duplicates.
- Set the level of similarity
  (Only for "By percentage") Adjust the Threshold value as needed. Higher values are stricter, lower values are more permissive.
- Remove duplicates
  Click Remove duplicates.
Notes on Removing Duplicates
- In single column mode, only the column specified by Left_column is analysed.
- For two column mode, remove duplicates within the Left_column first for best results.
- In two column mode, duplicates are values in Left_column that also exist in Right_column; any row containing a duplicate is deleted.
- For row comparison, all columns in a duplicate row are deleted.
- Threshold controls how strict the function is: higher means stricter, lower means more permissive.
- After running, a message will indicate how many rows were processed.
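To see how the Threshold setting changes what counts as a duplicate, the short sketch below scores a few pairs and reports which of them pass at different thresholds. The scores come from Python's difflib, used here as an assumed stand-in, so the exact numbers will not match Flookup's; only the direction of the effect is the point.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive 0..1 score (assumption: a stand-in for Flookup's score)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def duplicates_at(pairs, threshold):
    """Return the pairs that would count as duplicates at this threshold."""
    return [pair for pair in pairs if similarity(*pair) >= threshold]


pairs = [
    ("Jon Smith", "John Smith"),
    ("Anna Lee", "Anna Leigh"),
    ("Acme", "Globex"),
]

for threshold in (0.95, 0.8, 0.7):
    print(threshold, duplicates_at(pairs, threshold))
```

In this sketch no pair passes at 0.95, only the "Smith" pair passes at the default 0.8, and lowering the threshold to 0.7 also admits the "Anna" pair.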
For the Visual Learners
Labels might differ slightly but the steps are the same.