REMOVING DUPLICATES BY TEXT SIMILARITY
Introduction to Removing Duplicates
Achieve superior data integrity in your Google Sheets by mastering data deduplication with Flookup. This guide will walk you through efficiently removing duplicate or similar entries, whether they are exact matches or identified by percentage or sound similarity. Flookup offers versatile modes for handling duplicates in single columns, across multiple columns or even entire rows, including options for data roll up. To begin removing duplicates from a single column, navigate to Extensions > Flookup Data Wrangler > Transformation functions > Remove duplicates in your Google Sheets menu.
Removing Duplicates by Percentage or Sound Similarity
- Select the function to run
  Click the menu item labelled By percentage or By sound, depending on your needs.
- Select the mode to run
  Choose from the first drop down menu:
  - Keep first unique value
  - Keep last unique value
- Select the text entries to analyse
  Select one or more columns. If you select a range, for example A2:D500, and duplicates are identified on a row, all columns in that row will be removed.
- Index the selected data
  Click Map columns in selection to index your columns.
- Specify the column of data to analyse
  Enter the Left_column index. If left blank, the first column is analysed.
- Enter the level of similarity
  (Only for "By percentage") Enter the Threshold value. Higher values mean only close matches are considered duplicates; lower values are more permissive.
- Remove duplicates
  Click the Remove duplicates button.
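To make the two functions and the two modes concrete, here is a minimal Python sketch of single column deduplication. It is not Flookup's implementation: this guide does not say how similarity is scored, so `difflib.SequenceMatcher` stands in for By percentage and a basic Soundex code stands in for By sound. The names `dedupe_column`, `similarity` and `soundex`, the 0 to 1 threshold scale and the zero-based `left_column` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Standard Soundex letter groups; vowels, h, w and y carry no code.
SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}


def soundex(word: str) -> str:
    """Return the four-character Soundex code of a word ("" if no letters)."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    previous = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != previous:
            encoded += code
        if ch not in "hw":  # h and w do not separate equal codes
            previous = code
    return (encoded + "000")[:4]


def similarity(a: str, b: str) -> float:
    """Case-insensitive text similarity between 0 and 1 (stand-in metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def dedupe_column(rows, left_column=0, function="percentage",
                  threshold=0.75, keep="first"):
    """Return rows with similar left_column values removed (hypothetical helper)."""
    def is_duplicate(a, b):
        if function == "sound":
            return soundex(a) == soundex(b)
        return similarity(a, b) >= threshold

    # "Keep last" is handled by scanning the rows bottom-up.
    ordered = list(rows) if keep == "first" else list(reversed(rows))
    kept = []
    for row in ordered:
        value = str(row[left_column])
        if not any(is_duplicate(value, str(k[left_column])) for k in kept):
            kept.append(row)  # the whole row survives or is dropped together
    return kept if keep == "first" else kept[::-1]


rows = [["Robert", 1], ["Rupert", 2], ["Smith", 3], ["Smyth", 4]]
print(dedupe_column(rows))                                  # by percentage, keep first
print(dedupe_column(rows, function="sound"))                # by sound, keep first
print(dedupe_column(rows, function="sound", keep="last"))   # by sound, keep last
```

In this sample, Robert and Rupert share a Soundex code but score below the 0.75 threshold on text similarity, so only the sound comparison treats them as duplicates.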
How to Remove Duplicates Across Two Different Columns
- Select the function to run
  Click By percentage or By sound.
- Select the mode to run
  Choose from the first drop down menu:
  - Keep first unique value
  - Keep last unique value
- Select the comparison mode
  Select Compare two different columns from the second drop down.
- Select the data to compare
  Select text entries of two or more columns. This determines the number of columns deleted for each duplicate row.
- Index the selected data
  Click Map columns in selection.
- Specify the column indexes to analyse
  Enter your Left_column and Right_column index.
- Set the level of similarity
  (Only for "By percentage") Adjust the Threshold value as needed.
- Remove duplicates
  Click Remove duplicates.
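The two column comparison can be pictured with a short Python sketch. It assumes that a row is removed when its Left_column value is similar to any value found in Right_column, and it uses `difflib.SequenceMatcher` as a stand-in similarity score. The function name `remove_left_values_found_in_right`, the zero-based column indexes and the 0 to 1 threshold scale are hypothetical, not Flookup's own.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive text similarity between 0 and 1 (stand-in metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def remove_left_values_found_in_right(rows, left_column, right_column,
                                      threshold=0.8):
    """Drop every row whose left value resembles some right-column value."""
    # Blank cells in the right column are ignored.
    right_values = [row[right_column] for row in rows if row[right_column]]
    return [
        row for row in rows
        if not any(similarity(row[left_column], r) >= threshold
                   for r in right_values)
    ]


rows = [
    ["Jon Smith", "John Smith"],   # left value resembles "John Smith"
    ["Mary Jones", "Peter Pan"],   # left value has no look-alike on the right
    ["Peter Pann", ""],            # left value resembles "Peter Pan"
]
print(remove_left_values_found_in_right(rows, left_column=0, right_column=1))
```

Only the Mary Jones row remains, because the other two left values each have a close match somewhere in the right column.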
How to Remove Duplicate Rows
- Select the function to run
  Click By percentage or By sound.
- Select the mode to run
  Choose from the first drop down menu:
  - Keep first identified duplicate value
  - Keep last identified duplicate value
- Select the comparison mode
  Select Compare data in selection by row from the second drop down.
- Select the data to compare
  Select a data range of two or more columns to be analysed for duplicates.
- Index the selected data
  Click Map columns in selection.
- Set the level of similarity
  (Only for "By percentage") Adjust the Threshold value as needed.
- Remove duplicates
  Click Remove duplicates.
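As a rough illustration of row comparison, the Python sketch below joins the cells of each row into one string and compares whole rows with `difflib.SequenceMatcher`. How Flookup actually scores a row is not described in this guide, so joining cells, the name `dedupe_rows` and the 0 to 1 threshold scale are assumptions.

```python
from difflib import SequenceMatcher


def row_similarity(row_a, row_b) -> float:
    """Compare two rows as single lower-cased strings (assumed approach)."""
    text_a = " ".join(str(cell) for cell in row_a).lower()
    text_b = " ".join(str(cell) for cell in row_b).lower()
    return SequenceMatcher(None, text_a, text_b).ratio()


def dedupe_rows(rows, threshold=0.85, keep="first"):
    """Remove rows that are similar, as a whole, to a row already kept."""
    ordered = list(rows) if keep == "first" else list(reversed(rows))
    kept = []
    for row in ordered:
        if not any(row_similarity(row, k) >= threshold for k in kept):
            kept.append(row)
    return kept if keep == "first" else kept[::-1]


rows = [
    ["Ann Lee", "ann@example.com"],
    ["Ann Lee", "ann@example.org"],   # near-identical to the row above
    ["Bob Ray", "bob@example.com"],
]
print(dedupe_rows(rows, keep="first"))
print(dedupe_rows(rows, keep="last"))
```

Keeping the first duplicate retains the `.com` row for Ann Lee, while keeping the last retains the `.org` row; Bob Ray's row survives either way.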
How to Remove Duplicates of Data in a Single Cell
- Click By percentage or By sound.
- Select Remove duplicates by cell value from the second drop down.
- Click a single cell containing the content whose duplicates you wish to remove and click Grab selected cell.
- Select the data range to be analysed and click Map columns in selection.
- Change the Left_column value to specify the column index to remove duplicates from.
- (Only for "By percentage") Adjust the Threshold value as needed.
- Click Remove duplicates.
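The steps above can be sketched in Python as follows. The sketch assumes that only rows resembling the grabbed cell's value are affected and that the first such row is kept; this guide does not state either point explicitly. `remove_duplicates_of_value` is a hypothetical name, `difflib.SequenceMatcher` is a stand-in similarity score, and the 0 to 1 threshold scale is also an assumption.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive text similarity between 0 and 1 (stand-in metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def remove_duplicates_of_value(rows, target, left_column=0, threshold=0.8):
    """Keep the first row matching target and drop later look-alikes."""
    kept = []
    seen_target = False
    for row in rows:
        if similarity(str(row[left_column]), target) >= threshold:
            if seen_target:
                continue  # a later duplicate of the target: drop the row
            seen_target = True  # first match is retained (assumption)
        kept.append(row)
    return kept


rows = [
    ["Globex", "a"],
    ["Initech", "b"],
    ["Globex.", "c"],
    ["globex", "d"],
    ["Initech", "e"],
]
print(remove_duplicates_of_value(rows, "Globex"))
```

The repeated Initech rows are left alone because they do not resemble the target value; only the extra Globex rows are removed.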
How to Roll Up Data from Duplicate Rows
- Click By percentage.
- Select Roll up data in selection by row from the second drop down.
- Select the data range of two or more columns to be analysed for duplicates.
- Click Map columns in selection.
- Adjust the Threshold value as needed.
- Click Remove duplicates.
How to Extract Unique Values
- Select the function to run
  Click By percentage or By sound.
- Select the mode to run
  Choose from the first drop down menu:
  - Keep first identified duplicate value
  - Keep last identified duplicate value
- Select the comparison mode
  Select Compare data in selection by row from the second drop down.
- Select the data to compare
  Select a data range of two or more columns to be analysed for duplicates.
- Index the selected data
  Click Map columns in selection.
- Set the level of similarity
  (Only for "By percentage") Adjust the Threshold value as needed. Higher values are stricter, lower values are more permissive.
- Remove duplicates
  Click Remove duplicates.
Notes on Removing Duplicates
- The Left_column value is the only column analysed in single column mode.
- For two column mode, remove duplicates within the Left_column first for best results.
- In two column mode, duplicates are values in Left_column that also exist in Right_column; any row containing a duplicate will be deleted.
- For row comparison, all columns in a duplicate row are deleted.
- Threshold controls how strict the function is: higher means stricter, lower means more permissive.
- After running, a message will indicate how many rows were processed.