HOW TO REMOVE DUPLICATES

Introduction to Removing Duplicates

Flookup can remove fuzzy duplicates in Google Sheets based on matches within a single column. To remove duplicates, go to Extensions > Flookup > Remove duplicates and click either "By percentage" or "By sound".

How to Remove Duplicates By Percentage Similarity

    1. Click the menu item labelled "By percentage".

    2. Select text entries of one column or more.

    3. Click "Count columns in selection" to get the current number of columns in your selection.

    4. Enter the "Index One" value. If no user input is made, the first column of the selected range will be analysed.

    5. Enter the "Threshold" value. If no user input is made, then only exact matches will be deleted.

    6. Click "Remove duplicates".

How to Remove Duplicates By Sound Similarity

    1. Click the menu item labelled "By sound".

    2. Select text entries of one column or more.

    3. Click "Count columns in selection" to get the current number of columns in your selection.

    4. Specify the "Index One" value. If no user input is made, the first column of the selected range will be analysed.

    5. Click "Remove duplicates".
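
The idea behind matching by sound can be sketched in Python. This page does not say which phonetic algorithm Flookup uses, so the sketch below assumes classic Soundex as a stand-in. The names soundex and remove_sound_duplicates are hypothetical, and keeping the first occurrence of each sound-alike group is also an assumption.

```python
# Illustrative sketch only: Soundex is an assumed stand-in, not Flookup's
# documented algorithm.
GROUPS = (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
          ("l", "4"), ("mn", "5"), ("r", "6"))
SOUNDEX_CODES = {ch: digit for letters, digit in GROUPS for ch in letters}


def soundex(word):
    """Return the 4-character Soundex key of a word ("" if it has no letters)."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            key += digit
        prev = digit  # a vowel resets prev, so a repeated code counts again
    return (key + "000")[:4]


def remove_sound_duplicates(rows, index_one=1):
    """Keep the first row for each Soundex key found in column index_one (1-based)."""
    seen = set()
    kept = []
    for row in rows:
        key = soundex(str(row[index_one - 1]))
        if key and key in seen:
            continue  # sounds like an earlier entry, so drop the row
        seen.add(key)
        kept.append(row)
    return kept


rows = [["Robert", 1], ["Rupert", 2], ["Smith", 3], ["Smyth", 4]]
print(remove_sound_duplicates(rows))  # [['Robert', 1], ['Smith', 3]]
```

No threshold appears in this sketch because the "By sound" steps above do not ask for one: two entries either share a phonetic key or they do not.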

Key Points

  • The column specified by "Index One" is the only column that will be analysed in this mode.

  • If you are running "By percentage", then duplicates will be values within "Index One" that have a level of similarity that is higher than or equal to the "Threshold" value.

  • Only rows within the selected columns will be deleted.

  • If this function times out, a message will be displayed indicating which row you should start processing from in your next run.
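
The "By percentage" rule described above can be sketched in Python as follows. Flookup's own similarity measure is not described on this page, so difflib.SequenceMatcher from the standard library is used purely as a stand-in. The function name remove_fuzzy_duplicates is hypothetical, and keeping the first occurrence of each group of duplicates is an assumption.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity between 0 and 1 (assumed stand-in for Flookup's measure)."""
    return SequenceMatcher(None, a, b).ratio()


def remove_fuzzy_duplicates(rows, index_one=1, threshold=1.0):
    """Drop rows whose value in column index_one (1-based) is at least
    `threshold` similar to the value of a row that was already kept."""
    kept = []
    for row in rows:
        value = str(row[index_one - 1])
        is_duplicate = any(
            similarity(value, str(other[index_one - 1])) >= threshold
            for other in kept
        )
        if not is_duplicate:
            kept.append(row)
    return kept


rows = [["Acme Ltd", "x"], ["Acme Ltd.", "y"], ["Globex", "z"]]
print(remove_fuzzy_duplicates(rows, threshold=0.9))
# [['Acme Ltd', 'x'], ['Globex', 'z']]
print(len(remove_fuzzy_duplicates(rows)))  # 3: with threshold 1.0 only exact matches are dropped
```

With this stand-in measure, "Acme Ltd" and "Acme Ltd." score about 0.94, so the second row counts as a duplicate at a threshold of 0.9 but not at the default of 1.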

How to Remove Duplicates Across Two Different Columns

    1. Click the menu item labelled "By percentage" or "By sound".

    2. Select a range of two or more columns.

    3. Click "Count columns in selection" to get the current number of columns in your selection.

    4. In the resulting window, select the option labelled "Compare two different columns".

    5. Specify your "Index One" and "Index Two". These are the two columns that will be compared to each other.

    6. If you are running "By percentage", adjust the "Threshold" value to match your needs.

    7. Click "Remove duplicates".

Key Points

  • Only rows within the selected columns will be deleted.

  • Duplicates are values in "Index One" that exist in "Index Two".

  • Duplicates have a level of similarity that is higher than or equal to "Threshold".

  • If this function times out, a message will be displayed indicating which row you should start processing from in your next run.
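
The two-column comparison can be sketched in Python as follows. This is not Flookup's implementation: difflib.SequenceMatcher stands in for the undocumented similarity measure, the name remove_cross_column_duplicates is hypothetical, and skipping empty cells in "Index Two" is an assumption.

```python
from difflib import SequenceMatcher


def remove_cross_column_duplicates(rows, index_one=1, index_two=2, threshold=1.0):
    """Drop rows whose value in column index_one is at least `threshold`
    similar to any value found in column index_two (both 1-based)."""
    # Values to compare against; empty cells are skipped (an assumption).
    reference = [str(r[index_two - 1]) for r in rows if str(r[index_two - 1])]
    kept = []
    for row in rows:
        value = str(row[index_one - 1])
        exists_in_two = any(
            SequenceMatcher(None, value, ref).ratio() >= threshold
            for ref in reference
        )
        if not exists_in_two:
            kept.append(row)
    return kept


rows = [
    ["Acme Ltd", "Initech"],
    ["Globex", "Acme Ltd."],
    ["Initech Inc", "Umbrella"],
]
print(remove_cross_column_duplicates(rows, threshold=0.9))
# [['Globex', 'Acme Ltd.'], ['Initech Inc', 'Umbrella']]
```

In this example the first row is removed because its "Index One" value closely matches "Acme Ltd." in "Index Two". "Initech Inc" stays because, with this stand-in measure, it scores about 0.78 against "Initech", which is below the 0.9 threshold.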

ULIST

=ULIST(colArray, [indexNum], [threshold])

Use ULIST to remove duplicates and return unique values from a range that you have specified. This function does not modify the original range or values.

ULIST Parameters

    • colArray [Required]. The range from which you want to return unique values.

    • indexNum [Optional]. The column index to analyse for unique values. The default value is 1.

    • threshold [Optional]. The minimum percentage similarity at which colArray values are treated as duplicates. For example, a threshold value of 0.6 means that ULIST will eliminate any values with a similarity of 60 percent or above. The default value is 1.
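
To illustrate how the three parameters interact, here is a rough Python model of a call such as =ULIST(A2:A5, 1, 0.6). It is a sketch under stated assumptions, not the ULIST source code. The similarity measure (difflib.SequenceMatcher) is a stand-in, and returning only the values of the analysed column is an assumption. The function name ulist simply mirrors the formula.

```python
from difflib import SequenceMatcher


def ulist(col_array, index_num=1, threshold=1.0):
    """Return unique values from column index_num (1-based) of col_array.

    col_array itself is never modified; a new list is returned.
    """
    unique = []
    for row in col_array:
        value = str(row[index_num - 1])
        # A value is not unique if it is at least `threshold` similar
        # to a value that has already been kept.
        if not any(SequenceMatcher(None, value, kept).ratio() >= threshold
                   for kept in unique):
            unique.append(value)
    return unique


data = [["color"], ["colour"], ["colors"], ["shape"]]
print(ulist(data, 1, 0.6))  # ['color', 'shape']
print(ulist(data))          # ['color', 'colour', 'colors', 'shape']
print(len(data))            # 4: the original range is untouched
```

Lowering the threshold makes the function stricter about what counts as unique. At 0.6 the three spellings collapse into one value, while the default of 1 removes exact repeats only.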


Using Long Run Mode

  1. Primary range: Select a range of one or more columns and click "Get selected range".

  2. Index One: Enter the index of the column of values in "Primary range" that you want to analyse. If no user input is made, then the leftmost column of "Primary range" will be analysed.

  3. Threshold: Enter the minimum percentage similarity. If no user input is made, then values that are exact matches will be marked as duplicates and removed.

  4. Click an empty cell in a column where you want your results to be displayed.

  5. Click "Get unique values".

Key Point

  • If ULIST Long Run Mode (LRM) times out, the results that have been processed up to that point will be displayed.
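
The time-out behaviour can be pictured with a small Python sketch that stops when a time budget runs out and hands back whatever has been processed so far. This only illustrates the general pattern and is not how Flookup is implemented. The name unique_with_time_budget, the budget_seconds and clock parameters, and the returned resume position are all hypothetical.

```python
import time
from difflib import SequenceMatcher


def unique_with_time_budget(values, threshold=1.0, budget_seconds=1.0,
                            clock=time.monotonic):
    """Collect unique values until the time budget is used up.

    Returns (unique_values, resume_position). resume_position is None when
    every value was processed, otherwise the 0-based position of the first
    value that was not processed.
    """
    start = clock()
    unique = []
    for position, value in enumerate(values):
        if clock() - start > budget_seconds:
            return unique, position  # partial results plus where to resume
        if not any(SequenceMatcher(None, value, kept).ratio() >= threshold
                   for kept in unique):
            unique.append(value)
    return unique, None


# Simulate a slow run with a fake clock that advances one second per call.
ticks = iter(range(100))
result, resume_at = unique_with_time_budget(
    ["a", "b", "c", "d"], budget_seconds=2.5, clock=lambda: next(ticks)
)
print(result, resume_at)  # ['a', 'b'] 2
```

Returning the partial list corresponds to the results that are displayed after a time-out, and the resume position plays the role of the row you would start from in a later run.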