HIGHLIGHTING DUPLICATES: GOOGLE SHEETS VS FLOOKUP
Brief Introduction
One of the most important features Google Sheets has, is the ability to highlight duplicates.
Using the highlight function can help us profile our data easily; whether it is by preventing double work and therefore resources wastage or by identifying text entries that might appear to be duplicates and yet they are not.
Let us take a brief look at how to use default functions provided in Google Sheets and Flookup.
The images we have used are for illustration purposes only.
Highlighting your data is a very good way to get an overview of what your data looks like before performing destructive operations like removing duplicates or using that data for any other purpose. Unfortunately, at the time of writing this article, Google Sheets does not provide a predefined way of highlighting duplicates. However, with custom formulas and conditional formatting, highlighting duplicates in your spreadsheet can be done in a few clicks.
Highlighting Duplicates Using Conditional Formatting
This method of highlighting duplicates assumes that your data is relatively clean i.e. there are very few or no text-based differences.
In your Google Sheets spreadsheet, select the cells where you want to find duplicates. This can be a range with any number of rows or columns.
In your spreadsheet menu, head to Format > Conditional formatting. This will open a sidebar where you can enter your highlighting rules.
In the side bar, you will see two tabs, “Single colour” and “Colour scale” and the former is selected by default.
First, ensure that the range you want to highlight is selected and correctly showing in the Apply to range field. If this is not the case, simply click the Range Picker icon at the right end of this field to select a new range.
Click the dropdown menu located under the "Format rules", scroll down to an item labelled “Custom formula is” and click it. The sidebar will then update to look like this:
In the field with the “Value or formula” placeholder, enter this formula:
=COUNTIF($B$2:$B$10,B2) > 1
This is a COUNTIF formula with arguments specific to the range you selected. It will run over range B2:B10 and note if any item there occurs more than once. You can choose to adjust these arguments to suit any other range you might want to select in subsequent operations. Please note that, as you adjust the ranges in this formula, Google Sheets will automatically select the relevant fields for you.
At this point you can choose the formatting style you would like to use for your results using this section:
… and, after clicking “Done”, your results will look like this:
Highlighting Duplicates Using Flookup
Google’s efforts above are admirable, but sometimes you need to bring out the big guns when faced with problematic datasets [as real-world data often tends to be].
It is a well-established fact that nobody wants to make Bruce Banner angry and, to be completely honest, we really don’t want any angry emails to come flying back at us from anyone else on our list above… and one such person is old Barney.
As you can see from the screenshot below, Barneys name is listed twice with a subtle spelling difference. Let us see how to go about highlighting cases that look like this.
Head to Extensions > Flookup Data Wrangler > Highlight Duplicates > By Percentage in your spreadsheet menu. This will open a sidebar that looks like this:
Select the client names in the manner shown below:
After that, follow these steps to highlight the selected data:
Select Highlight all duplicates or Skip first occurrence, depending on how you want to highlight every duplicate occurrence or skip the first time it occurs and highlight the rest.
Leave the next option set to Compare data in a single column.
Click “Map columns in selection” in order to get the current number columns in your selection.
Enter the “Left_column” value. This is the range with duplicates that you want to highlight. If no user input is made, the first column of the selected range will be processed. Every cell in the selection, that lies in the same row with the identified duplicate, will be highlighted.
Enter the “Threshold" value. This is the percentage similarity, below which, text entries will NOT be considered duplicated. If no user input is made, then only exact matches will be treated as duplicates. [I will use 0.8 in this example]
Click “Highlight duplicates”.
The result will be duplicates that have been identified despite differences in spelling. Please note that you can also choose to highlight duplicates by sound similarity by clicking Extensions > Flookup Data Wrangler > Highlight Duplicates > By Sound and following steps similar to the ones above.