AUTOMATE DEDUPLICATION IN GOOGLE SHEETS

Tags: data cleaning google sheets google sheets macros google sheets remove duplicates flookup

WHY AUTOMATE DEDUPLICATION?

Duplicates silently reduce trust in your data and cost teams time. Manual cleanup is error-prone and doesn't scale — automation makes deduplication repeatable, auditable and schedulable.

SAMPLE DATASET AND GOALS

We'll use a contacts sheet with name, email and phone columns. The goal: find likely duplicate rows, mark them with a confidence score, and optionally merge or flag them for review.

APPS SCRIPT WORKFLOW (COPY-PASTE)

Drop this script into Extensions → Apps Script in your sheet. The example below reads rows from a Contacts sheet, computes match signals (name normalisation, email similarity and optional phonetic keys), and writes back a match confidence score and suggested merge action in adjacent columns.

Key ideas:


/**
* Dedupe runner: read rows, compute signals, call Flookup API for scored matches.
* Adjust column indices to match your sheet. This writes results to columns F..H.
*/
function runDedupe() {
const ss = SpreadsheetApp.getActive();
const sh = ss.getSheetByName('Contacts');
const lastRow = sh.getLastRow();
if (lastRow < 2) return; // nothing to process
const rows = sh.getRange(2,1,Math.max(0,lastRow-1),4).getValues();
const out = [];
for (let i=0;i 0) {
sh.getRange(2,6,out.length,3).setValues(out);
}
}

INTEGRATING WITH FLOOKUP

Flookup provides higher-quality similarity scores and contextual matching than simple key comparisons. Use the add-on for quick, UI-driven workflows or the Flookup API for scheduled, programmatic deduplication.

When calling the API from Apps Script: keep requests small (batch 10-50 records), cache responses where possible, and store the raw match payload in a review sheet so humans can inspect suggested merges before applying them.


// Example: minimal UrlFetchApp call (already shown in runDedupe above)
// Remember to protect your API key and store it in PropertiesService rather than inlined.

SCHEDULING, NOTIFICATIONS AND APPROVALS

Add a time-based trigger to run the dedupe regularly. Best practice: run a conservative scoring pass daily that writes suggestions to a Review sheet, then run a weekly manual approval that applies merges. You can also email a short summary when the number of high-confidence matches exceeds a threshold.

function scheduleDedupe(){
ScriptApp.newTrigger('runDedupe').timeBased().everyDays(1).atHour(2).create();
}

For notifications, generate a CSV summary and email it via MailApp.sendEmail() or integrate with Slack using an incoming webhook.

TESTING, ROLLBACK AND METRICS

Always run the script on a copy first. Store suggested edits in a separate sheet with a proposed_action column. Keep a change log (timestamp, user, rows affected) and provide a one-click rollback that replays the log to revert changes.

Track metrics: matches found, merges applied, false positive rate (from manual review). These numbers inform threshold tuning.

FINAL THOUGHTS

Automating deduplication saves time and reduces risk — but start slow. Use suggested matches in a review workflow, monitor false positives, and tune thresholds. When ready, automate the merge step for high-confidence matches only.

If you'd like, I can add a downloadable Apps Script file and a sample Google Sheet to the post so readers can copy/paste and test immediately.

YOU MIGHT ALSO LIKE