COMBINE FUZZY MATCHING AND EMBEDDINGS IN GOOGLE SHEETS AND EXCEL

Tags: fuzzy-matching embeddings google-sheets
On this page

WHY HYBRID MATCHING

Classical fuzzy matching remains efficient for typographical errors and close variants; embeddings capture semantic similarity across phrasing, abbreviations and synonyms. Combining both reduces calls to embedding services while preserving precision. The hybrid approach is particularly suitable for spreadsheet audiences who require low‑friction, low‑cost integration.

HIGH-LEVEL PATTERN

  1. Normalise text in the sheet (for example with NORMALIZE).
  2. Use Flookup functions such as FLOOKUP or ULIST to produce a short candidate list per row.
  3. Compute embeddings for candidates only and score with a semantic service.
  4. Combine semantic score with a lexical check (for example FUZZYSIM) and apply thresholds to auto‑accept, reject or flag for review.

GOOGLE SHEETS (APPS SCRIPT)

The snippet below is a minimal, production‑aware Apps Script that batches candidate requests, calls a hypothetical embedding match endpoint and writes scores back to the sheet. Adapt the endpoint and authentication to the chosen provider.

function batchSemanticMatch() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getSheetByName('Matches');
  var data = sheet.getRange(2,1,sheet.getLastRow()-1,3).getValues(); // [id, query, candidates_json]

  var batchSize = 50;
  for (var i=0; i<data.length; i+=batchSize) {
    var batch = data.slice(i, i+batchSize);
    var payload = batch.map(function(r){
      return {id: r[0], query: r[1], candidates: JSON.parse(r[2])};
    });

    var resp = UrlFetchApp.fetch('https://your-embedding-service.example/api/match', {
      method: 'post',
      contentType: 'application/json',
      payload: JSON.stringify({items: payload, top_k:5}),
      muteHttpExceptions: true
    });

    if (resp.getResponseCode() !== 200) continue;
    var results = JSON.parse(resp.getContentText());
    results.forEach(function(r, idx){
      var row = i + idx + 2;
      sheet.getRange(row,4).setValue(JSON.stringify(r.matches)); // write matches JSON
      sheet.getRange(row,5).setValue(r.top_score);
    });
  }
}

Notes:

EXCEL (OFFICE SCRIPT)

Office Script can perform an equivalent flow in Excel Online. The following is a concise example using the Fetch API available in Office Scripts runtimes.

async function main(workbook: ExcelScript.Workbook) {
  const sheet = workbook.getWorksheet('Matches');
  const range = sheet.getRange('A2:C101'); // adjust range
  const values = range.getValues();

  const items = values.map(r => ({id: r[0], query: r[1], candidates: JSON.parse(r[2])}));

  const resp = await fetch('https://your-embedding-service.example/api/match', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({items: items, top_k: 5})
  });

  if (!resp.ok) return;
  const results = await resp.json();
  results.forEach((r, i) => {
    sheet.getRange(`D${i+2}`).setValue(JSON.stringify(r.matches));
    sheet.getRange(`E${i+2}`).setValue(r.top_score);
  });
}

Performance and Cost Examples

Worked example: 10,000 rows, naive embedding per row vs hybrid candidate approach.

Approach Embedding calls Estimate cost (@ $0.0005/call)
Naive (one embedding per row) 10,000 $5.00
Hybrid (FLOOKUP -> avg 20 candidates) 10,000 * 20 = 200,000 $100.00
Optimised hybrid (candidate blocking + caching -> avg 5 candidates) 50,000 $25.00

Interpretation: hybrid approaches can increase embedding calls per row if candidate lists are large; candidate reduction and caching are essential to keep costs manageable. The cost figures above are illustrative; adapt pricing to the chosen provider and model size.

ANN and Production Notes

UX and Review Workflows

Recommended columns in the review sheet:

FURTHER READING