HOW TO PREPARE WIKIBASE DATA IN GOOGLE SHEETS

Messy data can hold you back from contributing meaningfully to Wikibase and Wikidata. Whether it is typos, duplicates or inconsistent formats, getting your dataset ready for these projects can feel like a chore. Flookup is a free Google Sheets add-on that streamlines data preparation with advanced fuzzy matching and AI-powered cleaning features.

This guide will walk you through cleaning a sample list of museums, formatting it to align with Wikidata's requirements, and preparing it for upload. You can confidently apply the same process to your own datasets.


Installing the Right Tools

Before you begin, ensure you have Flookup installed. If not, follow the installation guide.


Examine Your Messy Data

Let's work with a sample dataset: a list of museums destined for Wikibase. Here's what it might look like:

[Image: sample museum data table]

Highlight Duplicates with Fuzzy Matching

  1. Highlight the Museum Name column (A2:A7).
  2. Go to Extensions > Flookup Data Wrangler > Highlight duplicates.
  3. Set the similarity threshold to 0.8 for close matches.
  4. Click Highlight. Flookup marks the entries it considers duplicates.

Clean Up the Duplicates: Review Flookup's output, then delete duplicate rows or use the Merge Data feature to combine information.
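
To build intuition for what a 0.8 similarity threshold means, here is a minimal Python sketch. It is not Flookup's algorithm, which this guide does not describe. It uses the standard library's difflib ratio as a stand-in, and the museum names are hypothetical sample values rather than the rows from the table above.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and surrounding spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def find_fuzzy_duplicates(names, threshold=0.8):
    """Return (i, j, score) for every pair of names scoring at or above threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(names), 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((i, j, round(score, 2)))
    return pairs


# Hypothetical sample values (assumption: not the tutorial's actual data).
museum_names = [
    "British Museum",
    "The British Museum",
    "Brtish Museum",
    "Louvre Museum",
    "Rijksmuseum",
]

for i, j, score in find_fuzzy_duplicates(museum_names):
    print(f"{museum_names[i]!r} ~ {museum_names[j]!r} ({score})")
```

Only the three British Museum variants are paired with each other here; the Louvre and Rijksmuseum entries score well below 0.8 against everything else.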


Remove Duplicates with Fuzzy Matching

  1. Highlight the Museum Name column again.
  2. Go to Extensions > Flookup Data Wrangler > Remove duplicates.
  3. Set the similarity threshold to 0.8.
  4. Click Remove duplicates to clean your data.

Pro Tip: If fuzzy matching misses a duplicate, lower the threshold to 0.7, then check the results for false positives, since a lower threshold also matches names that are merely similar.
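
The trade-off between the two thresholds can be sketched in Python. This is an illustration under assumptions, not a reproduction of Flookup's Remove duplicates feature: the similarity measure is difflib's ratio, the "keep the first occurrence" rule is a choice made for this example, and the rows are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and surrounding spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def remove_fuzzy_duplicates(rows, key, threshold=0.8):
    """Keep a row only if its key is not a near-match of an already kept row."""
    kept = []
    for row in rows:
        if not any(similarity(row[key], k[key]) >= threshold for k in kept):
            kept.append(row)
    return kept


# Hypothetical rows: one real typo duplicate and one distinct, similar name.
rows = [
    {"Museum Name": "Science Museum"},
    {"Museum Name": "Science Musuem"},  # typo of the first row
    {"Museum Name": "Space Museum"},    # a different museum
]

strict = remove_fuzzy_duplicates(rows, "Museum Name", threshold=0.8)
loose = remove_fuzzy_duplicates(rows, "Museum Name", threshold=0.7)
print(len(strict), "rows kept at 0.8")
print(len(loose), "rows kept at 0.7")
```

At 0.8 only the typo row is dropped. At 0.7 the distinct "Space Museum" row is dropped as well, which is the kind of false positive to look for after lowering the threshold.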


Standardise Your Data with AI

  1. Highlight the City column.
  2. Go to Extensions > Flookup Data Wrangler > Intelligent data cleaning.
  3. Set the mode to STANDARDIZE DATA.
  4. In the prompt box, type: Standardise city names to lowercase, remove commas and country names.
  5. Click Submit Prompt and review the results.

You can use this AI feature for other operations, such as transforming dates. For example, the prompt “Convert years to YYYY-MM-DD format, assume January 1st” reformats a year like “1753” as “1753-01-01”.

Pro Tip: Keep prompts short and clear. If the AI misses the mark, tweak your wording and try again.
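
When reviewing the AI's output, it can help to know what a deterministic version of the two prompts above would produce. The following Python sketch is one possible reading of them, not what Flookup runs internally. It assumes that a country name, when present, follows the first comma, and both function names are made up for this example.

```python
import re


def standardise_city(value: str) -> str:
    """Lowercase a city value and drop everything after the first comma."""
    city = value.split(",", 1)[0]
    # Collapse repeated whitespace left over from messy input.
    return re.sub(r"\s+", " ", city).strip().lower()


def year_to_iso_date(value: str) -> str:
    """Turn a bare four-digit year into YYYY-MM-DD, assuming January 1st."""
    text = value.strip()
    if re.fullmatch(r"\d{4}", text):
        return f"{text}-01-01"
    return text  # anything else is left untouched for manual review


print(standardise_city("London, UK"))   # london
print(standardise_city(" New  York "))  # new york
print(year_to_iso_date("1753"))         # 1753-01-01
```

Values that do not fit the assumed pattern, such as a country name written without a comma, pass through unchanged and still need a manual check.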


Double-Check and Export

Review the cleaned sheet once more, then export it: File > Download > Comma-separated values (.csv). Your data is now ready for tools like QuickStatements or manual Wikibase entry.
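
If you plan to use QuickStatements, the exported rows need to be turned into its command format. The Python sketch below shows the general shape, assuming the tab-separated version 1 syntax. The column names "Museum Name" and "Founded" are hypothetical, and P571 is Wikidata's "inception" property; a separate Wikibase instance will use its own property IDs, so treat the default as a placeholder.

```python
def to_quickstatements(rows, date_property="P571", language="en"):
    """Build tab-separated QuickStatements V1 commands that create one item per row."""
    lines = []
    for row in rows:
        lines.append("CREATE")
        # L<language> sets the label of the item that was just created.
        lines.append(f'LAST\tL{language}\t"{row["Museum Name"]}"')
        founded = row.get("Founded", "")
        if founded:
            # /11 marks day precision for a full YYYY-MM-DD date.
            lines.append(f"LAST\t{date_property}\t+{founded}T00:00:00Z/11")
    return "\n".join(lines)


# Hypothetical row for illustration.
sample = [{"Museum Name": "British Museum", "Founded": "1753-01-01"}]
print(to_quickstatements(sample))
```

Names that contain double quotes would need escaping before use. Dates that were padded from a bare year are only known to the year, so a year-level precision (/9) may describe them more honestly than /11.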

Quick Tip: Test a small batch in Wikibase first to catch any formatting quirks.
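
A final check and a small test batch can also be scripted against the exported CSV. The sketch below is a minimal example under assumptions: the column names "Museum Name", "City" and "Founded" are hypothetical, the two rows are illustrative, and the checks cover only missing names, exact duplicate names and dates that are not in YYYY-MM-DD form.

```python
import csv
import io
import re

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_rows(rows):
    """Return (row_number, problem) tuples; an empty list means nothing was flagged."""
    problems = []
    seen = set()
    for number, row in enumerate(rows, start=2):  # row 1 is the header
        name = (row.get("Museum Name") or "").strip()
        if not name:
            problems.append((number, "missing museum name"))
        elif name.lower() in seen:
            problems.append((number, "exact duplicate name"))
        else:
            seen.add(name.lower())
        founded = (row.get("Founded") or "").strip()
        if founded and not DATE_RE.fullmatch(founded):
            problems.append((number, f"unexpected date format: {founded!r}"))
    return problems


def first_batch(rows, size=5):
    """Return the first few rows to use as a trial upload."""
    return rows[:size]


# Stand-in for the downloaded file; replace with open("your_file.csv", newline="").
exported = io.StringIO(
    "Museum Name,City,Founded\n"
    "British Museum,london,1753-01-01\n"
    "Louvre Museum,paris,1793\n"
)
rows = list(csv.DictReader(exported))
print(check_rows(rows))
print(first_batch(rows, size=1))
```

Here the second data row is flagged because its year was never converted to a full date.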


Why Flookup Works for Wikibase

Flookup runs inside Google Sheets, so you can clean your data where it already lives, with minimal training. By following this tutorial, you have deduplicated and standardised your dataset and exported it as a CSV file ready for integration into Wikibase.