This page introduces a new approach to data cleaning in Google Sheets, one that uses Artificial Intelligence (AI) to understand, interpret and refine messy data with minimal input.
Rather than relying solely on traditional algorithms, it responds to natural language prompts and carries out corrections, formatting and more, based on your instructions.
Whether you are fixing typos, harmonising categories or preparing sheets for analysis, this tool adapts to your intent and streamlines the process.
Head to Extensions > Flookup Data Wrangler > Smart data cleaning in your spreadsheet menu.
Select the data cleaning mode you want to run.
Match and merge: Compare data from two different columns and return best matches.
Remove duplicates: Remove duplicates and return only unique values.
Standardize data: Standardize data by adjusting case, trimming spaces, correcting misspellings and ensuring numeric consistency.
Transform data formats: Transform data by modifying date formats, converting measurement units and applying user-specified format changes.
Fill in missing data: Fill missing data with placeholders or computed values.
Remove common outliers: Remove outliers and return the cleaned dataset.
Enter your own OpenAI™ API key. Click the button labelled "STORE" to save the key for subsequent uses, or the button labelled "ERASE" to delete it. Your key is stored securely inside your Google account and cannot be accessed by anyone who does not have your login credentials.
Highlight the range of data you would like to analyse and click "Grab selected range".
Enter your prompt. We recommend referencing the data in your selection by column number, not column letter, counting from the first column of the selected range. For example, if you selected range B2:H5000, then column B would be "column 1", column C would be "column 2" and so on.
Click an empty cell to indicate the column where you would like your results to be displayed.
Click the "Submit data cleaning prompt" button.
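The column numbering used in prompts can be worked out mechanically. The following is a minimal Python sketch, not part of Flookup itself, that converts a column letter into the number to quote in a prompt for a given selected range. The function names are hypothetical and chosen for illustration only.

```python
import re


def column_letter_to_index(letters: str) -> int:
    """Convert a column letter such as 'B' or 'AA' to its 1-based sheet index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def relative_column_number(selected_range: str, column_letter: str) -> int:
    """Return the column number to quote in a prompt.

    The first column of the selected range counts as "column 1",
    regardless of where the range sits in the sheet.
    """
    match = re.fullmatch(r"([A-Za-z]+)\d*:([A-Za-z]+)\d*", selected_range)
    if match is None:
        raise ValueError(f"Not an A1-style range: {selected_range!r}")
    first = column_letter_to_index(match.group(1))
    last = column_letter_to_index(match.group(2))
    target = column_letter_to_index(column_letter)
    if not first <= target <= last:
        raise ValueError(f"Column {column_letter} is outside {selected_range}")
    return target - first + 1


# With the selection B2:H5000, column C is "column 2" in the prompt.
print(relative_column_number("B2:H5000", "C"))
```

With this in hand, a prompt about column C of the selection B2:H5000 would read something like "Remove duplicates in column 2".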
Be clear and direct: Describe exactly what you want to clean or modify e.g. "Remove duplicate rows" or "Standardize dates to DD-MM-YYYY".
Do not be vague: Use precise language instead of vague terms like "fix" or "clean up". Say what needs to be fixed and how e.g. "Remove duplicates in the first column and second column".
Keep it simple: Stick to one task per prompt when possible. If needed, separate complex tasks into multiple steps.
Use data cleaning keywords: Short, structured instructions work better than long, detailed explanations e.g. "Remove punctuation marks from all cells".
Specify edge cases: If certain values should be ignored or handled differently, mention that e.g. "Do not change column headers".
Test and refine: If results are not perfect, tweak the wording slightly to improve accuracy while taking note of the quality of output you are getting.
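When testing and refining a prompt, it can help to compare the AI output for a small sample against a simple, deterministic baseline. The sketch below is one possible baseline for the example prompt "Remove duplicates in the first column and second column" combined with "Do not change column headers". It is an illustration written in plain Python, not how the tool itself works, and the function name and parameters are hypothetical.

```python
def remove_duplicates(rows, key_columns=(1, 2), has_header=True):
    """Drop rows whose values in key_columns repeat those of an earlier row.

    key_columns uses 1-based column numbers, matching the prompt convention.
    The header row, if present, is passed through unchanged.
    """
    header = rows[:1] if has_header else []
    body = rows[1:] if has_header else rows
    seen = set()
    unique = []
    for row in body:
        key = tuple(row[c - 1] for c in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return header + unique


sample = [
    ["Name", "City", "Amount"],
    ["Ann", "Nairobi", 10],
    ["Ben", "Lagos", 20],
    ["Ann", "Nairobi", 30],  # same first and second column as row 2
]

for row in remove_duplicates(sample):
    print(row)
```

If the cleaned sample returned by the tool differs from the baseline, that is a cue to tighten the wording of the prompt, for instance by naming the columns or stating that headers must be left alone.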
The way you use AI for data cleaning depends on your specific use case and data set, so we recommend experimenting with different approaches to find the most effective solution. Please note that the number of rows you can clean with AI is determined by the available load, but it should be at least 50,000 rows per month.
The following are quick links to variants of some of the functions shown here that use traditional data cleaning algorithms: