HOW TO CLEAN DATA IN GOOGLE SHEETS USING AI
Data Cleaning with AI
One of Flookup's many data cleaning functions is the ability to clean data using advanced GPT-based models like o3-mini. This feature is pre-configured, so no setup is needed, and it is available at no additional cost. Here are the steps to follow:
Head to Extensions > Flookup Data Wrangler > Intelligent data cleaning in your spreadsheet menu.
Select the mode you want to run.
Match and merge: Compare data from two different columns and return best matches.
Remove duplicates: Remove duplicates and return only unique values.
Standardize data: Standardize data by adjusting case, trimming spaces, correcting misspellings and ensuring numeric consistency.
Transform data formats: Transform data by modifying date formats, converting measurement units and applying user-specified change formats.
Fill in missing data: Fill missing data with placeholders or computed values.
Remove common outliers: Remove outliers and return the cleaned dataset.
Highlight the range of data you would like to analyse and click "Grab selected range".
Enter your prompt, making sure to reference the data in your selection by column number, not column letter. Column numbers are counted from the first column of the selected range, not from column A. For example, if you select range B2:H5000, then column B is "column 1", column C is "column 2" and so on. Referencing columns this way is the recommended approach.
Click an empty cell to indicate the column where you would like your results to be displayed.
Click the "Submit Prompt" button.
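The column numbering described in the steps above can be checked with a short Python sketch before you write a prompt. This is an illustration only: the function names are hypothetical and are not part of Flookup, and the sketch assumes a simple A1-style range such as B2:H5000.

```python
import re


def column_letter_to_index(letters: str) -> int:
    """Convert a column letter such as 'B' or 'AA' to its 1-based sheet position."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def relative_column_number(selected_range: str, column_letter: str) -> int:
    """Return the number to use in a prompt for column_letter within selected_range.

    Hypothetical helper: numbering starts at 1 for the first column of the
    selected range, as described in the steps above.
    """
    match = re.fullmatch(r"([A-Za-z]+)\d*:([A-Za-z]+)\d*", selected_range)
    if match is None:
        raise ValueError(f"Not an A1-style range: {selected_range!r}")
    first = column_letter_to_index(match.group(1))
    last = column_letter_to_index(match.group(2))
    target = column_letter_to_index(column_letter)
    if not first <= target <= last:
        raise ValueError(f"Column {column_letter} is outside {selected_range}")
    return target - first + 1


if __name__ == "__main__":
    for letter in ("B", "C", "H"):
        number = relative_column_number("B2:H5000", letter)
        print(f"Column {letter} is column {number} in the prompt")
```

For the range B2:H5000, column H is the seventh and last column of the selection, so a prompt would refer to it as "column 7".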
How to Write Good Prompts
Be clear and direct: Describe exactly what you want to clean or modify e.g. "Remove duplicate rows" or "Standardize dates to DD-MM-YYYY".
Do not be vague: Use precise language instead of terms like "fix" or "clean up". Say what needs to be fixed and how e.g. "Remove duplicates in the first and second columns".
Keep it simple: Stick to one task per prompt when possible. If needed, separate complex tasks into multiple steps.
Use data cleaning keywords: Short, structured instructions work better than long, detailed explanations e.g. "Remove punctuation marks from all cells".
Specify edge cases: If certain values should be ignored or handled differently, mention that e.g. "Do not change column headers".
Test and refine: If results are not perfect, adjust the wording slightly and take note of how the quality of the output changes.
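As a rough aid, the guidelines above can be turned into a small pre-submission check. The Python sketch below is hypothetical and is not a Flookup feature: the list of vague terms and the way it detects multi-task prompts (a semicolon or the word "then") are assumptions chosen for illustration.

```python
import re

# Assumed list of vague terms; extend it to suit your own prompts.
VAGUE_TERMS = ("fix", "clean up", "tidy", "sort out")


def review_prompt(prompt: str) -> list[str]:
    """Return warnings for a prompt, based on the guidelines above."""
    warnings = []
    lowered = prompt.lower()

    # "Do not be vague": flag terms that do not say what to change or how.
    for term in VAGUE_TERMS:
        if re.search(rf"\b{re.escape(term)}\b", lowered):
            warnings.append(f'Vague term "{term}": say what to change and how.')

    # "Keep it simple": a semicolon or "then" suggests more than one task.
    parts = [p for p in re.split(r";|\bthen\b", lowered) if p.strip()]
    if len(parts) > 1:
        warnings.append("Several tasks detected: consider one prompt per task.")

    return warnings


if __name__ == "__main__":
    for text in (
        "Fix column 1",
        "Standardize dates in column 2 to DD-MM-YYYY",
        "Trim spaces in column 1; then remove duplicates in column 1",
    ):
        print(text, "->", review_prompt(text) or "no warnings")
```

A check like this only catches wording problems. The quality of the cleaned output still needs to be reviewed in the sheet itself.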
AI Usage Policy
The way you use AI for data cleaning depends on your specific use case and data set. Experimenting with different approaches to find the most effective solution is recommended. Please note that the number of rows you can clean with AI is determined by the available load, but it should be at least 5,000 rows per month.
Explore More
The following are quick links to variants of some of the functions shown here that use traditional data cleaning algorithms: