Teach Google Sheets to Clean Product Codes Without Writing Formulas
The Problem: Product Code Chaos
Product codes are supposed to be a universal identifier. In theory, each product gets a unique code and that code remains consistent across your inventory, procurement and e-commerce systems. In practice, product codes become a nightmare.
One warehouse enters codes as PROD-2024-001 v2. Another uses prod_2024_001_v2. A third drops the version number entirely: PROD2024001. A fourth adds extra spaces: PROD - 2024 - 001. When these codes flow into your inventory system, financial reports or customer-facing catalogues, the mismatches create havoc.
Inventory queries fail to match the same product across systems. Reports show inflated duplicate counts. APIs reject malformed codes. Customer orders reference product codes that do not exist in your system. The damage is compounded because the errors are not obvious. Your data looks present and complete, but it is lying to you.
Most organisations solve this problem in one of three ways and all three are expensive.
Why Traditional Formulas Fail
Approach 1: Manual Cleanup involves assigning someone to review each malformed code and correct it by hand. This works for 100 codes. It scales terribly for 10,000 codes. The person doing the cleaning gets bored, makes mistakes and the problem resurfaces every time new data arrives.
Approach 2: Complex Formulas attempts to encode the transformation logic using nested IF statements or regular expressions. If your transformation is straightforward (remove dashes, convert to uppercase), this approach works. But product codes rarely have simple transformations. They combine multiple operations: remove dashes, remove version numbers, trim spaces, convert to uppercase and remove duplicate letters. Writing a formula that handles all these cases is not just complex, it becomes unmaintainable.
Approach 3: Custom Scripts requires hiring a developer to write Apps Script code or importing your data into a programming environment like Python. This works, but it is slow, expensive and breaks the workflow for analysts who just need to clean data in their spreadsheet.
All three approaches share a common weakness: they require you to anticipate every transformation rule upfront. The moment a new variation of the product code format appears, your formula breaks or your manual process stalls.
Introducing Learn from Examples
The Learn from Examples feature in Flookup Data Wrangler uses pattern synthesis to solve this problem differently. Instead of anticipating transformation rules, you simply show the system what you want.
You provide a few examples of the dirty data and what the clean version should look like. The pattern synthesis engine analyses these examples, identifies the underlying transformation logic and applies that logic to your entire dataset. No formulas. No scripts. Just teach by example.
This approach has a critical advantage: it handles complexity naturally. When you show the system examples of a complex transformation, it learns all the component rules at once. Remove dashes, trim spaces, remove version numbers, convert to uppercase? The system figures it out from your examples.
Case Study Walkthrough
Step 1: Identify the Problem
The e-commerce company in our case study had over 12,000 product codes in their inventory sheet. A data audit revealed that approximately 4,800 codes (40 percent) were in non-standard formats. These codes would not match in downstream systems and were preventing accurate inventory reporting.
The team opened Google Sheets, selected the Standardize text entries function from the Flookup Data Wrangler sidebar and chose the Learn from examples operation. No setup wizard. No configuration dialogs. Just a straightforward choice to teach by example.
Step 2: Gather Example Pairs
The team created two columns in a dedicated sheet. The first column contained the dirty product codes exactly as they appeared in the live data. The second column contained the standardised version. A few examples were sufficient:
| Dirty (Original Format) | Clean (Standardised) |
|---|---|
PROD-2024-001 v2 |
PROD2024001 |
prod_2024_001_v2 |
PROD2024001 |
PROD - 2024 - 001 |
PROD2024001 |
Prod 2024 001 v3 |
PROD2024001 |
2024-001-PROD |
PROD2024001 |
The examples show the system multiple variations of the same underlying rule: extract the product identifier, year and sequence number, then reformat as uppercase without separators or version information.
Step 3: Test the Pattern
The team selected both the dirty and clean example columns and clicked Test. The pattern synthesis engine displayed the detected rule: Remove dashes, underscores and spaces. Remove version suffixes. Convert to uppercase. Reorder components to standard format.
A preview showed how the system would transform a sample of codes from the full dataset. The preview confirmed the detected logic matched the intended transformation.
Step 4: Configure the Schedule
The team selected their full product code column as the input range and specified where the standardised codes should be written. They chose to process in a loop so that whenever new products were added to the inventory, the standardisation would run automatically.
They set the frequency to weekly and clicked Schedule.
Step 5: Monitor Results
The first run processed all 12,000 codes in approximately three minutes (the function ran across multiple Google Apps Script executions using the auto-chaining feature to bypass execution time limits). The team verified that all 4,800 malformed codes were now standardised to the correct format. Subsequent runs each week automatically cleaned any new codes entered by warehouse staff.
How Pattern Synthesis Works
The Learn from Examples feature applies sophisticated analysis to your example pairs and identifies transformation rules. Here is what the system does under the hood:
- Analyse the Differences: The system compares each dirty value to its clean counterpart and identifies what changed. Characters removed? Sequences reordered? Case converted?
- Extract Common Patterns: It looks across all your example pairs to find patterns common to all transformations. If 90 percent of your examples remove dashes, that becomes part of the learned rule.
- Build a Rule Pipeline: The system constructs a pipeline of primitive transformations (remove characters, collapse whitespace, change case, reorder sections) that, when applied in sequence, convert the dirty values to clean values.
- Validate Against Examples: Before applying the rule to your full dataset, the system verifies that the discovered rule correctly transforms all your example pairs.
- Apply to Full Dataset: Once validated, the rule is applied to every row in your input range.
The system can learn combinations of 24 primitive transformations across 6 categories: case operations, character removal, whitespace handling, punctuation treatment, bracket removal and section reordering. This flexibility allows it to handle nearly any consistent transformation you can demonstrate with examples.
The Business Impact
For the e-commerce company, the results were significant:
| Metric | Before Learn from Examples | After Learn from Examples | Improvement |
|---|---|---|---|
| Product code consistency | 60% | 100% | 40 percentage points |
| Inventory query match rate | 78% | 99% | 21 percentage points |
| Manual code review hours per month | 16 | 1 | 15 hours saved monthly |
| Time to process new product codes | 15 mins per batch | Automatic | 100% automated |
| Data entry errors caught | Manual spot-check | Real-time standardisation | Zero malformed codes in reports |
The largest impact was operational. The team eliminated a recurring manual task and gained confidence that product codes flowing through their systems were in a consistent format. Downstream inventory queries, API integrations and customer-facing catalogues all benefited from clean, standardised product identifiers.
Getting Started with Your Own Data
Assess Your Transformation
Before you start, identify your dirty data and what the clean version should look like. Is it just removing special characters? Reordering sections? Converting case? Collapsing whitespace? The clearer you are about the transformation, the better your examples will be.
Create Example Pairs
In a new sheet, create two columns. Column A contains the dirty values, column B contains the clean versions. Aim for 3 to 5 examples that demonstrate the full range of variations in your data. If you have multiple distinct transformation patterns, you may need to create separate schedules for each pattern.
Test and Validate
Use the Test function in the scheduling sidebar to preview how the system will transform your data. If the detected rule matches your intention, proceed. If not, add another example pair that clarifies the rule and test again.
Schedule and Monitor
Once the rule is validated, select your full input range and configure the frequency. For ongoing data entry, choose Process in a loop so the rule runs continuously. For one-time cleanup, choose Process to end. Either way, monitor the first execution to verify results.
For detailed instructions, see the Learn from Examples documentation.
Frequently Asked Questions
How many example pairs do I need?
Minimum is 2, but 3 to 5 examples typically produce the most robust rules. The more diverse your examples (showing different variations of the same underlying pattern), the better the system can generalise to your full dataset.
What if my data has multiple distinct transformation patterns?
If some codes follow one rule and others follow a different rule, you will need to create separate schedules for each pattern. Alternatively, you can pre-filter your data to group similar transformation patterns together and run multiple Learn from Examples operations sequentially.
Can it learn very complex transformations?
Yes, as long as the transformation is consistent across your examples. The system can handle combinations of case conversion, character removal, whitespace handling, punctuation removal and section reordering. If your transformation requires context beyond what appears in the individual cell (for example, looking up a value in another sheet), you will need a different approach.
What if the learned rule does not work correctly on my full dataset?
This usually means your examples did not capture the full range of variation in your data. Add more example pairs that show different edge cases, then test again. The learned rule will update based on the expanded set of examples.
Can I edit the learned rule manually?
No, but you do not need to. The rule is automatically re-derived from your examples every time you modify them. If you want to change the rule, add or modify example pairs to better reflect your desired transformation.