Data Cleaning in the Era of AI

Data cleaning is a crucial step in any data analysis process. It involves removing duplicates, correcting errors, and standardizing data formats. However, this process can be time-consuming and tedious. But what if I told you that there is a way to automate much of it using artificial intelligence? Enter GPTs, a powerful family of AI models developed by OpenAI and the technology behind ChatGPT, which can be used for data cleaning in Google Sheets.

What is a GPT Model?

GPT, which stands for Generative Pretrained Transformer, is a type of Large Language Model (LLM) and a prominent framework for generative artificial intelligence. It is capable of understanding and generating human-like responses based on the input it receives.

LLMs are part of the field of Natural Language Processing, which draws on the study of semantics. The roots of that study can be traced back to the pioneering work of the French philologist Michel Bréal, who introduced the term "semantics" in 1883. The evolution of LLMs took a significant leap forward in 2017, when Google researchers unveiled the transformer architecture at the NeurIPS conference. It has since become a fundamental component of modern LLMs.

OpenAI introduced the first GPT model in 2018. The most popular GPT-based product is ChatGPT, which OpenAI publicly released in November 2022.

At its core, a GPT leverages machine learning techniques to process and generate text based on the input it receives. It is trained on a diverse range of internet text, enabling it to respond to a wide array of prompts in a manner that is coherent and contextually relevant. This makes it an excellent tool for automating various tasks, including data cleaning.

How Can a GPT Model Help with Data Cleaning?

Merits of using a GPT Model for Data Cleaning

Demerits of using a GPT Model for Data Cleaning

Making API Calls from Google Sheets to a GPT Model

Google Sheets is a versatile tool. It is useful not only for data entry and analysis but also, through Google Apps Script, for interacting with APIs, including those of GPT models. Here is a guide on how to accomplish this:

[Screenshot: a Google Apps Script function that sends POST requests to a GPT model]

In this code, replace YOUR_OPENAI_API_KEY in "Bearer YOUR_OPENAI_API_KEY" with your actual OpenAI API key, keeping the "Bearer " prefix in place. This function makes a POST request to the OpenAI API, sending a JSON payload that contains the prompt and the maximum number of tokens to generate.

You can call this function in any cell in your Google Sheet by typing =callGPTModel().
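
Because the screenshot's contents are not reproduced in this text, here is a minimal sketch of what such a function might look like. Several details are assumptions rather than the original code: the chat completions endpoint, the model name gpt-4o-mini, the 100-token limit, the optional prompt parameter, and the default prompt. UrlFetchApp is the standard Apps Script service for HTTP requests.

```javascript
/**
 * Sends a prompt to the OpenAI API and returns the generated text.
 * Assumptions (not taken from the original screenshot): the endpoint,
 * the model name, the token limit, and the default prompt.
 */
function callGPTModel(prompt = 'Say hello.') {
  const url = 'https://api.openai.com/v1/chat/completions';

  // JSON payload: the prompt plus the maximum number of tokens to generate.
  const payload = {
    model: 'gpt-4o-mini', // assumed model name; use one your account can access
    messages: [{ role: 'user', content: String(prompt) }],
    max_tokens: 100
  };

  const options = {
    method: 'post',
    contentType: 'application/json',
    headers: { Authorization: 'Bearer YOUR_OPENAI_API_KEY' },
    payload: JSON.stringify(payload),
    muteHttpExceptions: true // lets us report the API's own error message
  };

  const response = UrlFetchApp.fetch(url, options);
  const body = JSON.parse(response.getContentText());

  if (response.getResponseCode() !== 200) {
    throw new Error(body.error ? body.error.message : 'Request failed');
  }
  return body.choices[0].message.content.trim();
}
```

With this sketch, =callGPTModel() still works and sends the default prompt. For cleaning, it is more useful to pass a cell, for example =callGPTModel("Rewrite this date as YYYY-MM-DD: " & A2).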

Potential Applications

Why Traditional Data Cleaning Algorithms Still Matter

While AI models like GPT-4 offer innovative ways to automate data cleaning, traditional algorithms implemented in Google Apps Script have advantages of their own.
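
To illustrate what a traditional, rule-based approach can look like, here is a minimal sketch of a custom function that trims and collapses whitespace in text cells and then removes exact duplicate rows. The name cleanRows is hypothetical, and the sketch assumes it is called with a multi-cell range, which Sheets passes to a custom function as a 2D array. It calls no external API and returns the same output for the same input every time.

```javascript
/**
 * Normalizes whitespace in every text cell, then drops exact duplicate rows.
 * `rows` is assumed to be a 2D array (a multi-cell range from the sheet).
 */
function cleanRows(rows) {
  const seen = new Set();
  const cleaned = [];

  for (const row of rows) {
    // Trim text cells and collapse runs of whitespace; leave other types alone.
    const normalized = row.map(cell =>
      typeof cell === 'string' ? cell.trim().replace(/\s+/g, ' ') : cell
    );

    // A JSON string of the row serves as a simple key for duplicate detection.
    const key = JSON.stringify(normalized);
    if (!seen.has(key)) {
      seen.add(key);
      cleaned.push(normalized);
    }
  }
  return cleaned;
}
```

In a sheet, this could be used as =cleanRows(A2:C100), with the cleaned rows spilling into the cells below and to the right of the formula.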


Data cleaning does not have to be a tedious process. By combining the power of GPT and Google Sheets, you can automate much of it, saving time and helping to keep your data clean and ready for analysis.

Please remember that the specifics of how you can use a GPT model for data cleaning will depend on your particular use case and dataset. It is always a good idea to experiment with different approaches and see what works best for your needs.