DATA CLEANING IN GOOGLE SHEETS USING GPT MODELS

Data Cleaning in the Era of AI

Data cleaning is a crucial step in any data analysis process. It involves removing duplicates, correcting errors, and standardizing data formats. However, this process can be time-consuming and tedious. But what if I told you that there is a way to automate this process using artificial intelligence? Enter GPT models, a powerful family of AI models developed by OpenAI and the technology behind ChatGPT, which can be used for data cleaning in Google Sheets.

What is a GPT Model?

A Generative Pretrained Transformer (GPT) is a type of Large Language Model (LLM) and a prominent framework for generative artificial intelligence. It is capable of understanding and generating human-like responses based on the input it receives.

LLMs are part of the field of Natural Language Processing and the study of semantics. The field's roots can be traced back to the pioneering work of the French philologist Michel Bréal, who introduced the concept of semantics in 1883. The evolution of LLMs took a significant leap forward in 2017, when Google researchers unveiled the transformer architecture at the NeurIPS conference. That architecture has since become a fundamental component of modern LLMs.

At its core, a GPT model leverages machine learning techniques to process and generate text based on the input it receives. It is trained on a diverse range of internet text, enabling it to respond to a wide array of prompts in a manner that is coherent and contextually relevant. This makes it a useful tool for automating various tasks, including data cleaning. Today, one of the most popular GPT-based applications is ChatGPT, which was developed by OpenAI and publicly released on 30 November 2022.

How Can a GPT Model Help with Data Cleaning?

Merits of using a GPT Model for Data Cleaning

Demerits of using a GPT Model for Data Cleaning

Making API Calls from Google Sheets to a GPT Model

Google Sheets, a versatile tool, is not only useful for data entry and analysis, but also for interacting with APIs, including those of GPT models. Here is an example of what your code might look like:

A screenshot showing how to use a Google Apps Script function to send POST requests to GPT models.

In this function, we are using the UrlFetchApp service to make the request from inside the Apps Script environment. You will need to replace "YOUR_OPENAI_API_KEY" with your actual OpenAI API key. The function makes a POST request to the OpenAI API, sending a JSON payload that contains your prompt and the maximum number of tokens to generate.

You can call this function in any cell in your Google Sheet by typing =callGPTModel(text) where "text" can be the actual prompt or a reference to a cell with the prompt.
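Because the screenshot is not reproduced as text here, the following is a minimal sketch of what such a function could look like, not the exact code from the image. The endpoint (OpenAI's chat completions URL), the model name "gpt-3.5-turbo", the 100-token limit, and the helper names buildRequestOptions and extractReply are all assumptions chosen for illustration. Only callGPTModel, UrlFetchApp and the "YOUR_OPENAI_API_KEY" placeholder come from the description above.

```javascript
// Assumptions: the endpoint, model name and token limit below are
// illustrative choices, not values taken from the original screenshot.
var OPENAI_URL = "https://api.openai.com/v1/chat/completions";
var OPENAI_API_KEY = "YOUR_OPENAI_API_KEY"; // replace with your own key

// Hypothetical helper: builds the UrlFetchApp options for one prompt.
// Kept separate from the network call so it can be inspected on its own.
function buildRequestOptions(text, apiKey) {
  var payload = {
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: String(text) }],
    max_tokens: 100 // maximum number of tokens to generate
  };
  return {
    method: "post",
    contentType: "application/json",
    headers: { Authorization: "Bearer " + apiKey },
    payload: JSON.stringify(payload),
    muteHttpExceptions: true // return error responses instead of throwing
  };
}

// Hypothetical helper: pulls the generated text out of a parsed response,
// or returns the API's error message if the request was rejected.
function extractReply(responseJson) {
  if (responseJson.error) {
    return "Error: " + responseJson.error.message;
  }
  return responseJson.choices[0].message.content.trim();
}

// Custom function for use in a cell, e.g. =callGPTModel(A2)
function callGPTModel(text) {
  var options = buildRequestOptions(text, OPENAI_API_KEY);
  var response = UrlFetchApp.fetch(OPENAI_URL, options);
  return extractReply(JSON.parse(response.getContentText()));
}
```

With a sketch like this saved in the script editor, a cleaning prompt could be written as =callGPTModel("Standardize this company name and return only the result: " & A2). Hardcoding the key is the simplest option for a demo; for a shared spreadsheet you may prefer to store it somewhere other collaborators cannot read it.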

Potential Applications

Data Cleaning Using Flookup and GPT Models - Coming Soon

You will soon be able to use Flookup to clean data using GPT-3.5 Turbo or GPT-4 Turbo. It will be preconfigured to handle tasks like removing duplicates, standardizing data and fuzzy matching. To access it, head to Extensions > Flookup Data Wrangler > Process data with OpenAI™.

Why Traditional Data Cleaning Algorithms Still Matter

While AI models like GPT-4 offer innovative ways to automate data cleaning, using traditional algorithms via Google Apps Script for data cleaning has its own unique advantages:

Final Thoughts

Data cleaning does not have to be a tedious process. By combining the power of GPT and Google Sheets, you can automate much of it, saving time and helping to keep your data clean and ready for analysis.

Please remember that the specifics of how you can use a GPT model for data cleaning will depend on your particular use case and dataset. It is always a good idea to experiment with different approaches and see what works best for your needs.