PREPROCESS DATA BY TEXT SIMILARITY

Introduction to Data Preprocessing

In this guide, you will learn how to use two powerful Flookup functions that can make your data cleaning easier, faster and less error-prone: NORMALIZE and FUZZYMATCH.

NORMALIZE can improve the quality and consistency of your data by removing or formatting text entries that might interfere with the fuzzy matching process.

FUZZYMATCH can help you understand your data better by showing you how similar your text entries are. It also gives you a glimpse of the underlying mechanism that drives the other Flookup functions.
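To make the idea concrete, here is a minimal Python sketch of why cleaning text before comparing it matters. It is only an illustration of the general technique and not Flookup's implementation: the helper names `normalize_text` and `similarity` are hypothetical, the cleaning rules (lowercasing, removing punctuation, collapsing spaces) are assumptions, and the score comes from Python's standard `difflib` module rather than from whatever algorithm FUZZYMATCH uses.

```python
import re
import string
from difflib import SequenceMatcher


def normalize_text(text: str) -> str:
    """Hypothetical cleanup: lowercase, drop punctuation, collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def similarity(a: str, b: str) -> float:
    """Similarity between 0.0 and 1.0 using difflib (an assumed stand-in metric)."""
    return SequenceMatcher(None, a, b).ratio()


raw_a = "  ACME, Inc. "
raw_b = "acme inc"

# Comparing the raw entries: case, punctuation and spacing lower the score.
score_raw = similarity(raw_a, raw_b)

# Comparing the cleaned entries: both reduce to the same string.
score_clean = similarity(normalize_text(raw_a), normalize_text(raw_b))

print(f"raw: {score_raw:.2f}  cleaned: {score_clean:.2f}")
```

In this sketch the two entries refer to the same company, yet they only score as identical once they have been cleaned, which is the reason for normalizing data before fuzzy matching it.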

NORMALIZE

NORMALIZE modes can be divided into two broad groups:

Here is a condensed look at what each NORMALIZE mode does:

To normalize text entries in the "Group I" functions, follow the steps below:

To normalize text entries in the "Group II" functions, follow the steps below:

---

Key Points on NORMALIZE

FUZZYMATCH

Key Points on FUZZYMATCH