REMOVING DUPLICATES BY TEXT SIMILARITY