Comparing cells in Excel
Discussion
I have a spreadsheet with about 14,000 rows in it and I’d like to delete rows where the value in one column is a duplicate.
The trouble is the column in question contains a user entered text description which is sometimes 95% the same as another cell but not exactly the same. I’d like to remove these duplicates as well. For example I could have:
RILLENKUGELLAGER 6005 25x47x12
RILLENKUGELLAGER 6005 25 X 47 X 12
Rillenkugellager 6005 25 - 47 – 12
All of the above would be considered duplicates in my scenario but obviously a straight cell for cell comparison would not pick them up as such. So far I’m using the following formula to clean up the data and then checking for duplicates in the resulting column:
=SUBSTITUTE(TRIM(UPPER(A1))," ","" )
This strips out any spaces and converts it all to upper case which gets rid of any case sensitivity issues and instances where a user’s hit the space bar twice between words. But it does not completely resolve my problem.
Ideally I’d like to do some kind of scoring system based on the ASCII code of each letter. If more than 90% of the letters match letters in the other cell in the same order then it’s a duplicate.
Does anyone know how I could do this using Excel formulas?
>> Edited by t1grm on Tuesday 31st May 12:25
The trouble is the column in question contains a user entered text description which is sometimes 95% the same as another cell but not exactly the same. I’d like to remove these duplicates as well. For example I could have:
RILLENKUGELLAGER 6005 25x47x12
RILLENKUGELLAGER 6005 25 X 47 X 12
Rillenkugellager 6005 25 - 47 – 12
All of the above would be considered duplicates in my scenario but obviously a straight cell for cell comparison would not pick them up as such. So far I’m using the following formula to clean up the data and then checking for duplicates in the resulting column:
=SUBSTITUTE(TRIM(UPPER(A1))," ","" )
This strips out any spaces and converts it all to upper case which gets rid of any case sensitivity issues and instances where a user’s hit the space bar twice between words. But it does not completely resolve my problem.
Ideally I’d like to do some kind of scoring system based on the ASCII code of each letter. If more than 90% of the letters match letters in the other cell in the same order then it’s a duplicate.
Does anyone know how I could do this using Excel formulas?
>> Edited by t1grm on Tuesday 31st May 12:25
I think you're over complicating it!
assuming A1 is the cell in the row
2 choices:
1) An if formula in a spare colum: =if(a1="yourtext",1,0)
you can have up to 7 nested ifs so that's a few choices
use autofilter to select all 1's and delete
2) insert a module with a macro: pm me if you want that. This will give you the option to include as many variants as you wish
assuming A1 is the cell in the row
2 choices:
1) An if formula in a spare colum: =if(a1="yourtext",1,0)
you can have up to 7 nested ifs so that's a few choices
use autofilter to select all 1's and delete
2) insert a module with a macro: pm me if you want that. This will give you the option to include as many variants as you wish
Jay-Aim said:
I think you're over complicating it!
assuming A1 is the cell in the row
2 choices:
1) An if formula in a spare colum: =if(a1="yourtext",1,0)
you can have up to 7 nested ifs so that's a few choices
use autofilter to select all 1's and delete
2) insert a module with a macro: pm me if you want that. This will give you the option to include as many variants as you wish
Getting rid of the duplicates is no problem. I just sort by the column I want to check duplicates for and compare each row to the row above using an IF formula:
=IF(A3=A2,”Dup”,”Non-Dup”)
Then I filter on “Dup” and delete all those rows.
The problem is that only works for exact matches. I need to “scrub” the data as described in my first post before I can filter out the duplicates.
The soundex system looks good. However will it work for non-English words? I see the code is generated by allocating a number to groups of letters. However different groups of letters would sound different together in different languages so I’m wondering if the letter groupings suggested are for English only.
Gassing Station | Computers, Gadgets & Stuff | Top of Page | What's New | My Stuff