|
audi321
Original Poster
2,308 posts
82 months
|
Hi all...
I have 2 sheets which have a lot of data (words) in cells A1,A2,A3,etc. Is it possible to compare the 2 sheets and for any words which aren't in BOTH spreadsheets be highlighted (or delete the duplicate words just leaving the none duplicates?)
Thanks in advance!
|
|
|
NightRunner
4,257 posts
63 months
|
Have you got the sheets that you can put up on Google docs or something?
|
|
|
audi321
Original Poster
2,308 posts
82 months
|
NightRunner said: Have you got the sheets that you can put up on Google docs or something? I can't because it's client data, but just assume it's a load of names, addresses, etc. What I want to do is establish what names are on one sheet, but not on the other. If I could delete the duplicate words, thereby leaving the none duplicate words that would be best.
|
|
|
LordGrover
18,654 posts
81 months
|
I may be over simplifying it, but s any reason why you can't copy and paste the data from sheet 2 into sheet 1? It'll make life way simpler.
|
|
|
robsa
904 posts
53 months
|
Are you saying that you want to search ALL of one spreadsheet for the details in cell A1 (for instance) or just compare cell A1 with cell A1 in the other spreadsheet to see if they are the same?
|
Advertisement
|
|
|
mrmr96
11,958 posts
73 months
|
Loads of ways to do this. The best one depends on what duplicates you're looking for. For example, I assume your data looks like this: | Name | Address 1 | Address 2 | Postcode | Phone Number | | A Alban | 123 AAA Street | Aville | AA111AA | 0123456789 | | B Brick | 456 BBB Street | Bville | BB111BB | 0987654321 | etc In which case where will the duplicates be? If you're only looking for EXACT duplicates in Col A, or if you're looking for EXACT duplicates in Col A,B,C,D,E (i.e. all cols) then you can simply use the "Remove Duplicates" function in Excel 2007 and Excel 2010. You paste the data tables one below the other, then select them both and use the function. Leave a tickmark next to the columns to check for duplicates in. Rows where all the ticked columns match will be deemed duplicates and deleted. (For example you may have two people with the same name, but you know they different as they have different post codes, so you'd leave name and postcode both ticked, so the function would only delete instances where both name and postcode are the same on a row.) If your data isn't as well orgnanised then you'd need a different technique, however none are too onerous so long as you're always looking for an exact match of somekind. What gets really hard is when you've only got a partial match, so if A Alban and Alan Alban should be identified as a match.
|
|
|
audi321
Original Poster
2,308 posts
82 months
|
Yes my data does look similar to below, however it is in 2 separate files, and the 2nd file holds far more data than the first, so I would like to run a command to identify which ones are in file 2 that aren't in file 1 by deleting the duplicate words in the 2nd file? Does that make sense or am I talking jibberish? For example, I assume your data looks like this: | Name | Address 1 | Address 2 | Postcode | Phone Number | | A Alban | 123 AAA Street | Aville | AA111AA | 0123456789 | | B Brick | 456 BBB Street | Bville | BB111BB | 0987654321 |
|
|
|
just me
5,904 posts
89 months
|
Add in the number 2 in a separate column at the end of each row on spreadsheet 2. Copy all data from spreadsheet 2 to spreadsheet 1. Check for duplicates as described above. After they have been removed, sort by the column with the number 2 in it. Copy all those records back to spreadsheet 2. Sort as needed, probably by surname.
Done.
|
|
|
mrmr96
11,958 posts
73 months
|
just me said: Add in the number 2 in a separate column at the end of each row on spreadsheet 2. Copy all data from spreadsheet 2 to spreadsheet 1. Check for duplicates as described above. After they have been removed, sort by the column with the number 2 in it. Copy all those records back to spreadsheet 2. Sort as needed, probably by surname.
Done. nice, i think that would work. just be sure to paste the stuff from sheet 2 UNDER the sheet 1 stuff in step 2.
|
|
|
just me
5,904 posts
89 months
|
Ah yes, I was assuming some knowledge of how Excel works.
To clarify, Copy the data from spreadsheet 2 to empty cells in spreadsheet 1.
Do not paste into cells that already have info, as it will get overwritten. Also, paste into the correct columns ie. surnames from spreadsheet 2 should end up under the surnames in spreadsheet 1, etc. You might want to colour code some cells before copying, so it's easy to see what ends up where.
|
|
|
just me
5,904 posts
89 months
|
If you have a LOT of duplicate records, say 500 or more, you will probably want to do the following: Sort by the duplicate fields, ie. if the surnames are duplicates, sort by surnames so that all duplicates are together. In an empty column at the top, write a formula using the @IF function to see if the surnames of the current row and the row below are the same, and if they are, it should write "Duplicate". ie. supposing the 3rd column has surnames, it would be @IF (C2=C1,"Duplicate"). You might have to do some text parsing to compare the text contents of C1 and C2--look up how to compare text.
So now you have a column that will give you duplicates. Sort by that. Now all the duplicated entries will be in one block. Delete them. Then sort by the column with the number 2. Now all the contents of spreadsheet 2 will be together, but without the duplicates. Copy these back to spreadsheet 2, if you must maintain the separate files. In spreadsheet 2, sort as needed, probably by surname.
|
|
|
mrmr96
11,958 posts
73 months
|
just me said: Ah yes, I was assuming some knowledge of how Excel works.
To clarify, Copy the data from spreadsheet 2 to empty cells in spreadsheet 1.
Do not paste into cells that already have info, as it will get overwritten. Also, paste into the correct columns ie. surnames from spreadsheet 2 should end up under the surnames in spreadsheet 1, etc. You might want to colour code some cells before copying, so it's easy to see what ends up where. Lols! I didn't expect him to paste it over the top of the stuff already in sheet 1, but I was just making the point that the stuff from sheet 2 had to be at the bottom, and not have new rows inserted and then put at the top; reason being that the remove duplicates function keeps the first occurrence only; so anything found in sheet 1 (the stuff at the top) is then removed from the sheet 2 stuff (at the bottom). This only works if they're stacked sheet 1 above and sheet 2 below.
|
|
|
mrmr96
11,958 posts
73 months
|
just me said: If you have a LOT of duplicate records, say 500 or more, you will probably want to do the following: Sort by the duplicate fields, ie. if the surnames are duplicates, sort by surnames so that all duplicates are together. In an empty column at the top, write a formula using the @IF function to see if the surnames of the current row and the row below are the same, and if they are, it should write "Duplicate". ie. supposing the 3rd column has surnames, it would be @IF (C2=C1,"Duplicate"). You might have to do some text parsing to compare the text contents of C1 and C2--look up how to compare text.
So now you have a column that will give you duplicates. Sort by that. Now all the duplicated entries will be in one block. Delete them. Then sort by the column with the number 2. Now all the contents of spreadsheet 2 will be together, but without the duplicates. Copy these back to spreadsheet 2, if you must maintain the separate files. In spreadsheet 2, sort as needed, probably by surname. That's sort of how I'd have done it; - Add a new column to both sheets, making it the concatonation of firstname, surname and postcode. Separated by a colon or something. =a1&":"&b1&":"&1c or similar. - Then add another column to sheet 2, with a countif formula which will look for occurrences of stuff from sheet 2 in sheet 1 and flag. =countif(sheet1!d  ,d1) or similar. - Then you can see what stuff from sheet 2 appears once or more or sheet 1, and then delete them. (Like you said, sort by that column, and then delete the block of rows where the count is 1 or more.) The advantage of doing it this way, is that you can add as many columns as you like to the concatonation, so you can check for duplicates only where all the columns selected are the same; so in my example I'm checking not only the name but the postcode too. (Since there could be two people with the same name but at different post codes - meaning they are distinct from each other.)
|
|
|
just me
5,904 posts
89 months
|
Sounds good. 
|
|
|
JontyR
981 posts
36 months
|
The problem you will get is looking for similar data. You will get different spellings, possibly by mistake on names, upper and lower case issues, full stops ie Dept. or Dept
So when I was working at a data warehouse we matched on name and postcode, you could also look for telephone numbers as they are reasonably unique (mobiles more than land lines)
I would also create a seperate file with both columns so that you can check to ensure you have correctly deleted. It would only take 10 minutes to write a simple macro to do the job.
If you can show a screen shot of the columns used with some random data, then I will send it back
|
|
|
audi321
Original Poster
2,308 posts
82 months
|
Hi all. Thanks very much for all your help. I've done a screenshot of some random data in a similar format to the file I have.  In here you'll see that there's a load of data in column A, of which some is duplicated in columns B and C. What I would like to do is run a formula to extract (or delete the duplicates if easier) the Non Duplicates over to Column D? e.g. Cell A15 - 'ABC - All Of My Heart' is duplicated in Cells B26 and C26, so I would want these Cells deleting, thus just leaving the non duplicates? Possible?
|
|
|
JontyR
981 posts
36 months
|
Sorry...busy, and took a little longer than first thought! Public Sub CheckDuplicates() Dim lRow As Long, lPos As Long, lFRow As Long Dim sCheck As String, sArtist As String, sTrack As String Dim sAddress As String Dim oFind As Range, oRng As Range
lRow = 1
Do Until Cells(lRow, 1) = "" sCheck = Cells(lRow, 1) lPos = InStr(1, sCheck, "-")
If lPos <> 0 Then sArtist = Trim(Left(sCheck, lPos - 1)) sTrack = Trim(Mid(sCheck, lPos + 1)) lFRow = 0 With Columns("B:B") Set oFind = .Find(What:=sArtist, LookIn:=xlValues) If Not oFind Is Nothing Then
Do lFRow = oFind.Row If oFind.Offset(0, 1) = sTrack Then oFind.Resize(1, 2).Clear Set oFind = .FindNext(oFind) Loop Until oFind Is Nothing Or oFind.Row < lFRow End If End With End If
lRow = lRow + 1 Loop End Sub
What I have done is, going from your example, taken the artist name and track name and split them into two strings. Then running down column B I find the artist name, if the cell to the right contains the same track name as the one we split out, then the cells are cleared. Does that work for you?? If you want to send me a template with the new data I can adapt the macro to suit.
|
|
|
audi321
Original Poster
2,308 posts
82 months
|
JontyR said: Sorry...busy, and took a little longer than first thought! Public Sub CheckDuplicates() Dim lRow As Long, lPos As Long, lFRow As Long Dim sCheck As String, sArtist As String, sTrack As String Dim sAddress As String Dim oFind As Range, oRng As Range
lRow = 1
Do Until Cells(lRow, 1) = "" sCheck = Cells(lRow, 1) lPos = InStr(1, sCheck, "-")
If lPos <> 0 Then sArtist = Trim(Left(sCheck, lPos - 1)) sTrack = Trim(Mid(sCheck, lPos + 1)) lFRow = 0 With Columns("B:B") Set oFind = .Find(What:=sArtist, LookIn:=xlValues) If Not oFind Is Nothing Then
Do lFRow = oFind.Row If oFind.Offset(0, 1) = sTrack Then oFind.Resize(1, 2).Clear Set oFind = .FindNext(oFind) Loop Until oFind Is Nothing Or oFind.Row < lFRow End If End With End If
lRow = lRow + 1 Loop End Sub
What I have done is, going from your example, taken the artist name and track name and split them into two strings. Then running down column B I find the artist name, if the cell to the right contains the same track name as the one we split out, then the cells are cleared. Does that work for you?? If you want to send me a template with the new data I can adapt the macro to suit. Hi Jonty. wow....all that for this simple task to remove duplicates! I was expecting some kind of formula? I've got to say that due to my stupidity, I'm not sure what to do with your formula!
|
|
|
just me
5,904 posts
89 months
|
How many records do you have in each spreadsheet? If it's less than 500, it's probably just as efficient to do it manually.
|
|
|
audi321
Original Poster
2,308 posts
82 months
|
just me said: How many records do you have in each spreadsheet? If it's less than 500, it's probably just as efficient to do it manually. No it's '000s unfortunately
|
|