Login | Register
SearchMy Stuff
My ProfileMy PreferencesMy Mates RSS Feed
2
Reply to Topic
Author Discussion

audi321

Original Poster:

2,308 posts

82 months

[news] 
Monday 9th July 2012 quote quote all
Hi all...

I have 2 sheets which have a lot of data (words) in cells A1,A2,A3,etc. Is it possible to compare the 2 sheets and for any words which aren't in BOTH spreadsheets be highlighted (or delete the duplicate words just leaving the none duplicates?)

Thanks in advance!

NightRunner

4,257 posts

63 months

[news] 
Monday 9th July 2012 quote quote all
Have you got the sheets that you can put up on Google docs or something?

audi321

Original Poster:

2,308 posts

82 months

[news] 
Monday 9th July 2012 quote quote all
NightRunner said:
Have you got the sheets that you can put up on Google docs or something?
I can't because it's client data, but just assume it's a load of names, addresses, etc.

What I want to do is establish what names are on one sheet, but not on the other. If I could delete the duplicate words, thereby leaving the none duplicate words that would be best.

LordGrover

18,654 posts

81 months

[news] 
Monday 9th July 2012 quote quote all
I may be over simplifying it, but s any reason why you can't copy and paste the data from sheet 2 into sheet 1? It'll make life way simpler.

robsa

904 posts

53 months

[news] 
Monday 9th July 2012 quote quote all
Are you saying that you want to search ALL of one spreadsheet for the details in cell A1 (for instance) or just compare cell A1 with cell A1 in the other spreadsheet to see if they are the same?


Advertisement

mrmr96

11,958 posts

73 months

[news] 
Monday 9th July 2012 quote quote all
Loads of ways to do this.

The best one depends on what duplicates you're looking for.

For example, I assume your data looks like this:
Name Address 1 Address 2 Postcode Phone Number
A Alban 123 AAA Street Aville AA111AA 0123456789
B Brick 456 BBB Street Bville BB111BB 0987654321


etc

In which case where will the duplicates be?

If you're only looking for EXACT duplicates in Col A, or if you're looking for EXACT duplicates in Col A,B,C,D,E (i.e. all cols) then you can simply use the "Remove Duplicates" function in Excel 2007 and Excel 2010. You paste the data tables one below the other, then select them both and use the function. Leave a tickmark next to the columns to check for duplicates in. Rows where all the ticked columns match will be deemed duplicates and deleted.

(For example you may have two people with the same name, but you know they different as they have different post codes, so you'd leave name and postcode both ticked, so the function would only delete instances where both name and postcode are the same on a row.)

If your data isn't as well orgnanised then you'd need a different technique, however none are too onerous so long as you're always looking for an exact match of somekind. What gets really hard is when you've only got a partial match, so if A Alban and Alan Alban should be identified as a match.

audi321

Original Poster:

2,308 posts

82 months

[news] 
Monday 9th July 2012 quote quote all
Yes my data does look similar to below, however it is in 2 separate files, and the 2nd file holds far more data than the first, so I would like to run a command to identify which ones are in file 2 that aren't in file 1 by deleting the duplicate words in the 2nd file? Does that make sense or am I talking jibberish?

For example, I assume your data looks like this:
Name Address 1 Address 2 Postcode Phone Number
A Alban 123 AAA Street Aville AA111AA 0123456789
B Brick 456 BBB Street Bville BB111BB 0987654321


just me

5,904 posts

89 months

[news] 
Monday 9th July 2012 quote quote all
Add in the number 2 in a separate column at the end of each row on spreadsheet 2.
Copy all data from spreadsheet 2 to spreadsheet 1.
Check for duplicates as described above.
After they have been removed, sort by the column with the number 2 in it.
Copy all those records back to spreadsheet 2.
Sort as needed, probably by surname.

Done.

mrmr96

11,958 posts

73 months

[news] 
Tuesday 10th July 2012 quote quote all
just me said:
Add in the number 2 in a separate column at the end of each row on spreadsheet 2.
Copy all data from spreadsheet 2 to spreadsheet 1.
Check for duplicates as described above.
After they have been removed, sort by the column with the number 2 in it.
Copy all those records back to spreadsheet 2.
Sort as needed, probably by surname.

Done.
nice, i think that would work. just be sure to paste the stuff from sheet 2 UNDER the sheet 1 stuff in step 2.

just me

5,904 posts

89 months

[news] 
Tuesday 10th July 2012 quote quote all
Ah yes, I was assuming some knowledge of how Excel works.

To clarify,
Copy the data from spreadsheet 2 to empty cells in spreadsheet 1.

Do not paste into cells that already have info, as it will get overwritten.
Also, paste into the correct columns ie. surnames from spreadsheet 2 should end up under the surnames in spreadsheet 1, etc.
You might want to colour code some cells before copying, so it's easy to see what ends up where.

just me

5,904 posts

89 months

[news] 
Tuesday 10th July 2012 quote quote all
If you have a LOT of duplicate records, say 500 or more, you will probably want to do the following:
Sort by the duplicate fields, ie. if the surnames are duplicates, sort by surnames so that all duplicates are together.
In an empty column at the top, write a formula using the @IF function to see if the surnames of the current row and the row below are the same, and if they are, it should write "Duplicate". ie. supposing the 3rd column has surnames, it would be @IF (C2=C1,"Duplicate"). You might have to do some text parsing to compare the text contents of C1 and C2--look up how to compare text.

So now you have a column that will give you duplicates. Sort by that.
Now all the duplicated entries will be in one block. Delete them.
Then sort by the column with the number 2.
Now all the contents of spreadsheet 2 will be together, but without the duplicates.
Copy these back to spreadsheet 2, if you must maintain the separate files.
In spreadsheet 2, sort as needed, probably by surname.

mrmr96

11,958 posts

73 months

[news] 
Tuesday 10th July 2012 quote quote all
just me said:
Ah yes, I was assuming some knowledge of how Excel works.

To clarify,
Copy the data from spreadsheet 2 to empty cells in spreadsheet 1.

Do not paste into cells that already have info, as it will get overwritten.
Also, paste into the correct columns ie. surnames from spreadsheet 2 should end up under the surnames in spreadsheet 1, etc.
You might want to colour code some cells before copying, so it's easy to see what ends up where.
Lols! I didn't expect him to paste it over the top of the stuff already in sheet 1, but I was just making the point that the stuff from sheet 2 had to be at the bottom, and not have new rows inserted and then put at the top; reason being that the remove duplicates function keeps the first occurrence only; so anything found in sheet 1 (the stuff at the top) is then removed from the sheet 2 stuff (at the bottom). This only works if they're stacked sheet 1 above and sheet 2 below.

mrmr96

11,958 posts

73 months

[news] 
Tuesday 10th July 2012 quote quote all
just me said:
If you have a LOT of duplicate records, say 500 or more, you will probably want to do the following:
Sort by the duplicate fields, ie. if the surnames are duplicates, sort by surnames so that all duplicates are together.
In an empty column at the top, write a formula using the @IF function to see if the surnames of the current row and the row below are the same, and if they are, it should write "Duplicate". ie. supposing the 3rd column has surnames, it would be @IF (C2=C1,"Duplicate"). You might have to do some text parsing to compare the text contents of C1 and C2--look up how to compare text.

So now you have a column that will give you duplicates. Sort by that.
Now all the duplicated entries will be in one block. Delete them.
Then sort by the column with the number 2.
Now all the contents of spreadsheet 2 will be together, but without the duplicates.
Copy these back to spreadsheet 2, if you must maintain the separate files.
In spreadsheet 2, sort as needed, probably by surname.
That's sort of how I'd have done it;
- Add a new column to both sheets, making it the concatonation of firstname, surname and postcode. Separated by a colon or something. =a1&":"&b1&":"&1c or similar.
- Then add another column to sheet 2, with a countif formula which will look for occurrences of stuff from sheet 2 in sheet 1 and flag. =countif(sheet1!dbiggrin,d1) or similar.
- Then you can see what stuff from sheet 2 appears once or more or sheet 1, and then delete them. (Like you said, sort by that column, and then delete the block of rows where the count is 1 or more.)

The advantage of doing it this way, is that you can add as many columns as you like to the concatonation, so you can check for duplicates only where all the columns selected are the same; so in my example I'm checking not only the name but the postcode too. (Since there could be two people with the same name but at different post codes - meaning they are distinct from each other.)

just me

5,904 posts

89 months

[news] 
Tuesday 10th July 2012 quote quote all
Sounds good. thumbup

JontyR

981 posts

36 months

[news] 
Tuesday 10th July 2012 quote quote all
The problem you will get is looking for similar data. You will get different spellings, possibly by mistake on names, upper and lower case issues, full stops ie Dept. or Dept

So when I was working at a data warehouse we matched on name and postcode, you could also look for telephone numbers as they are reasonably unique (mobiles more than land lines)

I would also create a seperate file with both columns so that you can check to ensure you have correctly deleted. It would only take 10 minutes to write a simple macro to do the job.

If you can show a screen shot of the columns used with some random data, then I will send it back

audi321

Original Poster:

2,308 posts

82 months

[news] 
Tuesday 10th July 2012 quote quote all
Hi all. Thanks very much for all your help. I've done a screenshot of some random data in a similar format to the file I have.





In here you'll see that there's a load of data in column A, of which some is duplicated in columns B and C. What I would like to do is run a formula to extract (or delete the duplicates if easier) the Non Duplicates over to Column D?

e.g. Cell A15 - 'ABC - All Of My Heart' is duplicated in Cells B26 and C26, so I would want these Cells deleting, thus just leaving the non duplicates?

Possible?


Edited by audi321 on Tuesday 10th July 12:41

JontyR

981 posts

36 months

[news] 
Tuesday 10th July 2012 quote quote all
Sorry...busy, and took a little longer than first thought!


Public Sub CheckDuplicates()
Dim lRow As Long, lPos As Long, lFRow As Long
Dim sCheck As String, sArtist As String, sTrack As String
Dim sAddress As String
Dim oFind As Range, oRng As Range

lRow = 1

Do Until Cells(lRow, 1) = ""
sCheck = Cells(lRow, 1)
lPos = InStr(1, sCheck, "-")

If lPos <> 0 Then
sArtist = Trim(Left(sCheck, lPos - 1))
sTrack = Trim(Mid(sCheck, lPos + 1))
lFRow = 0
With Columns("B:B")
Set oFind = .Find(What:=sArtist, LookIn:=xlValues)
If Not oFind Is Nothing Then

Do
lFRow = oFind.Row
If oFind.Offset(0, 1) = sTrack Then oFind.Resize(1, 2).Clear
Set oFind = .FindNext(oFind)
Loop Until oFind Is Nothing Or oFind.Row < lFRow
End If
End With
End If

lRow = lRow + 1
Loop
End Sub


What I have done is, going from your example, taken the artist name and track name and split them into two strings. Then running down column B I find the artist name, if the cell to the right contains the same track name as the one we split out, then the cells are cleared.

Does that work for you??

If you want to send me a template with the new data I can adapt the macro to suit.

audi321

Original Poster:

2,308 posts

82 months

[news] 
Tuesday 10th July 2012 quote quote all
JontyR said:
Sorry...busy, and took a little longer than first thought!


Public Sub CheckDuplicates()
Dim lRow As Long, lPos As Long, lFRow As Long
Dim sCheck As String, sArtist As String, sTrack As String
Dim sAddress As String
Dim oFind As Range, oRng As Range

lRow = 1

Do Until Cells(lRow, 1) = ""
sCheck = Cells(lRow, 1)
lPos = InStr(1, sCheck, "-")

If lPos <> 0 Then
sArtist = Trim(Left(sCheck, lPos - 1))
sTrack = Trim(Mid(sCheck, lPos + 1))
lFRow = 0
With Columns("B:B")
Set oFind = .Find(What:=sArtist, LookIn:=xlValues)
If Not oFind Is Nothing Then

Do
lFRow = oFind.Row
If oFind.Offset(0, 1) = sTrack Then oFind.Resize(1, 2).Clear
Set oFind = .FindNext(oFind)
Loop Until oFind Is Nothing Or oFind.Row < lFRow
End If
End With
End If

lRow = lRow + 1
Loop
End Sub


What I have done is, going from your example, taken the artist name and track name and split them into two strings. Then running down column B I find the artist name, if the cell to the right contains the same track name as the one we split out, then the cells are cleared.

Does that work for you??

If you want to send me a template with the new data I can adapt the macro to suit.
Hi Jonty. wow....all that for this simple task to remove duplicates! I was expecting some kind of formula? I've got to say that due to my stupidity, I'm not sure what to do with your formula!

just me

5,904 posts

89 months

[news] 
Tuesday 10th July 2012 quote quote all
How many records do you have in each spreadsheet? If it's less than 500, it's probably just as efficient to do it manually.

audi321

Original Poster:

2,308 posts

82 months

[news] 
Tuesday 10th July 2012 quote quote all
just me said:
How many records do you have in each spreadsheet? If it's less than 500, it's probably just as efficient to do it manually.
No it's '000s unfortunately
2
Reply to Topic