searching within multiple files
I have loads of files from an ex-employee, and I need to rifle through them to grab email addresses so that we can notify customers of her departure.
Is there an easy way of doing it? The files are a mix of Word, Excel and PDFs.
I have used Google Desktop Search in the past, but has anyone had experience limiting a search to a portable drive, as in this case, and to a single character, "@"?
TIA
At a DOS prompt, change to the directory you want to search.
Use "findstr" (type help findstr for the full options).
Something like:
findstr /S /M "@" .\*
should work (/S searches subdirectories, /M prints only the names of matching files), although expect an awful lot of extra crap in the results due to binary files etc. You can hide those with /P, but you might find it also hides genuine results. Someone who's better with regular expressions can probably expand "@" into something that matches email addresses of the form something@domain.com.
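For what it's worth, here is a rough sketch of what that findstr line does, in Python, in case anyone wants to tweak it. It assumes Python 3 is installed (it isn't part of XP); `files_containing` and `is_probably_binary` are made-up names, and the binary test is only an approximation of /P, not findstr's exact rule. Like findstr, it only looks at raw bytes, so whatever findstr makes of Word, Excel and PDF files, this will too.

```python
import os

def is_probably_binary(data: bytes) -> bool:
    """Rough stand-in for findstr /P: treat a file as binary if it contains
    control bytes other than tab, LF, form feed and CR (an assumption;
    findstr's exact rule may differ)."""
    allowed = {9, 10, 12, 13}
    return any(b < 32 and b not in allowed for b in data)

def files_containing(root: str, needle: bytes, skip_binary: bool = False):
    """Yield every file under root (recursively, like /S) whose raw bytes
    contain needle; only the path is reported, like /M."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    data = f.read()
            except OSError:
                continue  # unreadable file: skip it rather than stop
            if skip_binary and is_probably_binary(data):
                continue
            if needle in data:
                yield path
```

Usage would be something like `for p in files_containing("E:\\", b"@"): print(p)`, where E: is a hypothetical drive letter standing in for the portable drive.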
cj_eds said:
On a DOS prompt, change to the directory you want to search
1. I doubt if the OP is running anything that gives him a DOS window.
2. FINDSTR will only find patterns of text in files. Ever tried opening a Word, Excel or PDF file in a DOS text editor? Unreadable.
I was thinking you can use that from the Windows command prompt (Start->Run->enter "cmd" or "command" depending on Windows version); it'll give you a list of filenames you can check manually in dos/excel etc. There are bound to be better solutions out there, but it's a starting point.
ETA: OP, if you're not familiar with the command prompt etc., then it's easier just to go googling for something to do it for you.
Edited by cj_eds on Friday 8th February 12:13
cj_eds said:
I was thinking you can use that from the Windows command prompt (Start->Run->enter "cmd" or "command" depending on Windows version); it'll give you a list of filenames you can check manually in dos/excel etc. There are bound to be better solutions out there, but it's a starting point.
The Windows command prompt is not a DOS window. DOS commands as we remember them, like FINDSTR, have not been available since Windows ME (IIRC).
Certainly you cannot do this in XP and above.
In any case, FINDSTR would not find the text pattern in an Excel, Word or PDF file. They are not ASCII files.
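To illustrate the point for the newer Office 2007 formats only: a .docx or .xlsx file is a ZIP archive of XML parts, so its text is compressed and a raw byte search for "@" can miss it. Below is a minimal Python sketch (Python 3 assumed; `xml_parts` and `office_file_mentions_at` are hypothetical names) that decompresses the XML parts before searching. It will not open the older .doc/.xls binary formats (zipfile raises BadZipFile for those), and Word can split an address across several XML runs, so treat it as a rough check rather than an extractor.

```python
import zipfile

def xml_parts(path):
    """Yield (member name, decoded XML text) for each .xml part inside an
    Office Open XML file (.docx/.xlsx), which is really a ZIP archive.
    Legacy .doc/.xls files are not ZIPs, so zipfile raises BadZipFile."""
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if name.endswith(".xml"):
                # Parts are normally UTF-8; replace anything undecodable.
                yield name, zf.read(name).decode("utf-8", errors="replace")

def office_file_mentions_at(path):
    """True if any XML part contains an "@" once decompressed."""
    return any("@" in text for _name, text in xml_parts(path))
```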
I think the regex for this is:
\b[A-Z0-9._%-]+?@[A-Z0-9.-]+\.[A-Z]{2,4}\b
Edited by dilbert on Friday 8th February 12:54
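For anyone wondering how that pattern actually gets used, here it is dropped into Python's `re` module as a sketch (`find_emails` is a made-up name). Because its character classes only list upper-case letters, it has to be compiled case-insensitively to match ordinary addresses; and the `{2,4}` limit means longer top-level domains such as .travel are missed.

```python
import re

# The pattern as posted, unchanged. Its classes only list A-Z, so it is
# compiled with re.IGNORECASE to match lower-case addresses too.
EMAIL_RE = re.compile(r"\b[A-Z0-9._%-]+?@[A-Z0-9.-]+\.[A-Z]{2,4}\b", re.IGNORECASE)

def find_emails(text):
    """Return every non-overlapping match of EMAIL_RE in text, left to right."""
    return EMAIL_RE.findall(text)
```

For example, `find_emails("Contact jane.doe@example.co.uk or sales@shop.com.")` returns `['jane.doe@example.co.uk', 'sales@shop.com']`; the trailing full stop is correctly left off the second address.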
Sorry, was greasing a torque meter ...
Firstly, I am relatively comfortable using the command prompt in Win XP, which is what I'm running, but as previously stated, it won't work: it can't "open" the files as such, it just looks at the raw coding of the file (similar to opening a JPEG file in Word).
In answer to the earlier post, yes, I could look on Google, but lots of hits means lots of confusion. I reckoned someone would have had a similar problem and found a great solution.
dilbert said:
I think the regex for this is;
{{{
\b[A-Z0-9._%-]+?@[A-Z0-9.-]+\.[A-Z]{2,4}\b
}}}
Did you google that or just knock it up off the top of your head?
I'll bow to the superior knowledge of DOS/command prompt commands. I tend to use Cygwin and grep, or a text editor, for this sort of thing; the latter isn't sophisticated enough to distinguish file types!
cj_eds said:
Did you google that or just knock it up off the top of your head?
Disclaimer: as with all regexes, implementation is subject to regional variation, and accurate performance is not guaranteed!
dilbert said:
I think the regex for this is:
\b[A-Z0-9._%-]+?@[A-Z0-9.-]+\.[A-Z]{2,4}\b
Excuse the ignorance, but what is that? How is it used? My knowledge certainly doesn't cover that ...
croxsons said:
Excuse the ignorance, but what is that? How is it used? My knowledge certainly doesn't cover that ...
Perl is a programming language that adopted the wider idea of "regular expressions": a way to describe the formatted text you want to find, using a text string. There are various flavours of regex in existence, but PCRE (Perl Compatible Regular Expressions) seems to me the most consistent and intelligible.
"grep" is a tool for doing exactly what you are looking to do, but it doesn't just find e-mails; it'll find any text. You need the regular expression to tell it what to look for. The one I posted finds e-mail addresses.
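To make the grep idea concrete, here is a rough Python equivalent of running grep with that pattern over a whole drive, collecting each address and the files it turned up in. It is a sketch, not a replacement for grep or PowerGREP: `grep_emails` is a hypothetical name, it reads raw bytes (so it sees only what grep would see), and lower-casing addresses to merge duplicates is an assumption.

```python
import os
import re

# The posted pattern as a bytes regex, applied case-insensitively
# (assumed intent, since its classes only list upper-case letters).
EMAIL_RE = re.compile(rb"\b[A-Z0-9._%-]+?@[A-Z0-9.-]+\.[A-Z]{2,4}\b", re.IGNORECASE)

def grep_emails(root):
    """Walk root recursively and return {address: set of files it appears in}."""
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    data = f.read()
            except OSError:
                continue  # unreadable file: skip it
            for m in EMAIL_RE.finditer(data):
                # Matches can only contain ASCII bytes, so ASCII decoding is safe;
                # lower-case so "Jane@Example.com" and "jane@example.com" merge.
                addr = m.group().decode("ascii").lower()
                hits.setdefault(addr, set()).add(path)
    return hits
```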
Software that does what you are looking for can be found here.
http://www.regular-expressions.info/powergrep.html
The site is nothing to do with me, but it is (IMO) one of the best resources for working out how to use regex. More importantly, it is instrumental in the ongoing effort to standardise regex syntax.
Edited by dilbert on Friday 8th February 13:37
croxsons said:
XP Pro
In that case just ignore all this DOS window, findstr and \b[A-Z0-9._%-]+?@[A-Z0-9.-]+\.[A-Z]{2,4}\b stuff, you can't use it anyway ;-)
I just downloaded and tried Email Extractor Files V2.2 from this site http://www.technocomsolutions.com/products.html#tr...
and it does EXACTLY what you want.
The trial is free; it will cost you $24.95 to register it and be able to save the data to a file.
Simon
alock said:
FINDSTR is available on XP and Vista.
You are absolutely correct, it is indeed there in C:\WINDOWS\SYSTEM32. But it won't help the OP.
Nor will GREP or POWERGREP.
He needs something that will search Word, Excel and PDF files.
Simon
ETA: I recognise that these will find data in these files; however, managing the whole process is not straightforward, and since there are tools available to do exactly what is required, it seems logical to take the easy route.
Edited by sgrimshaw on Friday 8th February 13:55
Edited by sgrimshaw on Friday 8th February 14:15
sgrimshaw said:
He needs something that will search Word, Excel and PDF files.
Last time I looked, all of those formats contained plain text representations of their content, presumably so that "grep"-type functions can work. I would accept that PDF files don't have to, but they usually do!!!
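One wrinkle with grepping those plain text representations: older Office files can store text as UTF-16LE, two bytes per character, so "a@b" sits on disk as a, zero byte, @, zero byte, b, zero byte, and a byte-level search for the pattern misses it. The Python sketch below (`emails_in_raw_bytes` is a hypothetical name, Python 3 assumed) tries both an 8-bit and a UTF-16LE reading of the raw bytes. It does nothing about compressed PDF content streams, which would need decompressing first.

```python
import re

EMAIL_RE = re.compile(r"\b[A-Z0-9._%-]+?@[A-Z0-9.-]+\.[A-Z]{2,4}\b", re.IGNORECASE)

def emails_in_raw_bytes(data: bytes) -> set:
    """Search raw file bytes under two assumed encodings: 8-bit text
    (latin-1 maps every byte to one character, so it never fails) and
    UTF-16LE at both even and odd byte offsets, since the text may not
    start on an even byte."""
    readings = (
        data.decode("latin-1"),
        data.decode("utf-16-le", errors="ignore"),
        data[1:].decode("utf-16-le", errors="ignore"),
    )
    found = set()
    for text in readings:
        found.update(m.lower() for m in EMAIL_RE.findall(text))
    return found
```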
dilbert said:
Last time I looked, all of those formats contained plain text representations of their content, presumably so that "grep"-type functions can work. I would accept that PDF files don't have to, but they usually do!!!
I did a quick test: from "text" files the output would be usable, but with Word and Excel files the output needs so much work it's just not worth it.
BTW, I just looked more closely at PowerGREP; that does have more of a chance, but frankly the dedicated software is so much easier to use, so why bother?
Simon
Gassing Station | Computers, Gadgets & Stuff