Extracting hyperlinks from a webpage (Excel help needed)
jdat:
Do any of you know an easy way to generate a list of all the hyperlinks contained in a webpage?
I have pages with hundreds of links, and I need a quick and easy way to copy them all...
I'm sure I could generate the lists through a web editor or something?
Please, pretty please? :)
(Look at the bottom for the Excel help I need.)
jdat:
Hmm, regex editing with what?
I know what regex is, but I've never done any kind of massive regex editing.
And no, this isn't a list on a webpage; these are webpages with various pictures, text, etc. ... real pages, you know :p
This pisses me off.
I can open the webpages in something like Rapid PHP and it displays all the hyperlinks in a code explorer window, but there's no way to copy the whole list... bah, I'm gonna go bootleg: screen-capture it and OCR the stuff :wtf:
Akridrot:
Major regex editing? DUDE, it's really not that hard. Maybe if you showed me the page?
And if they are ALL hyperlinks, it's safe to assume they'd all be in the same kind of tag: a href.
So you'd run a regex over the source to copy all the text between the a href and the /a, regardless of what it is, and output it to a file.
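The search pattern would be something along these lines (just a sketch, assuming the links use ordinary double-quoted hrefs; tweak it if the page mixes single quotes or weird markup):
<a[^>]*href="([^"]*)"
The captured group is the URL itself, so a "find all" with that pattern and dumping the matches gets you the list.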
edit: Btw, I'd like to know what OCR stuff you plan on using.
jdat:
Well, what the hell do I use to do this: <[^<>]> ?
Sounds lovely.
Akridrot:
quote: Originally posted by jdat
Well, what the hell do I use to do this: <[^<>]> ?
Sounds lovely.

Use Notepad++: http://notepad-plus.sourceforge.net/uk/site.htm (because you don't have PHP).
I could do this for you in PHP if you coughed up a link.
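For what it's worth, the PHP side would only be a few lines. A rough sketch (page.html and links.txt are just example file names, and the regex assumes ordinary quoted hrefs):

<?php
// Rough sketch: pull every href out of a saved copy of the page.
// "page.html" and "links.txt" are example file names, not anything specific.
$html = file_get_contents('page.html');
preg_match_all('/<a[^>]+href=["\']([^"\']+)["\']/i', $html, $matches);
// Drop duplicates and write one URL per line.
file_put_contents('links.txt', implode("\n", array_unique($matches[1])));
?>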
jdat:
OK, how the heck do I run this?
<[^<>]>
jdat:
OK, I found something that works great!
A little app called Selected links, from http://mikos.boom.ru/
It only works with Internet Explorer.
You select the hyperlinks you want to extract and it copies them to the clipboard.
Does just what I need :D
Now I have another problem... I just noticed I need to compare the URLs I'm getting from these pages against a page of already-used URLs...
I need an automated way to do this :(
Long story short, the current page of URLs looks like this:
A
D
E
G
H
The new catches look like this:
A
B
C
D
F
H
I
I need a way to take B, C, and I from the new list and put them in a separate list, as these are the links I need to check individually...
Bah, this is doing my head in.
And the letters don't reflect anywhere near the actual number of links I'm working with... I was on this for 20 minutes and already got 2,500 links :wtf:
Of which 800 or so are new, and those are the ones I need to put somewhere else...
Hmm, gonna try finding some spreadsheet formulas, as I've already been using a spreadsheet to remove duplicates.
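I half-remember that COUNTIF might do it. Something like this next to each new URL, with the old list in column A of Sheet1 and the new list in column A of the sheet the formula sits on:
=COUNTIF(Sheet1!$A:$A,A1)
A result of 0 would mean that URL isn't in the old list yet. No promises I've got the ranges right, so treat it as a rough idea.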
jdat:
quote: Originally posted by josh4
You're doing this in Excel? Are the pages of URLs different Excel files?

I'm generating the lists from a website with an external app that extracts all the URLs.
Then I paste everything into Excel just to clean up... it's really just straightforward text, no link name, only the hyperlink.
I suck at Excel formulas (forgot them all), and I'm using CSVed to remove dups... yeah, n00b :(