Author
31 Dec 2006 11:10 AM
Just Me
This is a bunch of bull cr*p.  I have tried copying tables out on the web
and there are so many variations that it's not feasible to write a single
regex for every situation.

So, I give up.

Author
31 Dec 2006 3:17 PM
rdrunner
Hello...

Please try to keep your related posts together in one thread ;)

And now a suggestion:

Try the HTML DOM and look at the tags there. Each node has an innerText
property, which can be used to extract the text from any HTML node or even
the whole document. Or you can walk all the tables, table rows, or table
data cells and extract the information from there. But scraping information
from websites is usually quite hard ;)
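
To make the "walk the tables, rows, and cells" idea concrete, here is a minimal sketch in Python using only the standard library's html.parser rather than the browser DOM. The names TableTextParser and extract_tables are made up for illustration, and the sketch assumes cells are properly closed and tables are not nested.

```python
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by row and table."""

    def __init__(self):
        super().__init__()
        self.tables = []   # list of tables; each table is a list of rows
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag names already lowercased.
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html):
    """Return every table in the markup as a list of rows of cell text."""
    parser = TableTextParser()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Inline markup inside a cell (bold, links, and so on) is dropped and only its text is kept, which is roughly what innerText gives you.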
Author
31 Dec 2006 7:01 PM
Mudhead
This will get all the tables. Set the IgnoreCase and Singleline options,
and use groups to pull out the parts you need.

<table\b.*?</table>
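
As a rough illustration of the same approach outside .NET, here is a sketch in Python, where re.IGNORECASE and re.DOTALL play the role of the IgnoreCase and Singleline options. The group names attrs and body and the function find_tables are invented for this example. Because the match is lazy, it stops at the first closing tag, so nested tables will not come out right.

```python
import re

# re.IGNORECASE and re.DOTALL stand in for .NET's IgnoreCase and Singleline.
TABLE_RE = re.compile(
    r"<table\b(?P<attrs>[^>]*)>(?P<body>.*?)</table>",
    re.IGNORECASE | re.DOTALL,
)


def find_tables(html):
    """Return (attribute string, inner HTML) for each non-nested table."""
    return [
        (m.group("attrs").strip(), m.group("body"))
        for m in TABLE_RE.finditer(html)
    ]
```

The same pattern can be reapplied to each body with tr and td in place of table to get at the rows and cells.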


Author
31 Dec 2006 11:26 PM
Hal Rosser
"Just Me" <news.microsoft.com> wrote in message
news:%23iJpLxMLHHA.1008@TK2MSFTNGP06.phx.gbl...
> This is a bunch of bull cr*p.  I have tried copying tables out on the web
> and there are so many variations that it's not feasible to write a single
> regex for every situation.
>

Well shux, why don't you just read the file one char at a time and use
"if" statements and comparison operators?
It won't be a minimal task, but it won't be that tough, either.
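
For what it's worth, a bare-bones version of that one-char-at-a-time idea might look like the Python sketch below. The function name scan_text is made up, and the sketch assumes that "<" and ">" only ever appear as tag delimiters, which is not true of comments, scripts, or attribute values containing ">".

```python
def scan_text(html):
    """Walk the markup one character at a time and return the text outside tags."""
    pieces = []     # text runs found between tags
    current = []    # characters of the run being built
    in_tag = False  # True while we are between '<' and '>'
    for ch in html:
        if ch == "<":
            in_tag = True
            # A tag is starting, so the current text run is finished.
            if current:
                pieces.append("".join(current))
                current = []
        elif ch == ">" and in_tag:
            in_tag = False
        elif not in_tag:
            current.append(ch)
    if current:
        pieces.append("".join(current))
    # Trim each run and drop the ones that were only whitespace.
    return [p.strip() for p in pieces if p.strip()]
```

A real version would also need to remember which tag it is inside so that cells can be grouped into rows and tables.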
Author
1 Jan 2007 4:28 AM
Mudhead
HTML Parser

http://www.codeproject.com/dotnet/apmilhtml.asp


Author
1 Jan 2007 6:22 AM
Cor Ligthert [MVP]
Just Me,

Why use Regex at all? MSHTML makes it much easier to get information out of
web documents. Be aware that a page can consist of several documents (frames).

http://www.vb-tips.com/dbpages.aspx?ID=541adf13-d9c0-435c-893f-56dbb63fdf1c

Be aware that our website is under heavy reconstruction these weeks.

I hope this helps,

Cor
