Author
27 May 2006 2:11 PM
graphicsxp
Hi,

I have to update a text field of a SQL table. For that I retrieve the
records in a dataset and I loop through the rows of the dataset. The
field is a very long text string that basically contains the source
code of HTML pages. I need to do the following on each record:

1. Find the <body> tag in the string. The problem is that the body tag
could look like:
<body onload="some func" ....> so I need to find <body and then the
next closing '>'.

2. Find the </body> tag in the string.

3. Get rid of everything outside the <body></body> tags in order to
retain only the content of the HTML page.

How could I do that in VB.NET?

Author
27 May 2006 3:01 PM
IdleBrain
Hi,
You should be able to accomplish your task by using code like this
(strhtml stands for the string variable holding the page source):
intbodytagstart = strhtml.IndexOf("<body", intsearchstartlocation)
intbodytagend = strhtml.IndexOf(">", intbodytagstart)
intbodyend = strhtml.IndexOf("</body>", intbodytagend)
' The length runs from just after the opening tag's ">" up to "</body>"
strbody = strhtml.Substring(intbodytagend + 1, intbodyend - intbodytagend - 1)

Worth giving it a try.
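
For what it's worth, here is a slightly fuller sketch of the same IndexOf/Substring idea, wrapped in a function. The names ExtractBody and BodyExtractor are made up for illustration, and it assumes .NET 2.0 or later for the StringComparison overloads. It matches <BODY> as well as <body> and returns Nothing when a tag is missing instead of throwing. It still assumes there is no ">" inside an attribute value of the body tag.

```vb
Imports System

Module BodyExtractor
    ' Hypothetical helper: returns the text between the opening <body ...> tag
    ' and </body>, or Nothing when either tag cannot be found.
    Function ExtractBody(ByVal html As String) As String
        If html Is Nothing Then Return Nothing

        ' Case-insensitive search, since stored pages may use <BODY>.
        Dim tagStart As Integer = html.IndexOf("<body", StringComparison.OrdinalIgnoreCase)
        If tagStart < 0 Then Return Nothing

        ' The first ">" after "<body" is taken as the end of the opening tag.
        Dim tagEnd As Integer = html.IndexOf(">"c, tagStart)
        If tagEnd < 0 Then Return Nothing

        Dim bodyEnd As Integer = html.IndexOf("</body>", tagEnd, StringComparison.OrdinalIgnoreCase)
        If bodyEnd < 0 Then Return Nothing

        ' Start just after ">", stop just before "</body>".
        Return html.Substring(tagEnd + 1, bodyEnd - tagEnd - 1)
    End Function

    Sub Main()
        Dim page As String = "<html><head><title>t</title></head>" & _
                             "<body onload=""init()""><p>Hello</p></body></html>"
        Console.WriteLine(ExtractBody(page))   ' prints <p>Hello</p>
    End Sub
End Module
```

When looping over the dataset rows, a Nothing result could be used to skip rows that have no body tag and leave the field unchanged.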
Author
27 May 2006 4:27 PM
graphicsxp
Brilliant! It works perfectly. Thank you!
Author
27 May 2006 5:47 PM
Martin Milan
Could you throw it at an XML parser?
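
A rough sketch of what that might look like with System.Xml.XmlDocument, assuming the stored pages are well-formed XML (XHTML-style). Plain HTML with unclosed tags, unquoted attributes or entities such as &nbsp; will make LoadXml throw, so this only helps if the markup is clean. The names XmlBodyExtractor and ExtractBodyXml are made up for the example.

```vb
Imports System
Imports System.Xml

Module XmlBodyExtractor
    ' Hypothetical helper: returns the inner markup of the <body> element,
    ' or Nothing when the input is not well-formed XML or has no body element.
    Function ExtractBodyXml(ByVal xhtml As String) As String
        Dim doc As New XmlDocument()
        Try
            doc.LoadXml(xhtml)
        Catch ex As XmlException
            ' Not well-formed XML; the caller can fall back to string searching.
            Return Nothing
        End Try

        ' local-name() ignores any namespace on the element.
        ' Note that XML names are case-sensitive, so <BODY> will not match.
        Dim body As XmlNode = doc.SelectSingleNode("//*[local-name()='body']")
        If body Is Nothing Then Return Nothing

        Return body.InnerXml
    End Function

    Sub Main()
        Dim page As String = "<html><head><title>t</title></head>" & _
                             "<body onload=""init()""><p>Hello</p></body></html>"
        Console.WriteLine(ExtractBodyXml(page))   ' prints <p>Hello</p>
    End Sub
End Module
```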