
[VB.NET express 2005] - Regular expressions

Author
14 Jan 2006 11:25 AM
Movie_Maniac®
Hi

I'm writing a small app and I need to parse some HTML using regular
expressions.

I wrote this piece of code:

Private Function findImgTags(ByVal strHTML As String) As Object
    Dim MatchObj As Match = Regex.Match(strHTML, "<img[^>]*>", RegexOptions.IgnoreCase)
    Return MatchObj
End Function

This should return an object containing all the img tags, but unfortunately
it returns only the first one :(

Also, I need to parse the whole HTML and keep only those img tags that
contain the string 'edit', for example:

<img src="myimage.gif" title="edit" border="0">

How can I do this?

Thanks in advance

Author
14 Jan 2006 11:47 AM
Herfried K. Wagner [MVP]
"Movie Maniac®"
<isotopiPUSSAVIABRUTTOSPI***@PUSSAVIABRUTTOSPIDERhotmail.com> wrote:
> Private Function findImgTags(ByVal strHTML As String) As Object
>    dim MatchObj As Match = Regex.Match(strHTML, "<img[^>]*>",
> RegexOptions.IgnoreCase)
>    return MatchObj
> End Function
>
> This should return an object containing all the img tags, unfortunately
> returns only the first img :(

Use 'Regex.Matches' instead of 'Regex.Match'.

--
M S   Herfried K. Wagner
M V P  <URL:http://dotnet.mvps.org/>
V B   <URL:http://classicvb.org/petition/>
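A minimal sketch of that suggestion (the module name and sample HTML are made up for illustration). Note that `Regex.Matches` returns a `MatchCollection` rather than an array; each `Match` in it exposes the matched tag through its `Value` property.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ImgTagDemo
    ' Returns every <img ...> tag in the HTML (case-insensitive), not just the first one.
    Function FindImgTags(ByVal strHTML As String) As MatchCollection
        Return Regex.Matches(strHTML, "<img[^>]*>", RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Dim html As String = "<p><img src=""a.gif""> text <IMG src=""b.gif"" title=""edit""></p>"
        Dim tags As MatchCollection = FindImgTags(html)
        Console.WriteLine(tags.Count) ' 2
        ' Prints <img src="a.gif"> and then <IMG src="b.gif" title="edit">
        For Each m As Match In tags
            Console.WriteLine(m.Value)
        Next
    End Sub
End Module
```

If an actual array is needed, `MatchCollection.CopyTo` can copy the matches into a `Match()` array sized to `Count`.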
Author
14 Jan 2006 12:48 PM
Movie_Maniac®
Herfried K. Wagner [MVP] wrote:
> "Movie Maniac®"
> <isotopiPUSSAVIABRUTTOSPI***@PUSSAVIABRUTTOSPIDERhotmail.com> wrote:
>
>> Private Function findImgTags(ByVal strHTML As String) As Object
>>    dim MatchObj As Match = Regex.Match(strHTML, "<img[^>]*>",
>> RegexOptions.IgnoreCase)
>>    return MatchObj
>> End Function
>>
>> This should return an object containing all the img tags,
>> unfortunately returns only the first img :(
>
>
> Use 'Regex.Matches' instead of 'Regex.Match'.

OK, so this should return an array of matches, right?
That solves my first problem.

But how can I retrieve only the img tags that contain the string 'edit'?
I know this is more of a regex question, but I hope someone can help :)

Thanks
Author
14 Jan 2006 7:25 PM
Homer J Simpson
"Movie Maniac®"
<isotopiPUSSAVIABRUTTOSPI***@PUSSAVIABRUTTOSPIDERhotmail.com> wrote in
message news:43c8f248$0$1085$4fafbaef@reader1.news.tin.it...

> But, how can I retrieve only those img tag containing the 'edit' string?
> I know this is more a matter of reg_exp, but hope someone can help :)

Can you post a couple of lines?
Author
15 Jan 2006 12:01 AM
Movie_Maniac®
Homer J Simpson wrote:
> "Movie Maniac®"
> <isotopiPUSSAVIABRUTTOSPI***@PUSSAVIABRUTTOSPIDERhotmail.com> wrote in
> message news:43c8f248$0$1085$4fafbaef@reader1.news.tin.it...
>
>
>>But, how can I retrieve only those img tag containing the 'edit' string?
>>I know this is more a matter of reg_exp, but hope someone can help :)
>
>
> Can you post a couple of lines?


A couple of lines of what? VB code? HTML code?

Well, for VB, here you go:

Private Function findImgTags(ByVal strHTML As String) As String
    ' Returns the first img tag containing "edit", or Nothing if there is none
    Dim imgTags As MatchCollection = Regex.Matches(strHTML, "<img[^>]*>", RegexOptions.IgnoreCase)
    For Each m As Match In imgTags
        ' Case-insensitive substring test, like InStr(..., CompareMethod.Text)
        If m.Value.IndexOf("edit", StringComparison.OrdinalIgnoreCase) >= 0 Then
            Return m.Value
        End If
    Next
    Return Nothing
End Function

If you meant HTML code, imagine an HTML page with a series of
images, like this:

<img src="image1.gif" title="join">
<img src="image1.gif" title="leave">
<img src="image1.gif" title="add">
<img src="image1.gif" title="edit">
<img src="image1.gif" title="delete">

I would like to retrieve just the last one. As you can see, I retrieve
all the images and then loop over the results to find the right one,
but I know I could avoid this with the proper regular expression.
The question is: how?

Thanks a lot
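The substring check in the loop above can be folded into the pattern itself. A sketch, assuming that (as in that loop) any img tag with "edit" anywhere inside it should count: the pattern `<img[^>]*edit[^>]*>` only matches when "edit" appears before the tag's closing `>`, because `[^>]` can never step over a `>`, so a match never spans two tags. The function name `FindEditImgTag` is hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EditImgDemo
    ' Returns the first <img> tag containing "edit" anywhere inside it, or Nothing.
    ' [^>]* cannot run past the closing ">", so the match stays within one tag.
    Function FindEditImgTag(ByVal strHTML As String) As String
        Dim m As Match = Regex.Match(strHTML, "<img[^>]*edit[^>]*>", RegexOptions.IgnoreCase)
        If m.Success Then
            Return m.Value
        End If
        Return Nothing
    End Function

    Sub Main()
        Dim html As String = _
            "<img src=""image1.gif"" title=""join"">" & Environment.NewLine & _
            "<img src=""image1.gif"" title=""edit"">" & Environment.NewLine & _
            "<img src=""image1.gif"" title=""delete"">"
        Console.WriteLine(FindEditImgTag(html)) ' <img src="image1.gif" title="edit">
    End Sub
End Module
```

Like the loop, this also accepts tags where "edit" shows up somewhere other than the title, e.g. `src="editor.gif"` or `title="credit"`.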
Author
15 Jan 2006 2:10 AM
Homer J Simpson
"Movie Maniac®"
<isotopiPUSSAVIABRUTTOSPI***@PUSSAVIABRUTTOSPIDERhotmail.com> wrote in
message news:43C990DF.6030109@PUSSAVIABRUTTOSPIDERhotmail.com...

> <img src="image1.gif" title="join">
> <img src="image1.gif" title="leave">
> <img src="image1.gif" title="add">
> <img src="image1.gif" title="edit">
> <img src="image1.gif" title="delete">
>
> I would like to retrieve just the last one. As you can see, I retrieve all
> the images, and then do a loop on the object to find the right one. But I
> know I can avoid this with the proper regular expression.
> The question is: how can I do?

To match

<img src="image1.gif" title="delete">

anywhere in the line you need something like

\<img src=[^>]* title=\"delete\"\>

Can't say this is exactly right but it should be close.
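Adapting that pattern to the "edit" image, here is a slightly more tolerant sketch (an assumption on my part, only checked against the sample tags below): it requires the title to be exactly `edit`, so `src="editor.gif"` or `title="credit"` won't match, and it allows other attributes before or after `title`, extra whitespace around `=`, and single or double quotes. Inside a VB string literal a double quote is written `""`, so the `\"` escapes from the pattern above aren't needed. The module and function names are hypothetical, and a regex is still not an HTML parser: a `>` inside an attribute value will defeat it.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module TitleEditDemo
    ' Matches an <img> tag whose title attribute is exactly "edit" (case-insensitive),
    ' wherever title appears among the other attributes.
    ' [""'] is the character class ["'] written inside a VB string literal.
    Private ReadOnly EditImg As New Regex( _
        "<img\b[^>]*\btitle\s*=\s*[""']edit[""'][^>]*>", RegexOptions.IgnoreCase)

    Function FindTitleEditImgs(ByVal strHTML As String) As List(Of String)
        Dim result As New List(Of String)()
        For Each m As Match In EditImg.Matches(strHTML)
            result.Add(m.Value)
        Next
        Return result
    End Function

    Sub Main()
        Dim html As String = _
            "<img src=""editor.gif"" title=""join"">" & _
            "<img title='edit' border=""0"" src=""image1.gif"">" & _
            "<IMG src=""image1.gif"" title = ""edit"">" & _
            "<img src=""image1.gif"" title=""credit"">"
        ' Prints the second and third tags only
        For Each tag As String In FindTitleEditImgs(html)
            Console.WriteLine(tag)
        Next
    End Sub
End Module
```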