
Second Try - Regex Question

Author
30 Dec 2006 7:40 PM
Just Me
I need a regex to do this.


Ignore  < possibleWhiteSpace  htmlTag

Replace   whitespace anything >

With        >

Basically I need to remove anything following the html tag up to and
including the closing tag

Any help is appreciated.

Author
31 Dec 2006 9:14 AM
Spam Catcher
"Just Me" <news.microsoft.com> wrote in news:u1kfspELHHA.3552
@TK2MSFTNGP03.phx.gbl:

> I need a regex to do this.
>
>
> Ignore  < possibleWhiteSpace  htmlTag
>
> Replace   whitespace anything >
>
> With        >
>
> Basically I need to remove anything following the html tag up to and
> including the closing tag
>
> Any help is appreciated.

Hmmm are you trying to do this?

<   TAG      > becomes <   TAG>?

You can try this to match the entire tag (and the parts within the tag):

\<(?<leading>\s*)(?<tag>\w+)(?<trailing>\s*)\>

The regex above uses named groups so that you can reference parts of
the match in code. Take a look at Regex.Match and Match.Groups for details.
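
As a rough illustration of reading named groups from code, here is a minimal Python sketch. It is an assumption that Python is acceptable here: Python's `re` module spells named groups `(?P<name>...)` rather than .NET's `(?<name>...)`. The quantifiers sit inside the groups so that each name captures the whole run of characters, and `describe_tag` is a made-up helper name.

```python
import re

# Python spells named groups (?P<name>...); .NET uses (?<name>...).
# Quantifiers are inside the groups so each group captures the full run.
TAG_PATTERN = re.compile(r"<(?P<leading>\s*)(?P<tag>\w+)(?P<trailing>\s*)>")


def describe_tag(text):
    """Return the named parts of the first simple tag in text, or None."""
    match = TAG_PATTERN.search(text)
    if match is None:
        return None
    # groupdict() maps each group name to the text it captured.
    return match.groupdict()


parts = describe_tag("<   TAG      >")
print(parts["tag"])
```

The same idea should carry over to .NET by indexing the match's groups by name.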

If you want to do a pure search and replace, this should work:

RegEx.Replace(MyHTML, "(\s)*\>", ">")

(\s)* matches zero or more whitespace characters. \> matches the closing
bracket of the tag.
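
A minimal sketch of the same search and replace, written in Python on the assumption that the idea carries over unchanged. `trim_before_close` is a hypothetical helper name.

```python
import re


def trim_before_close(html):
    """Remove any whitespace that sits directly before a '>'."""
    # \s* grabs the run of whitespace; the literal > anchors it to a bracket.
    return re.sub(r"\s*>", ">", html)


print(trim_before_close("<   TAG      >"))
```

Note that this touches every `>` in the input, including one that appears in ordinary text (for example `a > b`), so it is only safe on markup where `>` occurs solely as a tag delimiter.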

I hope that's what you want.
Author
31 Dec 2006 10:08 AM
Just Me
parsing ""(\s)*>", ">")" - Too many )'s. // Error from regex.
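
The text quoted in that error suggests (this is a guess from the message alone) that the replacement argument and the closing parenthesis of the call were passed in as part of the pattern. A small Python sketch of the difference, with `compiles` as a made-up helper:

```python
import re


def compiles(pattern):
    """Return True if pattern is a valid regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Pattern text as it appears in the error message: it still carries the
# quotes, the replacement string and the closing ")" of the method call.
pasted_call_tail = r'"(\s)*>", ">")'
# Only the pattern itself.
pattern_only = r"(\s)*>"

print(compiles(pasted_call_tail))
print(compiles(pattern_only))
```

The first string ends in an unmatched `)`, which is the kind of thing a "too many )'s" message points at.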


[\w|\d]*=.*> This almost works, but it leaves out all the text contained by
the element.
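
Losing the element text is what a greedy `.*` would do: it runs to the last `>` on the line instead of the first. A minimal Python sketch, assuming a made-up sample line, compares the greedy pattern with a lazy `.*?` variant:

```python
import re

# Hypothetical input, not taken from the thread.
sample = '<p class="x">Hello</p>'

# Greedy: .* extends to the LAST ">" on the line, swallowing the text.
greedy = re.sub(r"[\w|\d]*=.*>", ">", sample)

# Lazy: .*? stops at the FIRST ">" after the "=".
lazy = re.sub(r"[\w|\d]*=.*?>", ">", sample)

print(greedy)
print(lazy)
```

Two side notes: inside a character class the `|` is a literal pipe rather than alternation, and `\d` is already covered by `\w`, so `\w*` alone would match the same attribute names.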



Author
31 Dec 2006 10:31 AM
Just Me
Done it.

[A-Za-z0-9]*=.*?>| [A-Za-z0-9]*=.*?\sp
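
For comparison, the original requirement (keep `<`, optional whitespace and the tag name, then drop everything up to `>`) can also be sketched with a capture group and a negated character class. This is not the pattern from the post. It swaps in `[^>]*` in place of `.*?`, and `strip_attributes` is a hypothetical name.

```python
import re

# Group 1 keeps "<", optional whitespace, and the tag name.
# \s[^>]* then consumes whitespace plus everything up to the closing ">".
ATTRS = re.compile(r"(<\s*\w+)\s[^>]*>")


def strip_attributes(html):
    """Drop everything between a tag name and its closing '>'."""
    # \1 puts the kept prefix back, followed by a bare ">".
    return ATTRS.sub(r"\1>", html)


print(strip_attributes('<p class="x" id="y">Hello</p>'))
```

Closing tags such as `</p>` are left alone because `\w+` cannot match the `/`. Attribute values that themselves contain `>` would still break it, which is the usual limit of handling HTML with a regex.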

