A question about a failing regular expression

Hello Everyone,

My application needs to parse some HTML. As is usual in HTML parsing, I just need the data between two HTML tags. So here is my regular expression:

    Dim myRegex2 = New Regex("<td headers=""re2 e1"" align=""right"" valign=""bottom"">" & _
                             "((.|\n)*?)<sup>", RegexOptions.IgnoreCase)

Now, this is supposed to get the text between the <td headers...> tag and the <sup> tag. But instead, it returns the entire tag, including all of the attributes. What am I doing wrong?

Thanks!

Sorry, I forgot to add that I am also doing the required

    myMatch = myRegex2.Match(sContent)

after the expression, thereby performing the match against the string sContent.

The expression is returning what you have asked for: the whole match, opening tag included. Maybe not what you are interested in, but what you have asked for.

You need to look at what the author of my favorite reference (Balena) calls "zero-width positive/negative look-ahead/look-behind assertions". These are "grouping constructs". Maybe you could use a "noncapturing group"; I don't think I've used that construct myself. (I'd like to be more specific, but I am at the wrong computer at the moment.)

ALSO ... do yourself a favor and get a FREE product named Expresso from Ultrapico. It is WONDERFUL for developing regular expressions. Regular expressions are very useful but not very intuitive.

Ask if you have further questions.

Good Luck,
Bob
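A minimal sketch of Bob's look-around suggestion, assuming the cell markup is exactly as in Anthony's pattern. The opening tag goes inside a zero-width look-behind, (?<=...), and <sup> inside a zero-width look-ahead, (?=...), so neither delimiter ends up in Match.Value. RegexOptions.Singleline is swapped in for the (.|\n) alternation so that "." also matches newlines. The module name, the ExtractCellText helper, and the sample string are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookaroundSketch
    ' Hypothetical helper: returns only the text between the opening <td ...> tag
    ' and the next <sup> tag, or Nothing if no such cell is found.
    Function ExtractCellText(ByVal html As String) As String
        ' (?<=...) and (?=...) must match but consume no characters,
        ' so the delimiters are checked without becoming part of Match.Value.
        Dim pattern As String = _
            "(?<=<td headers=""re2 e1"" align=""right"" valign=""bottom"">)" & _
            ".*?" & _
            "(?=<sup>)"

        ' Singleline makes "." match newlines too, replacing the (.|\n) alternation.
        Dim m As Match = Regex.Match(html, pattern, _
                                     RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        If m.Success Then
            Return m.Value
        End If
        Return Nothing
    End Function

    Sub Main()
        ' Hypothetical sample cell; real pages will differ.
        Dim sample As String = _
            "<td headers=""re2 e1"" align=""right"" valign=""bottom"">1,234<sup>1</sup></td>"
        Console.WriteLine(ExtractCellText(sample)) ' prints 1,234
    End Sub
End Module
```

Like the original, this pattern is brittle: reordered attributes or extra whitespace inside the <td> tag will make it miss the cell.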
<snip>
> Now, this is suppose to get the text between the <td headers tag> and
> the <sup> tag. But, instead, it returns the entire tag including all
> of the attributes. What am I doing wrong?

You probably figured it out at this point, but it seems you need to retrieve the grouped text from the Match's Groups property. The Groups collection is 0-based, but item 0 is the full matched text, so you need to retrieve Groups(1):

<example>
    Dim M As Match = myRegex2.Match(sContent)
    Do While M.Success
        ' Groups(1) holds only the text captured by ((.|\n)*?)
        Dim Text As String = M.Groups(1).Value
        '...
        'Do something with Text
        '...
        M = M.NextMatch()
    Loop
</example>

HTH

Regards,

Branco
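To see why Anthony got the whole tag back, the sketch below runs his original pattern, unchanged, on a hypothetical sample cell and prints both Groups(0), which is the same text as Match.Value, and Groups(1). The module name, the MatchCell helper, and the sample string are illustrative assumptions.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GroupsSketch
    ' Hypothetical helper: applies Anthony's original pattern, unchanged.
    Function MatchCell(ByVal sContent As String) As Match
        Dim myRegex2 As New Regex("<td headers=""re2 e1"" align=""right"" valign=""bottom"">" & _
                                  "((.|\n)*?)<sup>", RegexOptions.IgnoreCase)
        Return myRegex2.Match(sContent)
    End Function

    Sub Main()
        ' Hypothetical sample cell; real pages will differ.
        Dim m As Match = MatchCell( _
            "<td headers=""re2 e1"" align=""right"" valign=""bottom"">1,234<sup>1</sup></td>")
        If m.Success Then
            ' Groups(0) equals m.Value: the whole match, both delimiters included.
            Console.WriteLine(m.Groups(0).Value)
            ' Groups(1) is the first capturing group: only the text between the tags.
            Console.WriteLine(m.Groups(1).Value)
        End If
    End Sub
End Module
```

With this sample, the first line printed runs from the opening <td ...> tag through <sup>, and the second is just 1,234. The inner (.|\n) is capturing group 2 and keeps only the last character it matched ("4" here); writing it as a noncapturing group, (?:.|\n), leaves Groups(1) as the only capture.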
> You probably figured it out at this point, but it seems you need to
> retrieve the grouped text from the Match's Groups property (the groups
> collection is 0 based, but the 0th item is the full matched text, thus
> you need to retrieve group(1):
<snip>

Hi Branco,

No, I hadn't figured it out yet, and I thank you for your help. I saw something about the match's groups the other day, but it didn't click that that was what I needed. Thank you, sir!

Anthony