
[Regular Expression] extraction when bounds are vbCr and vbLf

Author
5 Sep 2006 4:52 PM
teo
Hello

I need to extract a substring from a text.
The substring must contain a given word.

The substring is bounded by any of:

vbCr     (carriage return)
vbLf     (line feed)
vbCrLf   (carriage return + line feed)
the very beginning of the text
the very end of the text


I tried using:

^
\n
\r
$

so that I had:

Dim myText As String
Dim myPattern As String = "^\n\r" & myWord & "\n\r$"

Dim match As Match = Regex.Match(myText, myPattern, _
    RegexOptions.Multiline Or RegexOptions.IgnoreCase)

but it did not work as I expected.
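A minimal sketch of one way to get the behaviour described above, assuming the goal is to return every line that contains myWord. The module and function names are hypothetical. Instead of anchors, it uses the character class [^\r\n], which cannot cross a line break, so each match automatically stops at vbCr, vbLf, vbCrLf, or the start and end of the text. (The pattern above instead requires a vbLf followed by a vbCr, which is the reverse of vbCrLf, placed directly against the word, leaving no room for the rest of the line.)

```vbnet
Imports Microsoft.VisualBasic
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Module LineExtractor

    ''' <summary>
    ''' Hypothetical helper: returns every line of myText that contains myWord.
    ''' A "line" is any run of characters bounded by vbCr, vbLf, vbCrLf,
    ''' or the start/end of the text.
    ''' </summary>
    Public Function ExtractLines(ByVal myText As String, ByVal myWord As String) As List(Of String)
        ' [^\r\n]* can never cross a line break, so no ^/$ anchors and no
        ' Multiline option are needed. Regex.Escape treats the word literally.
        Dim myPattern As String = "[^\r\n]*" & Regex.Escape(myWord) & "[^\r\n]*"
        Dim lines As New List(Of String)()
        For Each m As Match In Regex.Matches(myText, myPattern, RegexOptions.IgnoreCase)
            lines.Add(m.Value)
        Next
        Return lines
    End Function

End Module
```

For example, ExtractLines("alpha" & vbCr & "beta DIO", "DIO") should return a list holding just "beta DIO". Because the regex engine tries start positions from left to right, the first successful start on a line is the beginning of that line, so the whole line is returned.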

Author
5 Sep 2006 6:04 PM
Chris
Try this, where Text holds the word you are looking for and SubjectString is the text to search:

Dim FoundMatch As Boolean
Try
    FoundMatch = Regex.IsMatch(SubjectString, "^.*" & Regex.Escape(Text) & ".*$", _
        RegexOptions.Multiline)
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try
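Regex.IsMatch only reports whether some line contains the word. If you also need the line itself, a sketch along the same lines (the FindFirstLine helper is hypothetical) could call Regex.Match and read Match.Value instead. One caveat: with RegexOptions.Multiline, .NET's ^ and $ treat only vbLf as a line boundary, so with vbCrLf endings the value ends with a stray vbCr (trimmed below), and text that uses bare vbCr is seen as one long line.

```vbnet
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Public Module FirstLineFinder

    ''' <summary>
    ''' Hypothetical helper: returns the first line containing word,
    ''' or Nothing if no line contains it.
    ''' </summary>
    Public Function FindFirstLine(ByVal text As String, ByVal word As String) As String
        ' In Multiline mode $ matches just before a vbLf, so "." can still
        ' consume the vbCr of a vbCrLf pair; it is trimmed off below.
        Dim pattern As String = "^.*" & Regex.Escape(word) & ".*$"
        Dim m As Match = Regex.Match(text, pattern, _
            RegexOptions.Multiline Or RegexOptions.IgnoreCase)
        If Not m.Success Then Return Nothing
        Return m.Value.TrimEnd(ControlChars.Cr)
    End Function

End Module
```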

HTH

Chris

"teo" <t**@inwind.it> wrote in message
news:gparf293biklfnpte16i7707ghi3n83v8i@4ax.com...
Author
5 Sep 2006 7:05 PM
Chris
Hi Teo,

Just to clarify, are you trying to find all the lines in a given file that
contain a particular word?

What does your data look like? Are these strictly text files? Can you give
me an example that I can test on? Wherever there is a vbLf, vbCr, or
vbCrLf, just make a note of it, like this:

This is some text VbCrLf
that I want to test VbCrLf
against.
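If the goal is simply to find whole lines that contain a word, regular expressions are not strictly required. A sketch that swaps in System.IO.StringReader: its ReadLine method treats vbLf, vbCr, and vbCrLf as line terminators and also returns the last line when it has no terminator. The helper name LinesContaining is hypothetical; the check is case-insensitive to mirror the RegexOptions.IgnoreCase used in the question.

```vbnet
Imports Microsoft.VisualBasic
Imports System
Imports System.Collections.Generic
Imports System.IO

Public Module ReadLineFilter

    ''' <summary>
    ''' Hypothetical helper: returns the lines of text that contain word
    ''' (case-insensitive), without using regular expressions.
    ''' </summary>
    Public Function LinesContaining(ByVal text As String, ByVal word As String) As List(Of String)
        Dim result As New List(Of String)()
        ' ReadLine splits on vbLf, vbCr, or vbCrLf and returns Nothing
        ' once the end of the text has been reached.
        Using reader As New StringReader(text)
            Dim line As String = reader.ReadLine()
            Do While line IsNot Nothing
                If line.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0 Then
                    result.Add(line)
                End If
                line = reader.ReadLine()
            Loop
        End Using
        Return result
    End Function

End Module
```

With the sample text above built as "This is some text" & vbCrLf & "that I want to test" & vbCrLf & "against.", LinesContaining(sample, "test") should return only "that I want to test".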

Regulazy or the Regulator by Roy Osherove might help as well.
http://tools.osherove.com/

Chris

Author
5 Sep 2006 10:31 PM
teo
I uploaded a zip file (2 KB) containing an .rtf file
with an explanation and a sample,
here:
http://www.zshare.net/download/regexsmp-zip.html
(no Java required)



Author
6 Sep 2006 1:23 AM
Chris
Hi Teo,

Thanks for posting that. It helped a lot.


Try the following code:

Imports System.Text.RegularExpressions
Imports System.Windows.Forms
Imports System.IO
Public Module Module1

    Public Sub main()
        Dim fileName As String = InputBox("Give me the file to parse", _
                                          "File name input box")
        CheckContents(fileName)

    End Sub

    ''' <summary>
    ''' Check the contents of a file
    ''' </summary>
    ''' <param name="Filename"></param>
    ''' <remarks>Could be expanded to check against multiple
    ''' keywords by adding another argument that contains the
    ''' keyword and inserting it in place of the DIO characters</remarks>
    Public Sub CheckContents(ByVal Filename As String)

        'Declare RegExp
        Dim dioRegex As New Regex(".*DIO.*(\n|\r|\r\n)", _
                                  RegexOptions.IgnoreCase)

        'Make sure the file is really there
        Dim fileExists As Boolean
        fileExists = My.Computer.FileSystem.FileExists(Filename)

        'Throw exception if the file is not there
        If Not fileExists Then Throw New FileNotFoundException

        'Get the contents of the file
        Dim fileContents As String
        fileContents = My.Computer.FileSystem.ReadAllText(Filename)

        'Check File Contents Against Regex
        Dim dioMatches As MatchCollection = dioRegex.Matches(fileContents)

        'Loop through all of the matches and do something cool with them
        For Each dioMatch As Match In dioMatches

            'Your cool code goes here :o)

            'I'm just going to print the results to a messagebox
            MsgBox(dioMatch.Value)

        Next

    End Sub
End Module


Please keep in mind that some of the RTF formatting codes are still in the
output. I didn't know whether you wanted them kept, but you should be able to
strip out the /p and other code combinations easily using str.Replace(oldValue,
newValue), where str holds your data.
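A rough sketch of the String.Replace approach, assuming you know which RTF control words appear in your files (RTF control words start with a backslash, for example \par). The list below is hypothetical; real files will contain others. This only removes exact text and does not understand RTF groups, so for anything beyond simple files, loading the RTF into a Windows Forms RichTextBox through its Rtf property and reading its Text property is likely more reliable.

```vbnet
Public Module RtfCleanup

    ' Hypothetical list: which control words appear depends on the program
    ' that produced the RTF. "\pard" comes before "\par" so the shorter
    ' replacement does not leave a stray "d" behind.
    Private ReadOnly ControlWords As String() = New String() {"\pard", "\par", "\plain"}

    Public Function StripControlWords(ByVal data As String) As String
        Dim result As String = data
        For Each controlWord As String In ControlWords
            ' In VB string literals the backslash is an ordinary character.
            result = result.Replace(controlWord, String.Empty)
        Next
        Return result
    End Function

End Module
```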

Best regards,

Chris


"teo" <t**@inwind.it> wrote in message
news:ljurf21nv19sgmncmaca6i5fobrot9pv4r@4ax.com...
Author
6 Sep 2006 2:33 PM
teo
I ran a few tests and hit one problem:

the last sentence is never matched.

(That is,
if the word is in the last sentence,
I'm not able to extract that sentence,
while if it is in the first sentence, everything works.)
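The likely cause: the pattern from the earlier reply makes the line break mandatory, and the last line of a file usually has no vbCr or vbLf after it. A sketch comparing that pattern with a variant that also accepts the end of the text ($) as a terminator; the module and constant names are hypothetical. The variant also uses [^\r\n]* instead of .*, because in .NET "." matches vbCr, so .* could otherwise run past a bare vbCr.

```vbnet
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Public Module LastLineDemo

    ' Pattern from the earlier reply: a line break is required after the word.
    Public Const OriginalPattern As String = ".*DIO.*(\n|\r|\r\n)"

    ' Sketch: [^\r\n]* keeps each match inside one line, \r\n is tried first
    ' so a CR/LF pair is consumed whole, and $ lets the final line match.
    Public Const TerminatorOrEndPattern As String = "[^\r\n]*DIO[^\r\n]*(\r\n|\r|\n|$)"

    Public Function CountMatches(ByVal text As String, ByVal pattern As String) As Integer
        Return Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count
    End Function

End Module
```

With the sample text "a DIO" & vbCrLf & "b" & vbCrLf & "c DIO", the original pattern should find 1 match and the variant 2.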




Author
6 Sep 2006 7:29 PM
Chris
Hi Teo,

I missed the case where the line is not followed by a line feed, carriage
return, or the combination of the two (that is, the last line of the file).

Try replacing the dioRegex, in the CheckContents sub, with the following:

        Dim dioRegex As New Regex(".*DIO.*((\n|\r|\r\n)|.*)", _
                                  RegexOptions.IgnoreCase)
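This revised pattern works because the final .* alternative can match the empty string, which makes the line break optional. A small sketch (the GetDioLines helper is hypothetical) that collects the matches and trims the trailing vbCr/vbLf that each match except the last may still carry:

```vbnet
Imports Microsoft.VisualBasic
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Module DioLines

    Public Function GetDioLines(ByVal fileContents As String) As List(Of String)
        ' Revised pattern from the reply: the ".*" alternative matches the
        ' empty string, so the last line (no break after it) matches too.
        Dim dioRegex As New Regex(".*DIO.*((\n|\r|\r\n)|.*)", RegexOptions.IgnoreCase)
        Dim lines As New List(Of String)()
        For Each dioMatch As Match In dioRegex.Matches(fileContents)
            ' Remove the line break that is still part of the match.
            lines.Add(dioMatch.Value.TrimEnd(ControlChars.Cr, ControlChars.Lf))
        Next
        Return lines
    End Function

End Module
```

For "a DIO" & vbCrLf & "b" & vbCrLf & "c DIO" this should return "a DIO" and "c DIO".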

Hope that helps,

Chris


"teo" <t**@inwind.it> wrote in message
news:t1ntf2divnv4c9s6r3slb95nnirin3aq1p@4ax.com...
Author
8 Sep 2006 1:11 PM
teo
It all seems OK now.
Thanks.