string extraction

I have a file containing some commands in free format. Each command is terminated with ";". The ";" can also be found within the command but, only enclosed within delimiters (' or "").

Example:

INSERT INTO nation (code, name) VALUES(700448768,
"za; sdfhsd''"sdfa");

INSERT INTO nation (code, name)

VALUES(701464576, 'msd; vasdvas ""hjh"" u');

My question is: what is the best code to extract these commands, one at a time? The result should be (2 commands):

INSERT INTO nation (code, name) VALUES(700448768, "za; sdfhsd'"sdfa");
INSERT INTO nation (code, name) VALUES(701464576, 'msd; vasdvas ""hjh"" u');

I was thinking about regex, but it may be tricky to find the right one. Any ideas?

-tom

You should take a look at parsing algorithms related to theory
surrounding compilers.

There's a common function found in many programming languages called Split or Tokenize that lets you specify a delimiting character and returns some sort of collection of objects. Something like:

Array arrayCommands = Split(stringCommands, ";")

And then you would do something like:

foreach(String command in arrayCommands)
{
    if(command ends with a ", then there was a quoted ';')
    {
        // so we add the quoted ; back in and combine again with the next command,
        // which is really part of this command and shouldn't have been split up
        command = command + ';' + (next command in array)
        delete next command in array
    }
}

This is just pseudocode of course. (next command in array) could be found by getting the index of the current command, adding one, and indexing into the array. You'll need to find out what VB.NET's tokenize or split function is and how it works. I'm sure there is something like that.

Hi snozz,
thank you very much for your advice: I will look for these functions. I am not clear about the logic you kindly suggest, and I have a question.

When I wrote:

<< The ";" can also be found within the command but, only enclosed within delimiters (' or "") >>

I meant something like, for instance:

1. " ;; some string containing; semicolon; within "

not necessarily something like:

2. " ";"';' some string "";"" containing semicolon ... "

I have the impression that you are assuming situation 2 and not 1. Is that so, or am I missing something?

Another point is that the file can be several gigabytes, so I need some kind of "buffered" logic. But I guess I could read a bunch of lines at a time.

-tom

It might be worth trying using Regex, but your delimiters don't seem to
have any symmetry. In this line, for instance:

INSERT INTO nation (code, name) VALUES(700448768, "za; sdfhsd''"sdfa");

there are 3 double quotes, not 4 as one would expect. You seem to be opening with a double quote and closing with a single quote. So, I couldn't get far with constructing a Regex.

Hi Cerebrus,
what I mean is that strings follow exactly the same rules as in VB.NET or SQL. The string

"za; sdfhsd''"sdfa"

in the command you refer to is OK because the string content:

<za; sdfhsd''"sdfa>

is meant to be rendered as:

<za; sdfhsd''sdfa>

that is, the double quotes "" that are within the string are rendered as single quotes, just the same as in VB.NET.

You are however right about example 2. It should have been:

2. " "";""';' some string "";"" containing semicolon ... "

Yes, I have often tried to use regex, but it's complicated to deal even with simple cases of quotes enclosed within quotes.

------------------------

Put simply, my question is: how do I extract commands of the type

myCommand ;

Each command ends where a ; (not enclosed in a string) is found. The commands are freely placed within a very large file. myCommand can contain strings which themselves contain the semicolon char. Strings can be delimited by either " or ' and can contain the delimiter char internally. In such a case the delimiter is doubled (as in VB.NET, SQL, ...) and will be rendered as a single char.

-tom

Your best option is probably to use the .IndexOf method on the total char
string looking for " and ;. Flags can tell you when to skip the ; enclosed in ""'s
--
Dennis in Houston

"tommaso.gasta***@uniroma1.it" wrote:
> I have a file containing some commands in free format. Each
> command is terminated with ";". The ";" can also be found within the
> command but, only enclosed within delimiters (' or "").
> Example:
>
> INSERT INTO nation (code, name) VALUES(700448768,
> "za; sdfhsd''"sdfa");
>
> INSERT INTO nation (code, name)
>
> VALUES(701464576, 'msd; vasdvas ""hjh"" u');
>
> My question is: what is the best code to extract, one at a time, these
> commands.
> The result should be (2 commands):
> INSERT INTO nation (code, name) VALUES(700448768, "za; sdfhsd'"sdfa");
> INSERT INTO nation (code, name) VALUES(701464576, 'msd; vasdvas ""hjh""
> u');
> I was thinking about regex, but it may be tricky to find the right one.
> Any ideas?
>
> -tom

Thanks Dennis,
Actually I am not completely persuaded it can be done that way in general, as you could have something like:

.... (" my preferred keywords ""work; work ; work"" ") ;

Hmm, I am afraid that all chars must be parsed, so that flags can distinguish when a ; occurs within string delimiters and when it is instead a command separator ....

-tom

Since compilers already deal with this quite efficiently, then you
really will find solid practical algorithms if you look at some of the theory that addresses programming languages, syntax, and parsing. You might want to try a search for "recursive descent parser". I think your type of parsing falls under "lexical analysis", although it might be "syntax analysis".
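
The flag-based, character-by-character scan discussed in this thread can be sketched roughly as follows. This is only an illustration in Python, not VB.NET, though the same loop should port to VB.NET without much change. The names iter_commands and chunk_size are made up for the example. It assumes the input really follows the doubling rule tom describes: under that rule a doubled delimiter simply closes the string and reopens it at once, so a single "inside a string" flag is enough and no lookahead across buffer boundaries is needed. Note that the first example as posted has three double quotes, so it would not split as intended; the sample below doubles the inner quote.

```python
import io


def iter_commands(stream, chunk_size=64 * 1024):
    """Yield each ';'-terminated command from a text stream, one at a time.

    Hypothetical helper for illustration. Reads the stream in chunks so a
    very large file never has to be held in memory as a whole.
    """
    quote = None   # the open string delimiter (' or "), or None outside a string
    parts = []     # characters of the command collected so far
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for ch in chunk:
            if quote is None:
                if ch in ("'", '"'):
                    quote = ch                 # a string starts here
                elif ch == ";":
                    parts.append(ch)           # unquoted ';' ends the command
                    yield "".join(parts).strip()
                    parts = []
                    continue
            elif ch == quote:
                # Closes the string. A doubled delimiter reopens it on the
                # very next character, so the flag ends up correct again.
                quote = None
            parts.append(ch)
    # Assumption: trailing text with no terminating ';' is returned as-is.
    tail = "".join(parts).strip()
    if tail:
        yield tail


sample = (
    'INSERT INTO nation (code, name) VALUES(700448768,\n'
    '"za; sdfhsd\'\'""sdfa");\n'
    '\n'
    'INSERT INTO nation (code, name)\n'
    'VALUES(701464576, \'msd; vasdvas ""hjh"" u\');\n'
)

# A tiny chunk size is used here only to show that chunk boundaries do not matter.
commands = list(iter_commands(io.StringIO(sample), chunk_size=7))
for command in commands:
    print(repr(command))
```

For a real file, an open text file object could be passed in place of the io.StringIO wrapper. Line breaks inside a command are kept as they appear in the file; only leading and trailing whitespace is trimmed.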