Reading XML Documents

Does anyone have code, or can anyone point to useful snippets, that would let me traverse the XML elements of an XmlDocument?

What I want to do is move through the entire document and, when I hit <table>, <tr>, or <td>, clear the attributes of these elements, and also remove all elements NOT nested inside a <table>.

I'm sure my head is going to explode soon :-(

Cheers

Now that you're starting to get on the right track, you will find that the documentation on the XmlDocument class has extensive examples of what you are looking for.

Assuming (and that might be dangerous) that the HTML source you are dealing with is well-formed XML, you can quite happily treat the source as an XmlDocument. If it is not well-formed, you are back to square one.

Just Me wrote:
> Does anyone have code or can point to useful snippets to allow me to
> traverse the xml "Elements" of an xmlDocument.
>
> What I want to do is to move through the entire document and when I hit
> <table> <tr> <td> clear the attributes for these Elements and also remove
> all Elements NOT nested inside a <table>

If you use System.Xml.XmlDocument and SelectNodes then you have a powerful tool to select the nodes you are looking for; you can then use the DOM methods to remove them. An example that removes all attributes on table, tr, and td elements looks like this:

    Imports System.Xml   ' goes at the top of the source file

    ' Load the document and print it before any changes.
    Dim XmlDoc As XmlDocument = New XmlDocument
    XmlDoc.Load("XMLFile1.xml")
    Console.WriteLine("Initial Document:")
    XmlDoc.Save(Console.Out)
    Console.WriteLine()

    ' Select every attribute node on table, tr, and td elements.
    Dim AttributesToRemove As XmlNodeList = _
        XmlDoc.SelectNodes("//table/@* | //tr/@* | //td/@*")

    ' Walk the list backwards and detach each attribute from its owner element.
    For I As Integer = AttributesToRemove.Count - 1 To 0 Step -1
        Dim Attribute As XmlAttribute = _
            CType(AttributesToRemove(I), XmlAttribute)
        Attribute.OwnerElement.RemoveAttributeNode(Attribute)
    Next

    Console.WriteLine("Changed document:")
    XmlDoc.Save(Console.Out)

Example output:

    Initial Document:
    <?xml version="1.0" encoding="ibm850"?>
    <html lang="en">
      <head>
        <title>Example</title>
      </head>
      <body>
        <table border="1" class="some-class" id="t1">
          <tbody>
            <tr class="odd">
              <td id="cell1">
              </td>
            </tr>
          </tbody>
        </table>
      </body>
    </html>

    Changed document:
    <?xml version="1.0" encoding="ibm850"?>
    <html lang="en">
      <head>
        <title>Example</title>
      </head>
      <body>
        <table>
          <tbody>
            <tr>
              <td>
              </td>
            </tr>
          </tbody>
        </table>
      </body>
    </html>

An XPath expression to select all elements inside of the document body that are not nested in a table is e.g.

    /html/body//*[not(ancestor-or-self::table)]

--
Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/

Brilliant, Martin!
Thanks for this post. I have tried it out and it works a treat, so I can use this.

If I may, one more question. The approach I was taking was to read the XML document into a stream, instantiate an XmlReader, and use the XmlReader.Read method to go through the document. The problem I had was that although I could cycle through the nodes, I couldn't work out how to read the current node from the XmlReader into an XmlNode. It doesn't seem possible.

Any idea what the best approach for this would be?

Many thanks

Just Me wrote:
> The approach I was taking was to read the xmldocument into a stream and
> instantiate an xmlreader and use the xmlreader.read method to go through the
> document. The problem I had was that although I could cycle through the
> nodes, I couldn't determine how to read the node into an xmlNode from the
> xmlreader. It doesn't seem possible.
>
> Any idea what would be the best approach for this ?

It is not clear what you want to do. If you want to load your complete XML document into a System.Xml.XmlDocument instance, then simply use the Load method and pass in a file name or URL. There is no need to use an XmlReader explicitly.

If you have both an XmlDocument instance and an XmlReader and you want to import data from the reader into the document, then you can use the ReadNode method
<http://msdn2.microsoft.com/en-us/library/system.xml.xmldocument.readnode.aspx>
to create a node owned by the XmlDocument instance from the node the reader is positioned on. The XmlNode returned from ReadNode can then be inserted into the XmlDocument instance with e.g. AppendChild or InsertBefore called on the intended parent node.

--
Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/

OK, thanks again for your help.
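
For anyone following the ReadNode suggestion, here is a minimal sketch of how it might look in practice. The sample XML, the element name "row", and the module name are made up for illustration and are not from the thread. The one thing to watch is that ReadNode already advances the reader past the node it consumed, so the loop only calls Read when it did not call ReadNode.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module ReadNodeSketch
    Sub Main()
        ' Target document that will own the imported nodes (hypothetical root name).
        Dim doc As New XmlDocument()
        doc.LoadXml("<rows/>")

        ' Hypothetical source data; in real code this could be a file or stream.
        Dim source As String = "<data><row id='1'/><skip/><row id='2'/></data>"

        Using reader As XmlReader = XmlReader.Create(New StringReader(source))
            reader.MoveToContent()   ' position on the <data> start tag
            reader.Read()            ' step inside to its first child

            While Not reader.EOF
                If reader.NodeType = XmlNodeType.Element AndAlso reader.Name = "row" Then
                    ' ReadNode builds an XmlNode owned by doc from the current
                    ' reader position and leaves the reader on the next node.
                    Dim node As XmlNode = doc.ReadNode(reader)
                    doc.DocumentElement.AppendChild(node)
                Else
                    ' Anything else is skipped by advancing manually.
                    reader.Read()
                End If
            End While
        End Using

        doc.Save(Console.Out)
    End Sub
End Module
```

If the whole document is wanted in memory anyway, calling Load directly, as suggested above, is simpler; ReadNode mainly pays off when only some of the nodes coming out of the reader should end up in the document.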
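
Coming back to the second half of the original question (removing everything not nested inside a table), the XPath expression /html/body//*[not(ancestor-or-self::table)] given in the thread can be combined with the same reverse loop used for the attributes. The sketch below is an assumption about how that might be done, not code from the thread. One caveat: that expression also matches an element that merely wraps a table (a div around it, say), and removing the wrapper would take the table with it. The extra predicate not(descendant::table) is my addition to keep such wrappers; the sample markup and module name are hypothetical.

```vbnet
Imports System
Imports System.Xml

Module RemoveOutsideTables
    Sub Main()
        ' Hypothetical sample input; real code would call doc.Load(...) instead.
        Dim doc As New XmlDocument()
        doc.LoadXml("<html><body><h1>Title</h1><div><p>note</p>" & _
                    "<table><tr><td><b>cell</b></td></tr></table></div></body></html>")

        ' The XPath from the thread plus one added predicate (an assumption)
        ' so that elements wrapping a table are kept.
        Dim toRemove As XmlNodeList = doc.SelectNodes( _
            "/html/body//*[not(ancestor-or-self::table) and not(descendant::table)]")

        ' Walk backwards so nested elements are detached before their ancestors.
        For i As Integer = toRemove.Count - 1 To 0 Step -1
            Dim node As XmlNode = toRemove(i)
            If node.ParentNode IsNot Nothing Then
                node.ParentNode.RemoveChild(node)
            End If
        Next

        doc.Save(Console.Out)
    End Sub
End Module
```

This only removes elements. Loose text directly under body or under a kept wrapper is left alone, and wrappers keep their attributes, so further cleanup may be needed depending on the source markup.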