
Reading XML Documents

Author
31 Dec 2006 5:13 PM
Just Me
Does anyone have code, or can anyone point to useful snippets, that would
allow me to traverse the XML elements of an XmlDocument?

What I want to do is move through the entire document and, when I hit
<table>, <tr>, or <td>, clear the attributes of these elements, and also
remove all elements NOT nested inside a <table>.

I'm sure my head is going to explode soon :-(

Cheers

Author
31 Dec 2006 7:43 PM
Stephany Young
Now that you're starting to get on the right track, you will find that the
documentation on the XmlDocument class has extensive examples of what you
are looking for.

Assuming (and that might be dangerous) that the HTML source you are
dealing with is well-formed in terms of XML, you can quite happily
deal with the source as an XmlDocument.

If it is not well-formed, then you are back to square one.
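
One way to find out which case applies is simply to try loading the markup and see whether the parser objects. The following is a minimal sketch, not a definitive test: the helper name IsWellFormedXml and the two sample strings are made up for illustration, and markup that uses HTML-only features (unclosed <br>, undeclared entities such as &nbsp;) is expected to fail.

```vbnet
Imports System
Imports System.Xml

Module WellFormedCheck
    ' Hypothetical helper: returns True when the markup parses as XML.
    Function IsWellFormedXml(ByVal Markup As String) As Boolean
        Try
            Dim Doc As New XmlDocument()
            Doc.LoadXml(Markup)   ' throws XmlException on malformed input
            Return True
        Catch Ex As XmlException
            Return False
        End Try
    End Function

    Sub Main()
        ' Illustrative inputs only, not taken from the original poster's data.
        Console.WriteLine(IsWellFormedXml("<table><tr><td>ok</td></tr></table>"))
        Console.WriteLine(IsWellFormedXml("<table><tr><td>unclosed<br></td></tr></table>"))
    End Sub
End Module
```

The first call should report True and the second False, because the unclosed <br> leaves the </td> end tag mismatched.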


Author
1 Jan 2007 2:30 PM
Martin Honnen
Just Me wrote:
> Does anyone have code or can point to useful snippets to allow me to
> traverse the xml "Elements" of an xmlDocument.
>
> What I want to do is to move through the entire document and when I hit
> <table>  <tr>  <td> clear the attributes for these Elements and also remove
> all Elements NOT nested inside a <table>

If you use System.Xml.XmlDocument and its SelectNodes method, you have a
powerful tool for selecting the nodes you are looking for; you can then
use the DOM methods to remove them. An example that removes all attributes
on table, tr, and td elements looks like this:

         ' Assumes Imports System and Imports System.Xml at the top of the file.
         Dim XmlDoc As XmlDocument = New XmlDocument
         XmlDoc.Load("XMLFile1.xml")
         Console.WriteLine("Initial Document:")
         XmlDoc.Save(Console.Out)
         Console.WriteLine()

         ' Select every attribute node on table, tr, and td elements.
         Dim AttributesToRemove As XmlNodeList = _
           XmlDoc.SelectNodes("//table/@* | //tr/@* | //td/@*")
         ' Walk the list backwards and detach each attribute from its element.
         For I As Integer = AttributesToRemove.Count - 1 To 0 Step -1
             Dim Attribute As XmlAttribute = _
               CType(AttributesToRemove(I), XmlAttribute)
             Attribute.OwnerElement.RemoveAttributeNode(Attribute)
         Next

         Console.WriteLine("Changed document:")
         XmlDoc.Save(Console.Out)

Example output:

Initial Document:
<?xml version="1.0" encoding="ibm850"?>
<html lang="en">
   <head>
     <title>Example</title>
   </head>
   <body>
     <table border="1" class="some-class" id="t1">
       <tbody>
         <tr class="odd">
           <td id="cell1">
           </td>
         </tr>
       </tbody>
     </table>
   </body>
</html>
Changed document:
<?xml version="1.0" encoding="ibm850"?>
<html lang="en">
   <head>
     <title>Example</title>
   </head>
   <body>
     <table>
       <tbody>
         <tr>
           <td>
           </td>
         </tr>
       </tbody>
     </table>
   </body>
</html>

An XPath expression that selects all elements inside the document body
that are not nested in a table is, for example,
   /html/body//*[not(ancestor-or-self::table)]
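
That expression can be combined with RemoveChild in the same way as the attribute example. The sketch below is only an illustration under a few assumptions: the inline sample document passed to LoadXml is made up, and the module name is arbitrary. Note one consequence of the expression: an element outside any table that itself wraps a table (say a div around a table) is also selected, so removing it takes the nested table with it. If such wrappers need to survive, the predicate would have to be tightened.

```vbnet
Imports System
Imports System.Xml

Module RemoveOutsideTables
    Sub Main()
        Dim XmlDoc As New XmlDocument()
        ' Hypothetical sample input, used only to have something to remove.
        XmlDoc.LoadXml("<html><body><p>intro</p>" & _
                       "<table><tr><td>cell</td></tr></table>" & _
                       "<div><span>x</span></div></body></html>")

        ' Elements under body that are neither a table nor inside one.
        Dim NodesToRemove As XmlNodeList = _
          XmlDoc.SelectNodes("/html/body//*[not(ancestor-or-self::table)]")

        ' Walk backwards so descendants are detached before their ancestors.
        For I As Integer = NodesToRemove.Count - 1 To 0 Step -1
            Dim Node As XmlNode = NodesToRemove(I)
            If Node.ParentNode IsNot Nothing Then
                Node.ParentNode.RemoveChild(Node)
            End If
        Next

        ' With the sample above, only the table should remain inside body.
        XmlDoc.Save(Console.Out)
    End Sub
End Module
```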


--

    Martin Honnen --- MVP XML
    http://JavaScript.FAQTs.com/
Author
1 Jan 2007 7:06 PM
Just Me
Brilliant, Martin!

Thanks for this post. I have tried it out and it works a treat, so I can
use this.

One more question, if I may.

The approach I was taking was to read the XML document into a stream,
instantiate an XmlReader, and use the XmlReader.Read method to go through the
document. The problem I had was that although I could cycle through the
nodes, I couldn't determine how to read the current node into an XmlNode from
the XmlReader. It doesn't seem possible.

Any idea what the best approach for this would be?


Many thanks






Author
2 Jan 2007 3:47 PM
Martin Honnen
Just Me wrote:

> The approach I was taking was to read the xmldocument into a stream and
> instantiate an xmlreader and use the xmlreader.read method to go through the
> document. The problem I had was that although I could cycle through the
> nodes, I couldn't determine how to read the node into an xmlNode from the
> xmlreader. It doesn't seem possible.
>
> Any idea what would be the best approach for this ?

It is not clear what you want to do. If you want to load your complete
XML document into a System.Xml.XmlDocument instance, then simply use the
Load method and pass in a file name or URL. There is no need to use an
XmlReader explicitly.

If you have both an XmlDocument instance and an XmlReader and you want
to import data from the reader into the document then you can use the
ReadNode method
<http://msdn2.microsoft.com/en-us/library/system.xml.xmldocument.readnode.aspx>
to create a node owned by the XmlDocument instance from the node the
reader is positioned on. The XmlNode returned from ReadNode can then be
inserted into the XmlDocument instance with e.g. AppendChild or
InsertBefore called on the intended parent node.
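
As a rough sketch of that ReadNode approach: the element names (rows, data, row), the inline XML string, and the module name below are all invented for illustration. The one behaviour the loop relies on is that ReadNode consumes the current node and leaves the reader positioned on whatever follows it, so the loop only calls Read itself when it skips a node.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module ReadNodeSketch
    Sub Main()
        ' Target document that will own the imported nodes (hypothetical root).
        Dim XmlDoc As New XmlDocument()
        XmlDoc.LoadXml("<rows/>")

        ' Hypothetical source data read through an XmlReader.
        Dim Source As String = "<data><row id='1'/><row id='2'/></data>"
        Using Reader As XmlReader = XmlReader.Create(New StringReader(Source))
            Reader.MoveToContent()   ' positions the reader on <data>
            Reader.Read()            ' moves to the first child of <data>
            While Not Reader.EOF
                If Reader.NodeType = XmlNodeType.Element _
                   AndAlso Reader.Name = "row" Then
                    ' ReadNode builds a node owned by XmlDoc and advances the reader.
                    Dim Node As XmlNode = XmlDoc.ReadNode(Reader)
                    XmlDoc.DocumentElement.AppendChild(Node)
                Else
                    Reader.Read()    ' skip anything that is not a <row> element
                End If
            End While
        End Using

        XmlDoc.Save(Console.Out)
    End Sub
End Module
```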


--

    Martin Honnen --- MVP XML
    http://JavaScript.FAQTs.com/
Author
2 Jan 2007 5:44 PM
Just Me
OK, thanks again for your help.



