HTML links

I am trying to extract all the links and the image URLs from an HTML file. I tried to read the information byte by byte in order to detect the URLs, but it did not work because some JavaScript or other information in the HTML file caused problems. Is there any class which works well for this kind of data extraction?

Aristotelis

Hello Aristotelis,

> I am trying to extract all the links and the image URLs from an HTML
> file. I tried to read byte - byte the information in order to detect
> the URLs but it did not work because some JavaScript or other
> information in the HTML file caused problems. Is there any class which
> works ok with this kind of data extraction?

You could load the page into System.Windows.Forms.HtmlDocument. It has a GetElementsByTagName method that you can use to get all of the links.

--
Jared Parsons [MSFT]
jared***@online.microsoft.com
All opinions are my own. All content is provided "AS IS" with no warranties, and confers no rights.

I tried it, but the System.Windows.Forms.HtmlDocument object does not have a constructor. How can I set the URL of the page in order to collect the various information?

Aristotelis

Hello Aristotelis,

> I tried it but the System.Windows.Forms.HtmlDocument object does not
> have a constructor. How can I set the URL of the page in order to
> collect the various information?

It looks like you'll have to create an instance of the WebBrowser control. That will give you access to the underlying HtmlDocument, which you can then query.

--
Jared Parsons [MSFT]

I think that there will be a problem with JavaScript. If I load a page which contains a JavaScript alert message box, it will stop the whole process, and the user will see this window on the screen. Is there a way to disable JavaScript execution for a WebBrowser control?

Aristotelis

http://www.regular-expressions.net/examples.html
This has a great tutorial about grabbing HTML tags.

-Allen

Aristotelis,

They (we and others, including me) use that JavaScript to prevent things such as spamming. You want us to give you a method to overcome that and put it on this board. Even if I knew one, the answer would be: no way. :-)

Cor

"Aristotelis Pitaridis" <pitari***@hotmail.com> wrote:
> I am trying to extract all the links and the image URLs from an HTML file.
> I tried to read byte - byte the information in order to detect the URLs but
> it did not work because some JavaScript or other information in the HTML
> file caused problems. Is there any class which works ok with this kind of
> data extraction?

I suggest using an HTML parser instead of regular expressions for this purpose:

Parsing an HTML file:

MSHTML Reference
<URL:http://msdn.microsoft.com/library/default.asp?url=/workshop/browser/mshtml/reference/reference.asp>

- or -

.NET Html Agility Pack: How to use malformed HTML just like it was well-formed XML...
<URL:http://blogs.msdn.com/smourier/archive/2003/06/04/8265.aspx>

Download:
<URL:http://www.codefluent.com/smourier/download/htmlagilitypack.zip>

- or -

SgmlReader 1.4
<URL:http://www.gotdotnet.com/Community/UserSamples/Details.aspx?SampleGuid=B90FDDCE-E60D-43F8-A5C4-C3BD760564BC>

If the file read is in XHTML format, you can use the classes contained in the 'System.Xml' namespace for reading information from the file.

--
 M S   Herfried K. Wagner
M V P  <URL:http://dotnet.mvps.org/>
 V B   <URL:http://classicvb.org/petition/>
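
To illustrate the parser-based approach suggested above, here is a minimal sketch. It is not one of the .NET libraries named in this thread; it uses Python's standard `html.parser` module as a stand-in, and the names `LinkExtractor` and `extract_urls` are made up for the example. The idea is the same: let a parser walk the tags instead of scanning bytes, so markup that appears inside a script string is not mistaken for a real link.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href values of <a> tags and src values of <img> tags."""

    def __init__(self):
        super().__init__()
        self.links = []   # href attributes found on <a> elements
        self.images = []  # src attributes found on <img> elements

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])
        elif tag == "img" and attributes.get("src"):
            self.images.append(attributes["src"])


def extract_urls(html_text):
    """Return (links, images) found in the given HTML string."""
    parser = LinkExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.links, parser.images


# Sample input: the <a> tag inside the script is only a string in the
# script body, so the parser does not report it as a start tag.
sample = """<html><body>
<script>document.write('<a href="ignored.html">x</a>');</script>
<a href="page.html">Page</a>
<img src="logo.png" alt="logo">
</body></html>"""

links, images = extract_urls(sample)
print(links)
print(images)
```

The URLs come back exactly as written in the attributes, so relative ones would still need to be resolved against the page address. Nothing here executes the page's JavaScript, so links that a script adds at run time will not be found.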