Screen Scraping a Password Protected Site

I'm trying to screen scrape a site that requires a password. If I access the site's login page in my browser and view the source, I see that it does not contain a viewstate.

When my program posts the login information, the response I get is the same page as if I had logged in using my browser. In the page it says "Welcome" followed by my name. The cookie collection returned doesn't contain any cookies (response.cookies.count = 0).

When I access other pages, the login screen is returned instead of the desired page.

Obviously, I need to somehow maintain the session in subsequent calls, but how do I do that when there are no cookies and there is no viewstate?

If I use Fiddler to see what happens when I access the site from my browser, I can see that the first line for the site (where the result is 200 and the host says "CONNECT") says "SessionID: empty" under Session Inspector - Textview for the request. For the response it says "SessionID: " then several bytes of data. Subsequent 200/CONNECT lines have that same data for both the request and the response. This must be what I need to maintain my session. If anyone can help me figure out how to get this information and use it, I'll be very grateful.

(I'm using VB in VS2003.)

Thanks.

Check out the System.Net.CookieContainer class.
You can override the System.Net.WebClient class to store and retrieve cookies in a singleton CookieContainer, and then once you have logged in to the website you will stay logged in.

Something like this (untested) ...

'============================================
Imports System.Net

Public Class CookieWebClient
    Inherits WebClient

    ' One container shared by every instance, so the session survives across requests.
    Private Shared _cookies As CookieContainer = New CookieContainer

    ' Overridden to add cookie headers to HTTP requests.
    Protected Overrides Function GetWebRequest(ByVal address As System.Uri) As System.Net.WebRequest
        Dim request As WebRequest = MyBase.GetWebRequest(address)
        If TypeOf request Is HttpWebRequest Then
            DirectCast(request, HttpWebRequest).CookieContainer = _cookies
        End If
        Return request
    End Function

    ' Overridden to save cookies to the container for HTTP requests.
    Protected Overrides Function GetWebResponse(ByVal request As System.Net.WebRequest) As System.Net.WebResponse
        Dim response As WebResponse = MyBase.GetWebResponse(request)
        If TypeOf response Is HttpWebResponse Then
            _cookies.Add(response.ResponseUri, DirectCast(response, HttpWebResponse).Cookies)
        End If
        Return response
    End Function

    ' Overridden to save cookies to the container for async HTTP requests.
    Protected Overrides Function GetWebResponse(ByVal request As System.Net.WebRequest, ByVal result As System.IAsyncResult) As System.Net.WebResponse
        Dim response As WebResponse = MyBase.GetWebResponse(request, result)
        If TypeOf response Is HttpWebResponse Then
            _cookies.Add(response.ResponseUri, DirectCast(response, HttpWebResponse).Cookies)
        End If
        Return response
    End Function

End Class
'============================================

Then just use the CookieWebClient class to make your requests:

Dim c As New CookieWebClient
Dim s As String = c.DownloadString("http://www.somesite.com")

Works for me :-)

-Blake

Gregory A Greenman wrote:
> I'm trying to screen scrape a site that requires a password.
> If I access the site's login page in my browser and view the source, I see that it does not contain a viewstate.
>
> When my program posts the login information, the response I get is the same page as if I had logged in using my browser. In the page it says "Welcome" followed by my name. The cookie collection returned doesn't contain any cookies (response.cookies.count = 0).
>
> When I access other pages, the login screen is returned instead of the desired page.
>
> Obviously, I need to somehow maintain the session in subsequent calls, but how do I do that when there are no cookies and there is no viewstate?
>
> If I use Fiddler to see what happens when I access the site from my browser, I can see that the first line for the site (where the result is 200 and the host says "CONNECT") says "SessionID: empty" under Session Inspector - Textview for the request. For the response it says "SessionID: " then several bytes of data. Subsequent 200/CONNECT lines have that same data for both the request and the response. This must be what I need to maintain my session. If anyone can help me figure out how to get this information and use it, I'll be very grateful.
>
> (I'm using VB in VS2003.)
>
> Thanks.
>
> --
> Greg
> ----
> http://www.spencerbooksellers.com
> greg00 -at- spencersoft -dot- com

I should have stated before that to log in you will need to call CookieWebClient.UploadValues() to post to your site's login form first.

-Blake
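For what it's worth, here is a rough sketch of that login step. It is untested against any real site and assumes .NET 2.0 or later (for DownloadString) plus the CookieWebClient class posted earlier in this thread. The form field names ("username", "password"), the credentials, both URLs, and the UTF-8 decoding are placeholders I made up; take the real field names and form action from the login page's HTML.

```vb
Imports System
Imports System.Collections.Specialized
Imports System.Text

Module LoginExample
    Sub Main()
        ' CookieWebClient is the WebClient subclass from the earlier post.
        Dim client As New CookieWebClient

        ' Hypothetical field names and values; copy the real ones from the login form.
        Dim form As New NameValueCollection
        form.Add("username", "myUser")
        form.Add("password", "myPassword")

        ' POST the form to the (hypothetical) login URL. Cookies set by the
        ' response land in the shared CookieContainer.
        Dim loginBytes As Byte() = client.UploadValues("http://www.somesite.com/login", "POST", form)

        ' Assumes the site answers in UTF-8; adjust if it uses another encoding.
        Dim loginPage As String = Encoding.UTF8.GetString(loginBytes)

        ' Later requests through the same class send the stored cookies back.
        Dim html As String = client.DownloadString("http://www.somesite.com/members/page")
        Console.WriteLine(html.Length)
    End Sub
End Module
```

Checking loginPage for the "Welcome" text before requesting other pages is a cheap way to confirm the post was accepted.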
Triple post. Yay!

It's also worth noting that you don't need to use Fiddler to see the HTTP traffic. The System.Net classes have been compiled with TRACE turned on, so you can add a .config file like this:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <system.diagnostics>
    <sources>
      <source name="System.Net" switchValue="Information"/>
    </sources>
  </system.diagnostics>
</configuration>

...and you will see the HTTP headers going back and forth in the output window. If you set the level to Verbose you can also see the data.

-Blake
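A sketch of the Verbose variant of that config, in case it helps. It assumes .NET 2.0 or later; the listener is optional and only there to send the trace to a file as well. The listener name "netLog" and the file name "network.log" are made up for illustration.

```xml
<configuration>
  <system.diagnostics>
    <!-- Flush after every write so the log file stays current. -->
    <trace autoflush="true" />
    <sources>
      <!-- Verbose also traces the request and response data, not just headers. -->
      <source name="System.Net" switchValue="Verbose">
        <listeners>
          <!-- Hypothetical listener name and log file path. -->
          <add name="netLog"
               type="System.Diagnostics.TextWriterTraceListener"
               initializeData="network.log" />
        </listeners>
      </source>
    </sources>
  </system.diagnostics>
</configuration>
```

Verbose output gets large quickly, so it is probably best switched back to Information once the login exchange has been inspected.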