[Python-talk] Can python program read index.jsp web page?

Alex Hewitt hewitt_tech at comcast.net
Mon Aug 13 07:50:12 EDT 2007


On Fri, 2007-08-10 at 09:41 -0400, Lloyd Kvam wrote:
> On Fri, 2007-08-10 at 07:55 -0400, Kent Johnson wrote:
> > >> What happens when you go to the page in the browser? Is there any
> > >> kind of authentication? What headers & status do you get from
> > >> curl -i http://my.server.com/index.jsp
> > >> ?
> > > 
> > > I don't see anything active in the browser, no popup or anything
> > > like that but get the impression that somehow I've been logged in
> > > silently. If I can capture that traffic, assuming there is a
> > > handshake going on, I might be able to write Python code to mimic
> > > what's going on.
> > 
> > Possibly the browser has an authentication cookie that is allowing
> > you to bypass some kind of login. If you are on a Windows client
> > there may be other magic methods to authenticate, I'm not sure.
> > 
> > It would be very useful to see the HTTP status and headers coming
> > back from the server on your request. Presumably they include either
> > an authentication request or a redirect to a login page. Did you try
> > looking at them in curl?
> 
> The Firefox plugins Firedrop and TamperData can be very helpful in
> tracing through the browser's conversation.  Once you can follow the
> conversation over the wire, you can code it into Python.
> 

I downloaded and installed TamperData. Another really nice tool. It
reminds me of Wireshark/Ethereal in the way it works. This and curl
should go a long way to shedding light on what's going on under the
covers.
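
For reference, a rough Python 3 sketch of the same check that curl -i
performs: fetch the page without following redirects and look at the
status line and headers. The helper names (fetch_status_and_headers,
describe) are made up for illustration, and the three cases that
describe() reports are just the possibilities raised in this thread,
not a complete list.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status_and_headers(url, timeout=10):
    """GET the URL once and return (status, reason, headers).

    http.client does not follow redirects, so a 302 to a login page
    stays visible, much like the output of curl -i.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status, resp.reason, resp.getheaders()
    finally:
        conn.close()


def describe(status, headers):
    """Summarize what the response suggests about authentication.

    headers is a list of (name, value) pairs as returned above.
    """
    names = {name.lower(): value for name, value in headers}
    if status == 401:
        return "authentication requested: " + names.get("www-authenticate", "?")
    if status in (301, 302, 303, 307, 308):
        return "redirect to " + names.get("location", "?")
    if "set-cookie" in names:
        return "server set a cookie"
    return "no authentication hint in this response"
```

Calling fetch_status_and_headers("http://my.server.com/index.jsp") and
passing the status and headers to describe() would give a first hint
about whether the server wants a login, a redirect, or a cookie.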


> I've done some programming that parallels what you are trying to do, so
> I may be able to provide some sample code.  Mixing cookies,
> authentication, keepalives, referrer headers and more can get
> complicated.  However, an HTTP conversation can be handled from Python.


I haven't had time yet to see what effect cookies might be having, but
based on a quick test on Friday I think cookies are why I'm able to pull
the application up in my browser. The web page says "logged in: Alex
Hewitt", which tells me that it likely read a cookie on my system,
because it never asked me to log in.
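
If a cookie does turn out to be the explanation, one way to test the
idea from Python 3 is to copy the cookie the browser sends (as shown by
TamperData) into the request by hand. This is only a sketch: the cookie
name JSESSIONID is a guess based on the page being a JSP, and the value
is a placeholder, not something taken from the real server.

```python
import urllib.request


def request_with_cookies(url, cookies):
    """Build a GET request that carries the given cookies.

    cookies is a dict of name -> value, copied from the browser.
    """
    header_value = "; ".join(
        "{}={}".format(name, value) for name, value in cookies.items()
    )
    req = urllib.request.Request(url)
    req.add_header("Cookie", header_value)
    return req


# Hypothetical cookie name and value; replace with what the browser sends.
req = request_with_cookies(
    "http://my.server.com/index.jsp",
    {"JSESSIONID": "value-copied-from-the-browser"},
)
# urllib.request.urlopen(req) would then send the request with the cookie.
```

If the page comes back showing the "logged in" text, the cookie is what
the server is relying on.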

> I've sometimes found it easier to try ideas with wget before coding them
> into Python.
> 
> 
> (Background notes)
> I've been using DAV when I need to provide write-access to a server for
> remote Windows computers.  (Linux folks use SSH and give me their public
> key.)  I can give them a certificate to import into IE and they open the
> DAV site clicking File/Open and checking the web folder box.  My Python
> code jumps through the equivalent hoops when accessing DAV sites.
> 
> I also process some baseball stats which requires cookies and
> authentication in urllib2, and BeautifulSoup to extract the data.
> 
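
A minimal sketch of the cookies-plus-authentication combination
mentioned above, written for Python 3, where urllib2 is named
urllib.request. It assumes HTTP Basic authentication, which may not be
what a given server uses, and the function name and arguments are
invented for illustration. The BeautifulSoup step is left out.

```python
import http.cookiejar
import urllib.request


def build_opener_with_cookies_and_auth(base_url, user, password):
    """Return (opener, jar): an opener that keeps cookies between
    requests and answers HTTP Basic auth challenges under base_url."""
    jar = http.cookiejar.CookieJar()
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # None means "use these credentials for any realm under base_url".
    passwords.add_password(None, base_url, user, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar),
        urllib.request.HTTPBasicAuthHandler(passwords),
    )
    return opener, jar
```

Every request made with opener.open(url) then replays whatever cookies
the server has set so far, and the jar can be inspected afterwards to
see which cookies were involved.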


