[Python-talk] Can python program read index.jsp web page?
Alex Hewitt
hewitt_tech at comcast.net
Mon Aug 13 07:50:12 EDT 2007
On Fri, 2007-08-10 at 09:41 -0400, Lloyd Kvam wrote:
> On Fri, 2007-08-10 at 07:55 -0400, Kent Johnson wrote:
> > >> What happens when you go to the page in the browser? Is there any kind
> > >> of authentication? What headers & status do you get from
> > >> curl -i http://my.server.com/index.jsp
> > >> ?
> > >
> > > I don't see anything active in the browser, no popup or anything like
> > > that but get the impression that somehow I've been logged in silently.
> > > If I can capture that traffic, assuming there is a handshake going on, I
> > > might be able to write Python code to mimic what's going on.
> >
> > Possibly the browser has an authentication cookie that is allowing you
> > to bypass some kind of login. If you are on a windows client there may
> > be other magic methods to authenticate, I'm not sure.
> >
> > It would be very useful to see the HTTP status and headers coming back
> > from the server on your request. Presumably they include either an
> > authentication request or a redirect to a login page. Did you try
> > looking at them in curl?
>
> The Firefox plugins Firedrop and TamperData can be very helpful in
> tracing through the browser's conversation. Once you can follow the
> conversation over the wire, you can code it into Python.
>
I downloaded and installed TamperData. Another really nice tool. It
reminds me of Wireshark/Ethereal in the way it works. This and curl
should go a long way to shedding light on what's going on under the
covers.
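
For reference, here is a rough Python counterpart to the "curl -i" check
suggested above. It is only a sketch under some assumptions: it uses
Python 3's urllib.request (the successor to urllib2), and the names
NoRedirect and fetch_status_and_headers are made up for illustration.
Redirects are deliberately not followed, so a 302 to a login page shows
up instead of being hidden.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Hypothetical helper: keep urllib from following redirects."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None declines the redirect; urllib then raises
        # HTTPError for the 3xx response instead of fetching newurl.
        return None


def fetch_status_and_headers(url, timeout=10):
    """Return (status, [(name, value), ...]) for a GET, much like `curl -i`."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        response = opener.open(url, timeout=timeout)
    except urllib.error.HTTPError as err:
        # 3xx/4xx/5xx responses arrive as exceptions but still carry headers.
        return err.code, list(err.headers.items())
    with response:
        return response.status, list(response.headers.items())
```

Calling fetch_status_and_headers("http://my.server.com/index.jsp") and
printing the result should show whether the server answers with a
redirect, a 401, or a Set-Cookie header.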
> I've done some programming that parallels what you are trying to do, so
> I may be able to provide some sample code. Mixing cookies,
> authentication, keepalives, referrer headers and more can get
> complicated. However, an HTTP conversation can be handled from Python.
I haven't had time yet to see what effect cookies might be having but
based on a quick test Friday I think that's why I'm able to pull the
application up in my browser. The web page says "logged in: Alex Hewitt"
which tells me that it likely read a cookie on my system because it
never asked me to log in.
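
If a cookie really is what lets the browser in, a Python client would
need to keep and resend cookies too. The following is a minimal sketch,
again assuming Python 3's urllib.request and http.cookiejar (urllib2 and
cookielib in Python 2); make_cookie_opener and fetch_with_cookies are
hypothetical names, and which URL actually sets the cookie on this
server is unknown.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Build an opener that stores cookies and resends them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_with_cookies(opener, url, timeout=10):
    """GET url through the cookie-aware opener and return the body as text."""
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

The idea is to reuse one opener for the whole conversation: request
whatever page hands out the cookie first, then request index.jsp with
the same opener so the cookie is sent back. Inspecting the jar between
requests shows what the server set.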
> I've sometimes found it easier to try ideas with wget before coding them
> into Python.
>
>
> (Background notes)
> I've been using DAV when I need to provide write-access to a server for
> remote Windows computers. (Linux folks use SSH and give me their public
> key.) I can give them a certificate to import into IE and they open the
> DAV site clicking File/Open and checking the web folder box. My Python
> code jumps through the equivalent hoops when accessing DAV sites.
>
> I also process some baseball stats which requires cookies and
> authentication in urllib2, and BeautifulSoup to extract the data.
>
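
For the cookies-plus-authentication combination mentioned above, one
possible arrangement is sketched below. It assumes the server uses HTTP
Basic authentication, which has not been confirmed for this site, and it
uses Python 3's urllib.request in place of urllib2.
make_auth_cookie_opener is a hypothetical name, and the credentials are
whatever the site expects.

```python
import http.cookiejar
import urllib.request


def make_auth_cookie_opener(base_url, username, password):
    """Build an opener that answers Basic auth challenges and keeps cookies."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # A realm of None means "use these credentials for any realm under base_url".
    password_mgr.add_password(None, base_url, username, password)
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(password_mgr),
        urllib.request.HTTPCookieProcessor(jar),
    )
```

With this, opener.open(url) retries automatically with credentials when
the server replies 401, and any session cookie it hands back is reused
on later requests through the same opener.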