[Python-talk] Can python program read index.jsp web page?

Alex Hewitt hewitt_tech at comcast.net
Fri Aug 10 07:17:04 EDT 2007

On Fri, 2007-08-10 at 00:24 -0400, Kent Johnson wrote:
> Alex Hewitt wrote:
> > At work we have applications written in Java that are accessed from a
> > web browser. Many of the pages have names like index.jsp which I believe
> > are Java servlets.
> JSP is JavaServer Pages, a Java-based template language for web 
> page generation.
> > Can a Python program read these pages? I tried
> > accessing them using urllib but no joy. 
> Should be able to. It's just text on the wire after all. What did you 
> try? What happened?
> > It may be that there is some
> > kind of user authentication going on under the covers but I don't get a
> > prompt from urllib as I would when I access my Linksys router's built-in
> > web server. When you access the Linksys router's web interface, urllib
> > prompts on the console for your username/password. 
> What happens when you go to the page in the browser? Is there any kind 
> of authentication? What headers & status do you get from
> curl -i http://my.server.com/index.jsp
> ?

I don't see anything active in the browser, no popup or anything like
that, but I get the impression that I've somehow been logged in silently.
If I can capture that traffic, assuming there is a handshake going on, I
might be able to write Python code to mimic it.
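
A minimal sketch of how that capture might start, assuming the silent
login is carried by ordinary HTTP cookies (the thread does not confirm
this). It uses Python 3's urllib.request, the successor to urllib2, and
the helper names make_opener and fetch are made up for illustration.

```python
import http.cookiejar
import urllib.error
import urllib.request


def make_opener():
    """Build an opener that stores and resends cookies, as a browser would."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def fetch(opener, url):
    """Return (status, header_pairs, body) for url.

    4xx/5xx responses are returned rather than raised, so a 401 or 403
    from the server can be inspected like any other reply.
    """
    try:
        with opener.open(url, timeout=10) as resp:
            return resp.status, resp.headers.items(), resp.read()
    except urllib.error.HTTPError as err:
        # HTTPError carries the status, headers and body of the error reply.
        return err.code, err.headers.items(), err.read()
```

Calling fetch(opener, "http://my.server.com/index.jsp") and printing the
result gives roughly what curl -i shows, except that urllib follows
redirects, so the status is that of the final page. Iterating over the
jar afterwards lists any cookies the server set along the way.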

> > My motive in
> > doing this is to use a Python program to exercise some of the
> > application functions but I can't do that if I can't read the pages in
> > the first place.
> You should be able to read the web page from Python. urllib2 is a bit 
> more flexible than urllib. You might also be interested in mechanize:
> http://wwwsearch.sourceforge.net/mechanize/

Thanks. This is really what I'm going after. I want to automate
interaction with the application web interface. Ultimately I'd like to
create synthetic transactions that could be used to exercise the system.
I'm already using the pexpect module to reach out to systems in the
network and essentially do for the user what they would normally need to
perform by hand.
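
A rough sketch of what such a synthetic transaction could look like,
written with Python 3's urllib.request rather than mechanize. The
function name run_transaction and the paths and form fields in the usage
note are hypothetical; the real application's URLs and field names would
have to be read from its pages.

```python
import http.cookiejar
import time
import urllib.error
import urllib.parse
import urllib.request


def run_transaction(base_url, steps):
    """Run steps in order on one cookie-aware session.

    Each step is (path, form): form is None for a GET, or a dict of
    fields to send as a form-encoded POST.
    Returns a list of (path, status, elapsed_seconds).
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    results = []
    for path, form in steps:
        data = None
        if form is not None:
            data = urllib.parse.urlencode(form).encode("ascii")
        url = urllib.parse.urljoin(base_url, path)
        start = time.monotonic()
        try:
            with opener.open(url, data=data, timeout=10) as resp:
                resp.read()  # drain the body so the timing covers it
                status = resp.status
        except urllib.error.HTTPError as err:
            status = err.code  # record the failure and keep going
        results.append((path, status, time.monotonic() - start))
    return results
```

A run might be described as
[("/login.jsp", {"user": "test", "password": "secret"}), ("/index.jsp", None)],
with both the paths and the field names being placeholders. Because every
step shares one opener, a session cookie set by the first step is sent
with the later ones.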

> > One other thing, when some of the pages do come up in
> > Firefox or Internet Explorer, the java console starts in the background.
> Well, that could be a deal killer: if the functionality of the page is 
> implemented in Java applets, you can't touch that from Python except 
> maybe with GUI automation stuff (push this button, click this...).

The more I think about it, I'm pretty sure there is only one Java applet
and everything else is done with the standard browser. For that matter I
might even be able to mechanize the applet to some degree.

> Kent

As always, Kent, very useful pointers.


