[Python-talk] Can python program read index.jsp web page?
hewitt_tech at comcast.net
Mon Aug 13 07:45:32 EDT 2007
On Fri, 2007-08-10 at 07:55 -0400, Kent Johnson wrote:
> Alex Hewitt wrote:
> > On Fri, 2007-08-10 at 00:24 -0400, Kent Johnson wrote:
> >>> Can a Python program read these pages? I tried
> >>> accessing them using urllib but no joy.
> >> Should be able to. It's just text on the wire after all. What did you
> >> try? What happened?
> Still wondering...
> >> What happens when you go to the page in the browser? Is there any kind
> >> of authentication? What headers & status do you get from
> >> curl -i http://my.server.com/index.jsp
> >> ?
> > I don't see anything active in the browser, no popup or anything like
> > that but get the impression that somehow I've been logged in silently.
> > If I can capture that traffic, assuming there is a handshake going on, I
> > might be able to write Python code to mimic what's going on.
> Possibly the browser has an authentication cookie that is allowing you
> to bypass some kind of login. If you are on a windows client there may
> be other magic methods to authenticate, I'm not sure.
> It would be very useful to see the HTTP status and headers coming back
> from the server on your request. Presumably they include either an
> authentication request or a redirect to a login page. Did you try
> looking at them in curl?
I tried curl and that was certainly a revelation. I think I should be
able to see what's going on now. Thanks for the pointer.
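For reference, the `curl -i` check discussed above has a rough Python counterpart. The sketch below is only an illustration under stated assumptions: it uses Python 3's `urllib.request` (the modern successor to the `urllib` mentioned earlier), the helper names are made up for this example, and nothing is known about how this particular server authenticates. It reports the status, the final URL, and the headers, and keeps any cookies the server sets.

```python
import http.cookiejar
import urllib.error
import urllib.request

# Headers that usually reveal how a login is being handled.
AUTH_HINT_HEADERS = ("www-authenticate", "set-cookie")


def make_opener():
    """Return (opener, jar); the opener stores cookies it receives in jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_status_and_headers(opener, url):
    """Return (status, final_url, headers) for url.

    Unlike plain `curl -i`, urllib follows redirects on its own, so a
    final_url that differs from url means the server redirected us.
    """
    try:
        with opener.open(url, timeout=10) as response:
            return response.status, response.geturl(), response.headers.items()
    except urllib.error.HTTPError as err:
        # 4xx/5xx replies (such as 401) are raised as exceptions, but they
        # still carry the status code and headers.
        return err.code, err.geturl(), err.headers.items()


def auth_hints(headers):
    """Keep only the (name, value) pairs that hint at authentication."""
    return [(name, value) for name, value in headers
            if name.lower() in AUTH_HINT_HEADERS]


# Hypothetical usage, with the placeholder URL from the curl command above:
#   opener, jar = make_opener()
#   status, final_url, headers = fetch_status_and_headers(
#       opener, "http://my.server.com/index.jsp")
#   print(status, final_url, auth_hints(headers))
```

If the server hands out a session cookie, later requests made through the same opener send it back automatically, which may be all the "silent login" amounts to.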
> >>> My motive in
> >>> doing this is to use a Python program to exercise some of the
> >>> application functions but I can't do that if I can't read the pages in
> >>> the first place.
> Not necessarily true. For example you can make a POST or GET request of
> the app providing parameters that make it do something useful without
> having read the page containing the form or link that a user would use
> in a browser.
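The point above, that a request can carry its own parameters without the form page ever being read, might look like this with Python 3's `urllib`. This is a sketch only: the parameter names `action` and `q` are hypothetical, since the real ones depend on what the application expects, and the URL is the placeholder from the curl command earlier in the thread.

```python
import urllib.parse
import urllib.request


def build_get(base_url, params):
    """Build a GET request with params encoded into the query string."""
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(base_url + "?" + query)


def build_post(url, fields):
    """Build a POST request with fields sent as a form-encoded body."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    # Supplying data is what makes urllib send this as a POST.
    return urllib.request.Request(url, data=body)


# Hypothetical parameters; the real names come from the application.
get_request = build_get("http://my.server.com/index.jsp",
                        {"action": "search", "q": "widgets"})
post_request = build_post("http://my.server.com/index.jsp",
                          {"action": "search", "q": "widgets"})
```

Neither request is sent until it is passed to `urllib.request.urlopen()` or to an opener's `open()` method, so the URL and body can be inspected first.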
> Can you get any cooperation from someone who knows the server app? Cuz
> what you really want to know is, what is the API the server exposes to
> the client (in the form of URLs).
The developers are basically just down the hall but they also are up to
their ears in work so I'm reluctant to take too much of their time. I'll
probably wait until I have some specific questions before approaching them.
> deal-killer. But if it is plain HTML you should be able to do what you want.
Again, thanks for the curl pointer. That will really help.