[Python-talk] Accessing Web site via Python

Bill Sconce sconce at in-spec-inc.com
Mon May 7 14:38:04 EDT 2007


On Fri, 4 May 2007 06:06:19 -0700 (PDT)
Peter Courlis <neat_gent at yahoo.com> wrote:

> Good Morning,
> 
> A Python question?
> 
> I would like to  be able to go to a Web Site and 
> grab a piece of data. For example, to go to
> a specific Yahoo Finance page such as, 
> URL:http://finance.yahoo.com/q?s=AAPL
>   [...]
> Is there a mechanism, methods or library code in Python to
> perform this capture without writing spaghetti code?
> 
> Thanks in advance
> 
> Jim


Hi, Peter (Jim?) -

Yes, there is.  For how you might pull down the HTML, see below.

Lloyd (the guy who signs his name as "Python") is right about parsing HTML;
i.e., using BeautifulSoup.  And of course what you do with the quote data
once you've extracted it could be any number of things.

The code below should give you a flavor of Python.  It works.  (Although you
probably don't want to be hitting Yahoo! every 5 seconds...  :)

-Bill

P.S.  Do come around and hang out on the list, as Lloyd suggests.  You'll
be welcome.  And welcome at our PySIG meetings too!

_____________________________________________________________________
#! /usr/bin/env python2.5
"""Demonstrate one possible beginning to solving Jim Courlis's poser
   about Python and the Web.
   (c) In Spec, Inc., Milford, NH 03055-0085 USA.
   Released under GPL V2.
"""
import urllib2, time

def get_AAPL():
    """Get and return current AAPL quote from Yahoo! Finance page.
    
    Use ad-hoc parse based on simple manual inspection of the page.
    """
    quote_reader = urllib2.urlopen('http://finance.yahoo.com/q?s=AAPL')
    try:
        for text in quote_reader:
            whereis = text.find('class="ygtb"><b>Apple Inc. (AAPL)')
            if whereis > -1:
                # HTML where quote appears looks like "<big><b>102.88</b></big>"
                open_tag, close_tag = '<big><b>', '</b></big>'
                start = text.find(open_tag, whereis) + len(open_tag)
                end = text.find(close_tag, start)
                return text[start:end]
    finally:
        # Close the connection whether or not a quote was found.
        quote_reader.close()
    return None     # marker line not found; the page layout may have changed


if __name__ == '__main__':
    # Unit-test by getting and displaying a few quotes, 5 seconds apart
    attempts = 3
    for attempt in range(attempts):
        print 'AAPL now trading at', get_AAPL()
        if attempt < attempts - 1:
            time.sleep(5)       # no need to wait after the last quote
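_____________________________________________________________________
The ad-hoc string search above depends on the quote sitting on the same line
as the company name.  As a rough illustration of the "use a real parser"
advice, here is a minimal sketch that pulls the text out of
<big><b>...</b></big> with the standard library's HTML parser.  Assumptions,
none of which come from the script above: it is written in Python 3 syntax
(the script targets Python 2.5), it uses html.parser as a stand-in for
BeautifulSoup, and the names BigBoldText, extract_quote and SAMPLE are made
up for the example.  SAMPLE is a hand-written snippet modeled on the markup
noted in the script's comment, not a captured page.

```python
# Python 3 sketch; html.parser is used here as a stand-in for BeautifulSoup.
from html.parser import HTMLParser


class BigBoldText(HTMLParser):
    """Collect the text found directly inside <big><b>...</b></big>."""

    def __init__(self):
        super().__init__()
        self._stack = []    # currently open tags, innermost last
        self.values = []    # text captured inside <big><b>

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed tags.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Keep non-blank text whose two innermost open tags are <big><b>.
        if self._stack[-2:] == ['big', 'b'] and data.strip():
            self.values.append(data.strip())


def extract_quote(html):
    """Return the first <big><b> text in html, or None if there is none."""
    parser = BigBoldText()
    parser.feed(html)
    parser.close()
    if parser.values:
        return parser.values[0]
    return None


# Hypothetical sample, modeled on the markup described in the script above.
SAMPLE = ('<td class="ygtb"><b>Apple Inc. (AAPL)</b></td>'
          '<td><big><b>102.88</b></big></td>')
```

Calling extract_quote(SAMPLE) gives back the quote as a string, and
extract_quote returns None when no such element is present.  Because the
parsing is separate from the download, it can be tried on a saved copy of a
page without hitting the site at all.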

