[Python-talk] Accessing Web site via Python
Bill Sconce
sconce at in-spec-inc.com
Mon May 7 14:38:04 EDT 2007
On Fri, 4 May 2007 06:06:19 -0700 (PDT)
Peter Courlis <neat_gent at yahoo.com> wrote:
> Good Morning,
>
> A Python question?
>
> I would like to be able to go to a Web Site and
> grab a piece of data. For example, to go to
> a specific Yahoo Finance page such as,
> URL:http://finance.yahoo.com/q?s=AAPL
> [...]
> Is there a mechanism, methods or library code in Python to
> perform this capture without writing spaghetti code?
>
> Thanks in advance
>
> Jim
Hi, Peter (Jim?) -
Yes, there is. For how you might pull down the HTML, see below.
Lloyd (the guy who signs his name as "Python") is right about parsing HTML;
i.e., using BeautifulSoup. And of course what you do with the quote data
after you've extracted them may be any of a number of different processes.
The code below should give you a flavor of Python. It works. (Although you
probably don't want to be hitting Yahoo! every 5 seconds... :)
-Bill
P.S. Do come around and hang out on the list, as Lloyd suggests. You'll
be welcome. And welcome at our PySIG meetings too!
_____________________________________________________________________
#! /usr/bin/env python2.5
"""Demonstrate one possible beginning to solving Jim Courlis's poser
about Python and the Web.
(c) In Spec, Inc., Milford, NH 03055-0085 USA.
Released under GPL V2.
"""
import urllib2, time

def get_AAPL():
    """Get and return current AAPL quote from Yahoo! Finance page.
    Use ad-hoc parse based on simple manual inspection of the page.
    """
    quote_reader = urllib2.urlopen('http://finance.yahoo.com/q?s=AAPL')
    for text in quote_reader:
        whereis = text.find('class="ygtb"><b>Apple Inc. (AAPL)')
        if whereis > -1:
            # HTML where quote appears looks like "<big><b>102.88</b></big>"
            start, end = '<big><b>', '</b></big>'
            start = text.find(start, whereis) + len(start)
            end = text.find('</b></big>', start)
            quote_reader.close()
            return text[start:end]

if __name__ == '__main__':
    # Unit-test by getting and displaying a few quotes every 5 seconds
    more = 3
    while 1:
        print 'AAPL now trading at', get_AAPL()
        more -= 1
        if more:
            time.sleep(5)
        else:
            break
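
The script above mixes the download and the ad-hoc parse in one function.
As a rough sketch of the "use a real HTML parser" advice, the parse can be
split out and tested offline. Everything here is an assumption for
illustration: it is written for Python 3 (where urllib2 became
urllib.request), it uses the standard-library html.parser as a stand-in for
BeautifulSoup, the names BigBoldText, extract_quote, fetch and SAMPLE are
hypothetical, and SAMPLE is only modeled on the strings the script searches
for, so the live page may well look different.

```python
import urllib.request
from html.parser import HTMLParser

# Hypothetical snippet modeled on the markup the original script looks for.
SAMPLE = ('<td class="ygtb"><b>Apple Inc. (AAPL)</b></td>'
          '<td><big><b>102.88</b></big></td>')


class BigBoldText(HTMLParser):
    """Collect the text found directly inside <big><b>...</b></big>."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tag names
        self.found = []      # text seen while the stack ends in big, b

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self.open_tags[-2:] == ['big', 'b']:
            self.found.append(text)


def extract_quote(html, marker='Apple Inc. (AAPL)'):
    """Return the first <big><b> text after marker, or None if absent."""
    position = html.find(marker)
    if position == -1:
        return None
    parser = BigBoldText()
    parser.feed(html[position:])
    parser.close()
    return parser.found[0] if parser.found else None


def fetch(url):
    """Download a page and return it as text (not called in this demo)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode('utf-8', errors='replace')


if __name__ == '__main__':
    # Offline check against the sample; no request is sent to Yahoo!.
    print('AAPL sample quote:', extract_quote(SAMPLE))
```

Keeping fetch() separate means the parsing logic can be exercised against a
saved copy of the page, which also avoids hitting the site repeatedly while
experimenting.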