[Python-talk] Kent's Korner?
Kent Johnson
kent37 at tds.net
Mon Oct 22 19:27:13 EDT 2007
Ric Werme wrote:
> Typical HTML to deal with is:
> <tr>
> <td headers="full" height="34" valign="top" class="main"
> style="background:url(images/table_gray.gif); padding:5px 10px;">
>
> 83 N MAIN ST<br />
>
> BOSCAWEN NH 03303-1235
> <br />
>
> </td>
> <td style="background:url(images/table_gray.gif);"> </td>
> <td height="34" align="right" valign="top" class="main"
> style="background:url(images/table_gray.gif); padding:5px 10px;">
> <a title="Mailing Industry Information" href="#"
> onClick="mailingIndustryPopup2('C071',
> 'MERRIMACK',
> '83',
> '9',
> '',
> '0051',
> 'D',
> 'S',
> '',
> '',
> '',
> '',
> 'Y');" >Mailing Industry Information</a>
> </tr>
>
> Note I have to fish the "delivery point" (83) from a Javascript
> subroutine call. That gets used in the USPS bar codes (which I
> currently don't generate).
>
>> What do you think?
>
> I'd be glad to use Kent's code. :-) However, I have no idea whether BS would
> be a better choice for this task.
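
Here is a start using BeautifulSoup (BS):

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(data)
# The address cell is the <td> with headers="full"
tag = soup.find('td', headers='full')
# First text node in the cell: the street line
first = tag.next
print first.strip()
# Skip over the <br /> to reach the city/state/ZIP line
second = first.next.next
# Replace any &nbsp; entities with plain spaces
print second.strip().replace('&nbsp;', ' ')
# The delivery point is the third argument of the onClick call, on its own
# line; the slice drops the opening quote and the trailing quote and comma
click = soup.find('a', title='Mailing Industry Information')['onclick']
print click.splitlines()[2].strip()[1:-2]
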
If data holds your test HTML, this prints:
83 N MAIN ST
BOSCAWEN NH 03303-1235
83
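
One caveat: picking the delivery point with splitlines()[2] only works while the onClick attribute is laid out one argument per line. A minimal sketch of a layout-independent alternative follows, assuming every argument is a single-quoted literal with no embedded quotes. The helper name popup_args is made up for illustration (it is not part of BeautifulSoup), and the onclick string is copied from the sample HTML above rather than read from the page.

```python
import re

def popup_args(onclick):
    """Return the single-quoted arguments of a JavaScript call as a list."""
    # Assumption: arguments are simple '...' literals with no embedded quotes.
    return re.findall(r"'([^']*)'", onclick)

# Attribute value as it appears in the sample HTML (hard-coded here).
onclick = """mailingIndustryPopup2('C071',
'MERRIMACK',
'83',
'9',
'',
'0051',
'D',
'S',
'',
'',
'',
'',
'Y');"""

args = popup_args(onclick)
delivery_point = args[2]  # third argument, the same value splitlines()[2] picks out
print(delivery_point)
```

With the BS code above, the string in click could be passed to popup_args in place of the hard-coded value.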
Kent