[Python-talk] Kent's Korner?

Kent Johnson kent37 at tds.net
Mon Oct 22 19:27:13 EDT 2007


Ric Werme wrote:
> Typical HTML to deal with is:
>     <tr>
>       <td headers="full" height="34" valign="top" class="main" 
> style="background:url(images/table_gray.gif); padding:5px 10px;">
> 
> 		83 N MAIN ST<br />
> 
> 		BOSCAWEN&nbsp;NH&nbsp;&nbsp;03303-1235
> 		<br />
> 
> 		</td>
>       <td style="background:url(images/table_gray.gif);">&nbsp;</td>
>       <td height="34" align="right" valign="top" class="main" 
> style="background:url(images/table_gray.gif); padding:5px 10px;">
> 		<a title="Mailing Industry Information" href="#" 
> onClick="mailingIndustryPopup2('C071',
> 			'MERRIMACK',
> 			'83',
> 			'9',
> 			'',
> 			'0051',
> 			'D',
> 			'S',
> 			'',
> 			'',
> 			'',
> 			'',
> 			'Y');" >Mailing Industry Information</a>
> 	</tr>
> 
> Note I have to fish the "delivery point" (83) from a Javascript
> subroutine call.  That gets used in the USPS bar codes (which I
> currently don't generate).
> 
>> What do you think?
> 
> I'd be glad to use Kent's code.  :-)  However, I have no idea whether BS would
> be a better choice for this task.

Here is a start using BS:

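# BeautifulSoup 3 import; data is assumed to hold the HTML shown above
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(data)

# The address cell is the <td> with headers="full"
tag = soup.find('td', headers='full')
first = tag.next                # text node before the first <br />
print first.strip()
second = first.next.next        # skip the <br /> to reach the next text node
print second.strip().replace('&nbsp;', ' ')

# The delivery point is the third argument of the onClick call,
# which sits on the third line of the attribute value
click = soup.find('a', title='Mailing Industry Information')['onclick']
print click.splitlines()[2].strip()[1:-2]   # drop the quotes and trailing comma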


If data is your test data, this prints:
83 N MAIN ST
BOSCAWEN NH  03303-1235
83
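
Picking the delivery point by line number depends on the exact line breaks
inside the onClick attribute. If that turns out to be fragile, one possible
alternative is to pull every single-quoted argument out of the call with a
regular expression and index into the list. This is only a sketch: popup_args
is a made-up helper name, the onclick string is copied from the sample HTML
above, and it assumes the delivery point is always the third argument and
that no argument contains a quote character.

```python
import re

def popup_args(onclick):
    """Return the single-quoted arguments of the JavaScript call as a list.

    Hypothetical helper; assumes arguments never contain a single quote.
    """
    return re.findall(r"'([^']*)'", onclick)

# Attribute value copied from the sample HTML in this thread
onclick = """mailingIndustryPopup2('C071',
			'MERRIMACK',
			'83',
			'9',
			'',
			'0051',
			'D',
			'S',
			'',
			'',
			'',
			'',
			'Y');"""

args = popup_args(onclick)
delivery_point = args[2]   # assumption: third argument is the delivery point
print(delivery_point)
```

With BeautifulSoup you would pass the click string from the code above to
popup_args instead of the literal.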

Kent

