[Python-talk] Generating form data multipart MIME for httplib,

Bill Freeman f at ke1g.mv.com
Thu Apr 27 15:00:50 EDT 2006


> On Thu, 2006-04-27 at 13:37 -0400, Bill Freeman wrote:
> > I'm attempting to use python to submit --- NOT SERVICE --- forms data
> > using POST.
> > 
> So you are talking HTTP here using the POST method.  The function you
> want is urlencode in the urllib module.  Pass it a list of key-value
> pairs or a dictionary.  It builds what's needed.  Pass it as postdata to
> the urllib (or urllib2) open function along with the URL you want to
> POST to.  The presence of the postdata changes the method from GET to
> POST.

No, I had actually tried that first, before I got a look at what the (working)
Perl sample supplied by the target site produces.  urlencode produces an
ampersand separated string of name=value pairs, with all interesting characters
in the names and values suitably encoded to keep them from being interpreted
as =  or &, and to keep them valid for a url.  The result is suitable for
appending, after a question mark, to a url for use in a GET request.  (A GET
request doesn't work because the image data that comprises one of the form
fields is much too long to reliably send in a url.)
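
For concreteness, here is a small sketch of what urlencode gives you.  It is
written against Python 3, where the function lives in urllib.parse (in the
Python of this thread it is urllib.urlencode); the field names and values are
made up for illustration.

```python
from urllib.parse import urlencode  # urllib.urlencode in Python 2

# Hypothetical form fields; the second value contains '=' and '&',
# which must not be mistaken for separators.
fields = [("name", "Bill Freeman"), ("note", "a=b&c")]

query = urlencode(fields)
print(query)  # name=Bill+Freeman&note=a%3Db%26c
```

That is fine for short text values, but it is not the multipart/form-data
layout the site wants.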

What the site in question expects is a POST request whose Content-Type is
multipart/form-data with an extra parameter of boundary=... (as with all
multipart content).  There also has to be a Content-Length header (I think,
at least all the multipart units I've seen have one).  After headers end,
with a blank line, the content is built by putting each subpart in sequence,
with each subpart preceded by a line containing two hyphens ('--') and the
specified boundary (from the Content-Type header).  After the boundary
line, each subpart has a set of headers of its own, in the forms case
always including a "Content-Disposition: form-data; name=..." header.
In the case of the image data, there is an additional parameter of the
content disposition header "filename=...", but this is the filename on
the source machine and probably isn't necessary in my application.  Also
only for the image data subpart, there is a "Content-Type: image/jpeg"
header.  The headers of each subpart end, like other sets of headers, with
a blank line.  The data, or value, of the subpart follows the blank line.
Neither the line boundary that ends the blank line nor the line boundary
at the beginning of the boundary sequence that follows the data is
considered to be part of the data.  After the last part is another boundary
line, but this one with two hyphens after the boundary sequence as well
as before it.  All very MIME-like.
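
The layout just described can be sketched by hand roughly as follows.  This is
only an illustration in Python 3, not a library routine: the function name
encode_multipart, its argument shapes, and the sample field names are all my
own assumptions, and it does no escaping of quotes in names or filenames.

```python
import uuid

CRLF = b"\r\n"

def encode_multipart(fields, files, boundary=None):
    """Hypothetical helper: build headers and body for multipart/form-data.

    fields: list of (name, value) string pairs
    files:  list of (name, filename, content_type, data_bytes) tuples
    """
    if boundary is None:
        boundary = uuid.uuid4().hex  # unlikely to occur inside the data
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields:
        lines += [
            delimiter,
            ('Content-Disposition: form-data; name="%s"' % name).encode("utf-8"),
            b"",  # blank line ends the subpart headers
            value.encode("utf-8"),
        ]
    for name, filename, ctype, data in files:
        lines += [
            delimiter,
            ('Content-Disposition: form-data; name="%s"; filename="%s"'
             % (name, filename)).encode("utf-8"),
            ("Content-Type: %s" % ctype).encode("ascii"),
            b"",
            data,
        ]
    # Closing boundary line has two hyphens after the boundary as well.
    lines += [delimiter + b"--", b""]
    body = CRLF.join(lines)
    headers = {
        "Content-Type": "multipart/form-data; boundary=%s" % boundary,
        "Content-Length": str(len(body)),
    }
    return headers, body

# Made-up example data: one text field and a few bytes standing in for a JPEG.
headers, body = encode_multipart(
    [("title", "x")],
    [("image", "a.jpg", "image/jpeg", b"\xff\xd8")],
    boundary="BOUND",
)
```

The resulting headers and body could then be handed to something like
http.client.HTTPConnection.request("POST", path, body, headers) (httplib in
Python 2), but that still leaves me owning the encoder.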

And all of this is easy enough to implement poorly, but then I have to maintain
it.  Even if I implement it well I have to maintain it.  So I was hoping for
a standard Python library that would produce the desired request.

Bill


