[Python-talk] Generating form data multipart MIME for httplib, urllib{, 2} Request objects

Bill Freeman f at ke1g.mv.com
Thu Apr 27 13:37:26 EDT 2006


First, if I miss the meeting tonight, "hello" to everyone, and I'm sorry to miss it,
being a TECO hack from way back (~1969).  If I still feel really sick, I'll stay home.
If I feel really good, I have a rehearsal to attend for performances this weekend.
If I feel just right, I'll come to the meeting and, along with all the other people
with hacking coughs, infect the rest of you.

I'm attempting to use Python to submit --- NOT SERVICE --- form data using POST.
This requires something that at least looks like a MIME multipart message in the
POST body, with each form value as a separate part.  And there is the non-deprecated
email.MIME* stuff, which can certainly help in generating such a body.  Probably
all the superfluous extra headers that get generated in each part don't matter.
And maybe all web servers could deal with having uploaded JPEG files encoded in
base64, even though HTTP is 8-bit clean.  I've gotten around these anyway by
deriving from email.Message.Message directly, adding the headers that I want
myself, and placing the unencoded image data directly as the body of the message.

However, this library, when producing the final single string for the whole
multipart object body, makes the protocol-dependent line boundaries just '\n',
as opposed to '\r\n', which, I'm currently theorizing, is why the server
can't seem to find the value of one of the simple text form data fields (it
only complains about the first one and quits; I'll bet that it can't parse
the others either).

[httplib, on the other hand, correctly puts '\r\n' at the end of the main
header lines of the request.]
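
The line-ending behaviour described above can still be reproduced. This is a
minimal sketch, assuming a current Python 3 email package (module names differ
from the 2006 email.MIME* layout); the field name "username" and its value are
made up for illustration.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Build a one-part multipart/form-data message with the stock MIME classes.
msg = MIMEMultipart("form-data")
part = MIMEText("Bill", "plain")
# Hypothetical form field name, for illustration only.
part.add_header("Content-Disposition", "form-data", name="username")
msg.attach(part)

# as_string() with the default policy separates lines with a bare '\n'.
body = msg.as_string()
has_crlf = "\r\n" in body

# Each part also carries headers that a browser would not send.
has_extra_headers = "MIME-Version" in body and "Content-Transfer-Encoding" in body

print("CRLF present:", has_crlf)
print("extra per-part headers present:", has_extra_headers)
```

The output is well-formed MIME for mail purposes, but a server that splits the
POST body strictly on '\r\n' may fail to find the part headers.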

Fixing these line endings isn't quite so easy.  Message.as_string() uses
email.Generator so that it can use print statements to produce the result.
print is what chooses the line boundaries of interest.  Hacking the Generator
is more than I can face at the moment.  Besides, I'm not sure what that would
do to the occasional '\n' inside of an image's data, which should not be
translated.

Possible '\n' inside image data is also much of what's wrong with just doing
a string substitution on the result of as_string().  I'm convinced that such
a substitution would not invalidate the choice of boundary sequence, but it
still changes Content-Length (though I could handle that).
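
To make the hazard concrete, here is a small sketch of the blanket substitution
applied to a body that contains raw image bytes. The byte string standing in for
JPEG data and the helper name naive_crlf are invented for the example.

```python
def naive_crlf(body: bytes) -> bytes:
    """Blindly turn every '\\n' into '\\r\\n', including inside binary data."""
    return body.replace(b"\n", b"\r\n")

# Stand-in for unencoded image data that happens to contain a 0x0A byte.
image = b"\xff\xd8\xff\xe0\n\x00\x10JFIF"
header = b'Content-Disposition: form-data; name="photo"\n\n'
body = header + image

fixed = naive_crlf(body)

# The header line endings are now correct, but the image bytes were altered too.
image_survived = image in fixed
# One extra byte per '\n': two in the header, one inside the image.
growth = len(fixed) - len(body)

print("image intact:", image_survived)
print("bytes added:", growth)
```

The length change is easy to account for by measuring the final body, but the
altered image bytes cannot be repaired after the fact without knowing where the
binary part begins and ends.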

Basically, using the email.MIME* stuff seems to be more trouble than it is
worth.

I haven't found suitable multipart generation stuff in the http or url
libraries.  Unless someone can point out something that I've missed, I'll
be reinventing the wheel.  Suggestions?
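
For what it's worth, the wheel in question is small. Below is a rough sketch of
a hand-rolled encoder, written for Python 3 with bytes throughout; it is one
possible approach under stated assumptions, not an existing library function.
The name encode_multipart_formdata, the argument shapes, and the sample field
names are all hypothetical, and field or file names are assumed to be plain
ASCII without quotes.

```python
import uuid

CRLF = b"\r\n"

def encode_multipart_formdata(fields, files):
    """Build a multipart/form-data body with '\\r\\n' line endings.

    fields: dict mapping name -> str value
    files:  dict mapping name -> (filename, content_type, raw bytes)
    Returns (content_type_header_value, body_bytes).
    """
    payloads = [v.encode("utf-8") for v in fields.values()]
    payloads += [data for (_, _, data) in files.values()]

    # Pick a boundary that does not occur in any payload.
    boundary = uuid.uuid4().hex.encode("ascii")
    while any(boundary in p for p in payloads):
        boundary = uuid.uuid4().hex.encode("ascii")

    lines = []
    for name, value in fields.items():
        lines += [
            b"--" + boundary,
            ('Content-Disposition: form-data; name="%s"' % name).encode("ascii"),
            b"",
            value.encode("utf-8"),
        ]
    for name, (filename, ctype, data) in files.items():
        lines += [
            b"--" + boundary,
            ('Content-Disposition: form-data; name="%s"; filename="%s"'
             % (name, filename)).encode("ascii"),
            ("Content-Type: %s" % ctype).encode("ascii"),
            b"",
            data,  # raw bytes, inserted untouched (no base64, no translation)
        ]
    lines += [b"--" + boundary + b"--", b""]

    body = CRLF.join(lines)
    content_type = "multipart/form-data; boundary=%s" % boundary.decode("ascii")
    return content_type, body

content_type, body = encode_multipart_formdata(
    {"username": "Bill"},
    {"photo": ("a.jpg", "image/jpeg", b"\xff\xd8\n\x00")},
)
print(content_type)
print(len(body), "bytes")
```

The returned pair can be sent as the Content-Type header and the body of a POST
through httplib (http.client in Python 3), with Content-Length taken from
len(body). Because the separators are joined around the payloads rather than
substituted into them, a '\n' inside the image data is left alone.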

Bill


