[Python-talk] Module to massage text file for better output?
Lloyd Kvam
python at venix.com
Sat Jul 14 10:51:20 EDT 2007
On Sat, 2007-07-14 at 09:51 -0400, Hewitt_Tech wrote:
> Before I go to the trouble of writing a customer report processor I was
> wondering if Python has a module that might expedite writing the
> program.
Of course it does ;)
> Specifically I have a file generated by a SQL utility that
> creates a large text file with lots of data aligned in rows/columns.
> The problem is that some of the data is aligned in fixed size fields
> that are much larger than necessary. What I'd like to have for output is
> data in fixed size fields that are more optimal.
I assume that redoing the report directly from SQL is too much work. If
the data can be organized into a single select, the itertools.groupby
function (nest calls as necessary) makes report processing fairly
painless.
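A minimal sketch of the groupby approach, assuming the rows arrive already
sorted by the grouping column (groupby only merges adjacent equal keys).
The rows, column layout, and the department_report name are made up for
illustration:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows from a single SELECT ... ORDER BY department, name
rows = [
    ("Sales", "Alice", 1200),
    ("Sales", "Bob", 800),
    ("Support", "Carol", 950),
]

def department_report(rows):
    """Return report lines: a heading, detail lines, and a total per group."""
    lines = []
    for dept, members in groupby(rows, key=itemgetter(0)):
        members = list(members)  # the group iterator can only be read once
        lines.append(dept)
        for _, name, amount in members:
            lines.append("  %-10s %6d" % (name, amount))
        lines.append("  %-10s %6d" % ("Total", sum(m[2] for m in members)))
    return lines

report = department_report(rows)
print("\n".join(report))
```

For a second level of grouping, call groupby again on members inside the
loop with a different key.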
> So for example I
> might have a user's name field that is 80 characters wide but the
> largest data never exceeds 35 characters.
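The struct module lets you unpack fixed-width character fields. Use the
's' code: '80s' is one 80-byte string, while '80c' is 80 separate
one-character values.
import struct
inlinefmt = '5x80s80s50s' # struct format string
# skip indent of 5 spaces, first name, last name, address line
# (inline must be exactly 215 bytes, so drop the newline first)
# (on Python 3 the fields are bytes; decode them before formatting)
linedata = tuple(f.strip() for f in struct.unpack(inlinefmt, inline))
firstname, lastname, addressline = linedata
outfmt = '%35s,%35s,%20s'
outline = outfmt % linedata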
> I want the extra characters
> gone but still have good alignment to make the report more readable. The
> utility also outputs a header line with the names of the fields
> underlined.
Is this really a text file or is it fancier? The above logic will work
with ASCII. I've never used struct with unicode data. I suspect that
slicing might be simpler with unicode.
uniline = inline.decode('utf8') # or whatever
firstname = uniline[5:85] # 80 characters after the 5-space indent
...
The same set of slices should work with headers and data.
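A rough sketch of sharing one set of slices between the header and the
data lines. It assumes the layout from the struct format above (5-space
indent, then fields of 80, 80 and 50 characters, with the first field
starting right after the indent). FIELDS, split_fixed and the sample
lines are hypothetical:

```python
# Assumed layout: 5-space indent, then 80/80/50-character fields.
FIELDS = [slice(5, 85), slice(85, 165), slice(165, 215)]

def split_fixed(line, fields=FIELDS):
    """Cut one line into stripped field values using the shared slices."""
    return [line[f].strip() for f in fields]

# Made-up header and data lines padded to the assumed widths
header = " " * 5 + "FIRST_NAME".ljust(80) + "LAST_NAME".ljust(80) + "ADDRESS".ljust(50)
data = " " * 5 + "Ada".ljust(80) + "Lovelace".ljust(80) + "12 Analytical Way".ljust(50)

print(split_fixed(header))
print(split_fixed(data))
```

The underline row beneath the header can be cut with the same slices, or
simply regenerated once the new widths are known.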
> That also should be adjusted to fit better with the data.
> Does Python already have something that would do this or should I just
> write the program?
I think you'll be programming either way, but Python should keep things
simpler.
> I suspect by the time I fiddle with the output I'd
> probably end up spending 4 or more hours getting something that would do
> the job.
>
> -Alex
>
> P.S. If I wrote this program I could probably read the report to EOF
> recording the maximum size of each datum for a particular field and then
> rewrite the data making it fit the fields better.
So you'd build the outfmt string(s) in the first pass and produce the
pretty report in the second pass. (double the % to escape it)
outfmt = '%%%ds,%%%ds,%%%ds' % (35,35,20)
produces '%35s,%35s,%20s'
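Put together, the two passes might look like the sketch below. It assumes
the records have already been split into stripped fields and fit in
memory; fit_widths, build_outfmt and the sample records are illustrative
names and data, not anything from the SQL utility:

```python
def fit_widths(records):
    """First pass: the longest value seen in each column."""
    return [max(len(value) for value in column) for column in zip(*records)]

def build_outfmt(widths):
    """Build e.g. '%5s,%8s' from [5, 8]; '%%' yields a literal '%'."""
    return ",".join("%%%ds" % w for w in widths)

# Made-up records, already split into stripped fields
records = [
    ("Ada", "Lovelace", "12 Analytical Way"),
    ("Grace", "Hopper", "1 Compiler Ct"),
]

widths = fit_widths(records)
outfmt = build_outfmt(widths)

# Second pass: write every record with the fitted widths
for record in records:
    print(outfmt % record)
```

Including the header names in the first pass keeps a column from ending up
narrower than its own title.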
I probably gave you way more detail than you wanted; probably should
have stopped when I said struct module.
> _______________________________________________
> Python-talk mailing list
> Python-talk at dlslug.org
> http://dlslug.org/mailman/listinfo/python-talk
--
Lloyd Kvam
Venix Corp