[Python-talk] Module to massage text file for better output?

Lloyd Kvam python at venix.com
Sat Jul 14 10:51:20 EDT 2007


On Sat, 2007-07-14 at 09:51 -0400, Hewitt_Tech wrote:
> Before I go to the trouble of writing a customer report processor I was 
> wondering if Python has a module that might expedite writing the 
> program. 
Of course it does ;)
> Specifically I have a file generated by a SQL utility that 
> creates a large text file with lots of data aligned in rows/columns. 
> The problem is that some of the data is aligned in fixed size fields 
> that are much larger than necessary. What I'd like to have for output is 
> data in fixed size fields that are more optimal. 

I assume that redoing the report directly from SQL is too much work.  If
the data can be organized into a single select, the itertools.groupby
function (nest calls as necessary) makes report processing fairly
painless.
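
As a rough sketch of the groupby idea, assuming the rows come back from a
single select already sorted by the grouping column. The column layout,
the sample rows, and the name report_lines are all made up for
illustration:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows from one SELECT ... ORDER BY customer, order_id
rows = [
    ("Acme", 1001, 25.00),
    ("Acme", 1002, 10.50),
    ("Zenith", 2001, 99.99),
]

def report_lines(rows):
    """Yield one header per customer, then detail lines and a subtotal."""
    # groupby only groups adjacent rows, so the input must already be sorted
    for customer, group in groupby(rows, key=itemgetter(0)):
        yield customer
        subtotal = 0.0
        for _, order_id, amount in group:
            yield "    %-8d %10.2f" % (order_id, amount)
            subtotal += amount
        yield "    %-8s %10.2f" % ("total", subtotal)

report = list(report_lines(rows))
```

A second level of grouping would be another groupby call on `group`
inside the loop.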

> So for example I 
> might have a user's name field that is 80 characters wide but the 
> largest data never exceeds 35 characters. 

The struct module allows you to define fixed width character fields.
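        import struct
        inlinefmt = '5x80s80s50s'	# struct format string; 80s = one 80-byte string
                # skip indent of 5 spaces, first name, last name, address line
        # inline must be exactly struct.calcsize(inlinefmt) == 215 bytes long
        linedata = tuple(f.strip() for f in struct.unpack(inlinefmt, inline))
        firstname, lastname, addressline = linedata
        outfmt = '%35s,%35s,%20s'
        outline = outfmt % linedata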
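
For reference, a small self-contained sketch of the struct approach. It
assumes Python 3 (where struct works on bytes), ASCII data, and the
hypothetical 5/80/80/50 layout used above. The helper name shrink_line
and the sample record are invented:

```python
import struct

# Assumed layout: 5 pad bytes, then 80-, 80- and 50-byte text fields
INLINE_FMT = "5x80s80s50s"
OUT_FMT = "%-35s,%-35s,%-20s"

def shrink_line(inline):
    """Repack one fixed-width record (bytes) into narrower columns."""
    fields = struct.unpack(INLINE_FMT, inline)
    # Each field comes back padded to its full width, so strip it first
    cleaned = tuple(f.decode("ascii").strip() for f in fields)
    return OUT_FMT % cleaned

# Build a sample record of exactly struct.calcsize(INLINE_FMT) bytes
sample = (" " * 5
          + "Ann".ljust(80)
          + "Lee".ljust(80)
          + "1 Main St".ljust(50)).encode("ascii")
outline = shrink_line(sample)
```

This sketch left-justifies with %-35s, which is a choice made here; plain
%35s right-justifies instead. Note that %-35s pads but does not truncate,
so a value longer than its column would push the rest of the line over.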
        
> I want the extra characters 
> gone but still have good alignment to make the report more readable. The 
> utility also outputs a header line with the names of the fields 
> underlined. 
Is this really a text file or is it fancier?  The above logic will work
with ASCII.  I've never used struct with unicode data.  I suspect that
slicing might be simpler with unicode.
        uniline = inline.decode('utf8') # or whatever
        firstname = uniline[5:85]       # skip the 5-space indent, take 80 characters
        ...
The same set of slices should work with headers and data.
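
A minimal sketch of the slicing approach, applying one table of slices to
the header line, the underline, and the data lines alike. The column
positions and new widths are assumptions, and narrow_line is a made-up
helper name:

```python
# Assumed (start, end, new_width) for each column in the original report
COLUMNS = [(5, 85, 35), (85, 165, 35), (165, 215, 20)]

def narrow_line(line):
    """Cut each column out of a text line and re-pad it to its new width."""
    parts = []
    for start, end, width in COLUMNS:
        field = line[start:end].rstrip()
        # Truncate as well as pad so the columns stay aligned
        parts.append(field[:width].ljust(width))
    return " ".join(parts).rstrip()

# Invented sample lines in the assumed 5/80/80/50 layout
header = " " * 5 + "FIRSTNAME".ljust(80) + "LASTNAME".ljust(80) + "ADDRESS".ljust(50)
rule = " " * 5 + "-" * 80 + "-" * 80 + "-" * 50
data = " " * 5 + "Ann".ljust(80) + "Lee".ljust(80) + "1 Main St".ljust(50)

narrowed = [narrow_line(text) for text in (header, rule, data)]
```

Because the underline row is sliced and truncated like everything else,
it shrinks to match the new column widths. If the file is read as bytes,
decode each line first, as in the uniline example.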
> That also should be adjusted to fit better with the data. 
> Does Python already have something that would do this or should I just 
> write the program? 

I think you'll be programming either way, but Python should keep things
simpler.

> I suspect by the time I fiddle with the output I'd 
> probably end up spending 4 or more hours getting something that would do 
> the job.
> 
> -Alex
> 
> P.S. If I wrote this program I could probably read the report to EOF 
> recording the maximum size of each datum for a particular field and then 
> rewrite the data making it fit the fields better.

So you'd build the outfmt string(s) in the first pass and produce the
pretty report in the second pass.  (double the % to escape it)
        outfmt = '%%%ds,%%%ds,%%%ds' % (35,35,20)
        # produces '%35s,%35s,%20s'
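
The two-pass idea might look roughly like this, assuming each record has
already been split into a list of stripped field strings (for instance by
slicing). The function names and sample records are invented for the
sketch:

```python
def max_widths(records):
    """First pass: record the widest value seen in each column."""
    widths = []
    for fields in records:
        for i, field in enumerate(fields):
            if i == len(widths):
                widths.append(0)
            widths[i] = max(widths[i], len(field))
    return widths

def build_outfmt(widths):
    """Build a format like '%-6s,%-3s'; the doubled %% yields one literal %."""
    return ",".join("%%-%ds" % w for w in widths)

# Hypothetical records, already split and stripped
records = [
    ["Ann", "Lee", "1 Main St"],
    ["Robert", "Ng", "22 Elm Road"],
]

widths = max_widths(records)                      # first pass
outfmt = build_outfmt(widths)
report = [outfmt % tuple(r) for r in records]     # second pass
```

Including the header row in the first pass keeps a column from ending up
narrower than its own title.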


I probably gave you way more detail than you wanted; probably should
have stopped when I said struct module.

> _______________________________________________
> Python-talk mailing list
> Python-talk at dlslug.org
> http://dlslug.org/mailman/listinfo/python-talk
-- 
Lloyd Kvam
Venix Corp


