

Just when you think that you've got your mailing list under control,
some new Linux user (LUSER?) signs on and turns everything upside
down.  Oh well...

Hi guys!

There resides on one of the servers at my employer's office a total of
1,500 TIFF files ranging in size from 2 to 11 megs each.  The files are
scanned images of MSDSs (Material Safety Data Sheets), each 1 to 7
pages long.  Why are these files so large?  I dunno. 
My predecessor apparently configured the scanner software for the
highest possible resolution.  These 1,500 files consume 7 gigs of
space on the server, and the owner of the server would REALLY like for
my department to return some of that real estate to her - immediately
if not sooner.

The server owner was kind enough to burn 86 of the MSDSs (550 megs)
onto a CD, which I brought home with me this afternoon.  

The first thing that I did was to pick an MSDS at random, open it in
GIMP, print to file, and use "ps2pdf" to convert to the desired
format.  I opened the file in Acrobat Reader to discover that only the
first of the seven pages in the input file had survived.  I played
with GIMP for half an hour but could never figure out how to get it to
recognize multi-page documents.

Plan "B": The Imagemagick package includes an application called
"convert" that is said to be able to convert files at the command
prompt from TIFF format to PDF format.  So I gave it a whirl. 

  harold@outland$  convert  MSDS7009.TIF  msds7009.pdf

(No, I don't have any idea why the names of the files residing on the
server are all in caps.) 

Instead of getting a reduction in the size of these files, I got a
tenfold increase.  7 megs in; 70 megs out.  Perhaps there's a way to
fine-tune the conversion, but you've surmised by now that I stake no
claim to any special expertise with graphics. 
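
One thing that might be worth trying, assuming the scans are black-and-white text: ImageMagick's "-compress Group4" setting (CCITT fax compression) is intended for two-color images, so the sketch below first reduces each page to two colors with "-monochrome", which is lossy.  It is untested on these files, and "msds_im2pdf" is a hypothetical helper name, not part of ImageMagick:

```shell
# Assumes ImageMagick's "convert" is installed and the pages are
# black-and-white text; msds_im2pdf is a hypothetical helper name.
msds_im2pdf() {
    local src=$1 dst=$2
    # Refuse to run on a missing input file.
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    # -monochrome: reduce every page to two colors (lossy).
    # -compress Group4: CCITT Group 4 compression in the output PDF.
    convert "$src" -monochrome -compress Group4 "$dst"
}
```

Usage:  msds_im2pdf  MSDS7009.TIF  msds7009.pdf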

So, third time's a charm.  I went poking through the various Debian
packages and found one called "libtiff-tools".  I installed it and
there to my delight is a utility called "tiff2ps".  YES!!!

  harold@outland$  tiff2ps  -a  MSDS7009.TIF  >  msds7009.ps
  harold@outland$  ps2pdf  msds7009.ps  msds7009.pdf

  input file:  MSDS7009.TIF = 7,367,788 bytes.
  intermediate file:  msds7009.ps = 14,941,198 bytes. 
  output file:  msds7009.pdf = 326,385 bytes (= 4.4% of input file)
  PDF quality is good!  
  LIFE IS GOOD!  :-) 

I tried piping the two commands...

  tiff2ps -a MSDS7009.TIF  |  ps2pdf

...but it didn't work.  I assume that it failed because ps2pdf didn't
have an input filename and therefore couldn't generate an output
filename.  I don't see any way to specify an output filename for
ps2pdf without preceding it with an input filename, so it looks like
piping the two commands doesn't simplify anything.  But this is no big
deal.
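
As far as I know, the ps2pdf wrapper script that ships with Ghostscript accepts "-" as its input filename, meaning "read PostScript from standard input", with the PDF name as the second argument.  That would make the pipe work and skip the roughly 15 MB intermediate .ps file.  A minimal bash sketch; "tif_to_pdf" is a hypothetical helper name, not part of either package:

```shell
# Convert one multi-page TIFF to PDF without an intermediate .ps file.
# tif_to_pdf is a hypothetical helper name; needs tiff2ps and ps2pdf.
tif_to_pdf() {
    local src=$1 dst=$2
    # Don't start the pipeline on a file that isn't there.
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    # tiff2ps -a writes every page to stdout; "-" makes ps2pdf read
    # from stdin, and the second argument names the output PDF.
    tiff2ps -a "$src" | ps2pdf - "$dst"
    # Report failure if either side of the pipe failed, not just ps2pdf.
    local status=("${PIPESTATUS[@]}")
    [ "${status[0]}" -eq 0 ] && [ "${status[1]}" -eq 0 ]
}
```

Usage:  tif_to_pdf  MSDS7009.TIF  msds7009.pdf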

Only 1,499 conversions to go.   :-(

Time to start thinking about how to automate this puppy.  What I meant
by that last sentence is that this is as good a time as any to start
learning shell scripting.  I have "Learning the Bash Shell" in my
library, but I must admit that the info hasn't soaked in as well as I
hoped it would.  If there's a good book out there on shell scripting,
I've yet to find it.  (Although I'm thinking about investing $40 for
O'Reilly's "Mastering Regular Expressions," which I'm told will also
give me a gentle intro into Perl).  I'm grasping here, guys.  If you
know a good shell scripting tutorial for a non-programmer, please
share the title with me.

As for what I have to work with, I have a series of input files with
the following naming convention:

  MSDSnnnn.TIF
where nnnn does *not* always increment by 1. In fact, the numbers of
the 86 files that I have on the CD run between 7000 and 7200.  

Obviously I can do all manipulations in a single directory, then
harvest the output files later.
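
Since the numbers skip around, another option is to let the shell list whatever MSDS*.TIF files are actually present instead of counting.  A bash sketch, assuming the TIFFs sit in the current directory and the PDFs should be gathered in a subdirectory for harvesting later; "msds_convert_all", "tif_to_pdf" and the default "pdf" directory name are hypothetical:

```shell
# Convert every MSDS*.TIF in the current directory, whatever its
# number, collecting the PDFs in one output directory.
# Hypothetical names; assumes tiff2ps and ps2pdf are installed.
tif_to_pdf() {
    # "-" makes ps2pdf read PostScript from stdin; the exit status
    # is ps2pdf's.
    tiff2ps -a "$1" | ps2pdf - "$2"
}

msds_convert_all() {
    local outdir=${1:-pdf} f base dst
    mkdir -p "$outdir" || return 1
    for f in MSDS*.TIF; do
        # If nothing matches, bash leaves the pattern unexpanded; skip it.
        [ -f "$f" ] || continue
        base=${f%.TIF}                         # MSDS7009.TIF -> MSDS7009
        dst="$outdir/$(printf '%s' "$base" | tr '[:upper:]' '[:lower:]').pdf"
        echo "converting $f -> $dst"
        tif_to_pdf "$f" "$dst" || echo "FAILED: $f" >&2
    done
}
```

Usage: cd into the directory holding the TIFFs and run  msds_convert_all  (the PDFs land in ./pdf).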

If I were a programmer (which I'm not) I think that I would want to
create a loop of some type that... 

 1.  started with n=7000, 
 2.  incremented by 1, 
 3.  checked to see if the input file existed,
 4.  converted the input file if it *did* exist, or
 5.  jumped to the top if the input file did *not* exist, and
 6.  stopped at n=7201. 
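
The six steps above map almost one-to-one onto a bash counting loop.  A sketch, assuming MSDS7009.TIF-style names and the 7000 to 7200 range from the CD; "msds_convert_range" and "tif_to_pdf" are hypothetical helper names, and the comments point back to the numbered steps:

```shell
# Walk n = first..last, convert MSDS<n>.TIF to msds<n>.pdf when the
# file exists, and skip numbers with no file. Hypothetical names;
# assumes tiff2ps and ps2pdf are installed.
tif_to_pdf() {
    # "-" makes ps2pdf read PostScript from stdin; the exit status
    # is ps2pdf's.
    tiff2ps -a "$1" | ps2pdf - "$2"
}

msds_convert_range() {
    local first=${1:-7000} last=${2:-7200} n src dst
    # Steps 1, 2 and 6: start at first, add 1 each pass, and stop once
    # n passes last (at 7201 with the defaults).
    for (( n = first; n <= last; n++ )); do
        src="MSDS$n.TIF"
        dst="msds$n.pdf"
        # Steps 3 and 5: no file for this number, go back to the top.
        [ -f "$src" ] || continue
        # Step 4: the file exists, so convert it.
        echo "converting $src -> $dst"
        tif_to_pdf "$src" "$dst" || echo "FAILED: $src" >&2
    done
}
```

Usage: from the directory holding the TIFFs, run  msds_convert_range  or, say,  msds_convert_range 7000 7200.  Because numbers with no file are simply skipped, the range can safely be wider than the files actually present.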

Can I persuade you guys to help a shell-scripting wanna-be get started
here?  If I pull this off, MIS can't help but be impressed (and we'll
have negated their excuse to purchase the full blown Acrobat
application from the company that put Dmitry Sklyarov in the slammer).

- Harold  

In case you're wondering who Harold is, he's an engineer (chemical)
who just moved to Effingham, and you can visit his LUG at

To unsubscribe, send email to majordomo@silug.org with
"unsubscribe silug-discuss" in the body.