[Coco] [Color Computer] the secret to scanning magazines for archival purposes
stinger30au
stinger30au at yahoo.com.au
Tue May 19 06:18:46 EDT 2009
The secret to high-quality scanning, by Dez the Coconut in Australia
OK folks, there has been a lot of talk about the scanning everyone has done, and there are a number of scans of the Rainbow magazine out there. Some are OK quality, some good quality and some pretty amazing quality.
Here's how I do my scanning. Hopefully someone might learn from my experiences so far over the last 5 months.
First off, start with a copy of Ubuntu Linux from www.ubuntu.com.
You can download it via torrent or FTP, or even request a free CD copy. Or hunt around and see if there is a Linux user group; maybe someone there will have a copy they will gladly let you have.
The choice is yours whether you install Ubuntu to an internal HDD, a USB memory stick or whatever. There are tons of ways to do it. Help is at www.ubuntuforums.org, or you can email me. I'm no expert at it, but I have installed it to an external HDD and an external thumb drive, dual boot off one HDD and dual HDD too.
And for what we want to do, you could even argue that you don't need to install Ubuntu at all, since it will run straight off the live CD.
Anyhow.
So once Ubuntu is installed (I'm currently using 9.04 on one machine and 8.04 on the other), you need to fire up the shell.
The shell is the command line, just like OS-9 or messy, oops, MS-DOS, only the shell is much more powerful.
With a bit of luck you can plug your USB scanner in, Ubuntu will recognise it and we're in business. If it doesn't work and you need to install drivers.... don't ask me. I have no idea. Ask at www.ubuntuforums.org.
OK, now we need to install the scanner software I use. It's called "gscan2pdf". It's an open-source bit of gear and it does exactly what I need standing on its head.
Its homepage is here:
http://gscan2pdf.sourceforge.net/
To install it, from the shell type in:
sudo apt-get install gscan2pdf
While you're at it, install pdftk as well. It's the PDF slicer/dicer/boxer I use to manipulate all my PDF files:
sudo apt-get install pdftk
And we will also need pdf2djvu, so once we have our high-quality PDF we can convert it to DjVu at 400 dpi, save mobs of space and still have very detailed documents.
So again, from the shell we type in:
sudo apt-get install pdf2djvu
OK, well, that's all the tools we need.
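If you want to double-check the install, here's a little sketch of a shell function that lists whichever of the three tools the shell can't find yet. The name missing_tools is just something made up for this example; command -v is a standard shell builtin, and apt-get is happy to install several packages in one command. It assumes each package is named the same as its command, which holds for these three as far as I know.

```shell
#!/bin/bash
# Hypothetical helper: print the name of each tool from this post
# that the shell can't find. Prints nothing if all three are installed.
missing_tools() {
    local tool
    for tool in gscan2pdf pdftk pdf2djvu; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage: install only what's missing, all in one apt-get command.
# The unquoted $missing is deliberate so each name becomes its own argument.
# missing=$(missing_tools)
# [ -n "$missing" ] && sudo apt-get install $missing
```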
Let's go scanning!
Fire up gscan2pdf and click the Scan button. With a bit of luck your USB scanner will be auto-magically picked up, and you will see it along with some settings to change.
The main scanner I'm using is an HP ScanJet 6300 with a 25-sheet ADF (automatic document feeder, like on a fax machine, for those who don't know what an ADF is).
I can select the speed I want to scan at; I always select the fastest.
Next is resolution; I always select 300 dpi.
Next is the scan mode. We have:
line-art
half-tone
grey-scale
colour
Line-art is essentially a pure black-and-white scan, with very few shades of grey in between.
This is great to use on pages that are essentially just plain black text. DON'T USE THIS MODE IF PHOTOS ARE ON THE PAGE. They look horrid in this mode. This mode takes up only a little space.
Half-tone will take a very dark black original piece of paper and, when scanned, turn it into a very dull-looking grey picture on your PC. Who knows what you will ever need this mode for.
Grey-scale: use this mode if you have a black/white mag or newspaper and there are photos on the page. This mode gives quite good-looking b/w reproduction and looks magic. It takes up a fair bit of space, but not as bad as full colour.
Full colour.... well, I think this is self-explanatory.
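To put some rough numbers on "a little space" versus "a fair bit of space", here's a back-of-the-envelope sketch of how big one page is before any compression. It assumes an 8.5 x 11 inch page (my assumption for a Rainbow-sized mag) at the 300 dpi used above, and the usual 1 bit per pixel for line-art, 8 for grey-scale and 24 for colour. The function name raw_page_bytes is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: uncompressed size in bytes of one scanned page.
# Arguments: width and height in tenths of an inch, dpi, bits per pixel.
# (Tenths of an inch keep everything in whole-number shell arithmetic.)
raw_page_bytes() {
    local w_tenths=$1 h_tenths=$2 dpi=$3 bits=$4
    local w_px=$(( w_tenths * dpi / 10 ))   # 85 tenths at 300 dpi -> 2550 px
    local h_px=$(( h_tenths * dpi / 10 ))   # 110 tenths at 300 dpi -> 3300 px
    echo $(( w_px * h_px * bits / 8 ))
}

# Assumed 8.5 x 11 inch page at 300 dpi:
# raw_page_bytes 85 110 300 1    # line-art   -> 1051875  (about 1 MB)
# raw_page_bytes 85 110 300 8    # grey-scale -> 8415000  (about 8.4 MB)
# raw_page_bytes 85 110 300 24   # colour     -> 25245000 (about 25 MB)
```

So before the JPEG compression kicks in, a colour page is 24 times the size of a line-art one, which is why picking the right mode matters so much for file size.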
OK, so here's how I crank out a scanned magazine:
Scan about 10 to 20 pages and then save.
Now when saving, there are a number of options. You can save an individual page or save the entire lot, and you have the choice of whether you use JPEG or a few other formats as well.
From my experiments, I have found that when saving to PDF it's best to use JPEG as the compression method.
JPEG is a "lossy" format, so to combat the loss of quality I save at 84% quality. If I save at 85% quality the file size jumps to amazing proportions, so I leave it at 84%.
So repeat this scanning process for your book and you will wind up with your data saved in a directory, with file names that look like this:
my.magazine.part1.pdf
my.magazine.part2.pdf
my.magazine.part3.pdf
my.magazine.part4.pdf
my.magazine.part5.pdf
OK, so for argument's sake we will say that each file has 20 pages in it and is 20 megabytes long, so when they are all joined together we will have a single 100-page PDF, 100 MB long, with all pages in numerical order.
So to archive this, we fire up the shell again, go into the directory where we saved our files and fire up pdftk:
http://www.accesspdf.com/pdftk/
pdftk lets us do all sorts of groovy stuff to PDF files. We are going to use it to join our individual files together into one. It will do this standing on its head. It has numerous groovy features, but I won't go into them here.
OK, so from the shell we type in:
pdftk my.
And now we press the Tab key on our keyboard and, like magic, we will have in front of our very eyes:
pdftk my.magazine.part
Told you the shell was powerful. It scanned the directory and magically filled in the "magazine.part" for us.
Now we press 1, so we have:
pdftk my.magazine.part1
Now we press Tab again and we have in front of us:
pdftk my.magazine.part1.pdf
Smart, ain't it? For very little time and effort we are getting our file names in.
So now we repeat the process of pressing 2, 3, 4, 5 and hitting Tab as needed, and in a matter of seconds we can type in this command line:
pdftk my.magazine.part1.pdf my.magazine.part2.pdf my.magazine.part3.pdf my.magazine.part4.pdf my.magazine.part5.pdf
Let's see you type this in on a standard Windows command line and see how slowly you do it. LOL!
Next we need to tell pdftk that we are going to join all the files together into one giant file, so we add this:
cat output my.magazine.pdf verbose
This is the bit we add to the end of what we typed, so our entire command line will read like this:
pdftk my.magazine.part1.pdf my.magazine.part2.pdf my.magazine.part3.pdf my.magazine.part4.pdf my.magazine.part5.pdf cat output my.magazine.pdf verbose
The verbose option at the end tells pdftk to echo to the screen what it is doing. This saves you guessing what the heck is going on, 'cos if you leave it off, you get no feedback from the program when you press Enter.
So press Enter and watch the pages fly by. In a matter of seconds you will be back at your command prompt with a flashing cursor.
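If you'd rather not tab-complete every name, the shell can fill in the whole list for you with a wildcard. One catch: a plain my.magazine.part*.pdf sorts alphabetically, so part10.pdf would land before part2.pdf once you get past 9 parts. Here's a sketch that sorts the names with GNU sort's -V (version sort) option, assuming your sort is recent enough to have it and your file names contain no spaces. ordered_parts and join_parts are made-up names for this example; the pdftk part uses the same cat output and verbose options from above.

```shell
#!/bin/bash
# Hypothetical helper: list the part files for one magazine in numeric
# order (part1, part2, ... part10) using version sort instead of plain
# alphabetical order.
ordered_parts() {
    local prefix="$1"    # e.g. my.magazine.part
    printf '%s\n' "$prefix"*.pdf | sort -V
}

# Hypothetical helper: join every part into one PDF with pdftk,
# showing progress (verbose). Word-splitting the list is deliberate
# and assumes the file names contain no spaces.
join_parts() {
    local prefix="$1" output="$2"
    pdftk $(ordered_parts "$prefix") cat output "$output" verbose
}

# Usage:
# join_parts my.magazine.part my.magazine.pdf
```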
Now look at the directory and you will see the final bit of gear called:
my.magazine.pdf
Open it up with Document Viewer, scroll down and admire the 100-page document you have pieced together.
Now... have a lookie at the file size. I bet it will be around 110 megabytes or perhaps a little more.
So, now to convert this to DjVu format, keeping the high-quality pages but making the file a lot smaller.
So fire up the shell and type in:
pdf2djvu -o my.magazine.djvu -d400 -v my.magazine.pdf
OK, to explain this a little: we have told the program that the output file will be called
my.magazine.djvu
We told it we want it to render the pages at 400 dpi with this:
-d400
We told it we want to see some output on the screen so we know what the heck is going on:
-v
And we told it the name of our original file is:
my.magazine.pdf
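If you've got a whole stack of finished magazines to convert, a small loop saves typing the command for each one. This is a sketch using exactly the options explained above (-o, -d400, -v); the djvu_name and convert_all helpers are hypothetical, and it assumes you want to skip the .partN.pdf pieces and only convert the joined PDFs sitting in the current directory.

```shell
#!/bin/bash
# Hypothetical helper: swap the .pdf ending for .djvu
# (my.magazine.pdf -> my.magazine.djvu).
djvu_name() {
    printf '%s\n' "${1%.pdf}.djvu"
}

# Hypothetical helper: convert every joined PDF in the current
# directory at 400 dpi, skipping the .partN.pdf pieces.
convert_all() {
    local f
    for f in *.pdf; do
        [ -e "$f" ] || continue              # no PDFs here at all
        case "$f" in
            *.part[0-9]*.pdf) continue ;;    # leave the pieces alone
        esac
        pdf2djvu -o "$(djvu_name "$f")" -d400 -v "$f"
    done
}

# Usage: cd into your scans directory, then run
# convert_all
```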
Now press Enter.
You will see something like this:
aussie.coco.june.1988.pdf:
- page #1 -> #1:
- image size: 3199x4332
- 353010 bytes out
- page #2 -> #2:
- image size: 3199x4332
(I have deleted many pages out of this bit)
- 341857 bytes out
- page #76 -> #76:
- image size: 3167x4332
- 450144 bytes out
0.210 bits/pixel; 3.858:1, 74.08% saved, 105702515 bytes in, 27394816 bytes out
So you get the idea.
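The last line of that output is just arithmetic on the two byte totals: the ratio is bytes in divided by bytes out, and the saving is 1 minus out/in. Here's a sketch (a hypothetical savings function, using awk for the decimals) that reproduces the 3.858:1 and 74.08% figures from the run above, plus a variant that compares your own PDF and DjVu file sizes the same way using GNU stat.

```shell
#!/bin/bash
# Hypothetical helper: given bytes in and bytes out, print the
# compression ratio (in / out) and the percentage saved (1 - out / in).
savings() {
    awk -v i="$1" -v o="$2" 'BEGIN { printf "%.3f:1, %.2f%% saved\n", i / o, (1 - o / i) * 100 }'
}

# Same sum on two files on disk; stat -c %s prints a file's size
# in bytes (GNU coreutils, as on Ubuntu).
savings_files() {
    savings "$(stat -c %s "$1")" "$(stat -c %s "$2")"
}

# Usage:
# savings 105702515 27394816                       # -> 3.858:1, 74.08% saved
# savings_files my.magazine.pdf my.magazine.djvu
```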
Now look in the directory and you will see your .djvu file, your original PDF pieces and your final PDF.
Ditch the .part1.pdf-style files and keep the final PDF and the DjVu file.
Don't destroy the PDF files. PDF originals are easier to work with than DjVu files, so do any editing on the PDF and then remake the DjVu file.
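Before you ditch the part files, it's worth checking the joined PDF really got every page. pdftk's dump_data operation prints a NumberOfPages: line for a PDF, so here's a sketch that adds up the pages in the parts and only deletes them if the total matches the joined file. page_count and remove_parts_if_complete are hypothetical names; keep your final PDF either way, as above.

```shell
#!/bin/bash
# Hypothetical helper: print a PDF's page count, read from the
# "NumberOfPages: N" line in pdftk's dump_data output.
page_count() {
    pdftk "$1" dump_data | awk '/^NumberOfPages:/ { print $2 }'
}

# Hypothetical helper: delete the part files only if the joined PDF
# has as many pages as all the parts put together.
remove_parts_if_complete() {
    local prefix="$1" joined="$2" total=0 n f
    for f in "$prefix"*.pdf; do
        n=$(page_count "$f")
        total=$(( total + ${n:-0} ))     # treat an unreadable part as 0 pages
    done
    if [ "$(page_count "$joined")" = "$total" ]; then
        rm -- "$prefix"*.pdf
    else
        echo "page counts don't match, keeping the part files" >&2
        return 1
    fi
}

# Usage:
# remove_parts_if_complete my.magazine.part my.magazine.pdf
```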
Piece of cake.
Hope this might help someone else who wants to do their own scanning.