Poor quality of scanned text on web pages.

  Jim Thing 17:25 31 Aug 2006

Many of the pages of my website consist of a .GIF graphics file (i.e. a page of scanned text) enclosed within a standardised house-style frame.

My problem is that I've never been happy with the display quality of the scanned text. I'm currently using an Epson Perfection 3490 Photo scanner (bought recently to replace an HP Scanjet 3500c) and I always use a resolution of 72ppi; also, in the interests of faster downloads, I try to keep the size of each .GIF to less than 30kB wherever possible.

My question is: how can I achieve consistent, crisp black text on a white background? Should I scan initially at, say, 300ppi, then adjust contrast and brightness, then optimise down to 72ppi? Or would I be better off starting at 72 and keeping it that way? Or is there a better way still? I've tried all kinds of different things (in an admittedly unsystematic way), but nothing seems to make much difference. The sole purpose of each web page is to enable people to read the text, so there's not much point in displaying a low-resolution thumbnail and inviting the reader to click on it and wait for a better-quality graphic to download.
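For illustration only, the first option (scan high, then adjust and reduce) can be sketched in Python. This is a minimal sketch under stated assumptions, not how Photoshop or the Epson software actually resample: the page is held as rows of 0-255 grey values, and because 300 is not a whole multiple of 72 it assumes a 288ppi scan (4 x 72), so that each 4 x 4 block of scanner pixels can simply be averaged into one screen pixel. The helper names `box_downsample` and `apply_levels` are hypothetical. The levels step pushes greyish paper to pure white and dark-grey ink to pure black, which is roughly what adjusting brightness and contrast is trying to achieve.

```python
# Hypothetical sketch: scan high, reduce by averaging, then apply "levels".
# A page is a list of rows of grey values, 0 = black ink, 255 = white paper.
# Assumes a 288 ppi scan so that 288 / 72 = 4 is a whole-number factor.

def box_downsample(pixels, factor):
    """Average each factor x factor block into one output pixel.
    Leftover rows/columns that don't fill a whole block are dropped."""
    out_h = len(pixels) // factor
    out_w = len(pixels[0]) // factor
    out = []
    for by in range(out_h):
        row = []
        for bx in range(out_w):
            total = 0
            for y in range(by * factor, (by + 1) * factor):
                total += sum(pixels[y][bx * factor:(bx + 1) * factor])
            row.append(round(total / (factor * factor)))
        out.append(row)
    return out


def apply_levels(pixels, black_point, white_point):
    """Clip values at or below black_point to 0 and at or above
    white_point to 255, stretching the range in between linearly."""
    span = white_point - black_point
    out = []
    for row in pixels:
        new_row = []
        for v in row:
            if v <= black_point:
                new_row.append(0)
            elif v >= white_point:
                new_row.append(255)
            else:
                new_row.append(round((v - black_point) * 255 / span))
        out.append(new_row)
    return out


if __name__ == "__main__":
    # 8 x 8 toy "scan": grey ink (40) on the left half, off-white paper (230).
    scan = [[40] * 4 + [230] * 4 for _ in range(8)]
    screen = apply_levels(box_downsample(scan, 4), black_point=60, white_point=210)
    print(screen)
```

Averaging before the levels step means a stroke that only half-covers a block comes out as a mid grey (135 before levels, 128 after, with the points above) rather than vanishing, which is what helps small type stay legible at 72ppi. The black and white points are illustrative and would need tuning by eye for each batch of originals.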

Ideally I'd like a solution that involves as little Photoshop manipulation as possible, as I'm about to start work on a batch of about 600-700 scans.

What am I doing wrong? Or am I simply expecting too much from an inexpensive scanner?

All advice gratefully received.


  Jim Thing 20:04 31 Aug 2006

Apologies for the delayed response; I've been trying out your suggestion. Unfortunately it didn't work too well for me.

However, I've also tried scanning a typical page as a 72dpi .tif right from the start, then setting the image size to 110% and optimising without any further 'treatment' at all. I tried it both as .gif and as .jpg; neither is quite as good as I'd hoped, but both are more legible than my earlier efforts and certainly acceptable as far as I'm concerned. The .jpg weighs in at 69.95kB (I try to limit them to 30kB max), while the .gif is only 23.5kB, so I think I'll go with that method.

Why choose .gif in the first place? Because I figured I wasn't dealing with zillions of colours, just two tones (black and white) plus perhaps a couple of intermediate greys to help the definition along a bit.
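The reasoning about palettes can be checked with a toy example. GIF stores at most 256 palette colours and compresses runs of repeated values well, so snapping scanner noise onto a handful of greys should shrink the file. The sketch below is illustrative: `fake_scan` invents a noisy black-on-white page (it is not real scanner data), `quantize` maps each pixel to the nearest of a few evenly spaced greys, and Python's `zlib` (DEFLATE) stands in for GIF's LZW compression, so only the direction of the difference is meaningful, not the byte counts.

```python
import random
import zlib


def fake_scan(width, height, seed=1):
    """Invent a noisy greyscale 'page': dark vertical strokes (about 20)
    on off-white paper (about 240), with +/-15 of random scanner noise."""
    rng = random.Random(seed)
    pixels = []
    for _ in range(height):
        for x in range(width):
            base = 20 if (x // 10) % 4 == 0 else 240
            pixels.append(max(0, min(255, base + rng.randint(-15, 15))))
    return pixels


def quantize(pixels, levels):
    """Snap each 0-255 value to the nearest of `levels` evenly spaced greys
    (levels=4 gives 0, 85, 170, 255: black, white and two greys)."""
    step = 255 / (levels - 1)
    return bytes(round(round(v / step) * step) for v in pixels)


if __name__ == "__main__":
    raw = fake_scan(200, 100)
    four_greys = quantize(raw, 4)
    print("distinct values before:", len(set(raw)), "after:", len(set(four_greys)))
    print("deflated size before:", len(zlib.compress(bytes(raw))))
    print("deflated size after: ", len(zlib.compress(four_greys)))
```

In this toy page the noise is mild enough that every pixel snaps to pure black or pure white; a real scan with soft letter edges would keep some of the intermediate greys, which is exactly the extra definition described above.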

Many thanks, forum member. I'll leave the box unticked for a while to see if anything else turns up.

  ade.h 20:21 31 Aug 2006

Saving the scan in an uncompressed format is essential as it gives you the flexibility to decide how to compress it and to what degree it is compressed.
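As an example of an uncompressed master, the sketch below writes and reads binary PGM (P5), a very simple greyscale format consisting of a short text header followed by one raw byte per pixel. PGM is swapped in here purely for illustration; scanner software would more likely save TIFF, which can be stored uncompressed or with lossless compression. The functions `encode_pgm` and `decode_pgm` are hypothetical helpers, and the decoder only understands files produced by `encode_pgm` (no header comments, 8-bit only).

```python
def encode_pgm(pixels, width, height):
    """Return an 8-bit binary PGM (P5) file as bytes: a short ASCII header
    followed by one uncompressed byte per pixel, row by row."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width x height")
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(pixels)


def decode_pgm(data):
    """Inverse of encode_pgm. Only handles files written by encode_pgm:
    no header comments, maxval 255, single newline separators."""
    # maxsplit=3 so that newline bytes (value 10) in the raster are left alone
    magic, dims, maxval, raster = data.split(b"\n", 3)
    if magic != b"P5" or maxval != b"255":
        raise ValueError("not an 8-bit binary PGM from encode_pgm")
    width, height = (int(n) for n in dims.split())
    if len(raster) != width * height:
        raise ValueError("raster size does not match header")
    return width, height, list(raster)
```

Saving with, for example, `open("page001.pgm", "wb").write(encode_pgm(pixels, width, height))` (the file name is hypothetical) keeps every pixel intact, so the choice between .gif and .jpg, and how hard to compress, can be made later and revisited without rescanning.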

Do some research online to get more familiar with image formats.

  splatter 20:27 31 Aug 2006

Although I'm not sure exactly what method you're using, you could always try scanning the text and then running it through an OCR program, so you end up with standard (although formatted) text. Once that is done, you could paste it into your web page.

I appreciate that this may be more bothersome, as you MAY have to read each scanned text after running it through the OCR program to make sure it's all OK, but you will end up with a better site (in my opinion), as the text will scale down on lower resolutions and can be viewed on mobile devices, which are quite popular nowadays.

  De Marcus™ 20:36 31 Aug 2006

You may want to rethink how you display this text to your users. Reducing something that was originally designed to be readable at A4 paper size to less than 50% of its original size is never going to be easy on the eyes.

How about using OCR software and displaying the text as actual text, perhaps making your frames a little bigger to accommodate it? It would take a bit longer (correcting OCR mistakes, processing, etc.), but ultimately it would take up less server space and be much, much easier to read (and not slanted either ;-)

  De Marcus™ 20:37 31 Aug 2006

Snap, splatter and pop!

Sorry :-)

  Jim Thing 21:37 31 Aug 2006

That sounds like good advice. I'll see what I can swot up before I start this monster scanning job. (It's a complete set of school magazines and other material from my old school, dating from 1946 to 1982, when the school was closed by the local Political Correctness Gestapo.) The hoard totals 600-700 pages (maybe more) and is eagerly awaited by a worldwide readership of at least twenty people :-)

I've tried the OCR solution in the past, when I was unable to get any kind of acceptable result from some ancient originals scanned as graphics. It was very successful, but also very labour-intensive, and to be honest I don't fancy the drudgery of doing it on 600-700 pages.

De Marcus™:
The originals I'm dealing with are from the days before ISO sizes became the norm, and I'm not reducing them. The page size is actually 6-3/8" x 9-1/8" (remember inches and eighths?) and I'm putting them onto the website at 110%.
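The figures in this thread allow a rough size estimate. Assuming that 100% means one original inch becomes 72 screen pixels (how large that actually looks depends on the reader's monitor), a 6-3/8" x 9-1/8" page at 110% works out at 505 x 723 pixels. The sketch below, with hypothetical helpers `pixel_size` and `raw_bytes`, does the arithmetic and ignores per-row padding and file headers.

```python
# Rough on-screen size and uncompressed storage for one magazine page.
# Assumption: at 100%, one inch of the original maps to `ppi` screen pixels.

def pixel_size(width_in, height_in, ppi, scale=1.0):
    """Return (width, height) in whole pixels for a page scaled by `scale`."""
    return round(width_in * ppi * scale), round(height_in * ppi * scale)


def raw_bytes(width_px, height_px, bits_per_pixel):
    """Uncompressed raster size in bytes, rounded up to a whole byte
    (ignores per-row padding and any file header)."""
    return (width_px * height_px * bits_per_pixel + 7) // 8


if __name__ == "__main__":
    w, h = pixel_size(6 + 3 / 8, 9 + 1 / 8, ppi=72, scale=1.10)
    print(f"{w} x {h} pixels")
    for bits, label in [(1, "black/white"), (2, "4 greys"), (8, "256 greys")]:
        print(f"{label:>11}: {raw_bytes(w, h, bits):,} bytes uncompressed")
```

On those assumptions even a pure 1-bit black-and-white version is about 45.6kB before compression, and an 8-bit greyscale one about 365kB, so meeting the 30kB target depends entirely on the compression finding long runs of identical pixels, which clean white backgrounds provide.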

Many thanks everyone.

