Tune Up your PDFs!

You'll find examples of derelict PDFs everywhere, from the simplest to the most sophisticated
websites.

Ever wonder why PDFs often get (and earn) a bad rap for usability? Missing navigation
and interactive features (bookmarks, links, form-fields, optimal display settings,
security and accessibility features) force every user to spend more time getting
the information they came for.

What's the most common single complaint? Files are simply far too large, or their size
is unaddressed and therefore poorly handled, contributing needlessly to server
and network loads. The result is poor performance, which can exclude low-bandwidth
users, and frustrate even those with lots of bandwidth. Worse, crude attempts
to control file size often lead to poor-quality, fuzzy and distorted images.

What's the simple problem nearly everyone gets wrong? Believe it or not, missing or bogus
Document Information metadata is one of the chief causes of user frustration.
Why? From Google to IIS to Content Management Systems, file metadata (usually
the Title) is used to display the file in search results to users looking for
information. Lacking a Title, users often can't pick out the file they need from the
other search results.

To ensure that you deploy quality PDF files, first determine precisely what you need
them to do. If you envision them downloaded for future reference, for example,
then you'll want to include links back to your website. If your users typically
possess low-bandwidth connections, you'll spend more time ensuring your files
are as small as possible. If good images are very important to your users (for
example, with product literature), then you'll want to focus on retaining quality
over shrinking the file size.

Some other key considerations for preparing PDF files for online deployment:

Decide what version of Reader you need to support. Reader versions 4, 5, 6 and 7
give you different options for image compression and security, but not all users
have upgraded. Perhaps you should prompt them to do so? Perhaps some of them can't
upgrade?

Consider the length of the document. Longer documents have special characteristics
that imply different handling. Bookmarks really help to alleviate user frustration.
Longer documents, especially those with many images, can be much larger in size,
so you might be choosing between a reliance on byteserving, delivery by the chapter,
offering high and low resolution versions, or simply prompting users to download
before opening locally. You may want to provide a "portal" PDF as well.

Make every PDF open in a way that is appropriate to the content. For presentation
PDFs, make them open Fit Screen. For portrait-oriented documents, Fit Width is
usually best... unless the document opens to a full spread, in which case different
options are in order. Users are put off by PDF files that don't "present"
well.

Ensure Document Information fields are filled. Skipping this step will compromise
search results forever, so don't skip it.

To reduce file size, try higher compression before reducing resolution. Reducing
image resolution is your last resort for reducing file size.

Perform a "Save As" as your last operation, and be sure Fast Web View is
set to "Yes".

What's your PDF Strategy? Give us a call. We're here to help.

Find out about DSI's unique PDF Tune Up Consulting

SUCCESS STORY: The Journal of Light Construction

The Journal of Light Construction is the only 100% paid-subscription magazine read
by remodeling and building pros nationwide. The staff and writers of the Journal,
based near Burlington, Vermont, have spent almost two decades building a huge
repository of advisory, educational and how-to information for the building and
construction trades.

In 1998, the JLC began a program to leverage their content with a for-sale CD-ROM.
The full-scale conversion of their legacy content from paper back issues to publication-quality
PDF files formed the bulk of the initial expense, but JLC has covered this cost
many times since.

A huge hit with the readership since its introduction, JLC's CD-ROM now contributes
a regular revenue stream, boasts its own subscription base and helps drive website
sales. Almost as importantly, the disc has helped cement JLC's reputation as a
leader in serving the construction industry with timely and valuable information.

In 2005, the JLC will release the eighth CD-ROM set in the series. Spanning two discs,
the collection may be installed on Windows or Mac computers, or simply accessed
from the CDs themselves. Updated and expanded, the disc is set to make another
substantial contribution to the bottom line for the 8th year running.

Want to know more? Read the Case Study. (PDF, 297 KB)

Visit the Journal of Light Construction web site, and see how they market the JLCD-ROM.

Is there an opportunity for your publication? Learn more about DSI's solutions for periodicals.

What you need to know about OCR

There are a few simple truths about OCR that software manufacturers and service bureaus
generally don't bother to mention. Keeping our eye on these balls is part of how
DSI has stayed on the leading edge of publication imaging since 1996.

OCR for Search or for Extraction? These are two radically different requirements,
but OCR software usually doesn't assist users in making this decision. If you
are OCRing to make a document searchable, then automated OCR will often deliver
acceptable results, particularly on simple or very clean documents. However, if
you are OCRing to extract and reuse the text in Word or as XML, you should factor
in the cost of a manual correction phase as part of the OCR work. It's far
better to correct OCR errors inside capable OCR software than to fix them afterwards!

Know your text. Have you really looked at the text you are trying to OCR? If there
is handwriting present, you may need to aggressively clean the scans pre-OCR.
Do you see lots of italics? These tend to be search terms on many types of content,
and italics are tough on OCR engines. Consider scanning at a higher resolution
(400 or 600 dpi). If you are OCRing articles, perhaps you'll need to capture the
image-captions as well. Often, caption text is small, or located on a shaded background.
If this text must be captured, you'll need to descreen your images and allow for
some manual correction to assure good OCR results. Tables add other challenges,
because the grid-lines can interfere with the OCR process. The only solution is
to TEST your engine and EXAMINE the results closely before proceeding.

Know your images. It is vital to consider the implications of using color or bitonal
scans, and to decide whether and how they should be mixed. OCR results on color images
are FAR more sensitive to image quality than are black-and-white scans. Color
scans are far larger, and require more attention to resolution and compression
in the final product. Consider a process such as DSI's MultiResolution for documents
with occasional color images.

Learn more about DSI's Imaging Services!
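
To put a rough number on how much larger color scans are than bitonal ones, here is a minimal Python sketch of the uncompressed size of a single scanned page. The page size (US Letter, 8.5 x 11 inches), the bit depths (1 bit per pixel for bitonal, 24 for color) and the function name raw_scan_bytes are assumptions made for illustration. Real file sizes depend heavily on the compression chosen for the final product, so treat these figures only as a starting point for comparison.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Return the uncompressed size, in bytes, of one scanned page.

    width_in, height_in: page dimensions in inches (assumed values below)
    dpi: scan resolution in dots per inch
    bits_per_pixel: 1 for bitonal, 24 for RGB color (assumed bit depths)
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    # Assumed page size: US Letter. Resolutions are the baseline 300 dpi
    # plus the 400 and 600 dpi options mentioned for difficult text.
    for dpi in (300, 400, 600):
        bitonal = raw_scan_bytes(8.5, 11, dpi, 1)
        color = raw_scan_bytes(8.5, 11, dpi, 24)
        print(f"{dpi} dpi: bitonal {bitonal / 1e6:.1f} MB, "
              f"color {color / 1e6:.1f} MB (uncompressed)")
```

Before compression, a 24-bit color page is 24 times the size of the same page scanned bitonal, and doubling the resolution quadruples both. That is why mixing color and bitonal pages deserves a deliberate decision rather than a default setting.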