Google lets your PDFs look their worst!

Doubtless,
most readers of this newsletter use Google regularly. Many of us don't let an
hour go by without at least one Google search. Almost
every time I search, lots of PDF files come up in the results. And why
not? Google indexes PDF files, and they represent a large portion of the content
that's actually used online, day in and day out.

However, most of the PDF listings in Google's search results look like
HELL! The blue link text Google displays for each result
comes directly from the PDF Document Information field "Title".
This vital information is absent, bogus or malformed in a HUGE proportion
of online PDF files.

In ten recent, more or less random searches, the first page of Google's results
included an average of 4.3 PDF files. Of those PDF files, an average of 60% were
displayed with TOTALLY meaningless Titles.

The result of content managers' apparent inability or unwillingness to
quality-control the metadata in the PDF files they post online is simple: user
frustration.

Without Title metadata, search engines tend to rely on the first text (or
gibberish) they encounter to "stand in" as the Title in their search results.
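Spotting absent or bogus Titles before posting can be partly automated. The Python sketch below assumes you have already extracted each file's Title string (Acrobat shows it on the Description tab of Document Properties; a PDF library such as pypdf exposes it as `PdfReader(path).metadata.title`). The function name `title_problems` and every heuristic in it (the placeholder words, the file-name endings, the minimum length) are hypothetical choices for illustration, not rules published by Google or Adobe.

```python
import re

# Assumed placeholder Titles that authoring tools or careless templates leave
# behind; extend the set to match your own workflow.
PLACEHOLDER_TITLES = {"untitled", "untitled document", "title", "document", "none"}

# A Title ending in a file extension usually means the authoring tool copied
# the file name instead of a real title (an assumed heuristic).
FILENAME_PATTERN = re.compile(r"\.(docx?|pdf|rtf|txt|qxd|indd)$", re.IGNORECASE)


def title_problems(title):
    """Return reasons a PDF Title looks absent, bogus or malformed.

    An empty list means the Title passed every heuristic below.
    """
    # No Title at all, or only whitespace: the search engine will guess one.
    if title is None or not title.strip():
        return ["missing"]
    text = title.strip()
    problems = []
    if text.lower() in PLACEHOLDER_TITLES:
        problems.append("placeholder")
    if FILENAME_PATTERN.search(text):
        problems.append("looks like a file name")
    if len(text) < 4:
        problems.append("too short")
    # Titles made only of digits or punctuation are rarely meaningful.
    if not any(ch.isalpha() for ch in text):
        problems.append("no letters")
    return problems


if __name__ == "__main__":
    for sample in [None, "Untitled", "q3_report.doc", "Annual Report 2004"]:
        print(repr(sample), title_problems(sample))
    # None ['missing']
    # 'Untitled' ['placeholder']
    # 'q3_report.doc' ['looks like a file name']
    # 'Annual Report 2004' []
```

Run over every file in a posting folder, any non-empty result flags a PDF whose Document Information should be fixed before it goes online.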
Worse, many authoring applications leave nonsense in the Document
Information fields, and it looks flat-out unprofessional online.

Key Takeaway: Check each PDF's "Description" tab (in Document Properties)
before posting!

Search Google for the PDFs on IBM's web site (for example, site:ibm.com
filetype:pdf) to see how they look in the results. Then try your own site!

Users prefer PDF over HTML for scholarly articles

Annual
Reviews, of Palo Alto, California, has allowed Document Solutions to publish the
results of an internal study showing the distribution of its article downloads in
2004 by file type.

This data does not include downloads of "legacy" PDF files: materials from before
1996, which are available to Annual Reviews subscribers only in PDF form.

If the legacy content is included, total PDF downloads increase by over 30%.

The data DOES exclude downloads from Google and other search engines.

We clearly see a strong user preference for PDF over HTML, especially in the
biomedical sciences. However, the preference for PDF appears to cross all
disciplines, strongly suggesting that users prefer to go directly to the PDF rather
than "browse" the HTML version before accessing the PDF.

These
findings are all the more surprising given that it is the HTML files that usually
contain interactive features, such as hyperlinks, designed to add utility to the
document; one might therefore expect more page requests for HTML. Downloads in 2004
of Annual Reviews titles published in 2003 (not shown in the chart) do show a
modest gain for HTML relative to PDF.

It remains unclear whether users consider the features and seamlessness of HTML
browsing to outweigh the print-readiness, reliability and familiarity of PDF.

Indeed, with Adobe Reader 6.0 and now the latest version, 7.0, viewing PDF online
has become increasingly seamless with web browsing. With enhanced PDF capabilities
such as embedded hyperlinks, 3D objects and movies, there's little reason to
believe the trend will not continue.

Visit Annual Reviews at annualreviews.org.

The Many Uses of CD and DVD-ROMs

Even
though they have enjoyed wide use in direct mail and in product and software
delivery, many publishers consider discs passé, a technology that had its day
before the flowering of the internet.

Until recently, however, few publishers had found the Web a major source of
profits. Meanwhile, publishers who added CD or DVD products to their offerings, or
put their market presence and physical distribution to work for their advertisers,
have found new revenue AND attractive opportunities to strengthen their online
initiatives.

For
many publications, a "backstart disc" (the most recent five or ten years of
publication on a single CD-ROM) is an easy upsell or promotion for new subscribers,
a potent premium for securing long-term renewals, and a valuable distribution
medium for advertisers, often all at the same time.

For magazines with "evergreen" content, historical collections may be a superb
premium ancillary product. One Document Solutions client sells $500,000 worth of
CD-ROMs year after year, with largely the same content.

Document Solutions specializes in all-PDF discs, believing that the stability,
reliability, cross-platform usability, flexibility and cost-effectiveness of PDF
make it an ideal choice for a wide variety of electronic publishing and content
distribution applications.

Learn more about DSI's CD and DVD-ROM development services.