A document is a record of some (typically written) content - a publication, a contract, a statement, a painting - at a moment in time. Until the advent of computers (and scanners), the media typically considered usable for such records included papyrus and vellum (prepared animal skin). For a thousand years, more or less, paper has been the medium of choice.
That began to change in the 1980s.
PDF became the document format of choice for business, government and the general public because it delivers the key qualities of paper in a digital format. PDF is fixed, self-contained, readily shareable and relatively hard to change. It’s not just PDF’s innate characteristics that make it successful, but the fact that PDF pages interoperate smoothly with paper documents. “PDF it, send it, print it, sign it and return it” workflows introduced new efficiencies when the format surfaced into public consciousness in the mid-to-late 1990s. Even then, such workflows utilized only the most basic of PDF's capabilities, but it was enough to dramatically accelerate the transition to digital documents. Within a few years, PDF files and email decimated document courier services.
Before long, users were scanning the signature page and adding it to (or replacing) the original page in the PDF; the cycle back to a digital document was complete. This new workflow was, of course, an extremely crude approach to document approvals, but because end-users could do it so easily, PDF proved tolerant of variations in workflow and records-keeping practices in a way that’s hard to imagine for databases and HTML.
PDF continues to evolve far beyond a simulacrum of paper. A broad suite of features – tagging, XML-based metadata, attachments, 3D support, digital signatures and more – supports advanced document-handling and document-consuming workflows. PDF is so capable and so reliable that some wonder why anyone should bother with an archival subset at all.
Not every PDF is designed with reliability in mind. For all its well-deserved reputation for reliably conveying the author's intent to any viewer, PDF allows developers to make files that rely on external resources, or use encryption; both capabilities are non-starters for the preservation community. If the world preserves PDF files as documents – and it does – then preservationists need PDF/A.
Introduced in 2005 as ISO 19005, PDF/A is now required or considered best practice in workflows that generate valuable documents. Filing cabinets and storage boxes are disappearing as ECM systems, cloud storage and local capacity swallow the documents that used to exist only on paper. When new documents are shared, the common ground is PDF. When finalized for records-retention purposes, ideally, they are PDF/A.
Some think HTML will “beat” PDF because it’s more flexible and less static, but this misconstrues both formats’ respective purposes and fails to appreciate that browser developers are (slowly) augmenting their support for PDF. PDF continues to gain mindshare: Google's Trends data shows clearly that the number of searches for PDF documents relative to all other searches continues to rise.
PDF’s purpose is to serve in the role of "document", with all that implies (see above). But that’s not the purpose of HTML. HTML isn’t a document, it’s an experience. PDF is how you keep it, and PDF/A is how you keep it forever.
Preserving the file’s actual bytes, of course, is up to you.
This is not only the present, it’s also the future. PDF, an open, standardized, broadly capable digital document technology, has proven equal to the transition from paper to the electronic world. PDF’s advanced metadata, authentication, semantic tagging, attachments, 3D and other features provide a proven framework for future development of digital documents. PDF has no competitors. Even in the world of SharePoint, OpenText, Office 365 and Google Docs, PDF and PDF/A represent the only sufficiently flexible and capable technology for archiving the gamut of digital document content.
(This piece was adapted from a recent blog post)
Founder of Document Solutions, Inc. in 1996, Duff Johnson is a 23-year veteran of the electronic document space and a recognized leader in the electronic document technology industry. Now an independent consultant, Duff serves the PDF industry as ISO Project co-Leader (and US TAG chair) for ISO 32000 and ISO 14289. Previously Vice Chairman of the Board, Duff Johnson …