First set of candidate veraPDF corpus files delivered

Duff Johnson // May 18, 2015

As the veraPDF project gets under way, it is generating its first test files for PDF/A-1, complementing the Isartor test suite.

Dual Lab, the veraPDF consortium’s lead developer, has uploaded the first set of 49 candidate test files to the public veraPDF GitHub repository.

The test files can be found in the veraPDF corpus for PDF/A-1b (under development), along with the wiki page describing the set.

All test files follow the pattern of the Isartor Test Suite:

  • the naming convention refers to the corresponding subsection in ISO 19005-1
  • they are all atomic
  • they are self-documented via PDF bookmarks

However, unlike Isartor, these files also contain “pass” tests.
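
The naming pattern can be illustrated with a small parser. This is only a sketch: the meaning of each field (clause number, test number, expected outcome, variant letter) is inferred from Isartor-style names such as 6-1-12-t07-fail-a, and the class and function names are hypothetical, not part of veraPDF.

```python
import re
from typing import NamedTuple


class CorpusFileName(NamedTuple):
    """Fields assumed to be encoded in an Isartor-style test file name."""
    clause: str    # subsection of the standard, e.g. "6.1.12"
    test: int      # test number within that subsection
    expected: str  # "pass" or "fail"
    variant: str   # single-letter variant suffix


# Assumed layout: <clause digits joined by "-">-t<NN>-<pass|fail>-<letter>[.pdf]
_NAME_PATTERN = re.compile(
    r"^(?P<clause>\d+(?:-\d+)*)"
    r"-t(?P<test>\d+)"
    r"-(?P<expected>pass|fail)"
    r"-(?P<variant>[a-z])"
    r"(?:\.pdf)?$"
)


def parse_corpus_file_name(name: str) -> CorpusFileName:
    """Split a test file name into its assumed components."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not an Isartor-style test file name: {name!r}")
    return CorpusFileName(
        clause=match["clause"].replace("-", "."),
        test=int(match["test"]),
        expected=match["expected"],
        variant=match["variant"],
    )
```

For example, `parse_corpus_file_name("6-1-12-t07-fail-a")` yields clause "6.1.12", test 7, expected outcome "fail", and variant "a".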

There is one remarkable file to note:

6-1-12-t07-fail-a: The maximum number of indirect objects (8,388,607) in a PDF file is exceeded (the file is about 40 MB zipped)

Screenshot of the File Being Repaired dialog. The document cross-reference table contains more than the maximum allowed number of records, violating PDF/A-1 implementation limits.
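
The limit of 8,388,607 quoted for this file equals 2^23 - 1. A minimal sketch of the comparison, assuming the number of indirect objects has already been counted by some other means (the constant and helper names are hypothetical):

```python
# Limit quoted in this article for PDF/A-1; equal to 2**23 - 1.
MAX_INDIRECT_OBJECTS = 8_388_607


def exceeds_indirect_object_limit(object_count: int) -> bool:
    """Return True when a file has more indirect objects than the limit allows."""
    return object_count > MAX_INDIRECT_OBJECTS
```

A file with exactly 8,388,607 indirect objects is still within the limit; one more object exceeds it.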

Warning: Be careful when trying to validate this file in Adobe Acrobat! It will probably open after 30 seconds of thrashing, but it will hang on preflight checks.

ABOUT THE AUTHORS

Duff Johnson

Founder of Document Solutions, Inc. in 1996, Duff Johnson is a 23-year veteran of the electronic document space and a recognized leader in the electronic document technology industry. Now an independent consultant, Duff serves the PDF industry as ISO Project co-Leader (and US TAG chair) for ISO 32000 and ISO 14289. Previously Vice Chairman of the Board, Duff Johnson …

© 2019 Association for Digital Document Standards e.V.