You know the story. Your client or boss wants to put a report online. It’s just a few pages long, but they want exact control of the print output, or they just want to keep the cost to a minimum. So you end up putting the document online as a PDF. Now I ask you: is it really readable by everyone? That is, normal viewers, assistive technology viewers, search engine bots and viewers in remote areas on slow connections. I bet that in most cases one of these groups is missing out.
So you ignore the problems and just get the job done and get the PDF online. If you have been in the industry for a while you will have done this. Yes, even I have done this, shame on me. Come on, I bet you have too. Well, I’m going to discuss how to get over some of the problems you will have created with PDF distribution of information on the Web.
Suitability of PDFs – when to use them
I’m a business realist; I know there are times that you have to use a PDF. But let’s have a close look at the times you should use a PDF and the times you shouldn’t.
When to use
- Large reports, papers, documents
- Books or book chapters
- Legal or procedural forms intended to be printed.
When not to use
- Short brochures and specification sheets (include an HTML version of the information)
- Product catalogue pictures and brochures (or consider an online catalogue)
- Short one or two page documents. These should be in HTML.
The Acrobat distribution problem
There is generally not a major problem with version control of the PDF reader, in that previous versions of the reader can read the base features of the newer versions (as long as the security features aren’t activated). However, there is one version where all bets are off, as there is a distinct problem.
If you have been working with PDF a while you would have come across this error:
Error: Database: corrupt database
First off you think, hang on – where is the database within a PDF? But there are several data tables of information within a PDF if you think about it. I have found from experience that this only tends to occur if the document is being read by Acrobat Reader below version 5.02.
Problem is there are a lot of computers in the corporate world, especially government, that still have Acrobat Reader version 5.0. It seems there was a massive push with distribution of version 5.0. This was a double-edged sword for Adobe, as it leaves them with a large number of corrupt readers that will not read version 7+ files. The release of the voluntary patches to 5.01 and 5.02 was frankly a waste of time, as it’s well known that IT sections don’t apply non-operating-system patches. Hence until the OS is updated the problem remains for most corporate users.
Or does it? Try removing some of the accessibility settings when you make the PDF. The main one in this case is “Enable accessibility and reflow with Tagged PDF”. You will find this under Settings, in the Application Settings section. Ensure it is NOT checked. That’s right, I know it’s insane, but uncheck it.
Of course the other alternative is to get the user to upgrade to version 7 or above. Good luck with that. So here is a classic case of how good distribution can come back to bite you. But what do you do with Quark or InDesign created documents? Well, I don’t have a solution for that case, yet.
You have a PDF that is well over 5 Meg in size and you really want it to be about 1 Meg or less. So how do you get it smaller? You have tried the optimise settings in Acrobat, but still it’s just way too big. First things first, you may have to have the PDF rebuilt. There are a lot of things that we as designers do, in our rush to get the job done, that make the resultant PDF, as a vector based document, bloat to an unworkable size. Here is a short list of things you can do to make the document’s file size smaller.
- Check the InDesign files for hidden images or vector shapes. You are looking for things that are hidden under other layered images or objects. Look for items off to the side of the layout too; these are still placed in the PDF, so if they are not required, remove them.
- If you have an image that has a mask over it within InDesign, there can be wasted pixels there. You need to redo the image so that it is clipped or cropped as close as you can get it to the mask edge, because the complete image is rendered underneath the mask layer in the PDF even if you only see 10% of it.
- Ensure all gradients are vector generated and not bitmap images (Tiff, Jpeg etc). This is especially true of that groovy Illustrator gradient you have just been working on; in this case render it as a true vector EPS file and place this into InDesign.
- Use Jpegs that are optimised instead of Tiffs. Jpegs have a smaller initial footprint.
- Use pictures that have large blocks of the same colour in them and not great amounts of detail or different colours. Bright and colourful is often better.
- Use black and white or duo tone images.
- Use Adobe Smart Objects when you can for cross platform objects. They seem to help a little, especially in CS2. The compression on Smart Objects is very good for some reason (I have no idea why).
- Then export file as PDF, assuming you have all the accessibility tags in place too.
- Optimise all images as if you are doing it for the web before you insert them into the Word document.
Now start work in Acrobat. Use the PDF Optimizer tool under Advanced > PDF Optimizer. Remember to save versions of your file as you optimise. Things to consider while optimising:
- Select Flatten transparency
- Clip complex regions
- Don’t disable embedded fonts unless you really have to. You still want your typography to remain as you designed it.
- Downsample the images step by step. As soon as you notice a difference at a glance you have optimised too much, so go back to the previous version.
- Remember to clean up any referenced structure elements like bookmarks that are not referencing the pages in the document. This is important if you are chopping a document into different chapters.
- Don’t ever try to combine two documents together or build a document from separate InDesign page exports. It will always be a lot larger than the single run InDesign export.
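The “save versions as you optimise” habit is easy to script. What follows is only a rough sketch in Python, not anything Acrobat provides: the function names save_version and size_in_megabytes and the report.v001.pdf naming scheme are my own assumptions. It copies the current file to a numbered sibling before each optimisation pass, so there is always a previous version to fall back on, and reports the size so you can see how far you are from the 1 Meg target.

```python
import shutil
from pathlib import Path


def save_version(pdf_path):
    """Copy the PDF to the next free numbered sibling, e.g. report.v001.pdf.

    The naming scheme is a hypothetical convention, not an Acrobat feature.
    """
    src = Path(pdf_path)
    number = 1
    while True:
        candidate = src.with_name(f"{src.stem}.v{number:03d}{src.suffix}")
        if not candidate.exists():
            break
        number += 1
    shutil.copy2(src, candidate)  # copy2 also keeps the file timestamps
    return candidate


def size_in_megabytes(pdf_path):
    """Return the file size in megabytes (1 MB taken as 1024 * 1024 bytes)."""
    return Path(pdf_path).stat().st_size / (1024 * 1024)
```

Call save_version("report.pdf") before each downsampling step. If the next pass goes too far, the highest numbered copy is the one to go back to.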
Yes, Google and most search engines will index your PDFs. But please ensure you do a few things to help them along. It does help if you complete the Metadata section of the Description tab in the PDF (Ctrl-D):
- Complete the Title field, this becomes the link text
- The Subject field, this may become the description, depending on the search engine and the version of PDF.
- The Keywords field (delimited with semicolons)
You should, if you have the time, consider completing the Additional Metadata section of this dialog. You can import and export these via an XML file. This will give you a better metadata footprint, and is very handy if you are doing a lot of documents with a standard baseline of metadata.
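If you are doing a lot of documents, it can help to sanity check the metadata before you type it into the dialog. The sketch below is only an illustration in Python: the helper names and the plain dictionary of values are my own assumptions, and it does not read or write a PDF. It joins keywords with semicolons and lists which of the three fields above are still empty.

```python
# The three Description tab fields discussed above.
REQUIRED_FIELDS = ("Title", "Subject", "Keywords")


def format_keywords(keywords):
    """Join keywords with semicolons, dropping blanks and case-insensitive repeats."""
    kept = []
    for keyword in keywords:
        cleaned = keyword.strip()
        if cleaned and cleaned.lower() not in (k.lower() for k in kept):
            kept.append(cleaned)
    return "; ".join(kept)


def missing_fields(metadata):
    """Return the required field names that are absent or blank in the dictionary."""
    return [
        name for name in REQUIRED_FIELDS
        if not str(metadata.get(name, "")).strip()
    ]
```

For example, missing_fields({"Title": "Annual report"}) reports that Subject and Keywords still need to be filled in.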
Now the really interesting bit: did you know that up until version 7 Adobe produced a different method of presenting the metadata within the file format for nearly every version of Acrobat? In version 6 it is presented in both version 5 and version 6 formats, just to make life interesting. In version 7+ we have an enhanced version 6 format. Luckily a well written search bot can tell what version the PDF document is written in. Version 8 follows the version 7 format for the most part.
I’m not going to go into great detail on this, but the bottom line is that all the accessibility work must, in almost all cases, occur before you create the PDF. You can do it afterwards, but it does tend to be more time consuming the later in the process you add in the accessibility.
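For the curious, here is roughly how a bot might work out the version. A PDF file starts with a header such as %PDF-1.4, and each Acrobat release introduced its own PDF version (1.4 with Acrobat 5, 1.5 with Acrobat 6, 1.6 with Acrobat 7, 1.7 with Acrobat 8). The Python below is a minimal sketch with function names of my own choosing. It only reads the header and assumes nothing later in the file overrides it, which a real parser would also need to check.

```python
import re

# PDF version introduced with each Acrobat release.
ACROBAT_BY_PDF_VERSION = {
    "1.3": "Acrobat 4",
    "1.4": "Acrobat 5",
    "1.5": "Acrobat 6",
    "1.6": "Acrobat 7",
    "1.7": "Acrobat 8",
}

HEADER_PATTERN = re.compile(rb"%PDF-(\d\.\d)")


def pdf_header_version(data):
    """Return the version string from the %PDF-x.y header, or None if absent."""
    # Only the start of the file is searched, where the header is expected.
    match = HEADER_PATTERN.search(data[:1024])
    return match.group(1).decode("ascii") if match else None


def read_pdf_version(path):
    """Read the start of a file on disk and return its header version."""
    with open(path, "rb") as handle:
        return pdf_header_version(handle.read(1024))
```

A file that reports 1.6 or above was most likely saved for Acrobat 7 or later, which is worth knowing before you send it to an office full of version 5.0 readers.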
Pre Production – MS Word
- Ensure that the document structure is written in using the Heading styles
- That a table of contents is defined
- That all images and diagrams have an alternative text defined.
- That all links have a screen tip text defined
- With tables, that all header rows are checked under Table Properties as “Repeat as Header Row at top of each page”
- Ensure the colour contrast is readable for extremes in contrast and colour blindness. Light grey on white is not a contrasting colour scheme.
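Colour contrast can be measured rather than guessed. The talk did not name a specific measure, so as an assumption I am using the WCAG 2 contrast ratio here, which compares the relative luminance of the two colours and asks for at least 4.5:1 for normal body text. A minimal Python sketch:

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel (0-255) to its linear value."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (r, g, b) colour as defined by WCAG 2."""
    r, g, b = (_linear(value) for value in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG 2 contrast ratio, from 1 (identical colours) up to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)
```

Light grey (204, 204, 204) on white comes out at about 1.6:1, nowhere near 4.5:1, which is why it fails as a colour scheme.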
Pre Production – InDesign
- You have to tag the document, either via File Export or by turning on tagging by default so that all exported files are tagged. You do this via File > Adobe PDF Presets > Define: click on the presets you want to be tagged and check Create Tagged PDF. If you are doing a high quality print this is the default anyway, but it’s a good idea to check.
- When building the document you need to define headers (semantic structures) that use styles that have the right names, that is h1 to h6. Nothing else, use exactly h1 to h6.
- Check the Structure Panel to ensure you are progressing well with the semantic structure.
- Add alternative text to images: right click the image tag in the Structure Panel and select New Attribute. Add “Alt” in the name field (note capital A) and the alternative text in the value field.
- Links and lists will have to be edited post production. Although this may have changed with InDesign CS3.
Post Production – Acrobat
- Ensure the fonts have not been rasterised (scanned). As a check, you must at least be able to copy the text.
- Use Acrobat 7 or above for the base line. JAWS and other assistive technology may have issues with anything below version 7.
- You can add tags to the document via Acrobat, but it is not an easy task and it will take time to complete. View the tags by selecting View > Navigation Tags > Tags.
- If there are no tags, select Advanced > Accessibility > Add tags to Document, to add them
- Select the Content Tab and use the right click on the structure elements to edit the accessibility elements.
- Within the Touchup Properties dialogue remember to use the Tag tab and fill in the elements required there.
- Build the bookmarks structure in the Bookmarks tab for the document via Acrobat. This just takes time.
- Build the thumbnails in the Pages tab. Don’t forget to set the Page Properties (right mouse click) to Use Document Structure.
PDFs are not just a simple build, link and upload document format. If you must use them, use them with caution, and optimise them and make them accessible at all times. Suddenly you will find that the quick fix of using a PDF isn’t really that much of a quick fix after all.
These are the notes from the mini talk “PDF is not your Friend” I gave at the Perth July 2007 Australian Web Industry Association (AWIA) Meeting. There is a podcast, and the slides are available as a PDF document (852 k) or on SlideShare.