Sunday, November 18, 2007

PDF (and DTP) rant

Yes, you probably already guessed what the subject of this rant is.

The usual nightmare - the client sends you a text to translate, but it's in PDF! And, what's even worse, you need to deliver not only the translation, but they want full DTP service - meaning they want the finalized PDF of the translated material, ready for printing :(

Of course, the PDF is a low-resolution version, and the illustrations will have to be "upped" to 300 DPI, as required by the printing service vendor.

So, you try to explain that it's not done that way, that you cannot simply "reconstruct" the translated PDF from the low-res PDF you received. You try to argue that it simply cannot be done - at least not at the usual rates and within the usual timeframe.


However, the end client isn't aware of the technical problems, and you need to "simplify" the explanation of the problem, without getting too technical.

(As an aside, I'm constantly amazed by the fact that so many people think that, in order to get a translation of a 200-page manual you just need to press a few keys on the keyboard and - presto! - the perfectly printed (and bound) 200-page full-color book comes out of your office printer!)

But, to get back to the point, I usually have several "colorful" analogies in reserve, in order to explain WHY it isn't so easy to do. One of my favorites goes something like this:

"Well, to put it simply, you're providing me with 400 pounds of pork sausages and expecting me to return a 400-pound pig that happily runs around the pigsty.... Can't be done, sorry...."

So, somehow you manage to make them understand that it's no easy feat to produce a final PDF that is even barely acceptable for printing, you agree on the price and the deadline, and now the real work begins...

PDF originals pose several problems - first you have to extract the text (preferably preserving at least some of the formatting), then you have to extract the illustrations (and increase the resolution, if necessary), making sure nothing has been left out.

Then you have to do the DTP from scratch, using any of the usual DTP programs (in our case it's usually InDesign).

However, none of these steps is simple.

First, you cannot easily extract text from PDF - at least not in a usable format, which would not require additional "manual" work. For this purpose I still use Acrobat 5 and its "Save as RTF" function. It exports all of the text, but all lines of text usually end with line breaks, which have to be removed, so that the text can be translated using Trados or whatever TM software you use.
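For what it's worth, the line-break cleanup can be partly automated. The following is only a minimal sketch, not the procedure we actually use: it assumes the exported text has first been saved as plain text, and that paragraphs are separated by blank lines (which an Acrobat export does not guarantee). The function name `unwrap_lines` is made up for this illustration.

```python
import re


def unwrap_lines(text: str) -> str:
    """Join hard-wrapped lines into single-line paragraphs.

    Assumption: a blank line marks a paragraph boundary. Every other
    line break is treated as a wrap inserted by the export and removed.
    """
    # Normalize Windows line endings so the splitting below is uniform.
    text = text.replace("\r\n", "\n")
    # Split into paragraphs on one or more blank (or whitespace-only) lines.
    paragraphs = re.split(r"\n[ \t]*\n+", text.strip())
    unwrapped = []
    for paragraph in paragraphs:
        # Drop the line breaks inside a paragraph and join with single spaces.
        lines = [line.strip() for line in paragraph.splitlines() if line.strip()]
        unwrapped.append(" ".join(lines))
    return "\n\n".join(unwrapped)


sample = "To open the file,\npress Open.\n\nNext paragraph\ncontinues here.\n"
print(unwrap_lines(sample))
```

If the export has no blank lines between paragraphs, a rule like this will merge everything into one block, so the result still needs a human check before it goes into the TM tool.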

Acrobat 6 and 7 do this better, and usually do not place line breaks at the end of every line of text, but they have other, more serious faults - one of the most serious being that they tend to arbitrarily skip portions of the text when exporting to RTF - which makes them unusable.

There are other non-Adobe solutions, one of them being ABBYY PDF Transformer. It is OCR-based, and actually works OK. However, it often does not preserve formatting (bold and italic text, etc.), so, again, it's not a perfect solution.

Most of the translations we do this way are manuals of one kind or another, mostly containing references to commands which should appear as bold text, like this:

To open the file, press Open, and browse to the file you want to open.

So, it's extremely important to have the bold formatting applied to appropriate strings...

In order to avoid any errors in the DTP process, and to make the DTP process as fast as possible, we usually tag our PDF-exported RTF files prior to translating - checking whether all the necessary formatting (usually only italic and bold) has been retained. It's manual work, and sometimes it takes a long time, since we also insert special tags as placeholders for pictures and symbols, so that the text to be translated looks e.g. like this:

To open the file, press Open or use the <PIC> icon on the toolbar.

(Otherwise, the DTP crew would have to find the locations of all those tiny icons themselves, wasting significantly more time, not to mention the increased possibility of errors!)
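Since a placeholder that goes missing during translation is just as bad as one that was never inserted, a quick sanity check after translation can help. This is a rough sketch under my own assumptions, not part of our actual toolchain: it assumes the placeholders are simple uppercase tags like <PIC>, and the names `TAG_PATTERN`, `tag_counts` and `tag_mismatches` are hypothetical.

```python
import re
from collections import Counter

# Assumption: placeholders are uppercase-only tags such as <PIC>.
TAG_PATTERN = re.compile(r"<[A-Z]+>")


def tag_counts(segment: str) -> Counter:
    """Count each placeholder tag that occurs in a segment."""
    return Counter(TAG_PATTERN.findall(segment))


def tag_mismatches(source: str, target: str) -> dict:
    """Return {tag: target_count - source_count} for tags whose counts differ.

    An empty dict means the translation kept every placeholder.
    """
    src, tgt = tag_counts(source), tag_counts(target)
    return {
        tag: tgt[tag] - src[tag]
        for tag in src.keys() | tgt.keys()
        if src[tag] != tgt[tag]
    }


source = "To open the file, press Open or use the <PIC> icon on the toolbar."
target_ok = "(translated sentence with the <PIC> placeholder kept)"
target_bad = "(translated sentence with the placeholder dropped)"

print(tag_mismatches(source, target_ok))   # {}
print(tag_mismatches(source, target_bad))  # {'<PIC>': -1}
```

A negative number means the translation lost a tag, a positive one means an extra tag crept in. It only compares counts, so it will not notice a tag that ended up in the wrong place in the sentence.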

Then, the translation is done the usual way (Trados + Word), and the final, translated RTF is processed with a Word VBA macro we developed in-house, which exports it to InDesign tagged text. The tagged text has special formatting (red color) for our picture tags, and also some other stuff which makes the life of our DTP crew a lot easier when doing the DTP "from scratch".

However, the whole process is still extremely labor-intensive and time-consuming - and the end result is usually still just barely "acceptable" (since you obviously can't e.g. "upsize" a picture from 72 DPI to 300 DPI without any loss of quality...)
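To put a number on the resolution problem, here is a small back-of-the-envelope calculation. It is only an illustration: the 720-pixel image width is an example value I picked, and the function names are made up.

```python
def upscale_factors(source_dpi: float, target_dpi: float) -> tuple:
    """Return (linear factor, pixel-count factor) needed to keep the print size."""
    linear = target_dpi / source_dpi
    # Both width and height grow by the linear factor, so pixels grow by its square.
    return linear, linear ** 2


def printed_width_inches(width_px: int, dpi: float) -> float:
    """Physical width of an image when printed at the given resolution."""
    return width_px / dpi


linear, pixels = upscale_factors(72, 300)
print(f"linear scale: {linear:.2f}x, pixel count: {pixels:.2f}x")
# linear scale: 4.17x, pixel count: 17.36x

# Example value: an image 720 pixels wide.
print(printed_width_inches(720, 72))   # 10.0 inches at 72 DPI
print(printed_width_inches(720, 300))  # 2.4 inches at 300 DPI
```

In other words, to keep the same printed size at 300 DPI, the software has to invent roughly 16 out of every 17 pixels by interpolation. The alternative is to leave the pixels alone and accept a picture that prints at less than a quarter of its original width.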

I don't have to mention that for this type of work we charge a lot more than when we work from "normal" DTP files.

But, even when we get the original InDesign (or any other DTP) files with all the links and fonts, the situation isn't always as clear as it might seem - often the original INDD files were prepared by... hmmm, let's say "less than professional" people, and are best left alone. In such cases we just use the original links (pictures), and create new INDD files "from scratch" - which is usually a lot faster than trying to use the originals we receive.

By "less than professional" I mean the situation where you have e.g. 12 InDesign paragraph styles defined, yet ALL of those styles, wherever they are applied in the document, carry manual overrides :(

Or the situation where e.g. indented text in the INDD file is "created" using tabs, or - even worse - spaces... :(

You get the picture....


4 Comments:

Anonymous said...

Some customers aren't able to understand that a translator is simply a translator and not also a graphic designer. This kind of work should be paid separately! I'm now working as an in-house translator, and my agency tries to do what the customers want, but when they send us a PDF file which can't be converted, we simply send them back a Word file, trying to respect the original format.

December 1, 2007 at 10:52 AM  
Anonymous said...

Even worse, the usual converters sometimes put parts of the text into frames, and then neither Trados nor the Word spell checker will pick up those parts. An agency taught me that this is a common OCR problem, and asked me to retype any text that came as PDF (at no additional cost, of course).

December 12, 2007 at 2:25 AM  
Alliandre said...

Hi colleague :) No more rants? You must be a very lucky translator :-) ;-)

January 11, 2008 at 3:03 PM  
Anonymous said...

Sometimes I would like to take one of these clients (most of the time, rampant wannabe-managers who don't even know what they're talking about), and make them do the whole job!!!
Not only do they ask you to translate some 20.000 words in - say - 3 working days, but they want it perfect and with the images, frames, charts, graphics and a champagne bottle on top of it.
I'm just discussing such a job... I don't know if I will survive!

January 21, 2009 at 9:27 PM  
