I work in the print industry and some clients have the naive idea they'll save m...

just_myles · on March 4, 2020

I can cosign this methodology. I used to work at an organization that built PDFs for accounting and licensing documentation. I used a proprietary tool (Planetpress :( ) to generate the documents, with metadata from a separate input file (CSV or XML) determining which column maps to which field.

The good thing about this, as you have already outlined, was that it allowed for some flexibility in what counted as acceptable input data. For specific address formats or names we could accept multiple formats as long as they were consistent and in the proper position in the input file.
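To make the column-to-field mapping concrete, here is a minimal sketch in plain Python rather than Planetpress. The column names, field names, and the `rows_to_fields` helper are all hypothetical and only illustrate the idea of keeping the mapping as data, separate from the document-generation code.

```python
import csv
import io

# Hypothetical mapping: input column header -> document field name.
COLUMN_TO_FIELD = {
    "cust_name": "recipient",
    "addr1": "address_line_1",
    "amt": "amount_due",
}

def rows_to_fields(csv_text, mapping):
    """Yield one dict of document fields per input row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Reject the file up front if an expected column is absent.
    missing = [col for col in mapping if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"input file is missing columns: {missing}")
    for row in reader:
        # Short rows yield None for absent cells, so fall back to "".
        yield {field: (row[col] or "").strip() for col, field in mapping.items()}

sample = "cust_name,addr1,amt\nAda Lovelace, 12 Example St ,100.00\n"
records = list(rows_to_fields(sample, COLUMN_TO_FIELD))
```

Failing early on a missing column is what lets you go back to the customer with a specific complaint about the input file instead of a broken document.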

Regarding renegotiating: We didn't get that far. However, if a customer within our organization was enlisting our expertise and could not produce an acceptable input file, then we would go back to them and explain the format we required in order to generate the necessary documents. Creating the documents through our own data pipelines would obviously have been the better choice, but this was not an option in some cases at the time.

As far as doing the work of creating these documents in a tool like Planetpress is concerned, well, don't use Planetpress. You are better off doing it with the libraries of your favorite language tbh. Nothing worse than having to learn proprietary code (Presstalk/Postscript) that you will never be able to use anywhere else.

hnick · on March 4, 2020

By re-negotiating I mean in terms of quoting billable hours. A rule of thumb for a typical Postscript scraper was around 20 hours end to end (dev, testing, and integration into our workflow system).

The problem we have with a lot of client files is that they look fine, but printers don't care about "look fine"; they crash hard when they run out of virtual memory due to poor structure. And usually without a helpful error message, so that's more billable hours to diagnose. The most common culprit is workflows that build single-document PDFs and then merge them, resulting in thousands of similar and highly redundant subset fonts.
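A rough way to spot that redundancy, sketched under some assumptions: subset fonts in a PDF carry a tag of six uppercase letters plus a `+` in front of the base font name (e.g. `ABCDEF+Helvetica`), so grouping names by what follows the tag shows how many separate subsets exist for the same face. The `count_subsets` helper and the sample names are hypothetical, and extracting the font names from a real file (with whatever PDF library you use) is not shown.

```python
import re
from collections import Counter

# Subset tag: six uppercase letters followed by "+", e.g. "ABCDEF+Helvetica".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")

def count_subsets(font_names):
    """Count how many subset fonts share each base font name."""
    counts = Counter()
    for name in font_names:
        if SUBSET_TAG.match(name):
            counts[SUBSET_TAG.sub("", name)] += 1
    return counts

# Made-up names standing in for what a merged PDF might contain.
names = [
    "ABCDEF+Helvetica",
    "GHIJKL+Helvetica",
    "MNOPQR+Helvetica",
    "QWERTY+Times-Roman",
    "Courier",  # not a subset, ignored
]
redundant = {base: n for base, n in count_subsets(names).items() if n > 1}
```

A base name with a count in the thousands is a strong hint that the file was assembled by merging single-document PDFs.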