This post was first published on the devblog. The blog went offline in 2012, so I migrated the posts to my personal blog.

Generating PDFs in a Rails application is a fairly common task. Maybe you want to create a letter, a report or an invoice. Either way, what ends up in a PDF is usually important, and you want to make sure the right content ends up there.

This sounds like a case for automated testing. But how do you test PDF content? One option would be to generate the PDF, convert it to HTML using pdftohtml, parse the HTML and make some assertions. As you can guess, this approach isn’t very feasible, because the generated HTML isn’t easy to parse.

PDF generation in Rails applications is often done using the RTeX plugin, which generates the PDF via LaTeX. This makes testing a lot easier, because you can just parse and check the generated LaTeX source.

Everyone who has seen a LaTeX source file may ask: how the hell do I parse that?

In our case we added some “helper” comments like “% SUM BEGIN” and “% SUM END” before and after the part we were interested in, and then used a basic regex to extract that part. You have to check manually that the markup still looks as expected, because of LaTeX’s newline handling (one newline is fine, two start a new paragraph). Most of the time it is sufficient to look at the ERB tags and use <%- instead of <%.
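
The marker approach can be sketched in a few lines of Ruby. This is only an illustration, not our original code: the helper names `latex_section` and `paragraph_break?` and the sample table are made up for this example. The first helper pulls out whatever sits between the two marker comments, the second flags blank lines, which LaTeX would treat as a paragraph break.

```ruby
# Hypothetical helper: returns the LaTeX between "% <name> BEGIN" and
# "% <name> END", or nil when the markers are missing.
def latex_section(source, name)
  marker = Regexp.escape(name)
  source[/^%[ \t]*#{marker} BEGIN[ \t]*\n(.*?)^%[ \t]*#{marker} END/m, 1]
end

# Hypothetical helper: LaTeX starts a new paragraph at a blank line,
# so report whether any line of the snippet is empty or whitespace-only.
def paragraph_break?(latex)
  latex.each_line.any? { |line| line.strip.empty? }
end

# Made-up sample of generated LaTeX with the marker comments in place.
SAMPLE_LATEX = <<~'TEX'
  \begin{tabular}{lr}
  Item & Price \\
  Consulting & 100.00 \\
  % SUM BEGIN
  Total & 119.00 \\
  % SUM END
  \end{tabular}
TEX

sum = latex_section(SAMPLE_LATEX, "SUM")
puts sum
puts "unexpected paragraph break" if paragraph_break?(sum)
```

Assertions then only have to look at the extracted snippet, for example checking that it includes the expected total, instead of dealing with the whole document.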

This approach works pretty well for us. One question you should always keep in mind when writing tests is: what do I test at this level of testing, and what do I leave out?

For the PDF/LaTeX test case we chose to test the basic interaction between the objects that provide values for the PDF generation and the template. We don’t test all combinations, just a few basic cases. Testing all, or at least many, combinations and edge cases is clearly a concern of unit tests.
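
A test at this level might look roughly like the following sketch. Everything in it is an assumption made for illustration: the `Invoice` struct, the inline template and the `render_invoice` helper stand in for the real model, the real LaTeX/ERB template and the rendering done by the plugin. It uses plain ERB and Minitest from a standard Ruby installation so that it runs on its own.

```ruby
require "erb"
require "minitest/autorun"

# Hypothetical value object; a real app would pass its own model here.
Invoice = Struct.new(:items) do
  def total
    items.sum { |_name, price| price }
  end
end

# Stand-in for the LaTeX/ERB template, with marker comments around the total.
INVOICE_TEMPLATE = <<~'TEX'
  \begin{tabular}{lr}
  <%- invoice.items.each do |name, price| -%>
  <%= name %> & <%= format("%.2f", price) %> \\
  <%- end -%>
  % SUM BEGIN
  Total & <%= format("%.2f", invoice.total) %> \\
  % SUM END
  \end{tabular}
TEX

# Renders the template with plain ERB. Only the "-" trim mode is enabled;
# the "%" mode would treat lines starting with "%" (LaTeX comments) as Ruby.
def render_invoice(invoice)
  ERB.new(INVOICE_TEMPLATE, trim_mode: "-").result_with_hash(invoice: invoice)
end

# Extracts the part between the SUM markers, or nil if they are missing.
def sum_section(latex)
  latex[/^% SUM BEGIN\n(.*?)^% SUM END/m, 1]
end

class InvoiceTemplateTest < Minitest::Test
  def test_total_appears_between_the_markers
    invoice = Invoice.new([["Consulting", 100], ["Hosting", 19]])
    assert_includes sum_section(render_invoice(invoice)), "Total & 119.00"
  end

  def test_empty_invoice_still_renders_a_total
    assert_includes sum_section(render_invoice(Invoice.new([]))), "Total & 0.00"
  end
end
```

Two cases are enough here: they show that the template and the object talk to each other correctly. How the total is calculated for every odd combination of items belongs in the unit tests of the model.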
