Line-breaking, Kerning and HTML conversion

5/19/2011 By Frank

One of our customers reported a problem with our HTML to PDF converter: we broke lines at different positions than the major browsers did. For them this was a big issue because of the WYSIWYG requirements of their report editor and the pagination consequences of inconsistent line breaking behavior.

We took a first close look at the issue and concluded that line-breaking was inconsistent across the major browsers and even across versions of the same browser. Because the CSS specification does not clearly state line-breaking rules, we decided that we could not solve this issue without introducing a switch in our API that would force line-breaking compatibility with a given browser/version combination. This in itself seemed lunacy and an engineering nightmare, and so not a path we wanted to walk.

Next, the customer provided us with screenshots showing that all major browsers have identical line-breaking behavior when rendering to screen (@media screen):

Chrome

html-to-pdf-1.png

Safari

html-to-pdf-2.png

IE 8

html-to-pdf-3.png

WebToPDF.NET

html-to-pdf-4.png

Because we render to PDF, we had looked at the line-breaking behavior of the major browsers when printing (@media print). There, behavior is inconsistent. Or more specifically: all major browsers, except IE8 and IE9, break as expected.

Taking a closer look, we concluded that when rendering to screen, kerning is applied by all major browsers. But when printing, IE8 and IE9 suddenly drop the kerning information available in a font. We have no idea why IE8 and IE9 behave differently with respect to kerning when printing.
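
To illustrate why kerning matters for line breaking, here is a minimal Python sketch of a greedy line breaker that measures text with and without kerning. It is not our converter's code: the advance widths, the kerning pairs and the function names are all made up for illustration, and real layout engines are considerably more involved.

```python
# Hypothetical metrics in font units; these numbers are NOT from a real font.
ADVANCE = {"A": 680, "V": 680, "T": 600, "o": 560, " ": 280}
KERNING = {("A", "V"): -80, ("V", "A"): -80, ("T", "o"): -70}


def text_width(text, use_kerning):
    """Sum the glyph advances, optionally adding pair-kerning adjustments."""
    width = sum(ADVANCE[ch] for ch in text)
    if use_kerning:
        # Each adjacent pair of characters may carry a (usually negative) adjustment.
        width += sum(KERNING.get(pair, 0) for pair in zip(text, text[1:]))
    return width


def break_lines(words, max_width, use_kerning):
    """Greedy breaking: keep adding words until the line would overflow."""
    lines, current = [], ""
    for word in words:
        candidate = word if not current else current + " " + word
        if current and text_width(candidate, use_kerning) > max_width:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


words = ["AVAVA", "To", "AVA"]
print(break_lines(words, 4600, use_kerning=True))   # ['AVAVA To', 'AVA']
print(break_lines(words, 4600, use_kerning=False))  # ['AVAVA', 'To AVA']
```

With kerning, "AVAVA To" measures 4450 units and fits on the 4600-unit line; without kerning it measures 4840 units, so the break moves one word earlier. A renderer that ignores kerning can therefore break at different positions than one that applies it, even with the same font and the same text.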

We are now working on respecting the kerning information available in a font. This information is stored inside the GPOS table. You would actually expect this information to be available in the (more straightforward) kern table. But many real-life OpenType fonts do not use this table.
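
For the curious, here is a small Python sketch that checks which of the two tables a font file contains by reading the table directory at the start of the file. The function names are our own invention for this example. It assumes a plain TrueType/OpenType (sfnt) file and does not handle font collections (.ttc) or WOFF. Note also that the mere presence of a GPOS table does not guarantee that it contains kerning; finding that out requires parsing the table itself.

```python
import struct


def list_sfnt_tables(data):
    """Return the table tags listed in the table directory of an sfnt font."""
    # Header: sfntVersion (uint32), numTables (uint16), then three uint16
    # search fields that we skip. The header is 12 bytes long in total.
    _version, num_tables = struct.unpack_from(">IH", data, 0)
    tags = []
    for i in range(num_tables):
        # Each table record is 16 bytes: tag, checksum, offset, length.
        tag, _checksum, _offset, _length = struct.unpack_from(
            ">4sIII", data, 12 + 16 * i
        )
        tags.append(tag.decode("latin-1"))
    return tags


def kerning_sources(data):
    """Report whether the font has a GPOS table and/or a kern table."""
    tags = list_sfnt_tables(data)
    return {"GPOS": "GPOS" in tags, "kern": "kern" in tags}
```

Usage would look like `kerning_sources(open("somefont.ttf", "rb").read())`, where the file name is a placeholder for any font you want to inspect.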

As always, comments are welcome!