Fonts in PDF Documents

This blog post describes how fonts are included (a.k.a. embedded) in PDF documents.

If you want to see which fonts are embedded in a PDF document, open it in Adobe Reader and press Ctrl+D. The 'Fonts' tab shows information like this:

So what does a term like "Embedded subset" actually mean?

Embedded subset fonts

If a PDF document has an embedded subset font, it means that the document contains all the information needed to draw the characters of that font that are actually used. In this example the PDF contains the text "This is a text", and therefore it contains the glyphs (a.k.a. outlines) of these characters.

A glyph defines what a character looks like. Each glyph is basically a set of drawing instructions. A program like PDFRasterizer uses glyphs to draw the characters of a PDF document into a bitmap.

There is only one glyph for the character 'i', although it is used multiple times in the text. Even if the 'i' is used at small, large and very large font sizes, the same glyph is reused.

Many characters like "bcdfghjklm..." are not used in the PDF document, so there is no need to embed their glyphs. The PDF document only embeds the glyphs of the characters that are actually used. This keeps the PDF document small, even if the fonts used contain a huge number of glyphs (think of fonts that contain Korean, Japanese or Chinese glyphs).
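The idea of a subset can be illustrated with a few lines of Python. This is only a conceptual sketch: the function name glyphs_to_embed is made up for this post, and a real font subsetter works on glyph IDs and font tables rather than on plain characters.

```python
def glyphs_to_embed(text):
    """Return the distinct characters whose glyphs a subset font needs."""
    # One glyph per distinct character, no matter how often
    # or at which font size the character is used.
    return sorted(set(text))


needed = glyphs_to_embed("This is a text")
print(needed)       # [' ', 'T', 'a', 'e', 'h', 'i', 's', 't', 'x']
print(len(needed))  # 9 distinct characters for a text of 14 characters
```

The 'i' appears twice in the text but only once in the result, which matches the single shared glyph described above.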

Note on forms in which a user can enter text: In a PDF with form fields there is no other choice than to embed all the glyphs of the font. There is no way to predict which characters will be used, as the user may enter any possible character. If you create such PDF documents and you are concerned about the size of the file, it may be best to avoid embedded fonts.

Standard fonts

There are 14 fonts that normally are not embedded. These are Times, Courier and Helvetica, each in regular, bold, italic (or oblique) and bold italic variations, plus Symbol and Zapf Dingbats. The PDF specification mandates that these must always be available, and therefore there is no need to embed them.

These fonts are usually present on the operating system and are used when a PDF file is created or when it is read. These fonts contain not only the glyphs but also additional information, like the width of each character.
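For reference, the 14 standard fonts can be written down as a small lookup. The font names below are the ones used for the standard 14 fonts in PDF files. The helper needs_embedding is a hypothetical name for this post and is not part of any library.

```python
# The base font names of the 14 standard PDF fonts.
STANDARD_14 = frozenset({
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
})


def needs_embedding(base_font_name):
    """Hypothetical helper: True if the font is not one of the standard 14."""
    return base_font_name not in STANDARD_14


print(needs_embedding("Helvetica-Bold"))  # False
print(needs_embedding("Demo Font"))       # True
```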

Font substitution

There may be situations in which the fonts used don't give the desired results, whether they are embedded or not. This is what the font substitution map is designed for. It simply maps any font that is used in the PDF to a different one.

The following code sample maps the fonts named "Demo Font" and "Demo Font Bold" to variants of Arial.

settings = new RenderSettings();
settings.TextSettings.FontSubstitutionMap.Add("Demo Font", "ariali.ttf");
settings.TextSettings.FontSubstitutionMap.Add("Demo Font Bold", "arialbi.ttf");

This has the effect that all characters that use the font 'Demo Font' will be rendered using the glyphs in the font 'ariali.ttf'.
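Conceptually, such a map behaves like a dictionary lookup with a fallback. The Python sketch below only illustrates that idea. It assumes an exact match on the font name and says nothing about how PDFRasterizer implements the lookup internally. The function name substitute_font is made up for this post.

```python
def substitute_font(font_name, substitution_map):
    """Return the replacement font file, or the original name if none is mapped."""
    # Assumption for this sketch: names must match exactly.
    return substitution_map.get(font_name, font_name)


substitution_map = {
    "Demo Font": "ariali.ttf",
    "Demo Font Bold": "arialbi.ttf",
}

print(substitute_font("Demo Font", substitution_map))   # ariali.ttf
print(substitute_font("Other Font", substitution_map))  # Other Font
```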

Note on character widths: In a font each character has its own width. Usually an 'i' is narrow and an 'm' is wide. But this is not always the case. The most notable exceptions are monospaced fonts like Courier, in which the 'i' and the 'm' have exactly the same width. The difference in width varies with each font that you use for substitution: in a font like Arial the widths differ significantly, but in a font like Courier they are exactly the same.

Because of this, the spacing between the characters in the text may look ugly. In the picture above the spacing between the characters 'm' and 'o' is very different compared to the original on the left. So if you substitute a font, pick one whose character widths resemble those of the original as closely as possible.
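The effect can be made concrete with a small calculation. The sketch below assumes that the characters stay positioned according to the widths of the original font while the glyphs of the substitute font are drawn. The widths are illustrative values in 1/1000 of the font size, and the function name spacing_errors is made up for this post.

```python
# Illustrative advance widths in 1/1000 of the font size (assumed values).
ORIGINAL_WIDTHS = {"d": 556, "e": 556, "m": 833, "o": 556}   # proportional font
SUBSTITUTE_WIDTHS = {ch: 600 for ch in ORIGINAL_WIDTHS}      # monospaced font


def spacing_errors(text, original_widths, substitute_widths):
    """Per character: the space reserved by the layout minus the glyph's own width.

    A positive value leaves a visible gap after the character,
    a negative value makes it crowd the next character.
    """
    return [original_widths[ch] - substitute_widths[ch] for ch in text]


print(spacing_errors("demo", ORIGINAL_WIDTHS, SUBSTITUTE_WIDTHS))
# [-44, -44, 233, -44]
```

With these numbers the 'm' is followed by a large gap while the other characters are slightly crowded, which is the kind of uneven spacing described above.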

Font resolving

It is also possible, even common, for a PDF document to use a font that is neither a standard font nor an embedded font. In that case, it is up to the application to decide how to render the text. In general this is done by selecting an available font that closely matches the font parameters (most importantly the name, but other aspects count as well). The FontSearchPath is also used for this, which allows an application to control the fonts that can be used.
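Name-based matching can be sketched as follows. This is not the algorithm that PDFRasterizer or any other particular application uses. It is a minimal illustration that compares normalized font names with Python's difflib module. The function names and the list of available fonts are assumptions made for this example.

```python
import difflib


def normalize(name):
    """Lower-case a font name and drop spaces, hyphens and other punctuation."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def pick_font(requested, available):
    """Return the available font whose name is closest to the requested one, or None."""
    by_key = {normalize(name): name for name in available}
    matches = difflib.get_close_matches(
        normalize(requested), list(by_key), n=1, cutoff=0.6
    )
    return by_key[matches[0]] if matches else None


# Hypothetical list of fonts found on the search path.
available = ["Arial", "Arial Bold", "Courier New", "Times New Roman"]

print(pick_font("Arial-BoldMT", available))  # Arial Bold
print(pick_font("Zzzz", available))          # None
```

A real application would also weigh other font parameters, such as the style or whether the font is monospaced, rather than the name alone.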