Create tagged PDF

9/1/2017

What is a tagged PDF?

A tagged PDF has additional information that is referred to as the logical structure of the PDF. The logical structure is a hierarchy of structure elements that exists separately from the visual content. The logical structure is associated with the visual content by bi-directional pointers. A piece of visual content is said to be tagged if such a pointer exists, and the corresponding structure element is referred to as the tag.
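To make the idea of bi-directional pointers concrete, here is a minimal conceptual sketch. The types `StructureElement` and `ContentItem` are hypothetical and exist only for illustration; they are not PDFKit.NET classes and say nothing about how the library stores its structure internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of a structure element (a "tag"). Not a PDFKit.NET type.
class StructureElement
{
    public string Role { get; }
    public StructureElement Parent { get; }
    public List<StructureElement> Children { get; } = new List<StructureElement>();
    public List<ContentItem> Content { get; } = new List<ContentItem>();

    // Passing a parent links the new element into the hierarchy.
    public StructureElement(string role, StructureElement parent = null)
    {
        Role = role;
        Parent = parent;
        if (parent != null) parent.Children.Add(this);
    }
}

// Hypothetical model of a piece of visual content. Not a PDFKit.NET type.
class ContentItem
{
    private StructureElement tag;

    public string Text { get; }
    public bool IsTagged => tag != null;

    public ContentItem(string text) { Text = text; }

    // Setting the tag maintains both directions of the association:
    // content -> tag (this field) and tag -> content (the Content list).
    public StructureElement Tag
    {
        get { return tag; }
        set
        {
            if (tag != null) tag.Content.Remove(this);
            tag = value;
            if (tag != null) tag.Content.Add(this);
        }
    }
}

static class StructureDemo
{
    static void Main()
    {
        var root = new StructureElement("Document");
        var paragraph = new StructureElement("P", root);
        var text = new ContentItem("hello world");

        Console.WriteLine(text.IsTagged);   // False
        text.Tag = paragraph;
        Console.WriteLine(text.IsTagged);   // True
        Console.WriteLine(paragraph.Content[0].Text); // hello world
    }
}
```

The point of the sketch is that the hierarchy (`Parent`/`Children`) is independent of the content, and that tagging only adds a link between the two.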

As of version 5.0 of PDFKit.NET, you can create tagged PDF documents. The following code snippets demonstrate the tagged PDF API.

Initialize logical structure

The following code initializes the logical structure:

Document document = new Document();
Assert.IsNull(document.LogicalStructure);
Assert.IsFalse(document.IsTagged);

document.LogicalStructure = new LogicalStructure();
Assert.IsTrue(document.IsTagged);

Tag a piece of text

Page page = new Page(PageSize.Letter);
document.Pages.Add(page);

TextShape text = new TextShape("hello world", font, 12);
page.Overlay.Add(text);

Tag paragraphTag = new ParagraphTag(document.LogicalStructure.Root);
text.Tag = paragraphTag;

Note how the paragraph tag is added as a child of the root tag by passing the parent tag to the constructor.

Editing the role map

The Tagged PDF conventions define a set of standard roles for tags. You can also introduce application-specific roles. If you do, you should provide mappings from these custom roles to the nearest standard roles, so that tools that process tagged PDF can handle your roles as well as possible. This can be done as follows:

document.LogicalStructure.RoleMap["paragraph"] = "myparagraph";
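To illustrate why such mappings help, the following sketch shows how a consuming tool might resolve a custom role to a standard one. It is a conceptual model only: `RoleMapDemo`, `Resolve`, and the role names `myparagraph` and `myintro` are hypothetical, the list of standard roles is a small non-exhaustive sample of the PDF standard structure types, and the plain dictionary (keyed by custom role) is an assumption that does not necessarily match how PDFKit.NET's `RoleMap` is keyed.

```csharp
using System;
using System.Collections.Generic;

static class RoleMapDemo
{
    // A small, non-exhaustive sample of standard structure types.
    static readonly HashSet<string> StandardRoles = new HashSet<string>
    {
        "Document", "Sect", "P", "H1", "H2", "L", "LI", "Table", "Figure"
    };

    // Follows custom-role mappings until a standard role is reached.
    // Returns null if the role is unmapped or the mappings form a cycle.
    public static string Resolve(IDictionary<string, string> roleMap, string role)
    {
        var seen = new HashSet<string>();
        while (!StandardRoles.Contains(role))
        {
            string next;
            if (!seen.Add(role) || !roleMap.TryGetValue(role, out next))
                return null;
            role = next;
        }
        return role;
    }

    static void Main()
    {
        // Assumed direction for this sketch: custom role -> nearest known role.
        var roleMap = new Dictionary<string, string>
        {
            { "myparagraph", "P" },
            { "myintro", "myparagraph" }
        };

        Console.WriteLine(Resolve(roleMap, "myintro"));          // P
        Console.WriteLine(Resolve(roleMap, "unknown") == null);  // True
    }
}
```

A tool that cannot resolve a role to a standard one has no guidance on how to treat the tagged content, which is why providing the mapping is recommended.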

Tag attributes