What is a tagged PDF?
A tagged PDF carries additional information known as the logical structure of the PDF. The logical structure is a hierarchy of structure elements that exists separately from the visual content. The two are associated by bidirectional pointers. A piece of visual content is said to be tagged if such a pointer exists, and the corresponding structure element is referred to as its tag.
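To make the idea of bidirectional pointers concrete, here is a minimal sketch of the data model. It is not the PDFKit.NET API: `StructureElement`, `ContentItem` and `SetTag` are hypothetical names chosen for illustration, and the role strings are plain labels.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model, not the PDFKit.NET API. It only illustrates the
// two-way link between structure elements and visual content.
public class StructureElement
{
    public string Role { get; }
    public StructureElement Parent { get; }
    public List<StructureElement> Children { get; } = new List<StructureElement>();

    // Pointer from the structure element to the content it tags.
    public List<ContentItem> Content { get; } = new List<ContentItem>();

    public StructureElement(string role, StructureElement parent = null)
    {
        Role = role;
        Parent = parent;
        // Passing a parent attaches the new element to the hierarchy.
        if (parent != null) parent.Children.Add(this);
    }
}

public class ContentItem
{
    public string Text { get; }

    // Pointer from the content back to its structure element (the tag).
    public StructureElement Tag { get; private set; }

    public bool IsTagged => Tag != null;

    public ContentItem(string text) { Text = text; }

    // Keeps both directions of the link consistent.
    public void SetTag(StructureElement tag)
    {
        if (Tag != null) Tag.Content.Remove(this);
        Tag = tag;
        if (tag != null) tag.Content.Add(this);
    }
}
```

In this sketch the hierarchy (`Parent`, `Children`) can be walked without touching any content, and a content item counts as tagged only once `SetTag` has linked it to an element.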
As of version 5.0 of PDFKit.NET, you can create tagged PDF documents. The following code snippets demonstrate the tagged PDF API.
Initialize logical structure
The following code initializes the logical structure:
Document document = new Document();
Assert.IsNull(document.LogicalStructure);
Assert.IsFalse(document.IsTagged);
document.LogicalStructure = new LogicalStructure();
Assert.IsTrue(document.IsTagged);
Tag a piece of text
// 'font' is assumed to be a font object created earlier.
Page page = new Page(PageSize.Letter);
document.Pages.Add(page);
TextShape text = new TextShape("hello world", font, 12);
page.Overlay.Add(text);
Tag paragraphTag = new ParagraphTag(document.LogicalStructure.Root);
text.Tag = paragraphTag;
Note how the paragraph tag is added as a child of the root tag by passing the parent tag to the constructor.
Editing the role map
The Tagged PDF conventions list standard roles for tags. It is possible to introduce application-specific roles. If you do, you should also provide mappings from these custom roles to the nearest standard roles. This helps tools that process tagged PDF handle your roles as well as possible. This can be done as follows:
logicalStructure.RoleMap["paragraph"] = "myparagraph";
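As an illustration of why such mappings help, the following sketch shows how a consuming tool might resolve a custom role to a standard one. It is not part of PDFKit.NET: `RoleResolver` and `Resolve` are hypothetical names, the dictionary is keyed by custom role (an assumption of this sketch, not necessarily how `RoleMap` is indexed), and the set of standard roles is only a small subset of the structure types defined by the PDF specification.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of PDFKit.NET.
public static class RoleResolver
{
    // A small subset of the standard structure types from the PDF specification.
    private static readonly HashSet<string> StandardRoles =
        new HashSet<string> { "Document", "P", "H1", "Table", "Figure", "Span" };

    // Follows custom-to-standard mappings until a standard role is reached.
    // Returns null if the role is unmapped or the mappings form a cycle.
    public static string Resolve(string role, IDictionary<string, string> roleMap)
    {
        var seen = new HashSet<string>();
        while (!StandardRoles.Contains(role))
        {
            string next;
            if (!seen.Add(role) || !roleMap.TryGetValue(role, out next))
            {
                return null;
            }
            role = next;
        }
        return role;
    }
}
```

A tool that receives null here has no hint about how to treat the element, which is the situation a complete role map avoids.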