How to reduce PDF file size

Convert PDF, Fonts, Images, Manipulate PDF, Shapes
5/13/2014

Adobe Acrobat has an option to save a document with a reduced size. This article shows how to reduce PDF file size with PDFKit.NET.

Note: For small PDF documents, the size may actually increase.

Adobe Acrobat

Adobe Acrobat has an option to save PDF files with a reduced size. This option can be found in a submenu under File - Save As... For some PDF files this results in a remarkable reduction in size. See the PDF file included in the downloads of this article. Close inspection of the result reveals that Adobe Acrobat does two things for this document:

  • Large images get subsampled at a much lower resolution.
  • Fully embedded fonts get replaced by a subsetted font.

PDFKit.NET

PDFKit.NET can do the same via the Page.CreateShapes() method. The most interesting part of the attached sample is the code below. It inspects each shape on a page and replaces some of them with a modified shape:

  • Image shapes get replaced by a new image shape with a lower resolution.
  • Text shapes that use an Arial font get modified to refer to a subsetted Arial font.

C# code sample

    static void reduceFileSize(ShapeCollection shapes, int dpi)
    {
        for (int i = 0; i < shapes.Count; i++)
        {
            Shape shape = shapes[i];

            if (shape is ShapeCollection)
            {
                // recurse
                reduceFileSize(shape as ShapeCollection, dpi);
            }
            else if (shape is ImageShape)
            {
                shapes.RemoveAt(i);
                ImageShape downScaled = downScale(shape as ImageShape, dpi);
                shapes.Insert(i, downScaled);
            }
            else if (shape is TextShape)
            {
                TextShape textShape = shape as TextShape;

                var fontName = !string.IsNullOrEmpty(textShape.Font.FamilyName)
                    ? textShape.Font.FamilyName.ToLower()
                    : !string.IsNullOrEmpty(textShape.Font.Name)
                        ? textShape.Font.Name.ToLower()
                        : string.Empty;

                if (fontName == "arial")
                {
                    shapes.RemoveAt(i);

                    TallComponents.PDF.Fonts.Font subsetted =
                        TallComponents.PDF.Fonts.Font.Create("Arial", false, false);
                    subsetted.EmbedMode = TallComponents.PDF.Fonts.EmbedMode.Subset;
                    textShape.Font = subsetted;
                    shapes.Insert(i, textShape);
                }
            }
        }
    }

VB.NET code sample

    Private Sub reduceFileSize(shapes As ShapeCollection, dpi As Integer)
        For i As Integer = 0 To shapes.Count - 1
            Dim shape As Shape = shapes(i)

            If TypeOf shape Is ShapeCollection Then
                ' recurse
                reduceFileSize(TryCast(shape, ShapeCollection), dpi)
            ElseIf TypeOf shape Is ImageShape Then
                shapes.RemoveAt(i)
                Dim downScaled As ImageShape = downScale(TryCast(shape, ImageShape), dpi)
                shapes.Insert(i, downScaled)
            ElseIf TypeOf shape Is TextShape Then
                Dim textShape As TextShape = TryCast(shape, TextShape)

                Dim fontName = If(Not String.IsNullOrEmpty(textShape.Font.FamilyName),
                    textShape.Font.FamilyName.ToLower(),
                    If(Not String.IsNullOrEmpty(textShape.Font.Name),
                        textShape.Font.Name.ToLower(),
                        String.Empty))

                If fontName = "arial" Then
                    shapes.RemoveAt(i)

                    Dim subsetted As TallComponents.PDF.Fonts.Font =
                        TallComponents.PDF.Fonts.Font.Create("Arial", False, False)
                    subsetted.EmbedMode = TallComponents.PDF.Fonts.EmbedMode.Subset
                    textShape.Font = subsetted
                    shapes.Insert(i, textShape)
                End If
            End If
        Next
    End Sub
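The article does not show how reduceFileSize is called. The following is a minimal sketch of a possible driver, not the code from the attached sample. Only Page.CreateShapes() and reduceFileSize come from this article. The class name ReduceSizeDriver, the method name Reduce and its parameters are hypothetical, and the remaining PDFKit.NET calls (the Document constructors, the Page(width, height) constructor, Page.Overlay.Add, Pages.Add and Document.Write) are written from memory of the PDFKit.NET API and should be checked against the documentation of the version you use.

```csharp
using System.IO;
using TallComponents.PDF;
using TallComponents.PDF.Shapes;

static class ReduceSizeDriver
{
    // Hypothetical entry point. reduceFileSize and downScale from this article
    // are assumed to be members of this class; the block does not compile without them.
    static void Reduce(string inputPath, string outputPath, int dpi)
    {
        using (FileStream input = new FileStream(inputPath, FileMode.Open, FileAccess.Read))
        using (FileStream output = new FileStream(outputPath, FileMode.Create, FileAccess.Write))
        {
            Document source = new Document(input);
            Document target = new Document();

            foreach (Page page in source.Pages)
            {
                // Extract the page content as shapes (the method named in the article).
                ShapeCollection shapes = page.CreateShapes();

                // Replace large images and Arial text shapes in place.
                reduceFileSize(shapes, dpi);

                // Assumed API: draw the modified shapes on a new page of the same size.
                Page copy = new Page(page.Width, page.Height);
                copy.Overlay.Add(shapes);
                target.Pages.Add(copy);
            }

            target.Write(output);
        }
    }
}
```

The sketch keeps the input stream open until the output has been written, on the assumption that the source document may still read from it while the target is being saved.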

The code for creating a downsampled image can be found below.

C# code sample

    using System;
    using System.Drawing;
    using System.Drawing.Drawing2D;
    using System.Drawing.Imaging;

    static ImageShape downScale(ImageShape image, int dpi)
    {
        Matrix matrix = image.Transform.CreateGdiMatrix();
        PointF[] points = new PointF[] {
            new PointF(0, 0),
            new PointF((float)image.Width, 0),
            new PointF(0, (float)image.Height)
        };
        matrix.TransformPoints(points);

        // real dimensions of the image in points as it appears on the page
        float realWidth = distance(points[0], points[1]);
        float realHeight = distance(points[0], points[2]);

        // given the desired resolution, these are the desired number of cols/rows of the optimized image
        int desiredColumns = (int)(realWidth * ((float)dpi / 72f));
        int desiredRows = (int)(realHeight * ((float)dpi / 72f));

        // do not bother with images that would end up only a few pixels wide or high
        if (desiredColumns < 5) return image;
        if (desiredRows < 5) return image;

        // create the new image and copy the source image to it (resampling happens here)
        using (Bitmap bitmap = image.CreateBitmap())
        {
            if (desiredColumns > bitmap.Width) return image; // prevent upscale
            if (desiredRows > bitmap.Height) return image; // prevent upscale

            Bitmap optimizedBitmap = new Bitmap(desiredColumns, desiredRows, PixelFormat.Format32bppArgb);
            using (Graphics graphics = Graphics.FromImage(optimizedBitmap))
            {
                graphics.DrawImage(bitmap, 0, 0, desiredColumns, desiredRows);
            }

            // the new shape takes over size, position and appearance of the original
            ImageShape optimized = new ImageShape(optimizedBitmap, true);
            optimized.Compression = Compression.Jpeg;
            optimized.Width = image.Width;
            optimized.Height = image.Height;
            optimized.Transform = image.Transform;

            optimized.Opacity = image.Opacity;
            optimized.BlendMode = image.BlendMode;

            return optimized;
        }
    }

    // Euclidean distance between two points
    static float distance(PointF a, PointF b)
    {
        return (float)Math.Sqrt((a.X - b.X) * (a.X - b.X) + (a.Y - b.Y) * (a.Y - b.Y));
    }

VB.NET code sample

    Imports System
    Imports System.Drawing
    Imports System.Drawing.Drawing2D
    Imports System.Drawing.Imaging

    Private Function downScale(image As ImageShape, dpi As Integer) As ImageShape
        Dim matrix As Matrix = image.Transform.CreateGdiMatrix()
        Dim points As PointF() = New PointF() {
            New PointF(0, 0),
            New PointF(CSng(image.Width), 0),
            New PointF(0, CSng(image.Height))
        }
        matrix.TransformPoints(points)

        ' real dimensions of the image in points as it appears on the page
        Dim realWidth As Single = distance(points(0), points(1))
        Dim realHeight As Single = distance(points(0), points(2))

        ' given the desired resolution, these are the desired number of cols/rows of the optimized image
        Dim desiredColumns As Integer = CInt(realWidth * (CSng(dpi) / 72.0F))
        Dim desiredRows As Integer = CInt(realHeight * (CSng(dpi) / 72.0F))

        ' do not bother with images that would end up only a few pixels wide or high
        If desiredColumns < 5 Then
            Return image
        End If
        If desiredRows < 5 Then
            Return image
        End If

        ' create the new image and copy the source image to it (resampling happens here)
        Using bitmap As Bitmap = image.CreateBitmap()
            ' prevent upscale
            If desiredColumns > bitmap.Width Then
                Return image
            End If
            ' prevent upscale
            If desiredRows > bitmap.Height Then
                Return image
            End If

            Dim optimizedBitmap As New Bitmap(desiredColumns, desiredRows, PixelFormat.Format32bppArgb)
            Using g As Graphics = Graphics.FromImage(optimizedBitmap)
                g.DrawImage(bitmap, 0, 0, desiredColumns, desiredRows)
            End Using

            ' the new shape takes over size, position and appearance of the original
            Dim optimized As New ImageShape(optimizedBitmap, True)
            optimized.Compression = Compression.Jpeg
            optimized.Width = image.Width
            optimized.Height = image.Height
            optimized.Transform = image.Transform

            optimized.Opacity = image.Opacity
            optimized.BlendMode = image.BlendMode

            Return optimized
        End Using
    End Function

    ' Euclidean distance between two points
    Private Function distance(a As PointF, b As PointF) As Single
        Return CSng(Math.Sqrt((a.X - b.X) * (a.X - b.X) + (a.Y - b.Y) * (a.Y - b.Y)))
    End Function
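To make the resolution arithmetic in downScale concrete, here is a small library-independent sketch. The class DownscaleMath and its two methods are hypothetical helpers written for illustration and are not part of PDFKit.NET. PixelsFor repeats the points-to-pixels conversion (1 point is 1/72 inch), and ShouldResample restates the guards as intended: skip tiny results and never upscale, with rows checked against the source height.

```csharp
using System;

static class DownscaleMath
{
    // Pixels needed to cover a length given in points at the requested
    // resolution. Uses the same truncating cast as the C# downScale sample.
    public static int PixelsFor(float lengthInPoints, int dpi)
    {
        return (int)(lengthInPoints * (dpi / 72f));
    }

    // True when replacing the source bitmap with a resampled one makes sense.
    public static bool ShouldResample(int desiredColumns, int desiredRows,
                                      int sourceColumns, int sourceRows)
    {
        // Results of only a few pixels are not worth the effort.
        if (desiredColumns < 5 || desiredRows < 5) return false;

        // Never create more pixels than the source image has.
        if (desiredColumns > sourceColumns || desiredRows > sourceRows) return false;

        return true;
    }
}
```

For example, an image drawn at 288 x 216 points (4 x 3 inches) and reduced to 144 dpi needs PixelsFor(288f, 144) = 576 columns and PixelsFor(216f, 144) = 432 rows. If the embedded bitmap is 2400 x 1800 pixels, ShouldResample(576, 432, 2400, 1800) returns true, and the resampled image holds 248,832 pixels instead of 4,320,000.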

The resulting document looks as follows. Its size is only 120 KB, compared to the original 4,583 KB.

If you look closely, you will see a red cross through the image. This is normal: we add this deliberately in unlicensed versions of our software for images that get extracted via Page.CreateShapes().

Reduce-Size-PDF-c-sharp-and-vbnet.png