mule-pdfbox-module

PDF Utilities for MuleSoft

Empower your MuleSoft flows with native PDF manipulation powered by Apache PDFBox. This connector performs PDF operations inside the Mule runtime and depends only on Apache PDFBox.

🔍 Key Features

  • 📄 Metadata Extraction – Get author, title, number of pages, and more.

  • ✂️ Text Extraction – Pull text from a specific range of pages.

  • 🧹 Blank Page Removal – Clean your documents before delivery.

  • 🔁 Page Rotation – Rotate document pages as needed.

  • 🧩 PDF Splitting – Break large PDFs into separate single-page files.

  • 📎 PDF Merging – Combine multiple PDFs into a single cohesive document.

🔧 Built For Developers

  • Lightweight, single-dependency module

  • Designed using the MuleSoft Java SDK

  • Input/output via standard Java streams

🧱 Under the Hood

  • Built using Apache PDFBox

  • Fully compatible with Mule 4.x

  • Supports page ranges and relies on PDFBox for robust PDF parsing

Implemented Operations:

1. extractPdfInfo

  • Purpose: Extracts document metadata such as number of pages, author, title, subject, and version.

  • Input: InputStream of the PDF.

  • Output: POJO with document properties.

  • 🧱 Under the Hood - PDDocumentInformation

2. extractTextByPageRange

  • Purpose: Extracts plain text from a given page range.

  • Input: PDF stream + optional startPage / endPage.

  • Output: Extracted text as a string.

  • 🧱 Under the Hood - PDFTextStripper
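Because startPage and endPage are optional, the operation has to settle on a concrete range before any text is pulled. The sketch below shows one plausible way to do that, assuming 1-based inclusive page numbers (the convention PDFTextStripper uses for its start and end pages). The PageRange class and its resolve method are hypothetical names for illustration, not part of the module's API, and the clamping and error behavior are assumptions.

```java
/** Hypothetical helper: resolves optional page bounds into a concrete 1-based, inclusive range. */
class PageRange {
    final int start; // first page to process (1-based, inclusive)
    final int end;   // last page to process (1-based, inclusive)

    PageRange(int start, int end) {
        this.start = start;
        this.end = end;
    }

    /**
     * Assumed behavior: a missing startPage means "first page", a missing endPage
     * means "last page", and out-of-bounds values are clamped to the document.
     */
    static PageRange resolve(Integer startPage, Integer endPage, int pageCount) {
        if (pageCount < 1) {
            throw new IllegalArgumentException("Document has no pages");
        }
        int start = (startPage == null) ? 1 : Math.max(1, startPage);
        int end = (endPage == null) ? pageCount : Math.min(pageCount, endPage);
        if (start > end) {
            throw new IllegalArgumentException(
                "Empty page range: " + start + " to " + end);
        }
        return new PageRange(start, end);
    }
}
```

With a range resolved this way, the two bounds could be handed to a text stripper's start-page and end-page settings before extracting text.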

3. filterPages

  • Purpose: Removes blank pages and/or filters based on a page range.

  • Mechanism: Detects blankness using text visibility, annotations, and embedded images.

  • Parameters: Page range, remove blank pages flag.

  • Output: Filtered PDF stream.
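The blankness check described above combines three signals. A minimal sketch of that decision follows, assuming a page counts as blank only when it has no visible text, no annotations, and no embedded images. BlankPageFilter and PageSignals are hypothetical names; in the real module these values would come from PDFBox rather than being passed in by hand.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the blank-page decision; not the module's actual code. */
class BlankPageFilter {

    /** Per-page summary of the three signals (assumed to be gathered by the PDF parser). */
    static final class PageSignals {
        final String text;          // text extracted from the page, possibly empty
        final int annotationCount;  // number of annotations on the page
        final int imageCount;       // number of embedded images on the page

        PageSignals(String text, int annotationCount, int imageCount) {
            this.text = text;
            this.annotationCount = annotationCount;
            this.imageCount = imageCount;
        }
    }

    /** A page is treated as blank only if all three signals are empty. */
    static boolean isBlank(PageSignals page) {
        boolean noText = page.text == null || page.text.trim().isEmpty();
        return noText && page.annotationCount == 0 && page.imageCount == 0;
    }

    /** Returns the 1-based numbers of the pages that would be kept. */
    static List<Integer> pagesToKeep(List<PageSignals> pages, boolean removeBlankPages) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < pages.size(); i++) {
            if (!removeBlankPages || !isBlank(pages.get(i))) {
                kept.add(i + 1);
            }
        }
        return kept;
    }
}
```

Requiring all three signals to be empty errs on the side of keeping pages: a scanned page with an image but no text layer, for example, is not dropped.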

4. rotatePages

  • Purpose: Rotates pages within a specified range clockwise or counterclockwise.

  • Parameters: Page range, rotation direction.

  • Output: Modified PDF stream.

  • 🧱 Under the Hood - setRotation
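Page rotation in a PDF is stored as a multiple of 90 degrees, so rotating in either direction comes down to adjusting the current angle and wrapping it back into the 0 to 270 range. The sketch below illustrates that arithmetic, assuming each call turns the page by a single 90-degree step. RotationMath and Direction are hypothetical names, and the step size is an assumption rather than something the module documents.

```java
/** Hypothetical sketch of the angle arithmetic behind a page rotation. */
class RotationMath {

    enum Direction { CLOCKWISE, COUNTERCLOCKWISE }

    /**
     * Returns the new rotation in degrees (0, 90, 180 or 270 for valid input),
     * assuming one 90-degree step per call.
     */
    static int rotate(int currentRotation, Direction direction) {
        int delta = (direction == Direction.CLOCKWISE) ? 90 : -90;
        // floorMod keeps the result non-negative, e.g. 0 - 90 becomes 270.
        return Math.floorMod(currentRotation + delta, 360);
    }
}
```

The resulting value is the kind of angle that would then be passed to setRotation for each page in the selected range.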

5. splitPages

  • Purpose: Splits a PDF into individual pages.

  • Output: A list of InputStreams, each containing a single-page PDF.

6. mergePdfs ✅ (New in 1.0.1)

  • Purpose: Combines two or more PDF documents into one.

  • Input: A list of PDF InputStreams.

  • Output: A single merged PDF stream with extracted metadata.

  • 🧱 Under the Hood: PDFMergerUtility + RandomAccessReadBuffer
