mule-pdfbox-module

PDF Utilities for MuleSoft

Empower your MuleSoft flows with native PDF manipulation powered by Apache PDFBox. This connector provides high-performance PDF operations, with PDFBox as its only external dependency.

🔍 Key Features

📄 Metadata Extraction – Get author, title, number of pages, and more.
✂️ Text Extraction – Pull text from a specific range of pages.
🧹 Blank Page Removal – Clean your documents before delivery.
🔁 Page Rotation – Rotate document pages as needed.
🧩 PDF Splitting – Break large PDFs into separate single-page files.
📎 PDF Merging – Combine multiple PDFs into a single cohesive document.

🔧 Built For Developers

Lightweight, single-dependency module
Designed using MuleSoft Java SDK
Input/output via standard Java streams

🧱 Under the Hood

Built using Apache PDFBox
Fully compatible with Mule 4.x
Supports page ranges and robust PDF parsing

Implemented Operations:

1. extractPdfInfo

Purpose: Extracts document metadata such as number of pages, author, title, subject, and version.
Input: InputStream of the PDF.
Output: POJO with document properties.
🧱 Under the Hood - PDDocumentInformation

2. extractTextByPageRange

Purpose: Extracts plain text from a given page range.
Input: PDF stream + optional startPage / endPage.
Output: Extracted text as a string.
🧱 Under the Hood - PDFTextStripper
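
To illustrate how the optional startPage / endPage values might be resolved before text extraction, here is a minimal sketch. PDFTextStripper's setStartPage and setEndPage take 1-based, inclusive page numbers, so the sketch uses the same convention. The PageRange class and its resolve method are hypothetical and not part of the module, and the defaulting and clamping rules are assumptions rather than the module's documented behavior.

```java
/**
 * Hypothetical helper (not part of the module): resolves optional,
 * 1-based, inclusive page bounds against a document's page count.
 */
final class PageRange {
    final int start; // first page to process, 1-based
    final int end;   // last page to process, 1-based, inclusive

    private PageRange(int start, int end) {
        this.start = start;
        this.end = end;
    }

    /**
     * Assumed rules: a missing startPage means page 1, a missing endPage
     * means the last page, and an endPage past the last page is clamped.
     */
    static PageRange resolve(Integer startPage, Integer endPage, int pageCount) {
        if (pageCount < 1) {
            throw new IllegalArgumentException("Document has no pages");
        }
        int start = (startPage == null) ? 1 : startPage;
        int end = (endPage == null) ? pageCount : Math.min(endPage, pageCount);
        if (start < 1 || start > end) {
            throw new IllegalArgumentException(
                "Invalid page range: " + start + " to " + end);
        }
        return new PageRange(start, end);
    }
}
```

With this sketch, PageRange.resolve(null, null, 10) covers pages 1 to 10, and PageRange.resolve(3, 99, 10) is clamped to pages 3 to 10. The resolved bounds could then be passed to setStartPage and setEndPage.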

3. filterPages

Purpose: Removes blank pages and/or filters based on a page range.
Mechanism: Detects blankness using text visibility, annotations, and embedded images.
Parameters: Page range, remove blank pages flag.
Output: Filtered PDF stream.
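
The filtering logic described above can be sketched in plain Java. This is an illustration under stated assumptions, not the module's implementation: PageFilter, PageSignals, and pagesToKeep are hypothetical names, and the per-page values (text, annotation count, image count) are assumed to have been collected from the PDF beforehand. A page is treated as blank only when all three signals are empty.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the page-filtering decision; not the module's code. */
final class PageFilter {

    /** Assumed per-page summary, gathered from the PDF before filtering. */
    static final class PageSignals {
        final String text;
        final int annotationCount;
        final int imageCount;

        PageSignals(String text, int annotationCount, int imageCount) {
            this.text = text;
            this.annotationCount = annotationCount;
            this.imageCount = imageCount;
        }

        /** Blank means: no visible text, no annotations, no embedded images. */
        boolean isBlank() {
            boolean hasText = text != null && !text.trim().isEmpty();
            return !hasText && annotationCount == 0 && imageCount == 0;
        }
    }

    /** Returns the 1-based numbers of pages kept, within an inclusive range. */
    static List<Integer> pagesToKeep(List<PageSignals> pages, int startPage,
                                     int endPage, boolean removeBlankPages) {
        List<Integer> kept = new ArrayList<>();
        int first = Math.max(1, startPage);
        int last = Math.min(endPage, pages.size());
        for (int pageNumber = first; pageNumber <= last; pageNumber++) {
            PageSignals signals = pages.get(pageNumber - 1);
            if (removeBlankPages && signals.isBlank()) {
                continue; // drop blank pages only when the flag is set
            }
            kept.add(pageNumber);
        }
        return kept;
    }
}
```

Keeping the blank check separate from the range loop means a page with only whitespace text but an embedded image is still retained, which matches the three-signal mechanism listed above.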

4. rotatePages

Purpose: Rotates pages within a specified range clockwise or counterclockwise.
Parameters: Page range, rotation direction.
Output: Modified PDF stream.
🧱 Under the Hood - PDPage.setRotation
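
A PDF page's rotation is stored as a clockwise angle in degrees that must be a multiple of 90, so a direction has to be turned into a new angle before it is written back with setRotation. The sketch below shows one way to do that arithmetic. PageRotation, Direction, and rotate are hypothetical names, and the 90-degree step per call is an assumption, since the rotation amount is not specified here.

```java
/** Hypothetical sketch of the rotation arithmetic; not the module's code. */
final class PageRotation {

    enum Direction { CLOCKWISE, COUNTERCLOCKWISE }

    /**
     * Returns the new rotation in degrees, normalized to 0, 90, 180, or 270.
     * Assumes one call rotates the page by a single 90-degree step.
     */
    static int rotate(int currentDegrees, Direction direction) {
        int delta = (direction == Direction.CLOCKWISE) ? 90 : -90;
        // floorMod keeps the result non-negative, e.g. 0 - 90 becomes 270.
        return Math.floorMod(currentDegrees + delta, 360);
    }
}
```

In a PDFBox-based flow, the current value would typically be read with PDPage.getRotation() and the result passed to PDPage.setRotation(int) for each page in the selected range.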

5. splitPages

Purpose: Splits a PDF into individual pages.
Output: A list of InputStreams, each containing a single-page PDF.

6. mergePdfs ✅ (New in 1.0.1)

Purpose: Combines two or more PDF documents into one.
Input: A list of PDF InputStreams.
Output: A single merged PDF stream with extracted metadata.
🧱 Under the Hood - PDFMergerUtility + RandomAccessReadBuffer

