MuleSoft Forge

mule-pdfbox-module

PDF Utilities for MuleSoft


Last updated 14 days ago

Empower your MuleSoft flows with native PDF manipulation powered by Apache PDFBox. This module provides high-performance PDF operations with no dependencies beyond Apache PDFBox.

🔍 Key Features

  • 📄 Metadata Extraction – Get author, title, number of pages, and more.

  • ✂️ Text Extraction – Pull text from a specific range of pages.

  • 🧹 Blank Page Removal – Clean your documents before delivery.

  • 🔁 Page Rotation – Rotate document pages as needed.

  • 🧩 PDF Splitting – Break large PDFs into separate single-page files.

  • 📎 PDF Merging – Combine multiple PDFs into a single cohesive document.

🔧 Built For Developers

  • Lightweight, single-dependency module

  • Designed using the MuleSoft Java SDK

  • Input/output via standard Java streams

🧱 Under the Hood

  • Fully compatible with Mule 4.x

  • Supports page-range selection and robust PDF parsing

Implemented Operations:

1. extractPdfInfo

  • Purpose: Extracts document metadata such as the number of pages, author, title, subject, and version.

  • Input: InputStream of the PDF.

  • Output: POJO with document properties.
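A minimal sketch of what the returned POJO might look like, assuming one field per property listed above. The class name `PdfInfo` and its getters are hypothetical, not the module's actual type. With PDFBox, the values would presumably come from `PDDocument.getNumberOfPages()`, `PDDocument.getVersion()`, and the `PDDocumentInformation` returned by `PDDocument.getDocumentInformation()`.

```java
/**
 * Hypothetical result type for extractPdfInfo.
 * The class name and field names are assumptions, not the module's real API.
 */
final class PdfInfo {
    private final int numberOfPages;
    private final String author;   // metadata entries are optional in a PDF, so may be null
    private final String title;    // may be null
    private final String subject;  // may be null
    private final float version;   // PDF version, e.g. 1.7

    PdfInfo(int numberOfPages, String author, String title, String subject, float version) {
        if (numberOfPages < 0) {
            throw new IllegalArgumentException("numberOfPages must not be negative");
        }
        this.numberOfPages = numberOfPages;
        this.author = author;
        this.title = title;
        this.subject = subject;
        this.version = version;
    }

    public int getNumberOfPages() { return numberOfPages; }
    public String getAuthor() { return author; }
    public String getTitle() { return title; }
    public String getSubject() { return subject; }
    public float getVersion() { return version; }

    @Override
    public String toString() {
        return "PdfInfo{pages=" + numberOfPages + ", author=" + author
                + ", title=" + title + ", subject=" + subject + ", version=" + version + "}";
    }
}
```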

2. extractTextByPageRange

  • Purpose: Extracts plain text from a given page range.

  • Input: PDF stream + optional startPage / endPage.

  • Output: Extracted text as a string.
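Before text can be extracted, the optional `startPage` / `endPage` inputs have to resolve to a concrete range. Here is a minimal sketch, assuming 1-based inclusive page numbers (the convention `PDFTextStripper.setStartPage` / `setEndPage` use), that a missing bound defaults to the first or last page, and that an invalid range is rejected rather than clamped. The `PageRange` helper is hypothetical.

```java
/**
 * Hypothetical helper: resolves the optional startPage/endPage inputs
 * into a concrete 1-based, inclusive page range.
 */
final class PageRange {
    final int start;
    final int end;

    private PageRange(int start, int end) {
        this.start = start;
        this.end = end;
    }

    /** Missing bounds default to the first/last page (assumption); invalid ranges are rejected. */
    static PageRange resolve(Integer startPage, Integer endPage, int pageCount) {
        if (pageCount < 1) {
            throw new IllegalArgumentException("document has no pages");
        }
        int start = (startPage == null) ? 1 : startPage;
        int end = (endPage == null) ? pageCount : endPage;
        if (start < 1 || end > pageCount || start > end) {
            throw new IllegalArgumentException(
                    "invalid page range " + start + "-" + end + " for a " + pageCount + "-page document");
        }
        return new PageRange(start, end);
    }
}
```

The resolved `start` and `end` could then be passed to `PDFTextStripper.setStartPage` and `setEndPage` before calling `getText`.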

3. filterPages

  • Purpose: Removes blank pages and/or filters based on a page range.

  • Mechanism: Detects blank pages by checking for visible text, annotations, and embedded images.

  • Parameters: Page range and a remove-blank-pages flag.

  • Output: Filtered PDF stream.
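The filtering decision can be sketched without parsing a PDF by reducing each page to the three signals named above. This is a simplified model under stated assumptions: the `PageFilter` and `PageSignals` types are hypothetical, and a page is treated as blank only when all three signals are absent. With PDFBox, the signals could plausibly come from `PDFTextStripper`, `PDPage.getAnnotations()`, and the page's image XObjects.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified, hypothetical model of filterPages: combines a page range
 * with optional blank-page removal.
 */
final class PageFilter {

    static final class PageSignals {
        final String visibleText;   // text extracted from the page, may be null
        final int annotationCount;
        final int imageCount;

        PageSignals(String visibleText, int annotationCount, int imageCount) {
            this.visibleText = visibleText;
            this.annotationCount = annotationCount;
            this.imageCount = imageCount;
        }

        /** Blank only if there is no visible text, no annotation and no image. */
        boolean isBlank() {
            boolean noText = visibleText == null || visibleText.trim().isEmpty();
            return noText && annotationCount == 0 && imageCount == 0;
        }
    }

    /** Returns the 1-based numbers of the pages to keep from the inclusive range [start, end]. */
    static List<Integer> pagesToKeep(List<PageSignals> pages, int start, int end, boolean removeBlankPages) {
        if (start < 1 || end > pages.size() || start > end) {
            throw new IllegalArgumentException("invalid page range " + start + "-" + end);
        }
        List<Integer> kept = new ArrayList<>();
        for (int pageNo = start; pageNo <= end; pageNo++) {
            PageSignals page = pages.get(pageNo - 1); // list index is 0-based
            if (removeBlankPages && page.isBlank()) {
                continue; // drop blank pages only when the flag is set
            }
            kept.add(pageNo);
        }
        return kept;
    }
}
```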

4. rotatePages

  • Purpose: Rotates pages within a specified range clockwise or counterclockwise.

  • Parameters: Page range, rotation direction.

  • Output: Modified PDF stream.
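In a PDF, a page's `/Rotate` entry is a clockwise angle that must be a multiple of 90, and `PDPage.setRotation(int)` writes that value. This minimal sketch of the angle arithmetic assumes that each call rotates by a single 90° step. The `Rotation` helper and `Direction` enum are hypothetical names.

```java
/**
 * Hypothetical helper for rotatePages: computes a page's new /Rotate value.
 * Assumes one 90-degree step per call.
 */
final class Rotation {

    enum Direction { CLOCKWISE, COUNTERCLOCKWISE }

    static int nextRotation(int currentRotation, Direction direction) {
        if (currentRotation % 90 != 0) {
            throw new IllegalArgumentException("/Rotate must be a multiple of 90: " + currentRotation);
        }
        int delta = (direction == Direction.CLOCKWISE) ? 90 : -90;
        // floorMod keeps the result in [0, 360), so 0 - 90 becomes 270 instead of -90
        return Math.floorMod(currentRotation + delta, 360);
    }
}
```

With PDFBox, the current value would be read with `PDPage.getRotation()` and the result written back with `PDPage.setRotation(int)` for each page in the range.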

5. splitPages

  • Purpose: Splits a PDF into individual pages.

  • Output: A list of InputStreams, each containing a single-page PDF.
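Because the output is a list of streams in page order, a Java caller could persist each page by its position. This is a minimal sketch of that consumption step only. The `SplitOutput` class and the `page-N.pdf` naming are hypothetical conventions, not part of the module.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical consumer of splitPages output: each stream holds one
 * single-page PDF, and list order is assumed to match page order.
 */
final class SplitOutput {

    static Map<String, byte[]> toNamedFiles(List<InputStream> pages) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>(); // preserves page order
        for (int i = 0; i < pages.size(); i++) {
            try (InputStream in = pages.get(i)) {
                files.put("page-" + (i + 1) + ".pdf", readAll(in));
            }
        }
        return files;
    }

    // Java 8 compatible equivalent of InputStream.readAllBytes()
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```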

6. mergePdfs ✅ (New in 1.0.1)

  • Purpose: Combines two or more PDF documents into one.

  • Input: A list of PDF InputStreams.

  • Output: A single merged PDF stream with extracted metadata.

  • 🧱 Under the Hood: PDFMergerUtility + RandomAccessReadBuffer
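`PDFMergerUtility` appends source documents in the order they are added, so the merged output contains every page of the first input, then every page of the second, and so on. Its page count is therefore the sum of the inputs' page counts. The toy model below shows only that ordering and represents each document by its page labels. The `MergeModel` class is hypothetical. The two-document minimum mirrors the "two or more" wording above, and it is an assumption whether the module actually enforces it.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy, hypothetical model of mergePdfs ordering: documents are lists of
 * page labels; no PDF parsing happens here.
 */
final class MergeModel {

    static List<String> mergedPages(List<List<String>> documents) {
        if (documents == null || documents.size() < 2) {
            throw new IllegalArgumentException("mergePdfs expects two or more documents");
        }
        List<String> merged = new ArrayList<>();
        for (List<String> doc : documents) {
            merged.addAll(doc); // input order first, then page order within each input
        }
        return merged;
    }
}
```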

🧱 Under the Hood (per operation)

  • extractPdfInfo: PDDocumentInformation

  • extractTextByPageRange: PDFTextStripper

  • rotatePages: PDPage.setRotation

Built using Apache PDFBox.