Apache PDFBox - Filter Pages
🔧 Operation Name
Apache PDFBox - Filter Pages (filterPages)
🧾 Description
Filters pages from a PDF document based on two optional criteria:
Remove blank pages
Retain only selected page ranges
This is useful for preprocessing documents: removing blank pages or extracting only the sections you need keeps downstream processing time and the associated costs to a minimum.
✅ Inputs
Parameter | Type | Required | Description
PDF File [Binary] | InputStream (Binary) | ✅ | The input PDF document to be filtered.
Remove Blank Pages | Boolean | ❌ ([Only one choice allowed]) | If true, pages without visible text, images, or annotations are removed.
Page Range | String | ❌ ([Only one choice allowed]) | Comma-separated list of page numbers or ranges to retain (e.g., 1,3,5-7). If not provided, all pages are considered.
📤 Output
Payload: InputStream (Binary). A new filtered PDF stream containing only the selected (and non-blank) pages.
Attributes: PdfBoxFileAttributes. Metadata from the original document (e.g., page count, author, title).
🧪 MuleSoft Flow Example
Here’s how to call this operation in a MuleSoft flow:

<mule
    xmlns="http://www.mulesoft.org/schema/mule/core"
    xmlns:doc="http://www.mulesoft.org/schema/mule/documentation"
    xmlns:pdfbox="http://www.mulesoft.org/schema/mule/pdfbox"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:file="http://www.mulesoft.org/schema/mule/file"
    xsi:schemaLocation="http://www.mulesoft.org/schema/mule/core
        http://www.mulesoft.org/schema/mule/core/current/mule.xsd
        http://www.mulesoft.org/schema/mule/pdfbox
        http://www.mulesoft.org/schema/mule/pdfbox/current/mule-pdfbox.xsd
        http://www.mulesoft.org/schema/mule/file
        http://www.mulesoft.org/schema/mule/file/current/mule-file.xsd">
    <!-- Entry point: a scheduler triggers the sub-flow below -->
    <flow name="main">
        <scheduler doc:name="Scheduler" doc:id="dsgkfy">
            <scheduling-strategy>
                <fixed-frequency timeUnit="HOURS"/>
            </scheduling-strategy>
        </scheduler>
        <flow-ref name="Apache PDFBox - Filter Pages"/>
    </flow>
    <sub-flow name="Apache PDFBox - Filter Pages">
        <!-- Download a sample PDF and use its bytes as the payload -->
        <set-payload doc:id="vxsfk2" doc:name="Set payload" mimeType="application/octet-stream" value='#[%dw 2.0
            output application/java
            ---
            readUrl("https://www.adobe.com/support/products/enterprise/knowledgecenter/media/c4611_sample_explain.pdf", "application/octet-stream") as Binary]'></set-payload>
        <!-- Keep only pages 1, 3 and 4 of the input PDF -->
        <pdfbox:filter-pages doc:id="vlvadh" doc:name="Apache PDFBox - Filter Pages" pageRange="1,3-4"></pdfbox:filter-pages>
        <!-- Log the attributes returned by the operation as JSON -->
        <logger doc:name="Logger" doc:id="ecdqs2s" message='#[%dw 2.0
            output text
            ---
            "\n\n Apache PDFBox - Filter Pages"
            ++ "\n\n⌄⌄⌄⌄⌄⌄⌄⌄⌄⌄⌄⌄⌄⌄⌄⌄⌄⌄⌄⌄"
            ++ "\n\nFilter Pages Attributes: " ++ (write(attributes, "application/json")) as String
            ++ "\n\n^^^^^^^^^^^^^^^^^^^^"
            ++ "\n\n Apache PDFBox - Filter Pages"
            ++ "\n\n"]'/>
        <!-- Write the filtered PDF stream to disk -->
        <file:write path="test.pdf" doc:name="Write" doc:id="edxzkf"/>
    </sub-flow>
</mule>
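The flow above fetches its input over HTTP. When the PDF is already on disk, the operation can presumably be fed from the File connector instead. The fragment below is a minimal sketch under stated assumptions: it relies on the file namespace declared above, and the sub-flow name and the file names input.pdf and filtered.pdf are hypothetical placeholders, not values from the connector documentation.

```xml
<!-- Sketch only: sub-flow name and file paths are illustrative placeholders -->
<sub-flow name="filter-local-pdf">
    <!-- Read the source PDF from disk; the payload becomes a binary stream -->
    <file:read path="input.pdf" doc:name="Read"/>
    <!-- Keep pages 1, 3 and 4, as in the flow example -->
    <pdfbox:filter-pages doc:name="Apache PDFBox - Filter Pages" pageRange="1,3-4"/>
    <!-- Write the filtered stream to a separate file so the source stays untouched -->
    <file:write path="filtered.pdf" doc:name="Write"/>
</sub-flow>
```

Writing to a different path than the one read avoids overwriting the original document while the filter settings are still being tuned.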
🔍 Notes
Page Indexing: 1-based (e.g., 1 = first page).
If both options are omitted, the PDF is returned unmodified.
You can combine both removeBlankPages and pageRange for tighter filtering. For example: remove blank pages after retaining only pages 2–6.
Output is a binary PDF, not text.
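The combined filtering mentioned in these notes might be configured as in the fragment below. This is a sketch, not taken from the connector reference: the removeBlankPages attribute name is assumed from the parameter name used in the notes (only pageRange appears in the flow example), and the doc:name value is illustrative.

```xml
<!-- Assumption: removeBlankPages is the XML attribute behind the "Remove Blank Pages" input -->
<pdfbox:filter-pages
    doc:name="Keep pages 2-6 without blanks"
    pageRange="2-6"
    removeBlankPages="true"/>
```

Used in place of the pdfbox:filter-pages element in the flow example, this would retain pages 2 through 6 and then drop any of those pages that are blank.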
Underlying Application Interface: