Apache PDFBox - Filter Pages
🔧 Operation Name
Apache PDFBox - Filter Pages
filterPages
🧾 Description
Filters pages from a PDF document based on two optional criteria:
Remove blank pages
Retain only selected page ranges
This is useful for preprocessing documents: removing blank pages or extracting only the sections you need keeps downstream processing time and associated costs to a minimum.
✅ Inputs
PDF File [Binary] · Type: InputStream (Binary) · Required: ✅
The input PDF document to be filtered.
Remove Blank Pages · Type: Boolean · Required: ❌ ([Only one choice allowed])
If true, pages without visible text, images, or annotations are removed.
Page Range · Type: String · Required: ❌ ([Only one choice allowed])
Comma-separated list of page numbers or ranges to retain (e.g., 1,3,5-7). If not provided, no page-range filtering is applied and all pages are kept.
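The full flow example sets only pageRange. For the opposite case, removing blank pages while keeping every other page, a minimal sketch might look like the fragment below. It assumes the Boolean input is exposed as an XML attribute named removeBlankPages (the parameter name used in the notes) and that the fragment sits inside a flow or sub-flow whose enclosing mule element declares the doc and pdfbox namespaces, as the full example does.

```xml
<!-- Sketch: drop blank pages and keep all remaining pages.
     Assumption: the Boolean input maps to the removeBlankPages attribute.
     pageRange is omitted, so no page-range filtering is applied. -->
<pdfbox:filter-pages doc:name="Apache PDFBox - Filter Pages"
	removeBlankPages="true" />
```

Which pages count as blank (no visible text, images, or annotations) is decided by the connector, not by this configuration.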
📤 Output
Payload: InputStream (Binary). A new filtered PDF stream containing only the selected (and non-blank) pages.
Attributes: PdfBoxFileAttributes. Metadata from the original document (e.g., page count, author, title).
🧪 MuleSoft Flow Example
Here’s how to call this operation in a MuleSoft flow:

<mule
	xmlns="http://www.mulesoft.org/schema/mule/core"
	xmlns:doc="http://www.mulesoft.org/schema/mule/documentation"
	xmlns:pdfbox="http://www.mulesoft.org/schema/mule/pdfbox"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xmlns:file="http://www.mulesoft.org/schema/mule/file" 
	xsi:schemaLocation="http://www.mulesoft.org/schema/mule/core 
	http://www.mulesoft.org/schema/mule/core/current/mule.xsd  
	http://www.mulesoft.org/schema/mule/pdfbox 
	http://www.mulesoft.org/schema/mule/pdfbox/current/mule-pdfbox.xsd
	http://www.mulesoft.org/schema/mule/file 
	http://www.mulesoft.org/schema/mule/file/current/mule-file.xsd">
	<flow name="main">
		<scheduler doc:name="Scheduler" doc:id="dsgkfy" >
			<scheduling-strategy>
				<fixed-frequency timeUnit="HOURS"/>
			</scheduling-strategy>
		</scheduler>
		<flow-ref name="Apache PDFBox - Filter Pages" />
	</flow>
	
	<sub-flow name="Apache PDFBox - Filter Pages">
		<set-payload doc:id="vxsfk2" doc:name="Set payload" mimeType="application/octet-stream" value='#[%dw 2.0
output application/java
---
readUrl("https://www.adobe.com/support/products/enterprise/knowledgecenter/media/c4611_sample_explain.pdf", "application/octet-stream") as Binary]'></set-payload>
		<pdfbox:filter-pages doc:id="vlvadh" doc:name="Apache PDFBox - Filter Pages" pageRange="1,3-4"></pdfbox:filter-pages>
		<logger doc:name="Logger" doc:id="ecdqs2s" message='#[%dw 2.0
output text
---
"\n\n Apache PDFBox - Filter Pages" 
++ "\n\n⌄⌄⌄⌄⌄⌄⌄⌄⌄⌄⌄⌄⌄⌄⌄⌄⌄⌄⌄⌄"
++ "\n\nFilter Pages Attributes: " ++ (write(attributes, "application/json")) as String
++ "\n\n^^^^^^^^^^^^^^^^^^^^"
++ "\n\n Apache PDFBox - Filter Pages" 
++ "\n\n"]'/>
		<file:write path="test.pdf" doc:name="Write" doc:id="edxzkf" />
	</sub-flow>
</mule>
🔍 Notes
Page indexing: 1-based (e.g., 1 = first page).
If both options are omitted, the PDF is returned unmodified.
You can combine removeBlankPages and pageRange for tighter filtering. For example, retain only pages 2–6 and then remove any blank pages among them.
Output is a binary PDF, not text.
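Combining both options as described in the notes above might look like the fragment below. It is a sketch that assumes removeBlankPages is the XML attribute name for the Boolean input and that the enclosing mule element declares the doc, pdfbox, and file namespaces, as the full example does. The output file name filtered.pdf is a hypothetical placeholder.

```xml
<!-- Sketch: keep pages 2 through 6 (1-based), then drop any blank pages among them.
     Assumption: the Boolean input maps to the removeBlankPages attribute. -->
<pdfbox:filter-pages doc:name="Apache PDFBox - Filter Pages"
	pageRange="2-6"
	removeBlankPages="true" />
<!-- The payload is now a binary PDF stream, so write it to a file rather than logging it as text.
     filtered.pdf is a hypothetical output path. -->
<file:write path="filtered.pdf" doc:name="Write" />
```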
Underlying Application Interface: