Apache PDFBox - Filter Pages
Operation Name
Apache PDFBox - Filter Pages
filterPages
Description
Filters pages from a PDF document based on two optional criteria:
Remove blank pages
Retain only selected page ranges
This is useful for preprocessing documents: dropping blank pages or extracting only the sections you need keeps processing times and associated costs to a minimum.
Inputs
PDF File [Binary]
Type: InputStream (Binary)
Required: Yes
Description: The input PDF document to be filtered.
Remove Blank Pages
Type: Boolean
Required: No (only one choice allowed)
Description: If true, pages without visible text, images, or annotations will be removed.
Page Range
Type: String
Required: No (only one choice allowed)
Description: Comma-separated list of page numbers or ranges to retain (e.g., 1,3,5-7). If not provided, all pages are considered.
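To make the Page Range syntax concrete, the following is a minimal sketch of how a string such as 1,3,5-7 could be expanded into 1-based page numbers. It is not the connector's actual implementation: the class PageRangeParser and its parse method are hypothetical names introduced for illustration. Rejecting reversed ranges such as 7-5 is also an assumption.

```java
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical helper (not part of the connector): parses a spec such as "1,3,5-7". */
final class PageRangeParser {

    private PageRangeParser() {}

    /**
     * Returns the 1-based page numbers named by the spec, sorted and de-duplicated.
     * A null or blank spec yields an empty set, which a caller can treat as "all pages".
     */
    static SortedSet<Integer> parse(String spec) {
        SortedSet<Integer> pages = new TreeSet<>();
        if (spec == null || spec.trim().isEmpty()) {
            return pages;
        }
        for (String token : spec.split(",")) {
            String part = token.trim();
            if (part.isEmpty()) {
                continue; // tolerate stray commas such as "1,,3"
            }
            int dash = part.indexOf('-');
            int start;
            int end;
            if (dash < 0) {
                // single page, e.g. "3"
                start = Integer.parseInt(part);
                end = start;
            } else {
                // inclusive range, e.g. "5-7"
                start = Integer.parseInt(part.substring(0, dash).trim());
                end = Integer.parseInt(part.substring(dash + 1).trim());
            }
            // Assumption: pages are 1-based and reversed ranges are rejected.
            if (start < 1 || end < start) {
                throw new IllegalArgumentException("Invalid page range: " + part);
            }
            for (int page = start; page <= end; page++) {
                pages.add(page);
            }
        }
        return pages;
    }
}
```

Calling PageRangeParser.parse("1,3,5-7") yields the pages 1, 3, 5, 6 and 7. Overlapping entries such as "2-4,3" collapse to a single copy of each page because the result is a set.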
Output
Payload: InputStream (Binary). A new filtered PDF stream containing only the selected (and non-blank) pages.
Attributes: PdfBoxFileAttributes. Metadata from the original document (e.g., page count, author, title).
MuleSoft Flow Example
Here's how to call this operation in a MuleSoft flow:
Notes
Page Indexing: 1-based (e.g., 1 = first page).
If both options are omitted, the PDF is returned unmodified.
You can combine both removeBlankPages and pageRange for tighter filtering. For example: remove blank pages after retaining only pages 2-6.
Output is a binary PDF, not text.
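The way the two options combine can be sketched as a page-selection step: apply the page range first, then drop blank pages from what remains. This is an illustrative sketch only, under stated assumptions. PageFilterSketch, pagesToKeep and the isBlank predicate are hypothetical names, and the real blank-page check (no visible text, images, or annotations) is represented here by a caller-supplied predicate rather than actual PDF inspection. Ignoring page numbers beyond the document's page count is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.IntPredicate;

/** Hypothetical sketch of how pageRange and removeBlankPages could combine. */
final class PageFilterSketch {

    private PageFilterSketch() {}

    /**
     * Returns the 1-based page numbers to keep, in document order.
     *
     * @param pageCount        total pages in the source document
     * @param pageRange        pages to retain; null or empty means "all pages"
     * @param removeBlankPages whether blank pages should be dropped
     * @param isBlank          stand-in for the real blank-page check (assumption)
     */
    static List<Integer> pagesToKeep(int pageCount,
                                     Set<Integer> pageRange,
                                     boolean removeBlankPages,
                                     IntPredicate isBlank) {
        boolean allPages = pageRange == null || pageRange.isEmpty();
        List<Integer> kept = new ArrayList<>();
        for (int page = 1; page <= pageCount; page++) {
            // Step 1: retain only pages named in the range (if one was given).
            if (!allPages && !pageRange.contains(page)) {
                continue;
            }
            // Step 2: among the retained pages, drop the blank ones.
            if (removeBlankPages && isBlank.test(page)) {
                continue;
            }
            kept.add(page);
        }
        return kept;
    }
}
```

For an 8-page document where pages 3 and 7 are blank, a range of 2-6 with removeBlankPages set to true keeps pages 2, 4, 5 and 6. With neither option set, every page is kept, matching the note that the PDF is returned unmodified.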
Underlying Application Interface: