Apache PDFBox - Image to PDF

🔧 Operation Name

Apache PDFBox – Image to PDF (imageToPdf)

🧾 Description

Converts a single image (e.g., JPG, PNG, GIF, BMP) into a one-page PDF document. The resulting PDF has a single page sized to match the dimensions of the source image.

Useful for bringing scanned receipts, screenshots, or other images into a PDF-based workflow.

✅ Inputs

Image File (Binary, InputStream): binary content of the image (JPEG, PNG, GIF, or BMP).

📤 Output

Payload: InputStream (binary stream): a one-page PDF containing the provided image.
Attributes: PdfBoxFileAttributes: metadata of the generated PDF, such as the number of pages (always 1), file size, and creation date.
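To illustrate where attribute values like these can come from, the sketch below reads the page count, byte size, and creation date back from finished PDF bytes with PDFBox 3.0.x (`Loader.loadPDF`). The `PdfMetadataSketch` class and its `PdfInfo` holder are hypothetical; the connector's actual `PdfBoxFileAttributes` and `extractPdfMetadata` may collect different fields or read them differently.

```java
import java.io.IOException;
import java.util.Calendar;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;

/** Hypothetical metadata reader; not the connector's real extractPdfMetadata. */
public final class PdfMetadataSketch {

    /** Hypothetical attribute holder, loosely modeled on PdfBoxFileAttributes. */
    public static final class PdfInfo {
        public final int pageCount;
        public final long fileSizeBytes;
        public final Calendar creationDate; // null if the PDF has no /CreationDate entry

        public PdfInfo(int pageCount, long fileSizeBytes, Calendar creationDate) {
            this.pageCount = pageCount;
            this.fileSizeBytes = fileSizeBytes;
            this.creationDate = creationDate;
        }
    }

    private PdfMetadataSketch() {}

    /** Parses the saved PDF bytes and collects a few document-level facts. */
    public static PdfInfo extractPdfMetadata(byte[] pdfBytes) throws IOException {
        try (PDDocument doc = Loader.loadPDF(pdfBytes)) {
            // getDocumentInformation() creates an empty info dictionary if none exists
            PDDocumentInformation info = doc.getDocumentInformation();
            return new PdfInfo(doc.getNumberOfPages(), pdfBytes.length, info.getCreationDate());
        }
    }
}
```

Because `getCreationDate()` returns `null` when the information dictionary has no creation date, a producer that wants this attribute populated can set it with `getDocumentInformation().setCreationDate(...)` before saving.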

🔍 Notes

The generated page is sized to the image's pixel dimensions at one PDF point (1/72 inch) per pixel; the image itself is not scaled.
Further page-level changes can be made afterwards by chaining other operations (e.g., rotatePages, filterPages).
Supports the formats that PDImageXObject.createFromByteArray can load (JPEG, PNG, GIF, BMP); other inputs fail with PDF_PROCESSING_ERROR.
The output is always a single-page PDF, regardless of the image dimensions.
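Because one pixel becomes one point, the physical page size depends only on the pixel count, not on any DPI recorded in the image file. The small helper below makes that arithmetic concrete; `PageSizeMath` is a hypothetical name, and the only fixed facts it relies on are 72 points per inch and 25.4 mm per inch.

```java
/**
 * Hypothetical helper: physical size of an imageToPdf page, assuming the
 * operation maps 1 image pixel to 1 PDF point (1/72 inch), as described above.
 */
public final class PageSizeMath {
    private static final double POINTS_PER_INCH = 72.0; // PDF default user-space unit
    private static final double MM_PER_INCH = 25.4;

    private PageSizeMath() {}

    /** Page extent in inches for a given number of pixels along one axis. */
    public static double pixelsToInches(int pixels) {
        return pixels / POINTS_PER_INCH;
    }

    /** Page extent in millimetres for a given number of pixels along one axis. */
    public static double pixelsToMillimetres(int pixels) {
        return pixelsToInches(pixels) * MM_PER_INCH;
    }
}
```

For example, a 1920 x 1080 px screenshot yields a page of about 26.67 x 15 inches, and an A4 sheet scanned at 300 DPI (2480 x 3508 px) yields roughly 34.44 x 48.72 inches, far larger than A4's 595 x 842 points.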

Underlying Application Interface:

pdfbox 3.0.6 Javadoc (org.apache.pdfbox), javadoc.io

Pseudo Code

Operation: imageToPdf

Input:

imageStream: Binary content of the image (InputStream)

streamingHelper: MuleSoft StreamingHelper (for context/utilities)

Output:
Result containing:

A single InputStream representing the new one-page PDF document.

PdfBoxFileAttributes containing metadata of the generated PDF (page count, size, etc.).

Errors:

PDF_PROCESSING_ERROR: If the input is not a valid image or embedding fails.

PDF_METADATA_EXTRACTION_FAILED: If metadata cannot be retrieved from the generated PDF.

Steps:

1. Convert the imageStream to a byte array.
2. Create a new, empty PDDocument.
3. Create a PDImageXObject from the byte array within the document.
   - If this fails, throw a ModuleException with PDF_PROCESSING_ERROR.
4. Read the image's width and height from the PDImageXObject.
5. Create a PDRectangle with the image's width and height.
6. Create a PDPage with this rectangle as its media box and add it to the document.
7. Create a PDPageContentStream bound to the new page.
8. Draw the image at coordinates (0, 0) using its full width and height.
9. Close the content stream.
10. Create a ByteArrayOutputStream and save the PDF document to it.
11. Extract metadata from the saved PDF using extractPdfMetadata.
   - If extraction fails, throw a ModuleException with PDF_METADATA_EXTRACTION_FAILED.
12. Convert the ByteArrayOutputStream to a ByteArrayInputStream.
13. Return a new Result whose output is the ByteArrayInputStream and whose attributes are the PdfBoxFileAttributes.
14. In a finally block, close the PDDocument to release its resources.
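A minimal Java sketch of these steps against PDFBox 3.0.x, assuming Java 9+ (for `InputStream.readAllBytes`). To stay self-contained it departs from the connector in labeled ways: `PdfBoxError` and `PdfBoxOperationException` are hypothetical stand-ins for the module's error types and `ModuleException`, the method returns the PDF as a `byte[]` instead of a MuleSoft `Result`, and the metadata extraction step is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

/** Sketch of imageToPdf; error types below are hypothetical stand-ins. */
public final class ImageToPdfSketch {

    /** Hypothetical mirror of the connector's error codes. */
    public enum PdfBoxError { PDF_PROCESSING_ERROR, PDF_METADATA_EXTRACTION_FAILED }

    /** Hypothetical stand-in for MuleSoft's ModuleException. */
    public static final class PdfBoxOperationException extends RuntimeException {
        private final PdfBoxError error;

        public PdfBoxOperationException(String message, PdfBoxError error, Throwable cause) {
            super(message, cause);
            this.error = error;
        }

        public PdfBoxError getError() {
            return error;
        }
    }

    private ImageToPdfSketch() {}

    /** Returns a one-page PDF whose page matches the image size (1 px = 1 pt). */
    public static byte[] imageToPdf(InputStream imageStream) {
        // try-with-resources closes the document on every path (the "finally" step)
        try (PDDocument document = new PDDocument()) {
            byte[] imageBytes = imageStream.readAllBytes();

            PDImageXObject image;
            try {
                // Format is detected from the bytes; unsupported input throws
                image = PDImageXObject.createFromByteArray(document, imageBytes, "image");
            } catch (IOException | RuntimeException e) {
                throw new PdfBoxOperationException(
                        "Input is not a supported image", PdfBoxError.PDF_PROCESSING_ERROR, e);
            }

            float width = image.getWidth();   // pixels, used directly as points
            float height = image.getHeight();
            PDPage page = new PDPage(new PDRectangle(width, height));
            document.addPage(page);

            // Draw the image so it fills the whole media box
            try (PDPageContentStream content = new PDPageContentStream(document, page)) {
                content.drawImage(image, 0, 0, width, height);
            }

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            document.save(out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new PdfBoxOperationException(
                    "Failed to build the PDF", PdfBoxError.PDF_PROCESSING_ERROR, e);
        }
    }
}
```

A caller that needs an `InputStream`, as the operation's payload is, can wrap the result with `new ByteArrayInputStream(pdfBytes)`. `RuntimeException` is caught around `createFromByteArray` because PDFBox can report an unrecognized image format with an `IllegalArgumentException` rather than an `IOException`.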

Methods used from the Apache PDFBox library

📂 Classes & Methods from PDFBox

PDDocument
- new PDDocument() → creates a new empty PDF document.
- addPage(PDPage page) → adds a new page to the document.
- save(OutputStream) → saves the document to an output stream.
- close() → releases resources held by the document.
PDImageXObject
- PDImageXObject.createFromByteArray(PDDocument doc, byte[] imageData, String name) → creates a PDF image object from raw image bytes.
PDRectangle
- new PDRectangle(float width, float height) → defines a rectangle (page size) with the same dimensions as the image.
PDPage
- new PDPage(PDRectangle mediaBox) → creates a new page with the given dimensions.
PDPageContentStream
- new PDPageContentStream(PDDocument doc, PDPage page) → creates a content stream to draw onto a page.
- drawImage(PDImageXObject image, float x, float y, float width, float height) → draws the image onto the page at (x, y) scaled to width/height.
- close() → finalizes and closes the content stream.

