Feature: Preserve Folder Hierarchy In PDF Processing Workflows #38

**Summary**
The PDF-to-PDF and PDF-to-HTML solutions currently process only files in the root folder of the input S3 bucket. Files uploaded to subfolders are not processed, and the output does not preserve the original folder structure. This limitation creates significant operational overhead for bulk processing of documents organized in nested folder hierarchies.

**Current Behavior**
**PDF-to-PDF Solution:**
✅ Files in root folder (e.g., /pdf/test.pdf) → Output appears in "result/" folder
❌ Files in subfolders (e.g., /pdf/sub-folder/test.pdf) → No output generated

**PDF-to-HTML Solution:**
✅ Files in root folder (e.g., /uploads/test.pdf) → Output generated in "/remediated/" folder
❌ Files in subfolders (e.g., /uploads/sub-folder/test.pdf) → No output generated

**Desired Behavior**
When a file is uploaded to a subfolder, the solution should:

- Detect and process the file regardless of folder depth
- Preserve the original folder hierarchy in the output bucket

Example:

- Input: s3://input-bucket/pdf/department-a/2024/document.pdf
- Output: s3://output-bucket/result/department-a/2024/document.pdf
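
The key mapping in the example above could be sketched roughly as follows. This is only an illustration of the desired behavior, not the solution's actual code: the function name `map_output_key` and its parameters are hypothetical, and the `pdf/` and `result/` prefixes are taken from the PDF-to-PDF example in this issue. The sketch also assumes the key comes from an S3 event notification, where object keys are URL-encoded, so it decodes the key first.

```python
from urllib.parse import unquote_plus


def map_output_key(
    event_key: str,
    input_prefix: str = "pdf/",      # assumed input folder, from the example in this issue
    output_prefix: str = "result/",  # assumed output folder, from the example in this issue
) -> str:
    """Map an input object key to an output key, keeping any subfolders.

    Hypothetical helper for illustration only.
    """
    # Keys in S3 event notifications are URL-encoded (a space arrives as "+").
    key = unquote_plus(event_key).lstrip("/")
    if not key.startswith(input_prefix):
        raise ValueError(f"key {key!r} is not under {input_prefix!r}")
    # Everything after the input prefix, including nested folders, is preserved.
    relative_path = key[len(input_prefix):]
    return output_prefix + relative_path


print(map_output_key("pdf/department-a/2024/document.pdf"))
```

With a mapping like this, a file at any depth keeps its relative path, while a file directly under `pdf/` still lands directly under `result/` as it does today.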

**This enhancement would enable:**
- Automated bulk processing of documents with existing folder structures
- Reduced operational overhead and manual intervention
- Better scalability for enterprise document processing workflows
