Description
Summary
The PDF-to-PDF and PDF-to-HTML solutions currently process only files in the root folder of the input S3 bucket. Files uploaded to subfolders are not processed, and the output does not preserve the original folder structure. This limitation creates significant operational overhead for bulk processing of documents organized in nested folder hierarchies.
Current Behavior
PDF-to-PDF Solution:
✅ Files in root folder (e.g., /pdf/test.pdf) → Output appears in "result/" folder
❌ Files in subfolders (e.g., /pdf/sub-folder/test.pdf) → No output generated
PDF-to-HTML Solution:
✅ Files in root folder (e.g., /uploads/test.pdf) → Output generated in "/remediated/" folder
❌ Files in subfolders (e.g., /uploads/sub-folder/test.pdf) → No output generated
Desired Behavior
When a file is uploaded to a subfolder at any depth, the solution should:
Detect and process the file regardless of folder depth
Preserve the original folder hierarchy in the output bucket
Example:
Input: s3://input-bucket/pdf/department-a/2024/document.pdf
Output: s3://output-bucket/result/department-a/2024/document.pdf
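To make the expected mapping concrete, here is a minimal Python sketch of how an output key could be derived from an input key while keeping the sub-path. It is not the solution's actual code: the function names `map_output_key` and `keys_from_event` are hypothetical, and the `pdf/` and `result/` prefixes are assumptions taken from the example above. The only external behavior it relies on is that object keys in S3 event notifications arrive URL-encoded.

```python
from typing import List, Optional
from urllib.parse import unquote_plus


def map_output_key(
    input_key: str,
    input_prefix: str = "pdf/",      # assumed input prefix, from the example
    output_prefix: str = "result/",  # assumed output prefix, from the example
) -> Optional[str]:
    """Return the output key that mirrors the sub-path under input_prefix.

    Returns None for keys outside the input prefix and for folder
    placeholder objects (keys ending in "/").
    """
    if not input_key.startswith(input_prefix):
        return None
    relative = input_key[len(input_prefix):]
    if not relative or relative.endswith("/"):
        return None
    return output_prefix + relative


def keys_from_event(event: dict) -> List[str]:
    """Extract decoded object keys from an S3 event notification payload."""
    return [
        unquote_plus(record["s3"]["object"]["key"])
        for record in event.get("Records", [])
    ]
```

With these assumptions, `map_output_key("pdf/department-a/2024/document.pdf")` yields `result/department-a/2024/document.pdf`, and a root-level file such as `pdf/test.pdf` still maps to `result/test.pdf`, so the current behavior would be unchanged.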
This enhancement would enable:
Automated bulk processing of documents with existing folder structures
Reduced operational overhead and manual intervention
Better scalability for enterprise document processing workflows