mem: reduce PaddleOCR rec_batch_num from 6 to 1#4295

Open
KRRT7 wants to merge 6 commits into Unstructured-IO:main from KRRT7:mem/paddle-rec-batch-num

Conversation

@KRRT7
Collaborator

KRRT7 commented Mar 24, 2026

Reduce PaddleOCR rec_batch_num from 6 (the default) to 1. Paddle's native inference engine allocates 500 MiB memory arena chunks in proportion to the recognition batch size. With rec_batch_num=6, four chunks are allocated during text recognition; setting it to 1 reduces this to a single chunk.

Benchmark

| Setting | Peak memory |
| --- | --- |
| rec_batch_num=6 | 7,184 MiB |
| rec_batch_num=1 | 2,684 MiB |
| Delta | -4,500 MiB (-62.6%) |
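The delta row follows directly from the two measurements; a quick arithmetic check:

```python
# Sanity-check the peak-memory delta quoted in the table above.
baseline_mib = 7184   # peak with rec_batch_num=6
optimized_mib = 2684  # peak with rec_batch_num=1
delta_mib = baseline_mib - optimized_mib
pct = round(100 * delta_mib / baseline_mib, 1)
assert delta_mib == 4500
assert pct == 62.6
```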

Measured with `memray run` on layout-parser-paper-with-table.pdf, processed through `partition()` with hi_res + PaddleOCR table OCR. On CPU, batch processing doesn't parallelize; it runs sequentially within `predictor.run()`. Smaller batches simply allocate less workspace memory.
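Since CPU recognition consumes batches one after another, rec_batch_num only controls how many text crops share one workspace at a time, not how many run in parallel. A minimal pure-Python sketch of that batching (the 55-region count is taken from the latency benchmark in the commit message below):

```python
import math

def batches(items, batch_size):
    """Split items into consecutive batches, mirroring how a CPU
    recognizer consumes text regions sequentially, batch by batch."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

regions = list(range(55))  # 55 text regions, as in the latency benchmark
assert len(batches(regions, 6)) == math.ceil(55 / 6)  # 10 batches; workspace sized for 6
assert len(batches(regions, 1)) == 55                 # 55 batches; workspace sized for 1
```

Either way every region is processed exactly once; only the per-batch workspace size changes.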

Reproduce

Requires unstructured[pdf], paddlepaddle, unstructured-paddleocr, and memray.
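The dependencies listed above can be installed in one step (package names as listed; exact versions may vary by platform):

```shell
pip install "unstructured[pdf]" paddlepaddle unstructured-paddleocr memray
```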

cat > /tmp/bench_paddle.py << 'SCRIPT'
from unstructured.partition.auto import partition
elements = partition(
    filename="example-docs/layout-parser-paper.pdf",
    strategy="hi_res",
    pdf_infer_table_structure=True,
    ocr_agent="unstructured.partition.utils.ocr_models.paddle_ocr.OCRAgentPaddle",
)
print(f"Partitioned: {len(elements)} elements")
SCRIPT

# Baseline (main branch, rec_batch_num=6):
git checkout main
memray run --native --trace-python-allocators -o /tmp/paddle_baseline.bin /tmp/bench_paddle.py
memray stats /tmp/paddle_baseline.bin | grep "Peak memory"

# With this change (rec_batch_num=1):
git checkout mem/paddle-rec-batch-num
memray run --native --trace-python-allocators -o /tmp/paddle_opt.bin /tmp/bench_paddle.py
memray stats /tmp/paddle_opt.bin | grep "Peak memory"

KRRT7 added 6 commits March 19, 2026 09:48
Paddle's native inference engine allocates 500 MiB memory arena chunks
during text recognition, proportional to batch size. With the default
rec_batch_num=6, four 500 MiB chunks are allocated simultaneously.

Setting rec_batch_num=1 reduces this to a single chunk, cutting peak
memory on the PaddleOCR code path by ~1,265 MiB (-42.6%).

Latency benchmark (55 text regions, CPU, 5 runs):
- rec_batch_num=6: 39.1s +/- 3.5s
- rec_batch_num=1: 37.0s +/- 2.0s
No throughput regression — on CPU, batch processing is sequential.