fix(server): load pipeline once at startup and return XML file directly #44

Open
haoyu-haoyu wants to merge 1 commit into BIT-DataLab:main from haoyu-haoyu:fix/server-singleton-pipeline-and-file-response

haoyu-haoyu commented Mar 15, 2026

Summary

This PR fixes three critical issues in server_pa.py:

1. Model reloaded on every request (performance)

Pipeline(config) was instantiated inside the /convert handler, causing SAM3 model weights to be loaded into GPU memory on every single upload. For a multi-GB model this means:

  • 10-30s latency added to every request just for model loading
  • GPU memory fragmentation and potential OOM after repeated requests

Fix: Singleton Pipeline created once during FastAPI lifespan startup. All requests share the same instance.
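
For reviewers, here is a minimal sketch of the singleton-plus-lifespan pattern, written against the standard library only so it runs without FastAPI or a GPU. The `Pipeline` class is a stand-in stub, and `_pipeline` and the empty `config` are illustrative assumptions rather than the names used in server_pa.py. In the real server the `lifespan` function would be passed as `FastAPI(lifespan=lifespan)`.

```python
import asyncio
from contextlib import asynccontextmanager


class Pipeline:
    """Stand-in for the real Pipeline; counts how often it is constructed."""

    instances = 0

    def __init__(self, config):
        Pipeline.instances += 1
        self.config = config


_pipeline = None  # module-level singleton, set during startup (assumed name)


def get_pipeline():
    """Return the shared Pipeline, failing loudly if startup has not run."""
    if _pipeline is None:
        raise RuntimeError("Pipeline is not initialised")
    return _pipeline


@asynccontextmanager
async def lifespan(app):
    """Startup/shutdown hook in the shape FastAPI(lifespan=...) expects."""
    global _pipeline
    _pipeline = Pipeline(config={})  # the expensive load happens once, here
    try:
        yield
    finally:
        _pipeline = None  # drop the reference on shutdown


async def demo():
    # Simulate the server lifetime with two "requests" inside it.
    async with lifespan(app=None):
        first = get_pipeline()
        second = get_pipeline()
        return first is second, Pipeline.instances


same_instance, load_count = asyncio.run(demo())
```

Both simulated requests receive the same object and the constructor runs once, which is the property the "second request is fast" item in the test plan relies on.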

2. API returned local path instead of file (usability)

The response was {"output_path": "/server/local/path/xxx.drawio"} — a local filesystem path that API callers cannot use. This makes the entire API non-functional for any remote client.

Fix: /convert now returns the actual .drawio XML file as a FileResponse download with proper Content-Type: application/xml and Content-Disposition headers.

3. No upload size limit (stability)

No file size validation meant arbitrarily large uploads could crash the server with OOM errors.

Fix: Added a 20 MB cap (MAX_UPLOAD_BYTES) with a clear HTTP 413 response.
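
One way such a cap can be enforced is to read the upload in chunks and stop as soon as the limit is exceeded, so an oversized body is never held in memory in full. The sketch below is illustrative, not the code in this PR: `read_capped` and `UploadTooLarge` are hypothetical names, and interpreting 20 MB as 20 × 1024 × 1024 bytes is an assumption. A handler would translate the exception into an HTTP 413 response.

```python
import io

MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" read as 20 MiB


class UploadTooLarge(Exception):
    """Hypothetical error; a handler would translate it to HTTP 413."""


def read_capped(stream, limit=MAX_UPLOAD_BYTES, chunk_size=64 * 1024):
    """Read a binary stream fully, aborting once more than `limit` bytes arrive."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream
        total += len(chunk)
        if total > limit:
            raise UploadTooLarge(f"upload exceeds {limit} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


# A payload under the limit is returned unchanged.
small = read_capped(io.BytesIO(b"x" * 10), limit=16)

# A payload one byte over the limit is rejected.
try:
    read_capped(io.BytesIO(b"x" * 17), limit=16)
    rejected = False
except UploadTooLarge:
    rejected = True
```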

Other improvements

  • Proper logging module instead of bare print()
  • FastAPI lifespan context manager (modern best practice, replaces deprecated on_event)
  • Version bumped to 1.1.0

Before / After

```python
# Before: every request loads the full model
@app.post("/convert")
async def convert(file: UploadFile = File(...)):
    pipeline = Pipeline(config)  # SAM3 loaded here every time!
    ...
    return {"output_path": result_path}  # useless local path

# After: model loaded once, file returned directly
@app.post("/convert")
async def convert(file: UploadFile = File(...)):
    pipeline = get_pipeline()  # singleton, instant
    ...
    return FileResponse(result_path, filename="output.drawio")  # actual file
```

Test plan

  • python server_pa.py starts and logs "Pipeline ready"
  • curl -F "file=@test.png" http://localhost:8000/convert -o result.drawio returns valid XML
  • Uploading a >20MB file returns HTTP 413
  • Second request is fast (no model reload)
  • GET /health returns {"status": "ok"}
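
The "returns valid XML" check in the test plan could be automated along these lines. This is a sketch under an assumption about the output format: that the .drawio file uses `<mxfile>` (or a bare `<mxGraphModel>`) as its root element. `looks_like_drawio` is a hypothetical helper, not part of this PR.

```python
import xml.etree.ElementTree as ET


def looks_like_drawio(data: bytes) -> bool:
    """Return True if `data` parses as XML with a draw.io-style root element."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return False
    # Assumption: draw.io files use <mxfile> (or a bare <mxGraphModel>) as root.
    return root.tag in {"mxfile", "mxGraphModel"}


# A tiny well-formed document passes the check.
sample = b'<mxfile><diagram name="Page-1"><mxGraphModel/></diagram></mxfile>'
ok = looks_like_drawio(sample)

# The old response body (a filesystem path) is not XML and fails it.
bad = looks_like_drawio(b"/server/local/path/xxx.drawio")
```

Running it on the file saved by the curl command above would catch a regression to the old path-only response.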

Three issues fixed in server_pa.py:

1. **Model reloaded on every request** — Pipeline() was instantiated
   inside the /convert handler, meaning SAM3 weights were loaded into
   GPU memory on every upload. Now a singleton Pipeline is created once
   during the FastAPI lifespan startup event.

2. **API returned a local filesystem path instead of the file** — The
   response was {"output_path": "/server/path/..."}, which is useless
   to API callers. Now /convert returns the actual .drawio XML as a
   FileResponse download.

3. **No upload size limit** — Arbitrarily large files could be uploaded,
   risking OOM crashes on the GPU. Added a 20 MB cap with a clear 413
   error.

Other improvements:
- Proper logging via Python logging module
- Structured lifespan context manager (FastAPI best practice)
- Cleaner error handling and temp file cleanup

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>