Skip to content

Enhance job processing by increasing batch size and adding database f…#7

Open
mpatrikios wants to merge 5 commits into
mainfrom
speed
Open

Enhance job processing by increasing batch size and adding database f…#7
mpatrikios wants to merge 5 commits into
mainfrom
speed

Conversation

@mpatrikios
Copy link
Copy Markdown
Owner

…allback for job status retrieval

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request enhances job processing capabilities by increasing the batch size for photo processing from 1 to 3 and adding a robust database fallback mechanism for job status retrieval when jobs are not found in memory.

Key changes:

  • Increased batch processing from 1 to 3 photos per batch for improved throughput
  • Added database fallback in get_processing_status to reconstruct job state from the database when not found in memory
  • Enhanced sync_jobs_from_database to accurately recompute job progress and status from database records

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
backend/app/api/process_tasks.py Increased BATCH_SIZE from 1 to 3, added database fallback for job status retrieval with job reconstruction logic, and enhanced the sync_jobs_from_database function to recompute progress and status accurately
.firebase/hosting.ZnJvbnRlbmQ.cache Auto-generated Firebase hosting cache with updated file timestamps and hashes

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread backend/app/api/process_tasks.py Outdated
Comment on lines +545 to +549
status_map = {
"completed": ProcessingStatus.COMPLETED,
"failed": ProcessingStatus.FAILED,
"processing": ProcessingStatus.PROCESSING,
}
Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The status_map dictionary is duplicated between get_processing_status (lines 545-549) and sync_jobs_from_database (lines 810-815). This code duplication can lead to maintenance issues if the status mappings need to be updated. Consider extracting this mapping into a module-level constant or a shared function to ensure consistency and reduce duplication.

Copilot uses AI. Check for mistakes.
Comment thread backend/app/api/process_tasks.py Outdated
Comment on lines +542 to +552
progress = int((completed_photos / total_photos) * 100) if total_photos else 0

# Map database status to processing status safely
status_map = {
"completed": ProcessingStatus.COMPLETED,
"failed": ProcessingStatus.FAILED,
"processing": ProcessingStatus.PROCESSING,
}
current_status = status_map.get(db_job.status, ProcessingStatus.PROCESSING)
if progress >= 100 and current_status == ProcessingStatus.PROCESSING:
current_status = ProcessingStatus.COMPLETED
Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The progress calculation and status update logic (lines 542, 550-552) is duplicated in sync_jobs_from_database (lines 826-833). Consider extracting this logic into a shared helper function to improve maintainability and ensure consistent behavior across both functions.

Copilot uses AI. Check for mistakes.
Comment thread backend/app/api/process_tasks.py Outdated
Comment on lines +536 to +540
total_photos = db.query(PhotoDB).filter(PhotoDB.processing_job_id == db_job.id).count()
completed_photos = db.query(PhotoDB).filter(
PhotoDB.processing_job_id == db_job.id,
PhotoDB.processing_status == DBProcessingStatus.COMPLETED
).count()
Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The database fallback performs two separate COUNT queries (lines 536 and 537-540) when reconstructing the job status. This could be optimized into a single query using conditional aggregation (COUNT with CASE WHEN) to reduce database round trips, especially when this endpoint might be called frequently for status polling.

Copilot uses AI. Check for mistakes.
Comment on lines +818 to +821
completed_photos = db.query(PhotoDB).filter(
PhotoDB.processing_job_id == db_job.id,
PhotoDB.processing_status == DBProcessingStatus.COMPLETED
).count()
Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The completed_photos count query is executed once per job in the loop, which creates an N+1 query problem. If there are many active jobs, this will result in excessive database queries. Consider using a single query with GROUP BY to fetch all completed photo counts at once, or use a JOIN with aggregation to retrieve all job data in one query.

Copilot uses AI. Check for mistakes.
Comment thread backend/app/api/process_tasks.py Outdated
Comment thread backend/app/api/process_tasks.py Outdated
mpatrikios and others added 4 commits January 9, 2026 14:36
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Reduce PostgreSQL pool_size from 20 to 3 (match Cloud Tasks concurrency)
- Set max_overflow=0 to prevent connection exhaustion
- Set GEMINI_CONCURRENCY default from 1 to 2 for balanced throughput
- Add comprehensive timing logs (T1-T4) for performance diagnosis
- Add in-process file cache to eliminate redundant GCS downloads

🚀 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants