35 changes: 31 additions & 4 deletions README.md
@@ -61,13 +61,13 @@ Implemented today: Lab API response contract, `/api/compare`, `/api/analyze` in-

Runtime identity polish: when a Forge manifest is applied, Runtime now preserves the manifest `source_model.path` identity for comparison naming. A TensorRT artifact such as `model.engine` can therefore keep `compare_model_name=yolov8n` and `compare_key=yolov8n__b1__h640w640__fp32` instead of degrading to `model__...`. This is provenance/compare-readiness polish, not production SaaS infrastructure.
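
As a rough sketch of that naming rule (an illustration only, not the actual InferEdge-Runtime C++ implementation), the compare identity can be read as a function of the manifest's source model path rather than the artifact filename:

```python
# Hypothetical sketch of the compare-identity rule described above.
# The manifest/result field names mirror the documented ones; the helper
# itself is illustrative, not the real InferEdge-Runtime code.
from pathlib import Path


def compare_identity(manifest: dict, batch: int, height: int,
                     width: int, precision: str) -> tuple[str, str]:
    # Prefer the manifest source model identity (models/onnx/yolov8n.onnx)
    # over the artifact filename (model.engine).
    source_path = manifest.get("source_model", {}).get("path", "")
    model_name = Path(source_path).stem or "model"
    compare_key = f"{model_name}__b{batch}__h{height}w{width}__{precision}"
    return model_name, compare_key


manifest = {"source_model": {"path": "models/onnx/yolov8n.onnx"}}
name, key = compare_identity(manifest, batch=1, height=640, width=640,
                              precision="fp32")
assert (name, key) == ("yolov8n", "yolov8n__b1__h640w640__fp32")
```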

Not implemented yet: real worker daemon, full automated Forge/Runtime execution from production Lab workers, DB/Redis/queue, file upload, SaaS frontend, and production auth/billing/deployment controls.
Not implemented yet: real worker daemon, full automated Forge/Runtime execution from production Lab workers, DB/Redis/queue, file upload, production frontend beyond Local Studio, and production auth/billing/deployment controls.

Portfolio entry points: [portfolio submission](docs/portfolio/inferedge_portfolio_submission.md) · [resume/interview summary](docs/portfolio/inferedge_resume_interview_summary.md) · [1-page architecture summary](docs/portfolio/inferedge_1page_architecture.md) · [pipeline status](docs/portfolio/inferedge_pipeline_status.md)

Interview one-liner: **InferEdge is an end-to-end inference validation pipeline that converts, runs, compares, diagnoses, and decides whether an edge AI model candidate is ready to deploy.**

Final interview angle: InferEdge has both macOS ONNX Runtime CPU smoke and Jetson Orin Nano TensorRT smoke evidence, while production worker daemon, persistent queue/database, frontend, auth, and billing remain future work.
Final interview angle: InferEdge has both macOS ONNX Runtime CPU smoke and Jetson Orin Nano TensorRT smoke evidence, while production worker daemon, persistent queue/database, production frontend, auth, and billing remain future work.

---

@@ -84,6 +84,23 @@ TensorRT Jetson was 4.6x faster than ONNX Runtime CPU in this real image input b
The benchmark uses end-to-end Runtime latency, not trtexec GPU-only latency.
The full pipeline portfolio summary is available at [docs/portfolio/inferedge_pipeline_portfolio.md](docs/portfolio/inferedge_pipeline_portfolio.md), and the detailed Runtime comparison report is available at [docs/portfolio/runtime_compare_yolov8n.md](docs/portfolio/runtime_compare_yolov8n.md).
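
For readers checking how mean, p99, and FPS relate in these end-to-end numbers, here is a minimal Python sketch (the real Runtime statistics code is C++; the nearest-rank p99 convention used here is an assumption):

```python
# Illustrative only: summarizing end-to-end latency samples (ms) into
# the mean / p99 / FPS metrics quoted in the reports above.
import statistics


def summarize(latencies_ms: list[float]) -> dict:
    ordered = sorted(latencies_ms)
    # Nearest-rank percentile; real implementations may interpolate.
    p99_index = max(0, round(0.99 * len(ordered)) - 1)
    mean_ms = statistics.fmean(ordered)
    return {
        "mean_ms": round(mean_ms, 2),
        "p99_ms": ordered[p99_index],
        "fps": round(1000.0 / mean_ms, 2),  # batch=1 throughput
    }


print(summarize([13.8, 13.9, 14.0, 14.1, 15.5]))
# -> {'mean_ms': 14.26, 'p99_ms': 15.5, 'fps': 70.13}
```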

## Local Studio Demo Evidence

InferEdge Local Studio can replay the bundled portfolio evidence without requiring a live Jetson device during an interview walkthrough.
The `Load Demo Evidence` flow imports the ONNX Runtime CPU and TensorRT Jetson Runtime JSON fixtures from [examples/studio_demo](examples/studio_demo), refreshes Compare View, and keeps the demo pair selectable in Recent jobs while the local server process is running.

![InferEdge Local Studio demo evidence](assets/images/local-studio-demo-evidence.png)

Verified demo fixture values:

| Backend | Device | Mean ms | P99 ms | FPS | Compare Key |
|---|---|---:|---:|---:|---|
| ONNX Runtime | CPU | 45.4299 | 49.2128 | 22.0119 | `yolov8n__b1__h640w640__fp32` |
| TensorRT | Jetson | 9.9375 | 15.5231 | 100.6293 | `yolov8n__b1__h640w640__fp32` |

Studio reports this as a `4.57x` TensorRT speedup for the bundled demo pair.
AIGuard remains optional in this local Studio path; if Guard evidence is not loaded, the deployment decision explains that the Lab comparison is available but diagnosis evidence is not provided.
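
The `4.57x` figure follows directly from the fixture means; a quick arithmetic check:

```python
# Quick check of the Studio-reported demo-pair speedup from the
# fixture values in the table above.
onnx_cpu_mean_ms = 45.4299
tensorrt_jetson_mean_ms = 9.9375

speedup = onnx_cpu_mean_ms / tensorrt_jetson_mean_ms
print(f"{speedup:.2f}x")  # -> 4.57x, matching the Studio report
```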

---

## Reproducible Review Flow
@@ -372,7 +389,7 @@ More details: [FastAPI API usage guide](docs/api/api_usage.md)

## Local Studio

InferEdge Local Studio is a local-first browser interface for inspecting the existing CLI workflow, API/job contracts, result metrics, and Lab-owned deployment decision structure.
InferEdge Local Studio is a local-first browser interface for inspecting the existing CLI workflow, API/job contracts, Runtime evidence, Compare View, Jetson command helper, and Lab-owned deployment decision structure.
It runs on the user's machine through the FastAPI server and is intended as a local workflow UI foundation, not a production SaaS dashboard or cloud dashboard.

### Run Local Studio
@@ -387,7 +404,17 @@ Open:
http://localhost:8000/studio
```

The first Studio skeleton uses local static assets only and renders demo placeholders for the pipeline flow, evidence summary, result metrics, and deployment decision. Future work can connect these cards to real `/api/jobs`, `/api/compare`, and `/api/analyze` responses while keeping DB/queue/upload/auth/billing outside the current scope.
What works today:

- Run creates an in-memory analyze job through the existing `/api/analyze` contract (see the request sketch below).
- Import accepts a Runtime result JSON path or pasted JSON payload and adds it to the in-memory compare-ready evidence set.
- Load Demo Evidence imports the bundled ONNX Runtime CPU and TensorRT Jetson fixtures for a stable browser demo.
- Compare View shows TensorRT vs ONNX Runtime mean latency, p99, FPS, latency diff, and speedup when compatible evidence is loaded.
- Jetson Helper shows the local command shape for running the Runtime on a Jetson device.
- Deployment Decision stays Lab-owned; AIGuard is optional deterministic diagnosis evidence.

Current non-goals remain unchanged: no DB, queue, upload service, production auth, billing, or production SaaS worker orchestration.
Jobs and imported Studio evidence are in-memory and reset when the local server process restarts.
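
A minimal local sketch of the Run flow referenced above, assuming the default local server address; the `/api/analyze` and `/api/jobs/{job_id}` routes are the documented contracts, while the request payload and response field names below are illustrative assumptions, not a confirmed schema:

```python
# Hedged sketch: create an in-memory analyze job and poll it locally.
# The payload fields and the "job_id"/"status" response keys are
# assumptions for illustration; only the routes come from the
# documented contracts.
import time

import requests

BASE = "http://localhost:8000"

payload = {"model": "yolov8n", "backend": "onnxruntime"}  # hypothetical fields
job = requests.post(f"{BASE}/api/analyze", json=payload, timeout=10).json()

job_id = job["job_id"]  # assumed response key
for _ in range(10):
    status = requests.get(f"{BASE}/api/jobs/{job_id}", timeout=10).json()
    if status.get("status") in {"completed", "failed"}:
        break
    time.sleep(1)

print(status)
```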

---

54 changes: 52 additions & 2 deletions Roadmap.md
@@ -89,7 +89,7 @@ Improve usability, discoverability, and expansion paths beyond the core CLI work
- [x] Provide richer CLI presentation with Rich
- [x] Generate HTML benchmark and validation reports
- [x] Run automated benchmark / validation checks in CI
- [ ] Add a web dashboard mode
- [x] Add a local-first Studio workflow UI for portfolio demo and browser inspection

---

@@ -124,6 +124,56 @@ Improve usability, discoverability, and expansion paths beyond the core CLI work
## 🔭 Future Direction

- [ ] Complete full RKNN runtime backend integration so curated and runtime validation share one end-to-end device workflow
- [ ] Evolve the current API adapter into a foundation for a web dashboard or SaaS-style validation surface
- [ ] Keep Local Studio as a local-first workflow UI, and only later evaluate whether a production dashboard is justified
- [ ] Add memory profiling so deployment decisions are informed by both latency and resource pressure
- [ ] Explore multi-device distributed benchmarking for larger validation fleets and lab-scale experimentation

---

## Cross-Repository Roadmap

The current portfolio boundary is intentionally local-first and evidence-driven. The items below are future development directions, not current claims.

### InferEdgeForge

Forge should stay focused on build provenance and artifact handoff.

- [x] Emit manifest and metadata records for source model, artifact, backend, target, precision, shape, preset, and build id
- [x] Provide worker/runtime summary data that can feed Lab and Runtime contracts
- [ ] Add stronger build reproducibility checks across repeated artifact builds
- [ ] Expand preset coverage for Jetson TensorRT and RKNN build targets
- [ ] Add artifact package export suitable for sharing with Runtime without manual path coordination

### InferEdgeRuntime

Runtime should stay focused on real execution, profiling, and Lab-compatible result export.

- [x] Provide C++ execution/result export boundary
- [x] Validate Lab worker request payloads in dry-run mode
- [x] Export compare-ready Runtime result JSON for ONNX Runtime CPU and TensorRT Jetson evidence
- [x] Preserve source model identity for manifest-backed TensorRT engine results
- [ ] Harden Runtime execution error reporting for failed engine/model loads
- [ ] Add memory/resource profiling to complement latency, p99, and FPS
- [ ] Complete RKNN runtime execution so curated RKNN evidence and live Runtime execution share one path

### InferEdgeLab

Lab should remain the comparison, reporting, API/job contract, Local Studio, and deployment decision owner.

- [x] Compare Runtime result JSON by `compare_key` and `backend_key`
- [x] Generate Markdown/HTML reports and API response bundles
- [x] Provide in-memory `/api/analyze`, `/api/jobs/{job_id}`, and worker request/response mapping contracts
- [x] Provide Local Studio for Run, Import, Demo Evidence, Compare View, Deployment Decision, and Jetson command helper
- [ ] Add optional persisted result storage after the portfolio demo boundary is stable
- [ ] Add production worker daemon integration only after Forge/Runtime handoff is reliable
- [ ] Improve multi-model evidence browsing without turning Studio into a production SaaS surface

### InferEdgeAIGuard

AIGuard should stay optional and deterministic. It should explain evidence risks, not replace Lab's final decision ownership.

- [x] Diagnose provenance mismatch with rule/evidence-based detectors
- [x] Preserve `guard_analysis` in Lab reports/API/deployment decision bundles
- [ ] Add more detector coverage for missing manifest fields, backend mismatch, precision mismatch, and suspicious result deltas
- [ ] Add clearer guard evidence examples for interview demos
- [ ] Keep AIGuard optional in Studio until the evidence contract is strong enough to justify a UI action
Binary file added assets/images/local-studio-demo-evidence.png
3 changes: 2 additions & 1 deletion docs/portfolio/inferedge_1page_architecture.md
@@ -40,6 +40,7 @@ ONNX model
- Lab `worker_request` / `worker_response` boundary
- Lab -> Runtime dev-only minimal execution smoke using `yolov8n.onnx` (ONNX Runtime CPU, success, mean about 47.97 ms, p95 about 51.80 ms, about 20.85 FPS)
- Jetson Orin Nano TensorRT Runtime smoke using Forge manifest + TensorRT engine artifact (success, manifest applied, mean about 14.00 ms, p99 about 15.50 ms, about 71.44 FPS)
- Local Studio demo evidence replay at `/studio` using bundled ONNX Runtime CPU and TensorRT Jetson result fixtures: 45.4299 ms vs 9.9375 ms mean latency, 49.2128 ms vs 15.5231 ms p99, 22.0119 vs 100.6293 FPS, and a 4.57x TensorRT speedup for the demo pair
- Runtime source-model identity polish for manifest-backed TensorRT engine results (`model.engine` can still keep `compare_model_name=yolov8n` and `compare_key=yolov8n__b1__h640w640__fp32`)
- Runtime `worker_request` validation and `worker_response` dry-run export
- Forge worker/runtime summary
@@ -53,7 +54,7 @@ ONNX model
- full automated Forge/Runtime execution from a production Lab worker
- database, Redis, or queue
- file upload
- frontend
- production frontend beyond the local Studio workflow UI
- production authentication, billing, and deployment controls

## Interview Explanation
8 changes: 4 additions & 4 deletions docs/portfolio/inferedge_pipeline_status.md
@@ -61,7 +61,7 @@ Current role:
- runs compare, compare-latest, report, and deployment decision flows
- exposes `/api/compare` with the SaaS API response contract
- exposes in-memory `/api/analyze` and `/api/jobs/{job_id}` workflow stubs
- exposes a local-first `/studio` skeleton that presents the CLI/API/job/deployment decision workflow in the browser
- exposes a local-first `/studio` workflow UI for Run, Import, Compare View, Jetson command helper, demo evidence replay, and deployment decision inspection
- maps analyze jobs to worker requests and worker responses back to job results
- preserves optional AIGuard evidence while keeping Lab as the final decision owner

@@ -95,7 +95,7 @@ The current cross-repository loop is covered by documentation, fixtures, and smo
- Forge summary-origin Lab worker request validation in Runtime
- AIGuard worker provenance mismatch diagnosis
- Lab deployment decision/report evidence smoke for AIGuard worker provenance diagnosis
- Local Studio skeleton for viewing the Forge -> Runtime -> Lab -> optional AIGuard workflow, smoke evidence, metrics placeholders, and deployment decision ownership from a local browser
- Local Studio local-first workflow UI for viewing Forge -> Runtime -> Lab -> optional AIGuard state, creating in-memory analyze jobs, importing Runtime result JSON, replaying bundled demo evidence, comparing backends, and inspecting Lab-owned deployment decision context

This means the current product boundary is testable without running the production worker infrastructure.

@@ -124,7 +124,7 @@ Demo readiness: `scripts/demo_pipeline_full.sh` now provides a guided end-to-end
- Manual Jetson TensorRT Runtime smoke using Forge manifest and TensorRT engine artifact
- Runtime compare-key identity polish for manifest-backed engine artifacts
- Guided end-to-end demo entrypoint for portfolio and interview walkthroughs
- Local Studio skeleton at `/studio` for a local-first browser view of the workflow foundation
- Local Studio at `/studio` for a local-first browser view of Run / Import / Demo Evidence / Compare / Decision / Jetson Helper workflows
- Cross-repo fixture compatibility across Forge, Runtime, Lab, and AIGuard
- Rule/evidence-based provenance mismatch diagnosis

@@ -136,7 +136,7 @@ Demo readiness: `scripts/demo_pipeline_full.sh` now provides a guided end-to-end
- database persistence
- Redis, Celery, or another queue
- file upload handling
- production frontend beyond the local Studio skeleton
- production frontend beyond the local Studio workflow UI
- production authentication, billing, and deployment controls

These gaps are intentional. The current project fixes the contracts first, then leaves infrastructure choices for later.
11 changes: 7 additions & 4 deletions docs/portfolio/inferedge_portfolio_submission.md
@@ -14,7 +14,7 @@ InferEdge is not a benchmarking tool, but an end-to-end validation pipeline that
- The overall InferEdge flow consists of Forge build provenance -> Runtime real execution -> Lab compare/report/API/job/deployment_decision -> optional AIGuard diagnosis evidence.
- Lab links InferEdgeForge provenance metadata, InferEdge-Runtime C++ execution output, and optional InferEdgeAIGuard diagnostic evidence into a single validation bundle.
- In the `yolov8n.onnx` manual smoke, the Lab -> C++ Runtime CLI -> ONNX Runtime CPU execution -> Lab job result ingestion path was verified as a dev-only minimal Runtime execution path.
- The current state is a portfolio-grade pipeline foundation; production worker daemon, persistent queue/database, file upload, frontend, and auth/billing are clearly separated as future work.
- The current state is a portfolio-grade pipeline foundation; production worker daemon, persistent queue/database, file upload, production frontend beyond Local Studio, and auth/billing are clearly separated as future work.

Pipeline:

@@ -93,13 +93,14 @@ Rule + evidence diagnosis layer. Forge summary, Runtime worker_response, Lab res
- Runtime worker_response compatibility ingest in Lab
- AIGuard worker provenance mismatch diagnosis
- AIGuard guard_analysis preservation in Lab deployment decision/report smoke
- Local Studio browser workflow for Run, Import, Jetson command helper, demo evidence replay, Compare View, and Lab-owned Deployment Decision inspection
- 4개 repository README pipeline summary sync

## 6. Validation Evidence

Recent validation evidence:

- InferEdgeLab: `poetry run python3 -m pytest -q` -> 245 passed
- InferEdgeLab: `poetry run python3 -m pytest -q` -> 262 passed
- InferEdgeForge: `python -m pytest -q` -> 89 passed
- InferEdgeRuntime: `python3 tests/test_lab_worker_adapter_contract.py` -> 12 tests passed
- InferEdgeRuntime: `scripts/smoke_default.sh` -> success
@@ -110,6 +111,7 @@ Recent validation evidence:
- Jetson TensorRT Runtime smoke: on Jetson Orin Nano (`Linux 5.15.148-tegra`, `aarch64`), the C++ Runtime CLI in `~/InferEdge-Runtime` executed Forge manifest `/home/risenano01/InferEdgeForge/builds/yolov8n__jetson__tensorrt__jetson_fp16/manifest.json` and TensorRT engine artifact `/home/risenano01/InferEdgeForge/builds/yolov8n__jetson__tensorrt__jetson_fp16/model.engine`. The output `results/jetson/yolov8n_jetson_tensorrt_manifest_smoke.json` reported `success: true`, `engine_backend: tensorrt`, `device_name: jetson`, `manifest_applied: true`, input shape `[1, 3, 640, 640]`, output shape `[1, 84, 8400]`, mean latency about 14.00 ms, p99 about 15.50 ms, and about 71.44 FPS.
- Runtime compare-key identity polish: InferEdgeRuntime now preserves Forge manifest source model identity for compare naming. If `manifest.source_model.path` is `models/onnx/yolov8n.onnx` and the explicit TensorRT artifact path is `model.engine`, Runtime can keep `compare_model_name=yolov8n` and `compare_key=yolov8n__b1__h640w640__fp32`.
- Guided demo entrypoint: `scripts/demo_pipeline_full.sh` summarizes the full Forge -> Runtime -> Lab -> optional AIGuard flow and can print the Jetson TensorRT Runtime command without claiming production worker or SaaS readiness.
- Local Studio demo evidence: `/studio` can load bundled ONNX Runtime CPU and TensorRT Jetson Runtime result fixtures from `examples/studio_demo`, keep the demo pair selectable in Recent jobs while the local server process is alive, and show TensorRT Jetson vs ONNX Runtime CPU comparison in the browser. The fixture-backed evidence records ONNX Runtime CPU at mean 45.4299 ms / p99 49.2128 ms / 22.0119 FPS and TensorRT Jetson at mean 9.9375 ms / p99 15.5231 ms / 100.6293 FPS, a 4.57x TensorRT speedup for this demo pair.

The direct Runtime execution result includes `deployment_decision`. Its `unknown` value is expected before Lab compare/report because the worker response has not yet been compared by Lab.

@@ -136,6 +138,7 @@ Forge summary
- **SaaS-ready API + async job workflow:** Lab has API response contracts, in-memory async job stubs, and worker request/response mapping without prematurely adding DB/queue infrastructure.
- **Deterministic rule-based diagnosis:** AIGuard uses rule + evidence detectors instead of vague LLM judgement.
- **Deployment decision ownership:** Lab keeps final deploy/review/blocked ownership while preserving optional guard evidence.
- **Local-first Studio demo:** The browser UI can replay real validation evidence locally without adding DB, queue, upload, auth, billing, or production SaaS infrastructure.

## 8. Current Limitations and Next Steps

@@ -147,7 +150,7 @@ Current planned production work:
- full automated Forge/Runtime execution from a production Lab worker
- database, Redis, or queue
- file upload flow
- SaaS frontend
- production frontend beyond Local Studio
- production authentication, billing, and deployment controls

Next practical step:
@@ -164,5 +167,5 @@ Next practical step:
- "macOS ONNX Runtime CPU smoke와 Jetson Orin Nano TensorRT smoke를 모두 확보했고, Jetson에서는 Forge manifest + TensorRT `model.engine` + C++ Runtime CLI 실행으로 mean 약 14.00 ms, p99 약 15.50 ms, FPS 약 71.44 evidence를 확보했습니다."
- "Runtime source identity polish 이후에는 manifest-backed TensorRT engine artifact도 `compare_model_name=yolov8n`, `compare_key=yolov8n__b1__h640w640__fp32`를 유지할 수 있습니다."
- "AIGuard는 LLM 추측이 아니라 artifact hash, source hash, precision, shape 같은 evidence를 비교하는 deterministic detector 구조입니다."
- "아직 production worker, DB/Redis/queue, frontend, auth/billing은 계획 단계로 명확히 구분했고, 먼저 contract와 smoke coverage를 안정화했습니다."
- "아직 production worker, DB/Redis/queue, production frontend, auth/billing은 계획 단계로 명확히 구분했고, 먼저 contract와 smoke coverage를 안정화했습니다."
- "이 프로젝트는 AI inference engineer 포트폴리오 관점에서 C++ runtime, Python orchestration, schema contract, provenance validation, SaaS API boundary를 하나의 제품형 pipeline으로 연결한 사례입니다."