feat: TensorRT FP16 depth estimation backend#158

Closed
solderzzc wants to merge 3 commits into develop from feature/depth-estimation-cuda-improvements

Conversation


@solderzzc solderzzc commented Mar 15, 2026

TensorRT FP16 Backend for Depth Anything v2

Benchmark (RTX 4070 Laptop GPU, 518x518)

| Backend | Avg (ms) | FPS | Speedup |
| --- | --- | --- | --- |
| PyTorch CUDA FP32 | 36.48 | 27.4 | 1x |
| TensorRT FP16 | 5.29 | 189.0 | 6.9x |

Changes — purely additive, no existing code modified

transform.py (497 additions, 0 existing lines changed)

  • _load_tensorrt(), _build_trt_engine(), _infer_tensorrt()
  • Engine caching at ~/.aegis-ai/models/feature-extraction/trt_engines/
  • GPU-specific engine filenames (prevents cross-GPU issues)
  • bytes() wrapper for TRT 10.15 IHostMemory API
  • --backend CLI arg (auto/tensorrt/pytorch/coreml)
  • Graceful fallback: TRT → PyTorch CUDA → CPU
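The GPU-keyed engine cache and the TRT → PyTorch CUDA → CPU fallback above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `engine_cache_path`, `select_backend`, and the exact filename scheme are assumptions; only the cache directory comes from the PR description.

```python
from pathlib import Path
from importlib import util

# Cache location from the PR description
CACHE_DIR = Path.home() / ".aegis-ai" / "models" / "feature-extraction" / "trt_engines"

def engine_cache_path(gpu_name: str, model: str = "depth_anything_v2",
                      h: int = 518, w: int = 518) -> Path:
    """TensorRT engines are not portable across GPUs, so the filename
    encodes the device that built the engine (scheme is hypothetical)."""
    tag = gpu_name.lower().replace(" ", "_")
    return CACHE_DIR / f"{model}_{tag}_{h}x{w}_fp16.engine"

def select_backend(requested: str = "auto") -> str:
    """Degrade gracefully when a backend's runtime is not installed:
    TensorRT, then PyTorch CUDA, then CPU."""
    have_trt = util.find_spec("tensorrt") is not None
    have_torch = util.find_spec("torch") is not None
    if requested in ("auto", "tensorrt") and have_trt:
        return "tensorrt"
    if have_torch:
        # In the real backend, torch.cuda.is_available() would further
        # split the PyTorch path into CUDA vs CPU.
        return "pytorch"
    return "cpu"
```

A cache keyed this way means a laptop 4070 and a desktop 4090 sharing a home directory never load each other's engines.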

trt_benchmark.py (new file)

  • Standalone PyTorch vs TRT benchmark script
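The core of a standalone latency benchmark like trt_benchmark.py is warmup followed by averaged wall-clock timing. A backend-agnostic sketch (helper name and iteration counts are assumptions; the real script presumably also synchronizes the GPU before reading the clock):

```python
import time

def benchmark(fn, warmup: int = 10, iters: int = 100) -> tuple[float, float]:
    """Return (average latency in ms, FPS) for a zero-arg callable.
    Warmup iterations let JIT/engine initialization settle first."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    avg_ms = (time.perf_counter() - start) * 1000.0 / iters
    return avg_ms, 1000.0 / avg_ms
```

Running this once around the PyTorch path and once around the TRT path yields the ms/FPS pairs in the table above.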

Config files

  • models.json: TRT FP16 variant for win32
  • requirements.txt: tensorrt>=10.0, onnxruntime-gpu (non-Darwin)
  • deploy.bat: TRT verification step
  • SKILL.md: Updated hardware backends table
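The PR does not show the models.json schema, so the shape of the win32 TRT FP16 variant below is purely hypothetical; every key name is an assumption made only to illustrate what a per-platform model registry entry might carry:

```json
{
  "win32": {
    "depth-estimation": {
      "backend": "tensorrt",
      "precision": "fp16",
      "input_size": [518, 518],
      "engine_cache": "~/.aegis-ai/models/feature-extraction/trt_engines/"
    }
  }
}
```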

What is NOT changed

  • CoreML backend (macOS) — untouched
  • PyTorch inference path — untouched
  • benchmark.py — untouched
  • No existing control flow modified

- Upgrade PyTorch CUDA wheels from cu124 to cu126 (RTX 4090/5090)
- Fix _load_config() dropping CLI args (--model, --colormap, --blend-mode)
- Add deploy.bat for Windows venv + CUDA setup
- Add cross-platform benchmark.py (CoreML + PyTorch/CUDA/MPS/CPU)
- Track models.json (platform model registry)
- Bump depth-estimation version 1.1.0 → 1.2.0 in skills.json
@Intersteller-Apex Intersteller-Apex changed the title feat: cross-platform CUDA depth estimation improvements feat: TensorRT FP16 + cross-platform CUDA depth estimation Mar 16, 2026
- _load_tensorrt(), _build_trt_engine(), _infer_tensorrt() methods
- Engine caching at ~/.aegis-ai/models/feature-extraction/trt_engines/
- GPU-specific engine filenames (prevents cross-GPU issues)
- IHostMemory bytes() fix for TRT 10.15+
- Graceful fallback: TRT > PyTorch CUDA > CPU
- Added --backend CLI arg
- Added trt_benchmark.py for standalone benchmarking

Benchmark: RTX 4070 Laptop GPU 518x518
  PyTorch CUDA FP32: 36.48ms (27.4 FPS)
  TensorRT FP16:      5.29ms (189 FPS) — 6.9x faster
@Intersteller-Apex Intersteller-Apex force-pushed the feature/depth-estimation-cuda-improvements branch from bbbc9fa to 26b9ff2 on March 16, 2026 01:49
@Intersteller-Apex Intersteller-Apex changed the title feat: TensorRT FP16 + cross-platform CUDA depth estimation feat: TensorRT FP16 depth estimation backend Mar 16, 2026
