
feat: TensorRT FP16 backend for depth estimation#159

Merged
solderzzc merged 1 commit into develop from feature/tensorrt-fp16-backend
Mar 16, 2026

Conversation

@Intersteller-Apex
Collaborator

Summary

Adds TensorRT FP16 as an inference backend for the depth estimation skill.

3 files changed, 194 insertions, 0 deletions — purely additive, zero existing lines modified.

Changes

  • transform.py (+182 lines): TRT engine building, caching, and inference methods
  • models.json (+7 lines): TRT FP16 variant entry under win32
  • requirements.txt (+5 lines): Optional tensorrt>=10.0 dep (non-macOS)

How it works

Backend selection in load_model() tries each backend in order:

  1. CoreML (macOS only)
  2. TensorRT (skipped immediately if `import tensorrt` fails, i.e. the package is not installed)
  3. PyTorch (fallback)
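The ordered fallback above can be sketched as a small selection helper. This is an illustrative reconstruction of the `load_model()` logic described in this PR, not the actual code; the function name `select_backend` and return values are assumptions.

```python
import platform


def select_backend() -> str:
    """Pick the first available inference backend, in priority order.

    Hypothetical sketch of the selection logic described above;
    helper and backend names are illustrative, not the PR's API.
    """
    # 1. CoreML is only considered on macOS.
    if platform.system() == "Darwin":
        return "coreml"
    # 2. TensorRT: a bare import fails fast when the package is absent.
    try:
        import tensorrt  # noqa: F401
        return "tensorrt"
    except ImportError:
        pass
    # 3. PyTorch is the always-available fallback.
    return "pytorch"
```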

First run builds the engine (30-120 s); subsequent runs load it from cache (near-instant).
Engine files are GPU-specific and cached under ~/.aegis-ai/models/feature-extraction/trt_engines/
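Because serialized TRT engines are not portable across GPUs, the cache key has to include the GPU identity. A minimal sketch of such a build-or-load cache, assuming a hypothetical key scheme (the directory matches the PR text, but the file-naming convention and `build_fn` hook are illustrative assumptions):

```python
import hashlib
from pathlib import Path

# Cache directory from the PR description; the key scheme below is an
# assumption for illustration, not the PR's actual naming convention.
CACHE_DIR = Path.home() / ".aegis-ai" / "models" / "feature-extraction" / "trt_engines"


def engine_cache_path(model_name: str, gpu_name: str) -> Path:
    # Engines are GPU-specific, so the GPU name is part of the cache key.
    key = hashlib.sha256(f"{model_name}:{gpu_name}".encode()).hexdigest()[:16]
    return CACHE_DIR / f"{model_name}_{key}.engine"


def load_or_build(model_name: str, gpu_name: str, build_fn) -> bytes:
    path = engine_cache_path(model_name, gpu_name)
    if path.exists():          # subsequent runs: near-instant deserialize
        return path.read_bytes()
    engine = build_fn()        # first run: the 30-120 s engine build
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(engine)
    return engine
```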

Protocol compliance

Follows the JSONL stdin/stdout protocol. Ready event emits backend: tensorrt.
No changes to transform_base.py, benchmark scripts, or CoreML/PyTorch code paths.
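For the ready event, a one-line JSONL payload is enough. A minimal sketch, assuming only that the event carries a `backend` field as stated above (the event name and helper are hypothetical):

```python
import json


def ready_event(backend: str) -> str:
    """Build the one-line JSONL 'ready' event; the caller writes it to stdout.

    Only the 'backend' field is confirmed by the PR text; the 'event'
    key is an assumed part of the schema for illustration.
    """
    return json.dumps({"event": "ready", "backend": backend})
```

In the skill process this line would be written to stdout followed by a newline and a flush, so the parent can start streaming frames.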

Benchmark (RTX 4070 Laptop)

| Backend           | Avg latency | FPS   |
| ----------------- | ----------- | ----- |
| PyTorch CUDA FP32 | 36.48 ms    | 27.4  |
| TensorRT FP16     | 5.29 ms     | 189.0 |

6.9x speedup
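The speedup and FPS figures follow directly from the latencies, which a quick sanity check confirms:

```python
# Sanity-check the benchmark arithmetic from the table above.
pytorch_ms, trt_ms = 36.48, 5.29

speedup = pytorch_ms / trt_ms      # latency ratio -> quoted speedup
fps_pytorch = 1000 / pytorch_ms    # ms latency -> frames per second
fps_trt = 1000 / trt_ms
```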


@Intersteller-Apex Intersteller-Apex left a comment


Lightweight changes

@solderzzc solderzzc merged commit 5bc4262 into develop Mar 16, 2026
1 check passed