Specialized backend architecture skill for AI-native Python backends. Covers SSE streaming, Pydantic AI + LangGraph orchestration, MCP integration, tiered memory architecture, advanced RAG/GraphRAG, and Zero-Trust security for autonomous agents. Based on a 390-line research document analyzing the 2026 AI backend ecosystem, cross-checked against live sources (Playwright verification).
AI agents now write backend code autonomously. Without an opinionated, validated skill reference, agents hallucinate obsolete patterns, overuse bidirectional transports for one-way token streaming, choose standalone vector databases without relational context, or inject .env files into agent environments.
This skill is a validated, opinionated reference for Python backends powering LLM agents, AI APIs, streaming inference, and production AI infrastructure. It defines non-negotiable architectural rules for 2026.
Built from a 390-line research document — "Arquitectura de Backends Nativos de IA: Estándares, Protocolos y Patrones de Producción (Abril 2026)" — then cross-checked against live sources.
| Version | File | Size | When to Use |
|---|---|---|---|
| v3.0 (Current) | `versions/v3.0/SKILL.md` | ~55 lines + references | Compact runtime skill with curated `references/` and May 2026 source index |
| v1.0 (Historical) | `versions/v1.0/SKILL.md` | ~373 lines | Preserved for backward compatibility; verify claims against v3 before reuse |
| v1.0-lite (Historical) | `versions/v1.0-lite/SKILL.md` | ~250 lines | Preserved for backward compatibility; v3.0 replaces it for active runtime ingestion |
- ✅ SSE token streaming — `EventSourceResponse` default for one-way streams; WebSockets remain valid for bidirectional realtime use cases
- ✅ OpenTelemetry — production observability baseline; GenAI semantic conventions are Development and require deliberate opt-in
- ✅ LangGraph 1.2.x + Pydantic AI 1.96.x — stateful graph workflows + typed agent framework
- ✅ MCP Integration — Streamable HTTP transport, JSON-RPC 2.0 error handling
- ✅ Tiered Memory (L0/L1/L2) — Redis for episodic, PostgreSQL for semantic/relational
- ✅ Advanced RAG/GraphRAG — Hybrid retrieval, RRF, knowledge graphs, vector quantization
- ✅ Zero-Trust Security — Firecracker MicroVMs, Credential Brokering Proxies, A-JWT
- ✅ Staff-Level Snippet — FastAPI + Pydantic AI + SSE + OTel + A-JWT in one module
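The RRF item in the list above refers to Reciprocal Rank Fusion, a way to merge ranked result lists from different retrievers (for example keyword and vector search) without comparing their raw scores. The following is a minimal sketch rather than the skill's own implementation: the function name, the document IDs, and the smoothing constant `k=60` (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is an assumed, conventional default.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from two retrievers over the same corpus.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Because only ranks are used, the two retrievers do not need comparable score scales, which is the usual reason to prefer this kind of fusion in hybrid retrieval.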
# Clone into your skills directory
git clone https://github.com/kozz36/python-ai-backend-specialist.git
# Use the version that matches your need:
# - Full → detailed architectural planning
# - Lite → rapid stack selection under constraints

Open `versions/v3.0/SKILL.md` for the runtime contract, then use `versions/v3.0/references/technical-reference.md` for detailed matrices. Key reference areas:
- Section 1 — Transport & Streaming (SSE vs WebSockets decision matrix)
- Section 5 — Tiered Memory Architecture (L0/L1/L2)
- Section 7 — Zero-Trust Security (Firecracker, Credential Proxies, A-JWT)
- Section 8 — Staff-Level Integration Snippet (production-ready FastAPI module)
- Section 9 — Architectural Dictates (5 non-negotiable rules)
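For the transport area listed under Section 1, it can help to see what a one-way token stream looks like on the wire. The sketch below builds Server-Sent Events frames with the standard library only; it is not the skill's snippet and does not use FastAPI. The helper names, the `token` and `done` event names, and the `[DONE]` sentinel are assumptions made for illustration. In a real service the framework's SSE response class would normally handle this framing.

```python
import json
from typing import Iterable, Iterator, Optional


def sse_event(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one Server-Sent Events frame as text.

    A frame is a set of "field: value" lines terminated by a blank line.
    Multi-line payloads are emitted as one "data:" line per payload line.
    """
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    for line in data.split("\n"):
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one SSE frame per token, then a final (assumed) 'done' frame."""
    for index, token in enumerate(tokens):
        payload = json.dumps({"token": token})
        yield sse_event(payload, event="token", event_id=str(index))
    yield sse_event("[DONE]", event="done")


frames = list(stream_tokens(["Hel", "lo"]))
for frame in frames:
    print(frame, end="")
```

The stream flows in one direction only, which is why the feature list treats SSE as the default for token output and reserves WebSockets for cases that need client-to-server messages on the same connection.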
versions/
├── v1.0/
│   └── SKILL.md                    # Historical full reference
├── v1.0-lite/
│   └── SKILL.md                    # Historical condensed reference
└── v3.0/
    ├── SKILL.md                    # Current compact runtime contract
    └── references/
        ├── technical-reference.md
        └── source-index.md
docs/
├── CHANGELOG.md                    # Verified version history
└── CONTRIBUTING.md                 # How to contribute improvements
Every claim in this skill was sourced from the 390-line research document and validated against live sources where possible:
| Claim | Verification Method | Status |
|---|---|---|
| SSE as default for one-way LLM token streams | FastAPI SSE docs + transport tradeoff review | ✅ Default, not exclusive |
| `EventSourceResponse` in FastAPI 0.135.0+ | GitHub issues, FastAPI release notes | ✅ Real |
| OpenTelemetry GenAI semconv | opentelemetry.io specs | |
| LangGraph 1.2.x graph workflows | PyPI + LangGraph docs | ✅ Real |
| Pydantic AI type-safe agents | pydantic.dev docs, Real Python tutorial | ✅ Real |
| MCP Streamable HTTP transport | WorkOS blog, MCP spec | ✅ Real |
| pgvector + pgvectorscale + pgai | GitHub repos, SoftwareSeni blog | ✅ Real |
| GraphRAG multi-hop retrieval | Project-specific eval required | |
| Firecracker <125ms cold start | Northflank blog, Firecracker docs | ✅ Real |
| A-JWT IETF draft | datatracker.ietf.org | ✅ Real (draft-goswami-agentic-jwt) |
Limitation: Version numbers and performance claims reflect the research document's citations as of April–May 2026. Always verify against live sources before production deployment. See `docs/CONTRIBUTING.md` for verification requirements.
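One lightweight way to act on this limitation is to compare the versions installed in an environment against the series this README names (LangGraph 1.2.x, Pydantic AI 1.96.x). The following is a sketch under stated assumptions, not part of the skill: the helper names and the `PINNED` mapping are hypothetical, and the distribution names `langgraph` and `pydantic-ai` are assumed to be the ones installed from PyPI.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict


def matches_series(installed: str, series: str) -> bool:
    """Return True if a version string falls inside a series such as '1.2.x'."""
    have = installed.split(".")
    for index, wanted in enumerate(series.split(".")):
        if wanted == "x":
            continue  # wildcard: any value (or none) is acceptable here
        if index >= len(have) or have[index] != wanted:
            return False
    return True


def check_pins(pins: Dict[str, str]) -> Dict[str, str]:
    """Report, per distribution, whether the installed version matches its series."""
    report = {}
    for package, series in pins.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            report[package] = "not installed"
            continue
        report[package] = "ok" if matches_series(installed, series) else f"drift: {installed}"
    return report


# Hypothetical pin table derived from the series named in this README.
PINNED = {"langgraph": "1.2.x", "pydantic-ai": "1.96.x"}

if __name__ == "__main__":
    for package, status in check_pins(PINNED).items():
        print(f"{package}: {status}")
```

A check like this only detects drift in what is installed locally; it does not replace verifying the claims themselves against upstream release notes.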
This skill is maintained as a living document. See docs/CONTRIBUTING.md for:
- How to propose additions (new frameworks, updated versions)
- Verification requirements before merging
- Style guide (tables > narrative, decision trees > lists)
Apache-2.0
Maintained by: @kozz36
Research base: "Arquitectura de Backends Nativos de IA: Estándares, Protocolos y Patrones de Producción (Abril 2026)" (390-line ecosystem analysis, 2026)