AppMapper Roadmap & Session Context

Last Updated: February 2026


What is AppMapper?

AppMapper is a semantic codebase understanding service that provides application context for vulnerability scanners. It does NOT detect vulnerabilities - it maps applications (routes, auth, data flows) so that CVE-Gen v2 and other tools can perform targeted security analysis.

Key Value: Reduces CVE-Gen scan costs from ~$4.50 to ~$0.60 (roughly 87% savings at these figures) by pre-indexing codebase knowledge.


Current State: WORKING

Completed Features (Production Ready)

| Feature | Status | Files |
|---|---|---|
| Route Extraction | DONE | route_scanner.py |
| Auth Pattern Discovery | DONE | auth_discovery.py |
| Directory Classification | DONE | directory_classifier.py |
| SharedContext Export | DONE | shared_context.py |
| Web UI | DONE | ui/app.py, ui/templates/index.html |
| CVE-Gen Integration Docs | DONE | docs/CVE_GEN_INTEGRATION.md |

Server Details

# Start server (port 8000)
cd C:\Users\pesmi\Desktop\appmapper
python run_server.py

# Or with full path
"C:\Users\pesmi\AppData\Local\Programs\Python\Python312\python.exe" run_server.py

Note: Port 6000 is blocked by Chrome (ERR_UNSAFE_PORT). Use port 8000.


In Progress: Universal Threat Model Generation

Overview

Adding universal threat model generation that works for ANY codebase, not just web APIs. It supports image processing libraries, network proxies (Envoy), cryptographic code, embedded systems, kernel drivers, and more.

Key Design Decisions

  1. Universal, Not OWASP-Specific: 16 threat categories that apply to any software
  2. Language-Aware: Different threats for C vs Python vs Rust based on memory safety
  3. Domain-Aware: Web API, image processing, crypto, networking have different threat landscapes
  4. Architecture-Aware: Library vs daemon vs CLI affects attack surface and threat amplification

Implementation Plan

See: plans/threat_model_implementation.md and docs/UNIVERSAL_THREAT_MODEL_DESIGN.md

Phases:

  1. Data Models (models.py) DONE - Universal types
  2. Knowledge Base (languages.py, domains.py, architectures.py) DONE
  3. Analysis Components (component_analyzer.py, threat_enumerator.py) DONE
  4. Attack Spec Generator (attack_spec_generator.py) NEXT
  5. Caching & API (cache.py, API endpoints)
  6. Testing & Documentation

Phases 1-3 Complete (Data Models + Knowledge Base + Analysis)

models.py - Universal threat types:

  • ThreatCategory enum with 16 categories (MEMORY_CORRUPTION, INJECTION, CRYPTO_WEAKNESS, etc.)
  • ArchitectureType, AttackSurface enums
  • Universal attack specs: HTTPRequestSpec, MalformedFileSpec, FuzzerHarnessSpec, RawPacketSpec, CLIInvocationSpec
  • Profile dataclasses: LanguageProfile, DomainProfile, ArchitectureProfile
  • CodebaseClassification for complete codebase analysis
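
To make the data model concrete, here is a minimal sketch of how a category enum and a language profile could fit together. The category names and the `memory_safe` / `inherent_threats` attributes appear in this document; the enum string values, the `name` field, and the constructor shape are assumptions for illustration, not the contents of models.py.

```python
from dataclasses import dataclass
from enum import Enum


class ThreatCategory(Enum):
    # Only a subset of the 16 categories; the string values are assumed.
    MEMORY_CORRUPTION = "memory_corruption"
    INJECTION = "injection"
    CRYPTO_WEAKNESS = "crypto_weakness"
    TYPE_CONFUSION = "type_confusion"
    RESOURCE_EXHAUSTION = "resource_exhaustion"


@dataclass(frozen=True)
class LanguageProfile:
    name: str                      # hypothetical field
    memory_safe: bool              # attribute named in this document
    inherent_threats: tuple[ThreatCategory, ...] = ()


# Example profile for C, mirroring the values shown in the usage example.
C_PROFILE = LanguageProfile(
    name="c",
    memory_safe=False,
    inherent_threats=(
        ThreatCategory.MEMORY_CORRUPTION,
        ThreatCategory.TYPE_CONFUSION,
    ),
)
```

Freezing the dataclass is a design choice of this sketch: profiles act as read-only knowledge-base entries, so immutability avoids accidental edits at analysis time.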

languages.py - Language security profiles:

  • 10 languages, including C, C++, Rust, Go, Python, JavaScript, Java, PHP, and Ruby
  • Memory safety, type safety characteristics
  • Dangerous patterns with CWE mappings

domains.py - Domain profiles:

  • 12 domains: web_api, graphql, image_processing, cryptography, networking, database, cli_tools, file_parsing, authentication, embedded, kernel, ml_ai
  • Detection keywords/imports for auto-detection
  • Domain-specific threats and attack surfaces
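
Keyword-based auto-detection could look roughly like the sketch below. The keyword sets, the `detect_domains` name, and the `min_hits` threshold are hypothetical; the real lists live in domains.py and may be matched differently.

```python
# Hypothetical keyword table; only the domain names come from this document.
DOMAIN_KEYWORDS = {
    "web_api": {"flask", "express", "fastapi"},
    "image_processing": {"libpng", "pillow", "opencv"},
    "cryptography": {"openssl", "hashlib", "aes"},
}


def detect_domains(imports, min_hits=2):
    """Return domains whose keyword set matches at least min_hits imports."""
    seen = {name.lower() for name in imports}
    scores = {
        domain: len(keywords & seen)
        for domain, keywords in DOMAIN_KEYWORDS.items()
    }
    return sorted(d for d, hits in scores.items() if hits >= min_hits)
```

Requiring more than one hit is one simple way to keep a single stray import from tagging a codebase with an unrelated domain.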

architectures.py - Architecture profiles:

  • 10 types: library, cli_tool, network_daemon, web_service, microservice, plugin, driver, desktop_app, mobile_app, embedded
  • Threat amplification per architecture
  • Special security concerns
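
One way to picture threat amplification is as a per-architecture multiplier applied to base threat weights. The multipliers, the `amplify` helper, and the assumption that weights lie in the range 0 to 1 are all illustrative; architectures.py may represent amplification differently.

```python
# Hypothetical multipliers; real values would come from architectures.py.
AMPLIFICATION = {
    "network_daemon": {"memory_corruption": 1.5},
    "library": {"memory_corruption": 1.2},
}


def amplify(base_weights, architecture):
    """Scale each base weight by the architecture's multiplier, capped at 1.0.

    Threats without a multiplier, and unknown architectures, are left unchanged.
    """
    factors = AMPLIFICATION.get(architecture, {})
    return {
        threat: min(1.0, weight * factors.get(threat, 1.0))
        for threat, weight in base_weights.items()
    }
```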

Usage Example

from appmapper.threat_modeling import (
    ThreatCategory,
    get_language_profile,
    get_domain_profile,
    is_memory_safe,
)

# Get language-specific threats
c_profile = get_language_profile("c")
print(c_profile.memory_safe)  # False
print(c_profile.inherent_threats)  # [MEMORY_CORRUPTION, TYPE_CONFUSION, ...]

# Get domain-specific threats
img_profile = get_domain_profile("image_processing")
print(img_profile.domain_threats)  # [MEMORY_CORRUPTION, RESOURCE_EXHAUSTION, ...]

component_analyzer.py - Codebase classification:

  • ComponentAnalyzer class for full codebase analysis
  • analyze_codebase(repo_path) convenience function
  • Language detection by file extension
  • Domain detection from imports and keywords
  • Architecture detection from signals
  • Dangerous pattern scanning
  • Threat weight calculation combining all signals
  • Output: CodebaseClassification
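
Language detection by file extension can be sketched as a suffix lookup plus a counter. The extension table and the `detect_languages` name are assumptions; ComponentAnalyzer's actual mapping is not shown in this document.

```python
from collections import Counter
from pathlib import Path

# Hypothetical extension table covering a few of the supported languages.
EXTENSION_LANGUAGES = {
    ".c": "c", ".h": "c", ".cpp": "cpp", ".rs": "rust",
    ".go": "go", ".py": "python", ".js": "javascript",
    ".java": "java", ".php": "php", ".rb": "ruby",
}


def detect_languages(paths):
    """Count files per language by suffix; unknown suffixes are ignored."""
    counts = Counter()
    for p in paths:
        lang = EXTENSION_LANGUAGES.get(Path(p).suffix.lower())
        if lang:
            counts[lang] += 1
    # Most frequent language first.
    return counts.most_common()
```

In practice the paths could come from something like `Path(repo_path).rglob("*")`, filtered to regular files.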

threat_enumerator.py - Threat generation:

  • ThreatEnumerator class for generating threats
  • enumerate_threats(classification) convenience function
  • Threat templates for all 16 categories
  • Attack tree generation with language/domain customization
  • Output: List of Threat objects with attack trees
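
An attack tree can be modeled as a small recursive structure. The `AttackNode` class, the `render` helper, and the example steps are hypothetical and only illustrate the shape; the Threat objects produced by threat_enumerator.py may differ.

```python
from dataclasses import dataclass, field


@dataclass
class AttackNode:
    """One step in an attack tree; children are sub-steps of this goal."""
    description: str
    children: list["AttackNode"] = field(default_factory=list)


def render(node, depth=0):
    """Return the tree as indented lines, one line per node."""
    lines = ["  " * depth + node.description]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Illustrative tree for a memory corruption threat in an image library.
tree = AttackNode("Corrupt memory via crafted image", [
    AttackNode("Supply oversized dimensions in header"),
    AttackNode("Trigger integer overflow in buffer size calculation"),
])
```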

Usage Example (Phase 3)

from appmapper.threat_modeling import (
    analyze_codebase,
    enumerate_threats,
    get_threat_summary,
    get_classification_summary,
)

# Analyze a codebase
classification = analyze_codebase("/path/to/repo")
print(get_classification_summary(classification))
# Output includes: languages, domains, architecture, top threats

# Generate concrete threats
threats = enumerate_threats(classification)
print(get_threat_summary(threats))
# Output: 30 threats with attack trees, CWE IDs, affected components

Design Document

See: docs/UNIVERSAL_THREAT_MODEL_DESIGN.md

Contains:

  • Universal ThreatCategory definitions (not OWASP-specific)
  • Language, domain, architecture profile specifications
  • Code pattern detection strategies
  • Example outputs for image libraries, network proxies, web APIs

NEW: Semantic Threat Modeling with RAG

Overview

Story-driven threat modeling that understands applications from a business perspective:

  • What IS this app? (e-commerce, banking, social media)
  • What can users DO? (browse products, transfer money, post content)
  • What would an attacker WANT? (free products, steal funds, data theft)

Key Components (All DONE)

| Component | Status | Files |
|---|---|---|
| FP Reduction | DONE | component_analyzer.py - stricter thresholds, domain conflicts |
| LLM Validator | DONE | llm_validator.py - LLM-based validation (~$0.001/call) |
| Semantic Analyzer | DONE | semantic_analyzer.py - story-driven threat modeling |
| Knowledge Graph Schema | DONE | knowledge_graph/schema.py |
| OWASP/CWE Scrapers | DONE | knowledge_graph/scrapers/ |
| RAG Retrieval | DONE | knowledge_graph/rag.py |

FP Reduction Results

| Metric | Before | After |
|---|---|---|
| juice-shop domains | 12 | 3 |
| juice-shop threats | 30 | 10 |
| juice-shop architecture | library | web_service |
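
The two mechanisms named for FP reduction (stricter thresholds and domain conflicts) could be combined along these lines. The threshold value, the conflict pairs, and the `filter_domains` helper are assumptions made for illustration; the actual rules in component_analyzer.py are not described here.

```python
# Hypothetical pairs of domains that should not both be reported.
CONFLICTS = [("web_api", "embedded"), ("kernel", "web_api")]


def filter_domains(scores, threshold=0.5, conflicts=CONFLICTS):
    """Keep domains scoring >= threshold, then drop the weaker side of each conflict."""
    kept = {d: s for d, s in scores.items() if s >= threshold}
    for a, b in conflicts:
        if a in kept and b in kept:
            # On a tie, the second domain of the pair is dropped.
            loser = a if kept[a] < kept[b] else b
            del kept[loser]
    # Highest-scoring domain first.
    return sorted(kept, key=kept.get, reverse=True)
```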

Semantic Analyzer Usage

from appmapper.threat_modeling.semantic_analyzer import (
    SemanticAnalyzer,
    analyze_semantically,
    print_semantic_model,
)

# Analyze with RAG augmentation (default)
model = analyze_semantically(
    routes=[{"path": "/api/cart", "method": "POST"}, ...],
    app_name="juice-shop",
)

# Get business-aware threat model
print(model.app_category)      # ECOMMERCE
print(model.business_flows)    # [Checkout Flow, User Registration, ...]
print(model.attacker_stories)  # [Price Manipulation, IDOR, ...]

Knowledge Graph RAG

RAG retrieves relevant attack patterns from the OWASP/CWE knowledge base to ground LLM threat generation:

from appmapper.threat_modeling.knowledge_graph import ThreatRAG

rag = ThreatRAG()
context = rag.retrieve(
    app_keywords=["shop", "cart", "checkout"],
    detected_endpoints=["/api/products", "/api/cart"],
)

# Returns:
# - App type: E-commerce
# - Matching flows: Checkout Flow, User Registration, Search
# - Relevant attacks: Price Manipulation, IDOR, Cart Manipulation
# - Suggested CWEs: CWE-639, CWE-352, CWE-362, CWE-472
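
To illustrate the retrieval idea, the sketch below ranks attack patterns by plain keyword overlap. This swaps in a deliberately simple technique: it is not how ThreatRAG is implemented, and the `PATTERNS` entries, their keyword sets, and the `retrieve` function are hypothetical. Only the attack names come from this document.

```python
# Hypothetical knowledge-base entries keyed by attack name.
PATTERNS = [
    {"attack": "Price Manipulation", "keywords": {"cart", "checkout", "price"}},
    {"attack": "IDOR", "keywords": {"account", "order", "profile"}},
    {"attack": "Cart Manipulation", "keywords": {"cart", "basket"}},
]


def retrieve(app_keywords, patterns=PATTERNS, top_k=2):
    """Rank patterns by how many app keywords they share; drop non-matches."""
    query = {k.lower() for k in app_keywords}
    scored = [(len(p["keywords"] & query), p["attack"]) for p in patterns]
    # Sort by overlap (descending), then by name for a stable order.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [attack for _, attack in ranked[:top_k]]
```

The retrieved names would then be passed to the LLM as grounding context, so that generated threats stay tied to known patterns.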

Data Sources

  • OWASP Top 10 2021: 10 categories, 20+ attack patterns
  • CWE: 20 top web security weaknesses with techniques/mitigations
  • Seed Data: 5 app types, 8 flow templates, 4 e-commerce attacks

Project Structure

C:\Users\pesmi\Desktop\appmapper\
├── run_server.py              # Server entry point (port 8000)
├── ROADMAP.md                 # THIS FILE - project state & roadmap
├── README.md                  # Project overview
├── src/
│   └── appmapper/
│       ├── ui/
│       │   ├── app.py         # Flask app with all API endpoints
│       │   └── templates/
│       │       └── index.html # Web UI
│       ├── route_scanner.py   # Route extraction (supports many frameworks)
│       ├── auth_discovery.py  # Auth pattern detection
│       ├── directory_classifier.py
│       ├── shared_context.py  # CVE-Gen compatible output
│       ├── threat_modeling/   # Universal threat modeling system
│       │   ├── __init__.py    # Module exports + API functions
│       │   ├── models.py      # Universal data models (DONE)
│       │   ├── languages.py   # Language security profiles (DONE)
│       │   ├── domains.py     # Domain security profiles (DONE)
│       │   ├── architectures.py # Architecture profiles (DONE)
│       │   ├── owasp.py       # OWASP mappings (for web domain)
│       │   ├── component_analyzer.py  # Codebase classifier (DONE)
│       │   ├── threat_enumerator.py   # Threat generator (DONE)
│       │   ├── llm_validator.py       # LLM-based validation (DONE)
│       │   ├── semantic_analyzer.py   # Story-driven threat modeling (DONE)
│       │   ├── knowledge_graph/       # Threat model knowledge graph
│       │   │   ├── schema.py          # Graph entities (DONE)
│       │   │   ├── rag.py             # RAG retrieval (DONE)
│       │   │   └── scrapers/          # Data collectors (DONE)
│       │   │       ├── owasp.py       # OWASP Top 10 data
│       │   │       ├── cwe.py         # CWE weakness data
│       │   │       └── seed_data.py   # Pre-built app types & flows
│       │   ├── attack_spec_generator.py # Phase 4 - NEXT
│       │   └── cache.py       # Phase 5
│       └── ...
├── docs/
│   ├── CVE_GEN_INTEGRATION.md # Main API documentation
│   ├── QUICKSTART.md
│   ├── IMPLEMENTATION_GUIDELINES.md
│   ├── appmapper_client.py    # Python client library
│   ├── THREAT_MODEL_DESIGN.md # Original OWASP-focused design
│   └── UNIVERSAL_THREAT_MODEL_DESIGN.md # Universal design (NEW)
└── plans/
    └── threat_model_implementation.md

Key API Endpoints

| Endpoint | Purpose |
|---|---|
| POST /api/v2/scan-repo | Scan repo for routes, auth, access control |
| POST /api/v2/generate-shared-context | Generate CVE-Gen compatible JSON |
| POST /api/v2/export-shared-context | Save SharedContext to file |
| POST /api/v2/classify-directories | Classify directories by purpose |
| POST /api/v2/threat-model/generate | NEW - Generate threat model |

Recent Fixes Applied

  1. Spring annotations - Handle @GetMapping without explicit paths
  2. Wildcard URL filtering - Filter /**, /* patterns
  3. Double-prefix bug - Fixed paths like /.well-known/jwks/.well-known/jwks
  4. JavaScript type error - Fixed request_content_type.replace in UI
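
The first three fixes concern route path handling. The sketch below shows one way such handling could look; it is not the code in route_scanner.py, and the `is_wildcard` and `join_route` helpers are hypothetical.

```python
def is_wildcard(path):
    """True for catch-all patterns such as /** or /api/*."""
    return path.endswith("/*") or path.endswith("/**")


def join_route(prefix, path):
    """Join a controller prefix and a method path without repeating the prefix."""
    prefix = "/" + prefix.strip("/") if prefix.strip("/") else ""
    path = "/" + path.strip("/") if path.strip("/") else ""
    if prefix and (path == prefix or path.startswith(prefix + "/")):
        return path  # the method path already includes the prefix
    # An empty method path (e.g. a mapping with no explicit path) yields the prefix.
    return (prefix + path) or "/"
```

For example, `join_route("/.well-known/jwks", "/.well-known/jwks")` returns the path once rather than doubling it, and `join_route("/api", "")` falls back to the prefix alone.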

Environment

  • Python: 3.12 (C:\Users\pesmi\AppData\Local\Programs\Python\Python312\python.exe)
  • API Key: LLM API key required for query and validation features
  • Port: 8000 (NOT 6000 - blocked by Chrome)
  • Platform: Windows

Next Steps

If continuing threat model implementation:

  1. Phase 4 is next: Create attack_spec_generator.py - generates domain-appropriate attack specs (HTTP, fuzzer harness, malformed files, CLI invocations) based on threats
  2. Add API endpoints for threat model generation in ui/app.py
  3. Create cache.py for caching threat model results
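
Since cache.py does not exist yet, the following is only a sketch of one possible starting point: an in-memory cache with a time-to-live, keyed by repository path. The class name, the TTL default, and the injectable clock are all assumptions, not a committed design.

```python
import time


class ThreatModelCache:
    """In-memory cache with a time-to-live, keyed by repository path."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}

    def get(self, repo_path):
        """Return the cached value, or None if missing or expired."""
        entry = self._entries.get(repo_path)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[repo_path]   # expired entry
            return None
        return value

    def put(self, repo_path, value):
        """Store a value together with the time it was cached."""
        self._entries[repo_path] = (self._clock(), value)
```

A path-only key would serve stale results after the repository changes; including a commit hash or file digest in the key is a likely refinement.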

Completed:

  • Phases 1-3: Data models, knowledge base, component analyzer, threat enumerator
  • Tested on juice-shop (JavaScript/web) and opentofu (Go/CLI)

Key file to create next:

  • src/appmapper/threat_modeling/attack_spec_generator.py - Generate ready-to-use attack specifications

If fixing bugs:

  1. Check server is running: curl http://127.0.0.1:8000/
  2. Check logs in background task output

If adding features:

  1. Update this ROADMAP.md
  2. Add to implementation plan if significant

Test Repositories

Located in src/appmapper/ui/.repos/:

  • juice-shop - OWASP Juice Shop (Express.js)
  • terrakube - Terraform automation (Spring Boot/Java)
  • VAmPI - Vulnerable API (Flask)
  • opentofu - Infrastructure as Code (Go)

Related Projects

  • CVE-Gen v2: C:\Users\pesmi\Desktop\code-analysis\codeql-dashboard
  • Integration spec: docs/APPMAP_AUTHZ_TECHNICAL_SPEC.md (in CVE-Gen)

Commands Reference

# Start server
python run_server.py

# Test server
curl http://127.0.0.1:8000/

# Scan a repo
curl -X POST http://localhost:8000/api/v2/scan-repo \
  -H "Content-Type: application/json" \
  -d '{"repo_path": "C:/path/to/repo"}'

# Generate SharedContext
curl -X POST http://localhost:8000/api/v2/generate-shared-context \
  -H "Content-Type: application/json" \
  -d '{"repo_path": "C:/path/to/repo"}'
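
The same calls can be made from Python using only the standard library. This is a minimal sketch mirroring the curl commands above, not the client in docs/appmapper_client.py; the helper names are hypothetical, and `post_json` assumes the server replies with JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_scan_request(repo_path, endpoint="/api/v2/scan-repo"):
    """Build (but do not send) a POST request with the JSON body used in the curl examples."""
    body = json.dumps({"repo_path": repo_path}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(request, timeout=30):
    """Send the request and decode the response body as JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `post_json(build_scan_request("C:/path/to/repo"))` would perform the scan; passing `endpoint="/api/v2/generate-shared-context"` targets the SharedContext endpoint instead.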