A stateless reverse proxy that intercepts LLM API requests, applies parallel regex and spaCy NER-based PII detection to tokenize sensitive data before forwarding to OpenAI/Anthropic, then de-anonymizes responses using ephemeral in-memory token mappings scoped to the conversation context.
```
Client                       Proxy                          LLM Provider
  |                            |                                |
  | POST /v1/proxy/openai/...  |                                |
  |--------------------------->|                                |
  |                            |                                |
  |                 +----------+----------+                     |
  |                 | Parallel Detection  |                     |
  |                 | Regex | NER (spaCy) |                     |
  |                 +----------+----------+                     |
  |                            |                                |
  |       Tokenize: "John Smith"   -> {{PERSON_1}}              |
  |                 "4532-1234..." -> {{CREDIT_CARD_1}}         |
  |                            |                                |
  |                            | Anonymized request             |
  |                            |------------------------------->|
  |                            |                                |
  |                            |<-------------------------------|
  |                            | Response (may contain tokens)  |
  |                            |                                |
  |       De-anonymize using token mappings                     |
  |                            |                                |
  |<---------------------------|                                |
  | Response with original data|                                |
```
| Method | Implementation | Detects | Execution |
|---|---|---|---|
| Regex | Compiled patterns from `patterns.json` | Credit cards, emails, SSN, phone, IP | Parallel |
| NER | spaCy `en_core_web_sm` | Person names, organizations, locations | Parallel |
Both methods run concurrently via ThreadPoolExecutor. Results are merged with regex taking priority on overlapping spans.
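As an illustration, the parallel detection and merge step might look like the sketch below. The detector functions, span tuples, and merge rule are simplified stand-ins for the real tokenizer in `app/anonymization/tokenizer.py`, not its actual API:

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical simplified detectors; the real ones use patterns.json and spaCy.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def regex_detect(text):
    # Each hit is (start, end, entity_type, source).
    return [(m.start(), m.end(), "EMAIL", "regex") for m in EMAIL_RE.finditer(text)]

def ner_detect(text):
    # Stand-in for the spaCy pass; pretend it flagged "John Smith".
    idx = text.find("John Smith")
    return [] if idx < 0 else [(idx, idx + len("John Smith"), "PERSON", "ner")]

def detect(text):
    # Run both detectors concurrently, as the proxy does.
    with ThreadPoolExecutor(max_workers=2) as pool:
        regex_future = pool.submit(regex_detect, text)
        ner_future = pool.submit(ner_detect, text)
        spans = regex_future.result() + ner_future.result()
    # Merge: sort regex spans first so they win on overlap, then
    # drop any later span that overlaps one already accepted.
    spans.sort(key=lambda s: (s[3] != "regex", s[0]))
    merged = []
    for span in spans:
        if all(span[1] <= kept[0] or span[0] >= kept[1] for kept in merged):
            merged.append(span)
    return sorted(merged, key=lambda s: s[0])

hits = detect("Contact John Smith at john@example.com")
# Non-overlapping spans from both detectors survive the merge.
```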
| Type | Token Format | Detection Method |
|---|---|---|
| CREDIT_CARD | {{CREDIT_CARD_N}} | Regex |
| EMAIL | {{EMAIL_N}} | Regex |
| SSN | {{SSN_N}} | Regex |
| PHONE | {{PHONE_N}} | Regex |
| IP_ADDRESS | {{IP_ADDRESS_N}} | Regex |
| PERSON | {{PERSON_N}} | NER |
| ORG | {{ORG_N}} | NER |
| LOCATION | {{LOCATION_N}} | NER |
```bash
# Install dependencies
pip install -r requirements.txt

# Download spaCy model for NER
python -m spacy download en_core_web_sm
```

Requirements:
- Python 3.11+
- No external services (Redis, databases)
Create a `.env` file:

```
PORT=8000
ENVIRONMENT=development
LOG_LEVEL=DEBUG
OPENAI_API_KEY=sk-...        # Optional, for testing
ANTHROPIC_API_KEY=sk-ant-... # Optional, for testing
```

Start the server:

```bash
uvicorn app.main:app --port 8000
```

| Endpoint | Method | Description |
|---|---|---|
| `/v1/proxy/openai/chat/completions` | POST | OpenAI proxy with anonymization |
| `/v1/proxy/anthropic/messages` | POST | Anthropic proxy with anonymization |
| `/health` | GET | Health check |
```bash
curl -X POST http://localhost:8000/v1/proxy/openai/chat/completions \
  -H "X-Target-API-Key: YOUR_OPENAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "Contact John Smith at john@example.com"}
    ]
  }'
```

The `X-Target-API-Key` header contains the user's LLM provider API key (passed through, never stored).
The same sensitive value receives the same token within a conversation. The proxy scans the entire message array and maintains a value -> token mapping:

```python
# Input messages
[
  {"role": "user", "content": "Card: 4532-1234-5678-9010"},
  {"role": "assistant", "content": "Saved"},
  {"role": "user", "content": "Verify 4532-1234-5678-9010"}
]
# Both instances map to {{CREDIT_CARD_1}}
```

This works statelessly because clients send the full conversation history with each request.
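A minimal sketch of the value -> token mapping and the matching de-anonymization step (the class name and methods here are hypothetical; the real logic lives in `app/anonymization/tokenizer.py`):

```python
class TokenMap:
    """Per-request mapping: each distinct value gets one stable token."""

    def __init__(self):
        self._tokens = {}    # original value -> token
        self._counters = {}  # entity type -> last index used

    def token_for(self, value, entity_type):
        # Repeated values reuse the token minted on first sight.
        if value not in self._tokens:
            n = self._counters.get(entity_type, 0) + 1
            self._counters[entity_type] = n
            self._tokens[value] = "{{%s_%d}}" % (entity_type, n)
        return self._tokens[value]

    def deanonymize(self, text):
        # Restore original values in the provider's response.
        for value, token in self._tokens.items():
            text = text.replace(token, value)
        return text

tokens = TokenMap()
a = tokens.token_for("4532-1234-5678-9010", "CREDIT_CARD")
b = tokens.token_for("4532-1234-5678-9010", "CREDIT_CARD")
# a and b are the same token, so both card mentions anonymize identically.
```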
app/
├── main.py # FastAPI application entry
├── config.py # Environment configuration
├── api/
│ └── proxy.py # Proxy endpoints
├── adapters/
│ ├── openai_adapter.py # OpenAI API integration
│ └── anthropic_adapter.py # Anthropic API integration
└── anonymization/
├── pattern_loader.py # Loads regex patterns from JSON
├── ner_detector.py # spaCy NER wrapper
└── tokenizer.py # Parallel detection + tokenization
Edit `patterns.json` to add regex patterns:

```json
{
  "patterns": [
    {
      "name": "credit_card",
      "type": "CREDIT_CARD",
      "regex": "\\b\\d{4}[-\\s]?\\d{4}[-\\s]?\\d{4}[-\\s]?\\d{4}\\b"
    }
  ]
}
```

Patterns are loaded at startup. Restart required for changes.
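The load-and-compile step can be sketched as follows. The function names are hypothetical stand-ins for `app/anonymization/pattern_loader.py`; the JSON matches the `patterns.json` entry shown above:

```python
import json
import re

# Same content as the patterns.json example above.
raw = r'''
{
  "patterns": [
    {
      "name": "credit_card",
      "type": "CREDIT_CARD",
      "regex": "\\b\\d{4}[-\\s]?\\d{4}[-\\s]?\\d{4}[-\\s]?\\d{4}\\b"
    }
  ]
}
'''

def load_patterns(text):
    """Compile every pattern once; done at startup in the real app."""
    return [(p["type"], re.compile(p["regex"])) for p in json.loads(text)["patterns"]]

def scan(text, compiled):
    """Return (type, matched_value) for every regex hit."""
    return [(ptype, m.group()) for ptype, rx in compiled for m in rx.finditer(text)]

compiled = load_patterns(raw)
found = scan("Card: 4532-1234-5678-9010", compiled)
```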
Edit `app/anonymization/ner_detector.py` to modify entity type mappings:

```python
ENTITY_TYPE_MAP = {
    "PERSON": "PERSON",
    "ORG": "ORG",
    "GPE": "LOCATION",
    "LOC": "LOCATION",
}
```

- Token mappings exist only in memory during request lifecycle
- Mappings are discarded immediately after response is sent
- Sensitive values are never persisted to disk or logged
- LLM providers receive only tokens, never original values
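For illustration, the `ENTITY_TYPE_MAP` label-mapping step above can be exercised without loading spaCy. The helper function and the entity tuples are hypothetical; in the real detector the labels come from spaCy's `doc.ents`:

```python
# Mirrors the mapping configured in app/anonymization/ner_detector.py.
ENTITY_TYPE_MAP = {
    "PERSON": "PERSON",
    "ORG": "ORG",
    "GPE": "LOCATION",
    "LOC": "LOCATION",
}

def map_entities(ents):
    """ents: (text, spacy_label) pairs; unmapped labels (e.g. DATE) are dropped."""
    return [(text, ENTITY_TYPE_MAP[label]) for text, label in ents if label in ENTITY_TYPE_MAP]

mapped = map_entities([("John Smith", "PERSON"), ("Paris", "GPE"), ("today", "DATE")])
# Only mapped labels survive; GPE is normalized to LOCATION.
```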
| Metric | Value |
|---|---|
| Regex detection | ~0.5ms |
| NER detection | ~2-5ms |
| Total overhead | ~3-8ms (parallel execution) |
| Memory (spaCy model) | ~50MB |
| Cold start (model load) | ~1-2s (once at startup) |
Typical LLM API calls take 1-5 seconds; proxy overhead is <1% of total latency.
- Regex patterns may produce false positives on clustered numbers
- NER accuracy depends on spaCy model quality
- No streaming response support (SSE)
- No rate limiting
- No persistent audit logging
```bash
./tests/test_openai.sh
./tests/test_anthropic.sh
./tests/test_ner.sh
./tests/test_conversation_consistency.sh
```

MIT