The KBA Drafter uses OpenAI's native structured output feature to generate Knowledge Base Articles from support tickets. This replaces the previous Ollama-based implementation.
Add to `.env`:

```
# OpenAI API Key (required)
OPENAI_API_KEY=sk-proj-...

# Model (required - must support structured output)
OPENAI_MODEL=gpt-4o-mini
```

The KBA Drafter requires a model that supports OpenAI's `beta.chat.completions.parse()` feature:
- ✅ `gpt-4o-mini` (recommended - cost-effective)
- ✅ `gpt-4o`
- ✅ `gpt-4o-2024-08-06` or newer

❌ Not supported:

- `gpt-3.5-turbo`
- `gpt-4` (older versions)
- Any model released before August 2024
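One way to guard against misconfiguration is to validate `OPENAI_MODEL` at startup. A minimal sketch, assuming an explicit allow-list (the `SUPPORTED_MODELS` set and `is_supported` helper are illustrative, not part of the codebase):

```python
import os

# Hypothetical allow-list of models known to support structured output;
# extend it as OpenAI ships new snapshots.
SUPPORTED_MODELS = {"gpt-4o-mini", "gpt-4o", "gpt-4o-2024-08-06"}

def is_supported(model: str) -> bool:
    """Return True if the model is expected to support structured output."""
    return model in SUPPORTED_MODELS

model = os.environ.get("OPENAI_MODEL", "gpt-4o-mini")
if not is_supported(model):
    raise RuntimeError(
        f"OPENAI_MODEL={model!r} does not support beta.chat.completions.parse()"
    )
```

Failing fast here turns a confusing 400 error at generation time into a clear startup error.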
The KBA output structure is defined as a Pydantic model in `backend/kba_output_models.py`:

```python
class KBAOutputSchema(BaseModel):
    """Schema for LLM-generated KBA content"""
    title: str = Field(min_length=10, max_length=200)
    symptoms: list[str] = Field(min_length=1)
    resolution_steps: list[str] = Field(min_length=1)
    tags: list[str] = Field(min_length=2, max_length=10)
    # ... more fields
```

The LLM service (`backend/llm_service.py`) uses OpenAI's native structured output:
```python
completion = await client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    response_format=KBAOutputSchema,  # Pydantic model
)

# Returns a validated Pydantic object
result = completion.choices[0].message.parsed
```

Benefits:
- ✅ Automatic JSON parsing (no regex needed)
- ✅ Guaranteed schema compliance
- ✅ Built-in Pydantic validation
- ✅ No manual retry logic needed
- ✅ Type-safe output
```
Ticket Data → Prompt Builder → OpenAI API
                   ↓
           Structured Output
                   ↓
          Pydantic Validation
                   ↓
        @field_validator checks
                   ↓
            KBADraft object
```
Validation Layers:

- OpenAI Schema Enforcement: the API ensures the JSON structure matches the Pydantic model
- Pydantic Type Validation: checks field types (str, list, int, etc.)
- Field Validators: custom validation rules:
  - `symptoms`: each entry ≥ 10 characters
  - `resolution_steps`: each entry ≥ 10 characters
  - `tags`: lowercase alphanumeric with hyphens, ≥ 2 characters
  - `related_tickets`: must match the `INC0001234` format
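The custom rules above can be sketched with Pydantic's `@field_validator`. This is a simplified illustration of the pattern, not the actual schema in `kba_output_models.py` (which has more fields):

```python
import re
from pydantic import BaseModel, Field, field_validator

TICKET_RE = re.compile(r"^INC\d{7}$")   # e.g. INC0001234
TAG_RE = re.compile(r"^[a-z0-9-]{2,}$")  # lowercase alphanumeric with hyphens

class KBAOutputSchema(BaseModel):
    """Simplified sketch of the validated KBA schema."""
    symptoms: list[str] = Field(min_length=1)
    tags: list[str] = Field(min_length=2, max_length=10)
    related_tickets: list[str] = Field(default_factory=list)

    @field_validator("symptoms")
    @classmethod
    def symptoms_long_enough(cls, v: list[str]) -> list[str]:
        if any(len(s) < 10 for s in v):
            raise ValueError("each symptom must be at least 10 characters")
        return v

    @field_validator("tags")
    @classmethod
    def tags_format(cls, v: list[str]) -> list[str]:
        if any(not TAG_RE.fullmatch(t) for t in v):
            raise ValueError("tags must be lowercase alphanumeric with hyphens")
        return v

    @field_validator("related_tickets")
    @classmethod
    def ticket_format(cls, v: list[str]) -> list[str]:
        if any(not TICKET_RE.fullmatch(t) for t in v):
            raise ValueError("related tickets must match the INC0001234 format")
        return v
```

Because these validators run after OpenAI's schema enforcement, they only catch content-level problems (too-short strings, malformed IDs), not structural ones.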
The LLM service maps OpenAI exceptions to custom KBA exceptions:
| OpenAI Exception | KBA Exception | HTTP Status | Description |
|---|---|---|---|
| `APIConnectionError` | `LLMUnavailableError` | 503 | Cannot reach OpenAI API |
| `APITimeoutError` | `LLMTimeoutError` | 504 | Request timeout (default 60s) |
| `RateLimitError` | `LLMRateLimitError` | 429 | Rate limit exceeded |
| `AuthenticationError` | `LLMAuthenticationError` | 401 | Invalid API key |
| `BadRequestError` | (preserved) | 400 | Schema/request error |
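One way such a mapping can be implemented is sketched below. The KBA exception names come from the table; the class structure and the `map_openai_error` helper are assumptions for illustration, not the actual `llm_service.py` code. Keying on the exception class name keeps the sketch free of an `openai` import:

```python
# Hypothetical custom exception hierarchy (names from the table above)
class KBAError(Exception):
    status_code = 500

class LLMUnavailableError(KBAError):
    status_code = 503

class LLMTimeoutError(KBAError):
    status_code = 504

class LLMRateLimitError(KBAError):
    status_code = 429

class LLMAuthenticationError(KBAError):
    status_code = 401

# OpenAI SDK exception class name -> KBA exception
EXCEPTION_MAP = {
    "APIConnectionError": LLMUnavailableError,
    "APITimeoutError": LLMTimeoutError,
    "RateLimitError": LLMRateLimitError,
    "AuthenticationError": LLMAuthenticationError,
}

def map_openai_error(exc: Exception) -> Exception:
    """Translate an OpenAI SDK exception into the matching KBA exception.

    Unmapped exceptions (e.g. BadRequestError) are returned unchanged.
    """
    mapped = EXCEPTION_MAP.get(type(exc).__name__)
    return mapped(str(exc)) if mapped else exc
```

The `status_code` attribute lets a single FastAPI/Flask error handler turn any `KBAError` into the right HTTP response.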
Error handlers are defined in `backend/app.py`.
```
kba_service.py (Orchestrator)
        ↓
llm_service.py (OpenAI Client)
        ↓
kba_output_models.py (Pydantic Schema)
        ↓
OpenAI API (beta.chat.completions.parse)
```
| Component | Purpose |
|---|---|
| `llm_service.py` | OpenAI AsyncClient wrapper, structured output |
| `kba_output_models.py` | Pydantic models for validation |
| `kba_service.py` | Business logic, orchestration |
| `kba_exceptions.py` | Custom exceptions |
| `kba_prompts.py` | Prompt engineering |
The LLM service uses a singleton pattern for efficiency:
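A minimal sketch of what such an accessor can look like (the class body is hypothetical; the real `LLMService` in `llm_service.py` wraps the OpenAI AsyncClient):

```python
class LLMService:
    """Created once, then shared by every caller."""
    def __init__(self, model: str = "gpt-4o-mini") -> None:
        self.model = model  # the real service would also build an AsyncOpenAI client

_instance: "LLMService | None" = None

def get_llm_service() -> LLMService:
    """Return the shared LLMService, creating it on first use."""
    global _instance
    if _instance is None:
        _instance = LLMService()
    return _instance
```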
```python
from llm_service import get_llm_service

llm = get_llm_service()  # Reuses the same instance
```

Run KBA Drafter tests:
```bash
cd backend
pytest tests/test_llm_service.py -v
pytest tests/test_kba_schema.py -v
pytest tests/test_kba_publishing.py -v
```

Check whether the OpenAI API is accessible:
```bash
curl http://localhost:5000/api/kba/health
```

Expected response:
```json
{
  "llm_available": true,
  "llm_provider": "openai",
  "model": "gpt-4o-mini"
}
```

Create a draft:

```bash
curl -X POST http://localhost:5000/api/kba/drafts \
  -H "Content-Type: application/json" \
  -d '{
    "ticket_id": "INC0001234",
    "user_id": "test@example.com"
  }'
```

Removed:
- ❌ `ollama_service.py` → deprecated
- ❌ `ollama_structured_output.py` → deprecated
- ❌ Manual JSON parsing with regex
- ❌ Retry logic with repair prompts
- ❌ `OLLAMA_BASE_URL`, `OLLAMA_MODEL`, `OLLAMA_TIMEOUT` env vars
Added:
- ✅ `llm_service.py` - OpenAI client
- ✅ `kba_output_models.py` - Pydantic schemas
- ✅ Native structured output
- ✅ `OPENAI_API_KEY`, `OPENAI_MODEL` env vars
Unchanged:
- ✅ Data models (`kba_models.py`)
- ✅ Business logic (`kba_service.py` orchestration)
- ✅ Audit logging (`kba_audit.py`)
- ✅ Publish flow (`kb_adapters.py`)
- ✅ UI flow (frontend)
Old (`.env`):

```
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.2:1b
OLLAMA_TIMEOUT=60
```

New (`.env`):

```
OPENAI_API_KEY=sk-proj-...
OPENAI_MODEL=gpt-4o-mini
```

Cause: Missing or empty `OPENAI_API_KEY` in `.env`
Solution:
- Get an API key from https://platform.openai.com/api-keys
- Add it to `.env`: `OPENAI_API_KEY=sk-proj-...`
- Restart the backend
Cause: Too many API requests in short time
Solutions:
- Wait 60 seconds and retry
- Check OpenAI dashboard for rate limits
- Upgrade OpenAI plan if needed
Cause: Request took > 60 seconds (default timeout)
Solutions:
- Retry request (transient issue)
- Check OpenAI status page
- Simplify prompt if too long
Cause: Using unsupported model (e.g., gpt-3.5-turbo)
Solution: Update `.env` to use a supported model:

```
OPENAI_MODEL=gpt-4o-mini
```

Cause: LLM output doesn't match the expected schema
These should be rare with OpenAI's structured output. If persistent:
- Check the prompt in `kba_prompts.py`
- Review the schema in `kba_output_models.py`
- Check the OpenAI dashboard for model issues
Typical KBA generation:
- Prompt: ~1,000-2,000 tokens (ticket + guidelines + schema)
- Completion: ~500-1,000 tokens (KBA content)
- Total: ~1,500-3,000 tokens per generation
gpt-4o-mini (recommended):
- Input: $0.150 per 1M tokens
- Output: $0.600 per 1M tokens
- ~$0.0015 per KBA generation
gpt-4o:
- Input: $5.00 per 1M tokens
- Output: $15.00 per 1M tokens
- ~$0.05 per KBA generation
Recommendation: Use gpt-4o-mini for cost-effectiveness. Quality is sufficient for most KBAs.
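The per-generation figures can be sanity-checked with a quick calculation using the prices listed above and token counts at the upper end of the typical range (the `cost_per_kba` helper is just for illustration):

```python
# USD per 1M tokens: (input, output), as listed above
PRICES = {
    "gpt-4o-mini": (0.150, 0.600),
    "gpt-4o": (5.00, 15.00),
}

def cost_per_kba(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated cost in USD for one KBA generation."""
    inp, out = PRICES[model]
    return (prompt_tokens * inp + completion_tokens * out) / 1_000_000

# A typical generation: up to ~2,000 prompt + ~1,000 completion tokens
print(round(cost_per_kba("gpt-4o-mini", 2000, 1000), 4))  # → 0.0009
print(round(cost_per_kba("gpt-4o", 2000, 1000), 3))       # → 0.025
```

This puts gpt-4o-mini at well under a cent per KBA, consistent with the rough estimates above.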