This document explains how Qirrel processes input text internally.
Input text
-> Tokenizer
-> Pipeline components (normalize/clean/extract/segment)
-> QirrelContext output
-> optional cache reuse on future calls
All processors read/write one shared context object:
- meta: request metadata (requestId, timestamp, source)
- memory: extension namespace for integration state
- llm: model and safety metadata
- data.text: current text value
- data.tokens: tokenizer output
- data.entities: extracted entities
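The context fields listed above can be pictured as a TypeScript type. This is a sketch only: QirrelContextSketch, Token, Entity, and Processor are hypothetical names, and the field types (for example timestamp as a number, or the token and entity offsets) are assumptions rather than Qirrel's published definitions.

```typescript
// Hypothetical shape of the shared context; field names follow the list above,
// but every concrete type here is an assumption.
interface Token {
  value: string;
  type: string; // assumed token category, e.g. "word" or "punct"
  start: number;
  end: number;
}

interface Entity {
  type: string; // e.g. "email", "phone", "url", "number"
  value: string;
  start: number;
  end: number;
}

interface QirrelContextSketch {
  meta: { requestId: string; timestamp: number; source?: string };
  memory: Record<string, unknown>; // integration-specific state lives here
  llm: { model?: string; safety?: Record<string, unknown> };
  data: { text: string; tokens: Token[]; entities: Entity[] };
}

// Every processor reads and writes the same context object.
type Processor = (ctx: QirrelContextSketch) => QirrelContextSketch;

// Illustrative processor: lowercases the current text value in place.
const lowercase: Processor = (ctx) => {
  ctx.data.text = ctx.data.text.toLowerCase();
  return ctx;
};

const ctx: QirrelContextSketch = {
  meta: { requestId: "req-1", timestamp: Date.now(), source: "docs" },
  memory: {},
  llm: {},
  data: { text: "Hello WORLD", tokens: [], entities: [] },
};

const out = lowercase(ctx);
```

Because processors share one object, later stages see earlier changes to data.text without any copying between steps.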
Pipeline construction combines:
- config from src/config/loader.ts,
- defaults from src/config/defaults.ts,
- built-in processors in this typical order:
  - normalize
  - clean
  - advClean (optional)
  - extraction processors (email/phone/url/number)
  - segment
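A minimal sketch of how defaults, user config, and the typical processor order might combine. PipelineConfig, DEFAULTS, resolveConfig, and buildProcessorOrder are hypothetical; the shallow merge and the extract:* naming are assumptions, not the actual behaviour of src/config/loader.ts.

```typescript
// Hypothetical config shape; the real loader and defaults live in src/config/.
interface PipelineConfig {
  advClean: boolean;
  extractors: Array<"email" | "phone" | "url" | "number">;
}

const DEFAULTS: PipelineConfig = {
  advClean: false,
  extractors: ["email", "phone", "url", "number"],
};

// User config overrides defaults field by field (a shallow merge, assumed here).
function resolveConfig(user: Partial<PipelineConfig>): PipelineConfig {
  return { ...DEFAULTS, ...user };
}

// Produces processor names in the typical order: normalize, clean,
// optional advClean, extractors, then segment.
function buildProcessorOrder(config: PipelineConfig): string[] {
  const order: string[] = ["normalize", "clean"];
  if (config.advClean) order.push("advClean");
  for (const kind of config.extractors) order.push(`extract:${kind}`);
  order.push("segment");
  return order;
}

const order = buildProcessorOrder(resolveConfig({ advClean: true, extractors: ["email", "url"] }));
```

With advClean enabled and two extractors, the resolved order is normalize, clean, advClean, extract:email, extract:url, segment.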
When caching is enabled and a component is cacheable, Qirrel wraps that component with cache logic.
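One way such a wrapper could look, as a sketch under stated assumptions: Component, Ctx, withCache, and the name-plus-text cache key are hypothetical, and the JSON round-trip stands in for whatever deep copy Qirrel uses. It clones contexts on the way into and out of the cache, in line with this document's note on preventing mutation leaks.

```typescript
// Hypothetical component-cache wrapper; types are simplified for the sketch.
interface Ctx {
  data: { text: string; tokens: string[] };
}

interface Component {
  name: string;
  cacheable: boolean;
  run(ctx: Ctx): Promise<Ctx>;
}

// JSON round-trip deep copy; adequate for plain-data contexts in this sketch.
const clone = (ctx: Ctx): Ctx => JSON.parse(JSON.stringify(ctx)) as Ctx;

function withCache(component: Component, store = new Map<string, Ctx>()): Component {
  if (!component.cacheable) return component; // non-cacheable components run as-is
  return {
    name: component.name,
    cacheable: true,
    async run(ctx: Ctx): Promise<Ctx> {
      // Assumed key scheme: component name plus the current text value.
      const key = `${component.name}:${ctx.data.text}`;
      const hit = store.get(key);
      if (hit) return clone(hit); // clone on the way out of the cache
      const result = await component.run(clone(ctx)); // clone on the way in
      store.set(key, clone(result)); // store a private copy
      return result;
    },
  };
}
```

Keying only on text assumes the component's output depends on nothing else in the context; a component that also reads tokens or memory would need a richer key.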
Pipeline.process(text) does this:
- check the result cache by hashed text key,
- tokenize the text and create the initial context,
- emit RunStart,
- emit processor events around each component,
- cache the final result (if enabled),
- emit RunEnd,
- on failure, emit Error and then rethrow.
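The steps of Pipeline.process(text) can be sketched as a small class. PipelineSketch, the ProcessorStart/ProcessorEnd event names, the whitespace tokenizer, and the FNV-1a key hash are all assumptions for illustration; in this sketch a cache hit returns early without emitting events, which may differ from Qirrel.

```typescript
// Hypothetical sketch of the run loop described above; not Qirrel's actual code.
type EventName = "RunStart" | "ProcessorStart" | "ProcessorEnd" | "RunEnd" | "Error";
type Listener = (event: EventName, detail: string) => void;

interface Ctx {
  text: string;
  tokens: string[];
}

interface Step {
  name: string;
  run: (ctx: Ctx) => Ctx | Promise<Ctx>;
}

// FNV-1a-style 32-bit hash over UTF-16 code units; a stand-in for the real key hashing.
function hashText(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

const copyCtx = (ctx: Ctx): Ctx => ({ text: ctx.text, tokens: ctx.tokens.slice() });

class PipelineSketch {
  private readonly cache = new Map<string, Ctx>();
  private readonly steps: Step[];
  private readonly emit: Listener;
  private readonly cacheEnabled: boolean;

  constructor(steps: Step[], emit: Listener, cacheEnabled = true) {
    this.steps = steps;
    this.emit = emit;
    this.cacheEnabled = cacheEnabled;
  }

  async process(text: string): Promise<Ctx> {
    // Check the result cache by hashed text key.
    const key = hashText(text);
    const hit = this.cacheEnabled ? this.cache.get(key) : undefined;
    if (hit) return copyCtx(hit);

    // Tokenize (whitespace split stands in for the real Tokenizer) and build the context.
    let ctx: Ctx = { text, tokens: text.split(/\s+/).filter(Boolean) };
    this.emit("RunStart", text);
    try {
      // Emit events around each component.
      for (const step of this.steps) {
        this.emit("ProcessorStart", step.name);
        ctx = await step.run(ctx);
        this.emit("ProcessorEnd", step.name);
      }
      // Cache the final result, then signal completion.
      if (this.cacheEnabled) this.cache.set(key, copyCtx(ctx));
      this.emit("RunEnd", text);
      return ctx;
    } catch (err) {
      // On failure, emit Error and rethrow to the caller.
      this.emit("Error", String(err));
      throw err;
    }
  }
}
```

A 32-bit hash can collide, so a production cache would likely also compare the original text or use a stronger hash.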
processBatch uses worker-style bounded concurrency:
- validates inputs and concurrency,
- processes texts in parallel,
- preserves original order in results.
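Worker-style bounded concurrency can be expressed as a fixed number of async workers pulling indices from a shared cursor. This processBatch is a hypothetical standalone function (Qirrel's own signature may differ), and the default concurrency of 4 is an assumption.

```typescript
// Bounded-concurrency batch runner: a sketch of the behaviour described above.
async function processBatch<R>(
  texts: string[],
  processOne: (text: string) => Promise<R>,
  concurrency = 4, // assumed default
): Promise<R[]> {
  // Validate inputs and concurrency before doing any work.
  if (!Array.isArray(texts) || texts.some((t) => typeof t !== "string")) {
    throw new TypeError("texts must be an array of strings");
  }
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new RangeError("concurrency must be a positive integer");
  }

  const results = new Array<R>(texts.length);
  let next = 0; // shared cursor; safe because the event loop is single-threaded

  // Each worker repeatedly claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < texts.length) {
      const index = next++;
      // Writing by index preserves input order regardless of completion order.
      results[index] = await processOne(texts[index]);
    }
  }

  const workerCount = Math.min(concurrency, texts.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Because each worker writes to results[index] rather than pushing, output order matches input order even when later texts finish first.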
If llm.enabled is set and an API key is present:
- the adapter is initialized asynchronously during Pipeline construction,
- pipeline.init() and process() both await that initialization.
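The initialization pattern above amounts to starting a promise in the constructor and awaiting it from every async entry point. FakeAdapter, PipelineWithLlm, and LlmConfig are hypothetical stand-ins, not Qirrel's adapter classes.

```typescript
// Hypothetical sketch of eager, non-blocking adapter initialization.
interface LlmConfig {
  enabled: boolean;
  apiKey?: string;
}

// Stand-in for a provider adapter (assumption: real adapters do network setup in init()).
class FakeAdapter {
  ready = false;

  async init(): Promise<void> {
    await Promise.resolve(); // simulate asynchronous setup
    this.ready = true;
  }

  async complete(prompt: string): Promise<string> {
    return `echo:${prompt}`;
  }
}

class PipelineWithLlm {
  adapter: FakeAdapter | undefined;
  private readonly adapterReady: Promise<void>;

  constructor(llm: LlmConfig) {
    if (llm.enabled && llm.apiKey) {
      const adapter = new FakeAdapter();
      this.adapter = adapter;
      // Start initialization now without blocking the synchronous constructor.
      this.adapterReady = adapter.init();
      // Mark the promise as handled so an early failure is not reported as unhandled;
      // callers still observe the rejection when they await adapterReady.
      this.adapterReady.catch(() => undefined);
    } else {
      this.adapterReady = Promise.resolve();
    }
  }

  // init() and process() await the same promise, so setup runs only once.
  async init(): Promise<void> {
    await this.adapterReady;
  }

  async process(text: string): Promise<string> {
    await this.adapterReady;
    return this.adapter ? this.adapter.complete(text) : text;
  }
}
```

Sharing one promise means callers can skip init() and go straight to process() without racing the adapter setup.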
Qirrel currently uses:
- pipeline result cache,
- component-level cache wrappers,
- adapter-level LLM response cache.
Contexts are cloned when entering/leaving caches to prevent mutation leaks.
Agent components are intentionally separate from core parsing:
- AgentBridge for tool registration/calling,
- MCP request handler for JSON-RPC method handling,
- built-in Qirrel tool catalog for self-discovery.
See Agent-Native Integration for protocol-level details.
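A minimal sketch of a tool bridge with a JSON-RPC front end, to show how the pieces might relate. ToolBridgeSketch, handleRequest, and the count_words tool are hypothetical. The tools/list and tools/call method names come from the Model Context Protocol, but whether Qirrel's handler maps them exactly this way is an assumption, and real MCP tool results use a richer content structure than the raw value returned here.

```typescript
// Hypothetical tool bridge plus JSON-RPC dispatcher; names are illustrative only.
type ToolHandler = (args: Record<string, unknown>) => unknown;

interface ToolSpec {
  name: string;
  description: string;
  handler: ToolHandler;
}

class ToolBridgeSketch {
  private readonly tools = new Map<string, ToolSpec>();

  register(tool: ToolSpec): void {
    if (this.tools.has(tool.name)) throw new Error(`duplicate tool: ${tool.name}`);
    this.tools.set(tool.name, tool);
  }

  // Self-discovery: expose names and descriptions, not handlers.
  list(): Array<{ name: string; description: string }> {
    return Array.from(this.tools.values(), ({ name, description }) => ({ name, description }));
  }

  call(name: string, args: Record<string, unknown>): unknown {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`unknown tool: ${name}`);
    return tool.handler(args);
  }
}

type RpcId = number | string;

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: RpcId;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
}

type JsonRpcResponse =
  | { jsonrpc: "2.0"; id: RpcId; result: unknown }
  | { jsonrpc: "2.0"; id: RpcId; error: { code: number; message: string } };

// Routes JSON-RPC methods to the bridge; result shapes are simplified.
function handleRequest(bridge: ToolBridgeSketch, req: JsonRpcRequest): JsonRpcResponse {
  try {
    switch (req.method) {
      case "tools/list":
        return { jsonrpc: "2.0", id: req.id, result: { tools: bridge.list() } };
      case "tools/call":
        return {
          jsonrpc: "2.0",
          id: req.id,
          result: bridge.call(req.params?.name ?? "", req.params?.arguments ?? {}),
        };
      default:
        // -32601 is the JSON-RPC 2.0 "Method not found" code.
        return { jsonrpc: "2.0", id: req.id, error: { code: -32601, message: "Method not found" } };
    }
  } catch (err) {
    // -32603 is the JSON-RPC 2.0 "Internal error" code.
    return { jsonrpc: "2.0", id: req.id, error: { code: -32603, message: String(err) } };
  }
}
```

Keeping the bridge free of protocol details lets the same registry serve both direct AgentBridge calls and JSON-RPC requests.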
Where things live in the source tree:
- src/core/pipeline.ts: orchestration, events, batch processing, cache integration
- src/core/Tokenizer.ts: tokenization
- src/processors/*: deterministic processors
- src/config/*: config loading, defaults, env resolution
- src/llms/*: adapter abstractions and providers
- src/agent/*: tool bridge and MCP handler
- src/utils/cache/*: cache primitives and wrappers