Problem
The current implementation splits logs on newlines (\n), which breaks multiline log entries such as:
- Rust/Java stack traces
- JSON blobs spanning multiple lines
- Error messages with context
Because each line is treated as a separate log entry, patterns fail to match and entries are categorized incorrectly.
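A minimal sketch of the failure, using a made-up two-entry log (the sample is hypothetical, not taken from the project): splitting on newlines yields four "entries", and the stack-trace lines lose the timestamp and level that patterns key on.

```typescript
// Hypothetical sample: two log entries, the second followed by a Java stack trace.
const rawLog = [
  "2024-05-01 12:00:00 INFO Server started",
  "2024-05-01 12:00:05 ERROR Request failed",
  "java.lang.IllegalStateException: connection closed",
  "\tat com.example.Handler.handle(Handler.java:42)",
].join("\n");

// Current behaviour: every physical line becomes its own entry.
const naiveEntries = rawLog.split("\n");

console.log(naiveEntries.length); // 4, although the log holds only 2 entries
// The exception line carries no timestamp or level, so a pattern keyed on them cannot match it.
console.log(/ERROR/.test(naiveEntries[2])); // false
```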
Proposed Solution
Use a "first-line pattern" approach (same as Filebeat/Logstash/Fluentd):
- Detect the entry start pattern: the LLM analyzes sample logs and identifies a regex that matches the START of a new log entry (e.g., timestamp + log level)
- Merge continuation lines: lines that do not match the pattern are appended to the previous entry
- Process merged entries: patterns and the pipeline operate on complete log entries
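The three steps above can be sketched end to end as follows. This is a minimal illustration under assumptions: the LLM call is replaced by a stub behind a hypothetical DetectFirstLinePattern function type, mergeLines and toLogEntries are hypothetical names, and sampleSize (how many lines are sent to the LLM) is an assumed parameter.

```typescript
// Hypothetical stand-in for the LLM step: sample lines in, first-line regex out.
// The returned regex is assumed to have no g/y flag (RegExp.test() would then be stateful).
type DetectFirstLinePattern = (sample: string[]) => RegExp;

// Merge step: lines that do not start a new entry are appended to the previous one.
function mergeLines(lines: string[], start: RegExp): string[] {
  const entries: string[] = [];
  for (const line of lines) {
    if (entries.length === 0 || start.test(line)) {
      entries.push(line); // first line, or a new entry begins
    } else {
      entries[entries.length - 1] += "\n" + line; // continuation line
    }
  }
  return entries;
}

function toLogEntries(raw: string, detect: DetectFirstLinePattern, sampleSize = 50): string[] {
  // Drop one trailing newline so it does not become an empty continuation line.
  const lines = raw.replace(/\r?\n$/, "").split(/\r?\n/);
  // Step 1: detect the entry-start pattern from a sample of lines.
  const firstLine = detect(lines.slice(0, sampleSize));
  // Step 2: merge continuation lines. Step 3 (existing patterns/pipeline) would consume the result.
  return mergeLines(lines, firstLine);
}

// Usage with a stub in place of the LLM call:
const stubDetect: DetectFirstLinePattern = () => /^\d{4}-\d{2}-\d{2}.*?(DEBUG|INFO|WARN|ERROR)/;
const entries = toLogEntries(
  "2024-05-01 12:00:05 ERROR Request failed\n" +
    "java.lang.IllegalStateException: boom\n" +
    "2024-05-01 12:00:06 INFO Retrying\n",
  stubDetect,
);
console.log(entries.length); // 2
```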
Implementation
New function: detectFirstLinePattern()
- LLM call to detect the first-line regex
- Returns a pattern such as the following (a leading YYYY-MM-DD date, then a log level):
  ^\d{4}-\d{2}-\d{2}.*?(DEBUG|INFO|WARN|ERROR)
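The LLM's reply probably needs validation before it is trusted. A minimal sketch of one way to do that, assuming the reply contains just the regex (possibly wrapped in backticks or slashes); parseFirstLinePattern is a hypothetical helper, and the specific checks are assumptions rather than part of the proposal:

```typescript
// Hypothetical helper: turn the LLM's raw reply into a usable first-line regex.
// Returns null when the reply is unusable; the caller decides how to fall back.
function parseFirstLinePattern(reply: string, sample: string[]): RegExp | null {
  // Assumption: the model may wrap the regex in backticks or /slashes/.
  const source = reply
    .trim()
    .replace(/^`+|`+$/g, "")
    .replace(/^\/(.*)\/$/, "$1");
  if (source === "") return null;

  let pattern: RegExp;
  try {
    pattern = new RegExp(source); // no g/y flag, so test() keeps no state between lines
  } catch {
    return null; // not a valid JavaScript regex
  }

  // Assumption: the sample starts at an entry boundary, so its first
  // non-empty line must match; otherwise the pattern is rejected.
  const firstNonEmpty = sample.find((line) => line.trim() !== "");
  if (firstNonEmpty === undefined || !pattern.test(firstNonEmpty)) return null;
  return pattern;
}

const sampleLines = [
  "2024-05-01 12:00:05 ERROR Request failed",
  "\tat com.example.Handler.handle(Handler.java:42)",
];
console.log(parseFirstLinePattern("`^\\d{4}-\\d{2}-\\d{2}.*?(DEBUG|INFO|WARN|ERROR)`", sampleLines) !== null); // true
console.log(parseFirstLinePattern("(unclosed", sampleLines)); // null
```

Returning null rather than throwing leaves the fallback policy (ask the LLM again, or keep one entry per line as today) to the caller.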
New function: mergeMultilineEntries()
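```typescript
function mergeMultilineEntries(lines: string[], firstLinePattern: RegExp): string[] {
  // Strip g/y flags: RegExp.test() on a global/sticky regex advances lastIndex
  // between calls, which would make each match depend on the previous line.
  const startRe = new RegExp(firstLinePattern.source, firstLinePattern.flags.replace(/[gy]/g, ""));
  const entries: string[] = [];
  let currentEntry: string | null = null;
  for (const line of lines) {
    if (startRe.test(line)) {
      // A new entry begins: flush the previous one.
      if (currentEntry !== null) entries.push(currentEntry);
      currentEntry = line;
    } else if (currentEntry === null) {
      // Continuation lines before the first match (e.g. the file starts
      // mid-entry) start an entry instead of gaining a leading "\n".
      currentEntry = line;
    } else {
      // Continuation line: append it to the current entry.
      currentEntry += "\n" + line;
    }
  }
  if (currentEntry !== null) entries.push(currentEntry);
  return entries;
}
```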
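One risk of an LLM-detected pattern is that it matches too rarely and merges most of the file into one entry. Filebeat bounds this with multiline.max_lines (500 by default; lines beyond it are discarded). A minimal sketch of a similar cap, written as a streaming generator so lines need not all be held in memory; mergeMultilineStream and maxLines are hypothetical, and starting a new entry instead of discarding lines is an assumed design choice:

```typescript
// Streaming variant of mergeMultilineEntries with a hypothetical maxLines cap.
function* mergeMultilineStream(
  lines: Iterable<string>,
  firstLinePattern: RegExp,
  maxLines = 500, // assumed default, mirroring Filebeat's multiline.max_lines
): Generator<string> {
  // Drop g/y flags so test() does not carry lastIndex between lines.
  const startRe = new RegExp(firstLinePattern.source, firstLinePattern.flags.replace(/[gy]/g, ""));
  let current: string[] = [];
  for (const line of lines) {
    // Flush on a first-line match, or when the current entry is full
    // (unlike Filebeat, which discards the extra lines instead).
    if (current.length > 0 && (startRe.test(line) || current.length >= maxLines)) {
      yield current.join("\n");
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) yield current.join("\n");
}

// Usage: a cap of 2 splits a 4-line entry into two 2-line entries.
const traceLines = ["2024-05-01 12:00:05 ERROR boom", "  at a", "  at b", "  at c"];
const capped = Array.from(mergeMultilineStream(traceLines, /^\d{4}-/, 2));
console.log(capped.length); // 2
```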
CLI flag
--first-line-pattern <regex>: skip LLM detection and use the provided pattern
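A minimal sketch of how the flag could take precedence over LLM detection. resolveFirstLinePattern is a hypothetical name, detection is injected as a callback, and plain argv scanning stands in for whatever argument parser the CLI actually uses:

```typescript
// Hypothetical helper: use --first-line-pattern <regex> when given,
// otherwise fall back to LLM detection (injected as `detect`).
function resolveFirstLinePattern(argv: string[], detect: () => RegExp): RegExp {
  const i = argv.indexOf("--first-line-pattern");
  if (i === -1) return detect(); // flag absent: run LLM detection

  const source = argv[i + 1];
  if (source === undefined) {
    throw new Error("--first-line-pattern requires a <regex> argument");
  }
  try {
    return new RegExp(source); // no g/y flag, so test() stays stateless
  } catch (err) {
    throw new Error(`Invalid --first-line-pattern regex ${JSON.stringify(source)}: ${(err as Error).message}`);
  }
}

// Usage: the user-supplied pattern wins and the detector is never called.
const fromFlag = resolveFirstLinePattern(["--first-line-pattern", "^\\[\\d{4}"], () => /unused/);
console.log(fromFlag.test("[2024-05-01 12:00:05] ERROR boom")); // true
```

Failing fast on an invalid regex seems preferable to silently falling back to LLM detection, since the user explicitly asked for a specific pattern.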