BOR-519: Implement ordered event parser pipeline #28
bdclaw2026 wants to merge 1 commit into STRRL:master
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces an extensible event parsing mechanism that centralizes the logic for converting diverse log line formats into a standardized Event structure. This lets the system ingest and process various log sources consistently, laying the groundwork for more reliable data analysis and monitoring.
Code Review
The pull request introduces an ordered event parser pipeline, which is a well-structured approach to handling various log formats. The implementation covers JSON, logfmt, key-value, prefix extraction, and plain-text fallback, demonstrating a comprehensive parsing strategy. The included tests are thorough and effectively validate the precedence and fallback behaviors of the parsers. Overall, the changes are well-implemented and contribute positively to the codebase.
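The precedence and fallback behavior described above can be illustrated with a minimal sketch. It is not the code from this pull request: the `lineParser` interface, `parseLine`, `jsonLineParser`, `ParseResult`, `SourceFormatJSON`, and the field layout of `Event` are assumptions made for illustration; only the names `Event`, `SourceFormatPlainText`, and `plainTextLineParser` appear in the review itself. Only two of the five stages are shown.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// SourceFormat and the types below are hypothetical stand-ins,
// not the definitions used in the pull request.
type SourceFormat string

const (
	SourceFormatJSON      SourceFormat = "json"
	SourceFormatPlainText SourceFormat = "plain_text"
)

type Event struct {
	Message string
	Fields  map[string]any
}

type ParseResult struct {
	SourceFormat SourceFormat
	Event        Event
}

// lineParser reports ok=false when it does not recognize the line,
// which lets the pipeline fall through to the next parser.
type lineParser interface {
	Parse(line string) (ParseResult, bool)
}

type jsonLineParser struct{}

func (jsonLineParser) Parse(line string) (ParseResult, bool) {
	trimmed := strings.TrimSpace(line)
	if !strings.HasPrefix(trimmed, "{") {
		return ParseResult{}, false
	}
	var fields map[string]any
	if err := json.Unmarshal([]byte(trimmed), &fields); err != nil {
		return ParseResult{}, false // malformed JSON: let a later parser try
	}
	msg, _ := fields["message"].(string)
	return ParseResult{SourceFormat: SourceFormatJSON, Event: Event{Message: msg, Fields: fields}}, true
}

// plainTextLineParser accepts every line, so it must come last.
type plainTextLineParser struct{}

func (plainTextLineParser) Parse(line string) (ParseResult, bool) {
	return ParseResult{SourceFormat: SourceFormatPlainText, Event: Event{Message: strings.TrimSpace(line)}}, true
}

// parseLine tries each parser in order and returns the first success.
func parseLine(parsers []lineParser, line string) (ParseResult, bool) {
	for _, p := range parsers {
		if res, ok := p.Parse(line); ok {
			return res, true
		}
	}
	return ParseResult{}, false
}

func main() {
	pipeline := []lineParser{jsonLineParser{}, plainTextLineParser{}}
	for _, line := range []string{`{"message":"disk full"}`, `{not json`, `service started`} {
		res, _ := parseLine(pipeline, line)
		fmt.Printf("%-12s %q\n", res.SourceFormat, res.Event.Message)
	}
}
```

Because the slice order encodes precedence, adding a format means inserting one parser at the right position, and the catch-all parser at the end guarantees that every line yields an event.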
timestampPrefixWithLevelPattern = regexp.MustCompile(`^\s*([0-9]{4}-[0-9]{2}-[0-9]{2}(?:[T ][0-9]{2}:[0-9]{2}:[0-9]{2}(?:\.[0-9]+)?(?:Z|[+-][0-9]{2}:[0-9]{2})?))\s+\[?([A-Za-z]+)\]?:?\b`)
timestampPrefixPattern = regexp.MustCompile(`^\s*([0-9]{4}-[0-9]{2}-[0-9]{2}(?:[T ][0-9]{2}:[0-9]{2}:[0-9]{2}(?:\.[0-9]+)?(?:Z|[+-][0-9]{2}:[0-9]{2})?))\b`)
levelPrefixPattern = regexp.MustCompile(`^\s*\[?([A-Za-z]+)\]?:?\b`)
The timestamp pattern is quite complex and is repeated in timestampPrefixWithLevelPattern and timestampPrefixPattern. Extracting this common sub-pattern into a separate named constant or a helper function could significantly improve readability and maintainability of these regular expressions. This would make it easier to understand and modify the timestamp matching logic in the future.
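One way to confirm that such an extraction is behavior-preserving is to compile both forms and compare them. The sketch below is a standalone check under stated assumptions: the names `timestampFragment`, `inlineWithLevel`, and `composedWithLevel` are hypothetical, and the sample log lines are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// timestampFragment is a hypothetical name for the shared sub-pattern.
const timestampFragment = `[0-9]{4}-[0-9]{2}-[0-9]{2}(?:[T ][0-9]{2}:[0-9]{2}:[0-9]{2}(?:\.[0-9]+)?(?:Z|[+-][0-9]{2}:[0-9]{2})?)`

var (
	// The pattern as written inline in the diff.
	inlineWithLevel = regexp.MustCompile(`^\s*([0-9]{4}-[0-9]{2}-[0-9]{2}(?:[T ][0-9]{2}:[0-9]{2}:[0-9]{2}(?:\.[0-9]+)?(?:Z|[+-][0-9]{2}:[0-9]{2})?))\s+\[?([A-Za-z]+)\]?:?\b`)
	// The same pattern assembled from the shared fragment.
	composedWithLevel = regexp.MustCompile(`^\s*(` + timestampFragment + `)\s+\[?([A-Za-z]+)\]?:?\b`)
)

func main() {
	// Identical source text means identical matching behavior.
	fmt.Println("identical source:", inlineWithLevel.String() == composedWithLevel.String())

	// Made-up sample lines; submatch 1 is the timestamp, submatch 2 the level.
	lines := []string{
		"2024-01-02T03:04:05Z [INFO] started",
		"2024-01-02 03:04:05.123+08:00 warn: slow",
		"no timestamp here",
	}
	for _, line := range lines {
		fmt.Printf("%q -> %q\n", line, composedWithLevel.FindStringSubmatch(line))
	}
}
```

Since raw string concatenation happens before compilation, the composed expression is the same string as the inline one, so the refactoring changes readability only.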
var (
	timestampPattern                = `[0-9]{4}-[0-9]{2}-[0-9]{2}(?:[T ][0-9]{2}:[0-9]{2}:[0-9]{2}(?:\.[0-9]+)?(?:Z|[+-][0-9]{2}:[0-9]{2})?)`
	timestampPrefixWithLevelPattern = regexp.MustCompile(`^\s*(` + timestampPattern + `)\s+\[?([A-Za-z]+)\]?:?\b`)
	timestampPrefixPattern          = regexp.MustCompile(`^\s*(` + timestampPattern + `)\b`)
	levelPrefixPattern              = regexp.MustCompile(`^\s*\[?([A-Za-z]+)\]?:?\b`)
)

	SourceFormat: SourceFormatPlainText,
	Event:        event,
}, true
When prefixLineParser successfully extracts a timestamp or level, it still categorizes the SourceFormat as SourceFormatPlainText. While this parser acts as a fallback before the ultimate plainTextLineParser, labeling it as PlainText might not fully reflect that structured information (timestamp, level) was successfully extracted. Consider introducing a more specific SourceFormatPrefix to better distinguish lines where prefixes were parsed from truly unstructured plain text lines. This would improve the granularity of source format identification.
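A minimal sketch of what this suggestion might look like follows. Everything beyond the names `SourceFormatPlainText`, `SourceFormatPrefix`, and `Event` is an assumption: the string values of the constants, the `ParseResult` shape, the `knownLevels` set, and the simplified `parsePrefix` function (which uses plain string handling rather than the regular expressions from the pull request) are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

type SourceFormat string

const (
	SourceFormatPlainText SourceFormat = "plain_text"
	// SourceFormatPrefix is the value proposed in the review; its string
	// representation here is an assumption.
	SourceFormatPrefix SourceFormat = "prefix"
)

// Event and ParseResult are hypothetical stand-ins for the real types.
type Event struct {
	Level   string
	Message string
}

type ParseResult struct {
	SourceFormat SourceFormat
	Event        Event
}

// knownLevels is an assumed allow-list so ordinary words are not
// mistaken for log levels.
var knownLevels = map[string]bool{"DEBUG": true, "INFO": true, "WARN": true, "ERROR": true}

// parsePrefix is a simplified stand-in for prefixLineParser: it looks
// only at a leading level token such as "[ERROR]" or "warn:".
func parsePrefix(line string) (ParseResult, bool) {
	head, rest, _ := strings.Cut(strings.TrimSpace(line), " ")
	level := strings.ToUpper(strings.Trim(head, "[]:"))
	if !knownLevels[level] {
		return ParseResult{}, false
	}
	return ParseResult{
		SourceFormat: SourceFormatPrefix, // structured data was extracted
		Event:        Event{Level: level, Message: strings.TrimSpace(rest)},
	}, true
}

func main() {
	for _, line := range []string{"[ERROR] disk full", "warn: slow query", "service started"} {
		res, ok := parsePrefix(line)
		if !ok {
			// No prefix found: the plain-text fallback would label it.
			res = ParseResult{SourceFormat: SourceFormatPlainText, Event: Event{Message: line}}
		}
		fmt.Printf("%-10s level=%q message=%q\n", res.SourceFormat, res.Event.Level, res.Event.Message)
	}
}
```

With a separate value, downstream consumers could count how many lines carried a recognizable prefix versus how many were entirely unstructured, which a single `SourceFormatPlainText` label does not allow.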
Summary
pkg/eventValidation