
BOR-519: Implement ordered event parser pipeline #28

Open
bdclaw2026 wants to merge 1 commit into STRRL:master from bdclaw2026:bdclaw/bor-519-phase-3parser-pipeline-implement-ordered-parsers-and

@bdclaw2026
Contributor

Summary

  • add an ordered raw-line parser pipeline in pkg/event
  • support JSON, quoted logfmt, generic key=value, prefix extraction, and plain-text fallback
  • add parser tests for precedence, fallback behavior, and raw-line preservation
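A minimal sketch of the ordered, first-match-wins idea behind the pipeline. The types and names here (`Event`, `lineParser`, `parseLine`, the `"json"`/`"plain"` labels) are hypothetical simplifications, not the PR's actual API: each parser either accepts a line or declines, the first parser to accept wins, the original line is always kept in `Raw`, and a catch-all plain-text parser at the end guarantees every line produces an event.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Event is a hypothetical, simplified stand-in for the PR's Event type.
type Event struct {
	Raw     string // original line, always preserved verbatim
	Format  string // label of the parser that accepted the line
	Message string
	Attrs   map[string]string
}

// lineParser returns ok=false when it does not recognise the line,
// letting the pipeline fall through to the next parser.
type lineParser func(line string) (Event, bool)

// jsonParser accepts lines that decode to a JSON object.
func jsonParser(line string) (Event, bool) {
	var fields map[string]any
	// Unmarshal leaves fields nil for the literal `null`, so reject that too.
	if err := json.Unmarshal([]byte(line), &fields); err != nil || fields == nil {
		return Event{}, false
	}
	attrs := make(map[string]string, len(fields))
	for k, v := range fields {
		attrs[k] = fmt.Sprint(v)
	}
	return Event{Format: "json", Message: attrs["msg"], Attrs: attrs}, true
}

// plainTextParser accepts everything, so it must be last.
func plainTextParser(line string) (Event, bool) {
	return Event{Format: "plain", Message: strings.TrimSpace(line)}, true
}

// parsers is ordered from most to least specific.
var parsers = []lineParser{jsonParser, plainTextParser}

func parseLine(line string) Event {
	for _, p := range parsers {
		if ev, ok := p(line); ok {
			ev.Raw = line // raw-line preservation happens in one place
			return ev
		}
	}
	// Not reached while plainTextParser is last; kept as a safe default.
	return Event{Raw: line, Format: "plain", Message: line}
}

func main() {
	fmt.Println(parseLine(`{"msg":"started","level":"info"}`).Format)
	fmt.Println(parseLine("just some text").Format)
}
```

Keeping the catch-all last is what makes precedence explicit: reordering the slice changes which parser wins for ambiguous lines, which is the behavior precedence tests pin down.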

Validation

  • make unit-test
  • make lint

@gemini-code-assist

Summary of Changes

This pull request centralizes the conversion of raw log lines in several formats into a standardized Event structure, so that different log sources are ingested through a single, ordered parsing path.

Highlights

  • Event Parsing Pipeline: Implemented an ordered raw-line parser pipeline within the pkg/event package.
  • Supported Formats: Added support for parsing JSON, quoted logfmt, generic key=value pairs, prefix extraction, and plain-text fallback.
  • Testing: Included comprehensive parser tests to validate precedence, fallback behavior, and raw-line preservation.
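Quoted logfmt is the trickiest tokenizing case, since a quoted value may contain spaces and escaped quotes. Below is a simplified sketch, assuming space-separated pairs and Go-style escapes inside quotes (decoded here with `strconv.Unquote`). `parseLogfmt` is a hypothetical helper; the PR's logfmtLineParser may treat edge cases such as bare keys differently.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseLogfmt splits a space-separated logfmt-style line into key/value pairs.
// Quoted values may contain spaces and backslash escapes; bare values run to
// the next space; a token without '=' is stored as a key with an empty value.
func parseLogfmt(line string) (map[string]string, error) {
	out := make(map[string]string)
	i := 0
	for i < len(line) {
		// Skip separators between pairs.
		for i < len(line) && line[i] == ' ' {
			i++
		}
		if i >= len(line) {
			break
		}
		start := i
		for i < len(line) && line[i] != '=' && line[i] != ' ' {
			i++
		}
		key := line[start:i]
		if i >= len(line) || line[i] == ' ' {
			out[key] = "" // bare key, no '='
			continue
		}
		i++ // consume '='
		if i < len(line) && line[i] == '"' {
			// Find the closing quote, skipping over escaped characters.
			j := i + 1
			for j < len(line) && line[j] != '"' {
				if line[j] == '\\' {
					j++
				}
				j++
			}
			if j >= len(line) {
				return nil, fmt.Errorf("unterminated quote for key %q", key)
			}
			val, err := strconv.Unquote(line[i : j+1])
			if err != nil {
				return nil, err
			}
			out[key] = val
			i = j + 1
			continue
		}
		start = i
		for i < len(line) && line[i] != ' ' {
			i++
		}
		out[key] = line[start:i]
	}
	return out, nil
}

func main() {
	fields, err := parseLogfmt(`level=info msg="hello world" user=alice`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(fields["level"], fields["msg"], fields["user"])
}
```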
Changelog
  • pkg/event/parser.go
    • Implemented the ParseLine function which orchestrates the parsing process through a chain of specialized parsers.
    • Defined jsonLineParser, logfmtLineParser, keyValueLineParser, prefixLineParser, and plainTextLineParser to handle different log formats.
    • Included utility functions for timestamp parsing, level canonicalization, and key-value assignment.
  • pkg/event/parser_test.go
    • Added TestParseLine_JSON to verify JSON parsing.
    • Added TestParseLine_Logfmt to verify logfmt parsing.
    • Added TestParseLine_KeyValue to verify key-value parsing.
    • Added TestParseLine_PrefixFallback to verify prefix-based parsing and fallback.
    • Added TestParseLine_PlainTextFallback to verify plain text fallback.
    • Added TestParseLine_OrderedParsers to confirm parser precedence rules.
    • Included helper functions assertAttr and assertTimestamp for test assertions.

gemini-code-assist bot left a comment

Code Review

The pull request introduces an ordered event parser pipeline that handles JSON, logfmt, generic key=value, prefix extraction, and a plain-text fallback. The included tests cover parser precedence, fallback behavior, and raw-line preservation. Overall, the changes are well implemented.

Comment on lines +51 to +53
timestampPrefixWithLevelPattern = regexp.MustCompile(`^\s*([0-9]{4}-[0-9]{2}-[0-9]{2}(?:[T ][0-9]{2}:[0-9]{2}:[0-9]{2}(?:\.[0-9]+)?(?:Z|[+-][0-9]{2}:[0-9]{2})?))\s+\[?([A-Za-z]+)\]?:?\b`)
timestampPrefixPattern = regexp.MustCompile(`^\s*([0-9]{4}-[0-9]{2}-[0-9]{2}(?:[T ][0-9]{2}:[0-9]{2}:[0-9]{2}(?:\.[0-9]+)?(?:Z|[+-][0-9]{2}:[0-9]{2})?))\b`)
levelPrefixPattern = regexp.MustCompile(`^\s*\[?([A-Za-z]+)\]?:?\b`)

medium

The timestamp sub-pattern is long and duplicated verbatim in timestampPrefixWithLevelPattern and timestampPrefixPattern. Extracting it into a single named constant would make these regular expressions easier to read and keep the timestamp-matching logic in one place if it ever needs to change.

const timestampPattern = `[0-9]{4}-[0-9]{2}-[0-9]{2}(?:[T ][0-9]{2}:[0-9]{2}:[0-9]{2}(?:\.[0-9]+)?(?:Z|[+-][0-9]{2}:[0-9]{2})?)`

var (
	timestampPrefixWithLevelPattern = regexp.MustCompile(`^\s*(` + timestampPattern + `)\s+\[?([A-Za-z]+)\]?:?\b`)
	timestampPrefixPattern          = regexp.MustCompile(`^\s*(` + timestampPattern + `)\b`)
	levelPrefixPattern              = regexp.MustCompile(`^\s*\[?([A-Za-z]+)\]?:?\b`)
)
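To show what the shared pattern captures, here is a runnable sketch that builds the three expressions from one constant and applies them in order. `extractPrefix` is a hypothetical helper, not code from the PR. Note that `levelPrefixPattern` accepts any leading word, so a real parser would presumably check the captured level against a known set.

```go
package main

import (
	"fmt"
	"regexp"
)

// Shared timestamp sub-pattern, as proposed in the review suggestion.
const timestampPattern = `[0-9]{4}-[0-9]{2}-[0-9]{2}(?:[T ][0-9]{2}:[0-9]{2}:[0-9]{2}(?:\.[0-9]+)?(?:Z|[+-][0-9]{2}:[0-9]{2})?)`

var (
	timestampPrefixWithLevelPattern = regexp.MustCompile(`^\s*(` + timestampPattern + `)\s+\[?([A-Za-z]+)\]?:?\b`)
	timestampPrefixPattern          = regexp.MustCompile(`^\s*(` + timestampPattern + `)\b`)
	levelPrefixPattern              = regexp.MustCompile(`^\s*\[?([A-Za-z]+)\]?:?\b`)
)

// extractPrefix (hypothetical) tries timestamp+level, then timestamp only,
// then level only, and returns whatever the first matching pattern captured.
// It does not validate the level word.
func extractPrefix(line string) (ts, level string) {
	if m := timestampPrefixWithLevelPattern.FindStringSubmatch(line); m != nil {
		return m[1], m[2]
	}
	if m := timestampPrefixPattern.FindStringSubmatch(line); m != nil {
		return m[1], ""
	}
	if m := levelPrefixPattern.FindStringSubmatch(line); m != nil {
		return "", m[1]
	}
	return "", ""
}

func main() {
	for _, line := range []string{
		"2024-05-01T12:00:00Z INFO server started",
		"2024-05-01 12:00:00.123+02:00 warn: disk low",
		"2024-05-01T12:00:00Z 42 requests served",
		"ERROR connection refused",
	} {
		ts, level := extractPrefix(line)
		fmt.Printf("ts=%q level=%q\n", ts, level)
	}
}
```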

Comment on lines +153 to +155
SourceFormat: SourceFormatPlainText,
Event: event,
}, true

medium

When prefixLineParser successfully extracts a timestamp or level, it still labels the result SourceFormatPlainText. Because this parser runs as the fallback just before the final plainTextLineParser, lines with a parsed prefix become indistinguishable from truly unstructured lines. Consider introducing a dedicated SourceFormatPrefix so that downstream consumers can tell the two apart.
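A small sketch of this suggestion, assuming `SourceFormat` is a string-based type. Only `SourceFormatPlainText` appears in the diff excerpt; the other constant values, `prefixResult`, and `formatFor` are hypothetical.

```go
package main

import "fmt"

// SourceFormat and these values are assumptions; only SourceFormatPlainText
// is named in the diff excerpt.
type SourceFormat string

const (
	SourceFormatJSON      SourceFormat = "json"
	SourceFormatLogfmt    SourceFormat = "logfmt"
	SourceFormatKeyValue  SourceFormat = "key_value"
	SourceFormatPrefix    SourceFormat = "prefix" // the value the review proposes
	SourceFormatPlainText SourceFormat = "plain_text"
)

// prefixResult is a hypothetical, trimmed-down result of prefix extraction.
type prefixResult struct {
	Timestamp string
	Level     string
	Message   string
}

// formatFor labels a line as Prefix only when structure was actually found,
// so purely unstructured text keeps the PlainText label.
func formatFor(r prefixResult) SourceFormat {
	if r.Timestamp != "" || r.Level != "" {
		return SourceFormatPrefix
	}
	return SourceFormatPlainText
}

func main() {
	fmt.Println(formatFor(prefixResult{Level: "ERROR", Message: "connection refused"}))
	fmt.Println(formatFor(prefixResult{Message: "just text"}))
}
```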
