Skip to content

[refactor] Semantic function clustering: duplicates and outliers in pkg/workflow #17762

@github-actions

Description

@github-actions

Automated semantic analysis of all non-test Go files in pkg/workflow/ (240+ files) and utility packages (pkg/stringutil, pkg/sliceutil, etc.) identified several concrete refactoring opportunities through function clustering and duplicate detection.

Overview

Metric Value
Go files analyzed 240+ (pkg/workflow) + 7 utility packages
Duplicate patterns found 4 concrete cases
Outlier functions found 2
Estimated code reduction ~700 lines
Detection method Serena semantic analysis + naming pattern clustering

Critical Issues

Issue 1: Duplicate Mount Syntax Validation

Two functions implement nearly identical mount string validation (source:dest:mode format) in separate files:

  • pkg/workflow/sandbox_validation.govalidateMountsSyntax()
  • pkg/workflow/mcp_config_validation.govalidateMCPMountsSyntax()

Both split on :, check for exactly 3 parts, and validate that mode is "ro" or "rw". The only differences are error types (NewValidationError vs fmt.Errorf) and that the sandbox version also validates non-empty source/dest fields.

Recommendation: Extract a shared validateMountFormat(mounts []string, toolName string) error function into validation_helpers.go or a new mounts_validation.go, then call it from both files with appropriate error wrapping.

Issue 2: Near-Identical missing_data.go and missing_tool.go

pkg/workflow/missing_data.go and pkg/workflow/missing_tool.go are structurally identical (~164 lines each, ~250 duplicated lines):

View duplicate struct definitions
// missing_data.go
type MissingDataConfig struct {
    BaseSafeOutputConfig   `yaml:",inline"`
    CreateIssue bool       `yaml:"create-issue,omitempty"`
    TitlePrefix string     `yaml:"title-prefix,omitempty"`
    Labels      []string   `yaml:"labels,omitempty"`
}

// missing_tool.go  (IDENTICAL structure, only type name differs)
type MissingToolConfig struct {
    BaseSafeOutputConfig   `yaml:",inline"`
    CreateIssue bool       `yaml:"create-issue,omitempty"`
    TitlePrefix string     `yaml:"title-prefix,omitempty"`
    Labels      []string   `yaml:"labels,omitempty"`
}

Both parseMissingDataConfig() / parseMissingToolConfig() and buildCreateOutputMissingDataJob() / buildCreateOutputMissingToolJob() follow exactly the same pattern with only the output type name and environment variable prefix differing (e.g., GH_AW_MISSING_DATA_MAX vs GH_AW_MISSING_TOOL_MAX).

Recommendation: Consider a shared BaseMissingOutputConfig type and a generic buildCreateOutputMissingJob(outputType string, config BaseMissingOutputConfig) builder, reducing ~250 lines to ~80 lines of shared code. noop.go follows the same outer parse pattern and could share the same boilerplate helper.

Issue 3: Safe Output Handler Parse Function Boilerplate (11 files)

The following files all implement the same three-part pattern — a config struct embedding BaseSafeOutputConfig + SafeOutputTargetConfig, a parseXxxConfig() method on *Compiler, and a build function:

add_comment.go, add_labels.go, add_reviewer.go, assign_milestone.go, assign_to_agent.go, assign_to_user.go, hide_comment.go, link_sub_issue.go, remove_labels.go, reply_to_pr_review_comment.go, resolve_pr_review_thread.go

View duplicated parse function pattern
// add_labels.go (lines 20-40)
func (c *Compiler) parseAddLabelsConfig(outputMap map[string]any) *AddLabelsConfig {
    if _, exists := outputMap["add-labels"]; !exists {
        return nil
    }
    addLabelsLog.Print("Parsing add-labels configuration")
    var config AddLabelsConfig
    if err := unmarshalConfig(outputMap, "add-labels", &config, addLabelsLog); err != nil {
        addLabelsLog.Printf("Failed to unmarshal config: %v", err)
        return &AddLabelsConfig{}
    }
    return &config
}

// remove_labels.go (lines 18-38) — near-identical
func (c *Compiler) parseRemoveLabelsConfig(outputMap map[string]any) *RemoveLabelsConfig {
    if _, exists := outputMap["remove-labels"]; !exists {
        return nil
    }
    removeLabelsLog.Print("Parsing remove-labels configuration")
    var config RemoveLabelsConfig
    if err := unmarshalConfig(outputMap, "remove-labels", &config, removeLabelsLog); err != nil {
        removeLabelsLog.Printf("Failed to unmarshal config: %v", err)
        return &RemoveLabelsConfig{}
    }
    return &config
}

The AssignToUserConfig and UnassignFromUserConfig structs are also nearly identical:

// Both have: BaseSafeOutputConfig, SafeOutputTargetConfig, Allowed []string, Blocked []string
// UnassignFromUser has one extra field: UnassignFirst *string

Recommendation: Extract a generic parse helper (similar to existing unmarshalConfig) that handles the nil-check, logging, and unmarshal boilerplate. Consider also whether assign_to_user.go and unassign_from_user.go could share a common base struct or live in one file.

Issue 4: SanitizeWorkflowName Duplication Across Packages

pkg/workflow/strings.go defines SanitizeWorkflowName() and SanitizeName(), while pkg/stringutil/identifiers.go defines NormalizeWorkflowName() — both sanitize workflow names for use in filenames/identifiers. The workflow package has a general-purpose SanitizeName() function with configurable options that overlaps with pkg/stringutil's sanitization family (SanitizeParameterName, SanitizePythonVariableName, SanitizeToolID).

Recommendation: Evaluate whether SanitizeName() with its options pattern should be moved to pkg/stringutil as the canonical sanitization primitive, with pkg/workflow/strings.go importing it. This would centralize sanitization logic.


Outlier Functions

View outlier details

Outlier 1: extractAddDirPaths() in copilot_engine.go

The function extractAddDirPaths() is a standalone package-level function in copilot_engine.go. All other engine files keep only constructor (NewXxxEngine()) and interface method implementations. This private helper function is an outlier for the file.

Recommendation: Move to copilot_engine_execution.go (where the directory add behavior is likely used), or to engine_helpers.go.

Outlier 2: generateCopilotSessionFileCopyStep() in copilot_logs.go

This function generates a YAML step for copying session files, but it lives in the log parsing file. It is closer in nature to execution/installation step generation than log parsing.

Recommendation: Move to copilot_engine_execution.go where other Copilot execution steps are generated.


Function Clusters (Well-Organized Areas)

View cluster analysis

The following areas show good organization and do not need refactoring:

Engine Interface Pattern (4 files): claude_engine.go, codex_engine.go, gemini_engine.go, copilot_engine.go all correctly implement the same engine interface. The per-engine split is intentional and appropriate.

Bundler Validation (3 files): bundler_runtime_validation.go, bundler_safety_validation.go, bundler_script_validation.go each validate genuinely distinct concerns (runtime mixing, bundle completeness, API compatibility) and are correctly separated.

Firewall Validation (2 files): firewall_validation.go (log-level validation) and network_firewall_validation.go (cross-field dependency validation like allow-urls requires ssl-bump) serve different validation dimensions and are appropriately split.

compiler_safe_outputs_ series (8 files):* Clear responsibility split by concern (job building, step building, environment setup, specialized handlers). Well-organized.

Utility Packages: pkg/sliceutil, pkg/mathutil, pkg/fileutil, pkg/gitutil, pkg/envutil, pkg/timeutil are all cleanly scoped with no cross-package duplication detected.


Recommendations by Priority

Priority 1 — High Impact, Low Risk

  1. Extract shared mount validation (validateMountsSyntax / validateMCPMountsSyntax) into a single function in validation_helpers.go
  2. Move extractAddDirPaths() from copilot_engine.go to copilot_engine_execution.go
  3. Move generateCopilotSessionFileCopyStep() from copilot_logs.go to copilot_engine_execution.go

Priority 2 — Medium Impact

  1. Extract generic parse helper for the repeated parseXxxConfig() boilerplate across the 11 safe output handler files
  2. Consolidate missing_data.go and missing_tool.go build functions via a shared base type or factory

Priority 3 — Long-term

  1. Evaluate moving SanitizeName() to pkg/stringutil as the canonical sanitization primitive

Implementation Checklist

  • Extract validateMountFormat() helper from sandbox and MCP mount validators
  • Relocate extractAddDirPaths() to copilot_engine_execution.go
  • Relocate generateCopilotSessionFileCopyStep() to copilot_engine_execution.go
  • Extract generic parseOutputConfig[T]() helper or similar to reduce parse boilerplate
  • Explore shared base for MissingDataConfig / MissingToolConfig
  • Evaluate consolidating sanitization primitives in pkg/stringutil
  • Ensure all tests pass after each refactoring step

References

  • Workflow run: §22281474970
  • Analysis date: 2026-02-22
  • Analysis scope: 240+ non-test .go files in pkg/workflow/, 7 utility packages

Generated by Semantic Function Refactoring

  • expires on Feb 24, 2026, 5:12 PM UTC

Metadata

Metadata

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions