Skip to content

[duplicate-code] Legacy sanitize_content_old duplicates sanitize_content_core pipeline #7435

@github-actions

Description

@github-actions

🔍 Duplicate Code Detected: Dual sanitize content implementations

Analysis of commit b8c044b

Assignee: @copilot

Summary

Two separate sanitization pipelines exist: sanitize_content_old.cjs implements a full content sanitizer while sanitize_content_core.cjs provides the current implementation used by sanitize_content.cjs. The old file reimplements the same steps (command/mention neutralization, URL protocol/domain filtering, truncation) over hundreds of lines with no call sites, creating dead but duplicated logic likely to diverge from the active sanitizer.

Duplication Details

Pattern: Parallel sanitization logic maintained in two files

  • Severity: Medium
  • Occurrences: 2 (two full sanitizer implementations)
  • Locations:
    • pkg/workflow/js/sanitize_content_old.cjs (lines 106-401)
    • pkg/workflow/js/sanitize_content_core.cjs (lines 333-405 plus shared helpers)
  • Code Sample:
// sanitize_content_old.cjs (lines ~171-200)
const lines = sanitized.split("\n");
const maxLines = 65000;
maxLength = maxLength || 524288;
if (lines.length > maxLines) {
  const truncationMsg = "\n[Content truncated due to line count]";
  const truncatedLines = lines.slice(0, maxLines).join("\n") + truncationMsg;
  if (truncatedLines.length > maxLength) {
    sanitized = truncatedLines.substring(0, maxLength - truncationMsg.length) + truncationMsg;
  } else {
    sanitized = truncatedLines;
  }
} else if (sanitized.length > maxLength) {
  sanitized = sanitized.substring(0, maxLength) + "\n[Content truncated due to length]";
}
// sanitize_content_core.cjs (lines ~333-351)
function applyTruncation(content, maxLength) {
  maxLength = maxLength || 524288;
  const lines = content.split("\n");
  const maxLines = 65000;
  if (lines.length > maxLines) { /* same truncation path */ }
  else if (content.length > maxLength) { /* same length truncation */ }
  return content;
}

Both files also duplicate mention/command neutralization and URL protocol/domain filtering logic.

Impact Analysis

  • Maintainability: Bug fixes or safety improvements may be applied to one sanitizer and forgotten in the other, leaving dead code or future misuse.
  • Bug Risk: Accidental reuse of the outdated module would diverge from current security rules (e.g., allowed aliases vs full neutralization), making behavior unpredictable.
  • Code Bloat: Hundreds of lines of duplicate sanitization logic inflate the JS surface area without runtime benefit.

Refactoring Recommendations

  1. Remove or alias legacy module

    • Delete sanitize_content_old.cjs or make it re-export the core implementation to ensure a single sanitizer path.
    • Estimated effort: <1 hour; benefits: eliminates dead duplicate logic and drift risk.
  2. Document single entry point

    • Clearly export one public sanitizer (from sanitize_content.cjs) and add a small test ensuring no stale modules are used.
    • Estimated effort: 1 hour; benefits: future contributors avoid reviving outdated code.

Implementation Checklist

  • Review duplication findings
  • Prioritize refactoring tasks
  • Create refactoring plan
  • Implement changes
  • Update tests
  • Verify no functionality broken

Analysis Metadata

  • Analyzed Files: 2
  • Detection Method: Serena semantic code analysis
  • Commit: b8c044b
  • Analysis Date: 2024-XX-XX

AI generated by Duplicate Code Detector

Metadata

Metadata

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions