Skip to content

fix(export): preserve unicode characters in workflow filenames#4120

Open
Sprexatura wants to merge 1 commit intosimstudioai:stagingfrom
Sprexatura:fix/preserve-unicode-filenames
Open

fix(export): preserve unicode characters in workflow filenames#4120
Sprexatura wants to merge 1 commit intosimstudioai:stagingfrom
Sprexatura:fix/preserve-unicode-filenames

Conversation

@Sprexatura
Copy link
Copy Markdown

Summary

Workflow names containing Non-ASCII characters (like Korean) were being sanitized into dashes because of a restrictive regex in sanitizePathSegment.

This PR updates the regex to be Unicode-aware, allowing letters and numbers from any language while still stripping truly unsafe filesystem characters.

Changes

  • Updated sanitizePathSegment in apps/sim/lib/workflows/operations/import-export.ts
  • Added unit tests in apps/sim/lib/workflows/operations/import-export.test.ts

Fixes #4119

@vercel
Copy link
Copy Markdown

vercel bot commented Apr 12, 2026

@Sprexatura is attempting to deploy a commit to the Sim Team on Vercel.

A member of the Team first needs to authorize it.

@cursor
Copy link
Copy Markdown

cursor bot commented Apr 12, 2026

PR Summary

Low Risk
Low risk: isolated change to filename sanitization for ZIP exports, with added unit coverage; main risk is any downstream assumptions about ASCII-only filenames.

Overview
Updates sanitizePathSegment to use Unicode-aware character classes so ZIP export filenames preserve non-ASCII letters/numbers (e.g., Korean/Japanese) while still replacing unsafe characters and collapsing repeated dashes.

Adds Vitest coverage for ASCII, whitespace/special character replacement, Unicode preservation, and removal of filesystem-unsafe characters.

Reviewed by Cursor Bugbot for commit 0b274ef. Bugbot is set up for automated code reviews on this repo. Configure here.

@gitguardian
Copy link
Copy Markdown

gitguardian bot commented Apr 12, 2026

️✅ There are no secrets present in this pull request anymore.

If these secrets were true positive and are still valid, we highly recommend you to revoke them.
While these secrets were previously flagged, we no longer have a reference to the
specific commits where they were detected. Once a secret has been leaked into a git
repository, you should consider it compromised, even if it was deleted immediately.
Find here more information about risks.


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 12, 2026

Greptile Summary

This PR fixes a bug where workflow names containing non-ASCII characters (e.g. Korean, Japanese) were fully collapsed to dashes during ZIP export because the original [^a-z0-9-_] regex had no Unicode awareness. The fix replaces it with /[^\p{L}\p{N}\-_]/gu, correctly preserving letters and digits from any script while still stripping filesystem-unsafe characters, and adds dash-collapsing for cleaner output.

Confidence Score: 5/5

Safe to merge — the regex change is correct and well-tested; remaining findings are all P2 style issues.

All three findings are P2 (relative import, stale comment, no-op .trim()). None affect runtime correctness or data integrity. The core fix is sound: Unicode property escapes with the u flag is the idiomatic approach, and the test suite covers ASCII, Korean, Japanese, and unsafe-character cases.

apps/sim/lib/workflows/operations/import-export.test.ts — relative import and stale comment need minor cleanup.

Important Files Changed

Filename Overview
apps/sim/lib/workflows/operations/import-export.ts Regex updated to Unicode property escapes (\p{L}\p{N}) with the u flag, correctly preserving non-ASCII letters/digits; consecutive-dash collapsing added; .trim() is a no-op but harmless.
apps/sim/lib/workflows/operations/import-export.test.ts Good coverage of ASCII, Korean, Japanese, and unsafe chars; uses a relative import (violates project convention) and contains a now-stale "this currently fails" comment.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Input: workflow name] --> B["Replace unsafe chars\n/[^\\p{L}\\p{N}\\-_]/gu → '-'"]
    B --> C["Collapse consecutive dashes\n/-+/g → '-'"]
    C --> D[".trim() — no-op\n(whitespace already gone)"]
    D --> E[Sanitized path segment]

    B -->|"'한글' → '한글'"| C
    B -->|"'my workflow' → 'my-workflow'"| C
    B -->|"'work/flow?name*' → 'work-flow-name-'"| C
Loading

Reviews (1): Last reviewed commit: "fix(export): preserve unicode characters..." | Re-trigger Greptile

* @vitest-environment node
*/
import { describe, expect, it } from 'vitest'
import { sanitizePathSegment } from './import-export'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Relative import violates project conventions

The project requires absolute imports everywhere. This relative import should use the @/ alias.

Suggested change
import { sanitizePathSegment } from './import-export'
import { sanitizePathSegment } from '@/lib/workflows/operations/import-export'

Rule Used: Use established path alias patterns instead of dee... (source)

Learnt From
simstudioai/sim#233

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +20 to +21
it('should preserve Korean characters (BUG REPRODUCTION)', () => {
// This currently fails and returns '---'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Stale comment describes pre-fix behavior

The inline comment // This currently fails and returns '---' describes the old bug, but the test now passes after the fix. Leaving it in place is misleading for future readers.

Suggested change
it('should preserve Korean characters (BUG REPRODUCTION)', () => {
// This currently fails and returns '---'
it('should preserve Korean characters', () => {
expect(sanitizePathSegment('한글')).toBe('한글')

Comment on lines +51 to +54
return name
.replace(/[^\p{L}\p{N}\-_]/gu, '-')
.replace(/-+/g, '-')
.trim()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 .trim() is a no-op and doesn't strip edge dashes

String.prototype.trim() removes only whitespace. Because the first .replace() already converts every whitespace character to -, there is never any whitespace left to trim — .trim() is always a no-op here.

If the intent was to prevent filenames from starting or ending with a dash (e.g., !hello!hello instead of -hello-), the final call should strip edge dashes instead:

Suggested change
return name
.replace(/[^\p{L}\p{N}\-_]/gu, '-')
.replace(/-+/g, '-')
.trim()
return name
.replace(/[^\p{L}\p{N}\-_]/gu, '-')
.replace(/-+/g, '-')
.replace(/^-+|-+$/g, '')

@Sprexatura Sprexatura force-pushed the fix/preserve-unicode-filenames branch from 80c0b12 to c0c811b Compare April 12, 2026 12:07
@vercel
Copy link
Copy Markdown

vercel bot commented Apr 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Apr 12, 2026 0:07am

Request Review

Previously, Non-ASCII characters (like Korean) in workflow names were
replaced by dashes during export because of a restrictive regex.
This update uses a Unicode-aware regex to allow letters and numbers
from any language while still sanitizing unsafe filesystem characters.

fixes simstudioai#4119

Signed-off-by: JaeHyung Jang <jaehyung.jang@navercorp.com>
@Sprexatura Sprexatura force-pushed the fix/preserve-unicode-filenames branch from c0c811b to 0b274ef Compare April 12, 2026 12:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant