| 1 | +# MultilingualCL: Modernization Vision |
| 2 | + |
| 3 | +## Context |
| 4 | + |
| 5 | +MultilingualCL was conceived as a modern multilingual command line where everything -- commands, keywords, variable names, dates, numbers -- can be expressed in any language. First presented at [Capitole du Libre (2017)](https://doi.org/10.6084/m9.figshare.5661853.v1) and [DebConf20 (2020)](https://figshare.com/articles/presentation/Building_a_Multilingual_Command_Line/12857780), the project established strong conceptual foundations. |
| 6 | + |
| 7 | +This document proposes a modernization vision that builds on those foundations and incorporates advances in LLMs, NLP, and modern development practices. |
| 8 | + |
| 9 | +--- |
| 10 | + |
| 11 | +## What the Project Already Has |
| 12 | + |
| 13 | +These existing assets are remarkably forward-looking and should be central to any modernization: |
| 14 | + |
| 15 | +| Asset | File(s) | Value | |
| 16 | +|-------|---------|-------| |
| 17 | +| **Semantic action verb taxonomy** | `resources/en/actions.md` (300+ verbs in 30 categories) | Encodes the *intent* layer (create, delete, search, compress...) | |
| 18 | +| **Resource type ontology** | `resources/en/resources.md` (50+ types) | Encodes the *target* layer (file, process, network, CPU, memory...) | |
| 19 | +| **Action-resource composition** | `resources/actions.json`, `resources/commandline.json` | Maps verbs to valid resource targets (e.g., "create" -> [user, group, file, directory]) | |
| 20 | +| **Full French translations** | `resources/fr/commandes.yaml` | Proof of concept for complete language coverage | |
| 21 | +| **Command documentation** | `resources/en/commands.yaml` | Maps natural language descriptions to Linux commands | |
| 22 | +| **428-language registry** | `languages.md` | Comprehensive language code catalog |
| 23 | +| **YAML command map schema** | `resources/yaml/command_map.yaml` | Extensible per-locale command definition format | |
| 24 | +| **Object model** | `multilingualcl/model.py` | Clean class hierarchy (Command, Argument, SubCommand) | |
| 25 | + |
| 26 | +--- |
| 27 | + |
| 28 | +## Current Limitations |
| 29 | + |
| 30 | +### Architecture |
| 31 | +- **Static 1:1 mapping**: Every command variant must be manually defined per language in YAML |
| 32 | +- **No semantic understanding**: Parser does exact string matching only |
| 33 | +- **Unused data**: Action verbs, resources, and French command docs are not connected to the runtime pipeline |
| 34 | +- **Flat parsing**: argparse-based, cannot handle natural language or compositional commands |
| 35 | +- **Single locale per session**: Detected from system locale, no runtime switching |
| 36 | +- **No error intelligence**: Direct `subprocess.run()` with no output parsing or error recovery |
| 37 | + |
| 38 | +### Code Quality |
| 39 | +- **Incomplete parsing**: Positional argument handling is unfinished (`parser.py:67-72`) |
| 40 | +- **No input sanitization**: Commands passed directly to subprocess |
| 41 | +- **Hardcoded paths**: Resource loading uses relative paths |
| 42 | +- **No caching**: Command map reloaded on every REPL iteration |
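One way to address the input-sanitization point is to tokenize the translated command and run it without a shell, so metacharacters such as `;` are never interpreted. The following is a minimal sketch, not the project's actual code: `to_argv` and `run_translated` are hypothetical helper names, and the 10-second timeout is an arbitrary assumption.

```python
import shlex
import subprocess


def to_argv(command_line: str) -> list[str]:
    """Split a command line into argv tokens using POSIX shell quoting rules."""
    return shlex.split(command_line)


def run_translated(command_line: str, timeout: float = 10.0) -> subprocess.CompletedProcess:
    """Run a translated command without invoking a shell (hypothetical helper).

    With shell=False the tokens are passed to the program as-is, so a stray
    ';' or '&&' is treated as literal text rather than a command separator.
    """
    argv = to_argv(command_line)
    if not argv:
        raise ValueError("empty command")
    return subprocess.run(
        argv,
        shell=False,
        capture_output=True,
        text=True,
        timeout=timeout,  # assumption: arbitrary upper bound for the sketch
        check=False,
    )
```

Because no shell is involved, an input like `ls; rm -rf ~` becomes a request to run a program literally named `ls;`, which fails instead of chaining a second command.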
| 43 | + |
| 44 | +--- |
| 45 | + |
| 46 | +## Modernization Vision |
| 47 | + |
| 48 | +### Core Principle: Action + Resource + Modifier = Command |
| 49 | + |
| 50 | +The project already has the ingredients for a **semantic command model**: |
| 51 | + |
| 52 | +``` |
| 53 | +ACTION (verb)    + RESOURCE (noun)   + MODIFIER (adjective) = LINUX COMMAND |
| 54 | +────────────────   ─────────────────   ────────────────────   ───────────── |
| 55 | +create             directory           -                      mkdir |
| 56 | +delete             file                force                  rm -f |
| 57 | +list               process             all                    ps aux |
| 58 | +show               memory              -                      free |
| 59 | +search             file                by name                find . -name |
| 60 | +compress           directory           recursive              tar -czvf |
| 61 | +``` |
| 62 | + |
| 63 | +This already exists in the data (`actions.json` maps verbs to resources, `commands.yaml` maps descriptions to commands). The modernization is about **activating these connections** and making them multilingual and intelligent. |
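The Action + Resource + Modifier table above can be sketched as a plain lookup keyed by the triple. This is only an illustration: the `Intent` type, the `COMMANDS` dictionary, and `resolve` are hypothetical names, and the real data would come from `actions.json` and `commands.yaml`, whose schemas are not shown here.

```python
from typing import NamedTuple, Optional, Sequence


class Intent(NamedTuple):
    """Language-agnostic description of what the user wants to do."""
    action: str
    resource: str
    modifier: Optional[str] = None


# Hypothetical in-memory table mirroring the rows of the table above.
COMMANDS = {
    Intent("create", "directory"): ["mkdir"],
    Intent("delete", "file", "force"): ["rm", "-f"],
    Intent("list", "process", "all"): ["ps", "aux"],
    Intent("show", "memory"): ["free"],
    Intent("search", "file", "by name"): ["find", ".", "-name"],
    Intent("compress", "directory", "recursive"): ["tar", "-czvf"],
}


def resolve(intent: Intent, operands: Sequence[str] = ()) -> Optional[list]:
    """Return the argv for an intent, or None if the triple is unknown."""
    base = COMMANDS.get(intent)
    if base is None:
        return None
    return [*base, *operands]
```

For example, `resolve(Intent("delete", "file", "force"), ["notes.txt"])` yields `["rm", "-f", "notes.txt"]`, while an unlisted triple returns `None` so the caller can fall back to another resolution strategy.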
| 64 | + |
| 65 | +--- |
| 66 | + |
| 67 | +### Layer 1: LLM-Powered Intent Resolution |
| 68 | + |
| 69 | +Replace static YAML lookup with semantic understanding. |
| 70 | + |
| 71 | +**Current flow:** |
| 72 | +``` |
| 73 | +user types "af -t" → exact match in YAML → "ls -a" |
| 74 | +``` |
| 75 | + |
| 76 | +**Modern flow:** |
| 77 | +``` |
| 78 | +user types "montre-moi les fichiers cachés" |
| 79 | + → language detection (French) |
| 80 | + → intent extraction: ACTION=list, RESOURCE=file, MODIFIER=hidden |
| 81 | + → command resolution: ls -la |
| 82 | + → safety check → execute |
| 83 | +``` |
| 84 | + |
| 85 | +**Three-tier LLM strategy:** |
| 86 | + |
| 87 | +| Tier | When | Model | Latency | |
| 88 | +|------|------|-------|---------| |
| 89 | +| **Local/fast** | Tab completion, known commands | Small model (<1B) or rule-based | <50ms | |
| 90 | +| **Balanced** | Intent parsing, translation | Local 7B model (Ollama/llama.cpp) | <500ms | |
| 91 | +| **Cloud** | Complex queries, error diagnosis, unknown languages | Claude/GPT API | 1-3s | |
| 92 | + |
| 93 | +**Offline-first**: Core translation must work without internet. The existing YAML maps serve as the offline fallback, with LLM augmentation when available. |
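The offline-first rule could look roughly like this: consult the static map first, then try each LLM tier in order, skipping tiers that are unreachable. Everything here is an assumption for illustration, including the `OFFLINE_MAP` contents (seeded with the `af -t` example from the current flow), the tier names, and the convention that a tier is any callable returning a command string or `None`.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical stand-in for the existing YAML command maps.
OFFLINE_MAP = {"af -t": "ls -a"}

# A tier is any callable mapping user text to a command, or None if it cannot help.
Tier = Callable[[str], Optional[str]]


def resolve_with_fallback(
    text: str, llm_tiers: Iterable[Tuple[str, Tier]] = ()
) -> Tuple[Optional[str], str]:
    """Return (command, source), trying the offline map before any LLM tier."""
    command = OFFLINE_MAP.get(text.strip())
    if command is not None:
        return command, "offline"
    for name, tier in llm_tiers:
        try:
            command = tier(text)
        except OSError:
            # Network or backend failure: treat the tier as unavailable.
            continue
        if command is not None:
            return command, name
    return None, "unresolved"
```

Passing the tiers as an ordered list keeps the policy (local before cloud, or the reverse) in the caller's hands rather than hard-coding it.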
| 94 | + |
| 95 | +--- |
| 96 | + |
| 97 | +### Layer 2: Activate the Semantic Data |
| 98 | + |
| 99 | +The existing `actions.md`, `resources.md`, `actions.json`, and `commandline.json` files should become the **knowledge graph** that powers command resolution: |
| 100 | + |
| 101 | +1. **Action verb synonyms** (300+ verbs): Feed to the LLM as context so "erase", "remove", "delete", "wipe" all resolve to the DELETE action |
| 102 | +2. **Resource types** (50+ types): Constrain what actions apply to what resources |
| 103 | +3. **Action-resource combinations** (`actions.json`): Already maps which resources each action can target |
| 104 | +4. **Command catalog** (`commands.yaml`): Already maps natural-language descriptions to actual commands |
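Items 1 to 3 above amount to two lookups: normalize a verb to its canonical action, then check that the action may target the requested resource. The sketch below uses hypothetical excerpts. The `create` entry follows the example given for `actions.json` earlier in this document, the `delete` entry is an assumption, and the real file schema may differ.

```python
from typing import Optional

# Hypothetical synonym table; the real list would be derived from actions.md.
SYNONYMS = {
    "delete": "delete",
    "erase": "delete",
    "remove": "delete",
    "wipe": "delete",
    "create": "create",
}

# Hypothetical action -> allowed resources map, in the spirit of actions.json.
ALLOWED_RESOURCES = {
    "create": {"user", "group", "file", "directory"},
    "delete": {"file", "directory"},  # assumption, not taken from the project data
}


def canonical_action(verb: str) -> Optional[str]:
    """Map a verb or synonym to its canonical action, or None if unknown."""
    return SYNONYMS.get(verb.strip().lower())


def is_valid(verb: str, resource: str) -> bool:
    """Check whether the verb's canonical action may target the resource."""
    action = canonical_action(verb)
    if action is None:
        return False
    return resource in ALLOWED_RESOURCES.get(action, set())
```

Rejecting invalid pairs before command resolution gives the LLM layer a small, checkable output space instead of free-form shell text.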
| 105 | + |
| 106 | +This transforms the multilingual pipeline from: |
| 107 | +``` |
| 108 | +French YAML → English YAML → Linux Command |
| 109 | +``` |
| 110 | + |
| 111 | +To: |
| 112 | +``` |
| 113 | +Any Language → Action/Resource/Modifier (language-agnostic) → Linux Command |
| 114 | +``` |
| 115 | + |
| 116 | +--- |
| 117 | + |
| 118 | +### Layer 3: Multilingual Expansion Strategy |
| 119 | + |
| 120 | +**Current**: 2 locales, manual YAML translation per command per language. |
| 121 | + |
| 122 | +**Proposed tiered approach:** |
| 123 | + |
| 124 | +| Tier | Languages | Method | Quality | |
| 125 | +|------|-----------|--------|---------| |
| 126 | +| **Core** (5-10) | en, fr, es, zh, ar, hi, de, pt, ja, ko | Human-curated translations | Gold standard | |
| 127 | +| **Community** (50+) | Languages with active contributors | Community translation + review | Verified | |
| 128 | +| **LLM-generated** (428+) | All documented languages | LLM translation with confidence scores | Best-effort, marked as auto-translated | |
| 129 | + |
| 130 | +**Key design decisions:** |
| 131 | +- Translate **concepts** (action verbs, resource names), not individual command strings |
| 132 | +- Use Unicode CLDR for locale data (dates, numbers, sorting) |
| 133 | +- Support bidirectional text (Arabic, Hebrew) in terminal output |
| 134 | +- Allow CJK input methods |
| 135 | +- Respect cultural formatting (Arabic-Indic numerals, Chinese numerals, etc.) |
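As a small illustration of the numeral point, the standard library alone can round-trip Arabic-Indic digits. This is a sketch only: full locale formatting (grouping, decimal separators, dates) would come from CLDR data as proposed above, and `to_arabic_indic` is a hypothetical helper name.

```python
# Translation table from ASCII digits to Arabic-Indic digits (U+0660..U+0669).
ARABIC_INDIC_DIGITS = "".join(chr(0x0660 + d) for d in range(10))
TO_ARABIC_INDIC = str.maketrans("0123456789", ARABIC_INDIC_DIGITS)


def to_arabic_indic(number: int) -> str:
    """Render a non-negative integer with Arabic-Indic digits."""
    return str(number).translate(TO_ARABIC_INDIC)


def parse_localized_int(text: str) -> int:
    """Parse an integer written with any Unicode decimal digits.

    Python's int() already accepts Unicode decimal digits, so no
    per-script table is needed on the input side.
    """
    return int(text)
```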
| 136 | + |
| 137 | +--- |
| 138 | + |
| 139 | +### Layer 4: Modern REPL Experience |
| 140 | + |
| 141 | +**Current**: Basic readline with termcolor. |
| 142 | + |
| 143 | +**Proposed features:** |
| 144 | + |
| 145 | +| Feature | Description | |
| 146 | +|---------|-------------| |
| 147 | +| **Rich TUI** | Syntax highlighting, tables, progress bars | |
| 148 | +| **Intelligent autocomplete** | Context-aware suggestions based on action+resource model | |
| 149 | +| **Fuzzy matching** | "commti" -> "Did you mean: commit?" (in user's language) | |
| 150 | +| **Output translation** | Translate error messages and help text into user's language | |
| 151 | +| **Command preview** | Show the actual Linux command before execution (transparency) | |
| 152 | +| **Safety tiers** | Safe commands auto-execute; dangerous ones require confirmation | |
| 153 | +| **Session memory** | Remember what user did earlier for contextual suggestions | |
| 154 | +| **Multi-language mixing** | Handle code-switching (e.g., "git ajouter fichier.txt") | |
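The fuzzy-matching row can be prototyped with the standard library's `difflib`. The vocabulary, the message templates, and the 0.6 similarity cutoff below are assumptions chosen for illustration, not values from the project.

```python
import difflib
from typing import Optional

# Hypothetical vocabulary and per-locale message templates.
VOCABULARY = ["commit", "checkout", "clone", "status"]
TEMPLATES = {
    "en": "Did you mean: {}?",
    "fr": "Vouliez-vous dire : {} ?",
}


def suggest(word: str, vocabulary=VOCABULARY, cutoff: float = 0.6) -> Optional[str]:
    """Return the closest known word, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def suggestion_message(word: str, locale: str = "en") -> Optional[str]:
    """Format a suggestion in the user's language, falling back to English."""
    match = suggest(word)
    if match is None:
        return None
    template = TEMPLATES.get(locale, TEMPLATES["en"])
    return template.format(match)
```

With this vocabulary, `suggestion_message("commti")` returns `"Did you mean: commit?"`. A production version would likely match against the localized vocabulary of the active language rather than English command names.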
| 155 | + |
| 156 | +--- |
| 157 | + |
| 158 | +### Layer 5: Safety and Transparency |
| 159 | + |
| 160 | +**Current**: Direct `subprocess.run()` with no checks. |
| 161 | + |
| 162 | +**Proposed risk-tiered execution:** |
| 163 | + |
| 164 | +| Risk Level | Commands | Behavior | |
| 165 | +|------------|----------|----------| |
| 166 | +| **Safe** | ls, pwd, cat, git status | Execute, show output | |
| 167 | +| **Moderate** | git commit, npm install, mv | Show preview, execute on enter | |
| 168 | +| **Dangerous** | rm, chmod 777, git push -f | Explain impact in user's language, require "yes" | |
| 169 | +| **Destructive** | rm -rf, mkfs, DROP TABLE | Double confirmation, show undo options | |
| 170 | + |
| 171 | +**Dry-run mode**: Always available to preview what a command would do. |
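The risk table above could be approximated by a small rule-based classifier over the tokenized command. This is a sketch under stated assumptions: the rules cover only the shell examples listed in the table, `classify` is a hypothetical name, and treating unknown commands as "dangerous" is a conservative default chosen here, not something the proposal specifies.

```python
import shlex


def classify(command_line: str) -> str:
    """Return 'safe', 'moderate', 'dangerous', or 'destructive' for a command."""
    argv = shlex.split(command_line)
    if not argv:
        return "safe"
    prog, args = argv[0], argv[1:]
    # Expand combined short flags, e.g. "-rf" -> {"r", "f"}.
    short_flags = {
        ch
        for arg in args
        if arg.startswith("-") and not arg.startswith("--")
        for ch in arg[1:]
    }
    if prog.startswith("mkfs"):
        return "destructive"
    if prog == "rm":
        return "destructive" if {"r", "f"} <= short_flags else "dangerous"
    if prog == "chmod" and "777" in args:
        return "dangerous"
    if prog == "git" and args:
        if args[0] == "push" and ("-f" in args or "--force" in args):
            return "dangerous"
        if args[0] == "status":
            return "safe"
        if args[0] == "commit":
            return "moderate"
    if prog in {"ls", "pwd", "cat"}:
        return "safe"
    if prog == "mv" or argv[:2] == ["npm", "install"]:
        return "moderate"
    # Assumption: anything unrecognized requires explicit confirmation.
    return "dangerous"
```

The REPL would then map each level to the behavior in the table, for example asking for a typed "yes" when `classify` returns `"dangerous"`.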
| 172 | + |
| 173 | +--- |
| 174 | + |
| 175 | +### Layer 6: Extensibility Architecture |
| 176 | + |
| 177 | +**Current**: Monolithic YAML files. |
| 178 | + |
| 179 | +**Proposed plugin system:** |
| 180 | + |
| 181 | +- **Command packs**: Installable bundles (e.g., `multilingualcl-docker`, `multilingualcl-kubernetes`) |
| 182 | +- **Language packs**: Community-contributed translation sets |
| 183 | +- **Output formatters**: JSON, table, tree views |
| 184 | +- **Shell integrations**: Bash/Zsh/Fish shell plugins |
| 185 | +- **IDE integrations**: VS Code extension, JetBrains plugin |
| 186 | + |
| 187 | +--- |
| 188 | + |
| 189 | +### Layer 7: Voice and Accessibility |
| 190 | + |
| 191 | +- **Voice input**: Spoken commands in any language |
| 192 | +- **Voice output**: Read results aloud (accessibility) |
| 193 | +- **Screen reader support**: WCAG 2.1 AA compliance |
| 194 | +- **High contrast mode**: For terminal accessibility |
| 195 | + |
| 196 | +--- |
| 197 | + |
| 198 | +## Technical Modernization |
| 199 | + |
| 200 | +| Component | Current | Proposed | |
| 201 | +|-----------|---------|----------| |
| 202 | +| Python version | 3.6+ | 3.11+ (pattern matching, better typing, tomllib) | |
| 203 | +| Terminal UI | readline + termcolor | Rich/Textual/Prompt Toolkit | |
| 204 | +| Parsing | argparse (static) | Semantic parser + LLM fallback | |
| 205 | +| Config format | YAML only | TOML for config, YAML for command maps, JSON Schema validation | |
| 206 | +| i18n framework | Custom locale detection | ICU via PyICU + Unicode CLDR | |
| 207 | +| Testing | unittest | pytest + property-based testing (hypothesis) | |
| 208 | +| Packaging | setup.py | pyproject.toml | |
| 209 | +| CI/CD | GitHub Actions (basic) | Matrix testing (multiple Python versions, OSes) | |
| 210 | +| Documentation | Sphinx (not built) | MkDocs Material with multilingual versions | |
| 211 | + |
| 212 | +--- |
| 213 | + |
| 214 | +## Phased Roadmap |
| 215 | + |
| 216 | +### Phase 1: Foundation (Activate Existing Data) |
| 217 | +- Connect `actions.json`, `resources.md`, and `commands.yaml` to the runtime pipeline |
| 218 | +- Build the Action + Resource + Modifier -> Command resolution engine |
| 219 | +- Modernize REPL with Rich/Textual |
| 220 | +- Add safety confirmation layer |
| 221 | +- Expand command coverage to top 50 commands |
| 222 | +- Add 3-5 more core languages (es, de, zh, ar, hi) |
| 223 | + |
| 224 | +### Phase 2: Intelligence (Add LLM Layer) |
| 225 | +- Integrate local LLM for intent parsing |
| 226 | +- Add language auto-detection |
| 227 | +- Implement fuzzy matching and smart suggestions |
| 228 | +- Add output translation (error messages, help text) |
| 229 | +- Build command preview/dry-run mode |
| 230 | +- Session context and memory |
| 231 | + |
| 232 | +### Phase 3: Ecosystem (Community and Extensibility) |
| 233 | +- Plugin architecture for command packs and language packs |
| 234 | +- Community translation platform |
| 235 | +- Shell integrations (bash/zsh/fish) |
| 236 | +- IDE extensions |
| 237 | +- Voice input/output support |
| 238 | + |
| 239 | +### Phase 4: Future (Advanced Capabilities) |
| 240 | +- Multi-command workflows ("deploy to production" -> sequence of commands) |
| 241 | +- Predictive commands (suggest before you ask) |
| 242 | +- Cross-platform: Windows PowerShell translation, macOS compatibility |
| 243 | +- Web-based terminal interface |
| 244 | +- Mobile support (Termux, iSH) |
| 245 | + |
| 246 | +--- |
| 247 | + |
| 248 | +## Key Insight |
| 249 | + |
| 250 | +The most powerful idea in MultilingualCL is that **command-line operations are fundamentally language-agnostic actions on resources**. The project already encodes this in its action verbs, resource types, and composition rules. The modernization is not about starting over. It is about activating the semantic framework that already exists and augmenting it with LLMs to cover the languages and natural-language phrasing that static YAML cannot scale to. |