Commit 116abc0

Update documentation
1 parent 882c937 commit 116abc0

1 file changed

Lines changed: 250 additions & 0 deletions

docs/VISION.md

@@ -0,0 +1,250 @@
1+
# MultilingualCL: Modernization Vision
2+
3+
## Context
4+
5+
MultilingualCL was conceived as a modern multilingual command line where everything -- commands, keywords, variable names, dates, numbers -- can be expressed in any language. First presented at [Capitole du Libre (2017)](https://doi.org/10.6084/m9.figshare.5661853.v1) and [DebConf20 (2020)](https://figshare.com/articles/presentation/Building_a_Multilingual_Command_Line/12857780), the project established strong conceptual foundations.
6+
7+
This document proposes a modernization vision that builds on those foundations and incorporates advances in LLMs, NLP, and modern development practices.
8+
9+
---
10+
11+
## What the Project Already Has
12+
13+
These existing assets are remarkably forward-looking and should be central to any modernization:
14+
| Asset | File(s) | Value |
|-------|---------|-------|
| **Semantic action verb taxonomy** | `resources/en/actions.md` (300+ verbs in 30 categories) | Encodes the *intent* layer (create, delete, search, compress...) |
| **Resource type ontology** | `resources/en/resources.md` (50+ types) | Encodes the *target* layer (file, process, network, CPU, memory...) |
| **Action-resource composition** | `resources/actions.json`, `resources/commandline.json` | Maps verbs to valid resource targets (e.g., "create" -> [user, group, file, directory]) |
| **Full French translations** | `resources/fr/commandes.yaml` | Proof of concept for complete language coverage |
| **Command documentation** | `resources/en/commands.yaml` | Maps natural language descriptions to Linux commands |
| **428-language registry** | `languages.md` | Comprehensive language code catalog |
| **YAML command map schema** | `resources/yaml/command_map.yaml` | Extensible per-locale command definition format |
| **Object model** | `multilingualcl/model.py` | Clean class hierarchy (Command, Argument, SubCommand) |
25+
26+
---
27+
28+
## Current Limitations
29+
30+
### Architecture
31+
- **Static 1:1 mapping**: Every command variant must be manually defined per language in YAML
32+
- **No semantic understanding**: Parser does exact string matching only
33+
- **Unused data**: Action verbs, resources, and French command docs are not connected to the runtime pipeline
34+
- **Flat parsing**: argparse-based, cannot handle natural language or compositional commands
35+
- **Single locale per session**: Detected from system locale, no runtime switching
36+
- **No error intelligence**: Direct `subprocess.run()` with no output parsing or error recovery
37+
38+
### Code Quality
39+
- **Incomplete parsing**: Positional argument handling is unfinished (`parser.py:67-72`)
40+
- **No input sanitization**: Commands passed directly to subprocess
41+
- **Hardcoded paths**: Resource loading uses relative paths
42+
- **No caching**: Command map reloaded on every REPL iteration
43+
44+
---
45+
46+
## Modernization Vision
47+
48+
### Core Principle: Action + Resource + Modifier = Command
49+
50+
The project already has the ingredients for a **semantic command model**:
51+
```
ACTION (verb)  +  RESOURCE (noun)  +  MODIFIER (adjective)  =  LINUX COMMAND
─────────────     ───────────────     ────────────────────     ─────────────
create            directory           -                        mkdir
delete            file                force                    rm -f
list              process             all                      ps aux
show              memory              -                        free
search            file                by name                  find . -name
compress          directory           recursive                tar -czvf
```
62+
63+
The pieces of this model already exist in the data (`actions.json` maps verbs to resources, `commands.yaml` maps descriptions to commands). The modernization is about **activating these connections** and making them multilingual and intelligent.
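
As a minimal sketch of what the resolution step might look like, the rows of the table above can be keyed by an (action, resource, modifier) triple. The `Intent` type, `COMMAND_TABLE`, and `resolve` are hypothetical names; a real table would presumably be generated from `actions.json` and `commands.yaml` rather than written by hand.

```python
from typing import NamedTuple, Optional


class Intent(NamedTuple):
    """Language-agnostic description of what the user wants."""
    action: str
    resource: str
    modifier: Optional[str] = None


# Rows copied from the table above (illustrative, not the project's data).
COMMAND_TABLE = {
    Intent("create", "directory"): "mkdir",
    Intent("delete", "file", "force"): "rm -f",
    Intent("list", "process", "all"): "ps aux",
    Intent("show", "memory"): "free",
    Intent("search", "file", "by name"): "find . -name",
    Intent("compress", "directory", "recursive"): "tar -czvf",
}


def resolve(intent: Intent) -> Optional[str]:
    """Return the command template for an intent, or None if unmapped."""
    return COMMAND_TABLE.get(intent)
```

Because the key contains no natural-language text, adding a locale only requires mapping that locale's words onto the same triples.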
64+
65+
---
66+
67+
### Layer 1: LLM-Powered Intent Resolution
68+
69+
Replace static YAML lookup with semantic understanding.
70+
71+
**Current flow:**
```
user types "af -t" → exact match in YAML → "ls -a"
```
75+
76+
**Modern flow:**
```
user types "montre-moi les fichiers cachés"
  → language detection (French)
  → intent extraction: ACTION=list, RESOURCE=file, MODIFIER=hidden
  → command resolution: ls -la
  → safety check → execute
```
84+
85+
**Three-tier LLM strategy:**
86+
| Tier | When | Model | Latency |
|------|------|-------|---------|
| **Local/fast** | Tab completion, known commands | Small model (<1B) or rule-based | <50ms |
| **Balanced** | Intent parsing, translation | Local 7B model (Ollama/llama.cpp) | <500ms |
| **Cloud** | Complex queries, error diagnosis, unknown languages | Claude/GPT API | 1-3s |
92+
93+
**Offline-first**: Core translation must work without internet. The existing YAML maps serve as the offline fallback, with LLM augmentation when available.
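
One way to read the offline-first rule is as an ordered chain of resolvers in which any unavailable tier is skipped. The sketch below assumes each tier can be wrapped as a function from text to an optional command; `resolve_with_fallback`, `yaml_lookup`, and `cloud_llm` are hypothetical stand-ins, not existing project code.

```python
from typing import Callable, Optional, Sequence

Resolver = Callable[[str], Optional[str]]


def resolve_with_fallback(text: str, resolvers: Sequence[Resolver]) -> Optional[str]:
    """Try each resolver in order; skip tiers that fail or return nothing."""
    for resolver in resolvers:
        try:
            command = resolver(text)
        except (ConnectionError, TimeoutError):
            continue  # tier unavailable (e.g. offline): fall through
        if command is not None:
            return command
    return None


# Stand-in for the static YAML map (always available).
YAML_MAP = {"af -t": "ls -a"}


def yaml_lookup(text: str) -> Optional[str]:
    return YAML_MAP.get(text)


def cloud_llm(text: str) -> Optional[str]:
    # Simulates a cloud tier while the machine is offline.
    raise ConnectionError("no network")
```

With `[yaml_lookup, cloud_llm]`, known commands never touch the network, and a network failure degrades to "no result" instead of an exception.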
94+
95+
---
96+
97+
### Layer 2: Activate the Semantic Data
98+
99+
The existing `actions.md`, `resources.md`, `actions.json`, and `commandline.json` files should become the **knowledge graph** that powers command resolution:
100+
101+
1. **Action verb synonyms** (300+ verbs): Feed to the LLM as context so "erase", "remove", "delete", "wipe" all resolve to the DELETE action
102+
2. **Resource types** (50+ types): Constrain what actions apply to what resources
103+
3. **Action-resource combinations** (`actions.json`): Already maps which resources each action can target
104+
4. **Command catalog** (`commands.yaml`): Already maps natural-language descriptions to actual commands
105+
106+
This transforms the multilingual pipeline from:
```
French YAML → English YAML → Linux Command
```
110+
111+
To:
```
Any Language → Action/Resource/Modifier (language-agnostic) → Linux Command
```
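
The first three items of the list above (synonym folding and action-resource constraints) might be sketched as follows. The dictionaries are illustrative excerpts: the "create" targets come from the example given earlier in this document, while the "delete" targets and the `normalize` function are assumptions made for the sketch.

```python
from typing import Optional, Tuple

# Synonym -> canonical action (would be derived from actions.md).
SYNONYMS = {
    "delete": "delete", "erase": "delete", "remove": "delete", "wipe": "delete",
    "create": "create", "make": "create",
}

# Canonical action -> resources it may target (would come from actions.json).
VALID_TARGETS = {
    "create": {"user", "group", "file", "directory"},
    "delete": {"file", "directory"},  # assumed for illustration
}


def normalize(verb: str, resource: str) -> Optional[Tuple[str, str]]:
    """Map a verb to its canonical action and check the resource is a valid target."""
    action = SYNONYMS.get(verb.casefold())
    if action is None:
        return None  # unknown verb
    if resource not in VALID_TARGETS.get(action, set()):
        return None  # action does not apply to this resource
    return action, resource
```

Rejecting invalid pairs at this stage gives an LLM-backed parser a cheap sanity check before any command is built.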
115+
116+
---
117+
118+
### Layer 3: Multilingual Expansion Strategy
119+
120+
**Current**: 2 locales, manual YAML translation per command per language.
121+
122+
**Proposed tiered approach:**
123+
| Tier | Languages | Method | Quality |
|------|-----------|--------|---------|
| **Core** (5-10) | en, fr, es, zh, ar, hi, de, pt, ja, ko | Human-curated translations | Gold standard |
| **Community** (50+) | Languages with active contributors | Community translation + review | Verified |
| **LLM-generated** (428+) | All documented languages | LLM translation with confidence scores | Best-effort, marked as auto-translated |
129+
130+
**Key design decisions:**
131+
- Translate **concepts** (action verbs, resource names), not individual command strings
132+
- Use Unicode CLDR for locale data (dates, numbers, sorting)
133+
- Support bidirectional text (Arabic, Hebrew) in terminal output
134+
- Allow CJK input methods
135+
- Respect cultural formatting (Arabic-Indic numerals, Chinese numerals, etc.)
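
As a small illustration of the cultural-formatting point, digit substitution can be done with the standard library alone. This sketch covers only Arabic-Indic digits; full locale formatting (grouping separators, Chinese numerals) would need CLDR data and is not attempted here. The function names are hypothetical.

```python
# Translation table from ASCII digits to Arabic-Indic digits (U+0660..U+0669).
ARABIC_INDIC = str.maketrans(
    "0123456789",
    "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669",
)


def to_arabic_indic(number: int) -> str:
    """Render an integer using Arabic-Indic digits."""
    return str(number).translate(ARABIC_INDIC)


def parse_count(text: str) -> int:
    """Parse an integer written in any Unicode decimal digits.

    Python's int() already accepts non-ASCII decimal digits, so user input
    such as a line count typed in Arabic-Indic digits needs no extra mapping.
    """
    return int(text)
```

Input parsing comes for free, while output formatting has to be chosen per locale.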
136+
137+
---
138+
139+
### Layer 4: Modern REPL Experience
140+
141+
**Current**: Basic readline with termcolor.
142+
143+
**Proposed features:**
144+
| Feature | Description |
|---------|-------------|
| **Rich TUI** | Syntax highlighting, tables, progress bars |
| **Intelligent autocomplete** | Context-aware suggestions based on action+resource model |
| **Fuzzy matching** | "commti" -> "Did you mean: commit?" (in user's language) |
| **Output translation** | Translate error messages and help text into user's language |
| **Command preview** | Show the actual Linux command before execution (transparency) |
| **Safety tiers** | Safe commands auto-execute; dangerous ones require confirmation |
| **Session memory** | Remember what user did earlier for contextual suggestions |
| **Multi-language mixing** | Handle code-switching (e.g., "git ajouter fichier.txt") |
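
The fuzzy-matching row could be prototyped with `difflib` from the standard library before reaching for a model. In this sketch, `suggest`, `PROMPTS`, and the French wording are hypothetical; only `difflib.get_close_matches` is an existing API.

```python
import difflib
from typing import Optional, Sequence

# Hypothetical per-locale prompt templates.
PROMPTS = {
    "en": "Did you mean: {}?",
    "fr": "Vouliez-vous dire : {} ?",
}


def suggest(typed: str, known: Sequence[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the closest known command to a mistyped one, if any is close enough."""
    matches = difflib.get_close_matches(typed, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def suggestion_message(typed: str, known: Sequence[str], locale: str = "en") -> Optional[str]:
    """Format the suggestion in the user's language, falling back to English."""
    match = suggest(typed, known)
    if match is None:
        return None
    return PROMPTS.get(locale, PROMPTS["en"]).format(match)
```

For example, `suggestion_message("commti", ["commit", "checkout", "clone"])` yields `"Did you mean: commit?"`.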
155+
156+
---
157+
158+
### Layer 5: Safety and Transparency
159+
160+
**Current**: Direct `subprocess.run()` with no checks.
161+
162+
**Proposed risk-tiered execution:**
163+
| Risk Level | Commands | Behavior |
|------------|----------|----------|
| **Safe** | ls, pwd, cat, git status | Execute, show output |
| **Moderate** | git commit, npm install, mv | Show preview, execute on enter |
| **Dangerous** | rm, chmod 777, git push -f | Explain impact in user's language, require "yes" |
| **Destructive** | rm -rf, mkfs, DROP TABLE | Double confirmation, show undo options |
170+
171+
**Dry-run mode**: Always available to preview what a command would do.
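
A rule-based classifier for the shell commands in the table above might start like this. It is a sketch, not a complete policy: the rules cover only the table's examples, `classify` and `Risk` are hypothetical names, and treating unknown programs as moderate (so they are previewed first) is an assumption.

```python
import shlex
from enum import IntEnum


class Risk(IntEnum):
    SAFE = 0         # execute, show output
    MODERATE = 1     # show preview, execute on enter
    DANGEROUS = 2    # explain impact, require "yes"
    DESTRUCTIVE = 3  # double confirmation


def _has_flag(args, short, long):
    """True if a short flag (possibly bundled, e.g. -rf) or a long flag is present."""
    for arg in args:
        if arg == long:
            return True
        if arg.startswith("-") and not arg.startswith("--") and short in arg[1:]:
            return True
    return False


def classify(command_line: str) -> Risk:
    """Assign a risk tier to a single shell command line."""
    argv = shlex.split(command_line)
    if not argv:
        return Risk.SAFE
    program, args = argv[0], argv[1:]
    if program == "mkfs" or program.startswith("mkfs."):
        return Risk.DESTRUCTIVE
    if program == "rm":
        recursive = _has_flag(args, "r", "--recursive") or _has_flag(args, "R", "--recursive")
        force = _has_flag(args, "f", "--force")
        return Risk.DESTRUCTIVE if recursive and force else Risk.DANGEROUS
    if program == "chmod" and "777" in args:
        return Risk.DANGEROUS
    if program == "git":
        if args[:1] == ["status"]:
            return Risk.SAFE
        if args[:1] == ["push"] and _has_flag(args[1:], "f", "--force"):
            return Risk.DANGEROUS
        return Risk.MODERATE
    if program in {"ls", "pwd", "cat"}:
        return Risk.SAFE
    return Risk.MODERATE  # assumption: unknown commands are previewed first
```

Parsing with `shlex.split` and inspecting `argv` avoids string matching on the raw line, which would misfire on quoted arguments such as a commit message containing "rm -rf".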
172+
173+
---
174+
175+
### Layer 6: Extensibility Architecture
176+
177+
**Current**: Monolithic YAML files.
178+
179+
**Proposed plugin system:**
180+
181+
- **Command packs**: Installable bundles (e.g., `multilingualcl-docker`, `multilingualcl-kubernetes`)
182+
- **Language packs**: Community-contributed translation sets
183+
- **Output formatters**: JSON, table, tree views
184+
- **Shell integrations**: Bash/Zsh/Fish shell plugins
185+
- **IDE integrations**: VS Code extension, JetBrains plugin
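
To make the command-pack idea concrete, here is a minimal in-process registry. Everything in it is hypothetical (`CommandPack`, `Registry`, the contents of the Docker pack); a packaged implementation might instead discover installed packs through `importlib.metadata` entry points, which this sketch does not attempt.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class CommandPack:
    """A named bundle of (action, resource) -> command mappings."""
    name: str
    commands: Dict[Tuple[str, str], str] = field(default_factory=dict)


class Registry:
    """Holds installed packs and resolves intents against them in install order."""

    def __init__(self) -> None:
        self._packs: Dict[str, CommandPack] = {}

    def register(self, pack: CommandPack) -> None:
        if pack.name in self._packs:
            raise ValueError(f"pack already registered: {pack.name}")
        self._packs[pack.name] = pack

    def resolve(self, action: str, resource: str) -> Optional[str]:
        for pack in self._packs.values():
            command = pack.commands.get((action, resource))
            if command is not None:
                return command
        return None


# Illustrative content for a pack named after the example above.
docker_pack = CommandPack(
    name="multilingualcl-docker",
    commands={("list", "container"): "docker ps"},
)
```

Packs contribute language-agnostic (action, resource) pairs, so a language pack and a command pack can be installed independently of each other.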
186+
187+
---
188+
189+
### Layer 7: Voice and Accessibility
190+
191+
- **Voice input**: Spoken commands in any language
192+
- **Voice output**: Read results aloud (accessibility)
193+
- **Screen reader support**: WCAG 2.1 AA compliance
194+
- **High contrast mode**: For terminal accessibility
195+
196+
---
197+
198+
## Technical Modernization
199+
| Component | Current | Proposed |
|-----------|---------|----------|
| Python version | 3.6+ | 3.11+ (pattern matching, better typing, tomllib) |
| Terminal UI | readline + termcolor | Rich/Textual/Prompt Toolkit |
| Parsing | argparse (static) | Semantic parser + LLM fallback |
| Config format | YAML only | TOML for config, YAML for command maps, JSON Schema validation |
| i18n framework | Custom locale detection | ICU via PyICU + Unicode CLDR |
| Testing | unittest | pytest + property-based testing (hypothesis) |
| Packaging | setup.py | pyproject.toml |
| CI/CD | GitHub Actions (basic) | Matrix testing (multiple Python versions, OSes) |
| Documentation | Sphinx (not built) | MkDocs Material with multilingual versions |
211+
212+
---
213+
214+
## Phased Roadmap
215+
216+
### Phase 1: Foundation (Activate Existing Data)
217+
- Connect `actions.json`, `resources.md`, and `commands.yaml` to the runtime pipeline
218+
- Build the Action + Resource + Modifier -> Command resolution engine
219+
- Modernize REPL with Rich/Textual
220+
- Add safety confirmation layer
221+
- Expand command coverage to top 50 commands
222+
- Add 3-5 more core languages (es, de, zh, ar, hi)
223+
224+
### Phase 2: Intelligence (Add LLM Layer)
225+
- Integrate local LLM for intent parsing
226+
- Add language auto-detection
227+
- Implement fuzzy matching and smart suggestions
228+
- Add output translation (error messages, help text)
229+
- Build command preview/dry-run mode
230+
- Session context and memory
231+
232+
### Phase 3: Ecosystem (Community and Extensibility)
233+
- Plugin architecture for command packs and language packs
234+
- Community translation platform
235+
- Shell integrations (bash/zsh/fish)
236+
- IDE extensions
237+
- Voice input/output support
238+
239+
### Phase 4: Future (Advanced Capabilities)
240+
- Multi-command workflows ("deploy to production" -> sequence of commands)
241+
- Predictive commands (suggest before you ask)
242+
- Cross-platform: Windows PowerShell translation, macOS compatibility
243+
- Web-based terminal interface
244+
- Mobile support (Termux, iSH)
245+
246+
---
247+
248+
## Key Insight
249+
250+
The most powerful idea in MultilingualCL is that **command-line operations are fundamentally language-agnostic actions on resources**. The project already encodes this in its action verbs, resource types, and composition rules. The modernization is not about starting over -- it is about activating the semantic framework that already exists and using LLMs to cover the languages and free-form phrasing that hand-written YAML cannot scale to.