diff --git a/I18N_DESIGN_PLAN.md b/I18N_DESIGN_PLAN.md deleted file mode 100644 index 7829696..0000000 --- a/I18N_DESIGN_PLAN.md +++ /dev/null @@ -1,408 +0,0 @@ -# Plan: Python i18n Library (gt-i18n, gt-flask, gt-fastapi) - -## Context - -Porting the JS `gt-i18n` and `gt-node` packages to Python. The JS packages provide: -- `gt-i18n`: Core i18n manager, translation resolution, hashing, interpolation, fallbacks -- `gt-node`: Node.js AsyncLocalStorage adapter + `initializeGT()` + `withGT()` for Express/serverless - -The Python equivalents will be: -- **gt-i18n**: Core logic (I18nManager, TranslationsManager, translation functions, `t()` internals) -- **gt-flask**: Thin Flask adapter (middleware hook, locale detection, re-exports `t()`) -- **gt-fastapi**: Thin FastAPI adapter (Starlette middleware, locale detection, re-exports `t()`) - -The core `generaltranslation` Python package already has all foundational utilities: `hash_source`, `index_vars`, `format_message`, `extract_vars`, `condense_vars`, `decode_vars`, `declare_var`, `format_cutoff`, locale utilities, and `IntlMessageFormat`. - -## Decisions - -- **`t()` is the primary function name** (standard i18n convention) -- **`t_fallback()`** — a standalone function that does interpolation only (no locale lookup, no translation resolution). Mirrors JS `gtFallback()`. -- **Separate packages**: gt-flask and gt-fastapi are thin adapters; gt-i18n holds all logic -- **`ContextVarStorageAdapter` is the default** in gt-i18n — Flask/FastAPI adapters don't need to create their own. It uses Python's `contextvars.ContextVar` which works for both threaded (Flask) and async (FastAPI) contexts. 
-- **Translation loading**: Configurable — eager at startup (default) or lazy per-locale -- **Custom translation loader**: Users can override the remote CDN loader with a custom `load_translations` callback (e.g., read from local JSON files) -- **Locale detection**: Accept-Language header by default + user-provided custom `get_locale` callback -- **Use `uv init` CLI** to scaffold gt-fastapi package (not by hand) - ---- - -## Package 1: gt-i18n - -### File Structure - -``` -packages/gt-i18n/src/gt_i18n/ -├── __init__.py # Public exports -├── _config.py # I18nConfig TypedDict + defaults -├── i18n_manager/ -│ ├── __init__.py # Re-exports -│ ├── _storage_adapter.py # Abstract StorageAdapter base class -│ ├── _context_var_adapter.py # ContextVar-based StorageAdapter -│ ├── _translations_manager.py # Translation cache + loader -│ ├── _remote_loader.py # CDN translation loader (httpx) -│ ├── _i18n_manager.py # I18nManager class -│ └── _singleton.py # Module-level get/set singleton -├── translation_functions/ -│ ├── __init__.py # Re-exports -│ ├── _hash_message.py # hashMessage (index_vars + hash_source) -│ ├── _extract_variables.py # Filter $-prefixed keys from options -│ ├── _interpolate.py # Interpolate translated ICU string -│ ├── _t.py # t() function implementation -│ ├── _fallbacks.py # t_fallback(), m_fallback() -│ ├── _msg.py # msg() — register message with hash -│ └── _decode.py # decode_msg, decode_options (base64) -``` - -### Key Components - -#### StorageAdapter (`_storage_adapter.py`) -Port of `StorageAdapter.ts`. Abstract base class. -```python -class StorageAdapter(ABC): - @abstractmethod - def get_item(self, key: str) -> str | None: ... - @abstractmethod - def set_item(self, key: str, value: str) -> None: ... -``` - -#### ContextVarStorageAdapter (`_context_var_adapter.py`) -**Default adapter** used by I18nManager when no custom adapter is provided. Uses Python's `contextvars.ContextVar` — equivalent of Node's `AsyncLocalStorage`. 
Works for both threaded (Flask) and async (FastAPI) contexts. Each request automatically gets its own context. Flask and FastAPI adapter packages do NOT need to create their own — they just call `initialize_gt()` which uses this default. -```python -import contextvars - -_locale_var: contextvars.ContextVar[str] = contextvars.ContextVar("gt_locale") - -class ContextVarStorageAdapter(StorageAdapter): - def get_item(self, key): return _locale_var.get(None) if key == "locale" else None - def set_item(self, key, value): - if key == "locale": # other keys are ignored - _locale_var.set(value) -``` - -#### TranslationsManager (`_translations_manager.py`) -Port of `TranslationsManager.ts`. Caches `dict[str, str]` (hash → translated string) per locale with configurable expiry. -- `async get_translations(locale) -> dict[str, str]` — loads if not cached -- `get_translations_sync(locale) -> dict[str, str]` — returns cached (empty if not loaded) -- `async load_all(locales: list[str])` — eagerly fetch all locales - -#### Remote Loader (`_remote_loader.py`) -Port of `createRemoteTranslationLoader`. Fetches from CDN via httpx. -- URL pattern: `{cache_url}/{project_id}/{locale}` -- Returns `dict[str, str]` - -#### I18nManager (`_i18n_manager.py`) -Port of `I18nManager.ts`. Central orchestrator. -```python -class I18nManager: - def __init__( - self, *, - default_locale: str = "en", - locales: list[str] | None = None, - project_id: str | None = None, - cache_url: str | None = None, - store_adapter: StorageAdapter | None = None, # defaults to ContextVarStorageAdapter - load_translations: TranslationsLoader | None = None, # custom loader overrides remote CDN - cache_expiry_time: int = 60_000, - ... - ): ... 
- def get_locale(self) -> str # reads from store_adapter - def set_locale(self, locale: str) # writes to store_adapter - def requires_translation(self, locale=None) -> bool - async def get_translations(self, locale=None) -> dict[str, str] - def get_translations_sync(self, locale=None) -> dict[str, str] - async def load_all_translations(self) -> None -``` -- If `store_adapter` is None, creates a `ContextVarStorageAdapter` by default -- If `load_translations` is provided, uses it instead of the remote CDN loader -- Common custom loader: reading from local JSON files - -#### Singleton (`_singleton.py`) -Module-level `_manager: I18nManager | None` with `get_i18n_manager()` / `set_i18n_manager()`. - -#### hash_message (`_hash_message.py`) -```python -def hash_message(message: str, *, context=None, id=None, max_chars=None) -> str: - return hash_source(index_vars(message), context=context, id=id, max_chars=max_chars, data_format="ICU") -``` -Reuses: `generaltranslation._id.hash_source`, `generaltranslation.static.index_vars` - -#### extract_variables (`_extract_variables.py`) -Filters `$`-prefixed GT keys from options dict, returning only user interpolation variables. - -#### interpolate_message (`_interpolate.py`) -Port of `interpolateMessage.ts`. Applies `extract_vars` + `condense_vars` + `format_message` with fallback cascade. -Reuses: `generaltranslation.formatting.format_message`, `generaltranslation.formatting.format_cutoff`, `generaltranslation.static.extract_vars`, `generaltranslation.static.condense_vars` - -#### t() function (`_t.py`) -The core user-facing function. Synchronous. 
-```python -def t(message: str, **kwargs) -> str: - manager = get_i18n_manager() - locale = manager.get_locale() - if not manager.requires_translation(locale): - return interpolate_message(message, kwargs) # source locale, just interpolate - translations = manager.get_translations_sync(locale) - h = hash_message(message, context=kwargs.get("$context"), id=kwargs.get("$id"), max_chars=kwargs.get("$max_chars")) - translated = translations.get(h) - if translated: - return interpolate_message(translated, {**kwargs, "$_fallback": message}) - return interpolate_message(message, kwargs) # no translation found, use source -``` - -#### t_fallback (`_fallbacks.py`) -Standalone interpolation-only function. No locale lookup, no translation resolution. Mirrors JS `gtFallback()`. Useful for fallback rendering or when the user just wants variable interpolation without the full translation pipeline. -```python -def t_fallback(message: str, **kwargs) -> str: - """Interpolate variables into message without translation lookup. - - Performs: extract_variables -> extract_vars -> condense_vars -> format_message -> format_cutoff - Falls back to source message on any error. 
- """ - return interpolate_message(message, kwargs) -``` - -#### msg / decode / m_fallback -- `msg()` — registers message, encodes hash+source as base64 suffix -- `decode_msg()` / `decode_options()` — extracts from encoded string -- `m_fallback()` — fallback for encoded messages (decodes then delegates to `t_fallback`) - ---- - -## Package 2: gt-flask (thin adapter) - -### File Structure -``` -packages/gt-flask/src/gt_flask/ -├── __init__.py # Exports: initialize_gt, t -└── _setup.py # initialize_gt(), before_request hook, t re-export -``` - -### Key Components - -#### initialize_gt (`_setup.py`) -```python -def initialize_gt(app: Flask, *, default_locale="en", locales=None, project_id=None, - cache_url=None, get_locale=None, load_translations=None, **kwargs): - # No adapter creation needed — I18nManager defaults to ContextVarStorageAdapter - manager = I18nManager(default_locale=default_locale, locales=locales, - project_id=project_id, load_translations=load_translations, ...) - set_i18n_manager(manager) - # Eager loading (default) — uses asyncio.run() at startup - if kwargs.get("eager_loading", True): - asyncio.run(manager.load_all_translations()) - - @app.before_request - def _set_locale(): - if get_locale: - locale = get_locale(request) # user callback - else: - locale = _detect_from_accept_language(request, manager) - manager.set_locale(locale) -``` - -#### t re-export -```python -from gt_i18n import t # just re-export -``` - -#### Locale detection -Default: parse `Accept-Language` header, resolve against `manager.get_locales()` using `generaltranslation.locales.determine_locale`. User can override with `get_locale` callback. - ---- - -## Package 3: gt-fastapi (thin adapter) - -### Scaffolding -Run `uv init packages/gt-fastapi --package` to create the package, then configure pyproject.toml. 
- -### File Structure -``` -packages/gt-fastapi/src/gt_fastapi/ -├── __init__.py # Exports: initialize_gt, t -└── _setup.py # initialize_gt(), Starlette middleware, t re-export -``` - -### Key Components - -#### initialize_gt (`_setup.py`) -```python -def initialize_gt(app: FastAPI, *, default_locale="en", locales=None, project_id=None, - cache_url=None, get_locale=None, load_translations=None, **kwargs): - # No adapter creation needed — I18nManager defaults to ContextVarStorageAdapter - manager = I18nManager(default_locale=default_locale, locales=locales, - project_id=project_id, load_translations=load_translations, ...) - set_i18n_manager(manager) - - @app.on_event("startup") # or lifespan - async def _load_translations(): - if kwargs.get("eager_loading", True): - await manager.load_all_translations() - - @app.middleware("http") - async def gt_middleware(request: Request, call_next): - if get_locale: - locale = get_locale(request) - else: - locale = _detect_from_accept_language(request, manager) - manager.set_locale(locale) - return await call_next(request) -``` - -FastAPI advantage: can use `await` directly for eager loading in the startup event. 
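The per-request isolation that the middleware relies on can be illustrated with the standard library alone. The following is a minimal sketch, not the adapter itself: `_locale_var` and `handle_request` are hypothetical stand-ins for the context variable behind `ContextVarStorageAdapter` and for a request passing through `gt_middleware`. It assumes each request runs as its own asyncio task, which is how concurrent requests are simulated here.

```python
import asyncio
import contextvars
from typing import List, Optional

# Hypothetical stand-in for the context variable behind ContextVarStorageAdapter.
_locale_var: contextvars.ContextVar = contextvars.ContextVar("gt_locale", default=None)


async def handle_request(locale: str, delay: float) -> Optional[str]:
    """Simulate middleware setting the locale, then a handler reading it later."""
    _locale_var.set(locale)      # roughly what manager.set_locale(locale) would do
    await asyncio.sleep(delay)   # yield so the other simulated requests interleave
    return _locale_var.get()     # roughly what manager.get_locale() would read


async def main() -> List[Optional[str]]:
    # gather() wraps each coroutine in a task, and every task runs in its own
    # copy of the current context, so the set() calls do not see each other.
    return await asyncio.gather(
        handle_request("es", 0.02),
        handle_request("fr", 0.01),
        handle_request("en", 0.0),
    )


results = asyncio.run(main())
leaked = _locale_var.get()  # value visible outside the tasks
```

Each simulated request reads back the locale it set, even though the three overlap in time, and nothing leaks into the outer context. This is the property that lets a single module-level manager serve concurrent requests.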
- -#### t re-export -```python -from gt_i18n import t -``` - ---- - -## Reusable Functions (already in generaltranslation core) - -| Function | Location | Used For | -|---|---|---| -| `hash_source()` | `generaltranslation._id._hash` | Content hashing | -| `index_vars()` | `generaltranslation.static.index_vars` | Normalize _gt_ indices before hashing | -| `format_message()` | `generaltranslation.formatting.format_message` | ICU interpolation | -| `format_cutoff()` | `generaltranslation.formatting.format_cutoff` | Max chars truncation | -| `extract_vars()` | `generaltranslation.static.extract_vars` | Extract declared variables from ICU | -| `condense_vars()` | `generaltranslation.static.condense_vars` | Simplify _gt_ selects to refs | -| `decode_vars()` | `generaltranslation.static.decode_vars` | Replace _gt_ selects with values | -| `declare_var()` | `generaltranslation.static.declare_var` | Mark non-translatable content | -| `determine_locale()` | `generaltranslation.locales` | Resolve locale against approved list | -| `is_same_dialect()` | `generaltranslation.locales` | Check if translation needed | - ---- - -## Implementation Order - -1. **gt-i18n: StorageAdapter + ContextVarStorageAdapter** -2. **gt-i18n: TranslationsManager + remote loader** -3. **gt-i18n: I18nManager + singleton** -4. **gt-i18n: hash_message, extract_variables, interpolate_message** -5. **gt-i18n: t() function** -6. **gt-i18n: msg, decode, fallbacks** -7. **gt-i18n: __init__.py exports** -8. **Scaffold gt-fastapi** via `uv init packages/gt-fastapi --package` -9. **gt-flask: _setup.py** (initialize_gt + before_request + re-export t) -10. **gt-fastapi: _setup.py** (initialize_gt + middleware + re-export t) -11. 
**Tests** - ---- - -## Verification & Testing - -### Test Structure -``` -packages/gt-i18n/tests/ -├── test_hash_message.py # hash_message parity with JS -├── test_extract_variables.py # $-key filtering -├── test_interpolate.py # interpolation pipeline + fallback cascade -├── test_t.py # t() with mock translations -├── test_t_fallback.py # t_fallback() interpolation-only -├── test_msg.py # msg() encoding/decoding roundtrip -├── test_fallbacks.py # m_fallback() with encoded messages -├── test_storage_adapter.py # ContextVarStorageAdapter get/set/threading -├── test_translations_manager.py # Cache hit/miss, expiry, custom loader -├── test_i18n_manager.py # Full manager lifecycle -└── fixtures/ - └── generate_fixtures.mjs # Generate JS reference data for parity tests - -packages/gt-flask/tests/ -├── test_flask_integration.py # Flask app with before_request middleware - -packages/gt-fastapi/tests/ -├── test_fastapi_integration.py # FastAPI app with Starlette middleware -``` - -### Unit Tests (gt-i18n) - -**hash_message parity** (`test_hash_message.py`): -- Generate JS fixtures using `hashMessage()` from gt-i18n JS -- Verify Python `hash_message()` produces identical hashes for same inputs -- Test with: plain messages, messages with `$context`, `$id`, `$max_chars`, messages with `declare_var` variables - -**interpolate_message** (`test_interpolate.py`): -- Simple variable substitution: `"Hello, {name}!"` + `{name: "Alice"}` -> `"Hello, Alice!"` -- With declared variables (`_gt_` selects): extract + condense + format -- Fallback cascade: invalid translation falls back to source message -- `$max_chars` truncation via `format_cutoff` - -**t() function** (`test_t.py`): -- Mock I18nManager with pre-loaded translations -- Source locale (no translation needed) -> returns interpolated source -- Target locale with matching translation -> returns interpolated translation -- Target locale with missing translation -> returns interpolated source (fallback) -- Variable interpolation 
through the full pipeline - -**t_fallback()** (`test_t_fallback.py`): -- Pure interpolation, no manager/locale involved -- `t_fallback("Hello, {name}!", name="World")` -> `"Hello, World!"` -- With declared variables -- Error handling: bad ICU syntax -> returns source message - -**msg() roundtrip** (`test_msg.py`): -- `msg("Hello, {name}!", name="Alice")` -> encoded string -- `decode_msg(encoded)` -> `"Hello, Alice!"` -- `decode_options(encoded)` -> `{"$_source": "Hello, {name}!", "$_hash": "...", "name": "Alice"}` - -**ContextVarStorageAdapter** (`test_storage_adapter.py`): -- Set/get locale within same context -- Isolation: different threads/tasks get independent values -- Default (no value set) returns None - -**TranslationsManager** (`test_translations_manager.py`): -- Custom loader returning known translations -> verify cache hit -- Cache expiry: after expiry time, loader is called again -- `load_all()`: all configured locales loaded eagerly -- `get_translations_sync()`: returns cached data or empty dict -- Error in loader -> returns empty dict, doesn't crash - -### Integration Tests - -**Flask** (`test_flask_integration.py`): -```python -def test_flask_t_with_accept_language(): - app = Flask(__name__) - initialize_gt(app, default_locale="en", locales=["en", "es"], - load_translations=lambda locale: {"": "Hola, mundo!"} if locale == "es" else {}) - - @app.route("/hello") - def hello(): - return {"message": t("Hello, world!")} - - with app.test_client() as client: - # Spanish - resp = client.get("/hello", headers={"Accept-Language": "es"}) - assert resp.json["message"] == "Hola, mundo!" - # English (source locale, no translation needed) - resp = client.get("/hello", headers={"Accept-Language": "en"}) - assert resp.json["message"] == "Hello, world!" 
-``` - -**FastAPI** (`test_fastapi_integration.py`): -```python -def test_fastapi_t_with_accept_language(): - app = FastAPI() - initialize_gt(app, default_locale="en", locales=["en", "es"], - load_translations=lambda locale: {"": "Hola, mundo!"} if locale == "es" else {}) - - @app.get("/hello") - def hello(): - return {"message": t("Hello, world!")} - - with TestClient(app) as client: - resp = client.get("/hello", headers={"Accept-Language": "es"}) - assert resp.json()["message"] == "Hola, mundo!" -``` - -**Custom locale callback** test: -- Pass `get_locale=lambda req: "fr"` -> always uses French - -**Custom translation loader** test: -- Pass `load_translations` that reads from a JSON file -> verify translations load correctly - -### Running Tests -```bash -uv run pytest packages/gt-i18n/tests -v -uv run pytest packages/gt-flask/tests -v -uv run pytest packages/gt-fastapi/tests -v -# All at once: -uv run pytest packages/gt-i18n packages/gt-flask packages/gt-fastapi -v -``` diff --git a/PORTING_GUIDE.md b/PORTING_GUIDE.md deleted file mode 100644 index 8b49802..0000000 --- a/PORTING_GUIDE.md +++ /dev/null @@ -1,250 +0,0 @@ -# Porting Guide: `generaltranslation` JS → Python - -## Overview - -Port the JS `generaltranslation` core package (`/Users/ernestmccarter/Documents/dev/gt/packages/core/`) to Python at `/Users/ernestmccarter/Documents/dev/gt-python/packages/generaltranslation/`. - -The Python package is already scaffolded with `pyproject.toml`, submodules, and a stub `GT` class. Dependencies: `httpx` (async HTTP), `babel` (locale/formatting), `generaltranslation-icu-messageformat-parser`, `generaltranslation-intl-messageformat`. 
- -## Progress - -| Module | Status | Tests | Notes | -|--------|--------|-------|-------| -| `locales/` | **Done** | 827 tests | 18 files, all functions ported | -| `formatting/` | **Done** | 160 tests | 9 files including `format_list_to_parts`, `CutoffFormat` | -| `static/` | **Done** | 164 tests | 10 files, all functions match JS output exactly | -| `_gt.py` | **Stub only** | — | Constructor with 4 params, no methods | -| `translate/` | **Not started** | — | Empty `__init__.py` | -| `errors/` | **Not started** | — | Empty `__init__.py` | -| `_id/` | **Not started** | — | Empty `__init__.py` | - -**Total tests passing: 991** - -## Python Package Structure - -``` -src/generaltranslation/ -├── __init__.py # Public API exports (currently only GT) -├── _gt.py # GT class (main driver — stub) -├── py.typed # PEP 561 marker -├── locales/ # ✅ Locale utilities (done) -├── formatting/ # ✅ Number, currency, datetime, list formatting (done) -├── static/ # ✅ GT variable encoding/decoding (done) -├── translate/ # ❌ API communication layer (not started) -├── errors/ # ❌ Error types (not started) -└── _id/ # ❌ Hashing / ID generation (not started) -``` - -## JS → Python Dependency Mapping - -| JS Dependency | Python Equivalent | Notes | -|---|---|---| -| `Intl.NumberFormat` | `babel.numbers` | `format_decimal`, `format_percent`, `format_currency` | -| `Intl.DateTimeFormat` | `babel.dates` | `format_date`, `format_time`, `format_datetime` | -| `Intl.PluralRules` | `babel.plural` | CLDR plural rules | -| `Intl.Locale` | `babel.Locale` | BCP 47 parsing, validation | -| `Intl.DisplayNames` | `babel.Locale.get_display_name()` | Language/region display names | -| `Intl.ListFormat` | `babel.lists.format_list()` | Available in Babel | -| `Intl.RelativeTimeFormat` | `babel.dates.format_timedelta()` | Relative time formatting | -| `@formatjs/icu-messageformat-parser` | `generaltranslation-icu-messageformat-parser` | **Separate workspace package, already complete** | -| 
`intl-messageformat` | `generaltranslation-intl-messageformat` | **Separate workspace package, already complete** | -| `crypto-js` (SHA256) | stdlib `hashlib.sha256` | Built-in, no dep needed | -| `fast-json-stable-stringify` | `json.dumps(obj, sort_keys=True)` | Built-in | -| `fetch` / HTTP | `httpx` | Already a dependency | - -## What's Left to Port - -### Tier 1: Core functionality - -#### `errors/` module -Port from JS `src/errors.ts` + `src/logging/errors.ts`. - -```python -class GTError(Exception): - """Base error for GT operations.""" - -class ApiError(GTError): - def __init__(self, error: str, code: int, message: str): - self.code = code - self.message = message - super().__init__(f"{error}: {message}") -``` - -#### `_id/` module -Port from JS `src/id.ts`. Hashing utilities for content identification. - -```python -import hashlib, json - -def hash_string(s: str) -> str: - """SHA256, first 16 hex chars.""" - return hashlib.sha256(s.encode()).hexdigest()[:16] - -def hash_source(source: dict, hash_function: Callable | None = None) -> str: - """Hash source content with metadata. Uses json.dumps(sort_keys=True) for stable serialization.""" - -def hash_template(template: dict[str, str], hash_function: Callable | None = None) -> str: - """Hash sorted JSON of template object.""" -``` - -#### `translate/` module -Port from JS `src/translate/`. API communication using httpx. - -```python -API_VERSION = "2026-02-18.v1" - -def generate_request_headers(config: TranslationRequestConfig) -> dict[str, str]: - """Headers: Content-Type, x-gt-project-id, x-gt-api-key, gt-api-version""" - -async def api_request(config: TranslationRequestConfig, endpoint: str, body: dict | None = None, timeout: int = 60000, method: str = "POST", retry_policy: str = "exponential") -> dict: - """HTTP request with retry logic. Exponential backoff: 500ms * 2^attempt. 
Max 3 retries on 5XX.""" - -async def translate_many(requests: list | dict, global_metadata: dict, config: TranslationRequestConfig, timeout: int | None = None) -> list | dict: - """POST /v2/translate. Batch translation.""" - -async def upload_source_files(files: list, options: dict, config: TranslationRequestConfig) -> dict: - """POST /v2/project/files/upload-files. Batch size: 100 files.""" - -async def enqueue_files(files: list, options: dict, config: TranslationRequestConfig) -> dict: - """POST /v2/project/translations/enqueue.""" - -async def download_file_batch(batch: dict, config: TranslationRequestConfig) -> dict: ... -async def upload_translations(translations: dict, config: TranslationRequestConfig) -> dict: ... -async def setup_project(project_id: str, config: TranslationRequestConfig, options: dict) -> dict: ... -async def check_job_status(job_ids: list[str], config: TranslationRequestConfig, timeout_ms: int | None = None) -> dict: ... -``` - -#### GT class methods -Port from JS `src/index.ts`. The GT class should wrap all the standalone functions, binding its own config: - -**Constructor** — needs additional params: -```python -class GT: - def __init__( - self, - *, - api_key: str = "", - dev_api_key: str = "", - project_id: str = "", - base_url: str = "https://api.gtx.dev", - source_locale: str = "en", - target_locale: str = "", - locales: list[str] | None = None, - custom_mapping: dict[str, str] | None = None, - ): ... 
-``` - -**Methods to implement:** -- Locale: `get_locale_name()`, `get_locale_emoji()`, `get_locale_properties()`, `get_locale_direction()`, `is_valid_locale()`, `determine_locale()`, `requires_translation()`, `is_same_language()`, `is_same_dialect()`, `is_superset_locale()`, `standardize_locale()`, `get_plural_form()`, `resolve_canonical_locale()`, `resolve_alias_locale()`, `get_region_properties()` -- Formatting: `format_num()`, `format_currency()`, `format_list()`, `format_list_to_parts()`, `format_date_time()`, `format_relative_time()`, `format_message()`, `format_cutoff()` -- Translation API: `translate_many()`, `setup_project()`, `enqueue_files()`, `download_file_batch()`, `upload_source_files()`, `upload_translations()`, `check_job_status()`, `query_branch_data()`, `create_branch()`, `get_project_data()` - -### Tier 2: Nice to have - -- `logging/` — Warning/error message helpers (could use stdlib `logging` directly) -- `settings/` — Constants like `LIBRARY_DEFAULT_LOCALE`, API URLs (can be inlined) - -## Key Type Definitions - -```python -from typing import TypedDict, Literal - -DataFormat = Literal["JSX", "ICU", "I18NEXT", "STRING"] -LogLevel = Literal["debug", "info", "warn", "error", "off"] -PluralType = Literal["singular", "plural", "dual", "zero", "one", "two", "few", "many", "other"] - -class TranslationRequestConfig(TypedDict): - project_id: str - base_url: str - api_key: str - -class EntryMetadata(TypedDict, total=False): - id: str - hash: str - context: str - max_chars: int - data_format: DataFormat -``` - -## Settings / Constants - -```python -LIBRARY_DEFAULT_LOCALE = "en" -DEFAULT_TIMEOUT = 60_000 # ms -DEFAULT_BASE_URL = "https://api.gtx.dev" -API_VERSION = "2026-02-18.v1" -PLURAL_FORMS = ["singular", "plural", "dual", "zero", "one", "two", "few", "many", "other"] -``` - -## Key Patterns to Follow - -1. **Error handling**: Try/except with fallbacks. Log warnings instead of raising in non-critical locale functions. -2. 
**Retry logic**: Exponential backoff on 5XX (500ms * 2^attempt), max 3 retries. No retry on 4XX. -3. **Hashing**: SHA256, first 16 hex chars. Use `json.dumps(sort_keys=True)` for stable serialization. -4. **Async**: Translation/API functions should be `async` using `httpx.AsyncClient`. -5. **Naming**: Use `snake_case` for Python (JS uses `camelCase`). e.g. `getLocaleName` → `get_locale_name`. -6. **Logging**: Use Python's stdlib `logging` module. Create child loggers per module. - -## Testing Strategy - -Generate test fixtures by executing the **JS source functions** and writing results to JSON. Python tests consume these fixtures via `pytest.mark.parametrize`. - -### Fixture Generation - -Write a Node.js script for each module (e.g., `tests/formatting/fixtures/generate_fixtures.mjs`) that: -1. Imports the JS functions using `npx tsx` with dynamic `await import()` from the TS source files -2. Calls each function with a matrix of inputs (various locales, options, edge cases) -3. Writes the results to a JSON fixture file (e.g., `formatting_fixtures.json`) - -This ensures the Python implementation produces **identical output** to the JS implementation. - -**Important**: Use `await import(path)` for dynamic imports from the JS core TS source, NOT static `import ... from ...` which can fail with ESM/CJS issues. Run with `npx tsx`. - -### Test Directory Layout - -``` -tests/ -├── locales/ -│ ├── fixtures/ -│ │ └── locale_fixtures.json -│ └── test_*.py # 14 test files -├── formatting/ -│ ├── fixtures/ -│ │ ├── generate_fixtures.mjs -│ │ └── formatting_fixtures.json -│ └── test_*.py # 8 test files -└── static/ - ├── fixtures/ - │ ├── generate_fixtures.mjs - │ └── static_fixtures.json - ├── test_*.py # 6 test files - └── test_known_discrepancies.py # Edge case regression tests -``` - -## What NOT to Port - -- JSX-specific types and processing (JsxElement, JsxChildren, etc.) 
— not relevant for Python -- Browser-specific code (btoa/atob — use stdlib `base64` instead) -- React/Next.js integration code -- SWC plugin -- `cache/` module — Python doesn't need Intl constructor caching -- `backwards-compatability/` module — legacy format conversion not needed for new Python package -- `utils/minify.ts` — code minification not relevant - -## Lessons Learned - -1. **Use Babel, not langcodes**: The actual implementation uses `babel` for all locale/formatting work. `langcodes` was the original plan but `babel` has better CLDR data coverage. - -2. **ICU parser is a separate package**: The `generaltranslation-icu-messageformat-parser` and `generaltranslation-intl-messageformat` packages are workspace siblings, NOT inlined in the main package. They provide `Parser`, `print_ast`, and `IntlMessageFormat`. - -3. **Parser/printer parity matters**: The Python ICU parser and printer must match `@formatjs/icu-messageformat-parser` behavior exactly, including: - - `'<` and `'>` always trigger escape sequences (not just when `allow_tags=True`) - - `print_ast` must re-escape `{}` via `printEscapedMessage` regex, `#` in plural context, and `'` at literal boundaries - - Select/plural nodes use compact comma formatting (no spaces): `{name,select,...}` - - Simple format nodes use spaced commas: `{name, type, style}` - -4. **Boolean stringification**: Python `str(True)` → `"True"` but JS `String(true)` → `"true"`. `declare_var` handles this with `.lower()` for booleans. - -5. **JS `undefined` in fixtures**: JS `undefined` is omitted from JSON. Test harnesses must use `.get("variable")` (defaulting to `None`) not `["variable"]`. - -6. **`_find_other_span` in `index_vars`**: The Python parser doesn't store indices on individual option values (only on top-level nodes). `index_vars` uses manual brace-counting to find the `other` option's `{content}` span within a select node. This works correctly for all valid ICU input. 
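Lessons 4 and 5 above are easy to get wrong in a test harness, so a small sketch may help. It is an illustration under stated assumptions, not the code in `declare_var`: `js_style_str` is a hypothetical helper name, and the fixture case is made up to show what a JSON object looks like when a field was `undefined` on the JS side.

```python
import json


def js_style_str(value: object) -> str:
    """Stringify booleans the way JS String() does (hypothetical helper)."""
    if isinstance(value, bool):    # check bool explicitly; it is a subclass of int
        return str(value).lower()  # True -> "true", False -> "false"
    return str(value)


# Made-up fixture case: `variable` was undefined in JS, so JSON.stringify
# dropped the key entirely instead of writing null.
case = json.loads('{"label": "no variable", "expected": "true"}')

variable = case.get("variable")  # None; case["variable"] would raise KeyError
rendered = js_style_str(True)
```

Using `.get()` keeps one parametrized test body usable for cases with and without the optional field, and the explicit `bool` check avoids `"True"` and `"False"` leaking into hashed or interpolated strings.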
diff --git a/packages/generaltranslation/TESTING_PLAN.md b/packages/generaltranslation/TESTING_PLAN.md deleted file mode 100644 index 86893c7..0000000 --- a/packages/generaltranslation/TESTING_PLAN.md +++ /dev/null @@ -1,525 +0,0 @@ -# Testing Plan: Remaining `generaltranslation` Modules - -## Conventions - -All tests follow the established patterns from the existing `locales/`, `formatting/`, and `static/` test suites: - -- **Framework**: pytest with `@pytest.mark.parametrize` -- **Fixture source**: JSON files generated by running the actual JS functions via `npx tsx` -- **Fixture location**: `tests//fixtures/_fixtures.json` -- **Test IDs**: Derived from fixture `label` fields for readable output -- **Assertion style**: `assert result == case["expected"]` -- **One test file per function or logical group** - ---- - -## 1. `_id/` Module Tests - -### Fixture Generator - -**File**: `tests/_id/fixtures/generate_fixtures.mjs` - -Imports from JS: -```js -const { hashString, hashSource, hashTemplate } = await import(GT_CORE_ID_PATH); -``` - -#### `hash_string` test cases (~15 cases) - -| Category | Inputs | -|----------|--------| -| Empty | `""` | -| Simple | `"hello"`, `"Hello World"` | -| Unicode | `"こんにちは"`, `"café"`, `"🚀🌟"` | -| Special chars | `"hello\nworld"`, `"tab\there"`, `"quotes'and\"doubles"` | -| Long string | 1000-char repeated pattern | -| Whitespace | `" "`, `" leading"`, `"trailing "` | -| JSON-like | `'{"key": "value"}'` | -| ICU pattern | `"{count, plural, one {# item} other {# items}}"` | - -Each case: `{ "label": "...", "input": "...", "expected": "<16-char hex>" }` - -#### `hash_template` test cases (~10 cases) - -| Category | Inputs | -|----------|--------| -| Empty dict | `{}` | -| Single key | `{"greeting": "Hello"}` | -| Multiple keys | `{"a": "1", "b": "2", "c": "3"}` | -| Key order matters | `{"z": "last", "a": "first"}` — verify sort order matches `fast-json-stable-stringify` | -| Unicode values | `{"name": "こんにちは"}` | -| Empty values | 
`{"key": ""}` | -| Special chars in values | `{"msg": "it's a {test}"}` | -| Many keys | 20-key dict | -| Nested-looking values | `{"data": "{\"nested\": true}"}` | - -Each case: `{ "label": "...", "input": {...}, "expected": "<16-char hex>" }` - -#### `hash_source` test cases (~15 cases) - -Only test with `dataFormat: "ICU"` and `dataFormat: "STRING"` (JSX is not ported). - -| Category | Inputs | -|----------|--------| -| Simple ICU | `{ source: "Hello", dataFormat: "ICU" }` | -| With context | `{ source: "Hello", dataFormat: "ICU", context: "greeting" }` | -| With id | `{ source: "Hello", dataFormat: "ICU", id: "msg_1" }` | -| With maxChars | `{ source: "Hello", dataFormat: "ICU", maxChars: 100 }` | -| All metadata | `{ source: "Hello", dataFormat: "ICU", context: "ctx", id: "id", maxChars: 50 }` | -| Empty source | `{ source: "", dataFormat: "ICU" }` | -| Unicode source | `{ source: "こんにちは", dataFormat: "ICU" }` | -| ICU pattern source | `{ source: "{count, plural, one {#} other {#s}}", dataFormat: "ICU" }` | -| STRING format | `{ source: "plain text", dataFormat: "STRING" }` | -| Negative maxChars | `{ source: "Hello", dataFormat: "ICU", maxChars: -50 }` — JS uses `Math.abs()` | -| Long source | 1000-char string | - -Each case: `{ "label": "...", "input": {...}, "expected": "<16-char hex>" }` - -### Test File - -**File**: `tests/_id/test_hash.py` - -```python -import json -from pathlib import Path -import pytest -from generaltranslation._id import hash_string, hash_source, hash_template - -FIXTURES = json.loads((Path(__file__).parent / "fixtures" / "id_fixtures.json").read_text()) - -@pytest.mark.parametrize("case", FIXTURES["hash_string"], ids=[c["label"] for c in FIXTURES["hash_string"]]) -def test_hash_string(case): - assert hash_string(case["input"]) == case["expected"] - -@pytest.mark.parametrize("case", FIXTURES["hash_template"], ids=[c["label"] for c in FIXTURES["hash_template"]]) -def test_hash_template(case): - assert hash_template(case["input"]) == 
case["expected"] - -@pytest.mark.parametrize("case", FIXTURES["hash_source"], ids=[c["label"] for c in FIXTURES["hash_source"]]) -def test_hash_source(case): - inp = case["input"] - result = hash_source( - inp["source"], - data_format=inp["dataFormat"], - context=inp.get("context"), - id=inp.get("id"), - max_chars=inp.get("maxChars"), - ) - assert result == case["expected"] -``` - -**Expected test count**: ~40 tests - ---- - -## 2. `translate/` Module Tests - -No fixture generation from JS — these are unit tests with mocked HTTP. - -### `tests/translate/test_headers.py` (~5 tests) - -Tests `generate_request_headers()`: - -```python -def test_headers_with_full_config(): - config = {"project_id": "proj_123", "api_key": "key_abc", "base_url": "https://api2.gtx.dev"} - headers = generate_request_headers(config) - assert headers["Content-Type"] == "application/json" - assert headers["x-gt-project-id"] == "proj_123" - assert headers["x-gt-api-key"] == "key_abc" - assert headers["gt-api-version"] == "2026-02-18.v1" - -def test_headers_missing_api_key(): - config = {"project_id": "proj_123"} - headers = generate_request_headers(config) - assert headers["x-gt-api-key"] == "" - -def test_headers_api_version_constant(): - # Verify version string matches JS API_VERSION - from generaltranslation._settings import API_VERSION - config = {"project_id": "x"} - assert generate_request_headers(config)["gt-api-version"] == API_VERSION -``` - -### `tests/translate/test_batch.py` (~8 tests) - -Tests `create_batches()` and `process_batches()`: - -```python -def test_empty_list(): - assert create_batches([], 100) == [] - -def test_under_batch_size(): - assert create_batches([1, 2, 3], 100) == [[1, 2, 3]] - -def test_exact_batch_size(): - items = list(range(100)) - batches = create_batches(items, 100) - assert len(batches) == 1 - -def test_over_batch_size(): - items = list(range(101)) - batches = create_batches(items, 100) - assert len(batches) == 2 - assert len(batches[0]) == 100 - assert 
len(batches[1]) == 1 - -def test_multiple_full_batches(): - items = list(range(250)) - batches = create_batches(items, 100) - assert len(batches) == 3 - -def test_custom_batch_size(): - items = list(range(10)) - batches = create_batches(items, 3) - assert len(batches) == 4 # 3+3+3+1 - -@pytest.mark.asyncio -async def test_process_batches_calls_fn_per_batch(): - calls = [] - async def mock_fn(batch): - calls.append(batch) - return [x * 2 for x in batch] - result = await process_batches(list(range(5)), mock_fn, batch_size=2) - assert len(calls) == 3 # batches: [0,1], [2,3], [4] - -@pytest.mark.asyncio -async def test_process_batches_empty(): - async def mock_fn(batch): - return batch - result = await process_batches([], mock_fn, batch_size=100) - # verify empty result handling -``` - -### `tests/translate/test_request.py` (~10 tests) - -Tests `api_request()` with mocked `httpx.AsyncClient`. Uses `pytest-asyncio` and `unittest.mock.patch` or `respx` (httpx mock library). - -```python -@pytest.mark.asyncio -async def test_successful_post(): - # Mock httpx to return 200 with JSON body - # Assert api_request returns parsed JSON - -@pytest.mark.asyncio -async def test_successful_get(): - # method="GET" - -@pytest.mark.asyncio -async def test_4xx_raises_api_error(): - # Mock 400 response - # Assert raises ApiError with correct code/message - # Assert NO retry (only 1 request made) - -@pytest.mark.asyncio -async def test_401_raises_api_error(): - # Auth failure - -@pytest.mark.asyncio -async def test_5xx_retries_3_times(): - # Mock 500 response for all attempts - # Assert 4 total requests (1 initial + 3 retries) - # Assert raises after exhausting retries - -@pytest.mark.asyncio -async def test_5xx_succeeds_on_retry(): - # Mock: 500, 500, 200 - # Assert returns successful response - # Assert 3 total requests - -@pytest.mark.asyncio -async def test_timeout_raises_error(): - # Mock httpx.TimeoutException - # Assert appropriate error - -@pytest.mark.asyncio -async def 
test_retry_backoff_timing(): - # Verify delays are ~500ms, ~1000ms, ~2000ms - # (or at least verify sleep/delay is called with correct values) - -@pytest.mark.asyncio -async def test_custom_timeout(): - # Pass timeout=5000, verify httpx is called with that timeout - -@pytest.mark.asyncio -async def test_request_sends_correct_headers(): - # Verify headers from generate_request_headers are used -``` - -### `tests/translate/test_endpoints.py` (~25 tests) - -Each endpoint function gets 1-2 tests. Mock `api_request` at the module level. - -For each function, verify: -1. **Correct API path** is called -2. **Request body** is assembled correctly from arguments -3. **Response** is returned/transformed correctly - -```python -# Example for check_job_status: -@pytest.mark.asyncio -async def test_check_job_status_path_and_body(mock_api_request): - await check_job_status(["job1", "job2"], config) - mock_api_request.assert_called_once_with( - config, "/v2/project/jobs/info", - body={"jobIds": ["job1", "job2"]}, - timeout=None, - ) - -# Example for download_file_batch (tests base64 decoding): -@pytest.mark.asyncio -async def test_download_file_batch_decodes_base64(mock_api_request): - mock_api_request.return_value = { - "files": [{"data": base64.b64encode(b"hello").decode(), "fileId": "f1"}], - "count": 1, - } - result = await download_file_batch([{"fileId": "f1"}], {}, config) - assert result[0]["data"] == "hello" # decoded from base64 - -# Example for enqueue_files (tests batching): -@pytest.mark.asyncio -async def test_enqueue_files_batches_large_input(mock_api_request): - files = [{"fileId": f"f{i}", "versionId": "v1", "branchId": "b1", "fileName": f"f{i}.json", "fileFormat": "JSON"} for i in range(150)] - # Should make 2 API calls (batch size 100) - await enqueue_files(files, options, config) - assert mock_api_request.call_count == 2 -``` - -Functions to test (1-2 tests each): -- `check_job_status` — path, body -- `create_branch` — path, body -- `download_file_batch` — 
base64 decode, batching -- `download_file` — path, params -- `enqueue_files` — body assembly, batching, response flattening -- `get_orphaned_files` — batching, intersection logic -- `get_project_data` — GET method, URL construction -- `process_file_moves` — batching, response aggregation -- `query_branch_data` — path, body -- `query_file_data` — path, body -- `query_source_file` — path, body -- `setup_project` — body assembly, batching -- `submit_user_edit_diffs` — body -- `translate` / `translate_many` — body construction with metadata -- `upload_source_files` — batching -- `upload_translations` — body assembly - -**Expected test count**: ~25 tests - ---- - -## 3. `_gt.py` GT Class Tests - -**File**: `tests/test_gt.py` - -### Constructor tests (~8 tests) - -```python -def test_default_values(): - gt = GT() - assert gt.base_url == "https://api2.gtx.dev" - assert gt.source_locale == "en" - assert gt.api_key == "" - -def test_explicit_params(): - gt = GT(api_key="key", project_id="proj", source_locale="fr") - assert gt.api_key == "key" - assert gt.project_id == "proj" - assert gt.source_locale == "fr" - -def test_env_var_fallback(monkeypatch): - monkeypatch.setenv("GT_API_KEY", "env_key") - monkeypatch.setenv("GT_PROJECT_ID", "env_proj") - gt = GT() - assert gt.api_key == "env_key" - assert gt.project_id == "env_proj" - -def test_explicit_overrides_env(monkeypatch): - monkeypatch.setenv("GT_API_KEY", "env_key") - gt = GT(api_key="explicit_key") - assert gt.api_key == "explicit_key" - -def test_set_config(): - gt = GT() - gt.set_config(api_key="new_key", source_locale="de") - assert gt.api_key == "new_key" - assert gt.source_locale == "de" - -def test_custom_mapping(): - gt = GT(custom_mapping={"xx": "Custom Language"}) - assert gt.custom_mapping == {"xx": "Custom Language"} - -def test_rendering_locales(): - gt = GT(locales=["en", "fr", "de"], source_locale="en") - # _rendering_locales should include source + target locales - -def test_default_base_url_matches_js(): 
- from generaltranslation._settings import DEFAULT_BASE_URL - gt = GT() - assert gt.base_url == DEFAULT_BASE_URL == "https://api2.gtx.dev" -``` - -### Private helper tests (~4 tests) - -```python -def test_get_translation_config(): - gt = GT(api_key="k", project_id="p", base_url="https://example.com") - config = gt._get_translation_config() - assert config == {"project_id": "p", "base_url": "https://example.com", "api_key": "k"} - -def test_validate_auth_no_key(): - gt = GT(project_id="p") - with pytest.raises(Exception): # or ApiError - gt._validate_auth("test_fn") - -def test_validate_auth_no_project(): - gt = GT(api_key="k") - with pytest.raises(Exception): - gt._validate_auth("test_fn") - -def test_validate_auth_success(): - gt = GT(api_key="k", project_id="p") - gt._validate_auth("test_fn") # should not raise -``` - -### Locale delegation tests (~6 tests, no mocking) - -These call real functions through the GT class to verify correct delegation. - -```python -def test_get_locale_name(): - gt = GT(source_locale="en") - assert gt.get_locale_name("de") == "German" - -def test_is_valid_locale(): - gt = GT() - assert gt.is_valid_locale("en") is True - assert gt.is_valid_locale("xxxxx") is False - -def test_get_locale_direction(): - gt = GT() - assert gt.get_locale_direction("ar") == "rtl" - assert gt.get_locale_direction("en") == "ltr" - -def test_is_same_language(): - gt = GT() - assert gt.is_same_language("en", "en-US") is True - -def test_standardize_locale(): - gt = GT() - assert gt.standardize_locale("en-us") == "en-US" - -def test_format_num(): - gt = GT(source_locale="en") - result = gt.format_num(1234.5) - assert "1,234.5" in result or "1234.5" in result # locale-dependent -``` - -### API delegation tests (~5 tests, mock api_request) - -```python -@pytest.mark.asyncio -async def test_translate_many_delegates(mock_api_request): - gt = GT(api_key="k", project_id="p") - await gt.translate_many([...], options={...}) - # Assert api_request was called with 
correct config and endpoint - -@pytest.mark.asyncio -async def test_setup_project_delegates(mock_api_request): - gt = GT(api_key="k", project_id="p") - await gt.setup_project([...]) - # Assert correct endpoint called - -@pytest.mark.asyncio -async def test_api_method_without_auth_raises(): - gt = GT() # no api_key or project_id - with pytest.raises(Exception): - await gt.translate_many([...], options={...}) -``` - -**Expected test count**: ~23 tests - ---- - -## Summary - -| Module | Test File(s) | Strategy | Expected Tests | -|--------|-------------|----------|---------------| -| `_id/` | `test_hash.py` | JS fixture comparison | ~40 | -| `translate/` headers | `test_headers.py` | Unit assertions | ~5 | -| `translate/` batch | `test_batch.py` | Unit + async | ~8 | -| `translate/` request | `test_request.py` | Mocked httpx | ~10 | -| `translate/` endpoints | `test_endpoints.py` | Mocked api_request | ~25 | -| GT class | `test_gt.py` | Constructor + delegation | ~23 | -| **Total new** | | | **~111** | -| **Existing** | locales + formatting + static | | **991** | -| **Grand total** | | | **~1102** | - -## Expanding Test Coverage via JS Output Inspection - -For any module, we can systematically generate additional tests by running the JS implementation against new inputs and recording the exact outputs. This is the same strategy that produced the existing 991 tests, and should be used whenever: - -1. **A new function is ported** — always generate fixtures from JS first -2. **An edge case is discovered** — add the input to the fixture generator, re-run, commit the new fixture JSON -3. 
**A bug is found** — reproduce with JS to get the "correct" output, then add as a test case - -### How to add more tests - -**Step 1**: Add new inputs to the fixture generator (`.mjs` file): -```js -// In generate_fixtures.mjs for the relevant module -add("descriptive label", newInput); -// The generator automatically calls the JS function and records the output -``` - -**Step 2**: Regenerate fixtures: -```bash -export PATH="/Users/ernestmccarter/.nvm/versions/node/v24.13.0/bin:$PATH" -npx tsx -``` - -**Step 3**: The Python tests automatically pick up new cases (via `@pytest.mark.parametrize` from the JSON fixture file). No test code changes needed. - -**Step 4**: Run tests to verify Python matches: -```bash -uv run pytest packages/generaltranslation/tests// -v -``` - -### Where to apply this for remaining modules - -**`_id/` module**: The fixture generator (`tests/_id/fixtures/generate_fixtures.mjs`) should: -- Import JS `hashString`, `hashSource`, `hashTemplate` -- Run each against the full input matrix defined above -- Record exact hex output -- After initial implementation, expand by adding new inputs (edge cases, unicode, large strings) and re-running to check parity - -**`translate/` module**: API functions are hard to fixture-test against JS (they hit real endpoints). However, the **body construction logic** can be verified by: -1. Writing a JS script that calls each translate function with mock `fetch` (intercepting the outgoing request body) -2. Recording the exact request bodies produced by JS for various inputs -3. Asserting Python produces identical request bodies - -This is optional but would catch subtle serialization differences (key ordering, null vs undefined handling, etc.). - -**`_gt.py` GT class**: The GT class methods are thin wrappers. Testing them via JS fixture comparison is less valuable than testing the underlying standalone functions (which are already fixture-tested). Focus on delegation correctness instead. 
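The optional request-body parity check described for `translate/` above could be sketched roughly as follows. This is only an illustration under stated assumptions, not part of the planned test suite: the helper names `serialize` and `body_diff` and the sample bodies are hypothetical. It assumes the JS script records each outgoing request body as JSON text, and that keys JS left `undefined` are absent from that recording because `JSON.stringify` drops them.

```python
import json
from typing import Any, Optional


def serialize(body: Any) -> str:
    """Dump a body compactly, preserving key insertion order (hypothetical helper)."""
    return json.dumps(body, separators=(",", ":"), ensure_ascii=False)


def body_diff(python_body: Any, js_recorded_json: str) -> Optional[str]:
    """Classify how a Python-built body differs from the recorded JS body.

    Returns None when they match, "content" when keys or values differ
    (including an explicit None where JS omitted the key), and
    "serialization" when the data is equal but the serialized text is not
    (for example, different key order).
    """
    js_body = json.loads(js_recorded_json)
    if python_body != js_body:
        return "content"
    if serialize(python_body) != serialize(js_body):
        return "serialization"
    return None


# Hypothetical recordings from the JS side.
print(body_diff({"jobIds": ["job1", "job2"]}, '{"jobIds": ["job1", "job2"]}'))
print(body_diff({"a": 1, "b": None}, '{"a": 1}'))
print(body_diff({"b": 2, "a": 1}, '{"a": 1, "b": 2}'))
```

Separating the two categories lets a test treat a content mismatch as a hard failure while reporting an order-only mismatch as a warning, if the API turns out not to depend on key order.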
- -### Continuous parity verification - -When the JS implementation changes, the fixture generators can be re-run to produce updated expected outputs. Any Python test that then fails indicates a divergence that needs investigation. - -```bash -# Full re-generation workflow: -export PATH="/Users/ernestmccarter/.nvm/versions/node/v24.13.0/bin:$PATH" -npx tsx packages/generaltranslation/tests/formatting/fixtures/generate_fixtures.mjs -npx tsx packages/generaltranslation/tests/static/fixtures/generate_fixtures.mjs -npx tsx packages/generaltranslation/tests/_id/fixtures/generate_fixtures.mjs -uv run pytest packages/ -q -``` - ---- - -## Regression Check - -After all phases: -```bash -uv run pytest packages/ -q -# Expected: ~1100 tests, all passing, 0 xfail -```