feat!: multilingual text-to-speech by IgorSwat · Pull Request #1134 · software-mansion/react-native-executorch

IgorSwat · 2026-05-08T14:23:32Z

Description

Introduces major changes to the text-to-speech module based on the Kokoro model, including:

Multilingual text-to-speech - a set of complete pipelines & voices for different languages. A complete list of currently supported languages can be found below.
Improved phonemization & speech quality - using a neural phonemization model as a fallback for the old lexicon-based phonemization significantly improves speech quality, particularly for non-standard, out-of-dictionary words.
Timestamp-based audio cutting - an improved postprocessing algorithm that eliminates artifacts introduced by the .pte model, resulting in cleaner, more natural speech.
API changes - prepared for voice cloning & custom, fine-tuned versions of the Kokoro model.
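
To illustrate the "lexicon first, neural model as fallback" idea from the list above, here is a minimal sketch. It is not the Phonemis implementation: `Phonemizer`, `makeFallbackPhonemizer`, the toy lexicon and the stand-in fallback function are all hypothetical names made up for this example.

```typescript
// Hypothetical type: none of these names come from this PR or from Phonemis.
type Phonemizer = (word: string) => string;

// Builds a phonemizer that consults the lexicon first and only calls the
// (assumed) neural fallback for words the lexicon does not contain.
function makeFallbackPhonemizer(
  lexicon: ReadonlyMap<string, string>,
  neuralFallback: Phonemizer
): Phonemizer {
  return (word) => {
    const hit = lexicon.get(word.toLowerCase());
    // Dictionary entries win; the fallback only sees out-of-dictionary words.
    return hit !== undefined ? hit : neuralFallback(word);
  };
}

// Toy data for illustration only; the fallback just tags the word.
const demoLexicon = new Map<string, string>([['man', 'mˈæn']]);
const demoPhonemize = makeFallbackPhonemizer(
  demoLexicon,
  (w) => `<nn:${w}>`
);
```

In this sketch a known word such as `man` is served from the lexicon, while any other word is routed to the fallback.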

Current status of supported languages:

🇺🇸 American English: ✅
🇬🇧 British English: ✅
🇫🇷 French: ✅
🇪🇸 Spanish: ✅
🇵🇹/🇧🇷 Portuguese: ✅
🇮🇹 Italian: ✅
🇵🇱 Polish: ✅
🇩🇪 German: ✅
🇮🇳 Hindi: ✅
🇯🇵 Japanese: ❌ (coming soon)
🇨🇳 Mandarin Chinese: ❌ (coming soon)

Introduces a breaking change?

Yes
No

There are 2 major breaking changes introduced by this PR:

Changed the "synthesis from phonemes" API.

Old API:

 const audioData = await tts.forwardFromPhonemes({
   phonemes:
     'ɐ mˈæn hˌu dˈʌzᵊnt tɹˈʌst hɪmsˈɛlf, kæn nˈɛvəɹ ɹˈiᵊli tɹˈʌst ˈɛniwˌʌn ˈɛls.',
 });

New API:

const audioData = await tts.forward({
  text:
    'ɐ mˈæn hˌu dˈʌzᵊnt tɹˈʌst hɪmsˈɛlf, kæn nˈɛvəɹ ɹˈiᵊli tɹˈʌst ˈɛniwˌʌn ˈɛls.',
  phonemize: false, // Disables phonemization and treats text as phonemes
});
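
For callers migrating from the old API, a small compatibility helper along these lines might ease the transition. This is only a sketch: `ForwardInput`, `TtsLike` and the standalone `forwardFromPhonemes` function are hypothetical and not part of the library; the only thing taken from this PR is the `forward({ text, phonemize: false })` call shape shown above.

```typescript
// Minimal structural type covering only the call shown in the PR description.
interface ForwardInput {
  text: string;
  phonemize?: boolean;
}

// Hypothetical stand-in for the TTS module; TAudio is whatever forward() resolves to.
interface TtsLike<TAudio> {
  forward(input: ForwardInput): Promise<TAudio>;
}

// Hypothetical compatibility helper: keeps the old call-site name while
// delegating to the new forward() API with phonemization disabled.
async function forwardFromPhonemes<TAudio>(
  tts: TtsLike<TAudio>,
  phonemes: string
): Promise<TAudio> {
  return tts.forward({ text: phonemes, phonemize: false });
}
```

An old call site would then become `await forwardFromPhonemes(tts, phonemes)` until it is rewritten to call `forward` directly.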

Changed the predefined model/voice setups. Model files and voice/phonemization files are now bundled together, because languages such as Polish and German use fine-tuned model weights.

Old API:
```
const model = useTextToSpeech({
  model: KOKORO_MEDIUM,
  voice: KOKORO_VOICE_AF_HEART,
});
```
New API:
```
const model = useTextToSpeech(KOKORO_AMERICAN_ENGLISH_FEMALE_HEART);
```

Type of change

Bug fix (change which fixes an issue)
New feature (change which adds functionality)
Documentation update (improves or adds clarity to existing documentation)
Other (chores, tests, code style improvements etc.)

Tested on

iOS
Android

Testing instructions

Play around with the demo speech apps.

Unit tests for RNE-specific code will be added later on.
The Phonemis package has its own wide range of unit tests (see the Phonemis repo).

Screenshots

Related issues

#712

Checklist

I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have updated the documentation accordingly
My changes generate no new warnings

Additional notes

msluszniak

You should also update the code samples in the documentation, and the documentation in general. Also address the lint warnings; there are plenty of them that you need to add to the cspell ignore list.

msluszniak · 2026-05-08T15:30:49Z

Also, if this PR adds a breaking change, please describe it directly below the "Introduces a breaking change?" section in the PR body.

chmjkb · 2026-05-12T13:45:51Z

+            ? pageY + height + 2
+            : pageY - Math.min(DROPDOWN_MAX_HEIGHT, models.length * 42) - 2;


a bunch of magic numbers, please explain

Come on, this is just a frontend of a demo app. Those values are indeed random numbers which make it look good 😅
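
If the numbers were ever named, it could look roughly like the sketch below. Only `DROPDOWN_MAX_HEIGHT`, `pageY`, `height` and the literals `42` and `2` come from the diff; the value `200`, the names `DROPDOWN_ROW_HEIGHT`, `DROPDOWN_GAP`, `dropdownTop` and `openBelow`, and the interpretation of the literals are assumptions, since the condition and the constant's real value are not shown in the quoted lines.

```typescript
// Placeholder value: the real DROPDOWN_MAX_HEIGHT is not shown in the diff.
const DROPDOWN_MAX_HEIGHT = 200;
// Assumed meaning of the literal 42 in the diff: height of one row in the list.
const DROPDOWN_ROW_HEIGHT = 42;
// Assumed meaning of the literal 2 in the diff: gap between trigger and list.
const DROPDOWN_GAP = 2;

// Returns the top coordinate of the dropdown, either just below the trigger
// or just above it. `openBelow` is a hypothetical stand-in for the condition
// that the quoted diff does not show.
function dropdownTop(
  pageY: number,
  height: number,
  rowCount: number,
  openBelow: boolean
): number {
  const listHeight = Math.min(
    DROPDOWN_MAX_HEIGHT,
    rowCount * DROPDOWN_ROW_HEIGHT
  );
  return openBelow
    ? pageY + height + DROPDOWN_GAP
    : pageY - listHeight - DROPDOWN_GAP;
}
```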

chmjkb · 2026-05-12T13:48:00Z

+  { label: '🇮🇹 IM Nicola', value: KOKORO_ITALIAN_MALE_NICOLA },
+  { label: '🇵🇹 PF Dora', value: KOKORO_PORTUGUESE_FEMALE_DORA },
+  { label: '🇵🇹 PM Santa', value: KOKORO_PORTUGUESE_MALE_SANTA },
+  { label: '🇵🇱 PM Mateusz', value: KOKORO_POLISH_MALE_MATEUSZ },


i wonder who the fella is

chmjkb · 2026-05-12T14:21:53Z

+      phonemizer_(phonemis::Config{
+          .lang = lang,
+          .tagger = taggerDataSource.empty()
+                        ? std::optional<phonemis::tagger::Config>{}
+                        : std::make_optional(phonemis::tagger::Config{
+                              .data_filepath = taggerDataSource}),
+          .phonemizer =
+              phonemis::phonemizer::Config{
+                  .lang = lang,
+                  .lexicon_filepath = lexiconSource.empty()
+                                          ? std::nullopt
+                                          : std::make_optional(lexiconSource),
+                  .nn_model_filepath =
+                      neuralModelSource.empty()
+                          ? std::nullopt
+                          : std::make_optional(neuralModelSource)}}),


this is very hard to read

I agree, but there are 2 things which prevent me from making it more readable:

Each time I try to improve it by just re-formatting the indentation, the linter reverts it during the commit phase.

It's hard to simplify the code logic itself, since we always need these "empty path" checks to check what comes from the JS side, and these checks generate most of the complexity of this piece of code.

the linter reverts it during the commit phase

This is because we have a clang-format call in the pre-commit hook with the default LLVM format enforced.

chmjkb · 2026-05-12T15:34:50Z

+      taggerIdx >= 0 ? (paths[taggerIdx] as string) : '',
+      lexiconIdx >= 0 ? (paths[lexiconIdx] as string) : '',
+      neuralModelIdx >= 0 ? (paths[neuralModelIdx] as string) : '',


do we need those assertions?

Yes, we do. Tagger, lexicon and neural model are all theoretically optional, so we need some conditional to decide whether we should pass an empty string or an existing value.
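
One way to keep the conditional while dropping the `as string` casts would be a small helper like the sketch below. `pathOrEmpty` is a hypothetical name, and the element type of `paths` is not visible in the quoted diff, so it is assumed to be `unknown` here.

```typescript
// Hypothetical helper: returns the path at `idx` if the index is valid and the
// entry is a string, otherwise the empty string that the native side treats as
// "not provided". The type guard replaces the `as string` assertion.
function pathOrEmpty(paths: readonly unknown[], idx: number): string {
  const candidate = idx >= 0 ? paths[idx] : undefined;
  return typeof candidate === 'string' ? candidate : '';
}
```

The three call sites would then read `pathOrEmpty(paths, taggerIdx)`, `pathOrEmpty(paths, lexiconIdx)` and `pathOrEmpty(paths, neuralModelIdx)`. Unlike the cast, this also falls back to an empty string when the entry exists but is not a string.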

…e type aliases TypeDoc emits `export type` declarations under `06-api-reference/type-aliases/`, not `06-api-reference/interfaces/`. The links in useTextToSpeech.md pointed at the interfaces/ paths, which never get generated for these names, breaking the Docusaurus build (`onBrokenLinks: 'throw'`).

- tests/CMakeLists.txt: build phonemis from source (add_subdirectory) and propagate its include dir to rntests_core. The previous IMPORTED STATIC pointed at a libphonemis.a that nothing builds.
- FrameTransformTest, ObjectDetectionTest, InstanceSegmentationTest: update bbox member access for #1130's BBox refactor (.x1/.y1/.x2/.y2 → .p1.x/.p1.y/.p2.x/.p2.y).
- PoseEstimationTest: keypoint type became float in #1130; update the static_assert from int32_t to float.
- FrameTransformTest: make the three Right_* tests platform-aware. Production inverseRotateBbox/inverseRotatePoints are a no-op on Android for Right (front-cam upright portrait); rotateFrameForModel rotates CW on Android vs CCW on iOS. Tests now have #if defined(__APPLE__) branches matching production.
- SpeechToTextTest: GTEST_SKIP TranscribeReturnsValidChars with a TODO; known-failing on this branch, needs separate investigation.
- run_tests.sh: fix two stale Hugging Face URLs (fsmn-vad and yolo26n-pose filenames had changed upstream, causing wget to 404 and silently abort the script).

IgorSwat requested review from chmjkb and msluszniak May 8, 2026 14:24

IgorSwat force-pushed the @is/multilingual-tts branch from 8380a2a to eb999a7 Compare May 8, 2026 14:26

IgorSwat self-assigned this May 8, 2026

IgorSwat added the feature (PRs that implement a new feature) and improvement (PRs or issues focused on improvements in the current codebase) labels May 8, 2026

IgorSwat changed the title ~~feat: multilingual text-to-speech~~ feat!: multilingual text-to-speech May 8, 2026

msluszniak requested changes May 8, 2026

msluszniak reviewed May 11, 2026

chmjkb requested changes May 12, 2026

msluszniak linked an issue May 18, 2026 that may be closed by this pull request

Text to Speech - add new languages support #712

Open

5 tasks

barhanc and others added 18 commits May 18, 2026 17:25

build: update ios libs

e6b59cd

Link phonemis with Android build

717c16c

Link phonemis with iOS build

8c74d42

Adjust typescript API to new tts structure

3609385

Fix model picker & adjust to new Phonemis API

b4a27e3

Add spanish

ad3c76a

Add italian

5273507

Add basic polish, portugese and hindi

847d352

Partitioner refactor

b43b75b

Adjust Kokoro API to new Partitioner

7614cca

Adjust native/JS API

eca179e

Native side refactor

249440d

Implement dynamic phonemization

dd297d6

Silence tsconfig warnings

088dfd5

Introduce finetuned Kokoro

b9d736a

Add audio volume up

4b0e928

Improve audio trimming algorithm

c3b4d9f

Change the typescript API to allow custom model weights bundled with …

c356673

…voice

IgorSwat added 4 commits May 18, 2026 17:30

Fix audio api error

7070c4e

Bump audio-api

06e42b2

Add german voice

fe67c65

Apply review suggestions

995a70d

IgorSwat force-pushed the @is/multilingual-tts branch from a1837c6 to 995a70d Compare May 18, 2026 15:38

msluszniak requested review from chmjkb and msluszniak May 18, 2026 15:47

IgorSwat added 2 commits May 18, 2026 20:23

Fix hindi phonemization

c2c211c

Update docs

a3c38d3

IgorSwat force-pushed the @is/multilingual-tts branch from 7ed3558 to a3c38d3 Compare May 19, 2026 08:50

msluszniak and others added 3 commits May 19, 2026 11:31

Update t2s tests

1ad23ea

Merge branch '@is/multilingual-tts' of https://github.com/software-ma…

38340f6

…nsion/react-native-executorch into @is/multilingual-tts

IgorSwat force-pushed the @is/multilingual-tts branch from 10e8e1c to 38340f6 Compare May 19, 2026 11:32
