feat!: multilingual text-to-speech#1134
Conversation
8380a2a to
eb999a7
Compare
msluszniak
left a comment
There was a problem hiding this comment.
You should also update the code in documentation and documentation in general. Also address lint warnings, there are plenty of them that you need to add to cspell ignore.
|
Also if this PR adds breaking change, please describe it directly below Introduces a breaking change? section in PR body. |
| ? pageY + height + 2 | ||
| : pageY - Math.min(DROPDOWN_MAX_HEIGHT, models.length * 42) - 2; |
There was a problem hiding this comment.
a bunch of magic numbers, please explain
There was a problem hiding this comment.
Come on, this is just a frontend of a demo app. Those values are indeed random numbers which make it look good 😅
| { label: '🇮🇹 IM Nicola', value: KOKORO_ITALIAN_MALE_NICOLA }, | ||
| { label: '🇵🇹 PF Dora', value: KOKORO_PORTUGUESE_FEMALE_DORA }, | ||
| { label: '🇵🇹 PM Santa', value: KOKORO_PORTUGUESE_MALE_SANTA }, | ||
| { label: '🇵🇱 PM Mateusz', value: KOKORO_POLISH_MALE_MATEUSZ }, |
| phonemizer_(phonemis::Config{ | ||
| .lang = lang, | ||
| .tagger = taggerDataSource.empty() | ||
| ? std::optional<phonemis::tagger::Config>{} | ||
| : std::make_optional(phonemis::tagger::Config{ | ||
| .data_filepath = taggerDataSource}), | ||
| .phonemizer = | ||
| phonemis::phonemizer::Config{ | ||
| .lang = lang, | ||
| .lexicon_filepath = lexiconSource.empty() | ||
| ? std::nullopt | ||
| : std::make_optional(lexiconSource), | ||
| .nn_model_filepath = | ||
| neuralModelSource.empty() | ||
| ? std::nullopt | ||
| : std::make_optional(neuralModelSource)}}), |
There was a problem hiding this comment.
I agree, but there are 2 things which prevent me from making it more readable:
- Each time I try to improve it by just re-formatting indentation, the linter reverts it during committing phase
- It's hard to simplify the code logic itself, since we always need these "empty path" checks to check what comes from the JS side, and these checks generate most of the complexity of this piece of code.
There was a problem hiding this comment.
the linter reverts it during committing phase
This is because we have clang-format call in pre-commit hook with LLVM default format enforced.
| taggerIdx >= 0 ? (paths[taggerIdx] as string) : '', | ||
| lexiconIdx >= 0 ? (paths[lexiconIdx] as string) : '', | ||
| neuralModelIdx >= 0 ? (paths[neuralModelIdx] as string) : '', |
There was a problem hiding this comment.
do we need those assertions?
There was a problem hiding this comment.
Yes, we do. Tagger, lexicon and neural model are all theoretically optional, so we need some conditional to decide whether we should pass en empty string or an existing value.
a1837c6 to
995a70d
Compare
7ed3558 to
a3c38d3
Compare
…e type aliases TypeDoc emits `export type` declarations under `06-api-reference/type-aliases/`, not `06-api-reference/interfaces/`. The links in useTextToSpeech.md pointed at the interfaces/ paths, which never get generated for these names, breaking the Docusaurus build (`onBrokenLinks: 'throw'`).
…nsion/react-native-executorch into @is/multilingual-tts
10e8e1c to
38340f6
Compare
- tests/CMakeLists.txt: build phonemis from source (add_subdirectory) and propagate its include dir to rntests_core. The previous IMPORTED STATIC pointed at a libphonemis.a that nothing builds. - FrameTransformTest, ObjectDetectionTest, InstanceSegmentationTest: update bbox member access for #1130's BBox refactor (.x1/.y1/.x2/.y2 → .p1.x/.p1.y/.p2.x/.p2.y). - PoseEstimationTest: keypoint type became float in #1130; update the static_assert from int32_t to float. - FrameTransformTest: make the three Right_* tests platform-aware. Production inverseRotateBbox/inverseRotatePoints are a no-op on Android for Right (front-cam upright portrait); rotateFrameForModel rotates CW on Android vs CCW on iOS. Tests now have #if defined(__APPLE__) branches matching production. - SpeechToTextTest: GTEST_SKIP TranscribeReturnsValidChars with a TODO — known-failing on this branch, needs separate investigation. - run_tests.sh: fix two stale Hugging Face URLs (fsmn-vad and yolo26n-pose filenames had changed upstream, causing wget to 404 and silently abort the script).
Description
Introduces major changes to the text-to-speech module based on Kokoro model, including:
Supported language current status:
Introduces a breaking change?
There are 2 major breaking changes introduced by this PR:
Changed "synthezation from phonemes" API.
Old API:
New API:
Changed predefined model - voice setups. Now both model files & voice/phonemization files are bundled together, due to languages like Polish or German having fine-tuned model weights.
Old API:
New API:
Type of change
Tested on
Testing instructions
Play around demo speech apps.
Unit tests for RNE-specific code will be added later on.
Phonemis package has it's own, wide range of unit tests implemented (see Phonemis repo)
Screenshots
Related issues
#712
Checklist
Additional notes