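This project is a fork of BYVoid/OpenCC with two goals:
- Add a WASM implementation that is compatible with upstream OpenCC and uses exactly the same algorithm. It provides a WebAssembly build that runs directly in browsers and in Node.js, for the convenience of front-end developers.
- Extend the dictionaries within the existing framework, backed by additional test cases, and fix inaccurate conversions. The aim is to keep improving dictionary quality and conversion accuracy.

The existing opencc-js and wasm-opencc projects stopped being maintained years ago, and their dictionaries are no longer updated in time. In the LLM era, conversion without context awareness has fallen behind, but OpenCC's algorithm and dictionaries are not obsolete. We will keep the algorithm stable while expanding and improving the dictionaries through additional test cases, to provide more accurate conversion results.

The original OpenCC README follows and is kept for reference.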
Open Chinese Convert (OpenCC, 開放中文轉換) is an open-source project for conversion between Traditional Chinese, Simplified Chinese and Japanese Kanji (Shinjitai). It supports character-level and phrase-level conversion, character variant conversion, and regional idioms among Mainland China, Taiwan and Hong Kong. It is not a translation tool between Mandarin and Cantonese, etc.
Discussion (Telegram): https://t.me/open_chinese_convert
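- Strictly distinguishes "one simplified to many traditional" from "one simplified to many variants".
- Fully compatible with variant characters, which can be replaced dynamically.
- Entries for "one simplified to many traditional" are rigorously reviewed, following the principle "split whenever possible, never merge".
- Supports variant characters and regional idiom conversion among Mainland China, Taiwan and Hong Kong, such as 「裏」/「裡」 and 「鼠標」/「滑鼠」.
- The dictionaries and the library are fully separated, so the dictionaries can be freely modified, imported and extended.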
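To make the difference between character-level and phrase-level conversion concrete, here is a toy sketch in Python. It uses a greedy forward longest match over a tiny hand-written dictionary. The names `PHRASES`, `CHARS` and `convert` are hypothetical and exist only for this illustration; this is not OpenCC's code or data, and the real dictionaries are far larger.

```python
# Toy illustration only: hypothetical mini-dictionaries, not OpenCC's real data.
PHRASES = {"鼠标": "滑鼠"}          # phrase-level entry (regional idiom)
CHARS = {"标": "標", "汉": "漢"}    # character-level fallback entries


def convert(text: str) -> str:
    """Greedy forward longest match: try phrases first, then single characters."""
    max_len = max(len(key) for key in PHRASES)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate phrase first, down to length 2.
        for n in range(min(max_len, len(text) - i), 1, -1):
            chunk = text[i:i + n]
            if chunk in PHRASES:
                out.append(PHRASES[chunk])
                i += n
                break
        else:
            # No phrase matched here: convert one character (or keep it as-is).
            ch = text[i]
            out.append(CHARS.get(ch, ch))
            i += 1
    return "".join(out)


print(convert("鼠标"))  # phrase entry wins over per-character conversion
print(convert("汉字"))  # falls back to character-level conversion
```

With character-level entries alone, 「鼠标」 would become 「鼠標」; the phrase entry is what produces the regional term 「滑鼠」.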
- Debian
- Ubuntu
- Fedora
- Arch Linux
- macOS (Homebrew)
- WinGet (run `winget install BYVoid.OpenCC`)
- Bazel
- Node.js
- Python
- More (Repology)
- Windows (x86_64): OpenCC-1.3.0 (SHA-256) This is a Windows release intended for WinGet distribution. For details, see doc/windows-winget-release.md.
- Debian/Ubuntu (amd64):
https://opencc.js.org/converter?config=s2t
npm install opencc
import { OpenCC } from 'opencc';

async function main() {
  const converter: OpenCC = new OpenCC('s2t.json');
  const result: string = await converter.convertPromise('汉字');
  console.log(result); // 漢字
}

See demo.js and ts-demo.ts.
pip install opencc (Windows, Linux, macOS)

import opencc
converter = opencc.OpenCC('s2t.json')
converter.convert('汉字')  # 漢字

// C++ API
#include "opencc.h"

int main() {
  const opencc::SimpleConverter converter("s2t.json");
  converter.Convert("汉字");  // 漢字
  return 0;
}

// C API
#include <string.h>  // strlen
#include "opencc.h"

int main() {
  opencc_t opencc = opencc_open("s2t.json");
  const char* input = "汉字";
  char* converted = opencc_convert_utf8(opencc, input, strlen(input));  // 漢字
  opencc_convert_utf8_free(converted);  // release the converted string
  opencc_close(opencc);                 // release the converter handle
  return 0;
}

opencc --help

opencc_dict --help
OpenCC CLI supports two diagnostic modes that output JSON instead of converted text:
`--segmentation`: output the segmentation result only (no conversion):

echo "他只看了几行日志,就一叶知秋,猜到整个系统是数据库连接池出了问题" | opencc -c s2twp.json --segmentation
# {"input":"他只看了几行日志,就一叶知秋,猜到整个系统是数据库连接池出了问题","segments":["他","只看","了几行","日志",",就","一叶知秋",",猜到","整个","系统","是","数据库","连接池","出了","问题"]}

`--inspect`: output the full inspection result (segmentation + per-stage conversion + final output):

echo "他只看了几行日志,就一叶知秋,猜到整个系统是数据库连接池出了问题" | opencc -c s2twp.json --inspect
# {"input":"他只看了几行日志,就一叶知秋,猜到整个系统是数据库连接池出了问题","segments":["他","只看","了几行","日志",",就","一叶知秋",",猜到","整个","系统","是","数据库","连接池","出了","问题"],"stages":[{"index":1,"segments":["他","只看","了幾行","日誌",",就","一葉知秋",",猜到","整個","系統","是","數據庫","連接池","出了","問題"]},{"index":2,"segments":["他","只看","了幾行","日誌",",就","一葉知秋",",猜到","整個","系統","是","資料庫","連線池","出了","問題"]},{"index":3,"segments":["他","只看","了幾行","日誌",",就","一葉知秋",",猜到","整個","系統","是","資料庫","連線池","出了","問題"]}],"output":"他只看了幾行日誌,就一葉知秋,猜到整個系統是資料庫連線池出了問題"}
# Pretty-print with jq:
echo "他只看了几行日志,就一叶知秋,猜到整个系统是数据库连接池出了问题" | opencc -c s2twp.json --inspect | jq .

These modes are useful for diagnosing conversion issues:
- Use `--segmentation` to verify that the input is segmented as expected.
- Use `--inspect` to see which conversion stage produces an unexpected result.

Rules:
- `--segmentation` and `--inspect` are mutually exclusive.
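When reading `--inspect` output, it helps to list only the segments that each stage actually changed. The Python sketch below does this using the field names seen in the sample output above (`segments`, `stages`, `index`). The helper name `changed_segments` is hypothetical, and the `sample` string is a hand-shortened excerpt of that output, not a fresh CLI run. The sketch assumes every stage keeps the same number of segments, as in the sample.

```python
import json


def changed_segments(inspect_json: str):
    """Return (stage_index, before, after) for every segment a stage changed."""
    report = json.loads(inspect_json)
    previous = report["segments"]
    changes = []
    for stage in report["stages"]:
        current = stage["segments"]
        # Assumption: segment counts stay aligned across stages, as in the sample.
        for before, after in zip(previous, current):
            if before != after:
                changes.append((stage["index"], before, after))
        previous = current
    return changes


# Hand-shortened excerpt of the --inspect output shown above.
sample = (
    '{"input":"数据库连接池","segments":["数据库","连接池"],'
    '"stages":[{"index":1,"segments":["數據庫","連接池"]},'
    '{"index":2,"segments":["資料庫","連線池"]},'
    '{"index":3,"segments":["資料庫","連線池"]}],'
    '"output":"資料庫連線池"}'
)

for index, before, after in changed_segments(sample):
    print(f"stage {index}: {before} -> {after}")
```

In this excerpt, stage 1 performs the simplified-to-traditional step, stage 2 substitutes the Taiwanese terms, and stage 3 changes nothing.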
- Swift (iOS): SwiftyOpenCC
- iOSOpenCC (pod): iOSOpenCC
- Java: opencc4j
- Android: android-opencc
- PHP: opencc4php
- Pure JavaScript: opencc-js
- WebAssembly:
- Browser Extension: opencc-extension
- Go (Pure): OpenCC for Go
- Dart (native-assets): opencc-dart
- `s2t.json`: Simplified Chinese to Traditional Chinese
- `t2s.json`: Traditional Chinese to Simplified Chinese
- `s2tw.json`: Simplified Chinese to Traditional Chinese (Taiwan Standard)
- `tw2s.json`: Traditional Chinese (Taiwan Standard) to Simplified Chinese
- `s2hk.json`: Simplified Chinese to Traditional Chinese (Hong Kong variant)
- `hk2s.json`: Traditional Chinese (Hong Kong variant) to Simplified Chinese
- `s2twp.json`: Simplified Chinese to Traditional Chinese (Taiwan Standard) with Taiwanese idiom
- `tw2sp.json`: Traditional Chinese (Taiwan Standard) to Simplified Chinese with Mainland Chinese idiom
- `t2tw.json`: Traditional Chinese (OpenCC Standard) to Taiwan Standard
- `hk2t.json`: Traditional Chinese (Hong Kong variant) to Traditional Chinese (OpenCC Standard)
- `t2hk.json`: Traditional Chinese (OpenCC Standard) to Hong Kong variant
- `t2jp.json`: Traditional Chinese Characters (Kyūjitai) to New Japanese Kanji (Shinjitai)
- `jp2t.json`: New Japanese Kanji (Shinjitai) to Traditional Chinese Characters (Kyūjitai)
- `tw2t.json`: Traditional Chinese (Taiwan Standard) to Traditional Chinese (OpenCC Standard)
- `t2cngov.json`: Traditional Chinese to CN Government Standard
- `t2cngov_keep_simp.json`: Traditional Chinese to CN Government Standard (Keep Simplified)
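Most configuration names above follow a `<source>2<target>` pattern, with a trailing `p` when regional idioms are also converted. A small Python helper can build the file name and reject combinations that are not in the list. This is only a convenience sketch: `KNOWN_CONFIGS` and `config_for` are hypothetical names, not part of any OpenCC package.

```python
# Configuration file names copied from the list above.
KNOWN_CONFIGS = {
    "s2t.json", "t2s.json", "s2tw.json", "tw2s.json", "s2hk.json", "hk2s.json",
    "s2twp.json", "tw2sp.json", "t2tw.json", "hk2t.json", "t2hk.json",
    "t2jp.json", "jp2t.json", "tw2t.json", "t2cngov.json",
    "t2cngov_keep_simp.json",
}


def config_for(source: str, target: str, idioms: bool = False) -> str:
    """Build '<source>2<target>[p].json' and check it against the known list."""
    name = f"{source}2{target}{'p' if idioms else ''}.json"
    if name not in KNOWN_CONFIGS:
        raise ValueError(f"no such configuration: {name}")
    return name


print(config_for("s", "tw", idioms=True))
print(config_for("t", "hk"))
```

A combination such as `config_for("hk", "tw")` raises `ValueError`, because no `hk2tw.json` appears in the list; converting between those two standards would have to go through another configuration.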
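The t2cngov configurations convert Traditional Chinese text in any standard (Hong Kong, Taiwan, or mixed) to the standard traditional forms defined by China's 《通用規範漢字表》 (Table of General Standard Chinese Characters, 2013).

Two conversion modes:
- t2cngov: converts everything to standard traditional forms (including simplified to traditional)
- t2cngov_keep_simp: keeps existing simplified characters and normalizes only the traditional part

Usage examples:

# Convert everything to government-standard traditional forms
echo "測試简体混繁體" | opencc -c t2cngov.json
# Output: 測試簡體混繁體
# Keep simplified characters
echo "測試简体混繁體" | opencc -c t2cngov_keep_simp.json
# Output: 测试简体混繁體
# Normalize variant characters
echo "潮溼的露臺" | opencc -c t2cngov.json
# Output: 潮湿的露台

**Acknowledgements:** This work is based on the research of TerryTian-tech, to whom we are grateful.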
Load configuration files from a specified directory via the `OPENCC_DATA_DIR` environment variable:

OPENCC_DATA_DIR=/path/to/your/config/dir opencc --help

OpenCC now supports external C++ segmentation plugins. The first plugin is opencc-jieba, which can be enabled through plugin-backed configs such as s2twp_jieba.json and tw2sp_jieba.json.
Notes:
- The plugin mechanism is currently experimental.
- The `jieba` plugin is optional and is not required for the default OpenCC build, Python package, or Node.js package.
- `opencc-jieba` additionally depends on `cppjieba` and its dictionary resources. These dependencies are only needed when building or distributing the plugin itself.
- The plugin ABI may still change before the next formal OpenCC release and should not yet be treated as stable.
- We expect to treat the plugin ABI as stable starting with the next formal OpenCC release.
- On Windows, plugins must be built with a toolchain/runtime that is ABI-compatible with the host OpenCC binary. Mixing MSVC-built hosts with MinGW-built plugins, or the reverse, is unsupported.
g++ 4.6+ or clang 3.2+ is required.
make

build.cmd

bazel build //:opencc

# Build and run all tests (recommended)
make check
# Or build first, then run tests separately
make
make test

test.cmd

bazel test --test_output=all //src/... //data/... //python/... //test/...

make benchmark

Example results (from GitHub CI, commit ID 9e80d5d, 2026-04-16, CMake macos-latest):
-------------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------------
BM_Initialization/hk2s 868 us 868 us 665
BM_Initialization/hk2t 139 us 139 us 5059
BM_Initialization/jp2t 203 us 203 us 3448
BM_Initialization/s2hk 26201 us 26200 us 27
BM_Initialization/s2t 26385 us 26382 us 27
BM_Initialization/s2tw 27108 us 27108 us 27
BM_Initialization/s2twp 26446 us 26445 us 25
BM_Initialization/s2twp_jieba 142754 us 141974 us 5
BM_Initialization/t2hk 66.7 us 66.7 us 10519
BM_Initialization/t2jp 166 us 166 us 4215
BM_Initialization/t2s 797 us 797 us 883
BM_Initialization/t2tw 58.1 us 58.1 us 12075
BM_Initialization/tw2s 845 us 845 us 831
BM_Initialization/tw2sp 1004 us 1004 us 697
BM_Initialization/tw2t 93.3 us 93.3 us 7492
BM_ConvertLongText/s2t 327 ms 327 ms 2 bytes_per_second=5.45069M/s
BM_ConvertLongText/s2twp 554 ms 554 ms 1 bytes_per_second=3.21299M/s
BM_ConvertLongText/s2twp_jieba 742 ms 741 ms 1 bytes_per_second=2.40096M/s
BM_Convert/s2t_100 0.649 ms 0.649 ms 1083 bytes_per_second=6.15628M/s
BM_Convert/s2t_1000 6.64 ms 6.64 ms 106 bytes_per_second=6.16118M/s
BM_Convert/s2t_10000 68.1 ms 68.1 ms 10 bytes_per_second=6.14608M/s
BM_Convert/s2t_100000 718 ms 717 ms 1 bytes_per_second=5.96785M/s
BM_Convert/s2twp_100 1.20 ms 1.20 ms 552 bytes_per_second=3.32407M/s
BM_Convert/s2twp_1000 12.3 ms 12.3 ms 57 bytes_per_second=3.32311M/s
BM_Convert/s2twp_10000 126 ms 126 ms 6 bytes_per_second=3.31205M/s
BM_Convert/s2twp_100000 1296 ms 1296 ms 1 bytes_per_second=3.3027M/s
BM_Convert/s2twp_jieba_100 1.51 ms 1.49 ms 495 bytes_per_second=2.67698M/s
BM_Convert/s2twp_jieba_1000 15.0 ms 15.0 ms 48 bytes_per_second=2.72292M/s
BM_Convert/s2twp_jieba_10000 153 ms 153 ms 5 bytes_per_second=2.73681M/s
BM_Convert/s2twp_jieba_100000 1728 ms 1728 ms 1 bytes_per_second=2.47784M/s
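To read the table as relative costs, the Time column can be parsed and compared across configurations. The Python sketch below copies four rows from the table above (with column whitespace collapsed) and computes how much slower `s2twp_jieba` is than `s2twp`. The names `ROWS`, `UNIT_TO_MS` and `parse` are hypothetical helpers for this illustration, and the ratios describe only this single CI run.

```python
import re

# Four rows copied from the table above, column whitespace collapsed.
ROWS = """\
BM_Initialization/s2twp 26446 us 26445 us 25
BM_Initialization/s2twp_jieba 142754 us 141974 us 5
BM_ConvertLongText/s2twp 554 ms 554 ms 1 bytes_per_second=3.21299M/s
BM_ConvertLongText/s2twp_jieba 742 ms 741 ms 1 bytes_per_second=2.40096M/s
"""

UNIT_TO_MS = {"ns": 1e-6, "us": 1e-3, "ms": 1.0, "s": 1e3}


def parse(rows: str) -> dict:
    """Map benchmark name -> wall time in milliseconds (the first Time column)."""
    times = {}
    for line in rows.splitlines():
        match = re.match(r"(\S+)\s+([\d.]+)\s+(ns|us|ms|s)\s", line)
        if match:
            name, value, unit = match.groups()
            times[name] = float(value) * UNIT_TO_MS[unit]
    return times


times = parse(ROWS)
for bench in ("BM_Initialization", "BM_ConvertLongText"):
    ratio = times[f"{bench}/s2twp_jieba"] / times[f"{bench}/s2twp"]
    print(f"{bench}: s2twp_jieba / s2twp = {ratio:.2f}x")
# BM_Initialization: s2twp_jieba / s2twp = 5.40x
# BM_ConvertLongText: s2twp_jieba / s2twp = 1.34x
```

In this run the jieba-backed configuration costs noticeably more at initialization than during conversion itself.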
Please update if your project is using OpenCC.
- ibus-pinyin
- fcitx
- rimeime
- libgooglepinyin
- ibus-libpinyin
- alfred-chinese-converter
- GoldenDict
- China Biographical Database Project (CBDB)
Apache License 2.0
- darts-clone BSD License
- marisa-trie BSD License
- tclap MIT License
- rapidjson MIT License
- Google Test BSD License
- cppjieba MIT License
  - Optional dependency used by the experimental `opencc-jieba` plugin.
- Introduction: https://github.com/BYVoid/OpenCC/wiki/%E7%B7%A3%E7%94%B1
- 現代漢語常用簡繁一對多字義辨析表 http://ytenx.org/byohlyuk/KienxPyan
- BYVoid
- 佛振
- Peng Huang
- LI Daobing
- Kefu Chai
- Kan-Ru Chen
- Ma Xiaojun
- Jiang Jiang
- Ruey-Cheng Chen
- Paul Meng
- Lawrence Lau
- 瑾昀
- 內木一郎
- Marguerite Su
- Brian White
- Qijiang Fan
- LEOYoon-Tsaw
- Steven Yao
- Pellaeon Lin
- stony
- steelywing
- 吕旭东
- Weng Xuetian
- Ma Tao
- Heinz Wiesinger
- J.W
- Amo Wu
- Mark Tsai
- Zhe Wang
- sgqy
- Qichuan (Sean) ZHANG
- Flandre Scarlet
- 宋辰文
- iwater
- Xpol Wan
- Weihang Lo
- Cychih
- kyleskimo
- Ryuan Choi
- Prcuvu
- Tony Able
- Xiao Liang
- Frank Lin
Please feel free to update this list if you have contributed to OpenCC.
