From 40b1ed613c252294f7c96a5d394475f00d4eb752 Mon Sep 17 00:00:00 2001
From: Daniel Weadock
Date: Mon, 2 Mar 2026 12:45:28 -0500
Subject: [PATCH] Add CONTRIBUTING.md with setup guide and project overview

Provides contribution guidelines including:

- Development environment setup (with CPU instructions)
- How to report bugs and submit PRs
- Project structure overview
- Model architecture summary (emotion labels, events, languages)
- Docker usage instructions

Co-Authored-By: Claude Opus 4.6
---
 CONTRIBUTING.md | 152 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 152 insertions(+)
 create mode 100644 CONTRIBUTING.md

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000..ca34ccd
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,152 @@
+# Contributing to SenseVoice
+
+Thank you for your interest in contributing to SenseVoice! This guide will help you get started.
+
+## Getting Started
+
+### Prerequisites
+
+- Python 3.8 or higher
+- Git
+- (Optional) CUDA-compatible GPU for faster inference
+
+### Setting Up the Development Environment
+
+1. **Fork and clone the repository:**
+
+```bash
+git clone https://github.com/<your-username>/SenseVoice.git
+cd SenseVoice
+```
+
+2. **Create a virtual environment:**
+
+```bash
+python3 -m venv venv
+source venv/bin/activate  # On Windows: venv\Scripts\activate
+```
+
+3. **Install dependencies:**
+
+```bash
+pip install -r requirements.txt
+```
+
+4. **Verify the installation:**
+
+```bash
+python -c "from funasr import AutoModel; print('Installation successful')"
+```
+
+### Running on CPU
+
+If you don't have a GPU, you can run SenseVoice on CPU by setting the device to `"cpu"`:
+
+```python
+model = AutoModel(
+    model="iic/SenseVoiceSmall",
+    trust_remote_code=True,
+    device="cpu",
+)
+```
+
+For the FastAPI server:
+
+```bash
+export SENSEVOICE_DEVICE=cpu
+fastapi run --port 50000
+```
+
+## How to Contribute
+
+### Reporting Bugs
+
+1. Search [existing issues](https://github.com/FunAudioLLM/SenseVoice/issues) to avoid duplicates.
+2. Use the [Bug Report template](https://github.com/FunAudioLLM/SenseVoice/issues/new?template=bug_report.md).
+3. Include your environment details (OS, Python version, PyTorch version, GPU, CUDA version).
+4. Provide a minimal code sample to reproduce the issue.
+
+### Submitting Pull Requests
+
+1. **Fork the repository** and create a new branch from `main`:
+
+```bash
+git checkout -b your-branch-name
+```
+
+2. **Make your changes.** Keep commits focused and atomic.
+
+3. **Test your changes** to make sure nothing is broken.
+
+4. **Push to your fork** and open a Pull Request against `FunAudioLLM/SenseVoice:main`.
+
+5. **Describe your changes** clearly in the PR description. Explain *what* changed and *why*.
+
+### Types of Contributions We Welcome
+
+- **Bug fixes** — Check [open issues labeled `bug`](https://github.com/FunAudioLLM/SenseVoice/issues?q=is%3Aissue+is%3Aopen+label%3Abug) for known problems.
+- **Documentation improvements** — Typo fixes, clarifications, additional examples, translations.
+- **New examples** — Demo scripts showing different use cases (emotion detection, event detection, multilingual transcription).
+- **Performance improvements** — Optimizations for inference speed or memory usage.
+- **Test coverage** — Unit tests and integration tests are very welcome.
+
+### Code Style
+
+- Follow existing code patterns and conventions in the repository.
+- Use type hints where applicable.
+- Add docstrings to new functions and classes.
+- Keep lines under 120 characters.
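As a minimal sketch of these conventions (the function and its name are invented purely for illustration and are not part of the codebase):

```python
from typing import Optional


def normalize_transcript(text: str, language: Optional[str] = None) -> str:
    """Normalize a raw transcript string for display.

    Args:
        text: Raw transcription text from the model.
        language: Optional language code to prepend, e.g. "en".

    Returns:
        The stripped text, prefixed with the language code when given.
    """
    cleaned = text.strip()
    return f"[{language}] {cleaned}" if language else cleaned
```

Type hints, a docstring describing arguments and return value, and short lines are the pattern to follow.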
+
+## Project Structure
+
+```
+SenseVoice/
+├── model.py          # Core SenseVoiceSmall model (encoder, CTC decoder, emotion/event embeddings)
+├── api.py            # FastAPI server for inference
+├── webui.py          # Gradio web interface
+├── demo1.py          # Inference example using FunASR AutoModel
+├── demo2.py          # Direct model inference with timestamp support
+├── export.py         # ONNX model export
+├── export_meta.py    # Export utilities for model rebuilding
+├── finetune.sh       # Fine-tuning script with DeepSpeed support
+├── requirements.txt  # Python dependencies
+├── Dockerfile        # Docker build configuration
+├── data/             # Example training and validation data
+├── utils/            # Utilities (frontend, ONNX inference, CTC alignment, export)
+└── image/            # Documentation images
+```
+
+## Understanding the Model
+
+SenseVoice is a non-autoregressive encoder-only model that outputs:
+
+- **Speech transcription** (ASR) across 50+ languages
+- **Emotion labels**: `HAPPY`, `SAD`, `ANGRY`, `NEUTRAL`, `FEARFUL`, `DISGUSTED`, `SURPRISED`
+- **Audio event labels**: `BGM`, `Speech`, `Applause`, `Laughter`, `Cry`, `Sneeze`, `Breath`, `Cough`
+- **Language identification** for Mandarin, English, Cantonese, Japanese, and Korean
+
+The model uses a SANM (memory-equipped self-attention) encoder architecture with CTC decoding. Emotion and event labels are predicted from the first 4 encoder output tokens, while the remaining tokens produce the transcription.
+
+## Docker
+
+You can also run SenseVoice using Docker:
+
+```bash
+# Build
+docker build -t sensevoice .
+
+# Run with GPU
+docker run --gpus all -p 50000:50000 sensevoice
+
+# Run on CPU
+docker run -e SENSEVOICE_DEVICE=cpu -p 50000:50000 sensevoice
+```
+
+## Questions?
+
+- Open an issue using the [Questions template](https://github.com/FunAudioLLM/SenseVoice/issues/new?template=ask_questions.md).
+- Join the community via the DingTalk group (see [README](./README.md#community)).
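The rich output described under "Understanding the Model" arrives as special tags prepended to the text (for example `<|en|><|HAPPY|><|Speech|>`); FunASR ships a `rich_transcription_postprocess` helper for this, but the idea can be shown as a dependency-free sketch (the sample string below is illustrative, not real model output):

```python
import re
from typing import List, Tuple

# SenseVoice prepends special tags (language, emotion, event) to the raw
# transcription, e.g. "<|en|><|HAPPY|><|Speech|>hello world".
_TAG_RE = re.compile(r"<\|([^|>]+)\|>")


def split_rich_output(raw: str) -> Tuple[List[str], str]:
    """Separate SenseVoice-style special tags from the plain transcription text."""
    tags = _TAG_RE.findall(raw)
    text = _TAG_RE.sub("", raw).strip()
    return tags, text


tags, text = split_rich_output("<|en|><|HAPPY|><|Speech|>hello world")
# tags == ['en', 'HAPPY', 'Speech']; text == 'hello world'
```

In real code, prefer the FunASR post-processing utilities over hand-rolled parsing; this sketch only illustrates what the tags encode.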
+
+## License
+
+By contributing to SenseVoice, you agree that your contributions will be licensed under the same license as the project. See [FunASR License](https://github.com/modelscope/FunASR?tab=readme-ov-file#license) for details.