Merged
6 changes: 3 additions & 3 deletions kleidiai-examples/audiogen/README.md
@@ -1,5 +1,5 @@
<!--
SPDX-FileCopyrightText: Copyright 2025 Arm Limited and/or its affiliates <open-source-office@arm.com>
SPDX-FileCopyrightText: Copyright 2025-2026 Arm Limited and/or its affiliates <open-source-office@arm.com>

SPDX-License-Identifier: Apache-2.0
-->
@@ -8,8 +8,8 @@

Welcome to the home of audio generation on Arm® CPUs, featuring Stable Audio Open Small with Arm® KleidiAI™. This project provides everything you need to:

- Convert models to LiteRT-compatible formats
- Run these models on Arm® CPUs using the LiteRT runtime, with support from XNNPack and Arm® KleidiAI™
- Convert models to LiteRT formats using LiteRT Torch.
- Run these models on Arm® CPUs using the LiteRT runtime, with support from XNNPACK and Arm® KleidiAI™.

## Prerequisites

30 changes: 6 additions & 24 deletions kleidiai-examples/audiogen/install_requirements.sh
@@ -1,5 +1,5 @@
#
# SPDX-FileCopyrightText: Copyright 2025 Arm Limited and/or its affiliates <open-source-office@arm.com>
# SPDX-FileCopyrightText: Copyright 2025-2026 Arm Limited and/or its affiliates <open-source-office@arm.com>
#
# SPDX-License-Identifier: Apache-2.0
#
@@ -9,33 +9,15 @@
# Install individual packages
echo "Installing required packages for the Audiogen module..."

# ai-edge-torch
pip install ai-edge-torch==0.4.0 \
"tf-nightly>=2.19.0.dev20250208" \
"ai-edge-litert-nightly>=1.1.2.dev20250305" \
"ai-edge-quantizer-nightly>=0.0.1.dev20250208"

# Stable audio tools
pip install "stable_audio_tools==0.0.19"

# LiteRT Torch
pip install "litert-torch==0.9.0"

# Working out dependency issues, this combination of packages has been tested on different systems (Linux and MacOS).
pip install --no-deps "torch==2.6.0" \
"torchaudio==2.6.0" \
"torchvision==0.21.0" \
"protobuf==5.29.4" \
"numpy==1.26.4" \

# Packages to convert via onnx
pip install --no-deps "onnx==1.18.0" \
"onnxsim==0.4.36" \
"onnx2tf==1.27.10" \
"tensorflow==2.19.0" \
"tf_keras==2.19.0" \
"onnx-graphsurgeon==0.5.8" \
"ai_edge_litert" \
"sng4onnx==1.0.4"
# stable_audio_tools requires numpy 1.26.4; without this exact version the conversion fails.
pip install --no-deps "numpy==1.26.4"

echo "Finished installing required packages for AudioGen submodules conversion."
echo "To convert the Conditioners, DiT and Autoencoder modules, use the following command:"
echo "python ./scripts/export_{MODEL-T0-CONVERT}.py"
echo "python ./scripts/export_sao.py"
139 changes: 24 additions & 115 deletions kleidiai-examples/audiogen/scripts/README.md
@@ -1,25 +1,23 @@
<!--
SPDX-FileCopyrightText: Copyright 2025 Arm Limited and/or its affiliates <open-source-office@arm.com>
SPDX-FileCopyrightText: Copyright 2025-2026 Arm Limited and/or its affiliates <open-source-office@arm.com>

SPDX-License-Identifier: Apache-2.0
-->

# Building and Running the Audio Generation Application on Arm® CPUs with the Stable Audio Open Small Model

## Goal
This guide will show you how to convert the Stable Audio Open Small Model to LiteRT-compatible form to run on Arm® CPUs with the LiteRT runtime.
This guide will show you how to convert the Stable Audio Open Small Model to LiteRT format to run on Arm® CPUs with the LiteRT runtime.

### Converting the Stable Audio Open Small Model to LiteRT format
The Stable Audio Open Small Model is made of three submodules:
- Conditioners (Text conditioner and number conditioners)
- Diffusion Transformer (DiT)
- AutoEncoder.

You will explore two different conversion routes, to convert the submodules to LiteRT format.
You will explore how to use LiteRT Torch to convert these submodules.

1. __ONNX → LiteRT__ using the [onnx2tf](https://github.com/PINTO0309/onnx2tf) tool. This is the traditional two-step approach (<strong>PyTorch</strong> → <strong>ONNX</strong> → <strong>LiteRT</strong>). You will use it to convert the Conditioners submodule.

2. __PyTorch → LiteRT__ using the [Google AI Edge Torch](https://developers.googleblog.com/en/ai-edge-torch-high-performance-inference-of-pytorch-models-on-mobile-devices/) tool. This method, currently under active development, aims to simplify the conversion by performing it in a single step. You will use this tool to convert the DiT and AutoEncoder submodules.
__PyTorch → LiteRT__ using the [LiteRT Torch](https://github.com/google-ai-edge/litert-torch) tool. This tool aims to simplify the conversion and quantization of PyTorch models to LiteRT for easy deployment on edge devices.

### Create a virtual environment and install dependencies.

@@ -41,136 +39,47 @@ bash install_requirements.sh
<strong> Option B</strong>
```bash
# Option B (with .venv activated)
# Packages for the ai-edge-torch tool
pip install ai-edge-torch==0.4.0 \
"tf-nightly>=2.19.0.dev20250208" \
"ai-edge-litert-nightly>=1.1.2.dev20250305" \
"ai-edge-quantizer-nightly>=0.0.1.dev20250208"

# Stable-Audio Tools
# Stable audio tools
pip install "stable_audio_tools==0.0.19"

# Working out dependency issues, this combination of packages has been tested on different systems (Linux® and macOS®).
pip install --no-deps "torch==2.6.0" \
"torchaudio==2.6.0" \
"torchvision==0.21.0" \
"protobuf==5.29.4" \
"numpy==1.26.4" \

# Packages to convert using ONNX
pip install --no-deps onnx \
onnxsim \
onnx2tf \
tensorflow \
tf_keras \
onnx_graphsurgeon \
ai_edge_litert \
sng4onnx
```
# Install LiteRT Torch
pip install "litert-torch==0.9.0"

> [!NOTE]
>
> If you are using a GPU on your machine, you might face the following error:
> ```bash
> Traceback (most recent call last):
> File "/home/<user>/Workspace/tflite/env3_10/lib/python3.10/site-packages/torch/_inductor/runtime/hints.py", line 46, in <module>
> from triton.backends.compiler import AttrsDescriptor
> ImportError: cannot import name 'AttrsDescriptor' from 'triton.backends.compiler' (/home/<user>/Workspace/tflite/env3_10/lib/> python3.10/site-packages/triton/backends/compiler.py)
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
> File "/home/<user>/Workspace/tflite/audiogen/./scripts/export_dit_autoencoder.py", line 6, in <module>
> import ai_edge_torch
> File "/home/<user>/Workspace/tflite/env3_10/lib/python3.10/site-packages/ai_edge_torch/__init__.py", line 16, in <module>
> from ai_edge_torch._convert.converter import convert
> File "/home/<user>/Workspace/tflite/env3_10/lib/python3.10/site-packages/ai_edge_torch/_convert/converter.py", line 21, in > <module>
> from ai_edge_torch._convert import conversion
> File "/home/<user>/Workspace/tflite/env3_10/lib/python3.10/site-packages/ai_edge_torch/_convert/conversion.py", line 23, in > <module>
> from ai_edge_torch._convert import fx_passes
> File "/home/<user>/Workspace/tflite/env3_10/lib/python3.10/site-packages/ai_edge_torch/_convert/fx_passes/__init__.py", line 21, > in <module>
> from ai_edge_torch._convert.fx_passes.optimize_layout_transposes_pass import OptimizeLayoutTransposesPass
> .
> .
> .
> ImportError: cannot import name 'AttrsDescriptor' from 'triton.compiler.compiler' (/home/<user>/Workspace/tflite/env3_10/lib/> python3.10/site-packages/triton/compiler/compiler.py)
> ```
> Please use triton 3.2.0 as follows:
> ```bash
> pip install triton==3.2.0
> ```


### Convert Conditioners Submodule
The Conditioners submodule is based on the <strong>T5Encoder</strong> model. Convert it first to <strong>ONNX</strong>, then to <strong>LiteRT</strong> format. All details are implemented in [`scripts/export_conditioners.py`](./export_conditioners.py), which includes the following steps:

1. Load the Conditioners submodule from the Stable Audio Open Small Model configuration and checkpoint.
2. Export the Conditioners submodule to ONNX via `torch.onnx.export()`.
3. Convert the resulting `.onnx` file to LiteRT using `onnx2tf`.

The two conversion steps (PyTorch -> ONNX and ONNX -> LiteRT) are defined as follows:

<strong> PyTorch -> ONNX </strong>
```python
# Export to ONNX
torch.onnx.export(
    model,
    example_inputs,
    output_path,
    input_names=[],   # Model input names, one per input tensor
    output_names=[],  # Model output names, one per output tensor
    opset_version=15,
)
```
# Pin numpy to the version stable_audio_tools requires
pip install --no-deps "numpy==1.26.4"

<strong> ONNX -> LiteRT </strong>
```bash
# Conversion to LiteRT format
onnx2tf -i "input_onnx_model_path" -o "output_folder_path"
```
_or within a Python script_:
```python
import subprocess

onnx2tf_command = [
    "onnx2tf",
    "-i", str(input_onnx_model_path),
    "-o", str(output_folder_path),
]

# Call the command-line tool
subprocess.run(onnx2tf_command, check=True)
```
Converting an `.onnx` model to `.tflite` creates a folder containing models with different precisions (e.g., float16, float32). You will use the float32 `.tflite` model for on-device inference.
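
Selecting the float32 model out of that output folder can be scripted. This is a minimal sketch: the `*float32.tflite` naming pattern follows onnx2tf's usual convention but should be treated as an assumption, and `find_float32_model` is a hypothetical helper, not part of the export scripts.

```python
from pathlib import Path


def find_float32_model(output_folder: str) -> Path:
    """Return the float32 .tflite file produced by onnx2tf, if present."""
    # onnx2tf typically names its outputs <model>_float16.tflite and
    # <model>_float32.tflite inside the -o folder (assumed convention).
    candidates = sorted(Path(output_folder).glob("*float32.tflite"))
    if not candidates:
        raise FileNotFoundError(f"no float32 .tflite model in {output_folder}")
    return candidates[0]
```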

To run the [`scripts/export_conditioners.py`](./export_conditioners.py) script, use the following command (ensure your .venv is still active):

```bash
python3 ./scripts/export_conditioners.py --model_config "$WORKSPACE/model_config.json" --ckpt_path "$WORKSPACE/model.ckpt"
```

### Convert DiT and AutoEncoder Submodules
To convert the DiT and AutoEncoder submodules, we use the [Generative API](https://github.com/google-ai-edge/ai-edge-torch/tree/main/ai_edge_torch/generative/) provided by the `ai-edge-torch` tools. This API supports exporting a PyTorch model directly to LiteRT in three main steps: model re-authoring, quantization, and finally conversion.
### Exporting the models
To convert the models, we use the [Generative API](https://github.com/google-ai-edge/litert-torch/tree/main/litert_torch/generative) provided by the `litert_torch` tools. This API supports exporting a PyTorch model directly to LiteRT in three main steps: model re-authoring, quantization, and finally conversion.

Here is a code snippet illustrating how the API works in practice.
```python
import ai_edge_torch
from ai_edge_torch.generative.quantize import quant_recipe
import litert_torch
from litert_torch.quantize import quant_config
from litert_torch.generative.quantize import quant_recipe, quant_recipe_utils


# Specify the quantization format
quant_config = quant_recipes.full_int8_dynamic_recipe()
quant_config_int8 = quant_config.QuantConfig(
    generative_recipe=quant_recipe.GenerativeQuantRecipe(
        default=quant_recipe_utils.create_layer_quant_dynamic(),
    )
)
# Initiate the conversion
edge_model = litert_torch.convert(
model, example_inputs, quant_config=quant_config
    model, example_inputs, quant_config=quant_config_int8
)
```
Notes on the arguments for `ai_edge_torch.convert()`:
Notes on the arguments for `litert_torch.convert()`:
- __model__: The PyTorch model to be converted. This should be the pre-trained model loaded from the `.config` and `.ckpt` files, and set to evaluation mode (model.eval()).
- __example_inputs__: A tuple of torch.Tensor objects. These are dummy input tensors that match the expected shape and type of your model's forward pass arguments. For models with multiple inputs, provide them as a tuple in the correct order.
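
The positional contract between `example_inputs` and the model's `forward()` signature can be sketched without the converter installed. `DummyDiT` and the placeholder shapes below are invented for illustration and do not reflect the real Stable Audio Open submodule interfaces:

```python
# Hypothetical stand-in for a convertible model; a converter traces the
# model by calling forward(*example_inputs), so ordering must match.
class DummyDiT:
    def eval(self):
        # Conversion expects the model in evaluation mode.
        return self

    def forward(self, latents, timestep, embedding):
        # A real DiT would denoise `latents`; here we only echo what we
        # received so the positional ordering is visible.
        return (len(latents), timestep, len(embedding))


model = DummyDiT().eval()

# One dummy input per forward() argument, in the same positional order.
example_inputs = (
    [[0.0] * 8],   # latents placeholder
    0.999,         # diffusion timestep placeholder
    [[0.0] * 4],   # conditioning embedding placeholder
)

out = model.forward(*example_inputs)
```

If the tuple order does not match the `forward()` parameters, the traced model receives tensors in the wrong slots, which is why the order matters more than the argument names.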

To convert the DiT and AutoEncoder submodules, run the [`export_dit_autoencoder.py`](./export_dit_autoencoder.py) script using the following command (ensure your .venv is still active):
To convert the models, run the [`export_sao.py`](./export_sao.py) script using the following command (ensure your .venv is still active):

```bash
python3 ./scripts/export_dit_autoencoder.py --model_config "$WORKSPACE/model_config.json" --ckpt_path "$WORKSPACE/model.ckpt"
python3 ./scripts/export_sao.py --model_config "$WORKSPACE/model_config.json" --ckpt_path "$WORKSPACE/model.ckpt"
```

The three LiteRT models are required to run the audiogen application on an Android™ device.