Breathing Life into Language

Aphrodite is an inference engine that optimizes the serving of HuggingFace-compatible models at scale. Built on vLLM's Paged Attention technology, it delivers high-performance model inference for multiple concurrent users. Aphrodite serves as the backend engine powering PygmalionAI's chat platforms and API infrastructure.

Aphrodite builds upon and integrates the exceptional work from various projects, primarily vLLM.

Features

Continuous Batching
Efficient K/V management with PagedAttention from vLLM
Optimized CUDA kernels for improved inference
Quantization support via AQLM, AutoRound, AWQ, BitNet, Bitsandbytes, ExLlamaV3, GGUF, GPTQ, QuIP#, SqueezeLLM, Marlin, [2] [3], NVIDIA ModelOpt, TorchAO, VPTQ, compressed_tensors, MXFP4, and more.
Distributed inference
Quantized KV cache using scaled and scale-less FP8, and TurboQuant
Support for modern samplers such as DRY, XTC, Mirostat, and more
Disaggregated inference
Speculative decoding, including EAGLE, DFlash, ngram, MTP, and more
Multimodal support
Multi-LoRA support

Quickstart

Install the engine:

pip install -U aphrodite-engine

Then launch a model:

aphrodite run Qwen/Qwen3.5-0.8B

This will create a OpenAI-compatible API server that can be accessed at port 2242 of the localhost. You can plug in the API into a UI that supports OpenAI, such as SillyTavern.

Requirements

Operating System: Linux, Windows (WSL2)
Python: 3.10 to 3.13 (build from source for 3.14)

Build Requirements

CUDA >= 12

Notes

By design, Aphrodite takes up 92% of your GPU's VRAM. If you're not serving an LLM at scale, you may want to limit the amount of memory it takes up. You can do this in the API example by launching the server with the --gpu-memory-utilization 0.6 (0.6 means 60%).
You can view the full list of commands by running aphrodite run --help.

Acknowledgements

Aphrodite Engine would have not been possible without the phenomenal work of other open-source projects. A (non-exhaustive) list:

Contributing

Everyone is welcome to contribute. You can support the project by opening Pull Requests for new features, fixes, or general UX improvements.

Name		Name	Last commit message	Last commit date
Latest commit History 1,631 Commits
.buildkite		.buildkite
.gemini		.gemini
.github		.github
aphrodite		aphrodite
assets		assets
benchmarks		benchmarks
cmake		cmake
csrc		csrc
docker		docker
docs		docs
examples		examples
patches		patches
requirements		requirements
tests		tests
tools		tools
.clang-format		.clang-format
.dockerignore		.dockerignore
.gitignore		.gitignore
.markdownlint.yaml		.markdownlint.yaml
.pre-commit-config.yaml		.pre-commit-config.yaml
CMakeLists.txt		CMakeLists.txt
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
amdpatch.sh		amdpatch.sh
build_and_upload_docker.sh		build_and_upload_docker.sh
build_wheel.sh		build_wheel.sh
config.yaml		config.yaml
env.py		env.py
environment.yaml		environment.yaml
formatting.ps1		formatting.ps1
formatting.sh		formatting.sh
install_windows.ps1		install_windows.ps1
mypy.ini		mypy.ini
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini
runtime.sh		runtime.sh
setup.py		setup.py
update-runtime.sh		update-runtime.sh
use_existing_torch.py		use_existing_torch.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Breathing Life into Language

Features

Quickstart

Requirements

Build Requirements

Notes

Acknowledgements

Sponsors

Contributing

About

Uh oh!

Releases 39

Sponsor this project

Uh oh!

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Breathing Life into Language

Features

Quickstart

Requirements

Build Requirements

Notes

Acknowledgements

Sponsors

Contributing

About

Topics

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 39

Sponsor this project

Uh oh!

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages