MAX is a high-performance inference server that provides an OpenAI-compatible endpoint for serving large language models (LLMs) locally or in the cloud. To start your own serving endpoint with just a few commands, check out our quickstart guide.
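Because the endpoint is OpenAI-compatible, existing OpenAI client code can talk to it with only a base-URL change. Here's a minimal sketch using the `openai` Python package; the host, port, and model name are assumptions, so substitute whatever address and model your own server was started with:

```python
from openai import OpenAI

# Point the standard OpenAI client at a locally running MAX endpoint.
# The base URL below assumes the server is listening on localhost:8000;
# adjust it to match your deployment.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # a local server typically ignores the key, but the client requires one
)

response = client.chat.completions.create(
    model="your-served-model",  # placeholder: use the model name you launched the server with
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```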
The inference server is built in Python and the source is available in our GitHub repo, but we're currently not accepting contributions to the serving library.
Use of MAX and Mojo is subject to the terms of the Modular Community License.