Name	Name	Last commit message	Last commit date
parent directory ..
qwen2	qwen2
README.md	README.md
pixi.toml	pixi.toml

Name

Last commit message

Last commit date

Registering and Serving Custom Model Architectures with MAX

This example demonstrates how to register custom model architectures with MAX and serve them using the MAX serving infrastructure. MAX provides an extensible framework for defining custom models that can be served with OpenAI-compatible endpoints, enabling you to deploy your own model architectures alongside standard models.

For this example, we're using one of the already-registered architectures in MAX: Qwen2ForCausalLM. This architecture already exists, but the local version will override the built-in one. This demonstrates the general structure and process for bringing a model that doesn't already exist in MAX to the framework.

Registering your custom model architecture with MAX's model registry makes it available for loading and serving. Once a custom model has been defined in a directory, the flag --custom-architectures [directory] applied to max serve, max generate, or the max.entrypoints.pipelines entrypoint will load the Python code for your custom MAX model and try to use it with weights you link to.

A single Pixi command runs the two variants of this example:

pixi run generate
pixi run serve

Both the generate and serve examples use the weights and hyperparameters from the Hugging Face model Qwen/Qwen2.5-0.5B-Instruct. The former directly generates a text completion from a prompt, while the latter starts an OpenAI API compatible serving endpoint.

A full invocation within an environment with MAX installed looks like:

python -m max.entrypoints.pipelines serve --custom-architectures qwen2 --model-path=Qwen/Qwen2.5-0.5B-Instruct

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Registering and Serving Custom Model Architectures with MAX

FilesExpand file tree

custom-models

Directory actions

More options

Directory actions

More options

Latest commit

History

custom-models

Folders and files

parent directory

README.md

Registering and Serving Custom Model Architectures with MAX