The MAX graph API provides a powerful framework for staging computational graphs to be run on GPUs, CPUs, and more. Each operation in one of these graphs is defined in Mojo, an easy-to-use language for writing high-performance code.
For an API walkthrough, see the tutorial to build custom ops for GPUs.
The examples here illustrate how to write custom graph operations in Mojo that run on GPUs and CPUs, and how to build and execute computational graphs that contain them on different hardware architectures.
Graphs in MAX can be extended with custom operations written in Mojo. This directory includes the following examples:
- addition: Adding 1 to every element of an input tensor.
- mandelbrot: Calculating the Mandelbrot set.
- vector_addition: Performing vector addition using a manual GPU function.
- top_k: A top-K token sampler, a more complex operation that demonstrates a real-world use case for a custom operation, as used today within large language model processing pipelines.
- matrix_multiplication: Various matrix multiplication algorithms, using a memory layout abstraction.
- fused_attention: A fused attention operation, which leverages many of the available MAX GPU programming features to show how to address an important use case in AI models.
- image_pipeline: A simple image pipeline that chains custom ops (grayscale, brighten, and blur), keeping the data on the GPU between ops before writing the result back to the CPU and disk.
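For a sense of the computation behind one of these examples, the escape-time iteration at the heart of the Mandelbrot set can be sketched in plain Python. This is a simplified reference for what the mandelbrot kernel computes, not the Mojo kernel itself; the function name and iteration cap are illustrative choices.

```python
def mandelbrot_escape_time(c: complex, max_iterations: int = 100) -> int:
    """Iterate z = z*z + c and return the iteration at which |z| exceeds 2.

    Points that never escape within max_iterations are treated as members
    of the Mandelbrot set.
    """
    z = 0j
    for i in range(max_iterations):
        if abs(z) > 2:
            return i
        z = z * z + c
    return max_iterations

# The origin never escapes; a point far from the set escapes quickly.
print(mandelbrot_escape_time(0j))      # → 100
print(mandelbrot_escape_time(2 + 0j))  # → 2
```

The GPU kernel evaluates this per-pixel iteration in parallel across the complex plane, which is what makes the example a good fit for an accelerator.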
Custom kernels have been written in Mojo to carry out these calculations. For each example, a simple graph containing a single operation is constructed in Python. This graph is compiled and dispatched onto a supported GPU if one is available, or the CPU if not. Input tensors, if there are any, are moved from the host to the device on which the graph is running. The graph then runs and the results are copied back to the host for display.
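As another reference point for these calculations, the core of a top-K token sampler can be sketched in plain Python: keep only the k largest logits, renormalize them into a probability distribution, and sample an index. This is a simplified stand-in for the computation the top_k kernel performs, not the Mojo implementation; the function name and signature are illustrative.

```python
import math
import random

def top_k_sample(logits: list[float], k: int, rng: random.Random) -> int:
    """Sample a token index from the k highest-probability entries."""
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax restricted to those k entries.
    weights = [math.exp(logits[i]) for i in top]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Inverse-CDF sampling over the renormalized distribution.
    r = rng.random()
    acc = 0.0
    for i, p in zip(top, probs):
        acc += p
        if r <= acc:
            return i
    return top[-1]

rng = random.Random(0)
top_k_sample([0.1, 5.0, 0.2], k=1, rng=rng)  # always the argmax when k == 1
```

In a real LLM pipeline this runs once per generated token over a large vocabulary, which is why moving it into a fused GPU kernel is worthwhile.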
Note that the same Mojo code runs on CPU as well as GPU. When the graph is constructed, it targets a supported accelerator if one is available and falls back to the CPU otherwise, with no code changes required for either path. The vector_addition example shows how this works under the hood for common MAX abstractions, where compile-time specialization lets MAX choose the optimal code path for a given hardware architecture.
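The idea behind this specialization can be sketched in plain Python, with no MAX APIs involved: a single user-facing function dispatches to a device-specific implementation chosen once, up front. The function names and the boolean flag below are illustrative stand-ins, not part of MAX.

```python
def vector_add_cpu(a: list[float], b: list[float]) -> list[float]:
    # Straightforward scalar loop, as a CPU fallback might use.
    return [x + y for x, y in zip(a, b)]

def vector_add_gpu(a: list[float], b: list[float]) -> list[float]:
    # Stand-in for a GPU kernel launch; computes the same result here.
    return [x + y for x, y in zip(a, b)]

def specialize_vector_add(has_accelerator: bool):
    """Pick an implementation once, analogous to a compile-time choice."""
    return vector_add_gpu if has_accelerator else vector_add_cpu

# Calling code is identical on either path.
vector_add = specialize_vector_add(has_accelerator=False)
vector_add([1.0, 2.0], [3.0, 4.0])  # [4.0, 6.0]
```

In MAX the selection happens at compile time rather than at runtime, so the specialized kernel carries no dispatch overhead; the sketch only conveys the shape of the pattern.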
The kernels/ directory contains the custom kernel implementations, and the
graph construction occurs in the Python files in the base directory. These
examples are designed to stand on their own, so that they can be used as
templates for experimentation.
A single Pixi command runs each of the examples:
pixi run addition
pixi run mandelbrot
pixi run vector_addition
pixi run top_k
pixi run matrix_multiplication
pixi run fused_attention
pixi run image_pipeline

The pixi run <example> command runs the associated Python example, taking care to ensure that the necessary dependencies (i.e. the max package) are visible.
The Python code constructs the graph and the related inference session state. The Mojo kernel code defining the custom operations is (re)compiled on the fly as needed, ensuring the executing graph always uses the latest version of the Mojo code.
You can also run benchmarks to compare the performance of your GPU to your CPU:
pixi run benchmark