MAX AI kernels

This directory contains low-level, high-performance compute kernels written in Mojo, designed to serve as building blocks for numerical, machine learning, and other performance-critical workloads.

This library includes production-grade kernel implementations for various CPUs and GPUs, including NVIDIA GPUs (T4, A10G, L40, A100, H100, RTX 40 series, and more) and AMD GPUs (MI300X, MI325X, Radeon RX 9000, and more).

These kernels demonstrate Mojo programming features such as fine-grained control over memory layout, parallelism, and hardware mapping. They prioritize performance and correctness, and are intended for use both directly and as primitives in higher-level libraries.

To evaluate kernel performance on NVIDIA hardware, see Kernel profiling with Nsight Compute.

If you're looking for the high-level Python APIs that are built on these kernels and used to construct MAX graphs, see the max/nn/ directory.

Contributing

We're accepting kernel contributions. See the kernels contributing guide for details.

License

Apache License v2.0 with LLVM Exceptions

See the license file in the repository for more details.

Support

For questions, bug reports, or feature requests, please open an issue on the GitHub repository.