Distribute and run LLMs with a single file.
High-speed Large Language Model Serving for Local Deployment
Mano-P: an open-source GUI-VLA agent for edge devices. #1 on OSWorld (specialized, 58.2%). Runs inference locally on an Apple M4 Mac mini/MacBook or via an external compute stick, enabling purely vision-driven, cross-platform GUI automation with complex multi-step task planning and execution. All data is processed on-device; nothing leaves your machine.
[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration
A modern desktop application (Rust + Tauri v2 + Svelte 5 + Candle from Hugging Face) for chatting with AI models, running completely locally on your computer. No subscriptions, no data sent to the internet; just you and your personal AI assistant.
InferrLM - On-device AI for iOS & Android
A fully browser-native RAG application for document Q&A, powered by Rust and WebAssembly with local vector search, embeddings, and in-browser LLM inference.
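The core of local vector search in an app like this is a cosine-similarity scan over embedded document chunks. A minimal sketch of that retrieval step, in Python for illustration (the repo itself is Rust/WASM, and the dimensions and data here are placeholders):

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=3):
    """Rank document chunks by cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                       # cosine similarity per chunk
    top = np.argsort(scores)[::-1][:k]   # indices of the k best chunks
    return [(int(i), float(scores[i])) for i in top]

# Illustrative usage with random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 384))   # 100 chunks, 384-dim embeddings
query = rng.normal(size=384)
print(cosine_top_k(query, docs))
```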
Notolog Markdown Editor
A lightweight CUDA-based local inference platform built around Z-Image Turbo by Tongyi
Local AI music generator with smart lyrics: Gradio web UI for HeartMuLa + Ollama/OpenAI, tags, history, and high-fidelity audio.
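Driving a local Ollama model from a UI like this comes down to one HTTP call against Ollama's REST API. A minimal sketch (the model name and lyric prompt are assumptions, not taken from the repo):

```python
import json
import urllib.request

# Ollama's default local endpoint; the model name is an assumption.
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "llama3",
        "prompt": "Write a four-line chorus about rainy city nights.",
        "stream": False,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```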
A no-code tool for testing different large language models.
Desktop AI tutoring app with local inference using Ollama for privacy-focused education.
LLM chatbot example using OpenVINO with RAG (Retrieval Augmented Generation).
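Whatever the serving backend, the RAG pattern such an example demonstrates is retrieve-then-prompt: fetch the most relevant chunks, then prepend them to the user's question. A backend-agnostic sketch, where `retrieve` and `generate` are hypothetical callables standing in for the vector store lookup and the OpenVINO-served model:

```python
def answer_with_rag(question, retrieve, generate, k=3):
    """Retrieve-then-prompt: ground the model's answer in fetched context.

    `retrieve` and `generate` are hypothetical placeholders for the
    vector store lookup and the served LLM, respectively.
    """
    chunks = retrieve(question, k=k)     # top-k relevant passages
    context = "\n\n".join(chunks)
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)
```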
(Experimental) A high-throughput and memory-efficient inference and serving engine for LLMs with an optimized GB10 kernel
Edge Agent Lab is an Android testing platform for evaluating small language model (SLM) agents directly on mobile devices.
An overfitted Stable Diffusion prompt engine with severe "aesthetic snobbery," forcibly transforming mundane ideas into professional-grade physical-rendering instructions.
Llama.cpp, but for Kindles.
Local LLM inference server for running Qwen models on Apple Silicon with MLX. Private, offline-first, no subscriptions, and integrates with OpenCode for AI-powered development.
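Because the server exposes an OpenAI-compatible API, any standard client can talk to it. A minimal sketch using the openai Python package (the base URL, port, and model name below are assumptions, not taken from the repo):

```python
from openai import OpenAI

# Point the standard OpenAI client at the local server; base URL,
# port, and model name are assumptions.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="qwen2.5-7b-instruct",
    messages=[{"role": "user", "content": "Summarize MLX in one sentence."}],
)
print(resp.choices[0].message.content)
```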
Lightweight 6GB VRAM Gradio web app with auto-installer for running AuraFlow locally — no cloud, no clutter.
Local embeddings server for Apple Silicon using MLX, providing OpenAI-compatible API endpoints
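Since the endpoints are OpenAI-compatible, embedding text is a standard `/v1/embeddings` call. A minimal sketch (the port and model name are assumptions; only the OpenAI-compatible route comes from the description):

```python
from openai import OpenAI

# The port and model name are assumptions; the /v1/embeddings route is
# part of the OpenAI-compatible API the description advertises.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.embeddings.create(
    model="mlx-community/bge-small-en-v1.5",
    input=["local inference keeps data on-device"],
)
print(len(resp.data[0].embedding))  # embedding dimensionality
```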