Distribute and run LLMs with a single file.
High-speed Large Language Model Serving for Local Deployment
Mano-P: an open-source GUI-VLA agent for edge devices. #1 on OSWorld (specialized, 58.2%). Runs inference locally on an Apple M4 Mac mini/MacBook or via an external compute stick, enabling purely vision-driven, cross-platform GUI automation with complex multi-step task planning and execution. All data is processed on-device; nothing leaves your machine.
[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration
A modern desktop application (Rust + Tauri v2 + Svelte 5 + Candle from Hugging Face) for chatting with AI models, running completely locally on your computer. No subscriptions, no data sent to the internet; just you and your personal AI assistant.
InferrLM - On-device AI for iOS & Android
A fully browser-native RAG application for document Q&A, powered by Rust and WebAssembly with local vector search, embeddings, and in-browser LLM inference.
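The core of local vector search in an app like this is a cosine-similarity scan over embedded document chunks. A minimal sketch of that retrieval step, in Python for illustration (the repo itself is Rust/WASM, and the dimensions and data here are placeholders):

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=3):
    """Rank document chunks by cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                       # cosine similarity per chunk
    top = np.argsort(scores)[::-1][:k]   # indices of the k best chunks
    return [(int(i), float(scores[i])) for i in top]

# Illustrative usage with random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 384))   # 100 chunks, 384-dim embeddings
query = rng.normal(size=384)
print(cosine_top_k(query, docs))
```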
Notolog Markdown Editor
A lightweight CUDA-based local inference platform built around Z-Image Turbo by Tongyi
Local AI music generator with smart lyrics: Gradio web UI for HeartMuLa + Ollama/OpenAI, tags, history, and high-fidelity audio.
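Driving a local Ollama model from a UI like this comes down to one HTTP call against Ollama's REST API. A minimal sketch (the model name and lyric prompt are assumptions, not taken from the repo):

```python
import json
import urllib.request

# Ollama's default local endpoint; the model name is an assumption.
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "llama3",
        "prompt": "Write a four-line chorus about rainy city nights.",
        "stream": False,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```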
A no-code tool for testing different large language models.
Desktop AI tutoring app with local inference using Ollama for privacy-focused education.
LLM chatbot example using OpenVINO with RAG (Retrieval Augmented Generation).
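Whatever the serving backend, the RAG pattern such an example demonstrates is retrieve-then-prompt: fetch the most relevant chunks, then prepend them to the user's question. A backend-agnostic sketch, where `retrieve` and `generate` are hypothetical callables standing in for the vector store lookup and the OpenVINO-served model:

```python
def answer_with_rag(question, retrieve, generate, k=3):
    """Retrieve-then-prompt: ground the model's answer in fetched context.

    `retrieve` and `generate` are hypothetical placeholders for the
    vector store lookup and the served LLM, respectively.
    """
    chunks = retrieve(question, k=k)     # top-k relevant passages
    context = "\n\n".join(chunks)
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)
```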
(Experimental) A high-throughput and memory-efficient inference and serving engine for LLMs with an optimized GB10 kernel
Edge Agent Lab is an Android testing platform for evaluating small language model (SLM) agents directly on mobile devices.
An overfitted Stable Diffusion prompt engine with severe "aesthetic snobbery," forcibly transforming mundane ideas into professional-grade physical-rendering instructions.
Llama.cpp, but for Kindles.
Local LLM inference server for running Qwen models on Apple Silicon with MLX. Private, offline-first, no subscriptions, and integrates with OpenCode for AI-powered development.
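Because the server exposes an OpenAI-compatible API, any standard client can talk to it. A minimal sketch using the openai Python package (the base URL, port, and model name below are assumptions, not taken from the repo):

```python
from openai import OpenAI

# Point the standard OpenAI client at the local server; base URL,
# port, and model name are assumptions.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="qwen2.5-7b-instruct",
    messages=[{"role": "user", "content": "Summarize MLX in one sentence."}],
)
print(resp.choices[0].message.content)
```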
Lightweight 6GB VRAM Gradio web app with auto-installer for running AuraFlow locally — no cloud, no clutter.
Local embeddings server for Apple Silicon using MLX, providing OpenAI-compatible API endpoints
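Since the endpoints are OpenAI-compatible, embedding text is a standard `/v1/embeddings` call. A minimal sketch (the port and model name are assumptions; only the OpenAI-compatible route comes from the description):

```python
from openai import OpenAI

# The port and model name are assumptions; the /v1/embeddings route is
# part of the OpenAI-compatible API the description advertises.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.embeddings.create(
    model="mlx-community/bge-small-en-v1.5",
    input=["local inference keeps data on-device"],
)
print(len(resp.data[0].embedding))  # embedding dimensionality
```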