feat: Add ModelsLab backend for cloud AI inference #8597

Open

adhikjoshi wants to merge 1 commit into mudler:master from adhikjoshi:ml

Conversation

@adhikjoshi

ModelsLab Backend for LocalAI

🚀 First Cloud Provider Backend for LocalAI: comprehensive multi-modal AI generation via the ModelsLab API

Overview

This backend integrates ModelsLab's cloud AI APIs with LocalAI, giving self-hosted users access to:

  • 🤖 Large Language Models (Llama 3.1, Mixtral, Gemini, GPT)
  • 🖼️ Image Generation (Flux, SDXL, Playground v2.5)
  • 🎬 Video Generation (CogVideoX, Mochi) - First video backend in LocalAI
  • 🔊 Text-to-Speech (Multi-language voice synthesis)
  • 📊 Text Embeddings (Semantic search, RAG applications)

Key Features

  • Hybrid Architecture: Keep lightweight processing local, offload heavy workloads to cloud
  • Zero Hardware Requirements: No GPU needed for AI generation
  • Latest Models: Immediate access to newest AI models without local setup
  • Cost Efficient: Pay-per-use pricing vs. expensive GPU hardware
  • Multi-Modal: Single backend supporting all AI generation types
  • Production Ready: Built-in retry logic, error handling, and monitoring

Quick Start

1. Get a ModelsLab API Key (sign up at https://modelslab.com)

2. Install Backend in LocalAI

# Install ModelsLab backend
local-ai backends install modelslab

# Or build from source
docker build -t modelslab-backend .

3. Configure API Key

export MODELSLAB_API_KEY="your_api_key_here"

4. Use with LocalAI

# Start LocalAI
local-ai run

# Generate text
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "modelslab/llama-3.1-8b",
    "messages": [{"role": "user", "content": "Hello from the cloud!"}]
  }'

# Generate image  
curl http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "modelslab/flux",
    "prompt": "A beautiful sunset over mountains",
    "size": "1024x1024"
  }'

# Generate video (new!)
curl http://localhost:8080/v1/videos/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "modelslab/cogvideox", 
    "prompt": "A cat playing with a ball of yarn",
    "duration": 5
  }'

Configuration

Environment Variables

Variable                   Description                          Default
MODELSLAB_API_KEY          Your ModelsLab API key (required)    -
MODELSLAB_BASE_URL         ModelsLab API endpoint               https://modelslab.com/api/v6
MODELSLAB_TIMEOUT          Request timeout in seconds           300
MODELSLAB_RETRY_COUNT      Max retry attempts                   3
LOCALAI_BACKEND_ADDRESS    gRPC server address                  localhost:50051
PYTHON_GRPC_MAX_WORKERS    Concurrent request workers           4
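
For reference, here is a minimal Python sketch of reading these variables, with defaults mirroring the table above (the variable names are the documented ones; everything else is illustrative):

import os

# Read backend configuration from the environment, falling back to
# the documented defaults where a variable is unset.
API_KEY = os.environ["MODELSLAB_API_KEY"]  # required; fails fast if missing
BASE_URL = os.getenv("MODELSLAB_BASE_URL", "https://modelslab.com/api/v6")
TIMEOUT = int(os.getenv("MODELSLAB_TIMEOUT", "300"))          # seconds
RETRY_COUNT = int(os.getenv("MODELSLAB_RETRY_COUNT", "3"))    # attempts
GRPC_ADDRESS = os.getenv("LOCALAI_BACKEND_ADDRESS", "localhost:50051")
MAX_WORKERS = int(os.getenv("PYTHON_GRPC_MAX_WORKERS", "4"))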

Model Configuration

The backend automatically maps LocalAI model requests to ModelsLab models:

# Default model mappings
text: "meta-llama/llama-3.1-8b-instruct"
image: "flux"  
video: "cogvideox"
tts: "tts"
embeddings: "text-embedding-3-small"

Override via LoadModel options:

{
  "model": "custom-model-id",
  "options": ["modelslab_api_key:your_key", "modelslab_base_url:custom_url"]
}
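
As a rough sketch, the defaults and these key:value overrides might combine inside LoadModel like so (the helper names here are hypothetical, not the PR's actual code):

# Default modality -> ModelsLab model id, mirroring the mapping above.
DEFAULT_MODELS = {
    "text": "meta-llama/llama-3.1-8b-instruct",
    "image": "flux",
    "video": "cogvideox",
    "tts": "tts",
    "embeddings": "text-embedding-3-small",
}

def parse_options(options):
    # Turn ["modelslab_api_key:your_key", ...] into {"modelslab_api_key": "your_key"}.
    return dict(opt.split(":", 1) for opt in options)

def resolve_model(modality, requested=None):
    # An explicitly requested model id wins over the default mapping.
    return requested or DEFAULT_MODELS[modality]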

Supported Models

🤖 Text Generation (LLM)

  • Llama 3.1 (8B, 70B, 405B variants)
  • Mixtral 8x7B (Mixture of experts)
  • Gemini 1.5 (Pro and Flash)
  • GPT-4 and GPT-3.5-Turbo
  • Claude 3 (Opus, Sonnet, Haiku)
  • Custom fine-tuned models

🖼️ Image Generation

  • Flux (Latest SOTA model)
  • SDXL (High-quality generation)
  • Playground v2.5 (Aesthetic quality)
  • Stable Diffusion variants
  • Midjourney-style generation

🎬 Video Generation (New!)

  • CogVideoX (Text-to-video, Image-to-video)
  • Mochi (High-quality motion)
  • AnimateDiff (Animation-focused)
  • Stable Video Diffusion

🔊 Audio & Speech

  • Text-to-Speech (100+ languages)
  • Voice Cloning (Custom voices)
  • Music Generation (Text-to-music)

📊 Text Embeddings

  • text-embedding-3-small (1536 dimensions)
  • text-embedding-3-large (3072 dimensions)
  • Custom embedding models

API Compatibility

This backend implements LocalAI's standard gRPC interface:

service Backend {
  rpc Health(Empty) returns (Reply) {}
  rpc LoadModel(ModelOptions) returns (Result) {}
  rpc GenerateImage(GenerateImageRequest) returns (Result) {}
  rpc GenerateVideo(GenerateVideoRequest) returns (Result) {}
}
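
A rough Python sketch of a server implementing this interface is shown below. The backend_pb2/backend_pb2_grpc module names follow LocalAI's generated stubs; the method bodies are illustrative only, not the PR's actual implementation:

import grpc
from concurrent import futures

import backend_pb2        # stubs generated from LocalAI's backend.proto
import backend_pb2_grpc

class BackendServicer(backend_pb2_grpc.BackendServicer):
    def Health(self, request, context):
        # Liveness probe: LocalAI polls this to check the backend is up.
        return backend_pb2.Reply(message=b"OK")

    def LoadModel(self, request, context):
        # Cloud backend: no weights to load locally, just record the id.
        self.model_id = request.Model
        return backend_pb2.Result(success=True, message="Model ready")

def serve(address="localhost:50051"):
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
    server.add_insecure_port(address)
    server.start()
    server.wait_for_termination()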

Development

Building from Source

# Clone repository
git clone https://github.com/modelslab/localai-backend.git
cd localai-backend

# Build container
docker build -t modelslab-backend .

# Run locally
./run.sh --addr localhost:50051

Testing

# Install test dependencies
pip install pytest pytest-asyncio

# Run tests
python -m pytest tests/

# Test with LocalAI
local-ai backends add modelslab ./
local-ai run
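
For example, a minimal health-check test might look like this (it assumes the servicer sketch from the API Compatibility section; the file name is hypothetical):

# tests/test_health.py
from backend import BackendServicer  # backend.py from the project structure below

def test_health_returns_ok():
    servicer = BackendServicer()
    reply = servicer.Health(request=None, context=None)  # request is unused here
    assert reply.message == b"OK"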

Project Structure

localai-modelslab-backend/
├── backend.py           # Main gRPC server
├── requirements.txt     # Python dependencies
├── libbackend.sh        # Backend startup script
├── run.sh               # Entry point
├── Dockerfile           # Container build
├── README.md            # Documentation
└── tests/               # Test suite

Architecture

Request Flow

LocalAI → gRPC → ModelsLab Backend → HTTP → ModelsLab API → AI Models
   ↑                                                           ↓
   ← Image/Video/Text ← Download ← Response ← Generation ← Cloud

Key Components

  1. gRPC Server: Implements LocalAI backend protocol
  2. API Client: Async HTTP client for ModelsLab API
  3. Request Translator: Maps LocalAI → ModelsLab parameters
  4. Async Handler: Manages long-running generations with polling (sketched below)
  5. Error Handler: Converts API errors to gRPC status codes
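
To make the polling handler concrete, here is a minimal async sketch. It assumes an aiohttp client and that the API reports in-progress jobs with a status field plus a fetch URL; these field names are assumptions, not the documented ModelsLab contract:

import asyncio
import time

import aiohttp

async def wait_for_result(session, submit_url, payload,
                          poll_interval=5.0, timeout=300.0):
    # Submit the generation job to the cloud API.
    async with session.post(submit_url, json=payload) as resp:
        job = await resp.json()

    deadline = time.monotonic() + timeout
    # Assumed in-progress shape: {"status": "processing", "fetch_result": "<url>"}
    while job.get("status") == "processing":
        if time.monotonic() > deadline:
            raise TimeoutError("generation did not finish within the timeout")
        await asyncio.sleep(poll_interval)
        async with session.post(job["fetch_result"], json=payload) as resp:
            job = await resp.json()
    return job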

Use Cases

Hybrid AI Workflows

  • Light tasks locally: Quick chat, simple image edits
  • Heavy tasks in cloud: Video generation, high-res images, complex LLMs

Cost Optimization

  • Development: Use cloud for experimentation
  • Production: Mix local and cloud based on workload

Latest Model Access

  • Immediate availability: New models without local setup
  • No storage limits: Generate without disk space concerns

Scalability

  • Elastic compute: Handle traffic spikes via cloud
  • Geographic distribution: Low-latency generation worldwide

Pricing

ModelsLab uses pay-per-use pricing:

  • LLM: ~$0.20 per 1M tokens
  • Image: ~$0.025 per image
  • Video: ~$0.025 per video
  • TTS: ~$0.01 per 1K characters
  • Embeddings: ~$0.02 per 1M tokens
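
At the listed rates, for example, a workload of 1M LLM tokens, 100 images, and 10 videos would come to roughly $0.20 + $2.50 + $0.25 = $2.95.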

See ModelsLab Pricing for current rates.

Support

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make changes and add tests
  4. Submit a pull request

License

This project is licensed under the Apache License 2.0. See LICENSE for details.


Made with ❤️ by the ModelsLab Team

Bringing cloud AI to self-hosted LocalAI installations worldwide.

@netlify

netlify bot commented Feb 18, 2026

Deploy Preview for localai ready!

Name                  Link
🔨 Latest commit      a37f6c4
🔍 Latest deploy log  https://app.netlify.com/projects/localai/deploys/6995d908446d0000084b96ee
😎 Deploy Preview     https://deploy-preview-8597--localai.netlify.app
