Host an LLM on a non-blocking HTTP server with a simple interface for managing and querying models. This lets you host an LLM on LLM-capable hardware and then access it over the network from devices that can't run an LLM locally.
Built on FastAPI and uvicorn, it wraps the large-language-model library to expose LLM inference over a REST API.
A user-friendly GUI is included (but not required) for general use.
See Releases to install from wheel file.
See pyproject.toml for required Python version and dependencies.
This library uses optional dependencies for additional features. See the pyproject.toml for the list of optional library tags. To install with GUI support, use the gui tag.
Install this repo as a library into another project.
uv add "llm_server @ git+https://github.com/EricApgar/llm-server"
...with optional libraries:
uv add "llm_server[gui] @ git+https://github.com/EricApgar/llm-server"
pip install "llm_server @ git+https://github.com/EricApgar/llm-server"
...with optional libraries:
pip install "llm_server[gui] @ git+https://github.com/EricApgar/llm-server"
Run locally for development of this repo. Create a virtual environment and then install the dependencies into the environment.
uv sync
...with optional libraries:
uv sync --extra gui
pip install -e "."
...with optional libraries:
pip install -e ".[gui]"
Running an LLM requires an NVIDIA GPU, ideally with a large number of TOPS. See the large-language-model repo for hardware details and GPU driver setup.
Create a server, register a model with a tag, load it, and start serving. Then send a request and receive a response.
import llm_server
server = llm_server.Server()
server.set_host(ip_address='127.0.0.1', port=8000)
server.add_model(tag='gpt', name='openai/gpt-oss-20b')
server.load_model(tag='gpt', location=<path to model cache dir>)
server.start() # Non-blocking.
server.stop()
import requests

URL = 'http://127.0.0.1:8000/ask'  # Matches the host/port set via set_host().
details = {...}  # See request body examples below.
response = requests.post(URL, json=details, timeout=15)
data = response.json()
print(data['text'])
| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Health check; returns "Running.". |
| GET | `/get-models` | List all available hosted models and their tags. |
| GET | `/ask-test` | Send a test prompt ("Tell me a joke.") to the first available model. |
| POST | `/ask` | Send a prompt to a specific model. |
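With the server running, the GET endpoints can be queried with plain HTTP. A minimal sketch using only the standard library (the base URL and the assumption that the endpoint returns JSON are mine, not guaranteed by the API):

```python
import json
from urllib.request import urlopen

def get_models(base_url: str) -> dict:
    """Query GET /get-models and parse the JSON response.

    base_url is something like "http://127.0.0.1:8000"; this assumes
    the server is already running.
    """
    with urlopen(f"{base_url}/get-models", timeout=15) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The same pattern works for `/` and `/ask-test` by swapping the path.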
{
"tag": "gpt",
"prompt": "Tell me a joke.",
"max_tokens": 64,
"temperature": 0.9
}
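For convenience, the request body above can be built in Python before posting it (a small hypothetical helper; field names mirror the example, and the defaults are illustrative):

```python
def build_ask_payload(tag: str, prompt: str, max_tokens: int = 64,
                      temperature: float = 0.9) -> dict:
    """Build the JSON body expected by POST /ask (fields from the example above)."""
    return {
        "tag": tag,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

# Then post it, e.g.:
#   requests.post('http://127.0.0.1:8000/ask',
#                 json=build_ask_payload('gpt', 'Tell me a joke.'),
#                 timeout=15)
```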
`prompt` may also be a Conversation-formatted dict, i.e. the output of `llm_conversation.Conversation.to_dict()` (see llm-conversation).
{
"tag": "Phi4",
"prompt": "Describe the image.",
"images": [llm_server.encode_image(<path to image>)],
"max_tokens": 64,
}
`temperature` arg currently not supported for Phi-4-multimodal-instruct. `images` accepts base64-encoded PNG strings for multimodal models. The encoder is provided by `llm_server`.
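The exact behavior of `llm_server.encode_image` isn't shown here; a rough stand-in using only the standard library, assuming it simply base64-encodes the raw file bytes:

```python
import base64

def encode_image_b64(path: str) -> str:
    """Base64-encode an image file's bytes as an ASCII string.

    A hypothetical stand-in for llm_server.encode_image; the real helper
    may differ (e.g. add a data-URI prefix or re-encode the image).
    """
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")
```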
