Mozilla LLM Proxy Auth (MLPA)

A proxy that verifies App Attest and FxA payloads, then forwards requests through LiteLLM to enforce budgets and per-user management.

Setup

make setup

This creates a virtual environment in .venv/, installs dependencies, and installs the tool locally in editable mode.

Running MLPA locally with Docker

Run LiteLLM and PostgreSQL

docker compose -f litellm_docker_compose.yaml up -d

Create and migrate appattest database

sh ./scripts/create-app-attest-database.sh
alembic upgrade head
Set MLPA_DEBUG=true in config.py or in the .env file.
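
Before starting the service, it can help to confirm that the flag is actually present in .env. The snippet below is a minimal sketch and not the project's config.py: `parse_dotenv` and `debug_enabled` are hypothetical helpers, and the set of accepted truthy spellings is an assumption.

```python
def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def debug_enabled(env: dict[str, str]) -> bool:
    """Assumption: these spellings count as 'on'; the real parser may differ."""
    return env.get("MLPA_DEBUG", "").lower() in {"1", "true", "yes", "on"}


# Sample .env content used only for illustration.
sample = "# local development\nMLPA_DEBUG=true\n"
print(debug_enabled(parse_dotenv(sample)))
```

To check a real file, pass the contents of .env (for example via `pathlib.Path(".env").read_text()`) instead of `sample`.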

Create a virtual LiteLLM key

Run python scripts/create-and-set-virtual-key.py (this also sets the key value in .env)

Run MLPA

Install it as a library:

pip install --no-cache-dir -e .

Run the binary

mlpa
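
The local steps above can be collected into one ordered list, which makes the sequence easy to review or script. This is a sketch only: `LOCAL_SETUP_STEPS` and `run_steps` are hypothetical names, the commands are copied from this README, and setting MLPA_DEBUG remains a manual step.

```python
import shlex
import subprocess

# Commands copied from the steps above, in the order they appear.
LOCAL_SETUP_STEPS = [
    "docker compose -f litellm_docker_compose.yaml up -d",
    "sh ./scripts/create-app-attest-database.sh",
    "alembic upgrade head",
    "python scripts/create-and-set-virtual-key.py",
    "pip install --no-cache-dir -e .",
    "mlpa",
]


def run_steps(steps=LOCAL_SETUP_STEPS, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for step in steps:
        print(f"$ {step}")
        if not dry_run:
            # check=True raises CalledProcessError at the first failing command.
            subprocess.run(shlex.split(step), check=True)


run_steps()  # dry run: prints the commands without executing them
```

Calling `run_steps(dry_run=False)` from the repository root would execute the commands in order. The final `mlpa` command starts the service, so that call is expected to keep running until the service stops.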

Navigate to

Config (see LiteLLM Documentation for more config options)

See `config.py` for all configuration variables

Also see `litellm_config.yaml` for the LiteLLM config

A service account is used to reach Vertex AI: service_account.json should be in the repository root directory
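
A quick pre-flight check can catch a missing or malformed key file before the service starts. The sketch below is not part of MLPA: `check_service_account` is a hypothetical helper, and the JSON written in the demo is a placeholder rather than a real key.

```python
import json
import tempfile
from pathlib import Path


def check_service_account(root: Path = Path(".")) -> bool:
    """Return True if root/service_account.json exists and holds a JSON object."""
    path = root / "service_account.json"
    if not path.is_file():
        return False
    try:
        return isinstance(json.loads(path.read_text(encoding="utf-8")), dict)
    except json.JSONDecodeError:
        return False


# Demo in a temporary directory so no real credentials are touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    missing = check_service_account(root)
    (root / "service_account.json").write_text(
        '{"type": "service_account"}', encoding="utf-8"
    )
    present = check_service_account(root)

print(missing, present)
```

Run from the repository root, `check_service_account()` with no arguments checks the location this README describes.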

API Documentation

After running, Swagger can be viewed at http://localhost:<PORT>/api/docs
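
As a small illustration, the Swagger URL can be built from the port and probed with the standard library. `docs_url` and `docs_reachable` are hypothetical helpers, and 8080 is a placeholder port, not a value taken from the project's configuration.

```python
from urllib.error import URLError
from urllib.request import urlopen


def docs_url(port: int, host: str = "localhost") -> str:
    """Build the Swagger URL in the form given above."""
    return f"http://{host}:{port}/api/docs"


def docs_reachable(port: int, timeout: float = 2.0) -> bool:
    """Return True if the docs page answers with HTTP 200."""
    try:
        with urlopen(docs_url(port), timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


print(docs_url(8080))  # 8080 is a placeholder port
```

`docs_reachable(port)` only returns True while MLPA is running on that port.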

Useful Prometheus queries

| Metric description | Query |
| --- | --- |
| Total requests (RPS) | `sum(rate(requests_total{endpoint!~"/metrics"}[5m]))` |
| Requests per endpoint (RPS) | `sum by (method, endpoint) (rate(requests_total{endpoint!~"/metrics"}[5m]))` |
| Requests currently in progress | `sum(in_progress_requests{endpoint!~"/metrics"})` |
| Response status codes (RPS) | `sum by (status_code) (rate(response_status_codes_total[5m]))` |
| Error rate (5xx) | `sum(rate(response_status_codes_total{status_code=~"5.."}[5m])) / sum(rate(requests_total{endpoint!~"/metrics"}[5m]))` |
| Overall average request latency | `sum(rate(request_latency_seconds_sum{endpoint!~"/metrics"}[5m])) / sum(rate(request_latency_seconds_count{endpoint!~"/metrics"}[5m]))` |
| Average request latency per endpoint | `sum by (method, endpoint) (rate(request_latency_seconds_sum{endpoint!~"/metrics"}[5m])) / sum by (method, endpoint) (rate(request_latency_seconds_count{endpoint!~"/metrics"}[5m]))` |
| Challenge validation latency | `rate(validate_challenge_latency_seconds_sum[5m]) / rate(validate_challenge_latency_seconds_count[5m])` |
| App Attest auth latency by result | `sum by (result) (rate(validate_app_attest_latency_seconds_sum[5m])) / sum by (result) (rate(validate_app_attest_latency_seconds_count[5m]))` |
| App Assert auth latency by result | `sum by (result) (rate(validate_app_assert_latency_seconds_sum[5m])) / sum by (result) (rate(validate_app_assert_latency_seconds_count[5m]))` |
| FxA authentication latency by result | `sum by (result) (rate(validate_fxa_latency_seconds_sum[5m])) / sum by (result) (rate(validate_fxa_latency_seconds_count[5m]))` |
| Chat completion latency by result | `sum by (result) (rate(chat_completion_latency_seconds_sum[5m])) / sum by (result) (rate(chat_completion_latency_seconds_count[5m]))` |
| Time to first token (TTFT) | `rate(chat_completion_ttft_seconds_sum[5m]) / rate(chat_completion_ttft_seconds_count[5m])` |
| Tokens per chat request by type | `sum(rate(chat_tokens_total[5m])) by (type) / on() group_left() sum(rate(chat_completion_latency_seconds_count[5m]))` |
| Total tokens per chat request | `sum(rate(chat_tokens_total[5m])) / sum(rate(chat_completion_latency_seconds_count[5m]))` |
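
These queries can also be sent to the Prometheus HTTP API instead of being pasted into the UI. The sketch below assumes a Prometheus server at http://localhost:9090 (the default port, not something this README specifies); `instant_query_url` and `run_instant_query` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# "Total requests (RPS)" query from the table above.
TOTAL_RPS = 'sum(rate(requests_total{endpoint!~"/metrics"}[5m]))'


def instant_query_url(query: str, base_url: str = "http://localhost:9090") -> str:
    """Build a URL for Prometheus's instant-query endpoint, /api/v1/query."""
    return f"{base_url}/api/v1/query?{urlencode({'query': query})}"


def run_instant_query(query: str, base_url: str = "http://localhost:9090"):
    """Send the query and return the 'result' list from the JSON response."""
    with urlopen(instant_query_url(query, base_url), timeout=5) as response:
        payload = json.load(response)
    return payload["data"]["result"]


print(instant_query_url(TOTAL_RPS))
```

`urlencode` percent-encodes the braces, quotes, and brackets in the PromQL expression, so the query can be passed as a plain string. `run_instant_query(TOTAL_RPS)` needs a reachable Prometheus server that scrapes MLPA.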
