diff --git a/docs.json b/docs.json
index d4261a9b6..bf9d4119c 100644
--- a/docs.json
+++ b/docs.json
@@ -412,7 +412,8 @@
           "group": "K8s Install",
           "pages": [
             "enterprise/k8s-install/index",
-            "enterprise/k8s-install/resource-limits"
+            "enterprise/k8s-install/resource-limits",
+            "enterprise/k8s-install/rate-limits"
           ]
         },
         {
diff --git a/enterprise/k8s-install/index.mdx b/enterprise/k8s-install/index.mdx
index db70d66a8..cf4208ef0 100644
--- a/enterprise/k8s-install/index.mdx
+++ b/enterprise/k8s-install/index.mdx
@@ -50,9 +50,14 @@ OpenHands Enterprise consists of several components deployed as Kubernetes workl
 
 ## Guides
 
-
-  Configure memory, CPU, and storage for optimal performance.
-
+
+
+    Configure memory, CPU, and storage for optimal performance.
+
+
+    Configure per-API-key rate limits for the Runtime API.
+
+
 
 ## Request Access
 
diff --git a/enterprise/k8s-install/rate-limits.mdx b/enterprise/k8s-install/rate-limits.mdx
new file mode 100644
index 000000000..8a2c86a03
--- /dev/null
+++ b/enterprise/k8s-install/rate-limits.mdx
@@ -0,0 +1,494 @@
+---
+title: API Key Rate Limits
+description: Configure per-API-key rate limits for the Runtime API
+icon: gauge
+---
+
+This guide explains how to configure rate limits for the internal API key that
+connects the OpenHands server to the Runtime API. This is an **administrator task**
+typically performed after initial deployment if you need to enforce request limits.
+
+## Background
+
+OpenHands Enterprise uses an internal API key to authenticate requests between two
+backend services:
+
+- **OpenHands Server** — the main application that users interact with
+- **Runtime API** — the service that manages sandbox containers
+
+```
+Users → OpenHands Server → (internal API key) → Runtime API → Sandboxes
+```
+
+During installation, you created two Kubernetes secrets that hold the same key value:
+- `sandbox-api-key` — used by the OpenHands Server
+- `default-api-key` — used by the Runtime API
+
+
+  This internal API key is **not** the same as user API keys (which start with `sk-oh-`).
+  Users never see or interact with this internal key.
+
+
+## Default Behavior
+
+By default, the internal API key has **no rate limit**. This means the OpenHands Server
+can make unlimited requests to the Runtime API.
+
+You may want to add a rate limit if:
+- You're experiencing resource contention in the Runtime API
+- You want to prevent runaway automation from overwhelming the system
+- You need to enforce fair usage across multiple OpenHands Server instances
+
+## How Rate Limiting Works
+
+When configured, rate limiting is enforced per API key using a **fixed window** strategy:
+
+1. Each API key can have a `max_requests_per_minute` value
+2. Requests are counted within each 60-second window
+3. Requests exceeding the limit receive HTTP 429 (Too Many Requests)
+
+If `max_requests_per_minute` is not set (the default), no rate limiting is applied.
+
+## Configuring a Rate Limit
+
+We provide a script that handles all the steps: retrieving credentials from Kubernetes,
+authenticating to the Runtime API, and updating the rate limit.
+
+### Prerequisites
+
+Before running the script, ensure you have:
+
+- **kubectl** configured with access to your OpenHands namespace
+
+That's it! The script runs entirely via `kubectl exec` inside the cluster, so you don't
+need curl or python3 installed locally.
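The fixed-window behavior described under "How Rate Limiting Works" can be sketched in a few lines of Python. This is an illustration only, not the Runtime API's implementation: the `FixedWindowLimiter` class, its `allow` method, and the injected `clock` are hypothetical names. Only `max_requests_per_minute`, the 60-second window, and the HTTP 429 outcome come from this guide.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class FixedWindowLimiter:
    """Illustrative fixed-window counter (hypothetical, not the Runtime API code)."""

    def __init__(self, window_seconds: int = 60,
                 clock: Callable[[], float] = time.time) -> None:
        self.window_seconds = window_seconds
        self.clock = clock
        # API key name -> (window index, requests counted in that window)
        self.counters: Dict[str, Tuple[int, int]] = {}

    def allow(self, key: str, max_requests_per_minute: Optional[int]) -> bool:
        """Return True if the request may proceed, False if it would get HTTP 429."""
        if max_requests_per_minute is None:
            return True  # no limit configured (the default)
        window = int(self.clock() // self.window_seconds)
        seen_window, count = self.counters.get(key, (window, 0))
        if seen_window != window:
            count = 0  # a new window starts with a fresh count
        if count >= max_requests_per_minute:
            return False
        self.counters[key] = (window, count + 1)
        return True


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example does not need to sleep
    limiter = FixedWindowLimiter(clock=lambda: now[0])
    print([limiter.allow("default", 2) for _ in range(3)])  # third call is rejected
    now[0] = 60.0  # move into the next 60-second window
    print(limiter.allow("default", 2))  # allowed again
```

One consequence of a fixed window is that a burst at the end of one window and another at the start of the next can briefly exceed the per-minute figure, which is worth keeping in mind when choosing a limit.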
+
+### The Script
+
+Save this script as `set-rate-limit.sh` and make it executable with `chmod +x set-rate-limit.sh`:
+
+```bash
+#!/bin/bash
+#
+# set-rate-limit.sh
+#
+# Configure or check the rate limit for the internal API key used between
+# the OpenHands Server and the Runtime API.
+#
+# This script runs commands inside the runtime-api pod using kubectl exec,
+# so it works regardless of whether the Runtime API is exposed externally.
+#
+# Usage:
+#   ./set-rate-limit.sh --check        # Check current rate limit
+#   ./set-rate-limit.sh <rate-limit>   # Set rate limit
+#
+# Examples:
+#   ./set-rate-limit.sh --check   # Display current rate limit
+#   ./set-rate-limit.sh 500       # Set limit to 500 requests per minute
+#   ./set-rate-limit.sh null      # Remove limit (allow unlimited)
+#
+# Prerequisites:
+#   - kubectl configured with access to the openhands namespace
+#
+
+set -e
+
+# ==============================================================================
+# Configuration
+# ==============================================================================
+
+NAMESPACE="openhands"
+RUNTIME_API_URL="http://localhost:5000"  # Internal URL within the pod
+
+# ==============================================================================
+# Parse command line arguments
+# ==============================================================================
+
+if [ $# -lt 1 ]; then
+    echo "Usage: $0 [--check | <rate-limit>]"
+    echo ""
+    echo "Options:"
+    echo "  --check      Display the current rate limit without changing it"
+    echo ""
+    echo "Arguments:"
+    echo "  rate-limit   Requests per minute (integer), or 'null' to remove the limit"
+    echo ""
+    echo "Examples:"
+    echo "  $0 --check   # Check current rate limit"
+    echo "  $0 500       # Set limit to 500 requests per minute"
+    echo "  $0 null      # Remove limit (allow unlimited requests)"
+    exit 1
+fi
+
+CHECK_ONLY=false
+RATE_LIMIT=""
+
+if [ "$1" == "--check" ]; then
+    CHECK_ONLY=true
+    echo "Checking current rate limit..."
+else
+    RATE_LIMIT="$1"
+    # Validate rate limit is either a positive integer (no leading zeros) or "null"
+    if [ "$RATE_LIMIT" != "null" ] && ! [[ "$RATE_LIMIT" =~ ^[1-9][0-9]*$ ]]; then
+        echo "Error: rate-limit must be a positive integer or 'null'"
+        exit 1
+    fi
+    echo "Rate limit to set: $RATE_LIMIT"
+fi
+echo ""
+
+# ==============================================================================
+# Step 1: Find the runtime-api pod
+# ==============================================================================
+
+echo "Step 1: Finding runtime-api pod..."
+
+# Get the name of a runtime-api pod. The '|| true' keeps 'set -e' from aborting
+# before the error message below when no pod matches.
+POD=$(kubectl get pods -n "$NAMESPACE" -l app.kubernetes.io/name=runtime-api \
+    -o jsonpath='{.items[0].metadata.name}' 2>/dev/null || true)
+
+if [ -z "$POD" ]; then
+    echo "Error: Could not find a runtime-api pod in namespace '$NAMESPACE'"
+    echo "Make sure the runtime-api deployment is running."
+    exit 1
+fi
+
+echo "  ✓ Found pod: $POD"
+
+# ==============================================================================
+# Step 2: Retrieve the admin password from Kubernetes secrets
+# ==============================================================================
+
+echo "Step 2: Retrieving admin password from Kubernetes secret..."
+
+# The admin password was created during installation and stored in the
+# 'admin-password' secret in the openhands namespace
+ADMIN_PASSWORD=$(kubectl get secret admin-password -n "$NAMESPACE" \
+    -o jsonpath='{.data.admin-password}' | base64 -d)
+
+if [ -z "$ADMIN_PASSWORD" ]; then
+    echo "Error: Could not retrieve admin password from Kubernetes secret."
+    echo "Make sure the 'admin-password' secret exists in the '$NAMESPACE' namespace."
+    exit 1
+fi
+
+echo "  ✓ Admin password retrieved"
+
+# ==============================================================================
+# Step 3: Run the rate limit update inside the pod
+# ==============================================================================
+
+# Determine the action description for output
+if [ "$CHECK_ONLY" = true ]; then
+    echo "Step 3: Connecting to runtime-api pod and checking rate limit..."
+else
+    echo "Step 3: Connecting to runtime-api pod and updating rate limit..."
+fi
+
+# We'll execute a Python script inside the pod that:
+# 1. Gets a challenge from the local API
+# 2. Computes the PBKDF2 hash
+# 3. Authenticates and gets a JWT token
+# 4. Finds the default API key
+# 5. Optionally updates its rate limit (if not --check mode)
+
+# Pass CHECK_ONLY and RATE_LIMIT to the Python script
+# For check-only mode, we pass "CHECK" as the rate limit
+if [ "$CHECK_ONLY" = true ]; then
+    RATE_LIMIT_ARG="'CHECK'"
+elif [ "$RATE_LIMIT" = "null" ]; then
+    # Python has no 'null' literal, so pass None to clear the limit
+    RATE_LIMIT_ARG="None"
+else
+    RATE_LIMIT_ARG="$RATE_LIMIT"
+fi
+
+kubectl exec -n "$NAMESPACE" "$POD" -- python3 -c "
+import json
+import hashlib
+import binascii
+import urllib.request
+import urllib.error
+
+RUNTIME_API_URL = '$RUNTIME_API_URL'
+ADMIN_PASSWORD = '''$ADMIN_PASSWORD'''
+RATE_LIMIT_ARG = $RATE_LIMIT_ARG  # This will be an int, None (from 'null'), or 'CHECK'
+
+CHECK_ONLY = RATE_LIMIT_ARG == 'CHECK'
+RATE_LIMIT = None if CHECK_ONLY else RATE_LIMIT_ARG
+
+def api_request(path, method='GET', data=None, token=None):
+    \"\"\"Make an HTTP request to the Runtime API.\"\"\"
+    url = f'{RUNTIME_API_URL}{path}'
+    headers = {'Content-Type': 'application/json'}
+    if token:
+        headers['Authorization'] = f'Bearer {token}'
+
+    req = urllib.request.Request(url, method=method, headers=headers)
+    if data is not None:
+        req.data = json.dumps(data).encode('utf-8')
+
+    try:
+        with urllib.request.urlopen(req) as response:
+            return json.loads(response.read().decode('utf-8'))
+    except urllib.error.HTTPError as e:
+        error_body = e.read().decode('utf-8')
+        raise Exception(f'HTTP {e.code}: {error_body}')
+
+# Step 3a: Get authentication challenge
+print('  Getting authentication challenge...')
+challenge_resp = api_request('/api/admin/challenge')
+challenge = challenge_resp['challenge']
+salt = challenge_resp['salt']
+
+# Step 3b: Compute PBKDF2 hash
+# The salt is: salt + challenge (concatenated as strings, then UTF-8 encoded)
+# Parameters: 10000 iterations, 32-byte output
+combined_salt = (salt + challenge).encode('utf-8')
+dk = hashlib.pbkdf2_hmac('sha256', ADMIN_PASSWORD.encode(), combined_salt, 10000, dklen=32)
+hash_hex = binascii.hexlify(dk).decode()
+
+# Step 3c: Authenticate and get JWT token
+print('  Authenticating...')
+login_resp = api_request('/api/admin/login', method='POST', data={
+    'challenge': challenge,
+    'hash': hash_hex
+})
+token = login_resp['token']
+print('  ✓ Authentication successful')
+
+# Step 3d: Get all API keys and find the 'default' key
+print('  Finding default API key...')
+keys = api_request('/api/admin/api-keys', token=token)
+
+default_key = None
+for key in keys:
+    if key.get('name') == 'default':
+        default_key = key
+        break
+
+if not default_key:
+    print('  Error: Could not find API key named \"default\"')
+    print(f'  Available keys: {[k.get(\"name\") for k in keys]}')
+    exit(1)
+
+key_id = default_key['id']
+current_limit = default_key.get('max_requests_per_minute')
+current_display = 'unlimited' if current_limit is None else current_limit
+print(f'  ✓ Found default key (ID: {key_id})')
+print()
+print('================================================')
+print(f'Current rate limit: {current_display}')
+print('================================================')
+
+# Step 3e: Update the rate limit (only if not in check-only mode)
+if not CHECK_ONLY:
+    new_display = 'unlimited' if RATE_LIMIT is None else RATE_LIMIT
+    print()
+    print(f'  Updating rate limit to {new_display}...')
+
+    updated_key = api_request(f'/api/admin/api-keys/{key_id}', method='PUT', token=token, data={
+        'max_requests_per_minute': RATE_LIMIT
+    })
+
+    final_limit = updated_key.get('max_requests_per_minute')
+    final_display = 'unlimited' if final_limit is None else final_limit
+    print(f'  ✓ Rate limit updated successfully')
+    print()
+    print('================================================')
+    print(f'New rate limit: {final_display}')
+    print('================================================')
+"
+```
+
+### Usage Examples
+
+**Check the current rate limit:**
+
+```bash
+./set-rate-limit.sh --check
+```
+
+**Set a rate limit of 500 requests per minute:**
+
+```bash
+./set-rate-limit.sh 500
+```
+
+**Remove the rate limit (allow unlimited requests):**
+
+```bash
+./set-rate-limit.sh null
+```
+
+### Expected Output
+
+**Checking the current rate limit:**
+
+```
+Checking current rate limit...
+
+Step 1: Finding runtime-api pod...
+  ✓ Found pod: openhands-runtime-api-5d4f6b7c8d-x2k9m
+Step 2: Retrieving admin password from Kubernetes secret...
+  ✓ Admin password retrieved
+Step 3: Connecting to runtime-api pod and checking rate limit...
+  Getting authentication challenge...
+  Authenticating...
+  ✓ Authentication successful
+  Finding default API key...
+  ✓ Found default key (ID: 1)
+
+================================================
+Current rate limit: unlimited
+================================================
+```
+
+**Setting a rate limit:**
+
+```
+Rate limit to set: 500
+
+Step 1: Finding runtime-api pod...
+  ✓ Found pod: openhands-runtime-api-5d4f6b7c8d-x2k9m
+Step 2: Retrieving admin password from Kubernetes secret...
+  ✓ Admin password retrieved
+Step 3: Connecting to runtime-api pod and updating rate limit...
+  Getting authentication challenge...
+  Authenticating...
+  ✓ Authentication successful
+  Finding default API key...
+  ✓ Found default key (ID: 1)
+
+================================================
+Current rate limit: unlimited
+================================================
+
+  Updating rate limit to 500...
+  ✓ Rate limit updated successfully
+
+================================================
+New rate limit: 500
+================================================
+```
+
+## Choosing a Rate Limit Value
+
+The appropriate rate limit depends on your usage patterns:
+
+| Scenario | Suggested Limit |
+|----------|-----------------|
+| Small team (< 10 concurrent users) | 200-300 req/min |
+| Medium deployment (10-50 users) | 500-1000 req/min |
+| Large deployment or heavy automation | 1000+ req/min |
+
+
+  Setting the limit too low can cause sandbox operations to fail with 429 errors.
+  Monitor your Runtime API logs after making changes.
+
+
+## Troubleshooting
+
+### Checking Current Rate Limit Status
+
+View the Runtime API logs to see rate limit events:
+
+```bash
+kubectl logs -l app.kubernetes.io/name=runtime-api -n openhands --tail=100 | grep -i "rate limit"
+```
+
+When a rate limit is exceeded, you'll see messages like:
+
+```
+Rate limit exceeded for default at /start
+```
+
+### Still Seeing Rate Limits After Upgrading?
+
+If you upgraded your deployment but are still experiencing 429 errors, the most likely
+cause is that you're running an older version of the Runtime API that has **hardcoded
+rate limits**.
+
+#### Background: Rate Limiting History
+
+Prior to Helm chart version **0.2.8**, the Runtime API had a hardcoded limit of
+**100 requests per minute** on all endpoints. This was not configurable — every
+deployment was subject to this limit regardless of settings.
+
+Starting with chart version **0.2.8** (image `sha-1a920e8`), rate limiting was changed to:
+- **No rate limit by default** — the internal API key is created without a limit
+- **Configurable per-key** — administrators can optionally set limits via the admin API
+
+| Chart Version | Image Tag | Rate Limiting Behavior |
+|---------------|-----------|------------------------|
+| 0.2.8 (latest) | `sha-1a920e8` | No limit by default, configurable |
+| 0.2.6 - 0.2.7 | `sha-7857be8` | No limit by default, configurable |
+| 0.2.1 - 0.2.5 | `sha-20ec8b3` | **Hardcoded 100 req/min** |
+| Earlier | Various | **Hardcoded 100 req/min** |
+
+#### Step 1: Check Your Chart Version
+
+```bash
+helm list -n openhands | grep runtime-api
+```
+
+If you're on a version older than 0.2.6, you need to upgrade to remove the hardcoded limits.
+
+#### Step 2: Check the Running Image
+
+Verify what image is actually running in your cluster:
+
+```bash
+kubectl get deployment -n openhands -l app.kubernetes.io/name=runtime-api \
+  -o jsonpath='{.items[*].spec.template.spec.containers[*].image}'
+```
+
+You should see `ghcr.io/openhands/runtime-api:sha-1a920e8` (or `sha-7857be8` or newer).
+
+If you see an older image tag (like `sha-20ec8b3` or earlier), you're running the old
+code with hardcoded limits.
+
+#### Step 3: Check the Error Message Format
+
+The error message format tells you which version of rate limiting is active:
+
+- **Old (hardcoded)**: `Rate limit exceeded` (generic message from slowapi library)
+- **New (configurable)**: `Rate limit exceeded: 500 per 1 minute` (includes the specific limit)
+
+If you see the old format, the new code isn't running yet.
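If you are collecting 429 response bodies from many requests, the Step 3 check can be automated. The sketch below assumes the two message formats quoted in Step 3 are the only ones of interest; the function name `classify_rate_limit_message` and the labels it returns are hypothetical. It is meant for the error message returned with the 429, not for the `Rate limit exceeded for default at /start` log line, which it reports as `unknown`.

```python
import re
from typing import Optional, Tuple

# Newer format quoted in Step 3, e.g. "Rate limit exceeded: 500 per 1 minute".
_NEW_FORMAT = re.compile(r"rate limit exceeded:\s*(\d+)\s+per\s+1\s+minute", re.IGNORECASE)


def classify_rate_limit_message(message: str) -> Tuple[str, Optional[int]]:
    """Return (label, limit) where label is 'configurable', 'hardcoded', or 'unknown'."""
    match = _NEW_FORMAT.search(message)
    if match:
        # The new code reports the configured per-minute limit in the message.
        return "configurable", int(match.group(1))
    if message.strip().rstrip(".").lower() == "rate limit exceeded":
        # The bare message with no number is the old, hardcoded behavior.
        return "hardcoded", None
    return "unknown", None


if __name__ == "__main__":
    for text in ("Rate limit exceeded: 500 per 1 minute", "Rate limit exceeded"):
        print(text, "->", classify_rate_limit_message(text))
```

A `hardcoded` result suggests the old image is still serving traffic, so continue with the upgrade steps.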
+
+#### Step 4: Upgrade the Chart
+
+To get configurable rate limiting, upgrade to chart version 0.2.8 or later:
+
+```bash
+helm repo update
+helm upgrade runtime-api -n openhands \
+  oci://ghcr.io/all-hands-ai/helm-charts/runtime-api \
+  -f your-values.yaml
+```
+
+After upgrading, verify the new pods are running:
+
+```bash
+kubectl rollout status deployment -n openhands -l app.kubernetes.io/name=runtime-api
+```
+
+### Common Issues
+
+**429 errors after setting a limit**: Your limit may be too low. Check the logs to see
+how many requests are being made, then adjust the limit accordingly.
+
+**Authentication failures**: JWT tokens expire after 24 hours. If you get 401 errors,
+repeat the authentication steps to get a new token.
+
+**"Admin functionality is disabled" error**: The `ADMIN_PASSWORD` environment variable
+may not be set in the Runtime API deployment. Check the deployment configuration.
+
+## Related Configuration
+
+
+
+    Configure memory, CPU, and storage limits for sandboxes.
+
+
+    Return to the Kubernetes installation overview.
+
+