feat: Add Google Cloud Vertex AI provider support #22

Open

itdove wants to merge 18 commits into LobsterTrap:midstream from itdove:vertex-claude

Conversation


itdove commented Apr 7, 2026

Summary

Adds complete support for Google Cloud Vertex AI as an inference provider, enabling OpenShell sandboxes to use Claude models via GCP Vertex AI with OAuth authentication.

This implementation includes full end-to-end testing and supports both direct Claude CLI usage and inference routing via inference.local.

Features

Vertex AI Provider

  • Provider discovery: Auto-discovers Vertex AI credentials from environment
  • OAuth token generation: Generates tokens from GCP Application Default Credentials
  • Credential injection: Injects actual values (not placeholders) for CLI tool compatibility
  • Region support: Configurable region via ANTHROPIC_VERTEX_REGION (defaults to us-central1)
  • Auto-configuration: Sets CLAUDE_CODE_USE_VERTEX=1 automatically

Inference Routing

  • URL construction: Builds Vertex-specific URLs with project ID, region, and :streamRawPredict endpoint
  • Model field handling: Removes model from request body (Vertex expects it in URL path)
  • Bearer auth: Uses OAuth tokens as Bearer tokens
  • API version: Uses vertex-2023-10-16 Anthropic API version
  • Model ID format: Supports @ separator (e.g., claude-sonnet-4-5@20250929)
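As a rough sketch of how these pieces fit together (names and types are illustrative, not the actual crate API), the routing layer attaches the OAuth token as a Bearer header and pins the Vertex-specific Anthropic API version in the request body:

```rust
use std::collections::BTreeMap;

// Illustrative sketch only: shapes an outbound Vertex AI request by adding
// Bearer auth headers and the Vertex-specific Anthropic API version.
// The body is modeled as a simple map rather than full JSON.
fn shape_request(
    oauth_token: &str,
    mut body: BTreeMap<String, String>,
) -> (Vec<(String, String)>, BTreeMap<String, String>) {
    let headers = vec![
        ("authorization".to_string(), format!("Bearer {oauth_token}")),
        ("content-type".to_string(), "application/json".to_string()),
    ];
    // Vertex AI expects this exact Anthropic API version string.
    body.insert("anthropic_version".into(), "vertex-2023-10-16".into());
    (headers, body)
}

fn main() {
    let (headers, body) = shape_request("ya29.example-token", BTreeMap::new());
    assert_eq!(headers[0].1, "Bearer ya29.example-token");
    assert_eq!(body["anthropic_version"], "vertex-2023-10-16");
}
```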

Direct Credential Injection

  • Selective injection: Credentials needed by CLI tools are injected as actual environment variables
  • Vertex credentials: ANTHROPIC_VERTEX_PROJECT_ID, VERTEX_OAUTH_TOKEN, CLAUDE_CODE_USE_VERTEX, ANTHROPIC_VERTEX_REGION
  • Security: Only credentials essential for CLI tool compatibility are directly injected
  • HTTP proxy resolution: Other credentials continue using openshell:resolve:env:* placeholders
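A minimal sketch of the selective-injection decision (the allowlist and helper name are illustrative, not the exact `secrets.rs` API): credentials on the allowlist keep their real values, while everything else becomes a proxy-resolved placeholder:

```rust
use std::collections::BTreeMap;

// Hypothetical allowlist: credentials the `claude` CLI must read directly
// from the environment rather than through the HTTP proxy.
const DIRECT_INJECT: &[&str] = &[
    "ANTHROPIC_VERTEX_PROJECT_ID",
    "VERTEX_OAUTH_TOKEN",
    "CLAUDE_CODE_USE_VERTEX",
    "ANTHROPIC_VERTEX_REGION",
];

/// Map each credential either to its real value (direct injection) or to a
/// placeholder that the HTTP proxy resolves per-request.
fn inject(creds: &BTreeMap<&str, &str>) -> BTreeMap<String, String> {
    creds
        .iter()
        .map(|(name, value)| {
            let injected = if DIRECT_INJECT.contains(name) {
                value.to_string()
            } else {
                format!("openshell:resolve:env:{name}")
            };
            (name.to_string(), injected)
        })
        .collect()
}

fn main() {
    let mut creds = BTreeMap::new();
    creds.insert("ANTHROPIC_VERTEX_PROJECT_ID", "my-project");
    creds.insert("SOME_API_KEY", "secret");
    let env = inject(&creds);
    assert_eq!(env["ANTHROPIC_VERTEX_PROJECT_ID"], "my-project");
    assert_eq!(env["SOME_API_KEY"], "openshell:resolve:env:SOME_API_KEY");
}
```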

Network Policy Support

  • Custom policies: Sandboxes require network policy allowing Google Cloud endpoints
  • OAuth endpoints: oauth2.googleapis.com, accounts.google.com
  • Vertex AI endpoints: Regional Vertex AI endpoints (*-aiplatform.googleapis.com)
  • Inference routing: inference.local endpoint for privacy-aware routing
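To illustrate the `*-aiplatform.googleapis.com` pattern above, here is a hedged sketch of hostname matching against an allowlist; the assumption that a leading `*` matches any non-empty prefix is mine, not taken from the actual policy engine:

```rust
/// Sketch: does `host` match any allow pattern? A leading `*` is assumed
/// to match any non-empty prefix (e.g. the region in a Vertex endpoint).
fn host_allowed(host: &str, patterns: &[&str]) -> bool {
    patterns.iter().any(|p| {
        if let Some(suffix) = p.strip_prefix('*') {
            host.ends_with(suffix) && host.len() > suffix.len()
        } else {
            host == *p
        }
    })
}

fn main() {
    // Endpoints named in the example policy for Vertex AI sandboxes.
    let allow = [
        "oauth2.googleapis.com",
        "accounts.google.com",
        "*-aiplatform.googleapis.com",
        "inference.local",
    ];
    assert!(host_allowed("us-east5-aiplatform.googleapis.com", &allow));
    assert!(host_allowed("oauth2.googleapis.com", &allow));
    assert!(!host_allowed("example.com", &allow));
}
```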

Changes

Core Implementation

  • crates/openshell-providers/src/providers/vertex.rs - Vertex AI provider plugin with OAuth generation
  • crates/openshell-core/src/inference.rs - VERTEX_PROFILE with Bearer auth and vertex API version
  • crates/openshell-server/src/inference.rs - Vertex URL construction with project ID and region
  • crates/openshell-router/src/backend.rs - Critical fix: Removes model field from request body for Vertex AI
  • crates/openshell-sandbox/src/secrets.rs - Direct credential injection for CLI compatibility
  • crates/openshell-providers/Cargo.toml - Add gcp_auth dependency
  • crates/openshell-providers/src/lib.rs - Register vertex provider
  • crates/openshell-cli/src/main.rs - Add Vertex to provider type enum

Examples

  • examples/vertex-ai/sandbox-policy.yaml - New: Network policy for Vertex AI endpoints
  • examples/vertex-ai/README.md - New: Quick start guide with documentation references

Development Improvements

  • tasks/scripts/cluster-deploy-fast.sh - Bash 3 compatibility fix (replaces mapfile)
  • scripts/rebuild-cluster.sh - New: Quick rebuild script for development workflow
  • scripts/setup-podman-macos.sh - Increase default memory from 8 GB to 12 GB for better build performance
  • cleanup-openshell-podman-macos.sh - Improved cleanup with sandbox deletion

Documentation

  • docs/sandboxes/manage-providers.md - Updated Vertex provider documentation, removed OAuth limitation note
  • docs/inference/configure.md - Updated Vertex AI setup guide with OAuth token generation
  • docs/get-started/install-podman-macos.md - Added rebuild/cleanup workflow documentation
  • CONTRIBUTING.md - Added development rebuild workflow

Usage

Prerequisites

# Install Google Cloud SDK
brew install google-cloud-sdk

# Configure Application Default Credentials
gcloud auth application-default login

# Set project ID and region
export ANTHROPIC_VERTEX_PROJECT_ID=your-gcp-project-id
export ANTHROPIC_VERTEX_REGION=us-east5  # Optional, defaults to us-central1

Quick Start

# Create provider
openshell provider create --name vertex --type vertex --from-existing

# Create sandbox with Vertex AI
openshell sandbox create --name vertex-test --provider vertex \
  --upload ~/.config/gcloud/:.config/gcloud/ \
  --policy examples/vertex-ai/sandbox-policy.yaml

# Use Claude CLI (automatically uses Vertex AI)
claude

Inference Routing (Optional)

# Configure inference routing
openshell inference set --provider vertex --model claude-sonnet-4-5@20250929 --no-verify

# Test inside sandbox
curl -X POST https://inference.local/v1/messages \
  -H "content-type: application/json" \
  -d '{
    "anthropic_version": "vertex-2023-10-16",
    "max_tokens": 32,
    "messages": [{"role": "user", "content": "Say hello"}]
  }'

Testing

Fully tested end-to-end on macOS with:

  • Podman Machine (12 GB RAM)
  • GCP project with Vertex AI Claude models enabled
  • Application Default Credentials configured
  • Provider creation and OAuth token generation
  • Direct Claude CLI usage in sandboxes
  • Inference routing via inference.local
  • Network policy enforcement
  • All regional Vertex AI endpoints

Key Test Results:

  • ✅ OAuth token generation from ADC
  • ✅ Credential injection into sandboxes
  • ✅ Claude CLI auto-detects Vertex AI
  • ✅ Inference routing removes model field correctly
  • ✅ Vertex API responds with successful completions

Technical Details

Router Fix (Critical)

The router was incorrectly inserting the model field into request bodies for all providers. Vertex AI's :streamRawPredict endpoint expects the model in the URL path, not the request body, causing "Extra inputs are not permitted" errors.

Fix: Router now detects Vertex AI endpoints (aiplatform.googleapis.com) and removes the model field from the request body while keeping it in the URL path.
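The fix can be sketched roughly as follows (the body is modeled as a simple map rather than full JSON, and the function name is illustrative): strip `model` from the body only when the backend is a Vertex AI endpoint.

```rust
use std::collections::BTreeMap;

// Illustrative sketch of the router fix: Vertex AI carries the model in the
// URL path, so an in-body `model` field must be removed for those backends.
fn prepare_body(
    backend_url: &str,
    mut body: BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    if backend_url.contains("aiplatform.googleapis.com") {
        // Leaving this in triggers "Extra inputs are not permitted".
        body.remove("model");
    }
    body
}

fn main() {
    let mut body = BTreeMap::new();
    body.insert("model".to_string(), "claude-sonnet-4-5@20250929".to_string());

    // Vertex backend: model field is stripped.
    let vertex = prepare_body(
        "https://us-east5-aiplatform.googleapis.com/v1/projects/p/...",
        body.clone(),
    );
    assert!(!vertex.contains_key("model"));

    // Non-Vertex backend: body passes through unchanged.
    let other = prepare_body("https://api.anthropic.com/v1/messages", body);
    assert!(other.contains_key("model"));
}
```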

Credential Flow

  1. User configures GCP Application Default Credentials
  2. Provider plugin generates OAuth token from ADC at creation time
  3. Credentials are stored in gateway database
  4. When creating sandboxes, credentials are injected as actual environment variables
  5. CLI tools (claude) automatically detect Vertex AI via CLAUDE_CODE_USE_VERTEX=1
  6. OAuth tokens are refreshed from the uploaded ~/.config/gcloud/ directory

URL Structure

Vertex AI requests go to:

https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}/publishers/anthropic/models/{model}:streamRawPredict

The router constructs this URL and removes the model field from the JSON body.
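The URL template above can be sketched as a small formatting function (illustrative name, not the actual `build_backend_url` signature); note the model ID keeps its `@` version separator in the path:

```rust
/// Sketch of constructing the Vertex `:streamRawPredict` URL from project,
/// region, and a model ID using the `@` version separator.
fn vertex_url(project: &str, region: &str, model: &str) -> String {
    format!(
        "https://{region}-aiplatform.googleapis.com/v1/projects/{project}\
         /locations/{region}/publishers/anthropic/models/{model}:streamRawPredict"
    )
}

fn main() {
    let url = vertex_url("my-project", "us-east5", "claude-sonnet-4-5@20250929");
    assert!(url.starts_with("https://us-east5-aiplatform.googleapis.com/"));
    assert!(url.contains("/projects/my-project/locations/us-east5/"));
    assert!(url.ends_with("claude-sonnet-4-5@20250929:streamRawPredict"));
}
```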

Development Workflow

Rebuilding After Changes

# Quick rebuild for testing code changes
bash scripts/rebuild-cluster.sh

# Recreate provider and sandbox after rebuild
openshell provider create --name vertex --type vertex --from-existing
openshell sandbox create --name test --provider vertex \
  --upload ~/.config/gcloud/:.config/gcloud/ \
  --policy examples/vertex-ai/sandbox-policy.yaml

Breaking Changes

None. All changes are additive.

Related Issues

Addresses the need for Vertex AI provider support for users who:

  • Need to use Claude via GCP Vertex AI for billing/compliance
  • Want to use existing GCP credentials and infrastructure
  • Require OAuth-based authentication instead of API keys
  • Work in organizations with GCP-only AI policies

Checklist

  • Code follows project style guidelines
  • Tests pass (cargo check and cargo test succeed)
  • End-to-end testing completed successfully
  • Documentation updated
  • Example policy and README added
  • Commit messages follow Conventional Commits format
  • No secrets or credentials committed
  • Router fix verified with live Vertex AI endpoints
  • Network policy tested and working
  • OAuth token generation tested

itdove added 10 commits April 6, 2026 17:20
- Add vertex provider plugin with ANTHROPIC_VERTEX_PROJECT_ID credential
- Add vertex inference profile with Anthropic-compatible protocols
- Register vertex in provider registry and CLI
- Add vertex to supported inference provider types
- Fix scripts/podman.env to use correct env var names for local registry
- Update docs for simplified CLI install workflow

Known limitation: GCP OAuth authentication not yet implemented.
Vertex provider can be created and configured but API calls will fail
until OAuth token generation is added.
- Note that mise run cluster:build:full builds AND starts the gateway
- Add verification step after build completes
- Clarify that gateway is already running before sandbox creation
- Add vertex to supported provider types table in manage-providers.md
- Add Vertex AI provider tab in inference configuration docs
- Clarify two usage modes: direct API calls vs inference.local routing
- Document prerequisites (GCP project, Application Default Credentials)
- Note OAuth limitation only affects inference routing, not direct calls
- Keep Vertex docs in provider/inference pages, not installation guides
- Add gcp_auth dependency for OAuth token generation
- Generate OAuth tokens from Application Default Credentials in vertex provider
- Store tokens as VERTEX_OAUTH_TOKEN credential for router authentication
- Update inference profile to use Bearer auth with OAuth tokens
- Construct Vertex-specific URLs with :streamRawPredict endpoint
- Support project ID from credentials for URL construction
- Add model parameter to build_backend_url for Vertex routing
Avoid tokio runtime nesting panic by spawning OAuth token generation
in a separate OS thread with its own runtime. This allows provider
discovery to work when called from within an existing tokio context.
…r ordering

- Delete all sandboxes before destroying gateway
- Explicitly stop and remove cluster and registry containers by name
- Remove images by specific tags (localhost/openshell/*)
- Run cargo clean for build artifacts
- Add reinstall instructions to completion message
- Better error handling with 2>/dev/null redirects
…iables

Add selective direct injection for provider credentials that need to be
accessible as real environment variables (not placeholders). This allows
tools like `claude` CLI to read Vertex AI credentials directly.

Changes:
- Add direct_inject_credentials() list for credentials requiring direct access
- Modify from_provider_env() to support selective direct injection
- Inject ANTHROPIC_VERTEX_PROJECT_ID, VERTEX_OAUTH_TOKEN, and
  ANTHROPIC_VERTEX_REGION as actual values instead of placeholders
- Other credentials continue using openshell:resolve:env:* placeholders
  for HTTP proxy resolution

Security note: Directly injected credentials are visible via /proc/*/environ,
unlike placeholder-based credentials which are only resolved within HTTP
requests. Only credentials essential for CLI tool compatibility are included.
- Add CLAUDE_CODE_USE_VERTEX to direct injection list
- Automatically set CLAUDE_CODE_USE_VERTEX=1 in Vertex provider credentials
- Enables claude CLI to auto-detect Vertex AI without manual config

Now sandboxes with Vertex provider will automatically have:
- ANTHROPIC_VERTEX_PROJECT_ID (from env)
- VERTEX_OAUTH_TOKEN (generated from GCP ADC)
- CLAUDE_CODE_USE_VERTEX=1 (auto-set)

The claude CLI can now use Vertex AI with zero manual configuration.
…rmance

- Change Podman machine default memory from 8 GB to 12 GB
- Update documentation to reflect 12 GB default
- Update troubleshooting to suggest 16 GB for build issues

12 GB provides better performance for Rust compilation and reduces
out-of-memory issues during parallel builds.
Replace manual 'cargo build + cp' with 'cargo install --path'
Add verification step with 'openshell gateway info'
Keep correct 'mise run cluster:build:full' command
Vertex AI's :streamRawPredict endpoint expects the model in the URL
path, not in the request body. The router was incorrectly inserting
the model field, causing "Extra inputs are not permitted" errors.

Changes:
- Router now detects Vertex AI endpoints and removes model field
- Added bash 3 compatibility fix for cluster-deploy-fast.sh
- Added scripts/rebuild-cluster.sh for development workflow
- Updated documentation for Vertex AI setup and rebuild process

Fixes inference routing to Vertex AI via inference.local endpoint.

itdove and others added 7 commits April 6, 2026 23:22
Added examples/vertex-ai/ directory with:
- sandbox-policy.yaml: Network policy for Vertex AI endpoints
- README.md: Quick start guide with links to full documentation

Provides ready-to-use policy file for Vertex AI integration.
Podman does not support --push flag in build command like Docker buildx.
This commit fixes two issues:

1. docker-build-image.sh: Filter out --push flag and execute push as
   separate command after build completes

2. docker-publish-multiarch.sh: Use safe array expansion syntax to avoid
   unbound variable errors with set -u when EXTRA_TAGS is empty

Note: Multi-arch builds with Podman still require manual workflow due to
cross-compilation toolchain issues. Use /tmp/build-multiarch-local.sh
for local multi-arch builds with QEMU emulation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…h.sh

Add Podman-specific multi-architecture build logic to complement existing
Docker buildx support. Podman builds each platform sequentially using
manifest lists, while Docker buildx builds in parallel.

Changes:
- Detect Podman and use manifest-based approach for multi-arch builds
- Build each platform (arm64, amd64) separately with explicit TARGETARCH
- Create and push manifest list combining all architectures
- Preserve existing Docker buildx workflow unchanged
- Add informative logging about sequential vs parallel builds

Build times:
- Podman: Sequential builds (~30-40 min on Linux, ~45-60 min on macOS)
- Docker buildx: Parallel builds (~20-30 min)

This enables multi-arch image publishing on systems using Podman as the
container runtime, supporting both Apple Silicon and Intel architectures.
Fix CI formatting check failures:
- Split long .insert() calls across multiple lines
- Reformat MockDiscoveryContext initialization

No functional changes, formatting only.
Remove short-lived OAuth token generation and storage in gateway database.
Tokens are now generated on-demand inside sandboxes from uploaded ADC files.

Changes:
- Remove generate_oauth_token() function and gcp_auth dependency
- Remove VERTEX_OAUTH_TOKEN from direct credential injection
- Remove OAuth token insertion in discover_existing()
- Add unset IMAGE_TAG/TAG_LATEST in podman.env to prevent build conflicts
- Update Cargo.lock to remove gcp_auth dependency tree

Benefits:
- No stale token pollution in database
- Tokens generated fresh on-demand (auto-refresh via ADC)
- Simpler provider creation (synchronous, no async OAuth)
- Reduced dependency footprint (removes 32 packages)
- Better security (tokens not persisted in database)

Token lifecycle:
- Provider stores only ANTHROPIC_VERTEX_PROJECT_ID and region
- Sandboxes require --upload ~/.config/gcloud/ for token generation
- Claude CLI uses gcp_auth to generate/refresh tokens from ADC
- Tokens valid for 1 hour, automatically refreshed via refresh token
- Check for ADC in both GOOGLE_APPLICATION_CREDENTIALS and default location
- Add critical warning about --upload ~/.config/gcloud/ requirement
- Document security model for credential injection strategy
- Add comprehensive troubleshooting section with solutions for:
  - Authentication failures (missing ADC)
  - Project not found errors
  - Region not supported errors