Classic frontend build/master by MpcOS77 · Pull Request #1 · Bct-crypto/AutoGPT

MpcOS77 · 2026-03-13T00:03:56Z

Changes 🏗️

Checklist 📋

For code changes:

I have clearly listed my changes in the PR description
I have made a test plan
I have tested my changes according to the test plan:
- ...

Example test plan

Create from scratch and execute an agent with at least 3 blocks
Import an agent from file upload, and confirm it executes correctly
Upload agent to marketplace
Import an agent from marketplace and confirm it executes correctly
Edit an agent from monitor, and confirm it executes correctly

For configuration changes:

.env.example is updated or already compatible with my changes
docker-compose.yml is updated or already compatible with my changes
I have included a list of my configuration changes in the PR description (under Changes)

Examples of configuration changes

Changing ports
Adding new services that need to communicate with each other
Secrets or environment variable changes
New or infrastructure changes such as databases

…ensitive actions (Significant-Gravitas#11756) ## Summary This PR introduces two explicit safe mode toggles for controlling agent execution behavior, providing clearer and more granular control over when agents should pause for human review. ### Key Changes **New Safe Mode Settings:** - **`human_in_the_loop_safe_mode`** (bool, default `true`) - Controls whether human-in-the-loop (HITL) blocks pause for review - **`sensitive_action_safe_mode`** (bool, default `false`) - Controls whether sensitive action blocks pause for review **New Computed Properties on LibraryAgent:** - `has_human_in_the_loop` - Indicates if agent contains HITL blocks - `has_sensitive_action` - Indicates if agent contains sensitive action blocks **Block Changes:** - Renamed `requires_human_review` to `is_sensitive_action` on blocks for clarity - Blocks marked as `is_sensitive_action=True` pause only when `sensitive_action_safe_mode=True` - HITL blocks pause when `human_in_the_loop_safe_mode=True` **Frontend Changes:** - Two separate toggles in Agent Settings based on block types present - Toggle visibility based on `has_human_in_the_loop` and `has_sensitive_action` computed properties - Settings cog hidden if neither toggle applies - Proper state management for both toggles with defaults **AI-Generated Agent Behavior:** - AI-generated agents set `sensitive_action_safe_mode=True` by default - This ensures sensitive actions are reviewed for AI-generated content ## Changes **Backend:** - `backend/data/graph.py` - Updated `GraphSettings` with two boolean toggles (non-optional with defaults), added `has_sensitive_action` computed property - `backend/data/block.py` - Renamed `requires_human_review` to `is_sensitive_action`, updated review logic - `backend/data/execution.py` - Updated `ExecutionContext` with both safe mode fields - `backend/api/features/library/model.py` - Added `has_human_in_the_loop` and `has_sensitive_action` to `LibraryAgent` - `backend/api/features/library/db.py` - Updated to use `sensitive_action_safe_mode` parameter - `backend/executor/utils.py` - Simplified execution context creation **Frontend:** - `useAgentSafeMode.ts` - Rewritten to support two independent toggles - `AgentSettingsModal.tsx` - Shows two separate toggles - `SelectedSettingsView.tsx` - Shows two separate toggles - Regenerated API types with new schema ## Test Plan - [x] All backend tests pass (Python 3.11, 3.12, 3.13) - [x] All frontend tests pass - [x] Backend format and lint pass - [x] Frontend format and lint pass - [x] Pre-commit hooks pass --------- Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>

Add new LLM Picker for the new Builder. ### Changes 🏗️ - Enrich `LlmModelMeta` (in `llm.py`) with human readable model, creator and provider names and price tier (note: this is temporary measure and all LlmModelMeta will be removed completely once LLM Registry is ready) - Add provider icons - Add custom input field `LlmModelField` and its components&helpers ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] LLM model picker works correctly in the new Builder - [x] Legacy LLM model picker works in the old Builder

…cant-Gravitas#11812) ## Summary - Fixes AUTOGPT-SERVER-76H - Error parsing LibraryAgent from database due to null values in GraphSettings fields - When parsing LibraryAgent settings from the database, null values for `human_in_the_loop_safe_mode` and `sensitive_action_safe_mode` were causing Pydantic validation errors - Adds `BeforeValidator` annotations to coerce null values to their defaults (True and False respectively) ## Test plan - [x] Verified with unit tests that GraphSettings can now handle None/null values - [x] Backend tests pass - [x] Manually tested with all scenarios (None, empty dict, explicit values)

…ificant-Gravitas#11815) ## Changes 🏗️ On the **Old Builder**, when running an agent... ### Before <img width="800" height="614" alt="Screenshot 2026-01-21 at 21 27 05" src="https://github.com/user-attachments/assets/a3b2ec17-597f-44d2-9130-9e7931599c38" /> Credentials are there, but it is not recognising them, you need to click on them to be selected ### After <img width="1029" height="728" alt="Screenshot 2026-01-21 at 21 26 47" src="https://github.com/user-attachments/assets/c6e83846-6048-439e-919d-6807674f2d5a" /> It uses the new credentials UI and correctly auto-selects existing ones. ### Other Fixed a small timezone display glitch on the new library view. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Run agent in old builder - [x] Credentials are auto-selected and using the new collapsed system credentials UI

…res (Significant-Gravitas#11817) ## Summary Adds graceful error handling to AsyncRedisEventBus and RedisEventBus so that connection failures log exceptions with full traceback while remaining non-breaking. This allows DatabaseManager to operate without Redis connectivity. ## Problem DatabaseManager was failing with "Authentication required" when trying to publish notifications via AsyncRedisNotificationEventBus. The service has no Redis credentials configured, causing `increment_onboarding_runs` to fail. ## Root Cause When `increment_onboarding_runs` publishes a notification: 1. Calls `AsyncRedisNotificationEventBus().publish()` 2. Attempts to connect to Redis via `get_redis_async()` 3. Connection fails due to missing credentials 4. Exception propagates, failing the entire DB operation Previous fix (Significant-Gravitas#11775) made the cache module lazy, but didn't address the notification bus which also requires Redis. ## Solution Wrap Redis operations in try-except blocks: - `publish_event`: Logs exception with traceback, continues without publishing - `listen_events`: Logs exception with traceback, returns empty generator - `wait_for_event`: Returns None on connection failure Using `logger.exception()` instead of `logger.warning()` ensures full stack traces are captured for debugging while keeping operations non-breaking. This allows services to operate without Redis when only using event bus for non-critical notifications. ## Changes - Modified `backend/data/event_bus.py`: - Added graceful error handling to `RedisEventBus` and `AsyncRedisEventBus` - All Redis operations now catch exceptions and log with `logger.exception()` - Added `backend/data/event_bus_test.py`: - Tests verify graceful degradation when Redis is unavailable - Tests verify normal operation when Redis is available ## Test Plan - [x] New tests verify graceful degradation when Redis unavailable - [x] Existing notification tests still pass - [x] DatabaseManager can increment onboarding runs without Redis ## Related Issues Fixes https://significant-gravitas.sentry.io/issues/7205834440/ (AUTOGPT-SERVER-76D)

…ficant-Gravitas#11818) ## Summary - Remove explicit schema qualification (`{schema}.vector` and `OPERATOR({schema}.<=>)`) from pgvector queries in `embeddings.py` and `hybrid_search.py` - Use unqualified `::vector` type cast and `<=>` operator which work because pgvector is in the search_path on all environments ## Problem The previous approach tried to explicitly qualify the vector type with schema names, but this failed because: - **CI environment**: pgvector is in `public` schema → `platform.vector` doesn't exist - **Dev (Supabase)**: pgvector is in `platform` schema → `public.vector` doesn't exist ## Solution Use unqualified `::vector` and `<=>` operator. PostgreSQL resolves these via `search_path`, which includes the schema where pgvector is installed on all environments. Tested on both local and dev environments with a test script that verified: - ✅ Unqualified `::vector` type cast - ✅ Unqualified `<=>` operator in ORDER BY - ✅ Unqualified `<=>` in SELECT (similarity calculation) - ✅ Combined query patterns matching actual usage ## Test plan - [ ] CI tests pass - [ ] Marketplace approval works on dev after deployment Fixes: AUTOGPT-SERVER-763, AUTOGPT-SERVER-764, AUTOGPT-SERVER-76B

### Changes 🏗️ <img width="1920" height="998" alt="Screenshot 2026-01-19 at 22 14 51" src="https://github.com/user-attachments/assets/ecd1c241-6f77-4702-9774-5e58806b0b64" /> This PR lays the groundwork for the new UX of AutoGPT Copilot. - moves the Copilot to its own route `/copilot` - Makes the Copilot the homepage when enabled - Updates the labelling of the homepage icons - Makes the Library the homepage when Copilot is disabled - Improves Copilot's: - session handling - styles and UX - message parsing ### Other improvements - Improve the log out UX by adding a new `/logout` page and using a re-direct ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Run locally and test the above  --- > [!NOTE] > Launches the new Copilot experience and aligns API behavior with the UI. > > - **Routing/Home**: Add `/copilot` with `CopilotShell` (desktop sidebar + mobile drawer), make homepage route flag-driven; update login/signup/error redirects and root page to use `getHomepageRoute`. > - **Chat UX**: Replace legacy chat with `components/contextual/Chat/*` (new message list, bubbles, tool call/response formatting, stop button, initial-prompt handling, refined streaming/error handling); remove old platform chat components. > - **Sessions**: Add paginated session list (infinite load), auto-select/create logic, mobile/desktop navigation, and improved session fetching/claiming guards. > - **Auth/Logout**: New `/logout` flow with delayed redirect; gate various queries on auth state and logout-in-progress. > - **Backend**: `GET /api/chat/sessions/{id}` returns `null` instead of 404; service saves assistant message on `StreamFinish` to avoid loss and prevents duplicate saves; OpenAPI updated accordingly. > - **Misc**: Minor UI polish in library modals, loader styling, docs (CONTRIBUTING) additions, and small formatting fixes in block docs generator. > > Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 1b4776d. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).

…TL (Significant-Gravitas#11813) ### Changes 🏗️ - Added Vitest and React Testing Library for frontend unit testing - Configured MSW (Mock Service Worker) for API mocking in tests - Created test utilities and setup files for integration tests - Added comprehensive testing documentation in `AGENTS.md` - Updated Orval configuration to generate MSW mock handlers - Added mock server and browser implementations for development testing ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Run `pnpm test:unit` to verify tests pass - [x] Verify MSW mock handlers are generated correctly - [x] Check that test utilities work with sample component tests #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under **Changes**)

…nt-Gravitas#11820) ### Changes 🏗️ - Renamed the `test` job to `e2e_test` in the CI workflow for better clarity - Added a new `integration_test` job to the CI workflow that runs unit tests using `pnpm test:unit` - Created a basic integration test for the MainMarketplacePage component to verify CI functionality ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Verified the CI workflow runs both e2e and integration tests - [x] Confirmed the integration test for MainMarketplacePage passes #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes

…ignificant-Gravitas#11822) ### Changes 🏗️ This PR includes two database migration fixes: #### 1. Remove redundant Supabase extensions migration Removes the `20260112173500_add_supabase_extensions_to_platform_schema` migration which was attempting to manage Supabase-provided extensions and schemas. **What was removed:** - Migration that created extensions (pgcrypto, uuid-ossp, pg_stat_statements, pg_net, pgjwt, pg_graphql, pgsodium, supabase_vault) - Schema creation for these extensions **Why it was removed:** - These extensions and schemas are pre-installed and managed by Supabase automatically - The migration was redundant and could cause schema drift warnings - Attempting to manage Supabase-owned resources in our migrations is an anti-pattern #### 2. Fix pgvector extension schema handling Improves the `20260109181714_add_docs_embedding` migration to handle cases where pgvector exists in the wrong schema. **Problem:** - If pgvector was previously installed in `public` schema, `CREATE EXTENSION IF NOT EXISTS` would succeed but not actually install it in the `platform` schema - This causes `type "vector" does not exist` errors because the type isn't in the search_path **Solution:** - Detect if vector extension exists in a different schema than the current one - Drop it with CASCADE and reinstall in the correct schema (platform) - Use dynamic SQL with `EXECUTE format()` to explicitly specify the target schema - Split exception handling: catch errors during removal, but let installation fail naturally with clear PostgreSQL errors **Impact:** - No functional changes - Supabase continues to provide extensions as before - pgvector now correctly installs in the platform schema - Cleaner migration history - Prevents schema-related errors ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Verified migrations run successfully without the redundant file - [x] Confirmed Supabase extensions are still available - [x] Tested pgvector migration handles wrong-schema scenario - [x] No schema drift warnings #### For configuration changes: - [x] .env.default is updated or already compatible with my changes - [x] docker-compose.yml is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under **Changes**) - N/A - No configuration changes required

Significant-Gravitas#11825)  we met some reality when merging into the docs site but this fixes it ### Changes 🏗️ updates paths, adds some guides  update to match reality ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan:  - [x] deploy it and validate  --- > [!NOTE] > Aligns block integrations documentation with GitBook. > > - Changes generator default output to `docs/integrations/block-integrations` and writes overview `README.md` and `SUMMARY.md` at `docs/integrations/` > - Adds GitBook frontmatter and hint syntax to overview; prefixes block links with `block-integrations/` > - Introduces `generate_summary_md` to build GitBook navigation (including optional `guides/`) > - Preserves per-block manual sections and adds optional `extras` + file-level `additional_content` > - Updates sync checker to validate parent `README.md` and `SUMMARY.md` > - Rewrites `docs/integrations/README.md` with GitBook frontmatter and updated links; adds `docs/integrations/SUMMARY.md` > - Adds new guides: `guides/llm-providers.md`, `guides/voice-providers.md` > > Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit fdb7ff8. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).  --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: bobby.gaffin <bobby.gaffin@agpt.co>

… in E2B sandbox (Significant-Gravitas#11761) Introduces a new ClaudeCodeBlock that enables execution of coding tasks using Anthropic's Claude Code in an E2B sandbox. This block unlocks powerful agentic coding capabilities - Claude Code can autonomously create files, install packages, run commands, and build complete applications within a secure sandboxed environment. Changes 🏗️ - New file backend/blocks/claude_code.py: - ClaudeCodeBlock - Execute tasks using Claude Code in an E2B sandbox - Dual credential support: E2B API key (sandbox) + Anthropic API key (Claude Code) - Session continuation support via session_id, sandbox_id, and conversation_history - Automatic file extraction with path, relative_path, name, and content fields - Configurable timeout, setup commands, and working directory - dispose_sandbox option to keep sandbox alive for multi-turn conversations Checklist 📋 For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Create and execute ClaudeCodeBlock with a simple prompt ("Create a hello world HTML file") - [x] Verify files output includes correct path, relative_path, name, and content - [x] Test session continuation by passing session_id and sandbox_id back - [x] Build "Any API → Instant App" demo agent combining Firecrawl + ClaudeCodeBlock + GitHub blocks - [x] Verify generated files are pushed to GitHub with correct folder structure using relative_path Here are two example agents i made that can be used to test this agent, they require github, anthropic and e2b access via api keys that are set via the user/on the platform is testing on dev The first agent is my Any API → Instant App "Transform any API documentation into a fully functional web application. Just provide a docs URL and get a complete, ready-to-deploy app pushed to a new GitHub repository." [Any API → Instant App_v36.json](https://github.com/user-attachments/files/24600326/Any.API.Instant.App_v36.json) The second agent is my Idea to project "Simply enter your coding project's idea and this agent will make all of the base initial code needed for you to start working on that project and place it on github for you!" [Idea to project_v11.json](https://github.com/user-attachments/files/24600346/Idea.to.project_v11.json) If you have any questions or issues let me know. References https://e2b.dev/blog/python-guide-run-claude-code-in-an-e2b-sandbox https://github.com/e2b-dev/e2b-cookbook/tree/main/examples/anthropic-claude-code-in-sandbox-python https://code.claude.com/docs/en/cli-reference I tried to use E2b's "anthropic-claude-code" template but it kept complaining it was out of date, so I make it manually spin up a E2b instance and make it install the latest claude code and it uses that

…1827) ## Changes 🏗️ - Make the loading UX better when switching between chats or loading a new chat - Make session/chat management logic more manageable - Improving "Deep thinking" loading states - Fix bug that happened when returning to chat after navigating away ## Checklist 📋 ### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Run the app locally and test the above

…try spam (Significant-Gravitas#11832) ## Summary Refactors error handling in the embedding service to prevent Sentry alert spam. Previously, batch operations would log one error per failed file, causing hundreds of duplicate alerts. Now, exceptions bubble up from individual functions and are aggregated at the batch level, producing a single log entry showing all unique error types with counts. ## Changes ### Removed Error Swallowing - Removed try/except blocks from `generate_embedding()`, `store_content_embedding()`, `ensure_content_embedding()`, `get_content_embedding()`, and `ensure_embedding()` - These functions now raise exceptions instead of returning None/False on failure - Added docstring notes: "Raises exceptions on failure - caller should handle" ### Improved Batch Error Aggregation - Updated `backfill_all_content_types()` to aggregate unique errors - Collects all exceptions from batch results - Groups by error type and message, shows counts - Single log entry per content type instead of per-file ### Example Output Before: 50 separate error logs for same issue After: `BLOCK: 50/100 embeddings failed. Errors: PrismaError: type vector does not exist (50x)` ## Motivation This was triggered by the AUTOGPT-SERVER-7D2 Sentry issue where pgvector errors created hundreds of duplicate alerts. Even after the root cause was fixed (stale database connections), the error logging pattern would create spam for any future issues. ## Impact - ✅ Reduces Sentry noise - single alert per batch instead of per-file - ✅ Better diagnostics - shows all unique error types with counts - ✅ Cleaner code - removed ~24 lines of unnecessary error swallowing - ✅ Proper exception propagation follows Python best practices ## Testing - Existing tests should pass (error handling moved to batch level) - Error aggregation logic tested via asyncio.gather(return_exceptions=True) ## Related Issues - Fixes Sentry alert spam from AUTOGPT-SERVER-7D2

… popup, and race condition fixes (Significant-Gravitas#11810) ## Summary This PR implements comprehensive improvements to the human-in-the-loop (HITL) review system, including safety features, architectural changes, and bug fixes: ### Key Features - **SECRT-1798: One-time safety popup** - Shows informational popup before first run of AI-generated agents with sensitive actions/HITL blocks - **SECRT-1795: Auto-approval toggle UX** - Toggle in pending reviews panel to auto-approve future actions from the same node - **Node-specific auto-approval** - Changed from execution-specific to node-specific using special key pattern `auto_approve_{graph_exec_id}_{node_id}` - **Consolidated approval checking** - Merged `check_auto_approval` into `check_approval` using single OR query for better performance - **Race condition prevention** - Added execution status check before resuming to prevent duplicate execution when approving while graph is running - **Parallel auto-approval creation** - Uses `asyncio.gather` for better performance when creating multiple auto-approval records ## Changes ### Backend Architecture - **`human_review.py`**: - Added `check_approval()` function that checks both normal and auto-approval in single query - Added `create_auto_approval_record()` for node-specific auto-approval using special key pattern - Added `get_auto_approve_key()` helper to generate consistent auto-approval keys - **`review/routes.py`**: - Added execution status check before resuming to prevent race conditions - Refactored auto-approval record creation to use parallel execution with `asyncio.gather` - Removed obvious comments for cleaner code - **`review/model.py`**: Added `auto_approve_future_actions` field to `ReviewRequest` - **`blocks/helpers/review.py`**: Updated to use consolidated `check_approval` via database manager client - **`executor/database.py`**: Exposed `check_approval` through DatabaseManager RPC for block execution context - **`data/block.py`**: Fixed safe mode checks for sensitive action blocks ### Frontend - **New `AIAgentSafetyPopup`** component with localStorage-based one-time display - **`PendingReviewsList`**: - Replaced "Approve all future actions" button with toggle - Toggle resets data to original values and disables editing when enabled - Shows warning message explaining auto-approval behavior - **`RunAgentModal`**: Integrated safety popup before first run - **`usePendingReviews`**: Added polling for real-time badge updates - **`FloatingSafeModeToggle` & `SafeModeToggle`**: Simplified visibility logic - **`local-storage.ts`**: Added localStorage key for popup state tracking ### Bug Fixes - Fixed "Client is not connected to query engine" error by using database manager client pattern - Fixed race condition where approving reviews while graph is RUNNING could queue execution twice - Fixed migration to only drop FK constraint, not non-existent column - Fixed card data reset when auto-approve toggle changes ### Code Quality - Removed duplicate/obvious comments - Moved imports to top-level instead of local scope in tests - Used walrus operator for cleaner conditional assignments - Parallel execution for auto-approval record creation ## Test plan - [ ] Create an AI-generated agent with sensitive actions (e.g., email sending) - [ ] First run should show the safety popup before starting - [ ] Subsequent runs should not show the popup - [ ] Clear localStorage (`AI_AGENT_SAFETY_POPUP_SHOWN`) to verify popup shows again - [ ] Create an agent with human-in-the-loop blocks - [ ] Run it and verify the pending reviews panel appears - [ ] Enable the "Auto-approve all future actions" toggle - [ ] Verify editing is disabled and shows warning message - [ ] Click "Approve" and verify subsequent blocks from same node auto-approve - [ ] Verify auto-approval persists across multiple executions of same graph - [ ] Disable toggle and verify editing works normally - [ ] Verify "Reject" button still works regardless of toggle state - [ ] Test race condition: Approve reviews while graph is RUNNING (should skip resume) - [ ] Test race condition: Approve reviews while graph is REVIEW (should resume) - [ ] Verify pending reviews badge updates in real-time when new reviews are created

…ificant-Gravitas#11819) ## Summary - Add support for delegating agent generation to an external microservice when `AGENTGENERATOR_HOST` is configured - Falls back to built-in LLM-based implementation when not configured (default behavior) - Add comprehensive tests for the service client and core integration (34 tests) ## Changes - Add `agentgenerator_host`, `agentgenerator_port`, `agentgenerator_timeout` settings to `backend/util/settings.py` - Add `service.py` client for external Agent Generator API endpoints: - `/api/decompose-description` - Break down goals into steps - `/api/generate-agent` - Generate agent from instructions - `/api/update-agent` - Generate patches to update existing agents - `/api/blocks` - Get available blocks - `/health` - Health check - Update `core.py` to delegate to external service when configured - Export `is_external_service_configured` and `check_external_service_health` from the module ## Related PRs - Infrastructure repo: https://github.com/Significant-Gravitas/AutoGPT-cloud-infrastructure/pull/273 ## Test plan - [x] All 34 new tests pass (`poetry run pytest test/agent_generator/ -v`) - [ ] Deploy with `AGENTGENERATOR_HOST` configured and verify external service is used - [ ] Verify built-in implementation still works when `AGENTGENERATOR_HOST` is empty

…ystem (Significant-Gravitas#11828) Adds analytics tracking to the chat copilot system for better observability of user interactions and agent operations. ### Changes 🏗️ **PostHog Analytics Integration:** - Added `posthog` dependency (v7.6.0) to track chat events - Created new tracking module (`backend/api/features/chat/tracking.py`) with events: - `chat_message_sent` - When a user sends a message - `chat_tool_called` - When a tool is called (includes tool name) - `chat_agent_run_success` - When an agent runs successfully - `chat_agent_scheduled` - When an agent is scheduled - `chat_trigger_setup` - When a trigger is set up - Added PostHog configuration to settings: - `POSTHOG_API_KEY` - API key for PostHog - `POSTHOG_HOST` - PostHog host URL (defaults to `https://us.i.posthog.com`) **OpenRouter Tracing:** - Added `user` and `session_id` fields to chat completion API calls for OpenRouter tracing - Added `posthogDistinctId` and `posthogProperties` (with environment) to API calls **Files Changed:** - `backend/api/features/chat/tracking.py` - New PostHog tracking module - `backend/api/features/chat/service.py` - Added user message tracking and OpenRouter tracing - `backend/api/features/chat/tools/__init__.py` - Added tool call tracking - `backend/api/features/chat/tools/run_agent.py` - Added agent run/schedule tracking - `backend/util/settings.py` - Added PostHog configuration fields - `pyproject.toml` - Added posthog dependency ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Verified code passes linting and formatting - [x] Verified PostHog client initializes correctly when API key is provided - [x] Verified tracking is gracefully skipped when PostHog is not configured #### For configuration changes: - [ ] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under **Changes**) **New environment variables (optional):** - `POSTHOG_API_KEY` - PostHog project API key - `POSTHOG_HOST` - PostHog host URL (optional, defaults to US cloud)

…ed data display (Significant-Gravitas#11834) ### Changes 🏗️ - Refactored node execution results storage to maintain a history of executions instead of just the latest result - Added support for viewing accumulated output data across multiple executions - Implemented a cleaner UI for viewing historical execution results with proper grouping - Added functionality to clear execution results when starting a new run - Created helper functions to normalize and process execution data consistently - Updated the NodeDataViewer component to display both latest and historical execution data - Added ability to view input data alongside output data in the execution history ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Create and run a flow with multiple blocks that produce output - [x] Verify that execution results are properly accumulated and displayed - [x] Run the same flow multiple times and confirm historical data is preserved - [x] Test the "View more data" functionality to ensure it displays all execution history - [x] Verify that execution results are properly cleared when starting a new run

…nificant-Gravitas#11839) ## Summary This PR adds security checks to prevent execution of disabled blocks across all block execution endpoints. - Add `disabled` flag check to main web API endpoint (`/api/blocks/{block_id}/execute`) - Add `disabled` flag check to external API endpoint (`/api/blocks/{block_id}/execute`) - Add `disabled` flag check to chat tool block execution Previously, block execution endpoints only checked if a block existed but did not verify the `disabled` flag, allowing any authenticated user to execute disabled blocks. ## Test plan - [x] Verify disabled blocks return 403 Forbidden on main API endpoint - [x] Verify disabled blocks return 403 Forbidden on external API endpoint - [x] Verify disabled blocks return error response in chat tool execution - [x] Verify enabled blocks continue to execute normally

…ravitas#11843) ## Changes 🏗️ This prevents Posthog from being initialised locally, where we should not be collecting analytics during local development. ## Checklist 📋 ### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Run locally and test the above

…-Gravitas#11844) ### Background The chat service previously supported including page context (URL and content) in user messages. This functionality is being removed. ### Changes 🏗️ - Removed page context handling from `stream_chat_completion` in the chat service - User messages are now passed directly without URL/content context injection - Removed associated logging for page context ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Verify chat functionality works without page context - [x] Confirm no regressions in basic chat message handling

…tion (Significant-Gravitas#11833) ## Summary Add interactive UI to collect user answers when the agent-generator service returns clarifying questions during agent creation/editing. Previously, when the backend asked clarifying questions, the frontend would just display them as text with no way for users to answer. This caused the chat to keep retrying without the necessary context. ## Changes - **ChatMessageData type**: Add `clarification_needed` variant with questions field - **ClarificationQuestionsWidget**: New component with interactive form to collect answers - **parseToolResponse**: Detect and parse `clarification_needed` responses from backend - **ChatMessage**: Render the widget when clarification is needed ## How It Works 1. User requests to create/edit agent 2. Backend returns `ClarificationNeededResponse` with list of questions 3. Frontend shows interactive form with text inputs for each question 4. User fills in answers and clicks "Submit Answers" 5. Answers are sent back as context to the tool 6. Backend receives full context and continues ## UI Features - Shows all questions with examples (if provided) - Input validation (all questions must be answered to submit) - Visual feedback (checkmarks when answered) - Numbered questions for clarity - Submit button disabled until all answered - Follows same design pattern as `credentials_needed` flow ## Related - Backend support for clarification was added in Significant-Gravitas#11819 - Fixes the issue shown in the screenshot where users couldn't answer clarifying questions ## Test plan - [ ] Test creating agent that requires clarifying questions - [ ] Verify questions are displayed in interactive form - [ ] Verify all questions must be answered before submitting - [ ] Verify answers are sent back to backend as context - [ ] Verify agent creation continues with full context

…ant-Gravitas#11829) We are removing Langfuse tracing from the chat/copilot system in favor of using OpenRouter's broadcast feature, which keeps our codebase simpler. Langfuse prompt management is retained for fetching system prompts. ### Changes 🏗️ **Removed Langfuse tracing:** - Removed `@observe` decorators from all 11 chat tool files - Removed `langfuse.openai` wrapper (now using standard `openai` client) - Removed `start_as_current_observation` and `propagate_attributes` context managers from `service.py` - Removed `update_current_trace()`, `update_current_span()`, `span.update()` calls **Retained Langfuse prompt management:** - `langfuse.get_prompt()` for fetching system prompts - `_is_langfuse_configured()` check for prompt availability - Configuration for `langfuse_prompt_name` **Files modified:** - `backend/api/features/chat/service.py` - `backend/api/features/chat/tools/*.py` (11 tool files) ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Verified `poetry run format` passes - [x] Verified no `@observe` decorators remain in chat tools - [x] Verified Langfuse prompt fetching is still functional (code preserved)

…ons (Significant-Gravitas#11848) ## Changes 🏗️ Implements automatic context window management to prevent chat failures when conversations exceed token limits. ### Problem - **Issue**: [SECRT-1800] Long chat conversations stop working when context grows beyond model limits (~113k tokens observed) - **Root Cause**: Chat service sends ALL messages to LLM without token-aware compression, eventually exceeding Claude Opus 4.5's 200k context window ### Solution Implements a sliding window with summarization strategy: 1. Monitors token count before sending to LLM (triggers at 120k tokens) 2. Keeps last 15 messages completely intact (preserves recent conversation flow) 3. Summarizes older messages using gpt-4o-mini (fast & cheap) 4. Rebuilds context: `[system_prompt] + [summary] + [recent_15_messages]` 5. Full history preserved in database (only compresses when sending to LLM) ### Changes Made - **Added** `_summarize_messages()` helper function to create concise summaries using gpt-4o-mini - **Modified** `_stream_chat_chunks()` to implement token counting and conditional summarization - **Integrated** existing `estimate_token_count()` utility for accurate token measurement - **Added** graceful fallback - continues with original messages if summarization fails ## Motivation and Context 🎯 Without context management, users with long chat sessions (250+ messages) experience: - Complete chat failure when hitting 200k token limit - Lost conversation context - Poor user experience This fix enables: - ✅ Unlimited conversation length - ✅ Transparent operation (no UX changes) - ✅ Preserved conversation quality (recent messages intact) - ✅ Cost-efficient (~$0.0001 per summarization) ## Testing 🧪 ### Expected Behavior - Conversations < 120k tokens: No change (normal operation) - Conversations > 120k tokens: - Log message: `Context summarized: {tokens} tokens, kept last 15 messages + summary` - Chat continues working smoothly - Recent context remains intact ### How to Verify 1. Start a chat session in copilot 2. Send 250-600 messages (or 50+ with large code blocks) 3. Check logs for "Context summarized:" message 4. Verify chat continues working without errors 5. Verify conversation quality remains good ## Checklist ✅ - [x] My code follows the style guidelines of this project - [x] I have performed a self-review of my own code - [x] I have commented my code, particularly in hard-to-understand areas - [x] My changes generate no new warnings - [x] I have tested my changes and verified they work as expected

…ignificant-Gravitas#11855) ## Summary Long-running chat tools (like `create_agent` and `edit_agent`) were timing out because no SSE data was sent during tool execution. GCP load balancers and proxies have idle connection timeouts (~60 seconds), and when the external Agent Generator service takes longer than this, the connection would drop. This PR adds SSE heartbeat comments during tool execution to keep connections alive. ### Changes 🏗️ - **response_model.py**: Added `StreamHeartbeat` response type that emits SSE comments (`: heartbeat\n\n`) - **service.py**: Modified `_yield_tool_call()` to: - Run tool execution in a background asyncio task - Yield heartbeat events every 15 seconds while waiting - Handle task failures with explicit error responses (no silent failures) - Handle cancellation gracefully - **create_agent.py**: Improved error messages with more context and details - **edit_agent.py**: Improved error messages with more context and details ### How It Works ``` Tool Call → Background Task Started │ ├── Every 15 seconds: yield `: heartbeat\n\n` (SSE comment) │ └── Task Complete → yield tool result OR error response ``` SSE comments (`: heartbeat\n\n`) are: - Ignored by SSE clients (don't trigger events) - Keep TCP connections alive through proxies/load balancers - Don't affect the AI SDK data protocol ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] All chat service tests pass (17 tests) - [x] Verified heartbeats are sent during long tool execution - [x] Verified errors are properly reported to frontend

…ignificant-Gravitas#11845) ## Summary - Fixes race condition when multiple concurrent requests try to process the same reviews (e.g., double-click, multiple browser tabs) - Previously the second request would fail with "Reviews not found, access denied, or not in WAITING status" - Now handles this gracefully by treating already-processed reviews with the same decision as success ## Changes - Added `get_reviews_by_node_exec_ids()` function that fetches reviews regardless of status - Modified `process_all_reviews_for_execution()` to handle already-processed reviews - Updated route to use idempotent validation ## Test plan - [x] Linter passes (`poetry run ruff check`) - [x] Type checker passes (`poetry run pyright`) - [x] Formatter passes (`poetry run format`) - [ ] Manual testing: double-click approve button should not cause errors Fixes AUTOGPT-SERVER-7HE

…icant-Gravitas#11853) ## Changes 🏗️ - **Fix infinite loop in copilot page** - use Zustand selectors instead of full store object to get stable function references - **Centralize chat streaming logic** - move all streaming files from `providers/chat-stream/` to `components/contextual/Chat/` for better colocation and reusability - **Rename `copilot-store` → `copilot-page-store`**: Clarify scope - **Fix message duplication** - Only replay chunks from active streams (not completed ones) since backend already provides persisted messages in `initialMessages` - **Auto-focus chat input** - Focus textarea when streaming ends and input is re-enabled - **Graceful error display** - Render tool response errors in muted style (small text + warning icon) instead of raw "Error: ..." text ## Checklist 📋 ### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Navigate to copilot page - no infinite loop errors - [x] Start a new chat, send message, verify streaming works - [x] Navigate away and back to a completed session - no duplicate messages - [x] After stream completes, verify chat input receives focus - [x] Trigger a tool error - verify it displays with muted styling

…ignificant-Gravitas#11854) ## Summary Disabled blocks (e.g., webhook blocks without `platform_base_url` configured) were being indexed and returned in chat tool search results. This PR ensures they are properly filtered out. ### Changes 🏗️ - **find_block.py**: Skip disabled blocks when enriching search results - **content_handlers.py**: - Skip disabled blocks during embedding indexing - Update `get_stats()` to only count enabled blocks for accurate coverage metrics ### Why Blocks can be disabled for various reasons (missing OAuth config, no platform URL for webhooks, etc.). These blocks shouldn't appear in search results since users cannot use them. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Verified disabled blocks are filtered from search results - [x] Verified disabled blocks are not indexed - [x] Verified stats accurately reflect enabled block count

## Changes 🏗️ On the **Copilot** page: - prevent unnecessary sidebar repaints - show a disclaimer when switching chats on the sidebar to terminate a current stream - handle loading better - save streams better when disconnecting ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Run the app locally and test the above

…tto (Significant-Gravitas#12213) ## Summary Enables Otto (the AutoGPT copilot) to connect to any MCP (Model Context Protocol) server, discover its tools, and execute them — with the same credential login UI used in the graph builder. **Why a dedicated `run_mcp_tool` instead of reusing `run_block` + MCPToolBlock?** Two blockers make `run_block` unworkable for MCP: 1. **No discovery mode** — `MCPToolBlock` errors with "No tool selected" when `selected_tool` is empty; the agent can't learn what tools exist before picking one. 2. **Credential matching bug** — `find_matching_credential()` (the block execution path) does NOT check MCP server URLs; it would match any stored MCP OAuth credential regardless of server. The correct `_credential_is_for_mcp_server()` helper only applies in the graph path. ## Changes ### Backend - **New `run_mcp_tool` copilot tool** (`run_mcp_tool.py`) — two-stage flow: 1. `run_mcp_tool(server_url)` → discovers available tools via `MCPClient.list_tools()` 2. `run_mcp_tool(server_url, tool_name, tool_arguments)` → executes via `MCPClient.call_tool()` - Lazy auth: fast DB credential lookup first (`MCPToolBlock._auto_lookup_credential`); on HTTP 401/403 with no stored creds, returns `SetupRequirementsResponse` so the frontend renders the existing CredentialsGroupedView OAuth login card - **New response models** in `models.py`: `MCPToolsDiscoveredResponse`, `MCPToolOutputResponse`, `MCPToolInfo` - **Exclude MCPToolBlock** from `find_block` / `run_block` (`COPILOT_EXCLUDED_BLOCK_TYPES`) - **System prompt update** — MCP section with two-step flow, `input_schema` guidance, auth-wait instruction, and registry URL (`registry.modelcontextprotocol.io`) ### Frontend - **`RunMCPToolComponent`** — routes between credential prompt (reuses `SetupRequirementsCard` from RunBlock) and result card; discovery step shows only a minimal in-progress animation (agent-internal, not user-facing) - **`MCPToolOutputCard`** — renders tool result as formatted JSON or plain text - **`helpers.tsx`** — type guards (`isMCPToolOutput`, `isSetupRequirementsOutput`, `isErrorOutput`), output parsing, animation text - Registered `tool-run_mcp_tool` case in `ChatMessagesContainer` ## Test plan - [ ] Call `run_mcp_tool(server_url)` with a public MCP server → see discovery animation, agent gets tool list - [ ] Call `run_mcp_tool(server_url, tool_name, tool_arguments)` → see `MCPToolOutputCard` with result - [ ] Call with an auth-required server and no stored creds → `SetupRequirementsCard` renders with MCP OAuth button - [ ] After connecting credentials, retry → executes successfully - [ ] `find_block("MCP")` returns no results (MCPToolBlock excluded) - [ ] Backend unit tests: mock `MCPClient` for discovery + execution + auth error paths --------- Co-authored-by: Otto (AGPT) <otto@agpt.co>

@majdyz

…ificant-Gravitas#12250) Requested by @majdyz When CoPilot compacts (summarizes/truncates) conversation history to fit within context limits, the user now sees it rendered like a tool call — a spinner while compaction runs, then a completion notice. **Backend:** - Added `compaction_start_events()`, `compaction_end_events()`, `compaction_events()` in `response_model.py` using the existing tool-call SSE protocol (`tool-input-start` → `tool-input-available` → `tool-output-available`) - All three compaction paths (legacy `service.py`, SDK pre-query, SDK mid-stream) use the same pattern - Pre-query and SDK-internal compaction tracked independently so neither suppresses the other **Frontend:** - Added `compaction` tool category to `GenericTool` with `ArrowsClockwise` icon - Shows "Summarizing earlier messages…" with spinner while running - Shows "Earlier messages were summarized" when done - No expandable accordion — just the status line **Cleanup:** - Removed unused `system_notice_start/end_events`, `COMPACTION_STARTED_MSG` - Removed unused `system_notice_events`, `system_error_events`, `_system_text_events` Closes SECRT-2053 --------- Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>

@majdyz

Significant-Gravitas#12267) Requested by @majdyz When two concurrent requests write to the same workspace file path with `overwrite=True`, the retry after deleting the conflicting file could also hit a `UniqueViolationError`. This raw Prisma exception was bubbling up unhandled to Sentry as a high-priority alert (AUTOGPT-SERVER-7ZA). Now the retry path catches `UniqueViolationError` specifically and converts it to a `ValueError` with a clear message, matching the existing pattern for the non-overwrite path. **Change:** `autogpt_platform/backend/backend/util/workspace.py` — added a specific `UniqueViolationError` catch before the generic `Exception` catch in the retry block. **Risk:** Minimal — only affects the already-failing retry path. No behavior change for success paths. --------- Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>

…ignificant-Gravitas#12256) ## Summary - Add text-to-speech action button to CoPilot assistant messages using the browser Web Speech API - Add share action button that uses the Web Share API with clipboard fallback - Replace inline SVG copy icon with Phosphor CopyIcon for consistency ## Linked Issue SECRT-2052 ## Test plan - [ ] Verify copy button still works - [ ] Click speaker icon and verify TTS reads aloud - [ ] Click stop while playing and verify speech stops - [ ] Click share icon and verify native share or clipboard fallback Note: This PR should be merged after SECRT-2051 PR --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

…Significant-Gravitas#12277) ## Summary OpenRouter Broadcast silently drops traces for the Anthropic-native `/api/v1/messages` endpoint unless an `x-session-id` HTTP header is present. This was confirmed by systematic testing against our Langfuse integration: | Test | Endpoint | `x-session-id` header | Broadcast to Langfuse | |------|----------|-----------------------|----------------------| | 1 | `/chat/completions` | N/A (body fields work) | ✅ | | 2 | `/messages` (body fields only) | ❌ | ❌ | | 3 | `/messages` (header + body) | ✅ | ✅ | | 4 | `/messages` (`metadata.user_id` only) | ❌ | ❌ | | 5 | `/messages` (header only) | ✅ | ✅ | **Root cause:** OpenRouter only triggers broadcast for the `/messages` endpoint when the `x-session-id` HTTP header is present — body-level `session_id` and `metadata.user_id` are insufficient. ### Changes - **SDK path:** Inject `x-session-id` and `x-user-id` via `ANTHROPIC_CUSTOM_HEADERS` env var in `_build_sdk_env()`, which the Claude Agent SDK CLI reads and attaches to every outgoing API request - **Non-SDK path:** Add `trace` object (`trace_name` + `environment`) to `extra_body` for richer broadcast metadata in Langfuse This creates complementary traces alongside the existing OTEL integration: broadcast provides cost/usage data from OpenRouter while OTEL provides full tool-call observability with `userId`, `sessionId`, `environment`, and `tags`. ## Test plan - [x] Verified via test script: `/messages` with `x-session-id` header → trace appears in Langfuse with correct `sessionId` - [x] Verified `/chat/completions` with `trace` object → trace appears with custom `trace_name` - [x] Pre-commit hooks pass (ruff, black, isort, pyright) - [ ] Deploy to dev and verify broadcast traces appear for real copilot SDK sessions

…tool calling (Significant-Gravitas#12276) ## Summary - Remove ~1200 lines of broken/unmaintained non-SDK copilot streaming code (retry logic, parallel tool calls, context window management) - Add `stream_chat_completion_baseline()` as a clean fallback LLM path with full tool-calling support when `CHAT_USE_CLAUDE_AGENT_SDK=false` (e.g. when Anthropic is down) - Baseline reuses the same shared `TOOL_REGISTRY`, `get_available_tools()`, and `execute_tool()` as the SDK path - Move baseline code to dedicated `baseline/` folder (mirrors `sdk/` structure) - Clean up SDK service: remove unused params, fix model/env resolution, fix stream error persistence - Clean up config: remove `max_retries`, `thinking_enabled` fields (non-SDK only) ## Changes | File | Action | |------|--------| | `backend/copilot/baseline/__init__.py` | New — package export | | `backend/copilot/baseline/service.py` | New — baseline streaming with tool-call loop | | `backend/copilot/baseline/service_test.py` | New — multi-turn keyword recall test | | `backend/copilot/service.py` | Remove ~1200 lines of legacy code, keep shared helpers only | | `backend/copilot/executor/processor.py` | Simplify branching to SDK vs baseline | | `backend/copilot/sdk/service.py` | Remove unused params, fix model/env separation, fix stream error persistence | | `backend/copilot/config.py` | Remove `max_retries`, `thinking_enabled` | | `backend/copilot/service_test.py` | Keep SDK test only (baseline test moved) | | `backend/copilot/parallel_tool_calls_test.py` | Deleted (tested removed code) | ## Test plan - [x] `poetry run format` passes - [x] CI passes (all 3 Python versions, types, CodeQL) - [ ] SDK path works unchanged in production - [x] Baseline path (`CHAT_USE_CLAUDE_AGENT_SDK=false`) streams responses with tool calling - [x] Baseline emits correct Vercel AI SDK stream protocol events

@ntindle

…ant-Gravitas#12285) Requested by @ntindle After logging in with email/password, the page navigates but renders a blank/unauthenticated state (just logo + cookie banner). A manual page refresh fixes it. The `login` server action calls `signInWithPassword()` server-side but doesn't call `revalidatePath()`, so Next.js serves cached RSC payloads that don't reflect the new auth state. The OAuth callback route already does this correctly. **Fix:** Add `revalidatePath(next, "layout")` after successful login, matching the OAuth callback pattern. Closes SECRT-2059

## Summary - Skip CLI version check at worker init (saves ~300ms/request) - Pre-warm bundled CLI binary at startup to warm OS page caches (~500ms saved on first request per worker) - Parallelize E2B setup, system prompt fetch, and transcript download with `asyncio.gather()` (saves ~200-500ms) - Enable Langfuse prompt caching with configurable TTL (default 300s) ## Test plan - [ ] `poetry run pytest backend/copilot/sdk/service_test.py -s -vvv` - [ ] Manual: send copilot messages via SDK path, verify resume still works on multi-turn - [ ] Check executor logs for "CLI pre-warm done" messages

… instructions (Significant-Gravitas#12279) ## Summary - Large tool outputs (>80K chars) are now persisted to session workspace storage before truncation, preventing permanent data loss - Truncated output includes a head preview (50K chars) with clear retrieval instructions referencing `read_workspace_file` with offset/length - Added `offset` and `length` parameters to `ReadWorkspaceFileTool` for paginated reads of large files without re-triggering truncation ## Problem Tool outputs exceeding 100K chars were permanently lost — truncated by `StreamToolOutputAvailable.model_post_init` using middle-out truncation. The model had no way to retrieve the full output later, causing recursive read loops where the agent repeatedly tries to re-read truncated data. ## Solution 1. **`BaseTool.execute()`** — When output exceeds 80K chars, persist full output to workspace at `tool-outputs/{tool_call_id}.json`, then replace with a head preview wrapped in `<tool-output-truncated>` tags containing retrieval instructions 2. **`ReadWorkspaceFileTool`** — New `offset`/`length` parameters enable paginated reads so the agent can fetch slices without re-triggering truncation 3. **Graceful fallback** — If workspace write fails, returns raw output unchanged for existing truncation to handle ## Test plan - [x] `base_test.py`: 5 tests covering persist+preview, fallback on error, small output passthrough, large output persistence, anonymous user skip - [x] `workspace_files_test.py`: Ranged read test covering offset+length slice, offset-only, offset beyond file length - [ ] CI passes - [ ] Review comments addressed

…ficant-Gravitas#12274) Resolves: OPEN-3018 Google Drive picker fields on INPUT blocks were missing connection handles, making them non-chainable in the new builder. ### Changes 🏗️ - **Render `TitleFieldTemplate` with `InputNodeHandle`** — uses `getHandleId()` with `fieldPathId.$id` (which correctly resolves to e.g. `agpt_%_spreadsheet`), fixing the previous `_@_` handle error caused by using `idSchema.$id` (undefined for custom RJSF FieldProps) - **Override `showHandles: !!nodeId`** in uiOptions — the INPUT block's `generate-ui-schema.ts` sets `showHandles: false`, but Google Drive fields need handles to be chainable - **Hide picker content when handle is connected** — uses `useEdgeStore.isInputConnected()` to detect wired connections and conditionally hides the picker/placeholder UI ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Add a Google Drive file input block to a graph in the new builder - [x] Verify the connection handle appears on the input - [x] Connect another block's output to the Google Drive input handle - [x] Verify the picker UI hides when connected and reappears when disconnected - [x] Verify the Google Drive picker still works normally on non-INPUT block nodes 🤖 Generated with [Claude Code](https://claude.com/claude-code)  --- > [!NOTE] > **Medium Risk** > Changes input-handle ID generation and conditional rendering for Google Drive fields in the builder; regressions could break edge connections or hide the picker unexpectedly on some nodes. > > **Overview** > Google Drive picker fields now render a proper RJSF `TitleFieldTemplate` (and thus input handles) using a computed `handleId` derived from `fieldPathId.$id`, and force `showHandles` on when a `nodeId` is present. > > The picker/placeholder UI is now conditionally hidden when `useEdgeStore.isInputConnected()` reports the input handle is connected, preventing duplicate input UI when the value comes from an upstream node. > > Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 1f1df53. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).  --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: abhi1992002 <abhimanyu1992002@gmail.com> Co-authored-by: Abhimanyu Yadav <122007096+Abhi1992002@users.noreply.github.com>

@0ubbe

…gnup (Significant-Gravitas#12287) Requested by @0ubbe Password signup was missing the backend `createUser()` call that the OAuth callback flow already had. This caused `getOnboardingStatus()` to fail/hang for new users whose backend record didn't exist yet, resulting in an infinite spinner after account creation. ## Root Cause | Flow | createUser() | getOnboardingStatus() | Result | |------|-------------|----------------------|--------| | OAuth signup | ✅ Called | ✅ Works | Redirects correctly | | Password signup | ❌ Missing | ❌ Fails/hangs | Infinite spinner | ## Fix Adds `createUser()` call in `signup/actions.ts` after session is set, before onboarding status check — matching the OAuth callback pattern. Includes error handling with Sentry reporting. ## Testing - Create a new password account → should redirect without spinner - OAuth signup unaffected (no changes to that flow) Fixes OPEN-3023 --------- Co-authored-by: Lluis Agusti <hi@llu.lu>

…ploads (Significant-Gravitas#12226) ## Summary Builder node file inputs were stored as base64 data URIs directly in graph JSON, bloating saves and causing lag. This PR uploads files to the existing workspace system and stores lightweight `workspace://` references instead. ## What changed - **Upload**: When a user picks a file in a builder node input, it gets uploaded to workspace storage and the graph stores a small `workspace://file-id#mime/type` URI instead of a huge base64 string. - **Delete**: When a user clears a file input, the workspace file is soft-deleted from storage so it doesn't leave orphaned files behind. - **Execution**: Wired up `workspace_id` on `ExecutionContext` so blocks can resolve `workspace://` URIs during graph runs. `store_media_file()` already knew how to handle them. - **Output rendering**: Added a renderer that displays `workspace://` URIs as images, videos, audio players, or download cards in node output. - **Proxy fix**: Removed a `Content-Type: text/plain` override on multipart form responses that was breaking the generated hooks' response parsing. Existing graphs with base64 `data:` URIs continue to work — no migration needed. ## Test plan - [x] Upload file in builder → spinner shows, completes, file label appears - [x] Save/reload graph → `workspace://` URI persists, not base64 - [x] Clear file input → workspace file is deleted - [x] Run graph → blocks resolve `workspace://` files correctly - [x] Output renders images/video/audio from `workspace://` URIs - [x] Old graphs with base64 still work --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

…with Langfuse feedback (Significant-Gravitas#12260) ## Summary - Feedback is submitted to the backend Langfuse integration (`/api/chat/sessions/{id}/feedback`) for observability - Downvote opens a modal dialog for optional detailed feedback text (max 2000 chars) - Buttons are hidden during streaming and appear on hover; once feedback is selected they stay visible ## Changes - **`AssistantMessageActions.tsx`** (new): Renders copy (CopySimple), thumbs-up, and thumbs-down buttons using `MessageAction` from the design system. Visual states for selected feedback (green for upvote, red for downvote with filled icons). - **`FeedbackModal.tsx`** (new): Dialog with a textarea for optional downvote comment, using the design system `Dialog` component. - **`useMessageFeedback.ts`** (new): Hook managing per-message feedback state and backend submission via `POST /api/chat/sessions/{id}/feedback`. - **`ChatMessagesContainer.tsx`** (modified): Renders `AssistantMessageActions` after `MessageContent` for assistant messages when not streaming. - **`ChatContainer.tsx`** (modified): Passes `sessionID` prop through to `ChatMessagesContainer`. ## Test plan - [ ] Verify action buttons appear on hover over assistant messages - [ ] Verify buttons are hidden during active streaming - [ ] Click copy button → text copied to clipboard, success toast shown - [ ] Click upvote → green highlight, "Thank you" toast, button locked - [ ] Click downvote → red highlight, feedback modal opens - [ ] Submit feedback modal with/without comment → modal closes, feedback sent - [ ] Cancel feedback modal → modal closes, downvote stays locked - [ ] Verify feedback POST reaches `/api/chat/sessions/{id}/feedback` ### Linear issue Closes SECRT-2051 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

@majdyz

…vision blocks (Significant-Gravitas#12273) Requested by @majdyz When users upload images or PDFs to CoPilot, the AI couldn't see the content because the CLI's Zod validator rejects large base64 in MCP tool results and even small images were misidentified (the CLI silently drops or corrupts image content blocks in tool results). ## Approach Embed uploaded images directly as **vision content blocks** in the user message via `client._transport.write()`. The SDK's `client.query()` only accepts string content, so we bypass it for multimodal messages — writing a properly structured user message with `[...image_blocks, {"type": "text", "text": query}]` directly to the transport. This ensures the CLI binary receives images as native vision blocks, matching how the Anthropic API handles multimodal input. For binary files accessed via workspace tools at runtime, we save them to the SDK's ephemeral working directory (`sdk_cwd`) and return a file path for the CLI's built-in `Read` tool to handle natively. ## Changes ### Vision content blocks for attached files — `service.py` - `_prepare_file_attachments` downloads workspace files before the query, converts images to base64 vision blocks (`{"type": "image", "source": {"type": "base64", ...}}`) - When vision blocks are present, writes multimodal user message directly to `client._transport` instead of using `client.query()` - Non-image files (PDFs, text) are saved to `sdk_cwd` with a hint to use the Read tool ### File-path based access for workspace tools — `workspace_files.py` - `read_workspace_file` saves binary files to `sdk_cwd` instead of returning base64, returning a path for the Read tool ### SDK context for ephemeral directory — `tool_adapter.py` - Added `sdk_cwd` context variable so workspace tools can access the ephemeral directory - Removed inline base64 multimodal block machinery (`_extract_content_block`, `_strip_base64_from_text`, `_BLOCK_BUILDERS`, etc.) ### Frontend — rendering improvements - `MessageAttachments.tsx` — uses `OutputRenderers` system (`globalRegistry` + `OutputItem`) for image/video preview rendering instead of custom components - `GenericTool.tsx` — uses `OutputRenderers` system for inline image rendering of base64 content - `routes.py` — returns 409 for duplicate workspace filenames ### Tests - `tool_adapter_test.py` — removed multimodal extraction/stripping tests, added `get_sdk_cwd` tests - `service_test.py` — rewritten for `_prepare_file_attachments` with file-on-disk assertions Closes OPEN-3022 --------- Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>

…-Gravitas#12281) ## Summary Depends on Significant-Gravitas#12276 (baseline code). - Swap shared OpenAI client to `langfuse.openai.AsyncOpenAI` — auto-captures all LLM calls (token usage, latency, model, prompts) as Langfuse generations when configured - Add `propagate_attributes()` context in baseline streaming for `user_id`/`session_id` attribution, matching the SDK path's OTEL tracing - No-op when Langfuse is not configured — `langfuse.openai.AsyncOpenAI` falls back to standard `openai.AsyncOpenAI` behavior ## Observability parity | Aspect | SDK path | Baseline path (after this PR) | |--------|----------|-------------------------------| | LLM call tracing | OTEL via `configure_claude_agent_sdk()` | `langfuse.openai.AsyncOpenAI` auto-instrumentation | | User/session context | `propagate_attributes()` | `propagate_attributes()` | | Langfuse prompts | Shared `_build_system_prompt()` | Shared `_build_system_prompt()` | | Token/cost tracking | Via OTEL spans | Via Langfuse generation objects | ## Test plan - [x] `poetry run format` passes (pyright, ruff, black, isort) - [ ] Verify Langfuse traces appear for baseline path with `CHAT_USE_CLAUDE_AGENT_SDK=false` - [ ] Verify SDK path tracing is unaffected

…ificant-Gravitas#12289) ## Summary Handle empty/None `tool_call.arguments` in the baseline copilot path that cause OpenRouter 400 errors when converting to Anthropic format. ## Changes **`backend/copilot/baseline/service.py`**: - Default empty `tc["arguments"]` to `"{}"` to prevent OpenRouter from failing on empty tool arguments during format conversion. ## Test plan - [x] Existing baseline tests pass - [ ] Verify on staging: trigger a tool call in baseline mode and confirm normal flow works

…gnificant-Gravitas#12288) ## Summary - Adds `CHAT_USE_CLAUDE_CODE_SUBSCRIPTION` config flag to let the copilot SDK path use the Claude CLI's own subscription auth (from `claude login`) instead of API keys - When enabled, the SDK subprocess inherits CLI credentials — no `ANTHROPIC_BASE_URL`/`AUTH_TOKEN` override is injected - Forces SDK mode regardless of LaunchDarkly flag (baseline path uses `openai.AsyncOpenAI` which requires an API key) - Validates CLI installation on first use with clear error messages ## Setup ```bash npm install -g @anthropic-ai/claude-code claude login # then set in .env: CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true ``` ## Changes | File | Change | |------|--------| | `copilot/config.py` | New `use_claude_code_subscription` field + env var validator | | `copilot/sdk/service.py` | `_validate_claude_code_subscription()` + `_build_sdk_env()` early-return + fail-fast guard | | `copilot/executor/processor.py` | Force SDK mode via short-circuit `or` | ## Test plan - [ ] Set `CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true`, unset all API keys - [ ] Run `claude login` on the host - [ ] Start backend, send a copilot message — verify SDK subprocess uses CLI auth - [ ] Verify existing OpenRouter/API key flows still work (no regression)

…cant-Gravitas#12257) ## Summary - Adds per-turn work-done counters (e.g. "3 searches", "1 agent run") shown as plain text on the final assistant message of each user/assistant interaction pair - Counters aggregate tool calls by category (searches, agents run, blocks run, agents created/edited, agents scheduled) - Copy and TTS actions now appear only on the final assistant message per turn, with text aggregated from all assistant messages in that turn - Removes the global JobStatsBar above the chat input Resolves: SECRT-2026 ## Test plan - [ ] Work-done counters appear only on the last assistant message of each turn (not on intermediate assistant messages) - [ ] Counters increment correctly as tool call parts appear in messages - [ ] Internal operations (add_understanding, search_docs, get_doc_page, find_block) are NOT counted - [ ] Max 3 counter categories shown, sorted by volume - [ ] Copy/TTS actions appear only on the final assistant message per turn - [ ] Copy/TTS aggregate text from all assistant messages in the turn - [ ] No counters or actions shown while streaming is still in progress - [ ] No type errors, lint errors, or format issues introduced Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

@majdyz

…ntent block validation error (Significant-Gravitas#12294) Requested by @majdyz ## Problem CoPilot throws `400 Invalid Anthropic Messages API request` errors on first message, both locally and on Dev. ## Root Cause The CLI's built-in `ToolSearch` tool returns `tool_reference` content blocks (`{"type": "tool_reference", "tool_name": "mcp__copilot__find_block"}`). When the CLI constructs the next Anthropic API request, it passes these blocks as-is in the `tool_result.content` field. However, the Anthropic Messages API only accepts `text` and `image` content block types in tool results. This causes a Zod validation error: ``` messages[3].content[0].content: Invalid input: expected string, received array ``` The error only manifests when using **OpenRouter** (`ANTHROPIC_BASE_URL` set) because the Anthropic TypeScript SDK performs stricter client-side Zod validation in that code path vs the subscription auth path. PR Significant-Gravitas#12288 bumped `claude-agent-sdk` from `0.1.39` to `^0.1.46`, which upgraded the bundled Claude CLI from `v2.1.49` to `v2.1.69` where this issue was introduced. ## Fix Pin to `0.1.45` which has a CLI version that doesn't produce `tool_reference` content blocks in tool results. ## Testing - CoPilot first message should work without 400 errors via OpenRouter - SDK compat tests should still pass

…ficant-Gravitas#12301) The Copilot browser tool (`browser_navigate`, `browser_act`, `browser_screenshot`) has been broken on dev because `agent-browser` CLI + Chromium were never installed in the backend Docker image. ### Changes 🏗️ - Added `npx playwright install-deps chromium` to install Chromium runtime libraries (libnss3, libatk, etc.) - Added `npm install -g agent-browser` to install the CLI - Added `agent-browser install` to download the Chromium binary - Layer is placed after existing COPY-from-builder lines to preserve Docker cache ordering ### Root cause Every `browser_navigate` call fails with: ``` WARNING [browser_navigate] open failed for <url>: agent-browser is not installed (run: npm install -g agent-browser && agent-browser install). ``` The error originates from `FileNotFoundError` in `agent_browser.py:101` when the subprocess tries to execute the `agent-browser` binary which doesn't exist in the container. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Verified `agent-browser` binary is missing from current dev pod via `kubectl logs` - [x] Confirmed session `01eeac29-5a7` shows repeated failures for all URLs - [ ] After deploy: verify browser_navigate works in a Copilot session on dev #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under **Changes**)

…essages (Significant-Gravitas#12302) ### Changes 🏗️ Fixes a race condition in `update_session_title()` where the background title generation task could overwrite the Redis session cache with a stale snapshot, causing the copilot to "forget" its previous turns. **Root cause:** `update_session_title()` performs a read-modify-write on the Redis cache (read full session → set title → write back). Meanwhile, `upsert_chat_session()` writes a newer version with more messages during streaming. If the title task reads early (e.g., 34 messages) and writes late (after streaming persisted 101 messages), the stale 34-message version overwrites the 101-message version. When the next message lands on a different pod, it loads the stale session from Redis. **Fix:** Replace the read-modify-write with a simple cache invalidation (`invalidate_session_cache`). The title is already updated in the DB; the next access just reloads from DB with the correct title and messages. No locks, no deserialization of the full session blob, no risk of stale overwrites. **Evidence from prod logs (session `41a3814c`):** - Pod `tm2jb` persisted session with 101 messages - Pod `phflm` loaded session from Redis cache with only 35 messages (66 messages lost) - The title background task ran between these events, overwriting the cache ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] `poetry run pytest backend/copilot/model_test.py` — 15/15 pass - [x] All pre-commit hooks pass (ruff, black, isort, pyright) - [ ] After deploy: verify long sessions no longer lose context on multi-pod setups

…gnificant-Gravitas#12303) ## Summary Fixes copilot sessions "forgetting" previous turns due to stale transcript storage. **Root cause:** The transcript upload logic used byte size comparison (`existing >= new → skip`) to prevent overwriting newer transcripts with older ones. However, with `--resume` the CLI compacts old tool results, so newer transcripts can have **fewer bytes** despite containing **more conversation events**. This caused the stored transcript to freeze at whatever the largest historical upload was — every subsequent turn downloaded the same stale transcript and the agent lost context of recent turns. **Evidence from prod session `41a3814c`:** - Stored transcript: 764KB (frozen, never updated) - Turn 1 output: 379KB (75 lines) → upload skipped (764KB >= 379KB) - Turn 2 output: 422KB (71 lines) → upload skipped (764KB >= 422KB) - Turn 3 output: **empty** → upload skipped - Agent resumed from the same stale 764KB transcript every turn, losing context of the PR it created **Fix:** Remove the size comparison entirely. The executor holds a cluster lock per session, so concurrent uploads cannot race. Just always overwrite with the latest transcript. ## Test plan - [x] `poetry run pytest backend/copilot/sdk/transcript_test.py` — 25/25 pass - [x] All pre-commit hooks pass - [ ] After deploy: verify multi-turn sessions retain context across turns

… dev

@Pwuts

…gnificant-Gravitas#12312) When `add_graph_execution` is called from a context where the global Prisma client isn't connected (e.g. CoPilot tools, external API), the call to `get_or_create_workspace(user_id)` crashes with `ClientNotConnectedError` because it directly accesses `UserWorkspace.prisma()`. The fix adds `workspace_db` to the existing `if prisma.is_connected()` fallback pattern, consistent with how all other DB calls in the function already work. **Sentry:** AUTOGPT-SERVER-83T (and ~15 related issues going back to Jan 2026) --- Co-authored-by: Reinier van der Leer (@Pwuts) <pwuts@agpt.co> Co-authored-by: Reinier van der Leer (@Pwuts) <pwuts@agpt.co>

@ntindle

…-context model (Significant-Gravitas#12318) ## Summary Major refactor to eliminate CLI transcript race conditions and simplify the codebase by building transcripts directly from SDK messages instead of reading CLI files. ## Problem The previous approach had race conditions: - SDK reads CLI transcript file during stop hook - CLI may not have finished writing → incomplete transcript - Complex merge logic to detect and fix incomplete writes - ~200 lines of synthetic entry detection and merge code ## Solution **Atomic Full-Context Transcript Model:** - Build transcript from SDK messages during streaming (`TranscriptBuilder`) - Each upload REPLACES the previous transcript entirely (atomic) - No CLI file reading → no race conditions - Eliminates all merge complexity ## Key Changes ### Core Refactor - **NEW**: `transcript_builder.py` - Build JSONL from SDK messages during streaming - **SIMPLIFIED**: `transcript.py` - Removed merge logic, simplified upload/download - **SIMPLIFIED**: `service.py` - Use TranscriptBuilder, removed stop hook callback - **CLEANED**: `security_hooks.py` - Removed `on_stop` parameter ### Performance & Code Quality - **orjson migration**: Use `backend.util.json` (2-3x faster than stdlib) - Added `fallback` parameter to `json.loads()` for cleaner error handling - Moved SDK imports to top-level per code style guidelines ### Bug Fixes - Fixed garbage collection bug in background task handling - Fixed double upload bug in timeout handling - Downgraded PII-risk logging from WARNING to DEBUG - Added 30s timeout to prevent session lock hang ## Code Removed (~200 lines) - `merge_with_previous_transcript()` - No longer needed - `read_transcript_file()` - No longer needed - `CapturedTranscript` dataclass - No longer needed - `_on_stop()` callback - No longer needed - Synthetic entry detection logic - No longer needed - Manual append/merge logic in finally block - No longer needed ## Testing - ✅ All transcript tests passing (24/24) - ✅ Verified with real session logs showing proper transcript growth - ✅ Verified with Langfuse traces showing proper turn tracking (1-8) ## Transcript Growth Pattern From session logs: - **Turn 1**: 2 entries (initial) - **Turn 2**: 5 entries (+3), 2257B uploaded - **Turn N**: ~2N entries (linear growth) Each upload is the **complete atomic state** - always REPLACES, never incremental. ## Files Changed ``` backend/copilot/sdk/transcript_builder.py (NEW) | +140 lines backend/copilot/sdk/transcript.py | -198, +125 lines backend/copilot/sdk/service.py | -214, +160 lines backend/copilot/sdk/security_hooks.py | -33, +10 lines backend/copilot/sdk/transcript_test.py | -85, +36 lines backend/util/json.py | +45 lines ``` **Net result**: -200 lines, more reliable, faster JSON operations. ## Migration Notes This is a **breaking change** for any code that: - Directly calls `merge_with_previous_transcript()` or `read_transcript_file()` - Relies on incremental transcript uploads - Expects stop hook callbacks All internal usage has been updated. --- @ntindle - Tagging for autogpt-reviewer

…ant-Gravitas#12323) ## Summary - Fixes tool results not being captured in the CoPilot transcript during SDK-based streaming - Adds `transcript_builder.add_user_message()` call with `tool_result` content block when a `StreamToolOutputAvailable` event is received - Ensures transcript accurately reflects the full conversation including tool outputs, which is critical for Langfuse tracing and debugging ## Context After the transcript refactor in Significant-Gravitas#12318, tool call results from the SDK streaming loop were not being recorded in the transcript. This meant Langfuse traces were missing tool outputs, making it hard to debug agent behavior. ## Test plan - [ ] Verify CoPilot conversation with tool calls captures tool results in Langfuse traces - [ ] Verify transcript includes tool_result content blocks after tool execution

majdyz and others added 30 commits January 21, 2026 00:56

hotfix(frontend): copilot simplication...

277b053

majdyz and others added 30 commits March 4, 2026 05:30

Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into HEAD

f6f268a

Merge branch 'master' of github.com:Significant-Gravitas/AutoGPT into…

be18436

… dev

Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT

0b9e066

Update frontend build based on commit 7ead4c0

669939a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Classic frontend build/master#1

Classic frontend build/master#1
MpcOS77 wants to merge 1665 commits intoBct-crypto:masterfrom
MpcOS77:classic-frontend-build/master

MpcOS77 commented Mar 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

14 participants