Skip to content

Latest commit

 

History

History
749 lines (577 loc) · 23.3 KB

File metadata and controls

749 lines (577 loc) · 23.3 KB
layout default
title Browser Use Tutorial - Chapter 4: Form Automation
nav_order 4
has_children false
parent Browser Use Tutorial

Chapter 4: Form Automation - Intelligent Form Filling and Submission

Welcome to Chapter 4: Form Automation - Intelligent Form Filling and Submission. In this part of Browser Use Tutorial: AI-Powered Web Automation Agents, you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.

Automate form filling, validation, and submission with intelligent field detection and data handling.

Overview

Form automation is a key use case for web automation. Browser Use provides intelligent form detection, field mapping, and data entry capabilities. This chapter covers automated form filling, validation handling, and complex form workflows.

Basic Form Filling

Simple Form Automation

# basic_form_filling.py
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

async def basic_form_filling():
    """Demonstrate basic form filling operations"""

    agent = Agent(
        task="""
        Fill out the form at https://httpbin.org/forms/post:

        Personal Information:
        - Customer name: Sarah Johnson
        - Telephone: 555-0124
        - Email: sarah.johnson@example.com

        Order Details:
        - Pizza Size: Large
        - Toppings: Pepperoni, Mushrooms, Extra cheese
        - Delivery Instructions: Please knock loudly on door

        Payment Information:
        - Payment Method: Credit Card

        Submit the form and describe the response you receive.
        """,
        llm=ChatOpenAI(model="gpt-4o"),
        max_steps=20
    )

    result = await agent.run()
    print(f"Form Submission Result: {result}")

if __name__ == "__main__":
    asyncio.run(basic_form_filling())

Form Field Detection

# form_field_detection.py
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

async def form_field_detection():
    """Demonstrate intelligent form field detection"""

    agent = Agent(
        task="""
        Analyze the form structure at https://httpbin.org/forms/post:

        1. Identify all form fields and their types
        2. Determine which fields are required vs optional
        3. Understand the field labels and expected data formats
        4. Note any validation rules or constraints
        5. Describe the overall form layout and organization

        Don't submit the form - just analyze its structure.
        """,
        llm=ChatOpenAI(model="gpt-4o"),
        max_steps=15,
        use_vision=True  # Enable vision for form layout analysis
    )

    result = await agent.run()
    print(f"Form Analysis: {result}")

if __name__ == "__main__":
    asyncio.run(form_field_detection())

Advanced Form Handling

Complex Multi-Page Forms

# multi_page_forms.py
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

async def multi_page_form_handling():
    """Handle complex multi-page form workflows"""

    agent = Agent(
        task="""
        Handle a complex multi-step form process:

        1. Go to a site with multi-page forms (like registration wizards)
        2. Start filling out the first page of the form
        3. Navigate through all form steps/pages
        4. Handle any conditional fields that appear based on previous answers
        5. Complete the entire form submission process
        6. Verify successful submission

        If you can't find a perfect multi-page form, adapt this approach to any complex form you can find.
        """,
        llm=ChatOpenAI(model="gpt-4o"),
        max_steps=35,
        use_vision=True
    )

    result = await agent.run()
    print(f"Multi-Page Form Result: {result}")

if __name__ == "__main__":
    asyncio.run(multi_page_form_handling())

Dynamic Form Fields

# dynamic_forms.py
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

async def dynamic_form_handling():
    """Handle forms with dynamic fields and conditional logic"""

    agent = Agent(
        task="""
        Handle a form with dynamic/conditional fields:

        1. Find a form that shows/hides fields based on selections
        2. Start with basic field selections
        3. Observe how the form changes based on your inputs
        4. Fill in the dynamically revealed fields
        5. Complete the form submission

        Look for forms with features like:
        - Country selection that shows province/state fields
        - Checkbox options that reveal additional input fields
        - Radio buttons that change available options
        - Multi-step wizards with conditional steps
        """,
        llm=ChatOpenAI(model="gpt-4o"),
        max_steps=30,
        use_vision=True
    )

    result = await agent.run()
    print(f"Dynamic Form Result: {result}")

if __name__ == "__main__":
    asyncio.run(dynamic_form_handling())

Data Entry Strategies

Structured Data Input

# structured_data_input.py
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

async def structured_data_entry():
    """Handle structured data entry from various sources"""

    # Sample data to enter
    customer_data = {
        "name": "Michael Chen",
        "email": "michael.chen@company.com",
        "phone": "555-0199",
        "company": "Tech Solutions Inc",
        "address": {
            "street": "123 Business St",
            "city": "San Francisco",
            "state": "CA",
            "zip": "94105"
        }
    }

    agent = Agent(
        task=f"""
        Fill out a comprehensive form using this structured data:

        Customer Information:
        - Name: {customer_data['name']}
        - Email: {customer_data['email']}
        - Phone: {customer_data['phone']}
        - Company: {customer_data['company']}

        Address Details:
        - Street: {customer_data['address']['street']}
        - City: {customer_data['address']['city']}
        - State: {customer_data['address']['state']}
        - ZIP: {customer_data['address']['zip']}

        Find an appropriate form to fill out with this data.
        If you can't find a real form, use https://httpbin.org/forms/post
        and adapt the data to fit the available fields.

        Verify all data is entered correctly before submitting.
        """,
        llm=ChatOpenAI(model="gpt-4o"),
        max_steps=25
    )

    result = await agent.run()
    print(f"Structured Data Entry Result: {result}")

if __name__ == "__main__":
    asyncio.run(structured_data_entry())

Bulk Data Entry

# bulk_data_entry.py
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

async def bulk_data_entry():
    """Handle bulk data entry operations"""

    # Sample bulk data
    products = [
        {"name": "Laptop", "price": 1299.99, "category": "Electronics"},
        {"name": "Mouse", "price": 29.99, "category": "Accessories"},
        {"name": "Keyboard", "price": 79.99, "category": "Accessories"},
        {"name": "Monitor", "price": 399.99, "category": "Electronics"},
    ]

    agent = Agent(
        task=f"""
        Perform bulk data entry for these products:

        {chr(10).join(f"- {p['name']}: ${p['price']} ({p['category']})" for p in products)}

        1. Find a form or system that accepts product data entry
        2. Enter each product one by one
        3. Verify each entry is saved correctly
        4. Keep track of successful vs failed entries

        If you can't find a real product entry system, simulate the process
        by documenting what you would do for each product.

        Report the results of the bulk entry operation.
        """,
        llm=ChatOpenAI(model="gpt-4o"),
        max_steps=40
    )

    result = await agent.run()
    print(f"Bulk Data Entry Result: {result}")

if __name__ == "__main__":
    asyncio.run(bulk_data_entry())

Form Validation and Error Handling

Validation Error Handling

# validation_handling.py
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

async def form_validation_handling():
    """Handle form validation errors and corrections"""

    agent = Agent(
        task="""
        Test form validation and error handling:

        1. Find a form with validation requirements
        2. Try to submit the form with invalid data first:
           - Missing required fields
           - Invalid email format
           - Incorrect data types
           - Values outside acceptable ranges

        3. Observe the validation errors
        4. Correct the invalid data
        5. Resubmit the form successfully

        Document the validation rules you discover and how you handle errors.
        Use https://httpbin.org/forms/post if you need a simple test form.
        """,
        llm=ChatOpenAI(model="gpt-4o"),
        max_steps=30,
        use_vision=True  # Help detect validation messages
    )

    result = await agent.run()
    print(f"Validation Handling Result: {result}")

if __name__ == "__main__":
    asyncio.run(form_validation_handling())

CAPTCHA and Bot Protection

# captcha_handling.py
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

async def captcha_handling():
    """Handle CAPTCHA and bot protection systems"""

    agent = Agent(
        task="""
        Test interaction with CAPTCHA and bot protection:

        1. Find a form that has CAPTCHA protection
        2. Attempt to fill out the form
        3. Encounter the CAPTCHA challenge
        4. Describe the type of CAPTCHA (image, reCAPTCHA, etc.)
        5. Explain the limitations of automated solving

        Note: Most CAPTCHAs cannot be solved automatically by browsers.
        Focus on understanding the limitations and alternative approaches.

        Use a site like https://www.google.com/recaptcha/admin if you can find a test form.
        """,
        llm=ChatOpenAI(model="gpt-4o"),
        max_steps=20,
        use_vision=True
    )

    result = await agent.run()
    print(f"CAPTCHA Handling Analysis: {result}")

if __name__ == "__main__":
    asyncio.run(captcha_handling())

Advanced Form Techniques

File Upload Forms

# file_upload_forms.py
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

async def file_upload_handling():
    """Handle file upload forms and operations"""

    agent = Agent(
        task="""
        Handle file upload functionality:

        1. Find a form that accepts file uploads
        2. Prepare or create a sample file to upload
        3. Fill out any accompanying form fields
        4. Upload the file successfully
        5. Verify the upload was successful

        Look for forms that accept:
        - Image uploads
        - Document uploads
        - CSV/data file uploads
        - Multiple file uploads

        If you can't find a real upload form, explain the process you would follow.
        """,
        llm=ChatOpenAI(model="gpt-4o"),
        max_steps=25
    )

    result = await agent.run()
    print(f"File Upload Result: {result}")

if __name__ == "__main__":
    asyncio.run(file_upload_handling())

Form Data Extraction

# form_data_extraction.py
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

async def form_data_extraction():
    """Extract data from existing forms"""

    agent = Agent(
        task="""
        Extract form data and structure:

        1. Find a comprehensive form (registration, application, etc.)
        2. Analyze all form fields and their properties:
           - Field types (text, select, radio, checkbox, etc.)
           - Field labels and help text
           - Validation rules and requirements
           - Default values and placeholders

        3. Document the complete form structure
        4. Extract any example data or instructions provided

        Focus on understanding form design patterns and user experience considerations.
        """,
        llm=ChatOpenAI(model="gpt-4o"),
        max_steps=20,
        use_vision=True
    )

    result = await agent.run()
    print(f"Form Data Extraction: {result}")

if __name__ == "__main__":
    asyncio.run(form_data_extraction())

Workflow Automation

Multi-Form Workflows

# multi_form_workflows.py
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

async def multi_form_workflow():
    """Handle complex workflows involving multiple forms"""

    agent = Agent(
        task="""
        Execute a complex multi-form workflow:

        1. Start with a registration or signup form
        2. Complete the initial registration
        3. Handle email verification if required
        4. Fill out a profile completion form
        5. Set up additional account preferences
        6. Complete any onboarding wizards

        This simulates a complete user onboarding process.
        Document each step and any challenges encountered.

        If you can't find a real multi-step process, break down a complex
        single form into logical steps and explain your approach.
        """,
        llm=ChatOpenAI(model="gpt-4o"),
        max_steps=50,
        use_vision=True
    )

    result = await agent.run()
    print(f"Multi-Form Workflow Result: {result}")

if __name__ == "__main__":
    asyncio.run(multi_form_workflow())

Form Testing Automation

# form_testing.py
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

async def automated_form_testing():
    """Automate form testing scenarios"""

    test_scenarios = [
        "Test form with valid data - should submit successfully",
        "Test form with missing required fields - should show validation errors",
        "Test form with invalid email format - should show email validation error",
        "Test form with data exceeding length limits - should show length validation",
        "Test form with special characters in text fields - should handle encoding properly"
    ]

    for scenario in test_scenarios:
        print(f"\n--- Testing: {scenario} ---")

        agent = Agent(
            task=f"""
            {scenario}

            1. Find an appropriate test form
            2. Execute the test scenario
            3. Document the results and any error messages
            4. Verify expected behavior occurred

            Use https://httpbin.org/forms/post as your primary test form.
            """,
            llm=ChatOpenAI(model="gpt-4o"),
            max_steps=20
        )

        result = await agent.run()
        print(f"Test Result: {result}")

if __name__ == "__main__":
    asyncio.run(automated_form_testing())

Enterprise Form Automation

Integration with Business Systems

# enterprise_integration.py
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI
import json

async def enterprise_form_integration():
    """Integrate form automation with enterprise systems"""

    # Sample enterprise data
    employee_data = {
        "employee_id": "EMP001",
        "name": "John Smith",
        "department": "Engineering",
        "start_date": "2024-01-15",
        "manager": "Jane Doe",
        "access_levels": ["email", "vpn", "development_tools"]
    }

    agent = Agent(
        task=f"""
        Automate enterprise employee onboarding:

        Employee Details:
        - ID: {employee_data['employee_id']}
        - Name: {employee_data['name']}
        - Department: {employee_data['department']}
        - Start Date: {employee_data['start_date']}
        - Manager: {employee_data['manager']}
        - Access Levels: {', '.join(employee_data['access_levels'])}

        1. Access HR system or employee management portal
        2. Create new employee record
        3. Fill out all required information
        4. Set up department and manager assignments
        5. Configure access permissions
        6. Submit and verify successful creation

        This demonstrates enterprise form automation for HR processes.
        If you can't access real systems, simulate the process with available forms.
        """,
        llm=ChatOpenAI(model="gpt-4o"),
        max_steps=35
    )

    result = await agent.run()
    print(f"Enterprise Integration Result: {result}")

if __name__ == "__main__":
    asyncio.run(enterprise_form_integration())

Compliance and Audit Trails

# compliance_automation.py
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

async def compliance_form_automation():
    """Handle compliance-related form automation"""

    compliance_data = {
        "report_type": "GDPR Data Processing",
        "data_controller": "Example Corp",
        "processing_purpose": "Customer relationship management",
        "data_categories": ["Personal identification", "Contact information", "Purchase history"],
        "legal_basis": "Contract performance",
        "retention_period": "7 years",
        "security_measures": ["Encryption", "Access controls", "Audit logging"]
    }

    agent = Agent(
        task=f"""
        Complete a compliance reporting form:

        Report Details:
        - Type: {compliance_data['report_type']}
        - Data Controller: {compliance_data['data_controller']}
        - Processing Purpose: {compliance_data['processing_purpose']}
        - Data Categories: {', '.join(compliance_data['data_categories'])}
        - Legal Basis: {compliance_data['legal_basis']}
        - Retention Period: {compliance_data['retention_period']}
        - Security Measures: {', '.join(compliance_data['security_measures'])}

        1. Access compliance reporting system
        2. Select appropriate report type
        3. Fill out all compliance fields accurately
        4. Attach any required documentation
        5. Submit report and obtain confirmation

        This demonstrates regulatory compliance automation.
        Maintain audit trail and accuracy throughout the process.
        """,
        llm=ChatOpenAI(model="gpt-4o"),
        max_steps=30,
        save_conversation_path="./compliance_audit.json"  # Audit trail
    )

    result = await agent.run()
    print(f"Compliance Automation Result: {result}")

if __name__ == "__main__":
    asyncio.run(compliance_form_automation())

Form Automation Flow

flowchart TD
    A[Agent navigates to form page]
    B[DOM analysis identifies form fields]
    C[Field types detected: text select checkbox radio]
    D[Agent maps input data to fields]
    E[input_text actions fill text fields]
    F[select_option actions choose dropdowns]
    G[click actions select checkboxes]
    H[Submit button clicked]
    I[Success or error response validated]
    A --> B
    B --> C
    C --> D
    D --> E
    D --> F
    D --> G
    E --> H
    F --> H
    G --> H
    H --> I
Loading

Summary

In this chapter, we've covered:

  • Basic Form Filling: Simple form automation with structured data
  • Form Field Detection: Intelligent analysis of form structure and requirements
  • Complex Form Handling: Multi-page and dynamic form workflows
  • Data Entry Strategies: Structured and bulk data input operations
  • Validation Handling: Error detection and correction in form submissions
  • Advanced Techniques: File uploads, data extraction, and CAPTCHA handling
  • Workflow Automation: Multi-form processes and testing automation
  • Enterprise Integration: Business system integration and compliance automation

Key Takeaways

  1. Intelligent Field Detection: AI understands form structure and field purposes
  2. Dynamic Form Handling: Adapts to conditional fields and multi-page forms
  3. Validation Management: Handles errors gracefully and corrects invalid data
  4. Structured Data Input: Efficiently processes bulk and complex data entry
  5. Workflow Orchestration: Manages complex multi-step form processes
  6. Enterprise Integration: Connects with business systems and compliance requirements
  7. Error Recovery: Robust handling of form submission failures and retries

Next Steps

Now that you can automate form filling and submission, let's explore data extraction techniques for gathering information from websites.


Ready for Chapter 5? Data Extraction

Generated for Awesome Code Docs

What Problem Does This Solve?

Most teams struggle here because the hard part is not writing more code, but deciding clear boundaries for form, asyncio, Agent so behavior stays predictable as complexity grows.

In practical terms, this chapter helps you avoid three common failures:

  • coupling core logic too tightly to one implementation path
  • missing the handoff boundaries between setup, execution, and validation
  • shipping changes without clear rollback or observability strategy

After working through this chapter, you should be able to reason about Chapter 4: Form Automation - Intelligent Form Filling and Submission as an operating subsystem inside Browser Use Tutorial: AI-Powered Web Automation Agents, with explicit contracts for inputs, state transitions, and outputs.

Use the implementation notes around ChatOpenAI, agent, result as your checklist when adapting these patterns to your own repository.

How it Works Under the Hood

Under the hood, Chapter 4: Form Automation - Intelligent Form Filling and Submission usually follows a repeatable control path:

  1. Context bootstrap: initialize runtime config and prerequisites for form.
  2. Input normalization: shape incoming data so asyncio receives stable contracts.
  3. Core execution: run the main logic branch and propagate intermediate state through Agent.
  4. Policy and safety checks: enforce limits, auth scopes, and failure boundaries.
  5. Output composition: return canonical result payloads for downstream consumers.
  6. Operational telemetry: emit logs/metrics needed for debugging and performance tuning.

When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions.

Source Walkthrough

Use the following upstream sources to verify implementation details while reading this chapter:

  • Browser Use Repository Why it matters: authoritative reference on Browser Use Repository (github.com).
  • Browser Use Releases Why it matters: authoritative reference on Browser Use Releases (github.com).
  • Browser Use Docs Why it matters: authoritative reference on Browser Use Docs (docs.browser-use.com).
  • Browser Use Cloud Why it matters: authoritative reference on Browser Use Cloud (cloud.browser-use.com).

Suggested trace strategy:

  • search upstream code for form and asyncio to map concrete implementation paths
  • compare docs claims against actual runtime/config code before reusing patterns in production

Chapter Connections