An autonomous AI agent designed to play Pac-Man using real-time computer vision and hybrid intelligence. This project combines local high-speed processing for immediate reflexes with cloud-based AI (Google Gemini) for high-level strategic planning.
- Real-time Screen Capture: Uses `mss` for low-latency, pixel-perfect game state acquisition.
- Computer Vision Pipeline:
  - Custom `MapExtractor` to build a grid representation of the game level.
  - `ObjectDetectorCV`, which uses template matching to track Pac-Man, ghosts, and pellets.
- Hybrid AI Architecture:
  - Local Agent: Fast, deterministic policy for collision avoidance and pathfinding (30 FPS).
  - Cloud Strategist: Asynchronous integration with Google Gemini to analyze game state and provide strategic advice.
- Modular Design: Decoupled modules for Capture, Vision, Agent, and Control, allowing easy experimentation with different algorithms (e.g., RL vs. heuristic).
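To make "template matching" concrete, here is a minimal NumPy-only sketch of what it computes. It is not `ObjectDetectorCV`'s actual code, and `match_template_ssd` is a hypothetical name. It slides a small sprite over a grayscale frame and scores every position by the sum of squared differences (SSD), where a lower score means a closer match.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def match_template_ssd(frame: np.ndarray, template: np.ndarray) -> tuple[int, int, float]:
    """Return (row, col, score) of the best match of `template` in `frame`.

    Both inputs are 2-D grayscale arrays. The score is the sum of squared
    differences (SSD): lower is better, and 0.0 means an exact match.
    """
    fh, fw = frame.shape
    th, tw = template.shape
    if th > fh or tw > fw:
        raise ValueError("template must fit inside frame")
    # Every th x tw window of the frame: shape (fh-th+1, fw-tw+1, th, tw).
    windows = sliding_window_view(frame.astype(np.float64), (th, tw))
    diff = windows - template.astype(np.float64)  # broadcasts over all windows
    ssd = np.sum(diff * diff, axis=(2, 3))        # one score per top-left corner
    row, col = np.unravel_index(np.argmin(ssd), ssd.shape)
    return int(row), int(col), float(ssd[row, col])


if __name__ == "__main__":
    frame = np.zeros((20, 30), dtype=np.uint8)
    sprite = np.array([[0, 255, 0], [255, 255, 255], [0, 255, 0]], dtype=np.uint8)
    frame[5:8, 12:15] = sprite           # paste the sprite at row 5, col 12
    print(match_template_ssd(frame, sprite))  # (5, 12, 0.0)
```

In practice, OpenCV's `cv2.matchTemplate(frame, template, cv2.TM_SQDIFF)` produces the same SSD score map far faster, and `cv2.minMaxLoc` locates its minimum. Tracking several ghosts would mean thresholding the score map rather than taking a single minimum.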
- Language: Python 3.x
- Computer Vision: OpenCV (`cv2`), NumPy
- Input/Output: `mss` (screen capture), `pyautogui`/`keyboard` (control)
- AI Integration: Google Generative AI SDK (Gemini)
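A cloud call to Gemini takes far longer than one 30 FPS frame (about 33 ms), so the strategist has to run off the main loop. The sketch below shows one way to do that with a single background worker from `concurrent.futures`. The network call is deliberately stubbed out: `ask_strategist` and `AsyncAdvisor` are hypothetical names, not the contents of `ai_google/`, and no Google SDK functions are shown.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Optional


def ask_strategist(state_summary: str) -> str:
    """Hypothetical stand-in for a slow cloud request (e.g. to Gemini)."""
    time.sleep(0.2)  # simulate network latency
    return "hunt ghosts" if "power" in state_summary else "collect pellets"


class AsyncAdvisor:
    """Keeps at most one strategist request in flight and never blocks the caller."""

    def __init__(self) -> None:
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending: Optional[Future] = None
        self.latest_advice: Optional[str] = None

    def poll(self, state_summary: str) -> Optional[str]:
        # Harvest a finished request, if there is one.
        if self._pending is not None and self._pending.done():
            self.latest_advice = self._pending.result()
            self._pending = None
        # Start a new request only when none is running.
        if self._pending is None:
            self._pending = self._pool.submit(ask_strategist, state_summary)
        # May be None (no answer yet) or stale (answer to an older state).
        return self.latest_advice

    def close(self) -> None:
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    advisor = AsyncAdvisor()
    for frame_idx in range(30):  # roughly one second at 30 FPS
        state = "power pellet active" if frame_idx >= 15 else "normal"
        advice = advisor.poll(state)  # returns immediately every frame
        # ... the local policy would act here regardless of `advice` ...
        time.sleep(1 / 30)
    print("last advice:", advisor.latest_advice)
    advisor.close()
```

The design choice is that the local policy always acts on time, and the strategist's advice is treated as an optional, possibly stale hint.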
├── agent/ # Decision making logic (Pathfinding, Policies)
├── ai_google/ # Google Gemini integration for strategic advice
├── capture/ # Screen capture implementation
├── control/ # Keyboard input simulation
├── vision/ # Computer vision pipeline (Detection, Mapping)
├── docs/ # Documentation and Architecture details
├── tools/ # Calibration and utility scripts
└── main.py # Application entry point
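Because `capture/`, `vision/`, `agent/`, and `control/` are decoupled, each can plausibly be described by a small interface that `main.py` wires together. The following is a hedged sketch using `typing.Protocol`. Every class and method name here (`WorldState`, `grab`, `parse`, `decide`, `press`, `step`) is hypothetical and is not the repository's actual API.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class WorldState:
    pacman: tuple[int, int]                                  # (row, col) grid cell
    ghosts: list[tuple[int, int]] = field(default_factory=list)


class Capture(Protocol):
    def grab(self) -> object: ...                 # raw frame, e.g. a NumPy array


class Vision(Protocol):
    def parse(self, frame: object) -> WorldState: ...


class Agent(Protocol):
    def decide(self, state: WorldState) -> str: ...   # "up", "down", "left", "right"


class Control(Protocol):
    def press(self, key: str) -> None: ...


def step(capture: Capture, vision: Vision, agent: Agent, control: Control) -> str:
    """One tick through the four interfaces; returns the move that was sent."""
    state = vision.parse(capture.grab())
    move = agent.decide(state)
    control.press(move)
    return move


# Stand-in implementations; any object with matching methods satisfies the Protocols.
class FakeCapture:
    def grab(self) -> object:
        return "frame"


class FakeVision:
    def parse(self, frame: object) -> WorldState:
        return WorldState(pacman=(1, 1), ghosts=[(1, 3)])


class FleeAgent:
    def decide(self, state: WorldState) -> str:
        # Move away from the first ghost along the row.
        return "left" if state.ghosts[0][1] > state.pacman[1] else "right"


class RecordingControl:
    def __init__(self) -> None:
        self.keys: list[str] = []

    def press(self, key: str) -> None:
        self.keys.append(key)


if __name__ == "__main__":
    print(step(FakeCapture(), FakeVision(), FleeAgent(), RecordingControl()))  # left
```

Swapping `FleeAgent` for an RL policy, or `RecordingControl` for a `keyboard`-based sender, would not require changes to any other module.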
- Clone the repository:
  `git clone https://github.com/snowholt/Inteligent-PacMan.git` and then `cd Inteligent-PacMan`
- Install dependencies:
  `pip install -r requirements.txt`
  (Note: ensure that `opencv-python`, `numpy`, `mss`, `google-generativeai`, etc. are installed.)
- Configuration:
  - Update `config.py` with your screen region coordinates.
  - Set your Google API key in `.env`: `GOOGLE_API_KEY=your_api_key_here`
- Run the agent: open your Pac-Man game window and run `python main.py`.
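The two configuration steps could look roughly like this. The sketch assumes `config.py` stores the capture region in the `top`/`left`/`width`/`height` dict form that `mss`'s `grab()` accepts. The coordinate values are placeholders, and `load_env`/`get_api_key` are hypothetical helpers. The `python-dotenv` package's `load_dotenv()` is a common ready-made alternative to the hand-rolled reader.

```python
import os
from pathlib import Path

# Hypothetical config.py contents: the game area on screen, in pixels.
# Placeholder values; measure your own window (the calibration scripts in tools/ may help).
SCREEN_REGION = {"top": 100, "left": 200, "width": 448, "height": 496}


def load_env(path: str = ".env") -> None:
    """Minimal .env reader that copies KEY=VALUE lines into os.environ.

    Variables already set in the environment win. Blank lines, '#' comments,
    and lines without '=' are skipped; surrounding quotes are stripped.
    """
    env_file = Path(path)
    if not env_file.is_file():
        return
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def get_api_key() -> str:
    """Return GOOGLE_API_KEY from the environment (after reading .env)."""
    load_env()
    key = os.environ.get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set (add it to .env)")
    return key
```

Keeping the key in `.env` rather than in `config.py` keeps it out of version control, provided `.env` is listed in `.gitignore`.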
The system follows the Sense-Plan-Act loop familiar from robotics:
- Sense: Capture a screen frame -> Detect objects -> Update the world model.
- Plan: Calculate costs -> Consult the policy/Gemini -> Determine the next move.
- Act: Send the chosen keystroke to the OS.
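The Plan step ("calculate costs, determine next move") can be illustrated with a cost-based pathfinder. This is a hedged sketch, not the project's actual policy. It assumes the map is a list of strings (`'#'` wall, `'.'` pellet, `' '` empty), uses Dijkstra's algorithm via `heapq`, and charges an extra penalty for cells near a ghost. `cell_cost`, `next_move`, the danger radius, and the penalty value are all hypothetical.

```python
import heapq

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def cell_cost(cell, ghosts, danger_radius=2, penalty=10.0):
    """Cost of entering `cell`: 1, plus `penalty` if a ghost is within
    `danger_radius` (Manhattan distance). Ghost cells are impassable (inf)."""
    r, c = cell
    nearest = min((abs(r - gr) + abs(c - gc) for gr, gc in ghosts), default=None)
    if nearest is None:
        return 1.0
    if nearest == 0:
        return float("inf")
    return 1.0 + (penalty if nearest <= danger_radius else 0.0)


def next_move(grid, start, ghosts):
    """Dijkstra from `start` to the cheapest-to-reach pellet ('.').

    Returns the first move on that path, or None if no pellet is reachable.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0.0}
    first = {start: None}  # first move taken on the best path to each cell
    heap = [(0.0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if d > dist[cell]:
            continue  # stale heap entry
        if cell != start and grid[cell[0]][cell[1]] == ".":
            return first[cell]
        for name, (dr, dc) in MOVES.items():
            nr, nc = cell[0] + dr, cell[1] + dc
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == "#":
                continue
            nd = d + cell_cost((nr, nc), ghosts)
            if nd < dist.get((nr, nc), float("inf")):
                dist[(nr, nc)] = nd
                first[(nr, nc)] = name if cell == start else first[cell]
                heapq.heappush(heap, (nd, (nr, nc)))
    return None


if __name__ == "__main__":
    wall = "#########"
    grid = [wall, "#.     .#", wall]
    # Pellets sit 3 steps left and 3 steps right; a ghost guards the right side.
    print(next_move(grid, (1, 4), ghosts=[(1, 6)]))  # left
```

Dijkstra is used rather than plain BFS because the ghost penalty makes step costs unequal. Re-planning every frame keeps the local agent reactive as ghosts move.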
For more details, see ARCHITECTURE.md.