
Spectrewolf8/PPO_RL_AutoDRV_Compute_Backend


PPO RL AutoDRV - Compute Backend

Reinforcement Learning backend for autonomous driving using Proximal Policy Optimization (PPO). Trains an agent to navigate in a Unity 3D environment via ZeroMQ communication.

Unity Game World: PPO_AutoDRW_Unity3d_GameWorld

Installation

# Clone and setup
git clone <repo-url>
cd PPO_RL_AutoDRV_Compute_Backend
pip install -r requirements.txt

# Verify
python -c "import torch; print('CUDA:', torch.cuda.is_available())"

Quick Start

Training

  1. Edit app.py:

    MODE = "train"
    CONFIG_FILE = "config.json"
  2. Start backend:

    python app.py
  3. Launch the Unity client from the game world repo

Outputs:

  • Logs: logs/train_<timestamp>.log
  • Checkpoints: models/checkpoints/ppo_episode_<N>.pth (every 50 episodes)
  • Best model: ppo_best.pth
  • Final model: models/ppo_autodrive.pth

Resume Training

Edit config.json:

"training": {
  "resume_from_checkpoint": "models/checkpoints/ppo_episode_1000.pth"
}

Inference

  1. Edit app.py:

    MODE = "inference"
  2. Edit config.json:

    "inference": {
      "model_path": "ppo_best.pth"
    }
  3. Run python app.py, then launch the Unity client

Configuration

Main settings in config.json:

Server:

  • host: 127.0.0.1 (localhost)
  • port: 65432 (ZeroMQ connection)
  • tickrate: Updates per second

Environment:

  • max_ray_distances: Ray sensor max distances
  • max_speed: Vehicle max speed
  • reward_collected_value: Reward for collectibles
  • collision_penalty: Collision penalty
  • survival_reward: Per-step reward
  • straight_driving_reward: Bonus for straight driving

Training:

  • total_episodes: Training episode count
  • update_frequency: Policy update interval
  • save_frequency: Checkpoint save interval
  • resume_from_checkpoint: Path to resume from (or null)

PPO:

  • lr_actor, lr_critic: Learning rates
  • gamma: Discount factor (0.99)
  • epsilon: PPO clip parameter (0.2)
  • entropy_coef: Exploration bonus
  • batch_size: Training batch size
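The `epsilon` parameter controls PPO's clipped surrogate objective: the probability ratio between the new and old policy is clipped to `[1 - epsilon, 1 + epsilon]` so a single update cannot move the policy too far. A scalar sketch of that objective for one sample, written for clarity rather than taken from the project's code:

```python
def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """PPO-clip objective for one (state, action) sample.

    ratio = pi_new(a|s) / pi_old(a|s). Clipping the ratio to
    [1 - epsilon, 1 + epsilon] and taking the minimum removes the
    incentive to push the policy outside that trust region.
    """
    clipped = max(min(ratio, 1.0 + epsilon), 1.0 - epsilon)
    return min(ratio * advantage, clipped * advantage)
```

For positive advantages the objective stops growing once the ratio exceeds `1 + epsilon`; for negative advantages the unclipped term dominates, so bad actions are still pushed down.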

Environment Details

Observation Space (11D):

  • 5 ray distances (normalized)
  • 5 ray hit indicators (binary)
  • 1 speed value
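The 11-D observation above could be assembled as follows; this is an illustrative sketch, and the normalization constants (`max_ray_distance`, `max_speed`) stand in for the corresponding `config.json` values:

```python
def build_observation(ray_distances, ray_hits, speed,
                      max_ray_distance=50.0, max_speed=20.0):
    """Pack 5 ray distances + 5 hit flags + 1 speed into an 11-D vector.

    Distances and speed are clipped and scaled to [0, 1]; hit
    indicators are binary. Constants here are placeholder values.
    """
    assert len(ray_distances) == 5 and len(ray_hits) == 5
    obs = [min(d, max_ray_distance) / max_ray_distance for d in ray_distances]
    obs += [1.0 if h else 0.0 for h in ray_hits]
    obs.append(min(speed, max_speed) / max_speed)
    return obs  # length 11
```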

Action Space (3 discrete):

  • 0: Turn Left
  • 1: Straight
  • 2: Turn Right

Rewards:

  • Survival: +0.1/step
  • Straight driving: +0.05
  • Collection: +15.0
  • Collision: -10.0
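Combining these terms, the per-step reward might be computed like this; the real logic lives in src/environment.py and may differ in detail:

```python
def step_reward(collided, collected, driving_straight,
                survival=0.1, straight=0.05,
                collect=15.0, collision=-10.0):
    """Sum the reward components listed above for one environment step.

    Defaults mirror the README's values; flag names are assumptions.
    """
    r = survival                     # +0.1 just for surviving the step
    if driving_straight:
        r += straight                # +0.05 bonus for driving straight
    if collected:
        r += collect                 # +15.0 per collectible
    if collided:
        r += collision               # -10.0 on collision
    return r
```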

Model Architecture

Actor (Policy): Input(11) → FC(256) → ReLU → FC(256) → ReLU → FC(3) → Softmax

Critic (Value): Input(11) → FC(256) → ReLU → FC(256) → ReLU → FC(1)
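The two networks above, sketched in PyTorch with the layer sizes the README lists; this is an illustration, not the contents of src/ppo_model.py:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Policy network: 11-D observation -> 3 action probabilities."""
    def __init__(self, obs_dim=11, n_actions=3, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions), nn.Softmax(dim=-1),
        )

    def forward(self, obs):
        return self.net(obs)  # probabilities over {left, straight, right}

class Critic(nn.Module):
    """Value network: 11-D observation -> scalar state value."""
    def __init__(self, obs_dim=11, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs):
        return self.net(obs)
```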

Project Structure

PPO_RL_AutoDRV_Compute_Backend/
├── app.py                      # Main entry point
├── config.json                 # Configuration
├── requirements.txt            # Dependencies
├── src/
│   ├── server.py              # ZeroMQ server
│   ├── environment.py         # Gym environment
│   ├── ppo_model.py           # PPO algorithm
│   ├── ppo_controller.py      # Agent controller
│   ├── connection_manager.py  # Connection handling
│   └── helpers.py             # Utilities
├── models/
│   ├── ppo_autodrive.pth      # Final model
│   ├── ppo_best.pth           # Best model
│   └── checkpoints/           # Training checkpoints
└── logs/                       # Training logs

Troubleshooting

Connection Issues:

  • Verify Unity and Python use same host:port
  • Check firewall settings

Training Issues:

  • Lower learning rates if unstable
  • Adjust reward structure
  • Increase the collision_penalty magnitude if the agent drives too recklessly

GPU Not Working:

python -c "import torch; print(torch.cuda.is_available())"

Communication Protocol: See CommunicationDesign.md for ZeroMQ protocol details.
