- Server Address: `127.0.0.1` (localhost)
- Port: `65432`
- Protocol: ZeroMQ (REQ/REP pattern) with JSON messaging
- Connection Type: Client-Server (Unity = REQ Client, Python = REP Server)
- Receive Timeout: Auto-configured from the tickrate (3× the tick interval, with a minimum of 2 seconds)
- Disconnect Detection: The server detects client disconnection via timeout and socket status checks

The server uses ZeroMQ's Request-Reply (REQ/REP) pattern:
- Unity Client (REQ): Sends requests (game state) and waits for replies
- Python Server (REP): Receives requests, processes them, and sends replies (steering/config)
- Strict Message Alternation: Unity MUST send a request before receiving a reply
- Synchronous Communication: Unity blocks until it receives a reply from Python
- Connection Format: `tcp://127.0.0.1:65432`

1. Unity: Connect to server
2. Unity: Send first game_state message (handshake)
3. Server: Receive handshake, send configuration reply
4. Unity: Receive config, synchronize tickrate
5. Unity: Send actual game_state messages
6. Server: Process and send steering commands
7. Episodes continue on the same connection until:
   - Unity disconnects
   - Server timeout (no message received within receive_timeout)
   - Server shutdown

Important: Configuration is sent ONCE per connection, not per episode. Multiple episodes can run on the same connection without re-handshaking.
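
The once-per-connection rule can be sketched as a small piece of session state. This is a minimal illustration, not the server's actual code: the `Session` class, its `reply_to` method, and the placeholder steering reply are hypothetical names introduced here, and the ZeroMQ socket handling is left out so the logic stays visible.

```python
import json


class Session:
    """Tracks one client connection (hypothetical helper for illustration)."""

    def __init__(self, config: dict):
        self.config = config
        self.config_sent = False  # reset only when a new connection starts

    def reply_to(self, raw_request: str) -> dict:
        """Return the reply for one request, keeping strict REQ/REP alternation."""
        request = json.loads(raw_request)
        if request.get("message") != "game_state":
            raise ValueError("expected a game_state message")
        if not self.config_sent:
            # The first game_state message is the handshake: answer with config.
            self.config_sent = True
            return self.config
        # Every later message gets a steering reply (placeholder values here).
        return {"steering": 0, "terminated": False, "truncated": False}


if __name__ == "__main__":
    session = Session({"type": "config", "tickrate": 30})
    handshake = json.dumps({"message": "game_state", "id": 1, "gameState": {}})
    print(session.reply_to(handshake))  # config reply
    print(session.reply_to(handshake))  # steering reply
```

Because `config_sent` lives on the session rather than on the episode, starting a new episode does not trigger another configuration reply.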
First reply from server contains configuration:
```json
{
  "type": "config",
  "tickrate": 30,
  "tick_interval_ms": 33.33,
  "max_episode_steps": 1000,
  "message": "Server configuration. Please synchronize your update rate."
}
```

| Field | Type | Description |
|---|---|---|
| `type` | string | Always `"config"` for configuration messages |
| `tickrate` | int | Server tickrate in Hz (updates per second) |
| `tick_interval_ms` | float | Time interval between updates in milliseconds |
| `max_episode_steps` | int | Maximum steps per episode before truncation |
| `message` | string | Informational message |

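
The relationship between `tickrate` and `tick_interval_ms` is simply 1000 ms divided by the rate. The sketch below builds a configuration message under that assumption; `build_config` is a hypothetical helper, and rounding to two decimals is assumed only because it reproduces the `33.33` shown in the example.

```python
def build_config(tickrate: int = 30, max_episode_steps: int = 1000) -> dict:
    """Build the configuration reply (hypothetical helper, not the server's API)."""
    if tickrate <= 0:
        raise ValueError("tickrate must be positive")
    return {
        "type": "config",
        "tickrate": tickrate,
        # Assumption: interval is 1000 ms / tickrate, rounded to 2 decimals.
        "tick_interval_ms": round(1000.0 / tickrate, 2),
        "max_episode_steps": max_episode_steps,
        "message": "Server configuration. Please synchronize your update rate.",
    }


if __name__ == "__main__":
    print(build_config(30)["tick_interval_ms"])
```

On the Unity side, the interval in seconds used for scheduling is `tick_interval_ms / 1000`.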
- Index 0: Forward Ray, max distance 7.0
- Index 1: Forward-Left Ray, max distance 4.5
- Index 2: Forward-Right Ray, max distance 4.5
- Index 3: Right Ray, max distance 3.5
- Index 4: Left Ray, max distance 3.5

Car and episode limits:
- Max Speed: `2.5` units
- Steering Speed Penalty: `-0.5` units (when steering ≠ 0)
- Max Episode Steps: `1000` steps

Reward values:
- Survival Reward: `+0.1` per step
- Reward Collected: `+15.0`
- Collision Penalty: `-10.0`

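
As a rough sketch of how these values combine into a per-step reward: the function below assumes the survival reward is added on every step, including steps with a collection or a collision. That assumption matches the `15.1` and `-9.9` values in the message-flow diagram in this document, but `step_reward` itself is a hypothetical name and the server may compute this differently.

```python
SURVIVAL_REWARD = 0.1
COLLECT_REWARD = 15.0
COLLISION_PENALTY = -10.0


def step_reward(reward_collected: int, collision_detected: int) -> float:
    """Reward for one step, given the two single-frame signals (0 or 1)."""
    reward = SURVIVAL_REWARD  # assumed to apply on every step
    if reward_collected:
        reward += COLLECT_REWARD
    if collision_detected:
        reward += COLLISION_PENALTY
    return reward


if __name__ == "__main__":
    for flags in [(0, 0), (1, 0), (0, 1)]:
        print(flags, round(step_reward(*flags), 2))
```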
- Client stays connected across episodes
- Server automatically starts new episode when previous ends (if client connected)
- No re-handshake needed between episodes
- Configuration received on first connection applies to all episodes in session
Server detects disconnection through:
- Timeout: No message received within `receive_timeout_ms` (calculated as `max(2000ms, tickrate_interval × 3)`)
- Socket Status: ZMQ socket event checks during episode transitions
- Send Failures: Unable to send response to client

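
The timeout formula can be worked through in a few lines. This is a sketch of the arithmetic only; the function name is illustrative, and it assumes the tick interval is 1000 ms divided by the tickrate.

```python
def receive_timeout_ms(tickrate: float) -> float:
    """max(2000 ms, 3 x tick interval), with the interval derived from the tickrate."""
    if tickrate <= 0:
        raise ValueError("tickrate must be positive")
    tick_interval_ms = 1000.0 / tickrate
    return max(2000.0, tick_interval_ms * 3)


if __name__ == "__main__":
    for hz in (30, 1, 0.5):
        print(hz, receive_timeout_ms(hz))
```

At the default 30 Hz the interval is about 33 ms, so the 2-second floor applies. The 3× term only takes over when the tickrate drops below 1.5 Hz.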
When client disconnects:
- Server logs "CLIENT DISCONNECTED" with session statistics
- Server resets to `WAITING_FOR_CLIENT` state
- Connection manager state changes to `LISTENING`
- Server ready to accept new client connection
- Episode/step counters preserved for statistics
If Unity needs to reconnect:
- Close existing socket
- Create new REQ socket
- Connect to server
- Send handshake (first game_state message)
- Receive new configuration
- Resume normal operation

```json
{
  "message": "game_state",
  "id": 123,
  "gameState": {
    "rayDistances": [7.0, 4.5, 4.5, 3.5, 3.5],
    "rayHits": [0, 0, 0, 0, 0],
    "carSpeed": 2.5,
    "rewardCollected": 0,
    "collisionDetected": 0,
    "respawns": 0,
    "elapsedTime": 10.5
  }
}
```

| Field | Type | Required | Values/Range | Description |
|---|---|---|---|---|
| `message` | string | ✅ | `"game_state"` | Message type identifier |
| `id` | int | ✅ | Any positive int | Message sequence number (increment each message) |
| `rayDistances` | float[] | ✅ | [0.0, maxDist] | Distance to nearest obstacle for each ray |
| `rayHits` | int[] | ✅ | 0 or 1 | Ray hit indicator (0=clear, 1=hit) |
| `carSpeed` | float | ✅ | [0.0, 2.5] | Current car linear velocity |
| `rewardCollected` | int | ✅ | 0 or 1 | Signal: 1 if reward collected this frame, else 0 |
| `collisionDetected` | int | ✅ | 0 or 1 | Signal: 1 if collision occurred this frame, else 0 |
| `respawns` | int | ✅ | ≥ 0 | Total number of respawns in episode |
| `elapsedTime` | float | ✅ | ≥ 0.0 | Time elapsed in episode (seconds) |

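
The ranges in the table can be checked mechanically before a message is used. The validator below is a sketch under stated assumptions: `parse_game_state` and the constants are hypothetical names, the per-ray maximum distances are taken from the ray list in this document, and whether the real server validates or rejects messages this way is not confirmed by the source.

```python
import json

RAY_MAX_DISTANCES = [7.0, 4.5, 4.5, 3.5, 3.5]  # indices 0-4, from the ray list
MAX_SPEED = 2.5
REQUIRED_STATE_KEYS = {
    "rayDistances", "rayHits", "carSpeed", "rewardCollected",
    "collisionDetected", "respawns", "elapsedTime",
}


def parse_game_state(raw: str) -> dict:
    """Parse a game_state message and return its gameState dict, or raise ValueError."""
    msg = json.loads(raw)
    if msg.get("message") != "game_state":
        raise ValueError("message must be 'game_state'")
    if not isinstance(msg.get("id"), int) or msg["id"] <= 0:
        raise ValueError("id must be a positive int")
    state = msg.get("gameState", {})
    missing = REQUIRED_STATE_KEYS - state.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    distances, hits = state["rayDistances"], state["rayHits"]
    if len(distances) != len(RAY_MAX_DISTANCES) or len(hits) != len(RAY_MAX_DISTANCES):
        raise ValueError("expected exactly 5 rays")
    for distance, max_distance in zip(distances, RAY_MAX_DISTANCES):
        if not 0.0 <= distance <= max_distance:
            raise ValueError("ray distance out of range")
    binary_values = list(hits) + [state["rewardCollected"], state["collisionDetected"]]
    if any(value not in (0, 1) for value in binary_values):
        raise ValueError("hit and signal fields must be 0 or 1")
    if not 0.0 <= state["carSpeed"] <= MAX_SPEED:
        raise ValueError("carSpeed out of range")
    if state["respawns"] < 0 or state["elapsedTime"] < 0:
        raise ValueError("respawns and elapsedTime must be non-negative")
    return state
```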

```json
{
  "steering": 0,
  "reward": 0.1,
  "episode_reward": 15.3,
  "step": 42,
  "total_steps": 1337,
  "episode": 5,
  "total_episodes": 5,
  "terminated": false,
  "truncated": false
}
```

| Field | Type | Description |
|---|---|---|
| `steering` | int | Steering command: -1 (left), 0 (straight), 1 (right) |
| `reward` | float | Reward received for this step |
| `episode_reward` | float | Cumulative reward for current episode |
| `step` | int | Current step number in episode |
| `total_steps` | int | Total steps across all episodes |
| `episode` | int | Current episode number |
| `total_episodes` | int | Total episodes completed |
| `terminated` | bool | Episode ended due to collision/respawn |
| `truncated` | bool | Episode ended due to max steps reached |

| Value | Direction | Unity Action |
|---|---|---|
| `-1` | Turn LEFT | Apply left steering input |
| `0` | Go STRAIGHT | No steering input |
| `1` | Turn RIGHT | Apply right steering input |

```
Unity                              Python Server
|                                  |
|--1. Connect ZeroMQ-------------->| (Listening)
|                                  |
|--2. Send handshake (game_state)->| (Receive first message)
|                                  | (Send configuration)
|<-3. Receive config --------------|
|   {tickrate: 30, ...}            |
|   (Synchronize tickrate)         |
|                                  |
|--4. Send game_state (id:1) ----->| (Process state, get action)
|<-5. Receive response ------------|
|   {steering:0, reward:0.1, ...}  |
|   (Apply steering)               |
|   (Wait tick_interval)           |
|                                  |
|--6. Send game_state (id:2) ----->| (Process state, reward: +15.1)
|   rewardCollected: 1             | (Log: "Reward collected!")
|<-7. Receive response ------------|
|   {steering:1, reward:15.1, ...} |
|   (Apply steering)               |
|   (Wait tick_interval)           |
|                                  |
|--8. Send game_state (id:3) ----->| (Process state, penalty: -9.9)
|   collisionDetected: 1           | (Log: "Collision detected!")
|                                  | (Episode ends)
|<-9. Receive response ------------|
|   {steering:0, terminated:true}  |
|                                  | (New episode starts)
|                                  |
|--10. Continue loop ------------->|
|...                               |...
```

```csharp
// ❌ WRONG - Flags stay set forever
void OnRewardCollected() {
    rewardCollected = 1; // Set but never reset
}

// ✅ CORRECT - Reset after sending
int SendGameStateAndGetSteering() {
    // ... send message ...
    // Reset single-frame signals immediately after sending
    currentState.rewardCollected = 0;
    currentState.collisionDetected = 0;
    return steering;
}
```

```csharp
// ❌ WRONG - Using fixed tickrate
void Update() {
    if (Time.time % 0.033f < Time.deltaTime) { // Hardcoded 30Hz
        SendGameState();
    }
}

// ✅ CORRECT - Using server's tickrate
void Update() {
    timeSinceLastUpdate += Time.deltaTime;
    if (timeSinceLastUpdate >= tickInterval) { // Server-provided interval
        SendGameState();
        timeSinceLastUpdate = 0f;
    }
}
```

```csharp
// Return max distance if no hit
float CastRay(Vector3 direction, float maxDistance, int index) {
    // Declare the hit inline so the out parameter has a variable to fill
    if (Physics.Raycast(transform.position, direction, out RaycastHit hit, maxDistance)) {
        rayHits[index] = 1;
        return hit.distance; // Actual distance
    }
    rayHits[index] = 0;
    return maxDistance; // ✅ Return max, not 0!
}
```

```csharp
// ✅ Increment ID for each message
messageId++; // 1, 2, 3, 4...
```

Episodes end when:
- Collision: `collisionDetected = 1`
- Respawn: `respawns > 0`
- Truncation: `step >= 1000`

Server automatically starts new episode if Unity stays connected.
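
The three end conditions map onto the `terminated` and `truncated` flags of the response. The sketch below assumes the two flags are evaluated independently, so both can be true if a collision happens on the final step; `episode_end_flags` is a hypothetical helper, and the real server may prioritize one flag over the other.

```python
from typing import Tuple


def episode_end_flags(state: dict, step: int,
                      max_episode_steps: int = 1000) -> Tuple[bool, bool]:
    """Return (terminated, truncated) for the current step."""
    # Terminated: collision signal this frame, or any respawn in the episode.
    terminated = state["collisionDetected"] == 1 or state["respawns"] > 0
    # Truncated: step limit reached, regardless of termination (assumption).
    truncated = step >= max_episode_steps
    return terminated, truncated


if __name__ == "__main__":
    running = {"collisionDetected": 0, "respawns": 0}
    crashed = {"collisionDetected": 1, "respawns": 0}
    print(episode_end_flags(running, step=42))
    print(episode_end_flags(crashed, step=42))
    print(episode_end_flags(running, step=1000))
```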

Option 1 (Package Manager):
- Window → Package Manager → Add package from git URL
- `https://github.com/NetMQ/NetMQ.git`

Option 2 (NuGet):
- Install the NetMQ package via NuGet for Unity

```csharp
using NetMQ;
using NetMQ.Sockets;
using UnityEngine;
```

- Host: `127.0.0.1`
- Port: `65432`
- Default Tickrate: `30 Hz` (configurable in server.py main())
- Max Episode Steps: `1000`
- Receive Timeout: `max(2000ms, tickrate_interval × 3)`
- Accept Timeout: `1000ms` (for new client connections)

Message types:
- config: Server → Unity (first message of connection only, NOT per episode)
- game_state: Unity → Server (every tick, all episodes)
- response: Server → Unity (steering + rewards + episode stats, every tick)

Steering values:
- `-1`: Turn Left
- `0`: Go Straight
- `1`: Turn Right

Rewards:
- Survival: `+0.1` per step
- Cube Collection: `+15.0`
- Collision: `-10.0`