This project explores Imitation Learning in the LunarLander environment from Gymnasium.
Instead of learning through trial and error, as in traditional Reinforcement Learning, the agent learns to act by mimicking expert behavior using supervised learning techniques.
The goal is to train a model that can replicate expert decisions and successfully land the spacecraft.
Imitation Learning bridges the gap between:
- Supervised Learning (learning from labeled data)
- Reinforcement Learning (learning from rewards)
In this project, we apply Behavior Cloning, where the model learns a direct mapping: state -> action.
- Environment: LunarLander (Gymnasium)
- State Space: 8-dimensional continuous vector
- Action Space: 4 discrete actions:
  - Do nothing
  - Fire left engine
  - Fire main engine
  - Fire right engine
The objective is to land safely between the flags with minimal velocity and correct orientation.
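For quick reference, the environment can be created and inspected as follows. This is a minimal sketch; the exact environment ID depends on your Gymnasium version (recent releases register `LunarLander-v3`, older ones `LunarLander-v2`):

```python
import gymnasium as gym

# "LunarLander-v3" in recent Gymnasium releases; older versions use "LunarLander-v2".
env = gym.make("LunarLander-v3")

print(env.observation_space)  # Box(8,)     -> 8-dimensional continuous state
print(env.action_space)       # Discrete(4) -> the 4 actions listed above

state, info = env.reset(seed=42)
print(state.shape)            # (8,)
```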
- Generated expert trajectories using a heuristic / rule-based policy (see the sketch after this list)
- Collected `(state, action)` pairs
- Trained a neural network using supervised learning:
  - Input: state vector
  - Output: predicted action
  - Loss function: cross-entropy (treating action prediction as 4-way classification)
  - Optimization: gradient-based updates of the network weights
  - Goal: minimize the difference between predicted and expert actions
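The pipeline below is a minimal, hedged sketch of how this data collection and training could look. `heuristic_policy` is a hypothetical stand-in for the project's actual rule-based expert, and the network architecture is an illustrative choice, not necessarily the one used in the notebook:

```python
import gymnasium as gym
import numpy as np
import tensorflow as tf

def heuristic_policy(state):
    """Hypothetical placeholder for the rule-based expert.
    Any mapping from the 8-dim state to one of the 4 actions works here."""
    # Example rule: fire the main engine when descending fast, else do nothing.
    return 2 if state[3] < -0.5 else 0

# --- 1. Collect (state, action) pairs from expert rollouts ---
env = gym.make("LunarLander-v3")
states, actions = [], []
for _ in range(50):                      # number of expert episodes (illustrative)
    state, _ = env.reset()
    done = False
    while not done:
        action = heuristic_policy(state)
        states.append(state)
        actions.append(action)
        state, _, terminated, truncated, _ = env.step(action)
        done = terminated or truncated

X = np.array(states, dtype=np.float32)   # shape (N, 8)
y = np.array(actions, dtype=np.int64)    # shape (N,)

# --- 2. Train a classifier: state -> action (Behavior Cloning) ---
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(4),            # logits over the 4 discrete actions
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(X, y, epochs=20, batch_size=64, validation_split=0.1)
```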
- The trained agent is able to imitate expert-like behavior
- Successfully performs controlled descent in many scenarios
- Demonstrates stable policy learning without reward-based training
```
├── notebook.ipynb   # Main training and evaluation notebook
├── data/            # Collected expert data (if it exists)
├── models/          # Saved models (optional)
└── README.md
```
- Open the notebook in Google Colab / Jupyter
- Run all cells
- The agent will:
  - Train on expert data
  - Evaluate performance in the environment (see the evaluation sketch below)
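For reference, evaluation amounts to rolling out the trained policy greedily and tracking episode returns. A minimal sketch, assuming a trained Keras model like the one above (the `models/bc_policy.keras` path is a hypothetical example, not a file this repo necessarily ships):

```python
import gymnasium as gym
import numpy as np
import tensorflow as tf

# Hypothetical path -- adjust to wherever the notebook saves its model.
model = tf.keras.models.load_model("models/bc_policy.keras")

env = gym.make("LunarLander-v3")
returns = []
for _ in range(10):                        # number of evaluation episodes
    state, _ = env.reset()
    done, total_reward = False, 0.0
    while not done:
        # Greedy action: argmax over the policy's logits for this state.
        logits = model.predict(state[None, :], verbose=0)
        action = int(np.argmax(logits, axis=-1)[0])
        state, reward, terminated, truncated, _ = env.step(action)
        total_reward += float(reward)
        done = terminated or truncated
    returns.append(total_reward)

print(f"Mean return over {len(returns)} episodes: {np.mean(returns):.1f}")
```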
- Python
- TensorFlow / Keras
- Gymnasium
- NumPy
- Matplotlib
- Imitation Learning can simplify RL problems by removing the need for reward design
- Behavior Cloning is simple but sensitive to distribution shift: small prediction errors compound once the agent drifts into states the expert data never covered
- Demonstrates how supervised learning can be applied to sequential decision-making
- Implement DAgger for better generalization (see the sketch after this list)
- Compare with RL methods (DQN / Policy Gradient)
- Improve robustness to unseen states
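For context on the first item: DAgger (Dataset Aggregation) addresses distribution shift by rolling out the *current* policy, asking the expert to relabel the states that policy actually visits, and retraining on the aggregated dataset. A schematic sketch, reusing the hypothetical `heuristic_policy`, `env`, `model`, `X`, and `y` from the training sketch above (not part of the current codebase):

```python
# Schematic DAgger loop (a sketch, not implemented in this repo).
for iteration in range(5):
    # 1. Roll out the CURRENT policy (not the expert).
    new_states = []
    state, _ = env.reset()
    done = False
    while not done:
        logits = model.predict(state[None, :], verbose=0)
        action = int(np.argmax(logits, axis=-1)[0])
        new_states.append(state)
        state, _, terminated, truncated, _ = env.step(action)
        done = terminated or truncated

    # 2. Ask the expert what it WOULD have done in those states.
    new_actions = [heuristic_policy(s) for s in new_states]

    # 3. Aggregate with the existing dataset and retrain.
    X = np.concatenate([X, np.array(new_states, dtype=np.float32)])
    y = np.concatenate([y, np.array(new_actions, dtype=np.int64)])
    model.fit(X, y, epochs=5, batch_size=64, verbose=0)
```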