CartPole Game using Reinforcement Learning

Hello! This project uses Reinforcement Learning to train an agent on the classic CartPole game. The aim was to keep the pole balanced for as long as possible.

Overview

We used Gymnasium, a maintained fork of OpenAI's Gym, and built the agent around the environment and actions that the library provides.
We used Q-learning to train the agent and then tested it using the learned Q-table values. The pole stayed balanced for about 60 steps; the video below shows a run of 58 steps.

Game_9_step_58.mp4

Setup

All the requirements for the code are listed in the requirements.txt file.
You can install them, locally or globally, by running the following command:

pip install -r requirements.txt
Then run Q_learning_based.py from the Python-Files directory. Adjust the code as needed for the number of episodes, the video path, the result file path, etc.
File descriptions:
- Python-Files: contains Q_learning_based.py, the implementation of the Q-learning algorithm.
- Collab-Notebooks: contains Q_learning_based.ipynb for running in Google Colab without any local setup; all dependencies are installed within the notebook.
- results: contains Q_learning.txt, which holds the results of the Q-learning algorithm.
- video: contains the videos of the Q-learning agent, recorded with the Imageio library.
- requirements.txt: lists the libraries required to run the Q-learning code.

Environment

CartPole-v1 is used for the Reinforcement Learning algorithm.
The observation space consists of four continuous features:
1. Cart position (-4.8 to 4.8)
2. Cart velocity (-inf to inf)
3. Pole angle (about -0.418 to 0.418 rad, i.e. ±24°)
4. Pole angular velocity (-inf to inf)
The action space consists of two discrete actions:
1. 0: Push the cart to the left.
2. 1: Push the cart to the right.
The agent receives a reward of +1 for every step it takes.
An episode ends when any of three conditions is met:
1. The pole angle exceeds ±12°.
2. The cart position leaves the range ±2.4.
3. The episode length exceeds 500 steps.
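The three end-of-episode conditions can be restated as a small check. This is a hypothetical helper for illustration, not part of Gymnasium's API: in Gymnasium, CartPole itself reports the angle and position conditions through the `terminated` flag of `env.step`, while the 500-step limit is applied by a time-limit wrapper and reported as `truncated`. The 12° threshold is converted to radians because the pole angle is observed in radians.

```python
import math

# Thresholds from the CartPole-v1 description above.
ANGLE_LIMIT_RAD = 12 * 2 * math.pi / 360  # 12 degrees, about 0.2094 rad
POSITION_LIMIT = 2.4                       # cart must stay within ±2.4
MAX_EPISODE_STEPS = 500                    # time limit for CartPole-v1

def episode_over(cart_position, pole_angle, steps_taken):
    """Return True if any of the three end-of-episode conditions holds.

    The first two correspond to Gymnasium's `terminated` flag; the step
    limit corresponds to `truncated` (applied by a time-limit wrapper).
    """
    pole_fell = abs(pole_angle) > ANGLE_LIMIT_RAD
    cart_out_of_bounds = abs(cart_position) > POSITION_LIMIT
    # The wrapper truncates as soon as the 500th step has been taken.
    time_limit_reached = steps_taken >= MAX_EPISODE_STEPS
    return pole_fell or cart_out_of_bounds or time_limit_reached

print(episode_over(0.1, 0.05, 42))  # False: still balancing
print(episode_over(0.1, 0.25, 42))  # True: pole angle beyond 12 degrees
```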
We bounded the two unbounded features of the observation space:
1. Velocity (-50000 to +50000)
2. Angular velocity (-50000 to +50000)
We interacted with the environment through Gymnasium's API; the environment described above is documented in more detail in the Gymnasium documentation.
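A minimal sketch of the bounding and discretization step (the Discretize method described under Algorithm), assuming uniform bins. The bin counts in `BINS` are hypothetical, since the project's actual bin edges are not given here; position and angle use the CartPole-v1 observation limits, and both velocities use the ±50000 bounds stated above.

```python
# Per-feature bounds: (position, velocity, angle, angular velocity).
LOWER = (-4.8, -50000.0, -0.418, -50000.0)
UPPER = (4.8, 50000.0, 0.418, 50000.0)
BINS = (6, 6, 12, 12)  # hypothetical bin counts, not taken from the project

def discretize(observation):
    """Map a continuous 4-feature observation to a tuple of bin indices."""
    state = []
    for value, low, high, n_bins in zip(observation, LOWER, UPPER, BINS):
        clipped = min(max(value, low), high)       # bound (possibly infinite) values
        fraction = (clipped - low) / (high - low)  # position within range, 0.0 to 1.0
        index = min(int(fraction * n_bins), n_bins - 1)  # upper edge goes to last bin
        state.append(index)
    return tuple(state)  # hashable, so it can key a Q-table

print(discretize((0.0, 0.0, 0.0, 0.0)))  # (3, 3, 6, 6)
print(discretize((5.0, float("inf"), -1.0, float("-inf"))))  # (5, 5, 0, 0)
```

With bounds as wide as ±50000, realistic CartPole velocities all fall into one of the two middle bins; narrower bounds would give finer resolution, but the sketch keeps the values stated above.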

Algorithm

We used the Q-learning algorithm with an ε-greedy exploration policy.
For this, we created a class for the Q-learning algorithm, with methods for training, testing, saving videos, the exploration policy, and discretization.
When the Q_learning class is initialized, it sets up the Gymnasium environment and initializes any additional data structures needed for recorded frames and environment statistics.
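A minimal sketch of ε-greedy action selection over one state's row of a Q-table, in plain Python. The function name `epsilon_greedy` and its signature are hypothetical and not taken from the project's Q_learning class.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Choose an action index from one state's Q-values.

    With probability epsilon a uniformly random action is taken (explore);
    otherwise the action with the highest Q-value is taken (exploit).
    Ties in the greedy case go to the lower action index.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

# Usage: CartPole has two actions (0 = push left, 1 = push right).
rng = random.Random(42)
print(epsilon_greedy([0.1, 0.7], epsilon=0.0, rng=rng))  # always greedy: prints 1
```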
The major methods used are:
- Discretize: discretizes the continuous observation space, first converting the velocity and angular velocity to a bounded range, since their ranges include infinity.
- Training: trains the agent using the Q-learning algorithm. The update rule is Q(s,a) ← Q(s,a) + α(reward + γ·max_a' Q(s',a') − Q(s,a)), where α is the learning rate and γ the discount factor.
- Stats_train: prints the statistics collected while training the agent.
- Testing: tests the agent from different starting states. It takes a parameter specifying how many test runs to perform.
- Save_video: saves the video to the specified directory, relative to the local path.
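A minimal sketch of the tabular Q-learning update used in training, plus the greedy action choice a test run would make. It assumes a dictionary-based Q-table keyed by discretized states; `new_q_table`, `q_update`, and `greedy_action` are hypothetical names. Skipping the bootstrap term for a terminal next state (the `terminated` flag) is a common convention that the update rule above does not show, so treat it as an assumption.

```python
from collections import defaultdict

N_ACTIONS = 2  # CartPole: 0 = push left, 1 = push right

def new_q_table():
    """Q-table keyed by discretized state; unseen states start at 0.0 for each action."""
    return defaultdict(lambda: [0.0] * N_ACTIONS)

def q_update(q_table, state, action, reward, next_state, alpha, gamma, terminated=False):
    """Apply one Q-learning step and return the new Q(s, a).

    Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))
    """
    # Assumption: no bootstrapping from a terminal next state.
    best_next = 0.0 if terminated else max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]

def greedy_action(q_table, state):
    """Action with the highest Q-value (ties go to the lower index), as used when testing."""
    values = q_table[state]
    return max(range(len(values)), key=lambda a: values[a])

# Usage: one update on a tiny hand-built table.
q = new_q_table()
q[(1, 1, 1, 1)] = [2.0, 4.0]
new_value = q_update(q, (0, 0, 0, 0), action=1, reward=1.0,
                     next_state=(1, 1, 1, 1), alpha=0.5, gamma=0.9)
print(round(new_value, 6))  # 0.5 * (1.0 + 0.9 * 4.0) = 2.3
```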

Results

In the end, the agent kept the pole balanced for up to 60-80 steps.
Some selected videos are shown below; the rest are in the video directory.
46 Steps

Game_1_step_46.mp4

44 Steps

Game_5_step_44.mp4


References

- Gymnasium
- Imageio
- Matplotlib

License

This project is licensed under © Uday Bhardwaj and Vedansh Sharma.
