Hello! This project uses reinforcement learning to train an agent on the classic CartPole game. The aim is to keep the pole balanced for as long as possible.
## Index
## Environment
We used Gymnasium, a maintained fork of OpenAI's Gym. We started by understanding the environment and the actions the library provides.
## Approach and Results
We used Q-learning to train the model and tested it using the Q-table values. With this approach the pole stayed balanced for about 60 steps. The video below shows a run of 58 steps.
Game_9_step_58.mp4
## Installation
pip install -r requirements.txt
## File Structure
- Python-Files: contains the `Q_learning_based.py` file implementing the Q-learning algorithm.
- Collab_Notebooks: contains the `Q_learning_based.ipynb` file for running in Colab without any local setup; all dependencies are already set up in the notebook.
- results: contains the `Q_learning.txt` file with the results of the Q-learning algorithm.
- video: contains the video results obtained from the Q-learning algorithm with the help of the `imageio` library.
- requirements.txt: lists the libraries required to run the `Q_learning` file.
## Observation Space
- Position (-0.5 to 0.5)
- Velocity (-inf to +inf)
- Angle (-0.24 to 0.24)
- Angular Velocity (-inf to +inf)
## Action Space
- 0: Pushing the cart to the left.
- 1: Pushing the cart to the right.
## Episode Termination
- The pole angle becomes greater than 12 degrees.
- The cart position leaves the range of ±2.4 units.
- The total number of steps in an episode exceeds 500.
## Modified Ranges
To discretize the unbounded components, the velocity and angular velocity were clipped to finite ranges:

- Velocity (-50000 to +50000)
- Angular Velocity (-50000 to +50000)
## API

The Gymnasium API provided the methods we used, and the environment described above is explained in detail in the official Gymnasium documentation.
## Methods
- `Discretize`: a function that discretizes the given observation and clips the velocity and angular velocity to a bounded range, since their original ranges include infinity.
- `Training`: trains the agent using the Q-learning algorithm. The exact update rule is: Q(s,a) = Q(s,a) + α( reward + γ · max over a' of Q(s',a') − Q(s,a) )
- `Stats_train`: prints the statistics obtained after training the agent.
- `Testing`: tests the agent from different starting states. It takes a parameter specifying how many test episodes to run.
- `Save_video`: saves the video to the specified directory, relative to the local path.
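A minimal sketch of how `Discretize` and the Q-learning update might fit together. The bin count, bounds, and hyperparameters here are illustrative placeholders, not the exact values used in `Q_learning_based.py`:

```python
import numpy as np
from collections import defaultdict

# Illustrative bounds and bin count; the real script may use different values.
BOUNDS = [(-2.4, 2.4), (-4.0, 4.0), (-0.21, 0.21), (-4.0, 4.0)]
N_BINS = 10
ALPHA, GAMMA = 0.1, 0.99
N_ACTIONS = 2

Q = defaultdict(float)  # Q-table keyed by (state, action)

def discretize(obs):
    """Clip each component to finite bounds, then map it to a bin index."""
    state = []
    for value, (low, high) in zip(obs, BOUNDS):
        value = float(np.clip(value, low, high))   # bound the infinite ranges
        frac = (value - low) / (high - low)        # scale to [0, 1]
        state.append(min(int(frac * N_BINS), N_BINS - 1))
    return tuple(state)                            # hashable Q-table key

def q_update(s, a, reward, s_next):
    """One tabular step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in range(N_ACTIONS))
    Q[(s, a)] += ALPHA * (reward + GAMMA * best_next - Q[(s, a)])

s = discretize([0.0, 100.0, 0.05, -100.0])    # the large velocities get clipped
print(s)                                       # -> (5, 9, 6, 0)
q_update(s, 1, 1.0, discretize([0.1, 0.5, 0.04, 0.3]))
print(round(Q[(s, 1)], 3))                     # -> 0.1
```

Returning the state as a tuple lets it serve directly as a dictionary key, which is why the Q-table can be a plain `defaultdict` instead of a dense array.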
## Video Results
Game_1_step_46.mp4
Game_5_step_44.mp4
## License
This project is © Uday Bhardwaj and Vedansh Sharma.