Changes from all commits

34 commits
8b8900b
feat: reinforcement learning PR#2; several additions/improvements to …
gabriel-trigo Jun 11, 2025
e35953e
Update pyproject.toml
s2t2 Jun 12, 2025
385d7ee
fix: fix linting errors of previous commit
gabriel-trigo Jun 12, 2025
02cea62
Update PR Template
s2t2 Jun 23, 2025
e377076
Update PR Template
s2t2 Jun 23, 2025
28d016c
Restore original formatting
s2t2 Jun 23, 2025
792ca1d
Restore original formatting
s2t2 Jun 23, 2025
c65bc25
Clean top of files
s2t2 Jun 24, 2025
c20efcb
Refactor filepaths
s2t2 Jun 24, 2025
e9c2f34
Refactor filepaths
s2t2 Jun 24, 2025
ebbae9c
Refactor and test temp conversion functions; closes #25
s2t2 Jun 24, 2025
4ac8181
Refactor temp conversion tests
s2t2 Jun 24, 2025
959728b
Review eval script
s2t2 Jun 24, 2025
e29eeb1
Remove redundant variable setting
s2t2 Jun 24, 2025
b1a48ad
Fix failing test
s2t2 Jun 24, 2025
8031d66
Repro generate configs script; use absl flags because argparse not wo…
s2t2 Jun 26, 2025
763f60e
Update gitignore
s2t2 Jun 26, 2025
27dae87
Test config file generation
s2t2 Jun 26, 2025
a7a127a
Test read config file
s2t2 Jun 26, 2025
5235615
Fix file names - remove quote
s2t2 Jul 10, 2025
7a8f1d2
Describe the config generation script
s2t2 Jul 10, 2025
baa6ed4
Flags WIP
s2t2 Jul 11, 2025
a45f6cd
Attempt to reproduce starter buffer script; fix #115
s2t2 Jul 28, 2025
adeacfc
Test starter buffer population
s2t2 Jul 29, 2025
3224585
Refactor test: use setup, teardown, and temp dir
s2t2 Jul 29, 2025
3b6f3a8
WIP - reproduce train script, run into known issue
s2t2 Aug 11, 2025
1503fce
Hotfix known issue
s2t2 Aug 11, 2025
0b36870
Generate example starter buffers for training and testing
s2t2 Aug 12, 2025
034af1f
WIP - refactor and test RL agent trainer
s2t2 Aug 12, 2025
00072d7
Regenerate starter buffer for testing
s2t2 Aug 13, 2025
5cb19bf
Decrease number of training steps when testing
s2t2 Aug 13, 2025
3d22490
WIP - reproducing eval script - encounter env config errors
s2t2 Aug 15, 2025
ea1d3aa
Reproduce eval script
s2t2 Aug 22, 2025
3c53a8c
WIP - refactor eval script; need to save schedule policy results char…
s2t2 Aug 22, 2025
8 changes: 3 additions & 5 deletions .github/ISSUE_TEMPLATE.md
Original file line number Diff line number Diff line change
@@ -1,16 +1,14 @@
## Expected Behavior


## Actual Behavior


## Steps to Reproduce the Problem

1.
1.
1.
2.
3.

## Specifications

- Version:
- Platform:
- Platform:
30 changes: 26 additions & 4 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -1,6 +1,28 @@
Fixes #<issue_number_goes_here>
## Description

> It's a good idea to open an issue first for discussion.
[Provide a one sentence summary of the changes implemented.]

- [ ] Tests pass
- [ ] Appropriate changes to documentation are included in the PR
[Link to related issues (e.g. "Closes #123", "Resolves #456").]

## Details

Details:

- [Provide additional details, as applicable.]

- [Provide additional details, as applicable.]

- [Provide additional details, as applicable.]

## Checklist

- [ ] I have read the [Contributor's Guide](https://google.github.io/sbsim/contributing/).
- [ ] I have signed the [Contributor License Agreement](https://cla.developers.google.com/) (first time contributors only).
- [ ] I have set up [pre-commit hooks](https://google.github.io/sbsim/contributing/#pre-commit-hooks) by running `pre-commit install` (one time only), and the pre-commit hooks pass.
- [ ] I have added appropriate [unit tests](https://google.github.io/sbsim/contributing/#testing), and the tests pass.
- [ ] I have added [docstrings](https://google.github.io/sbsim/contributing/#documentation) and updated the documentation as necessary, and I have previewed the [documentation site](https://google.github.io/sbsim/docs-site/) locally to make sure things look good.
- [ ] I have self-reviewed my code (especially important if using AI agents).

---

**Thank you for your contribution!**
29 changes: 21 additions & 8 deletions .gitignore
@@ -21,15 +21,28 @@ data/sb1.zip
data/sb1/

# results files:
*/**/output_data/
*/**/metrics/
**/videos/
**/train/
**/eval/
smart_control/learning/
#*/**/output_data/
#*/**/metrics/
#**/videos/
#**/train/
#**/eval/

smart_control/configs/resources/sb1/train_sim_configs/generated/
# todo: use temp dir instead:
smart_control/configs/resources/sb1/train_sim_configs/generation_test/

smart_control/simulator/videos
smart_control/refactor/data/
smart_control/refactor/experiment_results/

smart_control/reinforcement_learning/data/starter_buffers/*
!smart_control/reinforcement_learning/data/starter_buffers/.gitkeep
!smart_control/reinforcement_learning/data/starter_buffers/default
!smart_control/reinforcement_learning/data/starter_buffers/test

smart_control/reinforcement_learning/data/experiment_results/*
!smart_control/reinforcement_learning/data/experiment_results/.gitkeep

smart_control/reinforcement_learning/data/experiment_eval/*
!smart_control/reinforcement_learning/data/experiment_eval/.gitkeep

# jupyter notebook checkpoints:
smart_control/notebooks/.ipynb_checkpoints/
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -37,4 +37,4 @@ repos:
rev: 0.7.22
hooks:
- id: mdformat
exclude: ^docs/api/
exclude: ^docs/api/|^\.github/
4 changes: 4 additions & 0 deletions docs/api/reinforcement_learning/scripts.md
@@ -1,5 +1,9 @@
# Scripts

::: smart_control.reinforcement_learning.scripts.generate_gin_configs

::: smart_control.reinforcement_learning.scripts.populate_starter_buffer

::: smart_control.reinforcement_learning.scripts.train

::: smart_control.reinforcement_learning.scripts.eval
4 changes: 4 additions & 0 deletions docs/contributing.md
@@ -97,6 +97,10 @@ pytest --disable-pytest-warnings -k your_test_name_here
# ignore specific test files and directories:
pytest --ignore=path/to/your/test.py --ignore=path/to/other/

# display more logs:
pytest --disable-pytest-warnings -s --log-cli-level=INFO path/to/your/test.py
# display all logs:
pytest --disable-pytest-warnings -s --log-cli-level=DEBUG path/to/your/test.py
```

## Linting
125 changes: 125 additions & 0 deletions docs/guides/reinforcement_learning/scripts.md
@@ -0,0 +1,125 @@
# Reinforcement Learning Scripts

## Configuration Generation

By default, the RL agent training script uses the configuration options defined
in the base gin config file (see
"smart_control/configs/resources/\<dataset_id>/sim_config.gin").

However, if you would like to use different configuration options, you can use
the configuration generation script to flexibly create alternative config files
with slight modifications to the base config file.

Generate different configuration files to use during training:

```sh
python -m smart_control.reinforcement_learning.scripts.generate_gin_configs
```

By default, the script will use the following parameter grid:

- `time_steps`: `['300']`
- `num_days`: `['1', '7', '14', '30']`
- `start_timestamps`: `['2023-07-06']`

Optionally pass any of these command-line flags to customize the parameter grid:

```sh
python -m smart_control.reinforcement_learning.scripts.generate_gin_configs \
--time_steps 300,600,900 \
--num_days 1,7,14 \
--start_timestamps 2023-07-06,2023-08-06,2023-10-06
```

This script generates a separate file for each combination of the parameter
values you specify. The files are written to the
"smart_control/configs/resources/\<dataset_id>/train_sim_configs/generated"
directory, and each file name encodes the chosen parameter values (e.g.
"step_300_days_1_start_20230706.gin").
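
The grid expansion and file naming can be sketched in a few lines. This is a toy illustration only; the function and variable names below are hypothetical, not the script's actual internals:

```python
from itertools import product

# Illustrative defaults mirroring the documented parameter grid:
time_steps = ["300"]
num_days = ["1", "7", "14", "30"]
start_timestamps = ["2023-07-06"]

def config_filenames(time_steps, num_days, start_timestamps):
    """Yield one config filename per combination of parameter values."""
    for step, days, start in product(time_steps, num_days, start_timestamps):
        # e.g. "step_300_days_1_start_20230706.gin"
        yield f"step_{step}_days_{days}_start_{start.replace('-', '')}.gin"

names = list(config_filenames(time_steps, num_days, start_timestamps))
print(len(names))  # 1 * 4 * 1 = 4 files
print(names[0])    # step_300_days_1_start_20230706.gin
```

With the default grid, only `num_days` varies, so four files are produced.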

## Starter Buffer Population

Populate a replay buffer with initial exploration data, to provide a starting
point when training RL agents:

```sh
python -m smart_control.reinforcement_learning.scripts.populate_starter_buffer
```

Optionally pass flags to customize the buffer name or the simulation config:

```sh
python -m smart_control.reinforcement_learning.scripts.populate_starter_buffer \
--buffer_name buffer_xyz \
--config_path smart_control/configs/resources/sb1/sim_config.gin
```

This creates a directory corresponding to the buffer name within
"smart_control/reinforcement_learning/data/starter_buffers".

A "default" starter buffer has been created for example purposes:

```sh
python -m smart_control.reinforcement_learning.scripts.populate_starter_buffer \
--buffer_name default \
--num_runs 5 \
--capacity 50000 \
--steps_per_run 100 \
--sequence_length 2
```

A "test" starter buffer has been created for testing purposes:

```sh
python -m smart_control.reinforcement_learning.scripts.populate_starter_buffer \
--buffer_name test \
--num_runs 1 \
--steps_per_run 1 \
--capacity 100 \
--sequence_length 2
```
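
Conceptually, the script collects `num_runs * steps_per_run` exploration transitions into a capacity-bounded buffer. The real script steps the building simulator; the sketch below is only a toy illustration of the parameter bookkeeping, with a made-up state and reward:

```python
import random
from collections import deque

def populate_starter_buffer(num_runs, steps_per_run, capacity, seed=0):
    """Collect random exploration transitions into a capacity-bounded buffer."""
    rng = random.Random(seed)
    buffer = deque(maxlen=capacity)  # oldest transitions are evicted first
    for _ in range(num_runs):
        state = 0.0
        for _ in range(steps_per_run):
            action = rng.uniform(-1.0, 1.0)  # stand-in for a random policy
            next_state = state + action
            reward = -abs(next_state)        # stand-in reward signal
            buffer.append((state, action, reward, next_state))
            state = next_state
    return buffer

# Mirrors the "default" example: 5 runs x 100 steps = 500 transitions,
# well under the 50,000-transition capacity.
buf = populate_starter_buffer(num_runs=5, steps_per_run=100, capacity=50_000)
print(len(buf))  # 500
```

The "test" buffer above is the degenerate case: one run of one step, so a single transition.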

## RL Agent Training

Train a reinforcement learning agent, choosing a unique name for the experiment:

```sh
python -m smart_control.reinforcement_learning.scripts.train \
--experiment_name="my-experiment-1"
```

Optionally pass flags to customize the agent type, starter buffer, and training
parameters:

```sh
python -m smart_control.reinforcement_learning.scripts.train \
--experiment_name="my-experiment-2" \
--starter_buffer_name="default" \
--agent_type="sac" \
--learner_iterations=3 \
--train_iterations=10 \
--collect_steps_per_training_iteration=5
```

This will generate a new experiment results directory under
"smart_control/reinforcement_learning/data/experiment_results/`experiment_name`",
containing the following files and directories:

- "collect" directory
- "eval" directory
- "metrics" directory
- "replay_buffer" directory
- "experiment_parameters.json" file
- "experiment_parameters.txt" file
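
As an illustration, the layout above could be created with a few lines of stdlib code. This is a hypothetical helper for orientation, not the train script's actual implementation:

```python
import json
from pathlib import Path
from tempfile import mkdtemp

def init_experiment_dir(root, experiment_name, params):
    """Create the documented results layout and record the run parameters."""
    exp_dir = Path(root) / experiment_name
    for sub in ("collect", "eval", "metrics", "replay_buffer"):
        (exp_dir / sub).mkdir(parents=True, exist_ok=True)
    (exp_dir / "experiment_parameters.json").write_text(json.dumps(params, indent=2))
    (exp_dir / "experiment_parameters.txt").write_text(
        "\n".join(f"{k}={v}" for k, v in params.items())
    )
    return exp_dir

root = mkdtemp()  # stand-in for the real experiment_results path
exp = init_experiment_dir(root, "my-experiment-1",
                          {"agent_type": "sac", "train_iterations": 10})
print(sorted(p.name for p in exp.iterdir()))
# ['collect', 'eval', 'experiment_parameters.json',
#  'experiment_parameters.txt', 'metrics', 'replay_buffer']
```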

## Evaluation

Evaluate a previously trained agent, specifying an experiment name that
references an existing experiment results directory:

```sh
python -m smart_control.reinforcement_learning.scripts.eval \
--eval_experiment_name my-experiment-1
```

You can also evaluate a specific saved policy against a different gin config:

```sh
python scripts/eval.py \
--policy-dir experiment_results/ddpg_train_run-july-6th_2025_04_07-12:50:40/policies/ \
--gin-config /home/gabriel-user/projects/sbsim/smart_control/configs/resources/sb1/generated_configs/config_timestepsec-900_numdaysinepisode-14_starttimestamp-2023-11-06.gin \
--experiment-name ddpg_train-summer_eval-winter
```
4 changes: 2 additions & 2 deletions docs/setup/linux.md
@@ -120,7 +120,7 @@ cd ../..

By default, simulation videos are stored in the "simulator/videos" directory
(which is ignored from version control). If you would like to customize this
location, use the `SIM_VIDEOS_DIRPATH` environment variable.
location, use the `SIM_VIDEOS_DIR` environment variable.

You can pass environment variable(s) at runtime, or create a local ".env" file
and set your desired value(s) there:
@@ -129,7 +129,7 @@ and set your desired value(s) there:
# this is the ".env" file...

# customizing the directory where simulation videos are stored:
SIM_VIDEOS_DIRPATH="/cns/oz-d/home/smart-buildings-control-team/smart-buildings/geometric_sim_videos/"
SIM_VIDEOS_DIR="/cns/oz-d/home/smart-buildings-control-team/smart-buildings/geometric_sim_videos/"
```
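
As an illustration, a script might resolve the videos directory like this (a hedged sketch: the helper name and default path here are assumptions, not the library's actual code):

```python
import os

# Illustrative default, per the docs: videos land in "simulator/videos".
DEFAULT_VIDEOS_DIR = "smart_control/simulator/videos"

def resolve_videos_dir():
    """Return SIM_VIDEOS_DIR if set, otherwise the documented default."""
    return os.environ.get("SIM_VIDEOS_DIR", DEFAULT_VIDEOS_DIR)

os.environ["SIM_VIDEOS_DIR"] = "/tmp/sim_videos"
print(resolve_videos_dir())  # /tmp/sim_videos

del os.environ["SIM_VIDEOS_DIR"]
print(resolve_videos_dir())  # smart_control/simulator/videos
```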

## Notebook Setup
4 changes: 2 additions & 2 deletions docs/setup/mac.md
@@ -121,7 +121,7 @@ cd ../..

By default, simulation videos are stored in the "simulator/videos" directory
(which is ignored from version control). If you would like to customize this
location, use the `SIM_VIDEOS_DIRPATH` environment variable.
location, use the `SIM_VIDEOS_DIR` environment variable.

You can pass environment variable(s) at runtime, or create a local ".env" file
and set your desired value(s) there:
@@ -130,7 +130,7 @@ and set your desired value(s) there:
# this is the ".env" file...

# customizing the directory where simulation videos are stored:
SIM_VIDEOS_DIRPATH="/cns/oz-d/home/smart-buildings-control-team/smart-buildings/geometric_sim_videos/"
SIM_VIDEOS_DIR="/cns/oz-d/home/smart-buildings-control-team/smart-buildings/geometric_sim_videos/"
```

## Notebook Setup