diff --git a/README.md b/README.md index 2649542a..92834ec9 100644 --- a/README.md +++ b/README.md @@ -25,11 +25,6 @@ from cloud storage. View the official [Documentation Site](https://google.github.io/sbsim/) for a complete auto-generated API reference. -There is also a legacy unofficial -[Community-run Documentation Site](https://gitwyd.github.io/sbsim_documentation/) -containing more information about the project and the codebase. We plan to merge -all this content into the official documentation site soon. - ## Getting Started A great place to start is by reviewing the diff --git a/docs/community/additional-resources.md b/docs/community/additional-resources.md new file mode 100644 index 00000000..57d5c292 --- /dev/null +++ b/docs/community/additional-resources.md @@ -0,0 +1,28 @@ +--- +layout: "default" +title: "Additional Resources" +nav_order: 11 +parent: "Smart Control Project Documentation" +--- + +# Additional Resources + +- **Gin Configuration Guide**: + + - Learn more about Gin configurations at [Gin Config Documentation](https://github.com/google/gin-config). + +- **TF-Agents Documentation**: + + - Explore TF-Agents for reinforcement learning environments and agents at [TF-Agents](https://www.tensorflow.org/agents). + +- **Energy Cost Modeling**: + + - Research energy cost and carbon emission modeling for deeper understanding. + +- **Reinforcement Learning Concepts**: + + - Review reinforcement learning fundamentals to effectively develop and test agents. + +--- + +[Back to Home](../index.md) diff --git a/docs/community/assets/visualization-module-images/action_plot.png b/docs/community/assets/visualization-module-images/action_plot.png new file mode 100644 index 00000000..9ce4ef0e Binary files /dev/null and b/docs/community/assets/visualization-module-images/action_plot.png differ diff --git a/docs/community/assets/visualization-module-images/cum_reward_plot.png b/docs/community/assets/visualization-module-images/cum_reward_plot.png new file mode 100644 index 00000000..03795aab Binary files /dev/null and b/docs/community/assets/visualization-module-images/cum_reward_plot.png differ diff --git a/docs/community/assets/visualization-module-images/reward_plot.png b/docs/community/assets/visualization-module-images/reward_plot.png new file mode 100644 index 00000000..333d2fd5 Binary files /dev/null and b/docs/community/assets/visualization-module-images/reward_plot.png differ diff --git a/docs/community/base-reward-function.md b/docs/community/base-reward-function.md new file mode 100644 index 00000000..b071ab7b --- /dev/null +++ b/docs/community/base-reward-function.md @@ -0,0 +1,33 @@ +--- +layout: "default" +title: "BaseSetpointEnergyCarbonRewardFunction" +nav_order: 1 +parent: "Reward Functions" +grand_parent: "Smart Control Project Documentation" +--- + +# BaseSetpointEnergyCarbonRewardFunction + +**Purpose**: Provides a base class for reward functions that consider productivity, energy cost, and carbon emissions. + +## Key Attributes + +- `max_productivity_personhour_usd`: Maximum productivity per person-hour in USD. +- `productivity_midpoint_delta`: Temperature difference from setpoint at which productivity is half of the maximum. +- `productivity_decay_stiffness`: Controls the slope of the productivity decay curve. + +## Key Methods + +- `__init__(...)`: Initializes the reward function with productivity parameters. +- `compute_reward(energy_reward_info)`: Abstract method to compute the reward; to be implemented by subclasses. 
+- `_sum_zone_productivities(energy_reward_info)`: Calculates cumulative productivity across all zones. +- `_get_zone_productivity_reward(...)`: Computes productivity reward for a single zone based on temperature. +- `_get_delta_time_sec(energy_reward_info)`: Calculates the time interval in seconds. +- `_sum_electricity_energy_rate(energy_reward_info)`: Sums up electrical energy rates from devices. +- `_sum_natural_gas_energy_rate(energy_reward_info)`: Sums up natural gas energy rates from devices. + +--- + +[Back to Reward Functions](reward-functions.md) + +[Back to Home](../index.md) diff --git a/docs/community/best-practices.md b/docs/community/best-practices.md new file mode 100644 index 00000000..62399b11 --- /dev/null +++ b/docs/community/best-practices.md @@ -0,0 +1,42 @@ +--- +layout: "default" +title: "Best Practices" +nav_order: 10 +parent: "Smart Control Project Documentation" +--- + +# Best Practices + +- **Modular Design**: + + - Keep modules and classes focused on single responsibilities. + - Facilitate reuse and maintainability. + +- **Error Handling**: + + - Use try-except blocks where exceptions might occur. + - Provide meaningful error messages and log exceptions. + +- **Logging**: + + - Use the `logging` module instead of print statements. + - Set appropriate logging levels (`DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`). + +- **Documentation**: + + - Keep documentation up to date with code changes. + - Use comments to explain complex logic or decisions. + +- **Performance Optimization**: + + - Profile code to identify bottlenecks. + - Optimize algorithms and data structures where necessary. + +- **Configuration Management**: + + - Use configuration files (e.g., Gin) to manage parameters. + - Avoid hardcoding values in the codebase. + +--- + +[Back to Home](../index.md) diff --git a/docs/community/building-simulation.md b/docs/community/building-simulation.md new file mode 100644 index 00000000..0397e324 --- /dev/null +++ b/docs/community/building-simulation.md @@ -0,0 +1,36 @@ +--- +layout: "default" +title: "Building Simulation" +nav_order: 1 +parent: "Simulation Components" +grand_parent: "Smart Control Project Documentation" +--- + +# Building Simulation + +**Purpose**: Represents the thermal and physical properties of the building, simulating how it responds to HVAC inputs and environmental conditions. + +## Key Classes and Components + +- **`FloorPlanBasedBuilding`**: + + - Simulates the building's structure based on a floor plan and zone mappings. + + - **Attributes**: + + - `cv_size_cm`: Control volume size in centimeters. + - `floor_height_cm`: Height of each floor. + - `initial_temp`: Initial uniform interior temperature. + - `inside_air_properties`: Thermal properties of the air inside the building. + - `inside_wall_properties`: Thermal properties of the interior walls. + - `building_exterior_properties`: Thermal properties of the exterior building. + - `floor_plan_filepath`: Path to the floor plan file. + - `zone_map_filepath`: Path to the zone mapping file. + - `convection_simulator`: Simulates heat convection within the building. + - `reset_temp_values`: Function to reset temperature values. 
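+
+In practice these attributes are usually wired up through Gin rather than instantiated by hand; the Configuration page shows the binding `building = @sim/FloorPlanBasedBuilding()` being passed into the simulator. The snippet below is a hypothetical, illustrative set of bindings only (the shipped `.gin` files are the reference); the `%` macros on the right are the parameters described on the Configuration page.
+
+```
+# Hypothetical bindings for illustration -- see the project's .gin files for the
+# authoritative configuration.
+sim/FloorPlanBasedBuilding.cv_size_cm = %control_volume_cm
+sim/FloorPlanBasedBuilding.floor_height_cm = %floor_height_cm
+sim/FloorPlanBasedBuilding.initial_temp = %initial_temp
+sim/FloorPlanBasedBuilding.floor_plan_filepath = %floor_plan_filepath
+sim/FloorPlanBasedBuilding.zone_map_filepath = %zone_map_filepath
+```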
+ +--- + +[Back to Simulation Components](simulation-components.md) + +[Back to Home](../index.md) diff --git a/docs/community/configuration.md b/docs/community/configuration.md new file mode 100644 index 00000000..c5da8c71 --- /dev/null +++ b/docs/community/configuration.md @@ -0,0 +1,124 @@ +--- +layout: "default" +title: "Configuration" +nav_order: 6 +parent: "Smart Control Project Documentation" +--- + +# Configuration + +The project uses Gin configuration files (`*.gin`) to manage simulation settings, reward function parameters, and environment configurations. + +## 1. Gin Configuration Files + +Gin is a lightweight configuration framework for Python that allows parameter bindings via configuration files or command-line arguments. + +### Structure of Gin Files + +- **Parameter Definitions**: Define values for various parameters used throughout the simulation. +- **Function and Class Bindings**: Bind parameters to specific functions and classes. +- **References**: Use `@` to reference functions or classes, and `%` to reference parameters. + +### Example from a Gin File + +To illustrate, here is an example of a Gin configuration: + +``` +# paths +controller_reader.ProtoReader.input_dir = @get_histogram_path() +floor_plan_filepath = @get_zone_path() +zone_map_filepath = @get_zone_path() +metrics_path = @get_metrics_path() +``` + +## 2. Key Configuration Parameters + +### Simulation Parameters + +- **Weather Conditions**: + - `convection_coefficient`: Coefficient for heat convection between the building and the environment. + - `ambient_high_temp`, `ambient_low_temp`: High and low ambient temperatures for sinusoidal temperature variation. + +- **Building Properties**: + - `control_volume_cm`: Size of the control volume in centimeters. + - `floor_height_cm`: Height of each floor. + - `initial_temp`: Initial temperature inside the building. + - `exterior_cv_conductivity`, `exterior_cv_density`, `exterior_cv_heat_capacity`: Thermal properties of the exterior building. + - `interior_wall_cv_conductivity`, `interior_wall_cv_density`, `interior_wall_cv_heat_capacity`: Thermal properties of the interior walls. + - `interior_cv_conductivity`, `interior_cv_density`, `interior_cv_heat_capacity`: Thermal properties of the interior air. + +- **HVAC Settings**: + - `water_pump_differential_head`, `water_pump_efficiency`: Parameters for the water pump. + - `reheat_water_setpoint`: Setpoint temperature for reheating water. + - `boiler_heating_rate`, `boiler_cooling_rate`: Heating and cooling rates for the boiler. + - `fan_differential_pressure`, `fan_efficiency`: Parameters for the HVAC fan. + - `air_handler_heating_setpoint`, `air_handler_cooling_setpoint`: Temperature setpoints for the air handler. + - `air_handler_recirculation_ratio`: Recirculation ratio for the air handler. + - `vav_max_air_flowrate`, `vav_reheat_water_flowrate`: Maximum flow rates for VAV boxes. + +- **Occupancy Model**: + - `morning_start_hour`, `evening_start_hour`: Hours defining the occupancy schedule. + - `heating_setpoint_day`, `cooling_setpoint_day`: Setpoints during the day. + - `heating_setpoint_night`, `cooling_setpoint_night`: Setpoints during the night. + - `work_occupancy`, `nonwork_occupancy`: Occupancy levels during work and non-work hours. + - `earliest_expected_arrival_hour`, `latest_expected_arrival_hour`: Arrival times. + - `earliest_expected_departure_hour`, `latest_expected_departure_hour`: Departure times. + +- **Time Settings**: + - `time_step_sec`: Simulation time step in seconds. 
+ - `start_timestamp`: Start time of the simulation. + - `time_zone`: Time zone for the simulation. + +### Reward Function Parameters + +- `max_productivity_personhour_usd`, `min_productivity_personhour_usd`: Productivity per person-hour. +- `productivity_midpoint_delta`, `productivity_decay_stiffness`: Parameters for productivity decay curve. +- `max_electricity_rate`, `max_natural_gas_rate`: Maximum energy rates for normalization. +- `productivity_weight`, `energy_cost_weight`, `carbon_emission_weight`: Weights for reward components. + +### Action Normalization Parameters + +- **Supply Water Setpoint**: + - `min_normalized_value`, `max_normalized_value`: Normalized action value range. + - `min_native_value`, `max_native_value`: Native action value range (e.g., temperature in Kelvin). + +- **Supply Air Heating Temperature Setpoint**: + - Similar normalization parameters as above. + +### Observation Normalization Parameters + +- **Per-Measurement Normalizers**: + - For each measurement (e.g., `building_air_static_pressure_sensor`, `cooling_percentage_command`), define: + - `field_id`: Identifier for the field. + - `sample_mean`: Mean value used for normalization. + - `sample_variance`: Variance used for normalization. + +### Environment Parameters + +- `discount_factor`: Discount factor for future rewards. +- `num_days_in_episode`: Number of days in an episode. +- `metrics_reporting_interval`: Interval for reporting metrics. +- `label`: Label for the simulation or environment. +- `num_hod_features`, `num_dow_features`: Number of hour-of-day and day-of-week features. + +### Bindings and References + +Bind classes and functions to configured parameters, for example: + +``` +sim_building/TFSimulator: + building = @sim/FloorPlanBasedBuilding() + hvac = @sim/FloorPlanBasedHvac() + weather_controller = %weather_controller + time_step_sec = %time_step_sec + convergence_threshold = %convergence_threshold + iteration_limit = %iteration_limit + iteration_warning = %iteration_warning + start_timestamp = @sim/to_timestamp() +``` + +Reference parameters using `%` and functions or classes using `@`. + +--- + +[Back to Home](../index.md) diff --git a/docs/community/ddpg.md b/docs/community/ddpg.md new file mode 100644 index 00000000..20ed057d --- /dev/null +++ b/docs/community/ddpg.md @@ -0,0 +1,24 @@ +--- +layout: "default" +title: "Deep Deterministic Policy Gradient" +nav_order: 3 +parent: "Learning Algorithms" +grand_parent: "Smart Control Project Documentation" +--- + +# Deep Deterministic Policy Gradient (DDPG) +Implementation of this algorithm can be found in the `DDPG_Demo.ipynb` notebook. + +--- + +Deep Deterministic Policy Gradient (DDPG) is a model-free reinforcement learning algorithm designed for continuous action spaces. It combines the actor-critic framework with deterministic policy gradients. Key features of DDPG include: + +- **Actor-Critic Architecture**: Utilizes an actor network to select actions and a critic network to evaluate the Q-values of state-action pairs. +- **Off-Policy Learning**: Learns using a replay buffer to sample past experiences, improving stability and efficiency. +- **Target Networks**: Employs slowly updated target networks to stabilize training. +- **Exploration**: Uses noise (e.g., Ornstein-Uhlenbeck process) for exploration in continuous spaces. 
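+
+As a rough illustration of how these pieces fit together, the sketch below assembles a DDPG agent directly from TF-Agents. It is a minimal sketch only, not the configuration used in `DDPG_Demo.ipynb`: the layer sizes and hyperparameters are placeholders, and `time_step_spec` / `action_spec` are assumed to come from the Smart Control environment (e.g. `tf_env.time_step_spec()` and `tf_env.action_spec()`).
+
+```python
+import tensorflow as tf
+from tf_agents.agents.ddpg import actor_network, critic_network, ddpg_agent
+
+# Actor maps observations to actions; critic scores (observation, action) pairs.
+actor_net = actor_network.ActorNetwork(
+    time_step_spec.observation, action_spec, fc_layer_params=(128, 128))
+critic_net = critic_network.CriticNetwork(
+    (time_step_spec.observation, action_spec), joint_fc_layer_params=(128, 64))
+
+agent = ddpg_agent.DdpgAgent(
+    time_step_spec,
+    action_spec,
+    actor_network=actor_net,
+    critic_network=critic_net,
+    actor_optimizer=tf.keras.optimizers.Adam(1e-4),
+    critic_optimizer=tf.keras.optimizers.Adam(1e-3),
+    ou_stddev=0.2,            # scale of the Ornstein-Uhlenbeck exploration noise
+    ou_damping=0.15,          # mean-reversion rate of the OU process
+    target_update_tau=0.005,  # soft-update rate for the target networks
+    gamma=0.99,               # discount factor
+)
+agent.initialize()
+```
+
+The replay buffer and training loop (off-policy learning from sampled past experience) follow the same pattern as the other agents in the Reinforcement Learning Module.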
+ + +[Back to Learning Algorithms](learning-algorithms.md) + +[Back to Home](../index.md) \ No newline at end of file diff --git a/docs/community/electricity-energy-cost.md b/docs/community/electricity-energy-cost.md new file mode 100644 index 00000000..351a3e7a --- /dev/null +++ b/docs/community/electricity-energy-cost.md @@ -0,0 +1,35 @@ +--- +layout: "default" +title: "ElectricityEnergyCost" +nav_order: 4 +parent: "Reward Functions" +grand_parent: "Smart Control Project Documentation" +--- + +# ElectricityEnergyCost + +**Purpose**: Models the cost and carbon emissions associated with electricity consumption. + +## Key Attributes + +- `weekday_energy_prices`: Energy prices for weekdays by hour. +- `weekend_energy_prices`: Energy prices for weekends by hour. +- `carbon_emission_rates`: Carbon emission rates by hour. + +## Key Methods + +- `__init__(...)`: Initializes the energy cost model with price and emission schedules. +- `cost(start_time, end_time, energy_rate)`: Calculates the cost of electricity consumed over a time interval. +- `carbon(start_time, end_time, energy_rate)`: Calculates the carbon emissions from electricity consumption over a time interval. + +## Calculation Logic + +- Determines the appropriate energy price and carbon emission rate based on the time and day. +- Converts energy rates to costs and emissions using provided schedules. +- Supports variable pricing and emission factors throughout the day. + +--- + +[Back to Reward Functions](reward-functions.md) + +[Back to Home](../index.md) diff --git a/docs/community/environment.md b/docs/community/environment.md new file mode 100644 index 00000000..5149e414 --- /dev/null +++ b/docs/community/environment.md @@ -0,0 +1,139 @@ +--- +layout: "default" +title: "Environment Module" +nav_order: 2 +parent: "Smart Control Project Documentation" +--- + +# Environment Module + +The `environment` module provides the reinforcement learning environment where the agent interacts with the building simulation to control HVAC systems. + +## `environment/environment.py` + +**Purpose**: Implements a controllable building RL environment compatible with TF-Agents, allowing agents to control various setpoints with the goal of making the HVAC system more efficient. + +### Key Classes and Functions + +- **`Environment`** (inherits from `py_environment.PyEnvironment`): + + - **Attributes**: + + - **Building and Simulation Components**: + + - `_building`: Instance of `BaseBuilding`, representing the simulated building environment. + - `_time_zone`: Time zone of the building/environment. + - `_start_timestamp` and `_end_timestamp`: Start and end times of the episode. + - `_step_interval`: Time interval between environment steps. + - `_num_timesteps_in_episode`: Number of timesteps in an episode. + - `_observation_request`: Template for requesting observations from the building. + + - **Agent Interaction Components**: + + - `_action_spec`: Specification of the actions that can be taken. + - `_observation_spec`: Specification of the observations. + - `_action_normalizers`: Mapping from action names to their normalizers. + - `_action_names`: List of action field names. + - `_field_names`: List of observation field names. + - `_observation_normalizer`: Normalizes observations received from the building. + - `_default_policy_values`: Default actions in the policy. + + - **Reward and Metrics**: + + - `_reward_function`: Instance of `BaseRewardFunction` used to compute rewards. + - `_discount_factor`: Discount factor for future rewards. 
+ - `_metrics`: Stores metrics for analysis and visualization. + - `_metrics_reporting_interval`: Interval for reporting metrics. + - `_summary_writer`: Writes summary data for visualization tools like TensorBoard. + + - **Episode and Step Tracking**: + + - `_episode_ended`: Boolean flag indicating if the episode has ended. + - `_step_count`: Number of steps taken in the current episode. + - `_global_step_count`: Total number of steps taken across episodes. + - `_episode_count`: Number of episodes completed. + - `_episode_cumulative_reward`: Cumulative reward for the current episode. + + - **Methods**: + + - `__init__(...)`: Initializes the environment with the specified parameters and configurations. + + - **Environment Lifecycle Methods**: + + - `_reset()`: Resets the environment to its initial state, preparing for a new episode. + - `_step(action)`: Executes one time step in the environment given an action from the agent. + - `_has_episode_ended(last_timestamp)`: Checks if the episode has ended based on time or steps. + + - **Agent Interaction Methods**: + + - `action_spec()`: Returns the specification of the actions that can be taken. + - `observation_spec()`: Returns the specification of the observations. + - `_get_observation()`: Retrieves and processes observations from the building, including normalizing and handling missing data. + - `_create_action_request(action_array)`: Converts normalized agent actions into native action values for the building. + + - **Reward Calculation Methods**: + + - `_get_reward()`: Computes the reward for the last action taken based on the reward function. + - `_write_summary_reward_info_metrics(reward_info)`: Writes reward input metrics into summary logs. + - `_write_summary_reward_response_metrics(reward_response)`: Writes reward output metrics into summary logs. + - `_commit_reward_metrics()`: Aggregates and writes reward metrics, and resets accumulators. + + - **Utility Methods**: + + - `_get_action_spec_and_normalizers(...)`: Defines the action space and normalizers based on the building devices and action configurations. + - `_get_observation_spec(...)`: Defines the observation space based on the building devices and observation configurations. + - `_format_action(action, action_names)`: Reformat actions if necessary (extension point for subclasses). + - `render(mode)`: (Not implemented) Intended for rendering the environment state. + + - **Properties**: + + - `steps_per_episode`: Number of steps in an episode. + - `start_timestamp`: Start time of the episode. + - `end_timestamp`: End time of the episode. + - `default_policy_values`: Default actions in the policy. + - `label`: Label for the environment or episode. + - `current_simulation_timestamp`: Current simulation time. + +## Environment Workflow + +1. **Initialization**: + + - The environment is initialized with specified parameters, including the building simulation, reward function, observation normalizer, and action configurations. + - Action and observation specifications are set up based on devices and configurations, involving action normalizers and mappings. + - Auxiliary features such as time of day and day of week are prepared. + +2. **Resetting the Environment**: + + - The `_reset()` method resets the building simulation and environment metrics. + - Episode counters and timestamps are initialized. + - The initial observation is generated by calling `_get_observation()`. + +3. 
**Stepping through the Environment**: + + - **Action Application**: + + - The `_step(action)` method processes the action from the agent. + - Actions are normalized and converted into native values using `_create_action_request(action_array)`. + - The action request is sent to the building simulation via `self.building.request_action(action_request)`. + - The environment handles action responses, including logging and handling rejections. + + - **Observation Retrieval**: + + - After the action is applied, `_get_observation()` retrieves observations from the building. + - Observations are normalized using the observation normalizer. + - Time features (hour of day, day of week) and occupancy features are added to the observation. + - Missing or invalid observations are handled using past data. + + - **Reward Calculation**: + + - The reward is computed using `_get_reward()`, which invokes the reward function's `compute_reward()` method. + - Reward metrics are logged and written to summary writers if configured. + + - **Episode Termination**: + + - The environment checks if the episode has ended using `_has_episode_ended(last_timestamp)`. + - If the episode has ended, a terminal time step is returned, and the environment is reset for the next episode. + +--- + +[Back to Home](../index.md) diff --git a/docs/community/glossary.md b/docs/community/glossary.md new file mode 100644 index 00000000..1a837fc0 --- /dev/null +++ b/docs/community/glossary.md @@ -0,0 +1,24 @@ +--- +layout: "default" +title: "Glossary" +nav_order: 12 +parent: "Smart Control Project Documentation" +--- + +# Glossary + +- **HVAC**: Heating, Ventilation, and Air Conditioning systems. +- **RL**: Reinforcement Learning. +- **TF-Agents**: A library for reinforcement learning in TensorFlow. +- **Setpoint**: The desired temperature or condition that a control system aims to maintain. +- **Regret Function**: A type of reward function that measures the difference between the actual performance and the optimal performance. +- **Gin Configuration**: A lightweight configuration framework for Python. +- **Productivity Decay Curve**: A function describing how productivity decreases as conditions deviate from optimal setpoints. +- **Energy Cost Models**: Models that calculate the cost and carbon emissions associated with energy consumption. +- **Normalization**: The process of scaling data to a standard range or distribution. +- **VAV**: Variable Air Volume systems used in HVAC to control the amount of air flow to different areas. +- **Control Volume**: A defined region in space through which fluid may flow in and out, used in simulations to model physical systems. + +--- + +[Back to Home](../index.md) diff --git a/docs/community/hvac-systems.md b/docs/community/hvac-systems.md new file mode 100644 index 00000000..3984e32d --- /dev/null +++ b/docs/community/hvac-systems.md @@ -0,0 +1,57 @@ +--- +layout: "default" +title: "HVAC Systems" +nav_order: 2 +parent: "Simulation Components" +grand_parent: "Smart Control Project Documentation" +--- + +# HVAC Systems + +**Purpose**: Models the heating, ventilation, and air conditioning systems of the building, including air handlers, boilers, and variable air volume (VAV) boxes. + +## Key Classes and Components + +- **`AirHandler`**: + + - Controls the air flow and temperature in the building. + + - **Attributes**: + + - `recirculation`: Percentage of fresh air in the recirculation. + - `heating_air_temp_setpoint`: Setpoint for heating air temperature. 
+ - `cooling_air_temp_setpoint`: Setpoint for cooling air temperature. + - `fan_differential_pressure`: Pressure difference across the fan. + - `fan_efficiency`: Efficiency of the fan. + - `max_air_flow_rate`: Maximum air flow rate. + - `sim_weather_controller`: Weather controller for ambient conditions. + +- **`Boiler`**: + + - Provides heating by controlling water temperature. + + - **Attributes**: + + - `reheat_water_setpoint`: Setpoint for reheat water temperature. + - `water_pump_differential_head`: Pressure difference across the water pump. + - `water_pump_efficiency`: Efficiency of the water pump. + - `heating_rate`: Rate at which the boiler can increase temperature. + - `cooling_rate`: Rate at which the boiler can decrease temperature. + +- **`FloorPlanBasedHvac`**: + + - Integrates the HVAC components into the building simulation. + + - **Attributes**: + + - `air_handler`: Instance of `AirHandler`. + - `boiler`: Instance of `Boiler`. + - `schedule`: Setpoint schedule for HVAC operation. + - `vav_max_air_flow_rate`: Maximum air flow rate for VAV boxes. + - `vav_reheat_max_water_flow_rate`: Maximum water flow rate for reheating. + +--- + +[Back to Simulation Components](simulation-components.md) + +[Back to Home](../index.md) diff --git a/docs/community/learning-algorithms.md b/docs/community/learning-algorithms.md new file mode 100644 index 00000000..b3f251d5 --- /dev/null +++ b/docs/community/learning-algorithms.md @@ -0,0 +1,14 @@ +--- +layout: "default" +title: "Learning Algorithms" +nav_order: 5 +parent: "Smart Control Project Documentation" +has_children: true +--- + + +# Learning Algorithms + +One of the goals of building a simulation environment like the one presented here is to use it to train learning algorithms to optimize control of that building. On that note, we provide implementation of a few learning algorithms implemented to explore this possibility. + +---- \ No newline at end of file diff --git a/docs/community/mcts.md b/docs/community/mcts.md new file mode 100644 index 00000000..166f4062 --- /dev/null +++ b/docs/community/mcts.md @@ -0,0 +1,37 @@ +--- +layout: "default" +title: "Monte Carlo Tree Search" +nav_order: 1 +parent: "Learning Algorithms" +grand_parent: "Smart Control Project Documentation" +--- + +# Monte Carlo Tree Search + +MCTS is not technically an RL algorithm, but can be attempted and leveraged as a potential baseline performance metric. + +In the proposed setup, each node in the MCTS tree is identified by a given sequence of actions taken since the episode's start. Nodes in different depths of the tree are a fixed number of `expand_steps` apart, such that the MCTS agent can only decide on a new action every `expand_steps` number of steps. + +Implementing MCTS in this setup comes with a few challenges, which we highlight here: + +## Slow Rollouts and Parallelism + +Typically, a rollout in an MCTS consists of choosing some default policy (e.g. random moves in a game), and then playing the episode to completion. In this case, doing this is very slow, as running an environment simulation is time consuming. Because of this, we only perform rollouts for a smaller fixed number of steps `rollout_steps` after the corresponding node's timestamp. + +To address this problem, we also adapt the implementation to support parallelism using `multiprocessing`. This was favored instead of multithreading due to the GIL lock. 
Because we are not dealing with async operations, multithreading should not provide a performance benefit, as only one thread is allowed to execute code at a time. Multiprocessing, however, allows multiple distinct Python processes, which should then yield a speedup.
+The design of `SbsimMonteCarloTreeSearchNode` and `SbSimMonteCarloTreeSearch` was adapted in order to support `multiprocessing`. For example, since the node objects can't be serialized and thus can't be passed into workers, methods such as `SbsimMonteCarloTreeSearchNode.run_rollout` are made static to allow workers to execute them independently.
+
+## Node Selection
+
+As described above, our implementation performs many concurrent rollouts. To choose which nodes to expand, the method `SbSimMonteCarloTreeSearch.get_nodes_for_expansion` is used. This method chooses the `num_nodes` nodes to expand by adding a node if it is not fully expanded, and then recursively calling itself on the node's children, in order from highest to lowest score. The constant `c_param` controls the balance between exploration and exploitation.
+
+## Node Evaluation
+
+To evaluate a node, we consider its performance in comparison to a baseline schedule policy. A node's score is given by the sum of the differences (when compared to the baseline policy return) of the rollout returns of all its children. To keep things fair, this is then normalized by the sum of the number of hours elapsed until the rollout end for all children. In summary, this means that each node is scored by the *average rollout return improvement per hour across all its children*.
+
+---
+
+[Back to Learning Algorithms](learning-algorithms.md)
+
+[Back to Home](../index.md)
\ No newline at end of file
diff --git a/docs/community/metrics-interpretation.md b/docs/community/metrics-interpretation.md
new file mode 100644
index 00000000..b35c5a70
--- /dev/null
+++ b/docs/community/metrics-interpretation.md
@@ -0,0 +1,36 @@
+---
+layout: "default"
+title: "Metrics Interpretation"
+nav_order: 8
+parent: "Smart Control Project Documentation"
+---
+
+# Understanding the Logs in the Metrics Directory
+
+The metrics files (observation response, action response, reward infos, zone info, and device info) are saved in the `metrics` path you specify in the SAC_Demo notebook, organized into directories corresponding to different runs, with each metric file covering a single day. They can be read using the functions in `smart_control.controller_reader`; start by creating a `ProtoReader` with the path to the metrics directory. This section explains the components of the different metric logs.
+
+## Observation Responses
+
+For every five-minute interval (i.e. 300-second differences between timestamps), this log first lists all observation requests, i.e. which measurements were requested from which sensor (identified by the device id).
+Then, it lists each observation response, which contains the timestamp, the value of the measurement, and the validity of the measurement (true or false), as well as the corresponding request.
+You can read the log file using `ProtoReader.read_observation_responses(start_timestamp, end_timestamp)`. For example:
+
+```python
+reader = controller_reader.ProtoReader('dataset/SB1/19/')
+reader.read_observation_responses(pd.Timestamp('2019-03-16T00:00:00'), pd.Timestamp('2019-03-17T00:00:00'))
+```
+
+Each timestamp in the result is of type `protobuf.Timestamp`, which can be converted to an interpretable `pd.Timestamp` object using the `protos_to_pandas_timestamp()` function in `smart_control.utils`.
+
+## Action Responses
+Requests entail the supply air temperature at two different devices and the supply water temperature at one device, corresponding to the action values. Responses detail how the given device is responding to the requested new value.
+
+## Reward Responses
+Again split into 5-minute intervals, this log provides the breakdown of the reward value at each time step.
+
+## Reward Infos
+These are organized in 5-minute intervals, with each entry organized by zone ID corresponding to the zone info file. Each zone contains the measurements providing the values for the described metrics, corresponding to the agent_id (e.g. "baseline_policy") and scenario_id (e.g. "baseline_collect").
+
+## Device Info
+This provides information about each device, identified by a unique device ID and code as well as its namespace and type. Each device has a specific set of observable fields and action fields that are listed underneath its identifiers.
+
+## Zone Info
+Each entry represents a specific zone identified by zone_id and zone_description, inside a building identified by building_id on the given floor. It also contains a list of IDs for the devices present inside the given zone.
+
+[Back to Home](../index.md)
\ No newline at end of file
diff --git a/docs/community/natural-gas-energy-cost.md b/docs/community/natural-gas-energy-cost.md
new file mode 100644
index 00000000..c145215e
--- /dev/null
+++ b/docs/community/natural-gas-energy-cost.md
@@ -0,0 +1,33 @@
+---
+layout: "default"
+title: "NaturalGasEnergyCost"
+nav_order: 5
+parent: "Reward Functions"
+grand_parent: "Smart Control Project Documentation"
+---
+
+# NaturalGasEnergyCost
+
+**Purpose**: Models the cost and carbon emissions associated with natural gas consumption.
+
+## Key Attributes
+
+- `gas_price_by_month`: Gas prices by month.
+
+## Key Methods
+
+- `__init__(gas_price_by_month)`: Initializes the energy cost model with monthly gas prices.
+- `cost(start_time, end_time, energy_rate)`: Calculates the cost of natural gas consumed over a time interval.
+- `carbon(start_time, end_time, energy_rate)`: Calculates the carbon emissions from natural gas consumption over a time interval.
+
+## Calculation Logic
+
+- Uses monthly gas prices to determine the cost.
+- Converts energy rates to costs and emissions based on standard conversion factors.
+- Accounts for natural gas being used primarily for heating.
+
+---
+
+[Back to Reward Functions](reward-functions.md)
+
+[Back to Home](../index.md)
diff --git a/docs/community/occupancy-models.md b/docs/community/occupancy-models.md
new file mode 100644
index 00000000..6bb6b724
--- /dev/null
+++ b/docs/community/occupancy-models.md
@@ -0,0 +1,33 @@
+---
+layout: "default"
+title: "Occupancy Models"
+nav_order: 4
+parent: "Simulation Components"
+grand_parent: "Smart Control Project Documentation"
+---
+
+# Occupancy Models
+
+**Purpose**: Simulates occupancy patterns within the building, affecting internal heat gains and productivity.
+
+## Key Classes and Components
+
+- **`RandomizedArrivalDepartureOccupancy`**:
+
+  - Models occupancy with randomized arrival and departure times.
+ + - **Attributes**: + + - `zone_assignment`: Occupancy level assigned to zones. + - `earliest_expected_arrival_hour`: Earliest possible arrival time. + - `latest_expected_arrival_hour`: Latest possible arrival time. + - `earliest_expected_departure_hour`: Earliest possible departure time. + - `latest_expected_departure_hour`: Latest possible departure time. + - `time_step_sec`: Time step in seconds. + - `time_zone`: Time zone for the simulation. + +--- + +[Back to Simulation Components](simulation-components.md) + +[Back to Home](../index.md) diff --git a/docs/community/regret-reward-function.md b/docs/community/regret-reward-function.md new file mode 100644 index 00000000..24a08740 --- /dev/null +++ b/docs/community/regret-reward-function.md @@ -0,0 +1,41 @@ +--- +layout: "default" +title: "SetpointEnergyCarbonRegretFunction" +nav_order: 3 +parent: "Reward Functions" +grand_parent: "Smart Control Project Documentation" +--- + +# SetpointEnergyCarbonRegretFunction + +**Purpose**: Implements a reward function that calculates regret based on deviations from optimal productivity, energy cost, and carbon emissions. + +## Key Attributes + +- Inherits from `BaseSetpointEnergyCarbonRewardFunction`. +- `max_productivity_personhour_usd`: Maximum productivity per person-hour in USD. +- `min_productivity_personhour_usd`: Minimum productivity per person-hour in USD. +- `max_electricity_rate`: Maximum electricity energy rate for normalization. +- `max_natural_gas_rate`: Maximum natural gas energy rate for normalization. +- `productivity_weight`: Weight for productivity in the regret calculation. +- `energy_cost_weight`: Weight for energy cost in the regret calculation. +- `carbon_emission_weight`: Weight for carbon emissions in the regret calculation. + +## Key Methods + +- `__init__(...)`: Initializes the reward function with parameters for regret calculation. +- `compute_reward(energy_reward_info)`: Computes the normalized regret based on productivity, energy cost, and carbon emissions. + +## Regret Calculation Logic + +- Determines the maximum and minimum possible productivity. +- Calculates the normalized productivity regret. +- Normalizes the energy costs and carbon emissions against their maximum values. +- Combines the normalized components using specified weights. +- Produces a final reward value representing the regret. + +--- + +[Back to Reward Functions](reward-functions.md) + +[Back to Home](../index.md) diff --git a/docs/community/reinforcement-learning-module.md b/docs/community/reinforcement-learning-module.md new file mode 100644 index 00000000..ef3c3179 --- /dev/null +++ b/docs/community/reinforcement-learning-module.md @@ -0,0 +1,25 @@ +--- +layout: "default" +title: "Reinforcement Learning Module" +nav_order: 10 +parent: "Smart Control Project Documentation" +--- +# Reinforcement Learning Module +## Structure +The reinforcement learning module is meant to have the code necessary to train and evaluate RL agents. 
It has the +following structure: +``` +smart_control/reinforcement_learning/ +├── agents/ # RL agent implementations (SAC, TD3, DDPG) +│ └── networks/ # Neural networks for agents +├── observers/ # Monitoring and data collection during training/evaluation +├── policies/ # Policy implementations (including baseline policies) +├── replay_buffer/ # Experience replay buffer management +├── scripts/ # Training and evaluation scripts +├── utils/ # Utility functions and helpers +└── visualization/ # Visualization tools for analysis +``` + +## Tutorials + +Check out this [tutorial](https://youtu.be/RbpkKciw0IQ) to help get started with the RL module. diff --git a/docs/community/reinforcement-learning-module/agents.md b/docs/community/reinforcement-learning-module/agents.md new file mode 100644 index 00000000..6e49d4f4 --- /dev/null +++ b/docs/community/reinforcement-learning-module/agents.md @@ -0,0 +1,130 @@ +--- +layout: "default" +title: "RL Agents" +nav_order: 1 +parent: "Reinforcement Learning Module" +grand_parent: "Smart Control Project Documentation" +--- + +# Reinforcement Learning Agents + +The RL agents module provides implementations of reinforcement learning algorithms tailored for building control applications. All agents follow the TF-Agents interface for consistency and interoperability. + +## Agent Interface + +All agents in the Smart Control framework implement the [`tf_agents.agents.tf_agent.TFAgent`](https://www.tensorflow.org/agents/api_docs/python/tf_agents/agents/TFAgent) interface, providing a consistent API for interacting with the models. + +## Agent Factory Functions + +The Smart Control framework provides factory functions for creating agents with sensible defaults: + +### SAC Agent + +```python +from smart_control.reinforcement_learning.agents.sac_agent import create_sac_agent + +# Create an SAC agent with default parameters +agent = create_sac_agent( + time_step_spec=time_step_spec, + action_spec=action_spec +) + +# Create an SAC agent with custom parameters +agent = create_sac_agent( + time_step_spec=time_step_spec, + action_spec=action_spec, + actor_fc_layers=(256, 256), + critic_obs_fc_layers=(256, 128), + critic_action_fc_layers=(256, 128), + critic_joint_fc_layers=(256, 128), + actor_learning_rate=3e-4, + critic_learning_rate=3e-4, + alpha_learning_rate=3e-4 +) +``` + +### TD3 Agent + +```python +from smart_control.reinforcement_learning.agents.td3_agent import create_td3_agent + +# Create a TD3 agent with default parameters +agent = create_td3_agent( + time_step_spec=time_step_spec, + action_spec=action_spec +) + +# Create a TD3 agent with custom parameters +agent = create_td3_agent( + time_step_spec=time_step_spec, + action_spec=action_spec, + actor_fc_layers=(256, 256), + critic_obs_fc_layers=(256, 128), + critic_action_fc_layers=(256, 128), + critic_joint_fc_layers=(256, 128) +) +``` + +### DDPG Agent + +```python +from smart_control.reinforcement_learning.agents.ddpg_agent import create_ddpg_agent + +# Create a DDPG agent with default parameters +agent = create_ddpg_agent( + time_step_spec=time_step_spec, + action_spec=action_spec +) + +# Create a DDPG agent with custom parameters +agent = create_ddpg_agent( + time_step_spec=time_step_spec, + action_spec=action_spec, + actor_fc_layers=(128, 128), + critic_obs_fc_layers=(128, 64), + critic_action_fc_layers=(128, 64), + critic_joint_fc_layers=(128, 64) +) +``` + +## Networks + +The networks used by each agent are kept in the `networks/` subdirectory in this module. 
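+
+As an illustration of the kind of code that lives in `networks/`, the sketch below builds an actor and a critic network with TF-Agents, similar in spirit to what the factory functions above construct. It is a hedged sketch rather than the project's actual network definitions: the layer sizes are illustrative, and `observation_spec` / `action_spec` are assumed to come from the environment (e.g. `tf_env.observation_spec()` and `tf_env.action_spec()`).
+
+```python
+from tf_agents.agents.ddpg.critic_network import CriticNetwork
+from tf_agents.networks.actor_distribution_network import ActorDistributionNetwork
+
+# Actor: maps observations to a distribution over (normalized) setpoint actions.
+actor_net = ActorDistributionNetwork(
+    observation_spec,
+    action_spec,
+    fc_layer_params=(256, 256),
+)
+
+# Critic: estimates the Q-value of an (observation, action) pair.
+critic_net = CriticNetwork(
+    (observation_spec, action_spec),
+    observation_fc_layer_params=(256, 128),
+    action_fc_layer_params=(256, 128),
+    joint_fc_layer_params=(256, 128),
+)
+```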
+
+## Adding a New Agent
+
+To experiment with different agents, you can create a new agent by adding a new agent file in the `agents/` directory,
+and any necessary networks in the `agents/networks/` directory. Follow the existing agents as a template,
+and ensure that your agent implements the
+[`tf_agents.agents.tf_agent.TFAgent`](https://www.tensorflow.org/agents/api_docs/python/tf_agents/agents/TFAgent) interface so that it works out of
+the box.
+
+After that, update any training script (e.g. `train.py`) to include the new agent, which simply means adding another branch to
+this part of the code:
+
+```python
+# train.py script
+...
+# Add your agent implementation here
+# Create agent based on type
+    logger.info(f"Creating {agent_type} agent")
+    if agent_type.lower() == 'sac':
+        logger.info("Creating SAC agent")
+        agent = create_sac_agent(time_step_spec=time_step_spec, action_spec=action_spec)
+    elif agent_type.lower() == 'td3':
+        logger.info("Creating TD3 agent")
+        agent = create_td3_agent(time_step_spec=time_step_spec, action_spec=action_spec)
+    elif agent_type.lower() == 'ddpg':
+        logger.info("Creating DDPG agent")
+        agent = create_ddpg_agent(time_step_spec=time_step_spec, action_spec=action_spec)
+    else:
+        logger.exception(f"Unsupported agent type: {agent_type}")
+        raise ValueError(f"Unsupported agent type: {agent_type}")
+...
+```
+
+---
+
+[Back to Reinforcement Learning](../reinforcement-learning-module.md)
+
+[Back to Home](../../index.md)
diff --git a/docs/community/reinforcement-learning-module/observers.md b/docs/community/reinforcement-learning-module/observers.md
new file mode 100644
index 00000000..b5c2c178
--- /dev/null
+++ b/docs/community/reinforcement-learning-module/observers.md
@@ -0,0 +1,81 @@
+---
+layout: "default"
+title: "Observers"
+nav_order: 5
+parent: "Reinforcement Learning Module"
+grand_parent: "Smart Control Project Documentation"
+---
+
+# Observers
+
+Observers monitor agent behavior during training and evaluation by processing trajectory data at each step. Observers can be used to log information, visualize agent performance, or save data for later analysis.
+They are designed to be modular, following the Observer design pattern, so you can easily add or remove observers on your agent/actor depending
+on your needs.
+
+## Observer Interface
+
+All observers implement the `Observer` abstract base class, defined in `observers/base_observer.py`:
+
+```python
+class Observer(abc.ABC):
+    @abc.abstractmethod
+    def __call__(self, trajectory: trajectory_lib.Trajectory) -> None:
+        """Process a trajectory."""
+        pass
+
+    def reset(self) -> None:
+        """Reset observer state between episodes."""
+        pass
+
+    def close(self) -> None:
+        """Clean up resources."""
+        pass
+```
+
+## Available Observers
+
+The following observers are implemented and available.
+
+### PrintStatusObserver
+
+Logs training progress information to the console:
+
+```python
+from smart_control.reinforcement_learning.observers.print_status_observer import PrintStatusObserver
+
+print_observer = PrintStatusObserver(
+    status_interval_steps=10,
+    environment=env,              # tf environment
+    replay_buffer=replay_buffer   # replay buffer being used; used to print its size.
+                                  # Can be None if not using a replay buffer
+)
+```
+
+### TrajectoryRecorderObserver
+
+Records trajectory data for later analysis and visualization. This is very useful for evaluation runs:
+
+```python
+from smart_control.reinforcement_learning.observers.trajectory_recorder_observer import TrajectoryRecorderObserver
+
+trajectory_observer = TrajectoryRecorderObserver(
+    save_dir=trajectory_dir,  # directory where to save plots/data
+    environment=env           # tf environment
+)
+```
+
+### CompositeObserver
+
+Combines multiple observers into a single observer. This is useful to make it simpler to pass observers
+to the agent/actor:
+
+```python
+from smart_control.reinforcement_learning.observers.composite_observer import CompositeObserver
+
+combined_observers = CompositeObserver([print_observer, replay_buffer_observer])
+# now, just need to pass observers = combined_observers to the agent/actor
+```
+
+[Back to Reinforcement Learning](../reinforcement-learning-module.md)
+
+[Back to Home](../../index.md)
diff --git a/docs/community/reinforcement-learning-module/policies.md b/docs/community/reinforcement-learning-module/policies.md
new file mode 100644
index 00000000..b97c6f13
--- /dev/null
+++ b/docs/community/reinforcement-learning-module/policies.md
@@ -0,0 +1,52 @@
+---
+layout: "default"
+title: "Policies"
+nav_order: 2
+parent: "Reinforcement Learning Module"
+grand_parent: "Smart Control Project Documentation"
+---
+
+# Policies
+
+Policies determine the agent's behavior by mapping observations to actions. Policies can be deterministic (e.g. greedy policies) or stochastic (e.g. exploratory policies). A policy can also be learned by an agent,
+or it can be just a simple rule-based policy (e.g. a fixed building schedule).
+
+In this module, we provide two classes. `SchedulePolicy` is used to define rule-based policies that can be used as
+baselines for RL agents. `SavedModelPolicy` is a wrapper used to load and interact with policies that have been
+learned and saved.
+
+All of the policies in this project should implement the [`tf_agents.policies.TFPolicy`](https://www.tensorflow.org/agents/api_docs/python/tf_agents/policies/TFPolicy) interface to provide a consistent way to interact with policies.
+
+## SchedulePolicy
+
+**Purpose**: Implements traditional rule-based control strategies based on time schedules.
+
+**Implementation**: `SchedulePolicy` in `schedule_policy.py`
+
+**Usage Example**: see the implementation of the `create_baseline_schedule_policy` function in `policies/schedule_policy.py`, which
+instantiates an example baseline policy.
+
+## SavedModelPolicy
+
+**Purpose**: Loads and uses a previously trained policy that has been saved.
+ +**Implementation**: `SavedModelPolicy` in `saved_model_policy.py` + +**Usage Example**: + +```python +from smart_control.reinforcement_learning.policies.saved_model_policy import SavedModelPolicy + +# Load a saved policy +policy = SavedModelPolicy( + saved_model_path="path/to/saved/policy", + time_step_spec=time_step_spec, # obtained through tf_env.time_step_spec() + action_spec=action_spec # obtained through tf_env.action_spec() +) +``` + +--- + +[Back to Reinforcement Learning](../reinforcement-learning-module.md) + +[Back to Home](../../index.md) diff --git a/docs/community/reinforcement-learning-module/replay_buffer.md b/docs/community/reinforcement-learning-module/replay_buffer.md new file mode 100644 index 00000000..53108002 --- /dev/null +++ b/docs/community/reinforcement-learning-module/replay_buffer.md @@ -0,0 +1,108 @@ +--- +layout: "default" +title: "Replay Buffer" +nav_order: 4 +parent: "Reinforcement Learning Module" +grand_parent: "Smart Control Project Documentation" +--- + +# Replay Buffer + +The replay buffer stores agent experiences for later reuse, enabling efficient learning from past interactions. + +This project provides a wrapper class around a Reverb replay buffer to facilitate interaction with it. + +## ReplayBufferManager + +The `ReplayBufferManager` class simplifies the creation and management of replay buffers: + +```python +from smart_control.reinforcement_learning.replay_buffer.replay_buffer import ReplayBufferManager + +# Create a replay buffer manager +replay_manager = ReplayBufferManager( + data_spec=agent.collect_data_spec, # agent is a TF-Agents agent + capacity=50000, + checkpoint_dir="path/to/checkpoint/dir", + sequence_length=2 +) + +# Create a new replay buffer +replay_buffer, replay_buffer_observer = replay_manager.create_replay_buffer() + +# Or load an existing replay buffer +replay_buffer, replay_buffer_observer = replay_manager.load_replay_buffer() +``` + +To add experiences to the replay buffer, you can add the `replay_buffer_observer` object returned above. For example: + +```python +# Combine observers + replay_buffer, replay_buffer_observer = replay_manager.load_replay_buffer() + + collect_actor = actor.Actor( + ..., + observers=[replay_buffer_observer], + ..., + ) +``` + +### Key Methods + +- **`create_replay_buffer()`**: Creates a new replay buffer and observer +- **`load_replay_buffer()`**: Loads an existing replay buffer from a checkpoint +- **`get_dataset(batch_size, num_steps)`**: Creates a TensorFlow dataset for sampling +- **`num_frames()`**: Returns the current number of frames in the buffer +- **`clear()`**: Clears all data from the buffer +- **`close()`**: Closes the buffer server and cleans up resources + +## Populating the Buffer + +### Initial Population + +To pre-populate the buffer with some initial experiences (e.g. training an off-policy algorithm) you can use the `populate_starter_buffer.py` script, at `scripts/populate_replay_buffer.py`. 
This uses the baseline schedule policy from `policies/schedule_policy.py` to pre-populate the buffer: + +```bash +# Populate a starter buffer using a baseline policy +python scripts/populate_starter_buffer.py \ + --buffer-name my-starter-buffer \ + --capacity 50000 \ + --steps-per-run 672 \ + --num-runs 10 +``` + +## Sampling from the Buffer + +For training, experiences are sampled from the buffer as batches: + +```python +# Create a dataset for sampling +dataset = replay_buffer.as_dataset( + sample_batch_size=64, + num_steps=2, + num_parallel_calls=3 +).prefetch(3) +``` + +## Checkpointing + +Replay buffers can be checkpointed to disk for persistence: + +```python +# Save the current state +replay_buffer.py_client.checkpoint() + +# Load from checkpoint (done through ReplayBufferManager) +replay_manager = ReplayBufferManager( + data_spec=agent.collect_data_spec, + capacity=50000, + checkpoint_dir="path/to/checkpoint/dir" +) +replay_buffer, observer = replay_manager.load_replay_buffer() +``` + +--- + +[Back to Reinforcement Learning](../reinforcement-learning-module.md) + +[Back to Home](../../index.md) diff --git a/docs/community/reinforcement-learning-module/scripts.md b/docs/community/reinforcement-learning-module/scripts.md new file mode 100644 index 00000000..5369de15 --- /dev/null +++ b/docs/community/reinforcement-learning-module/scripts.md @@ -0,0 +1,175 @@ +--- +layout: "default" +title: "Scripts" +nav_order: 7 +parent: "Reinforcement Learning Module" +grand_parent: "Smart Control Project Documentation" +--- + +# Scripts + +The `scripts` module provides command-line utilities for training, evaluating, and managing reinforcement learning experiments. These scripts streamline common RL tasks with standardized workflows. + +## Training Script + +`train.py` trains a reinforcement learning agent using a pre-populated replay buffer, managing agent creation, experience collection, and metrics logging. 
+ +### Parameters + +| Parameter | Required | Default | Description | +|----------------------------------------|----------|----------------------------------|-----------------------------------------------------------------------------| +| `--starter-buffer-path` | Yes | N/A | Path to the pre-populated replay buffer | +| `--experiment-name` | Yes | N/A | Name of the experiment for saving results | +| `--agent-type` | No | `'sac'` | Type of agent to train: `'sac'`, `'td3'`, or `'ddpg'` | +| `--train-iterations` | No | `300` | Total number of training iterations | +| `--collect-steps-per-training-iteration`| No | `50` | Number of environment steps to collect per training iteration | +| `--batch-size` | No | `256` | Batch size for training (number of samples per gradient update) | +| `--log-interval` | No | `1` | Interval (in steps) for logging training metrics | +| `--eval-interval` | No | `10` | Interval (in steps) for evaluating the agent | +| `--num-eval-episodes` | No | `1` | Number of episodes to run during each evaluation | +| `--checkpoint-interval` | No | `10` | Interval (in steps) for checkpointing the replay buffer | +| `--learner-iterations` | No | `200` | Number of gradient updates to perform per training iteration | +| `--scenario-config-path` | No | `smart_control/configs/resources/sb1/generated_configs/config_timestepsec-900_numdaysinepisode-14_starttimestamp-2023-07-06.gin` | Path to the scenario configuration file (.gin) | + +### Example Usage + +```bash +python scripts/train.py \ + --starter-buffer-path data/buffers/initial_buffer \ + --experiment-name hvac_control_sac \ + --agent-type sac \ + --train-iterations 300 \ + --collect-steps-per-training-iteration 50 \ + --batch-size 256 \ + --scenario-config-path configs/custom_config.gin +``` + +## Evaluation Script + +`eval.py` evaluates a trained policy or the baseline schedule policy in a configured environment, producing performance metrics and optional trajectory data. + +### Parameters + +| Parameter | Required | Default | Description | +|-----------------------|----------|--------------------------------------------------------------|-----------------------------------------------------------------------------| +| `--policy-dir` | Yes | N/A | Path to the saved policy directory or `"schedule"` for baseline policy | +| `--gin-config` | No | `smart_control/configs/resources/sb1/generated_configs/config_timestepsec-900_numdaysinepisode-7_starttimestamp-2023-07-06.gin` | Path to the environment configuration file (.gin) | +| `--num-eval-episodes` | No | `1` | Number of episodes to run for evaluation | +| `--experiment-name` | Yes | N/A | Name of the evaluation experiment for saving results | +| `--save-trajectory` | No | `True` | Whether to save detailed trajectory data for each episode | + +### Example Usage + +```bash +python scripts/eval.py \ + --policy-dir experiments/hvac_control_sac/policies/greedy_policy \ + --gin-config configs/building_sim.gin \ + --num-eval-episodes 5 \ + --experiment-name sac_evaluation \ + --save-trajectory False +``` + +## Buffer Population Script + +`populate_starter_buffer.py` populates an initial replay buffer with exploration data using the baseline schedule policy, aiding off-policy learning. 
+ +### Parameters + +| Parameter | Required | Default | Description | +|-----------------------------|----------|--------------------------------------------------------------|-----------------------------------------------------------------------------| +| `--buffer-name` | Yes | N/A | Name to identify the saved replay buffer | +| `--capacity` | No | `50000` | Maximum capacity of the replay buffer | +| `--steps-per-run` | No | `672` | Number of steps to collect per actor run (episode) | +| `--num-runs` | No | `10` | Number of actor runs (episodes) to perform | +| `--sequence-length` | No | `2` | Sequence length for storing trajectories in the buffer | +| `--env-gin-config-file-path`| No | `smart_control/configs/resources/sb1/generated_configs/config_timestepsec-900_numdaysinepisode-14_starttimestamp-2023-07-06.gin` | Path to the environment configuration file (.gin) | + +### Example Usage + +```bash +python scripts/populate_starter_buffer.py \ + --buffer-name initial_exploration \ + --capacity 100000 \ + --steps-per-run 1000 \ + --num-runs 20 \ + --sequence-length 2 \ + --env-gin-config-file-path configs/custom_env.gin +``` + +## Configuration Generator Script + +`generate_gin_config_files.py` generates multiple gin config files from a parameter grid for systematic experimentation. + +### Parameters + +| Parameter | Required | Default | Description | +|---------------------|----------|----------------------------------|-----------------------------------------------------------------------------| +| `base_config` | Yes | N/A | Path to the base gin config file (positional argument) | +| `--output-dir` | No | `'generated_configs'` | Directory to save the generated config files | +| `--time-steps` | No | `'300'` | Comma-separated list of `time_step_sec` values to grid over | +| `--num-days` | No | `'1,7,14,30'` | Comma-separated list of `num_days_in_episode` values to grid over | +| `--start-timestamps`| No | `'2023-07-06'` | Comma-separated list of `start_timestamp` dates to grid over | + +### Example Usage + +```bash +python scripts/generate_gin_config_files.py configs/base_config.gin \ + --output-dir configs/generated \ + --time-steps 300,600,900 \ + --num-days 1,7,14 \ + --start-timestamps 2023-07-06,2023-10-06 +``` + +## Typical Workflow + +A typical RL experiment workflow includes: + +1. **Generate configurations:** + + ```bash + python scripts/generate_gin_config_files.py configs/template.gin \ + --output-dir configs/generated + ``` + +2. **Populate initial buffer:** + + ```bash + python scripts/populate_starter_buffer.py \ + --buffer-name starter \ + --env-gin-config-file-path configs/generated/config_timestepsec-900_numdaysinepisode-14_starttimestamp-2023-07-06.gin + ``` + +3. **Train agent:** + + ```bash + python scripts/train.py \ + --starter-buffer-path data/buffers/starter \ + --experiment-name my_experiment \ + --scenario-config-path configs/generated/config_timestepsec-900_numdaysinepisode-14_starttimestamp-2023-07-06.gin + ``` + +4. **Evaluate trained policy:** + + ```bash + python scripts/eval.py \ + --policy-dir experiments/my_experiment/policies/greedy_policy \ + --gin-config configs/generated/config_timestepsec-900_numdaysinepisode-14_starttimestamp-2023-07-06.gin \ + --experiment-name eval_my_experiment + ``` + +5. 
**Compare against baseline:** + + ```bash + python scripts/eval.py \ + --policy-dir schedule \ + --gin-config configs/generated/config_timestepsec-900_numdaysinepisode-14_starttimestamp-2023-07-06.gin \ + --experiment-name baseline_evaluation + ``` + +--- + +[Back to Reinforcement Learning](../reinforcement-learning-module.md) + +[Back to Home](../../index.md) + +--- diff --git a/docs/community/reinforcement-learning-module/visualization.md b/docs/community/reinforcement-learning-module/visualization.md new file mode 100644 index 00000000..ebe5971d --- /dev/null +++ b/docs/community/reinforcement-learning-module/visualization.md @@ -0,0 +1,76 @@ +--- +layout: "default" +title: "Visualization" +nav_order: 6 +parent: "Reinforcement Learning Module" +grand_parent: "Smart Control Project Documentation" +--- + +# Visualization + +This part of the reinforcement learning module should include tools for visualizing agent behavior, environment states, and performance metrics. These tools help with understanding agent performance, debugging issues, and communicating results. + +## TrajectoryPlotter + +**Purpose**: Generates plots from trajectory data collected during agent evaluation. + +**Implementation**: `TrajectoryPlotter` in `visualization/trajectory_plotter.py` + +**Key Methods**: + +- **`plot_actions(actions, save_path, timestamps=None, title="Actions Over Time")`**: + + - Visualizes action values over time + - Shows the agent's control decisions + - Optionally includes timestamps on the x-axis + +- **`plot_rewards(rewards, save_path, timestamps=None, title="Rewards Over Time")`**: + + - Displays reward values at each time step + - Helps identify when the agent receives high or low rewards + - Useful for understanding the reward landscape + +- **`plot_cumulative_reward(rewards, save_path, timestamps=None, title="Cumulative Reward Over Time")`**: + - Shows the accumulation of reward over time + - Provides a clear picture of overall agent performance + - Useful for comparing different policies + +**Usage Example**: + +```python +from smart_control.reinforcement_learning.visualization.trajectory_plotter import TrajectoryPlotter + +# Generate plots from collected data +TrajectoryPlotter.plot_actions( + actions=action_data, + save_path='plots/actions.png', + timestamps=timestamp_data, + title='Agent Actions During Evaluation' +) + +TrajectoryPlotter.plot_rewards( + rewards=reward_data, + save_path='plots/rewards.png', + title='Agent Rewards' +) + +TrajectoryPlotter.plot_cumulative_reward( + rewards=reward_data, + save_path='plots/cumulative_reward.png', + title='Cumulative Reward' +) +``` + +These methods should produce plots similar to these: + +![Action Plot](../assets/visualization-module-images/action_plot.png) + +![Rewards Plot](../assets/visualization-module-images/reward_plot.png) + +![Cumulative Reward Plot](../assets/visualization-module-images/cum_reward_plot.png) + +--- + +[Back to Reinforcement Learning](../reinforcement-learning-module.md) + +[Back to Home](../../index.md) diff --git a/docs/community/reward-functions.md b/docs/community/reward-functions.md new file mode 100644 index 00000000..4fde4d66 --- /dev/null +++ b/docs/community/reward-functions.md @@ -0,0 +1,12 @@ +--- +layout: "default" +title: "Reward Functions" +nav_order: 4 +parent: "Smart Control Project Documentation" +has_children: true +--- + +# Reward Functions + +The reward functions define how the agent's actions are evaluated, guiding the learning process towards desired outcomes. 
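+
+Each concrete reward function in this section combines a productivity term with energy-cost and carbon terms. As a rough, illustrative sketch only (the helper below is hypothetical; the exact weighting and normalization are documented on the individual classes), the combination has this general shape:
+
+```python
+def combine_reward_components(
+    productivity_usd: float,
+    energy_cost_usd: float,
+    carbon_emitted_kg: float,
+    energy_cost_weight: float,
+    carbon_cost_weight: float,
+    carbon_cost_factor: float,
+) -> float:
+    """Toy example: reward productivity, penalize energy cost and carbon."""
+    carbon_cost_usd = carbon_cost_factor * carbon_emitted_kg
+    return (
+        productivity_usd
+        - energy_cost_weight * energy_cost_usd
+        - carbon_cost_weight * carbon_cost_usd
+    )
+```
+
+The built-in implementations additionally shift and scale the result (see `reward_normalizer_shift` and `reward_normalizer_scale` on `SetpointEnergyCarbonRewardFunction`).
+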
+---- \ No newline at end of file diff --git a/docs/community/sac.md b/docs/community/sac.md new file mode 100644 index 00000000..08164f8a --- /dev/null +++ b/docs/community/sac.md @@ -0,0 +1,16 @@ +--- +layout: "default" +title: "Soft Actor Critic" +nav_order: 2 +parent: "Learning Algorithms" +grand_parent: "Smart Control Project Documentation" +--- + +# Soft Actor Critic +The implementation of this algorithm can be found in the `SAC_Demo.ipynb` notebook. + +--- + +[Back to Learning Algorithms](learning-algorithms.md) + +[Back to Home](../index.md) \ No newline at end of file diff --git a/docs/community/setpoint-reward-function.md b/docs/community/setpoint-reward-function.md new file mode 100644 index 00000000..f86f123b --- /dev/null +++ b/docs/community/setpoint-reward-function.md @@ -0,0 +1,40 @@ +--- +layout: "default" +title: "SetpointEnergyCarbonRewardFunction" +nav_order: 2 +parent: "Reward Functions" +grand_parent: "Smart Control Project Documentation" +--- + +# SetpointEnergyCarbonRewardFunction + +**Purpose**: Implements a reward function that balances productivity, energy cost, and carbon emissions. + +## Key Attributes + +- Inherits from `BaseSetpointEnergyCarbonRewardFunction`. +- `electricity_energy_cost`: Instance of `BaseEnergyCost` for electricity. +- `natural_gas_energy_cost`: Instance of `BaseEnergyCost` for natural gas. +- `energy_cost_weight`: Weight for energy cost in the reward. +- `carbon_cost_weight`: Weight for carbon emissions in the reward. +- `carbon_cost_factor`: Cost per kilogram of carbon emitted. +- `reward_normalizer_shift`: Shift applied to the reward for normalization. +- `reward_normalizer_scale`: Scale applied to the reward for normalization. + +## Key Methods + +- `__init__(...)`: Initializes the reward function with energy cost and carbon emission parameters. +- `compute_reward(energy_reward_info)`: Computes the reward value by considering productivity, energy costs, and carbon emissions. + +## Reward Calculation Logic + +- Calculates the productivity reward based on zone temperatures and occupancy. +- Computes the energy costs and carbon emissions for electricity and natural gas. +- Applies weights to each component (productivity, energy cost, carbon cost). +- Normalizes and combines these components to produce the final reward. + +--- + +[Back to Reward Functions](reward-functions.md) + +[Back to Home](../index.md) diff --git a/docs/community/simulation-components.md b/docs/community/simulation-components.md new file mode 100644 index 00000000..b53bfbcf --- /dev/null +++ b/docs/community/simulation-components.md @@ -0,0 +1,22 @@ +--- +layout: "default" +title: "Simulation Components" +nav_order: 3 +parent: "Smart Control Project Documentation" +has_children: true +--- + +# Simulation Components + +The simulation components model the physical building, HVAC systems, weather conditions, and occupancy patterns. 
+ +## Sections + +- [Building Simulation](building-simulation.md) +- [HVAC Systems](hvac-systems.md) +- [Weather Controllers](weather-controllers.md) +- [Occupancy Models](occupancy-models.md) + +--- + +[Back to Home](../index.md) diff --git a/docs/community/system-architecture.md b/docs/community/system-architecture.md new file mode 100644 index 00000000..200cb441 --- /dev/null +++ b/docs/community/system-architecture.md @@ -0,0 +1,67 @@ +--- +layout: "default" +title: "System Interaction and Architecture" +nav_order: 7 +parent: "Smart Control Project Documentation" +--- + +# System Interaction and Architecture + +## Data Flow + +The system operates in discrete time steps within an episode, following the reinforcement learning loop: + +1. **Agent Action**: + + - The agent selects an action based on the current observation. + - The action is normalized and sent to the environment. + +2. **Environment Response**: + + - The environment applies the action to the building simulation. + - Actions are converted from normalized values to native setpoint values using action normalizers. + - The building simulation updates its state based on the action. + +3. **Observation Retrieval**: + + - The environment retrieves observations from the building after the action. + - Observations are normalized and processed, including time and occupancy features. + - Missing or invalid observations are handled using previous data or default values. + +4. **Reward Calculation**: + + - The reward function computes the reward based on productivity, energy cost, and carbon emissions. + - The reward is provided to the agent. + +5. **State Update**: + + - The environment updates internal metrics and logs information. + - Checks if the episode has ended based on the number of steps or time. + - If the episode has ended, the environment resets for the next episode. + +## Component Interactions + +- **Environment and Building Simulation**: + + - The `Environment` interacts with the `SimulatorBuilding`, which integrates the building simulation (`TFSimulator`), HVAC systems, weather controller, and occupancy model. + - Actions are applied to the building simulation, and observations are retrieved after each step. + +- **Reward Functions**: + + - The environment uses the reward function (e.g., `SetpointEnergyCarbonRegretFunction`) to compute rewards based on the `RewardInfo` from the building. + - The reward function accesses energy consumption data, occupancy levels, and temperatures to compute productivity and costs. + +- **Energy Cost Models**: + + - `ElectricityEnergyCost` and `NaturalGasEnergyCost` provide cost and carbon emission calculations based on energy usage and time. + - The reward function uses these models to compute energy costs and carbon emissions for the reward. + +- **Normalization and Configuration**: + + - `ActionConfig` defines how actions are normalized and mapped to building setpoints. + - Observation normalizers are defined for each measurement to ensure consistent scaling. + - Gin configuration files specify parameters and bindings for all components. 
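+
+The step loop described above uses the standard TF-Agents environment interface. The sketch below is illustrative only: `make_environment()` and `load_policy()` are hypothetical helpers standing in for however the gin-configured `Environment` and a trained (or baseline schedule) policy are constructed in your setup.
+
+```python
+import gin
+
+# Bind all components (environment, reward function, energy cost models, ...)
+# declared in a scenario configuration file.
+gin.parse_config_file(
+    "smart_control/configs/resources/sb1/generated_configs/"
+    "config_timestepsec-900_numdaysinepisode-14_starttimestamp-2023-07-06.gin"
+)
+
+env = make_environment()  # hypothetical: returns the gin-configured Environment
+policy = load_policy()    # hypothetical: returns a trained or baseline schedule policy
+
+time_step = env.reset()
+episode_return = 0.0
+while not time_step.is_last():
+    action_step = policy.action(time_step)     # 1. agent selects a normalized action
+    time_step = env.step(action_step.action)   # 2-3. environment applies it and returns the next observation
+    episode_return += float(time_step.reward)  # 4. reward computed by the configured reward function
+print("Episode return:", episode_return)       # 5. loop ends when the episode's step/time limit is reached
+```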
+ +--- + +[Back to Home](../index.md) diff --git a/docs/community/td3.md b/docs/community/td3.md new file mode 100644 index 00000000..4e3003f0 --- /dev/null +++ b/docs/community/td3.md @@ -0,0 +1,72 @@ +--- +layout: "default" +title: "Twin Delayed Deep Deterministic Policy Gradient" +nav_order: 4 +parent: "Learning Algorithms" +grand_parent: "Smart Control Project Documentation" +--- + +# Twin Delayed Deep Deterministic Policy Gradient (TD3) + +Implementation of TD3 can be found in the `TD3_Demo.ipynb` notebook. + +--- + +Twin Delayed Deep Deterministic Policy Gradient (TD3) is a model-free reinforcement learning algorithm for continuous action spaces. It improves upon DDPG by addressing overestimation bias and stabilizing training. Key features include: + +- **Delayed Policy Updates**: The actor network is updated less frequently than the critic networks. +- **Target Policy Smoothing**: Noise is added and clipped on target actions during critic updates. +- **Twin Critic Networks**: Two critic networks compute Q-values; the minimum is used to reduce overestimation. +- **Custom Actor Network**: A tailored actor scales outputs to valid action ranges. +- **Replay Buffer with Reverb**: A Reverb-based replay buffer stores trajectories for efficient off-policy learning. +- **TF-Agents Integration**: Leverages TensorFlow Agents for agent construction, training, and evaluation. + +## Agent Architecture + +### Actor Network + +- Built using fully connected layers (e.g., 128, 128 units) with ReLU activations. +- The final layer uses a tanh activation to scale outputs into the action space. +- Ensures actions remain within specified bounds. + +### Critic Network + +- Uses separate fully connected layers for observations and actions. +- Joins the outputs via a joint network to estimate Q-values. +- Implemented with two critics to select the minimum value for target computation. + +## Training Details + +- **Optimizers**: Both actor and critic use the Adam optimizer (learning rate = 3e-4). +- **Hyperparameters**: + - Discount factor (`gamma`): 0.99 + - Target update tau: 0.005 + - Target update period: 2 steps (delayed actor updates) +- **Noise Parameters**: + - Exploration noise standard deviation: 0.1 + - Target policy noise: 0.2 (clipped at 0.5) +- **Replay Buffer**: + - Capacity: 50,000 transitions + - Built with Reverb to sample batches and decorrelate training data + +## Training Loop + +1. **Data Collection** + The collect policy interacts with the environment to accumulate trajectories. + +2. **Gradient Updates** + The agent performs critic updates on every step and actor updates at a delayed frequency to improve stability. + +3. **Evaluation** + The eval policy is used to measure performance after each training iteration. + +## Noteworthy Aspects + +- **Stabilized Learning**: Target policy smoothing and delayed actor updates reduce variance and mitigate overestimation. +- **Efficient Off-Policy Learning**: The use of a Reverb replay buffer ensures diverse and decorrelated training samples. +- **Custom Network Design**: A bespoke actor network enforces action bounds, while twin critics improve Q-value estimates. +- **TF-Agents Framework**: Seamless integration with TF-Agents simplifies policy definition, training, and evaluation. 
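+
+## Construction Sketch
+
+For orientation, the snippet below assembles an agent with the hyperparameters listed above using stock TF-Agents networks. It is a simplified sketch rather than the notebook's exact code: the notebook uses a custom actor network, whereas this uses the standard `ActorNetwork`, and `observation_spec`, `action_spec`, and `time_step_spec` are assumed to come from the configured environment.
+
+```python
+import tensorflow as tf
+from tf_agents.agents.ddpg import actor_network, critic_network
+from tf_agents.agents.td3 import td3_agent
+
+# observation_spec, action_spec, and time_step_spec are assumed to come from the
+# environment, e.g. env.observation_spec(), env.action_spec(), env.time_step_spec().
+actor_net = actor_network.ActorNetwork(
+    observation_spec,
+    action_spec,
+    fc_layer_params=(128, 128),        # final tanh layer keeps actions within bounds
+)
+critic_net = critic_network.CriticNetwork(
+    (observation_spec, action_spec),
+    joint_fc_layer_params=(128, 128),  # observations and actions merged in a joint network
+)
+
+agent = td3_agent.Td3Agent(
+    time_step_spec,
+    action_spec,
+    actor_network=actor_net,
+    critic_network=critic_net,
+    actor_optimizer=tf.keras.optimizers.Adam(learning_rate=3e-4),
+    critic_optimizer=tf.keras.optimizers.Adam(learning_rate=3e-4),
+    exploration_noise_std=0.1,         # exploration noise added by the collect policy
+    target_update_tau=0.005,
+    target_update_period=2,
+    actor_update_period=2,             # delayed actor updates
+    target_policy_noise=0.2,           # target policy smoothing
+    target_policy_noise_clip=0.5,
+    gamma=0.99,
+)
+agent.initialize()
+```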
+ +[Back to Learning Algorithms](learning-algorithms.md) + +[Back to Home](../index.md) \ No newline at end of file diff --git a/docs/community/tutorials.md b/docs/community/tutorials.md new file mode 100644 index 00000000..9ecd80e4 --- /dev/null +++ b/docs/community/tutorials.md @@ -0,0 +1,16 @@ +--- +layout: "default" +title: "Tutorials" +nav_order: 11 +parent: "Smart Control Project Documentation" +has_children: false +--- + +# Tutorials + +Here we list some video tutorials that can help you get started with the Smart Control project. These tutorials cover different aspects of the project, including setting up with data, basic data analysis demo, +and reinforcement learning. + +- [Getting started with Reinforcement Learning in sbsimulator](https://youtu.be/RbpkKciw0IQ) +- [Setting Up with Data](https://youtu.be/474-33bLtAs) +- [Basic data Modeling Demo](https://youtu.be/GOhApaDYZmE) \ No newline at end of file diff --git a/docs/community/weather-controllers.md b/docs/community/weather-controllers.md new file mode 100644 index 00000000..d3b160e4 --- /dev/null +++ b/docs/community/weather-controllers.md @@ -0,0 +1,28 @@ +--- +layout: "default" +title: "Weather Controllers" +nav_order: 3 +parent: "Simulation Components" +grand_parent: "Smart Control Project Documentation" +--- + +# Weather Controllers + +**Purpose**: Simulates external weather conditions affecting the building. + +## Key Classes and Components + +- **`ReplayWeatherController`**: + + - Uses recorded weather data to simulate ambient conditions. + + - **Attributes**: + + - `local_weather_path`: Path to local weather data. + - `convection_coefficient`: Coefficient for heat convection between the building and the environment. + +--- + +[Back to Simulation Components](simulation-components.md) + +[Back to Home](../index.md) diff --git a/mkdocs.yml b/mkdocs.yml index 3627740c..a911f408 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -30,8 +30,8 @@ theme: - navigation.top # back to top button - navigation.footer # next and back buttons #- navigation.indexes # allows a directory to have an index.md - - content.action.edit # Enables the "Edit this page" button - - content.action.view # Enables the "View source of this page" button + - content.action.edit # Enables the "Edit this page" button + - content.action.view # Enables the "View source of this page" button favicon: assets/images/favicon.ico @@ -43,7 +43,6 @@ theme: # https://squidfunk.github.io/mkdocs-material/setup/changing-the-colors/ palette: - # Light Mode - media: "(prefers-color-scheme: light)" scheme: default @@ -70,37 +69,71 @@ theme: # nav: - - 'Home': index.md - - 'Local Development': - - 'Setup': - - 'Overview': setup.md - - 'Linux': setup/linux.md - - 'Mac': setup/mac.md - - 'Docker': setup/docker.md - - 'Contributing': contributing.md - - 'Documentation Site': docs-site.md - - 'API Reference': - - api/config.md - - 'Dataset': - - api/dataset/dataset.md - - api/dataset/partition.md - - api/environment.md - - api/models.md - - 'Reinforcement Learning': - - api/reinforcement_learning/agents.md - - api/reinforcement_learning/observers.md - - api/reinforcement_learning/policies.md - - api/reinforcement_learning/replay_buffer.md - - api/reinforcement_learning/scripts.md - - api/reinforcement_learning/utils.md - - api/reward.md - - 'Simulator': - - 'Overview': api/simulator/simulator.md - - api/simulator/building.md - - api/simulator/devices.md - - api/simulator/occupancy.md - - api/simulator/weather.md - - api/simulator/utils.md + - "Home": index.md + - "Local Development": 
+ - "Setup": + - "Overview": setup.md + - "Linux": setup/linux.md + - "Mac": setup/mac.md + - "Docker": setup/docker.md + - "Contributing": contributing.md + - "Code of Conduct": code-of-conduct.md + - "Documentation Site": docs-site.md + - "API Reference": + - api/config.md + - "Dataset": + - api/dataset/dataset.md + - api/dataset/partition.md + - api/environment.md + - api/models.md + - "Reinforcement Learning": + - api/reinforcement_learning/agents.md + - api/reinforcement_learning/observers.md + - api/reinforcement_learning/policies.md + - api/reinforcement_learning/replay_buffer.md + - api/reinforcement_learning/scripts.md + - api/reinforcement_learning/utils.md + - api/reward.md + - "Simulator": + - "Overview": api/simulator/simulator.md + - api/simulator/building.md + - api/simulator/devices.md + - api/simulator/occupancy.md + - api/simulator/weather.md + - api/simulator/utils.md + - "Community Documentation": # NEW SECTION + - "Additional Resources": community/additional-resources.md + - "Base Reward Function": community/base-reward-function.md + - "Best Practices": community/best-practices.md + - "Building Simulation": community/building-simulation.md + - "Configuration": community/configuration.md + - "DDPG": community/ddpg.md + - "Electricity Energy Cost": community/electricity-energy-cost.md + - "Environment": community/environment.md + - "Glossary": community/glossary.md + - "HVAC Systems": community/hvac-systems.md + - "Learning Algorithms": community/learning-algorithms.md + - "MCTS": community/mcts.md + - "Metrics Interpretation": community/metrics-interpretation.md + - "Natural Gas Energy Cost": community/natural-gas-energy-cost.md + - "Occupancy Models": community/occupancy-models.md + - "Regret Reward Function": community/regret-reward-function.md + - "Reinforcement Learning Module": # NESTED SECTION FOR THE SUBDIRECTORY + - "Overview": community/reinforcement-learning-module.md # This points to the .md file in the folder + - "Agents": community/reinforcement-learning-module/agents.md + - "Observers": community/reinforcement-learning-module/observers.md + - "Policies": community/reinforcement-learning-module/policies.md + - "Replay Buffer": community/reinforcement-learning-module/replay_buffer.md + - "Scripts": community/reinforcement-learning-module/scripts.md + - "Visualization": community/reinforcement-learning-module/visualization.md + - "Reward Functions": community/reward-functions.md + - "SAC": community/sac.md + - "Setpoint Reward Function": community/setpoint-reward-function.md + - "Simulation Components": community/simulation-components.md + - "System Architecture": community/system-architecture.md + - "TD3": community/td3.md + - "Tutorials": community/tutorials.md + - "Weather Controllers": community/weather-controllers.md # # PLUGINS / EXTENSIONS @@ -133,7 +166,7 @@ plugins: handlers: python: paths: - - . # look in the current directory for the "smart_control" dir + - . # look in the current directory for the "smart_control" dir options: docstring_style: google docstring_section_style: "table" # "table", "list", "spacy" diff --git a/smart_control/dataset/dataset.py b/smart_control/dataset/dataset.py index 435f7ed1..331de1eb 100644 --- a/smart_control/dataset/dataset.py +++ b/smart_control/dataset/dataset.py @@ -78,7 +78,7 @@ def download(self, timeout=60): """Downloads the building's dataset from Google Cloud Storage. 
Only downloads and unzips the dataset if it doesn't already exist at the - expected [`building_dirpath`](./#smart_control.dataset.dataset.BuildingDataset.building_dirpath) + expected [`building_dirpath`](#smart_control.dataset.dataset.BuildingDataset.building_dirpath) location. Otherwise it will load the existing local data. Download speed is fairly quick, but unzipping takes a few moments. @@ -119,7 +119,7 @@ def floorplan(self) -> np.ndarray: + 1: wall / boundary + 2: outside / external space - Use the [`display_floorplan`](./#smart_control.dataset.dataset.BuildingDataset.display_floorplan) + Use the [`display_floorplan`](#smart_control.dataset.dataset.BuildingDataset.display_floorplan) method to view an image of the floorplan. """ return np.load(self.floorplan_filepath) @@ -149,7 +149,7 @@ def display_floorplan( show (bool): Whether or not to show the image. save (bool): Whether or not to save the image (as a .png file). image_filepath (str): An optional custom filepath to use when saving the - image. Only applies if `save=True`. By default, saves to the [`floorplan_image_filepath`](./#smart_control.dataset.dataset.BuildingDataset.floorplan_image_filepath) + image. Only applies if `save=True`. By default, saves to the [`floorplan_image_filepath`](#smart_control.dataset.dataset.BuildingDataset.floorplan_image_filepath) """ plt.imshow(self.floorplan, interpolation="nearest", cmap=cmap) if show: diff --git a/smart_control/dataset/partition.py b/smart_control/dataset/partition.py index ebfee3fc..0facd693 100644 --- a/smart_control/dataset/partition.py +++ b/smart_control/dataset/partition.py @@ -84,7 +84,7 @@ def metadata_filepath(self): @cached_property def metadata(self) -> dict: - """Metadata describing the partition [`data`](./#smart_control.dataset.partition.BuildingDatasetPartition.data). + """Metadata describing the partition [`data`](#smart_control.dataset.partition.BuildingDatasetPartition.data). Returns: A dictionary containing the following keys: @@ -146,9 +146,9 @@ def action_ids_map(self) -> dict: """A mapping of unique action identifiers. Returns: - A dictionary where the keys are the [`action_ids`](./#smart_control.dataset.partition.BuildingDatasetPartition.action_ids) + A dictionary where the keys are the [`action_ids`](#smart_control.dataset.partition.BuildingDatasetPartition.action_ids) and the values are unique integers referencing column indices in the - [`action_value_matrix`](./#smart_control.dataset.partition.BuildingDatasetPartition.action_value_matrix) + [`action_value_matrix`](#smart_control.dataset.partition.BuildingDatasetPartition.action_value_matrix) For example: @@ -167,9 +167,9 @@ def observation_ids_map(self) -> dict: """A mapping of unique observation identifiers. Returns: - A dictionary where the keys are the [`observation_ids`](./#smart_control.dataset.partition.BuildingDatasetPartition.observation_ids) + A dictionary where the keys are the [`observation_ids`](#smart_control.dataset.partition.BuildingDatasetPartition.observation_ids) and the values are unique integers referencing column indices in the - [`observation_value_matrix`](./#smart_control.dataset.partition.BuildingDatasetPartition.observation_value_matrix). + [`observation_value_matrix`](#smart_control.dataset.partition.BuildingDatasetPartition.observation_value_matrix). For example: @@ -190,8 +190,8 @@ def reward_info_ids_map(self) -> dict: See: `RewardInfo` in "smart_control/proto/smart_control_reward.proto". 
Returns: - A dictionary where the keys are the [`reward_info_ids`](./#smart_control.dataset.partition.BuildingDatasetPartition.reward_info_ids) - and the values are unique integers referencing column indices in the [`reward_info_value_matrix`](./#smart_control.dataset.partition.BuildingDatasetPartition.reward_info_value_matrix). + A dictionary where the keys are the [`reward_info_ids`](#smart_control.dataset.partition.BuildingDatasetPartition.reward_info_ids) + and the values are unique integers referencing column indices in the [`reward_info_value_matrix`](#smart_control.dataset.partition.BuildingDatasetPartition.reward_info_value_matrix). For example: @@ -212,8 +212,8 @@ def reward_ids_map(self) -> dict: See: `RewardResponse` in "smart_control/proto/smart_control_reward.proto". Returns: - A dictionary where the keys are the [`reward_ids`](./#smart_control.dataset.partition.BuildingDatasetPartition.reward_ids) - and the values are unique integers referencing column indices in the [`reward_value_matrix`](./#smart_control.dataset.partition.BuildingDatasetPartition.reward_value_matrix). + A dictionary where the keys are the [`reward_ids`](#smart_control.dataset.partition.BuildingDatasetPartition.reward_ids) + and the values are unique integers referencing column indices in the [`reward_value_matrix`](#smart_control.dataset.partition.BuildingDatasetPartition.reward_value_matrix). """ return { "agent_reward_value": 0, @@ -344,9 +344,9 @@ def actions_df(self) -> pd.DataFrame: """A time-series dataframe of numeric action values, constructed from the following components: - + Columns are the [`action_ids`](./#smart_control.dataset.partition.BuildingDatasetPartition.action_ids) - + Row indices are the [`action_timestamps`](./#smart_control.dataset.partition.BuildingDatasetPartition.action_timestamps) - + Cell values are from the [`action_value_matrix`](./#smart_control.dataset.partition.BuildingDatasetPartition.action_value_matrix) + + Columns are the [`action_ids`](#smart_control.dataset.partition.BuildingDatasetPartition.action_ids) + + Row indices are the [`action_timestamps`](#smart_control.dataset.partition.BuildingDatasetPartition.action_timestamps) + + Cell values are from the [`action_value_matrix`](#smart_control.dataset.partition.BuildingDatasetPartition.action_value_matrix) Returns: A `pandas.DataFrame`. Here is an example of the structure: @@ -373,9 +373,9 @@ def observations_df(self) -> pd.DataFrame: """A time-series dataframe of numeric observation values, constructed from the following components: - + Columns are the [`observation_ids`](./#smart_control.dataset.partition.BuildingDatasetPartition.observation_ids) - + Row indices are the [`observation_timestamps`](./#smart_control.dataset.partition.BuildingDatasetPartition.observation_timestamps) - + Cell values are from the [`observation_value_matrix`](./#smart_control.dataset.partition.BuildingDatasetPartition.observation_value_matrix) + + Columns are the [`observation_ids`](#smart_control.dataset.partition.BuildingDatasetPartition.observation_ids) + + Row indices are the [`observation_timestamps`](#smart_control.dataset.partition.BuildingDatasetPartition.observation_timestamps) + + Cell values are from the [`observation_value_matrix`](#smart_control.dataset.partition.BuildingDatasetPartition.observation_value_matrix) Returns: A `pandas.DataFrame`. 
Here is an example of the structure: @@ -402,9 +402,9 @@ def rewards_df(self) -> pd.DataFrame: """A time-series dataframe of numeric reward values, constructed from the following components: - + Columns are the [`reward_ids`](./#smart_control.dataset.partition.BuildingDatasetPartition.reward_ids) - + Row indices are the [`reward_timestamps`](./#smart_control.dataset.partition.BuildingDatasetPartition.reward_timestamps) - + Cell values are from the [`reward_value_matrix`](./#smart_control.dataset.partition.BuildingDatasetPartition.reward_value_matrix) + + Columns are the [`reward_ids`](#smart_control.dataset.partition.BuildingDatasetPartition.reward_ids) + + Row indices are the [`reward_timestamps`](#smart_control.dataset.partition.BuildingDatasetPartition.reward_timestamps) + + Cell values are from the [`reward_value_matrix`](#smart_control.dataset.partition.BuildingDatasetPartition.reward_value_matrix) Returns: A `pandas.DataFrame`. Here is an example of the structure: @@ -430,9 +430,9 @@ def reward_infos_df(self) -> pd.DataFrame: """A time-series dataframe of numeric reward info values, constructed from the following components: - + Columns are the [`reward_info_ids`](./#smart_control.dataset.partition.BuildingDatasetPartition.reward_info_ids) - + Row indices are the [`reward_info_timestamps`](./#smart_control.dataset.partition.BuildingDatasetPartition.reward_info_timestamps) - + Cell values are from the [`reward_info_value_matrix`](./#smart_control.dataset.partition.BuildingDatasetPartition.reward_info_value_matrix) + + Columns are the [`reward_info_ids`](#smart_control.dataset.partition.BuildingDatasetPartition.reward_info_ids) + + Row indices are the [`reward_info_timestamps`](#smart_control.dataset.partition.BuildingDatasetPartition.reward_info_timestamps) + + Cell values are from the [`reward_info_value_matrix`](#smart_control.dataset.partition.BuildingDatasetPartition.reward_info_value_matrix) Returns: A `pandas.DataFrame`. Here is an example of the structure: