BayesWatch/SelfSum

Self-Sum: Teaching an Agent to Decide Itself When and What to Summarize


Installation

Install veRL

conda create -n verl-agent python==3.12 -y
conda activate verl-agent

pip3 install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu124
pip3 install flash-attn==2.7.4.post1 --no-build-isolation

pip3 install -e .

pip3 install vllm==0.8.5

Install Supported Environments

⚠️ Important: To run an agent in any of these environments, you must first install and configure the corresponding environment. We strongly recommend installing each environment in its own dedicated conda environment to avoid potential package version conflicts.

1. ALFWorld

Install with pip:

pip3 install gymnasium==0.29.1
pip3 install stable-baselines3==2.6.0
pip install alfworld
pip install vllm==0.8.5

Download PDDL & Game files and pre-trained MaskRCNN detector (will be stored in ~/.cache/alfworld/):

alfworld-download -f

Use --extra to download pre-trained checkpoints and seq2seq data.

Play a Textworld game:

alfworld-play-tw

2. Sciworld

pip install scienceworld  # using the same verl-agent env above

3. WebShop

WebShop requires Python <= 3.10, so begin by creating a new verl-agent-webshop environment:

conda create -n verl-agent-webshop python==3.10 -y
conda activate verl-agent-webshop

Install WebShop

cd ./agent_system/environments/env_package/webshop/webshop
./setup.sh -d all

Note: If you encounter issues with gdown, you may need to visit https://drive.google.com/, get your Google Drive cookie, and paste it into .cache/gdown/cookies.txt. Or you may need to manually download the files.

After WebShop is installed, return to the root directory of the repository and install the verl package in verl-agent:

cd repo_root/
pip3 install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu124
pip3 install flash-attn==2.7.4.post1 --no-build-isolation
pip3 install -e .
pip3 install vllm==0.8.2
# spacy 3.7.2 requires typer<0.10.0,>=0.3.0, but you have typer 0.15.2 which is incompatible.
# weasel 0.3.4 requires typer<0.10.0,>=0.3.0, but you have typer 0.15.2 which is incompatible.

These dependency-conflict messages can be safely ignored.


4. Search

cd ./agent_system/environments/env_package/search/third_party
pip install -e .
pip install gym==0.26.2

Prepare dataset (data will be saved at ~/data/searchR1_processed_direct):

cd repo_root/
python examples/data_preprocess/preprocess_search_r1_dataset.py

Since faiss-gpu is not available via pip, we set up a separate conda environment for the local retrieval server. Running this server uses around 6 GB of GPU memory per GPU, so account for this in your training run configuration. Build the retriever environment:

# Create and activate the retriever environment with Python 3.10
conda create -n retriever python=3.10 -y
conda activate retriever

# Install PyTorch (with GPU support) and related libraries
conda install numpy==1.26.4 # needed to stop incompatible version of numpy from being installed via pip
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124

# Install other Python packages
pip install transformers datasets pyserini huggingface_hub

# Install the GPU version of faiss
conda install faiss-gpu==1.8.0 -c pytorch -c nvidia -y

# Install the API service framework
pip install uvicorn fastapi

Download the index:

conda activate retriever

local_dir=~/data/searchR1
python examples/search/searchr1_download.py --local_dir $local_dir
cat $local_dir/part_* > $local_dir/e5_Flat.index
gzip -d $local_dir/wiki-18.jsonl.gz
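Before launching the server, it is worth sanity-checking that the reassembled index and decompressed corpus are in place (filenames taken from the commands above):

```shell
# Verify the reassembled index and decompressed corpus exist and are non-empty
local_dir=~/data/searchR1
for f in "$local_dir/e5_Flat.index" "$local_dir/wiki-18.jsonl"; do
    [ -s "$f" ] && echo "OK: $f" || echo "MISSING: $f"
done
```

If either file is reported missing, re-run the corresponding download, `cat`, or `gzip -d` step above.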

Start the local flat e5 retrieval server:

conda activate retriever

# redirect the output to a file to avoid cluttering the terminal
# we have observed outputting to the terminal causing spikes in server response times
bash examples/search/retriever/retrieval_launch.sh > retrieval_server.log 

Run Examples

RL Training

We provide out-of-the-box scripts in the "examples/" directory for training agents in different environments.

Several important parameters under env control how memory is organized:

env.systematic_action # string; "summary" or a comma-separated list such as "summary,recall,...". Decides which actions the agent can call in addition to the predefined environment actions.

env.summary.use_summarized_memory  # usually True, enabling multi-turn interaction with summarized memory; if False, falls back to multiple independent single turns

env.summary.use_auto_cut # True or False; whether to automatically cut the context to fit the maximum length

env.summary.auto_cut_mode # string; can be "none", "fix_latest3", "latest3", and so on; check agent_system/multi_turn_rollout/rollout_loop.py

env.summary.summary_interval # int; n can be 3, 4, 5, 0, or negative. If n is positive, the agent summarizes every n steps; if n is 0, there is no summarization; if n is negative, the agent decides when to summarize itself. It needs to be used together with env.systematic_action.
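As a sketch of how these fit together, assuming the provided scripts forward Hydra-style key=value overrides to the trainer (the script name and values below are illustrative, not a prescribed configuration):

```shell
# Sketch: self-decided summarization with auto-cut enabled
bash examples/gigpo_trainer/run_alfworld_self_sum_self_gen.sh \
    env.systematic_action="summary" \
    env.summary.use_summarized_memory=True \
    env.summary.use_auto_cut=True \
    env.summary.auto_cut_mode="fix_latest3" \
    env.summary.summary_interval=-1  # negative: the agent decides when to summarize
```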

Here are some examples:

1. GiGPO

If you do not want summarization and instead rely on a pre-defined rule, such as always using the latest 3 interactions:

bash examples/mem_trainer/run_alfworld_no_sum_fix_latest3.sh # ALFWorld

If you want summarization driven by a pre-defined rule, such as summarizing every 3 steps:

bash examples/gigpo_trainer/run_alfworld_fix_sum_self_gen.sh # ALFWorld

If you want the agent to decide when and what to summarize itself:

bash examples/gigpo_trainer/run_alfworld_self_sum_self_gen.sh # ALFWorld

In this case you can optionally load SFT models; they can be found at https://huggingface.co/Merlin-Hongru/.

2. EVEO

This is our modified RL algorithm with summarization-aware advantages. It also has several important parameters:

algorithm.eveo.enable_summary_advantage # True or False; if True, extra reward is added to summary steps when the final result is correct

algorithm.eveo.summary_mode="independent"  # how summary advantages are added; can be "independent" or "dependent", check eveo/core_eveo.py

bash examples/eveo_trainer/run_alfworld_self_sum_self_gen.sh # ALFWorld
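For instance, the summarization-aware advantage could be toggled on top of that script (a sketch, assuming the script forwards Hydra-style key=value overrides; the values are illustrative):

```shell
# Sketch: enable the summarization-aware advantage in independent mode
bash examples/eveo_trainer/run_alfworld_self_sum_self_gen.sh \
    algorithm.eveo.enable_summary_advantage=True \
    algorithm.eveo.summary_mode="independent"
```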

Feel free to check the details of each script and try different benchmarks.


About

Official codebase for ACL Findings 2026: "Self-Sum: Teaching an Agent to Decide Itself When and What to Summarize"
