
Convert DPP to PyTest #186

Open

matthew-pisano wants to merge 71 commits into main from ZSPYRT-68_convert_dpp_to_pytest

Conversation

@matthew-pisano (Collaborator) commented Feb 12, 2026

Summary

The DPP script is currently very long and complex to use, and there is no uniform way to run it within test automation.

This PR splits the original DPP script into a proper PyTest test and a frontend that keeps the old interface. The test can now be run with pytest without the complex and arbitrary parameters previously required; that logic has been internalized into the test to represent a single working path. Additionally, the --distributed parameter is now inferred from whether or not the script is run using torchrun.

Running from PyTest

torchrun --nproc-per-node 4 -m pytest -s ./tests/testing/test_drive_paged_programs.py

Note: This test must be run with 4 cards or else it may fail.

The interface for the standalone script remains unchanged.
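The inference of the --distributed setting described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual implementation; it relies on the fact that torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK in each worker's environment:

```python
import os


def infer_distributed() -> bool:
    """Infer distributed mode from torchrun's environment.

    torchrun (torch.distributed.run) exports RANK, WORLD_SIZE, and
    LOCAL_RANK to every worker process, so their presence is a
    reliable signal that the script was launched under torchrun.
    """
    return all(var in os.environ for var in ("RANK", "WORLD_SIZE", "LOCAL_RANK"))
```

With this approach a plain `pytest …` invocation runs single-process, while `torchrun --nproc-per-node 4 -m pytest …` runs distributed, with no extra flag needed.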

Internal changes:

  • The original ~1500-line file has been split into several files within the testing/dpp module.
  • Arbitrary strings have been replaced with enums to handle invalid inputs.
  • Error handling has been shifted towards early failure when parameters are invalid or files are not found.
  • A representative DPP program criteria JSON has been added as a fixture, so there is no need to manually specify a path.
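The string-to-enum and early-failure points above might look like the following sketch (illustrative names only, not the actual testing/dpp API):

```python
from enum import Enum


class DatasetType(Enum):
    """Replaces free-form strings so invalid inputs fail fast."""

    SHAREGPT = "sharegpt"
    RAG = "rag"


def parse_dataset_type(raw: str) -> DatasetType:
    """Fail early with a clear message instead of deep inside the run."""
    try:
        return DatasetType(raw.lower())
    except ValueError:
        valid = ", ".join(t.value for t in DatasetType)
        raise ValueError(
            f"Unknown dataset type '{raw}'; expected one of: {valid}"
        ) from None
```

An invalid `--dataset_type` is then rejected at argument-parsing time rather than partway through a multi-card run.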

Fixes:

Signed-off-by: Matthew Pisano <matthewpisano14@gmail.com>
Comment thread aiu_fms_testing_utils/utils/dpp_config.py
@Ssukriti (Contributor)

Thank you for the effort that went into splitting utilities.

After careful review, I have submitted my primary comments.

Data sampling: please ensure that users can continue to pass ShareGPT and RAG datasets from files on disk, with types "sharegpt" and "rag", as we have use cases for this and don't want to break existing behavior.
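The requested behavior, dispatching on a dataset type while still reading from a local path, could be sketched like this (a hypothetical loader; the real ShareGPT/RAG parsing lives in the DPP module, and the RAG file layout here is only an assumption):

```python
import json
from pathlib import Path


def load_dataset(dataset_type: str, dataset_path: str) -> list:
    """Load a dataset from a local file on disk, dispatched by type.

    Illustrative sketch: 'sharegpt' is assumed to be one JSON array,
    'rag' is assumed to be JSON lines. Fails early if the file is
    missing, matching the PR's early-failure approach.
    """
    path = Path(dataset_path)
    if not path.is_file():
        raise FileNotFoundError(f"Dataset file not found: {path}")
    if dataset_type == "sharegpt":
        return json.loads(path.read_text())
    if dataset_type == "rag":
        return [json.loads(line) for line in path.read_text().splitlines() if line]
    raise ValueError(f"Unsupported dataset type: {dataset_type}")
```

The key point is that both types keep accepting an arbitrary `--dataset_path` on disk, so pipelines that use pre-processed HF exports are unaffected.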

On file structure: @matthew-pisano I was wondering whether we need another dpp folder under testing (aiu-fms-testing-utils/testing/dpp?), or whether all of those files can reside in /utils https://github.com/foundation-model-stack/aiu-fms-testing-utils/tree/main/aiu_fms_testing_utils/utils, as they are essentially utils.
Please address my review changes in the existing structure first, as submitted, so I can review them before files are moved around. I am open to leaving the files and structure as they are or moving them to utils, whichever you decide.

Furthermore, after the review changes are addressed, the DPP script should be tested before we merge to ensure it won't break any existing users; @rafvasq and @Abhishek-TAMU will need your help for that.

@matthew-pisano (Collaborator, Author) commented Feb 23, 2026

@Ssukriti Thank you very much for your thorough review!

In response to your final folder suggestion: I am flexible on where the folder lives, but I really think everything should be grouped into its own dedicated folder. Throughout the stack we tend to dump everything into a single folder, and it often causes significant confusion when loosely related code shares a folder.

@Ssukriti (Contributor)

Thanks @matthew-pisano. The only pending change is that we strongly want to support sharegpt and RAG factoid datasets from local paths on disk as before, since we have processed the HF datasets for maximum coverage of programs. The processed datasets are shared between associated teams.

@joerunde @Abhishek-TAMU @rafvasq FYI ^

@matthew-pisano (Collaborator, Author)

Thank you again @Ssukriti, I will address your final concerns.

@matthew-pisano (Collaborator, Author)

I think I have addressed all of your suggestions. Let me know if not.

@Ssukriti (Contributor)

@Abhishek-TAMU or @rafvasq, the branch is ready for testing our pipelines before we merge.

@Ssukriti (Contributor) left a comment


Approving, as the code is reviewed. We need to run some pipelines to double-check logs before we merge.

@Abhishek-TAMU (Collaborator) left a comment


Requested a small change after running DPP from the existing Jenkins pipeline using the existing interface (not pytest). Thank you.

Comment thread aiu_fms_testing_utils/scripts/drive_paged_programs.py Outdated
@Abhishek-TAMU (Collaborator) left a comment


Thanks for the PR, @matthew-pisano. Left a few comments to address.

Comment thread aiu_fms_testing_utils/testing/dpp/run_drive_paged_programs.py Outdated
Comment on lines +84 to +86
f"Could not determine tkv_limit for model variant '{model_variant}'. "
"Please set the environment variable VLLM_DT_MAX_BATCH_TKV_LIMIT or "
"run this program in distributed mode."
Collaborator


@matthew-pisano Looks like we would always get this error when running with 1 AIU using existing interface from aiu_fms_testing_utils/scripts/drive_paged_programs.py. Are we planning to add code to use VLLM_DT_MAX_BATCH_TKV_LIMIT for 1 AIU runs too ?

Collaborator Author


I will need some information on how you are running this with 1 AIU; 4 AIUs is the only good configuration that I know of. Are you running with torchrun?

Collaborator


We do run DPP with 1 AIU using torchrun; sharing the command for that in case it helps:

export VLLM_DT_CHUNK_LEN=1024
export VLLM_DT_MAX_BATCH_SIZE=16
export VLLM_DT_MAX_BATCH_TKV_LIMIT=131072
export VLLM_DT_MAX_CONTEXT_LEN=3072

torchrun --nproc-per-node=1 aiu-fms-testing-utils/scripts/drive_paged_programs.py \
--max_new_tokens=32 \
--prefill_chunk_size=1024 \
--model_variant=ibm-granite/granite-3.3-8b-instruct \
--program_criteria_json_path=program_criteria_new.json \
--dataset_path=ShareGPT_V3_unfiltered_cleaned_split.json \
--dataset_type=sharegpt \
--test_type=metrics \
--cross_entropy_threshold=2.6 \
--failure_rate_threshold=0.1 \
--attention_type=paged \
--validation_info_outputs_dir=granite_3/sharegpt_homogeneous \
--prioritize_large_batch_sizes \
--enforce_homogeneous_prompt_programs

@matthew-pisano (Collaborator, Author) commented Mar 11, 2026


I was unable to get it working with these parameters. I got an error complaining that it could not allocate a KV cache. I tried it with a few smaller values and frequently got floating point errors. I found one set of values that worked:

VLLM_DT_CHUNK_LEN="256"
VLLM_DT_MAX_BATCH_SIZE="8"
VLLM_DT_MAX_BATCH_TKV_LIMIT="1024"
VLLM_DT_MAX_CONTEXT_LEN="256"
AFTU_PAGED_KVCACHE_NUM_BLOCKS_HINT="1024"

Is there a way to decide which combination of environment variables will work? I cannot find any documentation, just things declaring "These should probably work".

Collaborator Author


@Abhishek-TAMU Any thoughts?

Collaborator


@matthew-pisano Earlier I had shared the detailed product use case internally, along with the potential fix. Summarizing it again here:

There is currently no documentation describing which parameter combination should be used. Based on the product use case, the 1-AIU case is currently running and testing successfully with the following parameters:

VLLM_DT_MAX_BATCH_SIZE=16
VLLM_DT_MAX_CONTEXT_LEN=3072
VLLM_DT_MAX_BATCH_TKV_LIMIT=131072
VLLM_DT_CHUNK_LEN=1024

AFTU_PAGED_KVCACHE_NUM_BLOCKS_HINT is not set because it is later set in the generate function using the logic here.

For the 4-AIU configuration, the current logic works fine:
AFTU_PAGED_KVCACHE_NUM_BLOCKS_HINT = 8192 if prefill_chunk_size > 0 else 2080

So in this commit, the logic added for the 1-AIU run to assign VLLM_DT_MAX_BATCH_TKV_LIMIT looks correct:

elif use_distributed and world_size == 1:
    # Only set defaults for TP=1
    context = (
        "Model granite-3.3-8b (or compatible) "
        "with tensor parallel size 1 detected"
    )
    self.tkv_limit = self._get_int_env(
        key="VLLM_DT_MAX_BATCH_TKV_LIMIT",
        default=131072,
        context=context,
    )

However, it seems better to keep the earlier behavior of assigning self.num_blocks only for 4-AIU runs, not for 1-AIU runs, since for 1 AIU num_blocks is set later and that is already working in the current pipeline.

So this code block should be removed, and this earlier code here should be restored.

This will run 1 AIU configuration successfully with the params given here.
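For context, a `_get_int_env`-style helper like the one referenced in the snippet above could plausibly look like this. This is a sketch under assumptions; the real implementation lives in aiu_fms_testing_utils/utils/dpp_config.py:

```python
import os


def get_int_env(key: str, default: int, context: str = "") -> int:
    """Read an integer environment variable, falling back to a default.

    Raises with context so a misconfigured variable surfaces early,
    in line with the PR's early-failure error handling.
    """
    raw = os.environ.get(key)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        raise ValueError(
            f"{context}: expected an integer for {key}, got '{raw}'"
        ) from None
```

This lets the TP=1 path fall back to a default such as 131072 for VLLM_DT_MAX_BATCH_TKV_LIMIT while still honoring an explicit override from the environment.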

Collaborator Author


@Abhishek-TAMU It looks like DPP is unacceptably flaky/machine dependent with 1 AIU. We have been unable to get the 1 AIU case working for any configuration recently. Even the parameters that I measured as good do not work. We are thinking that there may be some issues beyond the environment variables, possibly dependent on container configuration.

@Abhishek-TAMU (Collaborator) commented Apr 2, 2026


@matthew-pisano

1- Can I ask what those errors are? Just to confirm, are you testing with this change in aiu_fms_testing_utils/utils/dpp_config.py ? If you are testing with this change and are unable to get it running on your machine for verification, then we can stick with this change since it works in the existing pipeline for the 1 AIU run. Once you make this change, we can proceed with the PR.

2- I was also testing the 4-AIU runs again and noticed this: the attn_type_map mapping here, and passing attn_type_map[attn_type] instead of just attn_type.value to the get_default_validation_prefix() function, changes the name of the CPU info file. Because of that, it no longer finds the existing CPU info files that we have already saved for all models and dtypes.

So, if this change is not mandatory, can we continue passing just attn_type.value? That would save the effort of re-saving all CPU info for the entire DPP pipeline across all models just because the file name changes.
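The file-naming concern above can be illustrated with a toy prefix builder (hypothetical; the real function is get_default_validation_prefix, and the mapped string below is an invented placeholder): any change to the attention-type string changes the prefix, so previously saved CPU info files stop matching.

```python
def validation_prefix(model_variant: str, dtype: str, attn_type: str) -> str:
    # Toy stand-in for get_default_validation_prefix(): the attn_type
    # string is embedded verbatim in the cached file name, so the
    # caller's choice of string determines which cache files are found.
    return f"{model_variant.replace('/', '--')}_{dtype}_{attn_type}"
```

Passing `attn_type_map[attn_type]` instead of `attn_type.value` changes the last component of the name, which is why the pipeline's existing CPU info files are no longer located.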

3- When DPP tries to save new CPU info, there are a couple of errors in that path. Please test the CPU info saving flow as well. One fix is here in aiu_fms_testing_utils/testing/dpp/generation.py due to an argument mismatch, but there seem to be additional issues in that path when saving new CPU info.

@JRosenkranz (Contributor)

bot:test
TEST_FILE=test_scripts.py

@matthew-pisano matthew-pisano requested review from cfsarmiento and removed request for jameslivulpi March 20, 2026 15:56