Mm paged generate #192

Draft
gkumbhat wants to merge 8 commits into foundation-model-stack:main from gkumbhat:mm_paged_generate

@gkumbhat gkumbhat commented Feb 23, 2026

Description

  • Uses embedding inputs for multimodal models
  • If it's multimodal (i.e., has a text_config as a subconfig), pull opts from the text_config
  • If all named params are the same dtype, use that dtype for the kv cache (otherwise falls back to existing behavior)
    • The existing fallback to float32 is risky. With the above change it should rarely occur, assuming weights are mostly homogeneous for a given model. We should not assume a default dtype for the model, since mismatches will break at decode time.
  • Started to pull some of the hackery around figuring out things like kv heads etc into small helpers to be a bit more clear
  • Allow passing a prepare_model_inputs_hook to run multimodal models' prepare_inputs_for_generation to get the merged mm/text embeddings.
  • If we have input_ids that are 3D, the final dimension is marked as static for compile since it's the embedding dim. When this is eventually consolidated with generate in FMS (ref), the variable names should be made clearer, but for now it intentionally keeps the same naming convention as FMS, since most of this code is based on generate there.
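
For reviewers, here is a rough sketch of two of the ideas above: reading options from text_config when it exists, and only using the parameter dtype for the kv cache when every named param agrees. The helper names (resolve_text_config, uniform_param_dtype) and the num_key_value_heads attribute are hypothetical and chosen for illustration; they are not the names used in this PR. The sketch is duck-typed and avoids importing torch, but the pairs yielded by a torch module's named_parameters() should work the same way.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional, Tuple


def resolve_text_config(config: Any) -> Any:
    """Return the text sub-config of a multimodal config, else the config itself.

    Hypothetical helper: a config is treated as multimodal when it carries
    a non-None `text_config` attribute.
    """
    text_config = getattr(config, "text_config", None)
    return config if text_config is None else text_config


def uniform_param_dtype(named_params: Iterable[Tuple[str, Any]]) -> Optional[Any]:
    """Return the dtype shared by every parameter, or None if they disagree.

    Hypothetical helper: None means "keep the existing fallback behavior"
    (also returned when there are no parameters at all).
    """
    dtypes = {param.dtype for _, param in named_params}
    return next(iter(dtypes)) if len(dtypes) == 1 else None


# Stand-in objects for illustration only; a real model and config would be used in practice.
mm_config = SimpleNamespace(text_config=SimpleNamespace(num_key_value_heads=8))
text_only_config = SimpleNamespace(num_key_value_heads=4)

homogeneous = [
    ("wq", SimpleNamespace(dtype="float16")),
    ("wk", SimpleNamespace(dtype="float16")),
]
mixed = homogeneous + [("norm", SimpleNamespace(dtype="float32"))]

kv_heads = resolve_text_config(mm_config).num_key_value_heads
cache_dtype = uniform_param_dtype(homogeneous)  # shared dtype, safe to use for the kv cache
fallback = uniform_param_dtype(mixed)           # None, so the caller falls back
```

Returning None instead of a default dtype keeps the fallback decision with the caller, in line with the point above about not assuming a default dtype for the model.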

Things to Test

  1. inference.py
    • granite
      • inference.py with paged attention and chunked prefill
      • inference.py with paged attention but without chunked prefill
      • inference.py with sdpa
    • mistral
      • inference.py with paged attention and chunked prefill
      • inference.py with paged attention but without chunked prefill
      • inference.py with sdpa
  2. DPP

Continuation of: #188

alex-jw-brooks and others added 3 commits February 13, 2026 06:24
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: gkumbhat <kumbhat.gaurav@gmail.com>
Signed-off-by: gkumbhat <kumbhat.gaurav@gmail.com>
Signed-off-by: gkumbhat <kumbhat.gaurav@gmail.com>
Signed-off-by: gkumbhat <kumbhat.gaurav@gmail.com>
@gkumbhat gkumbhat marked this pull request as draft March 3, 2026 22:05
gkumbhat added 2 commits March 4, 2026 07:37
Signed-off-by: gkumbhat <kumbhat.gaurav@gmail.com>
Signed-off-by: gkumbhat <kumbhat.gaurav@gmail.com>
2 participants