Dump hs #3

Draft
ChangseokSong wants to merge 23 commits into main from dump-hs
ChangseokSong commented Nov 24, 2025

Motivation

This PR adds support for dumping the auxiliary hidden states and last hidden states produced during EAGLE speculative decoding.
The primary use case is offline training with SpecForge; it can also be used to collect hidden states from real requests as training data for draft models.


Modifications

Hidden-state capture

  • Auxiliary and last hidden states are captured in the logits processor and stored per request.
  • Each request's hidden states are copied to host memory once, after the request finishes.
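
The capture scheme above can be illustrated with a minimal sketch. The class name `RequestHiddenStateStore` and its methods are hypothetical and not taken from the PR. Plain Python lists stand in for GPU tensors, and the single host copy is modeled as one concatenation in `finish`.

```python
from collections import defaultdict


class RequestHiddenStateStore:
    """Hypothetical per-request accumulator; not the PR's actual class."""

    def __init__(self):
        # request id -> list of (aux_hidden, last_hidden) chunks, one per step
        self._chunks = defaultdict(list)

    def append(self, rid, aux_hidden, last_hidden):
        # Called once per decode step; nothing is copied to the host here.
        self._chunks[rid].append((aux_hidden, last_hidden))

    def finish(self, rid):
        # Called once when the request finishes: merge all steps so the
        # result can be handed over in a single transfer.
        chunks = self._chunks.pop(rid, [])
        aux = [row for a, _ in chunks for row in a]
        last = [row for _, l in chunks for row in l]
        return {"rid": rid, "aux_hidden_states": aux, "last_hidden_states": last}
```

Deferring the merge to `finish` means each request costs one transfer rather than one per decode step, which is the property the second bullet describes.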

Dumping workflow

  • Added HiddenStateDumper, which prepares payloads, stages tensors on the CPU, and writes them to disk.
  • Disk writes run in a process pool so that they do not block the main worker.
  • Device-to-host (DtoH) transfers run on a separate CUDA stream so that they can overlap with verification.
  • Hidden states from one decode step are processed during the next step; the final batch is flushed once all of its requests have finished.
  • Dumping is distributed round-robin across TP ranks.
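
The one-step delay and the round-robin split can be sketched as follows. This is an illustration under assumptions, not the PR's HiddenStateDumper: the class `DeferredDumper`, its methods, and the rule "the i-th finished request is written by rank i % tp_size" are all hypothetical, and `write_fn` is a plain callable standing in for submission to the writer process pool.

```python
class DeferredDumper:
    """Hypothetical sketch: handle step N's payloads while step N+1 runs."""

    def __init__(self, tp_rank, tp_size, write_fn):
        self.tp_rank = tp_rank
        self.tp_size = tp_size
        self.write_fn = write_fn  # stand-in for submitting to a writer pool
        self._pending = None      # payloads captured at the previous step
        self._counter = 0         # dump index, advanced identically on all ranks

    def _drain(self):
        if self._pending is None:
            return
        for payload in self._pending:
            # Assumed round-robin rule: only one rank writes each payload.
            if self._counter % self.tp_size == self.tp_rank:
                self.write_fn(payload)
            self._counter += 1
        self._pending = None

    def on_step(self, finished_payloads):
        # Process what the previous step left behind, then stash new work.
        self._drain()
        self._pending = list(finished_payloads)

    def flush(self):
        # Called once every request in the batch has finished.
        self._drain()
```

Because every rank advances the same counter over the same payload order, the ranks agree on ownership without communicating, and each payload is written exactly once.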

Buffer reuse

  • Added FlatBufferPool, which reuses CPU buffers to reduce allocation overhead.
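
A buffer pool of this kind might look like the sketch below. It is not the PR's FlatBufferPool: the class name `FlatBufferPoolSketch`, the size-keyed free lists, and the use of `bytearray` in place of CPU tensors are assumptions made for illustration.

```python
from collections import defaultdict


class FlatBufferPoolSketch:
    """Hypothetical size-keyed pool of reusable flat CPU buffers."""

    def __init__(self):
        self._free = defaultdict(list)  # nbytes -> idle buffers of that size
        self.allocations = 0            # number of fresh allocations made

    def acquire(self, nbytes):
        free = self._free[nbytes]
        if free:
            # Reuse an idle buffer instead of allocating a new one.
            return free.pop()
        self.allocations += 1
        return bytearray(nbytes)

    def release(self, buf):
        # Return the buffer so a later acquire of the same size can reuse it.
        self._free[len(buf)].append(buf)
```

Keying the free lists by exact size keeps the sketch simple; a real pool could instead round sizes up to buckets so that slightly different request lengths share buffers.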

Accuracy Tests

Benchmarking and Profiling

Checklist
