Add support for offline speculative decoding model PTQ #883
yeyu-nvidia wants to merge 4 commits into main
Conversation
Signed-off-by: Ye Yu <yeyu@nvidia.com>
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@ Coverage Diff @@
##             main     #883   +/- ##
=======================================
  Coverage   73.73%   73.73%
=======================================
  Files         199      199
  Lines       21165    21165
=======================================
  Hits        15606    15606
  Misses       5559     5559
```
What does this PR do?
Type of change:
new feature
Overview:
This PR enables loading a ModelOpt-pretrained offline speculative decoding model (e.g., EAGLE3), performing PTQ on it, and exporting the quantized model.
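PTQ takes the trained model as-is and only calibrates quantization scales from the existing weights and activations, with no retraining. As a toy per-tensor int8 sketch of that idea (illustrative only, not the ModelOpt implementation):

```python
def ptq_int8(weights):
    """Toy per-tensor int8 post-training quantization (PTQ).

    Calibration here is just the max-abs of the weights; real PTQ
    (as in ModelOpt) calibrates on activation data and supports
    formats such as FP8 and INT4. This only illustrates the idea.
    """
    # Calibrate one scale so the largest weight maps to int8 range.
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # guard all-zero input
    # Quantize: round to the nearest int8 level, clamping to [-128, 127].
    quantized = [max(-128, min(127, round(w / scale))) for w in weights]
    # Dequantize back to floats to see the quantization error.
    dequantized = [q * scale for q in quantized]
    return dequantized, scale
```

ModelOpt's actual PTQ operates on the full model through its `modelopt.torch.quantization` API, but the calibrate-then-quantize shape is the same.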
Usage
Follow the speculative_decoding examples to train an offline speculative decoding model first.
Then follow the command below to quantize and export it:
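The exact command was not captured above. As a sketch, the llm_ptq example in TensorRT-Model-Optimizer is typically driven by a script along these lines; the script name, flag names, and paths here are assumptions, not taken from this PR, so check the example's README for the authoritative invocation:

```shell
# Hypothetical invocation of the llm_ptq example's PTQ script;
# flags and placeholder paths are assumptions, not from this PR.
python hf_ptq.py \
    --pyt_ckpt_path <path/to/offline-eagle3-checkpoint> \
    --qformat fp8 \
    --export_path <path/to/quantized-export>
```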
Testing
Before your PR is "Ready for review"
Additional Information