Fix num_logits_to_keep default in decoder_forward and add get_available_devices() by Suh0161 · Pull Request #1669 · huggingface/transformers.js

Suh0161 · 2026-05-01T17:38:49Z

Closes #1666, closes #1643.

Fix: `num_logits_to_keep` defaults to `0n` instead of `1n` (#1666)

`decoder_forward()` had a comment correctly explaining that `num_logits_to_keep = 1` reduces memory during generation, but the code set the default to `0n`, so the ONNX model computed logits for the entire prompt instead of just the last token. For Gemma 4 with a 20k-token prompt and a 262k-entry vocabulary, this wastes about 20 GB of memory and can cause OOM crashes.
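As a rough check of the ~20 GB figure, the arithmetic can be sketched as follows. The batch size of 1, float32 logits (4 bytes each), and a vocabulary of exactly 262,144 entries are assumptions made for illustration; `logitsBytes` is a hypothetical helper, not part of the library.

```javascript
// Back-of-the-envelope size of the logits tensor.
// Assumptions (not stated in the PR): batch size 1, float32 logits (4 bytes),
// and a vocabulary of exactly 262,144 entries standing in for "262k".
function logitsBytes(numPositions, vocabSize, bytesPerElement = 4, batchSize = 1) {
  return batchSize * numPositions * vocabSize * bytesPerElement;
}

const PROMPT_TOKENS = 20_000;
const VOCAB_SIZE = 262_144;

// Logits for every prompt position vs. only the last position.
const allPositions = logitsBytes(PROMPT_TOKENS, VOCAB_SIZE);
const lastPositionOnly = logitsBytes(1, VOCAB_SIZE);

console.log((allPositions / 1e9).toFixed(2), 'GB for every prompt position');
console.log((lastPositionOnly / 1e6).toFixed(2), 'MB for the last position only');
```

Under these assumptions the full-prompt tensor comes to roughly 21 GB, while keeping only the last position needs about 1 MB.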

Fix: Change [0n] → [1n] to match both the comment and the behavior of decoder_prepare_inputs_for_generation().
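To illustrate what the two values mean, here is a minimal sketch on plain arrays. It is not the library's code: `keepLastLogits` is a hypothetical helper that only mimics the behavior described above, where `0n` keeps logits for every position and `1n` keeps the last position only.

```javascript
// Hypothetical helper, not library code: mimics on plain arrays what the
// num_logits_to_keep input asks the model to do.
// `logits` has shape [sequence_length][vocab_size].
function keepLastLogits(logits, numLogitsToKeep) {
  // 0n means "keep every position", which is the old (buggy) default.
  if (numLogitsToKeep === 0n) return logits;
  // Otherwise keep only the trailing N positions.
  return logits.slice(-Number(numLogitsToKeep));
}

// Three prompt positions, toy vocabulary of two entries.
const toyLogits = [[1, 2], [3, 4], [5, 6]];

const withOldDefault = keepLastLogits(toyLogits, 0n); // all three rows
const withNewDefault = keepLastLogits(toyLogits, 1n); // last row only

console.log(withOldDefault.length, withNewDefault.length);
```

During generation only the last position is needed to pick the next token, which is why keeping a single row is sufficient there.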

Feature: `get_available_devices()` (#1643)

The supportedDevices list already existed inside the ONNX backend but was never exposed to users. This adds a public get_available_devices() function:

```js
import { get_available_devices } from '@huggingface/transformers';

const devices = get_available_devices();
// Node.js (Windows):   ['dml', 'webgpu', 'cpu']
// Node.js (Linux x64): ['cuda', 'webgpu', 'cpu']
// Browser (WebGPU):    ['webgpu', 'wasm']
// Browser (no WebGPU): ['wasm']
```
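One possible use of such a list is choosing a device before loading a model. The sketch below is an illustration only: `pickDevice` is a hypothetical helper, and its `available` argument stands in for the array that `get_available_devices()` would return. It assumes the list is ordered best-first.

```javascript
// Hypothetical helper: `available` stands in for the array that the
// get_available_devices() function proposed in this PR would return.
function pickDevice(available, preferred) {
  // Take the first preferred device that the environment supports.
  for (const device of preferred) {
    if (available.includes(device)) return device;
  }
  // Assumption: the list is ordered best-first, so fall back to its head.
  // Returns undefined if `available` is empty.
  return available[0];
}

// Example lists taken from the PR description above.
const browserChoice = pickDevice(['webgpu', 'wasm'], ['cuda', 'webgpu']);
const fallbackChoice = pickDevice(['wasm'], ['cuda', 'webgpu']);

console.log(browserChoice, fallbackChoice);
```

Keeping the helper a pure function of two arrays makes it easy to test without a real backend.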


---

Also note: **#1182 already has open PR #1190**, so I skipped that one to avoid duplicating work.

…ing to tokenizer

- TokenClassificationPipeline now populates start/end character offsets on every raw token result by scanning forward through the original text. Grouped results (aggregation_strategy='simple') carry the span of the first-to-last token in the group.
- PreTrainedTokenizer._call now accepts return_offsets_mapping: true, which adds an offset_mapping field ([start, end) per token) to the encoding. Works for single strings and batched input; handles padding with [0,0] and strips the field before tensor conversion so it is never tensorized.
- Adds computeOffsets() helper with case-insensitive fallback for uncased tokenizers (e.g. bert-base-uncased).

Closes huggingface#425, closes huggingface#633.
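The forward scan with a case-insensitive fallback described in that commit message could look roughly like the sketch below. This is not the `computeOffsets()` implementation from the commit: `computeOffsetsSketch` is a hypothetical function, and the handling of `##` continuation prefixes and the `[0, 0]` placeholder for unmatched tokens are assumptions made for illustration.

```javascript
// Hypothetical sketch: not the computeOffsets() implementation from the commit.
// Maps each token to a [start, end) character span by scanning forward.
function computeOffsetsSketch(text, tokens) {
  const lowered = text.toLowerCase();
  const offsets = [];
  let cursor = 0; // never search behind the previous match

  for (const token of tokens) {
    // Assumption: WordPiece-style continuation pieces carry a "##" prefix.
    const piece = token.startsWith('##') ? token.slice(2) : token;

    // Try an exact match first, then fall back to a case-insensitive one.
    let start = piece.length > 0 ? text.indexOf(piece, cursor) : -1;
    if (start === -1 && piece.length > 0) {
      start = lowered.indexOf(piece.toLowerCase(), cursor);
    }

    if (start === -1) {
      // Assumption: tokens with no match (e.g. special tokens) get [0, 0].
      offsets.push([0, 0]);
      continue;
    }

    const end = start + piece.length;
    offsets.push([start, end]);
    cursor = end;
  }
  return offsets;
}

const sampleOffsets = computeOffsetsSketch('Hello World', ['[CLS]', 'hello', 'wor', '##ld']);
console.log(JSON.stringify(sampleOffsets));
```

One limitation of this sketch is that `toLowerCase()` can change the length of some Unicode strings, which would shift offsets; a real implementation would need to account for that.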

- Fix decoder_forward() defaulting num_logits_to_keep to 0n instead of 1n. The comment correctly stated the value should be 1 to avoid computing logits for the entire prompt sequence, but the code contradicted it. For models like Gemma 4 with long contexts and large vocabularies this caused ~20 GB of unnecessary memory allocation during generation. Closes huggingface#1666.
- Add get_available_devices() to the public API. The underlying supportedDevices list already existed in the ONNX backend but was not accessible to users. Returns a copy of the device list sorted by priority/performance for the current environment (Node.js, browser, Electron). Closes huggingface#1643.

Suh0161 · 2026-05-01T18:30:38Z

Closing in favor of #1670, which contains this work in the same commit stack.

Suh0161 added 2 commits May 1, 2026 18:16

Suh0161 closed this May 1, 2026

Suh0161 mentioned this pull request May 1, 2026

Auto device fallback on provider failure and tokenizer_options passthrough in Text2TextGenerationPipeline #1670

Closed

Suh0161 wants to merge 2 commits into huggingface:main from Suh0161:feat/num-logits-and-available-devices
