feat: configurable timeouts and concurrency with lenient defaults for high-burst workloads (#14)
Expose `connectionTimeout`, `connectionAcquireTimeout`, `subscriptionTimeout`, and `maxConcurrency` on `SageMakerConfig.Builder`. Wire them through `SageMakerTransportFactory` (Netty client) and `SageMakerTransport` (input-publisher subscription wait) so high-concurrency callers can tune the chokepoints that the AWS Netty defaults expose under burst load.

Move the defaults to values tuned for high-burst workloads:

- `connectionTimeout`: AWS Netty default 2s -> 30s
- `connectionAcquireTimeout`: AWS Netty default 10s -> 60s
- `subscriptionTimeout`: was hardcoded 30s -> 60s
- `maxConcurrency`: was hardcoded 500 -> 500 (now tunable)

Empirically validated against a 400-concurrent-stream burst test on a 10x ml.g6.2xlarge endpoint: with the previous AWS defaults, customers hit a wave of `connection acquire` and `connect timed out` errors in the first few seconds of the run that look like server-side problems but are really client-side fail-fast tripping early. With the new defaults, the underlying connect/acquire layer absorbs the burst and those error categories go to near-zero. (Note: this fix only addresses the transport layer; the SDK-level retry storm remains until paired with `reconnect(false)` in `deepgram-java-sdk` >= 0.4.0.)

Adds 4 unit tests covering defaults, custom values, and validation of non-positive arguments. README updated with a "High-concurrency notes" section and the new parameter table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
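The build-time validation those unit tests exercise could look roughly like this. This is a hypothetical sketch, not the plugin's actual code: the class name, `requirePositive` helper, and constant names are all illustrative; only the default values (30s / 60s / 60s / 500) come from the PR.

```java
import java.time.Duration;

// Hypothetical sketch of the build-time validation described above.
public final class ConfigDefaultsSketch {
    // Defaults mirroring the values in this PR (30s / 60s / 60s / 500).
    static final Duration DEFAULT_CONNECTION_TIMEOUT = Duration.ofSeconds(30);
    static final Duration DEFAULT_CONNECTION_ACQUIRE_TIMEOUT = Duration.ofSeconds(60);
    static final Duration DEFAULT_SUBSCRIPTION_TIMEOUT = Duration.ofSeconds(60);
    static final int DEFAULT_MAX_CONCURRENCY = 500;

    // Reject null / zero / negative durations at build() time.
    static Duration requirePositive(Duration value, String name) {
        if (value == null || value.isZero() || value.isNegative()) {
            throw new IllegalArgumentException(name + " must be > 0");
        }
        return value;
    }

    // Same guard for the integer knob (maxConcurrency).
    static int requirePositive(int value, String name) {
        if (value <= 0) {
            throw new IllegalArgumentException(name + " must be > 0");
        }
        return value;
    }
}
```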
**lukeocodes** left a comment
appreciate the burst-test work, those AWS-side numbers going to ~0 are good to see. but on its own this PR doesn't fix the real problem, and deepgram/deepgram-java-sdk#45 on the SDK side is going the wrong way trying to compensate. I've left the detail over there.
the storm exists because every transient AWS error here goes straight to transport.onError(...) at SageMakerTransport.java:100-110 and the SDK happily reconnect-wraps it into Throttling-on-Throttling. the plugin already owns the AWS HTTP/2 client, the bidi stream, the publisher buffering, the lot. it should be the layer that decides what's recoverable and what's terminal. SageMaker should be invisible to the SDK.
what I'd like in here:
- retry-with-backoff around the bidi stream in `ensureConnected()` (`SageMakerTransport.java:77-129`). wrap `invokeEndpointWithBidirectionalStream(...)` with a budget + classification, so `ThrottlingException`, connection-pool-exhausted, and transient connect failure all get retried internally and never fire `errorListeners`.
- reuse `StreamPublisher.pending` (`:239-290`) across reconnect cycles so messages queued during an internal reset are drained on the new stream instead of dropped.
- only propagate to listeners on terminal errors: auth, protocol, exhausted budget. the handler at `:100-110` is the right gate.
- while in here, `inputPublisher.awaitSubscription(...)` ignores its boolean return at `:121-128`. the new `subscriptionTimeout` doesn't actually fail fast today, it just waits and continues. either honour the return or drop the knob.
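a budgeted retry loop of the shape asked for above might look roughly like this. purely an illustrative sketch, not the plugin's code: `attemptConnect` stands in for the real `invokeEndpointWithBidirectionalStream(...)` call and `isRetryable` for the classifier.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Illustrative budgeted-retry loop: exponential backoff, a wall-clock budget,
// and a classifier deciding which failures are worth retrying internally.
final class RetryLoopSketch {
    static <T> T runWithRetry(Callable<T> attemptConnect,
                              Predicate<Throwable> isRetryable,
                              int maxRetries,
                              Duration initialBackoff,
                              Duration maxBackoff,
                              Duration retryBudget) throws Exception {
        long deadline = System.nanoTime() + retryBudget.toNanos();
        long backoffMs = initialBackoff.toMillis();
        for (int attempt = 0; ; attempt++) {
            try {
                return attemptConnect.call();
            } catch (Exception e) {
                boolean budgetLeft = System.nanoTime() < deadline;
                if (attempt >= maxRetries || !budgetLeft || !isRetryable.test(e)) {
                    throw e; // terminal: only now would errorListeners fire
                }
                Thread.sleep(backoffMs);                          // back off before retrying
                backoffMs = Math.min(backoffMs * 2, maxBackoff.toMillis()); // exponential, capped
            }
        }
    }
}
```

the key property: a transient failure consumes budget and attempts silently; listeners only ever see the terminal outcome.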
once the plugin absorbs storm handling internally, the SDK-side change becomes tiny. basically just the listener fixes I asked for on #45 (configurable connect timeout + clean maxRetries(0) semantics). no reconnect(boolean) flag, no auto-disable heuristic, no edits to generated WS clients. that keeps transportFactory() as the only thing the SDK knows about SageMaker, which is where I'd like to land.
the 30s / 60s / 60s / 500 defaults look fine. if anything the missing config surface here is retry budget + backoff, not looser timeouts.
approving the netty wiring + new knobs in principle, but I'd like the retry/classification work landed in this PR (or one stacked on it) before we touch #45.
…ent pending queue

Bundles the four storm-handling asks from @lukeocodes' review of #14 into the existing PR. The plugin now owns end-to-end retry/backoff/classification for transient AWS failures so they never reach `transport.onError(...)`, and the SDK's wrapper-level reconnect can be disabled via the new `DeepgramTransportFactory.reconnectOptions()` hook (paired with the SDK fix in deepgram/deepgram-java-sdk#45).

`SageMakerConfig` — new retry knobs (defaults tuned for high-burst workloads):

- `maxRetries` = 5 (set 0 to disable internal retry)
- `initialBackoff` = 100 ms
- `maxBackoff` = 5 s
- `backoffMultiplier` = 2.0 (validated >= 1.0)
- `retryBudget` = 30 s wall-clock cap across all retries
- `build()` rejects `initialBackoff` > `maxBackoff`

`SageMakerTransport` — internal storm absorption:

- `ensureConnected()` wraps `attemptConnect()` in a budgeted retry loop with exponential backoff. Successful subscription resets the budget.
- `handleStreamError()`: the response-handler `onError` gate from line 100 is now a classify-and-decide step. RETRYABLE + budget left triggers an internal reset (cancel future, complete publisher, mark disconnected) so the next send re-enters the loop on a fresh stream. TERMINAL or budget-exhausted surfaces to `errorListeners` as before.
- `classify(Throwable)` walks the cause chain. Retryable: `TimeoutException`, `ConnectException`, `IOException`, `AwsServiceException` with status 429 or 5xx or error code containing "throttl", and `SdkException` whose message contains "acquire"/"pool"/"throttl"/"timeout". Terminal: 4xx other than 429, anything unmatched (default-deny for retry).
- `StreamPublisher.pending` is hoisted up to a `SageMakerTransport` instance field so events queued during an internal reset survive across the publisher swap. The new publisher drains the shared queue on subscribe.
- `awaitSubscription` now returns the boolean. Timeout throws `TimeoutException`, which flows into the retry loop as RETRYABLE — the new `subscriptionTimeout` knob now actually fails fast as advertised.
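The pending-queue hoisting described above can be pictured like this. A deliberately simplified sketch: the real `StreamPublisher` is a Reactive Streams publisher, reduced here to a plain queue plus a consumer so the ownership change (transport owns the queue, publishers come and go) is visible.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

// Sketch: the transport owns the pending queue; each replacement publisher
// drains it on subscribe, so events queued during an internal reset survive.
final class PendingQueueSketch {
    private final Queue<String> pending = new ConcurrentLinkedQueue<>(); // transport-owned
    private Consumer<String> subscriber; // null while disconnected/resetting

    void send(String event) {
        if (subscriber != null) {
            subscriber.accept(event); // fast path: forward immediately
        } else {
            pending.add(event);       // buffered across the publisher swap
        }
    }

    // Called when the replacement publisher's stream subscribes.
    void onSubscribe(Consumer<String> newSubscriber) {
        subscriber = newSubscriber;
        String queued;
        while ((queued = pending.poll()) != null) {
            subscriber.accept(queued); // drain everything queued during the reset
        }
    }

    void onReset() {
        subscriber = null; // events sent from here on are queued, not dropped
    }
}
```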
`SageMakerTransportFactory`:

- Overrides the new `reconnectOptions()` default to return `ReconnectOptions.builder().maxRetries(0).build()`. Combined with the SDK's auto-wiring in `TransportWebSocketFactory`, this disables the SDK's wrapper-level reconnect for any per-resource WebSocket client constructed against this factory — no user wiring required.

`build.gradle`:

- Bumps the deepgram-java-sdk dep from 0.2.1 to 0.3.0 to pick up the new `DeepgramTransportFactory.reconnectOptions()` default method, `ReconnectOptions.connectionTimeoutMs`, the `maxRetries(0)` bug fix, and the `applyOptionsOverride` hook.

Tests:

- New `SageMakerTransportRetryTest` covers `classify()` across all branches (Timeout/Connect/IOException; AWS 401/403/429/5xx/throttling code; SdkException pool-keyword; cause-chain walking; default-terminal) and `StreamPublisher` behaviour (`awaitSubscription` false-on-timeout, true-on-subscribe; pending-queue persistence across publisher instances; immediate forward when subscriber present).
- `SageMakerTransportFactoryTest` gains: retry-config defaults + customisation + validation, `initialBackoff > maxBackoff` rejection, and `factoryDeclaresMaxRetriesZeroForReconnectOptions` verifying the storm-suppression policy is published correctly.

README:

- Bumps the SDK dep version reference and notes the v0.3.0+ requirement.
- Adds the new retry knobs to the configuration table.
- New "Retry & storm absorption" section explaining the internal classification/budget design and how `factory.reconnectOptions()` auto-wires the SDK's reconnect-disable.

End-to-end mock-AWS retry coverage isn't included here — the AWS reactive streams handler indirection makes deterministic stubbing fragile. Those paths are exercised by the existing burst test described in the PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
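The `classify()` cause-chain walk those tests cover can be sketched with JDK types alone. A sketch under stated assumptions: the AWS-specific branches (`AwsServiceException` status codes, `SdkException`) need the AWS SDK on the classpath, so they are represented here only by the keyword match on the message; everything else mirrors the description above, including the default-deny fallthrough.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.concurrent.TimeoutException;

// Sketch of default-deny classification: walk the cause chain, retry only on
// explicitly recognised transient failures, treat everything else as terminal.
final class ClassifySketch {
    enum Category { RETRYABLE, TERMINAL }

    static Category classify(Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof TimeoutException
                    || t instanceof ConnectException
                    || t instanceof IOException) {
                return Category.RETRYABLE;
            }
            String msg = t.getMessage();
            if (msg != null) {
                String m = msg.toLowerCase();
                // keyword branch standing in for the SdkException message check
                if (m.contains("acquire") || m.contains("pool")
                        || m.contains("throttl") || m.contains("timeout")) {
                    return Category.RETRYABLE;
                }
            }
        }
        return Category.TERMINAL; // default-deny for retry
    }
}
```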
… connection establishment

Added robust connection-error and buffer handling. Tested at high concurrency.
## Summary

- Add `ClientOptions.Builder.reconnect(boolean)` (default `true`). When `false`, the per-resource WebSocket clients connect once via a plain `okhttp3.WebSocketListener` and propagate failures directly, bypassing `ReconnectingWebSocketListener` (and its hardcoded `connectionFuture.get(4000, MILLISECONDS)` ceiling).
- Auto-disable for custom transports: when `DeepgramClientBuilder.transportFactory(...)` (or its async counterpart) is set, `setAdditional()` also calls `builder.reconnect(false)`. Custom transports like `SageMakerTransportFactory` already manage their own connection lifecycle; wrapping them in another retry layer creates double-retry storms under burst load.
- Patch applied to all four per-resource WebSocket clients: `listen/v1`, `listen/v2`, `agent/v1`, `speak/v1`. Same structural change in each: `connect()` branches on `clientOptions.reconnect()`; `disconnect()`, `sendMedia()`, `sendMessage()`, `assertSocketIsOpen()` use `directWebSocket` when the listener is null.
- The four files are added to the "temporarily frozen" section of `.fernignore` so the patch survives the next Fern regen. Unfreeze once the change has been pushed upstream into the Fern generator template.

## Why

Validated against a 400-concurrent-stream burst test on a 10× `ml.g6.2xlarge` endpoint (Deepgram on SageMaker, replicating a customer-reported failure):

| Metric | Before | After (this PR + transport-side timeouts) | Δ |
|---|---:|---:|---:|
| `ThrottlingException` line-count | 68,909 | 240 | −99.7 % |
| `ModelStreamError` line-count | 30,442 | 1,128 | −96.3 % |
| Streams hitting AWS SDK `Attempt Count: 4` | 29,030 | 96 | −99.7 % |
| 4-second `connection timeout after 4000` | 13 | 0 | eliminated |
| Total error log lines | 1,322,666 | 33,972 | −97.4 % |
| Wall time | 313.57 s | 312.61 s | unchanged |
| Transcript count | 41,337 | 40,467 | unchanged |

Same work done, ~97 % less error noise.
Pairs with [`deepgram/deepgram-java-sdk-transport-sagemaker#14`](deepgram/deepgram-java-sdk-transport-sagemaker#14) (Layer 1 transport-side timeouts), but each provides independent value.

## Backwards compatibility

**Existing callers that do NOT use `transportFactory(...)` see zero behavior change.** The new `reconnect()` field defaults to `true`, the `ReconnectingWebSocketListener`-wrapped code path is unchanged for the OkHttp WebSocket transport (Deepgram cloud, self-hosted Deepgram via raw WS), and the lower-level `ClientOptions.Builder.webSocketFactory(...)` seam is also unaffected (only the explicit `DeepgramClientBuilder.transportFactory(...)` API triggers the auto-disable).

| Customer shape | `transportFactory != null`? | `reconnect()` | Wraps in `ReconnectingWebSocketListener`? | Behavior change |
|---|---|---|---|---|
| Deepgram Cloud (`wss://api.deepgram.com`) | no | `true` | yes (unchanged) | none |
| Self-hosted Deepgram (raw WS) | no | `true` | yes (unchanged) | none |
| Custom `WebSocketFactory` via `ClientOptions` (lower-level seam) | no | `true` | yes (unchanged) | none |
| `transportFactory(SageMakerTransportFactory)` | yes | `false` (auto-set) | no (direct path) | yes — desired |
| `transportFactory(<future custom transport>)` | yes | `false` (auto-set) | no (direct path) | yes; opt back in with `.reconnect(true)` |

## Edge case worth calling out

A SageMaker caller who has been tuning retry behavior via `wsClient.reconnectOptions(customOpts)` will find those options **silently ignored** after this change (`reconnect=false` skips the listener entirely). Their tuning was almost certainly a workaround for the very storm we are now eliminating, so the right migration is to delete the `reconnectOptions(...)` call.
Customers who genuinely want both a custom transport AND SDK-side reconnect can re-enable explicitly with `.reconnect(true)` after `.transportFactory(...)`:

```java
DeepgramClient.builder()
    .transportFactory(sagemakerFactory)
    .reconnect(true) // explicit override; not recommended for SageMaker
    .build();
```

This is a runtime-quality change, not an API break — no compile errors for any existing customer. Recommend mentioning in CHANGELOG and migration notes for the next minor release.

## Test plan

- [x] `./gradlew build` — passes (spotless + compile + tests).
- [x] `./gradlew test` — all existing tests pass.
- [x] End-to-end validation with patched JAR linked into a customer load-test harness against a real SageMaker endpoint (numbers above).
- [ ] CI passes.
- [ ] Push template change upstream into the Fern generator and unfreeze the four per-resource WebSocket client files in `.fernignore`.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Greg Holmes <greg.holmes@deepgram.com>
Align the transport metadata and Flux examples with the latest typed Listen V2 model API so the library continues to build cleanly against the current SDK release.
🤖 I have created a release *beep* *boop*

## [0.1.3](v0.1.2...v0.1.3) (2026-05-06)

### Features

* configurable timeouts and concurrency with lenient defaults for high-burst workloads ([#14](#14)) ([78b6338](78b6338))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
## Summary

Expose `connectionTimeout`, `connectionAcquireTimeout`, `subscriptionTimeout`, and `maxConcurrency` on `SageMakerConfig.Builder`. Each is validated `> 0` / non-null at build time. Public `DEFAULT_*` constants on `SageMakerConfig` document the chosen defaults. Wired through `SageMakerTransportFactory` (Netty `connectionTimeout` / `connectionAcquisitionTimeout` / `maxConcurrency`) and `SageMakerTransport` (input-publisher subscription wait at `ensureConnected()`).

New knobs: `connectionTimeout`, `connectionAcquireTimeout`, `subscriptionTimeout`, `maxConcurrency`.

## Why this matters

Empirically validated against a 400-concurrent-stream burst test on a 10× `ml.g6.2xlarge` endpoint, replicating a customer report. Error categories seen with the old defaults:

- `Connection pool exhausted (Acquire took longer)`
- `TCP connect timed out`
- `TCP Connection reset`

Out of the box, callers no longer have to know that AWS's general-purpose Netty defaults bite hard on the SageMaker streaming path; the transport ships sane-for-its-use-case defaults. Anyone who wants fail-fast behavior can tighten the knobs explicitly:
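For example, a fail-fast configuration might look like the following. This is a hedged sketch: the builder method names match the knobs described in this PR, but the parameter types (`Duration` vs. raw millis) and the `endpointName`/`region` setters are assumptions about the builder's surface, and the tightened values are illustrative.

```java
// Sketch: tightening the new knobs below the shipped high-burst defaults
// (30s / 60s / 60s / 500) for callers who prefer to fail fast.
SageMakerConfig config = SageMakerConfig.builder()
        .endpointName("my-endpoint")
        .region(Region.US_EAST_1)
        .connectionTimeout(Duration.ofSeconds(2))        // default 30s
        .connectionAcquireTimeout(Duration.ofSeconds(5)) // default 60s
        .subscriptionTimeout(Duration.ofSeconds(10))     // default 60s
        .maxConcurrency(100)                             // default 500
        .build();
```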
## Scope and what this does NOT fix

This PR addresses the transport layer only. The SDK-level retry storm in `com.deepgram:deepgram-java-sdk` (the hardcoded `connectionFuture.get(4000, MILLISECONDS)` in `ReconnectingWebSocketListener.connect()` plus the listener's default `maxRetries = Integer.MAX_VALUE`) is not touched here and was empirically observed to remain even with this PR's transport fixes — the bottleneck simply moves inward to the SDK ceiling.

The full burst-handling fix requires pairing this PR with the corresponding change in `deepgram-java-sdk` that auto-disables `ReconnectingWebSocketListener` when a custom transport is configured (`ClientOptions.reconnect(false)` + auto-set in `DeepgramClientBuilder.transportFactory(...)`). With both layers in place, end-to-end error counts drop ~97 % at 400 concurrent streams.

## Backwards compatibility

Adds new builder methods only; no existing API is removed or modified. Existing callers using `SageMakerConfig.builder().endpointName(...).region(...).build()` get the new defaults automatically — same calling convention, more lenient runtime behavior. No code changes required for existing applications.

## Test plan

- `./gradlew :sagemaker-transport:test` — all tests pass (18/18; 4 new tests for defaults, custom values, and validation of non-positive arguments).
- Burst test: `Connection pool exhausted` / `TCP connect timed out` reduction to ~0.

🤖 Generated with Claude Code