feat: configurable timeouts and concurrency with lenient defaults for high-burst workloads (#14)
Expose `connectionTimeout`, `connectionAcquireTimeout`, `subscriptionTimeout`, and `maxConcurrency` on `SageMakerConfig.Builder`. Wire them through `SageMakerTransportFactory` (Netty client) and `SageMakerTransport` (input-publisher subscription wait) so high-concurrency callers can tune the chokepoints that the AWS Netty defaults expose under burst load.

Move the defaults to values tuned for high-burst workloads:

- `connectionTimeout`: AWS Netty default 2s -> 30s
- `connectionAcquireTimeout`: AWS Netty default 10s -> 60s
- `subscriptionTimeout`: was hardcoded 30s -> 60s
- `maxConcurrency`: was hardcoded 500 -> 500 (now tunable)

Empirically validated against a 400-concurrent-stream burst test on a 10x ml.g6.2xlarge endpoint: with the previous AWS defaults, customers hit a wave of `connection acquire` and `connect timed out` errors in the first few seconds of the run that look like server-side problems but are really client-side fail-fast tripping early. With the new defaults, the underlying connect/acquire layer absorbs the burst and those error categories go to near-zero. (Note: this fix only addresses the transport layer; the SDK-level retry storm remains until paired with `reconnect(false)` in `deepgram-java-sdk` >= 0.4.0.)

Adds 4 unit tests covering defaults, custom values, and validation of non-positive arguments. README updated with a "High-concurrency notes" section and the new parameter table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
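The build-time validation those unit tests exercise could look roughly like this. This is a hypothetical sketch, not the plugin's actual code: the class name, `requirePositive` helper, and constant names are all illustrative; only the default values (30s / 60s / 60s / 500) come from the PR.

```java
import java.time.Duration;

// Hypothetical sketch of the build-time validation described above.
public final class ConfigDefaultsSketch {
    // Defaults mirroring the values in this PR (30s / 60s / 60s / 500).
    static final Duration DEFAULT_CONNECTION_TIMEOUT = Duration.ofSeconds(30);
    static final Duration DEFAULT_CONNECTION_ACQUIRE_TIMEOUT = Duration.ofSeconds(60);
    static final Duration DEFAULT_SUBSCRIPTION_TIMEOUT = Duration.ofSeconds(60);
    static final int DEFAULT_MAX_CONCURRENCY = 500;

    // Reject null / zero / negative durations at build() time.
    static Duration requirePositive(Duration value, String name) {
        if (value == null || value.isZero() || value.isNegative()) {
            throw new IllegalArgumentException(name + " must be > 0");
        }
        return value;
    }

    // Same guard for the integer knob (maxConcurrency).
    static int requirePositive(int value, String name) {
        if (value <= 0) {
            throw new IllegalArgumentException(name + " must be > 0");
        }
        return value;
    }
}
```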
**lukeocodes** left a comment
appreciate the burst-test work, those AWS-side numbers going to ~0 are good to see. but on its own this PR doesn't fix the real problem, and deepgram/deepgram-java-sdk#45 on the SDK side is going the wrong way trying to compensate. I've left the detail over there.
the storm exists because every transient AWS error here goes straight to transport.onError(...) at SageMakerTransport.java:100-110 and the SDK happily reconnect-wraps it into Throttling-on-Throttling. the plugin already owns the AWS HTTP/2 client, the bidi stream, the publisher buffering, the lot. it should be the layer that decides what's recoverable and what's terminal. SageMaker should be invisible to the SDK.
what I'd like in here:
- retry-with-backoff around the bidi stream in `ensureConnected()` (`SageMakerTransport.java:77-129`). wrap `invokeEndpointWithBidirectionalStream(...)` with a budget + classification, so `ThrottlingException`, connection-pool-exhausted, and transient connect failure all get retried internally and never fire `errorListeners`.
- reuse `StreamPublisher.pending` (`:239-290`) across reconnect cycles so messages queued during an internal reset are drained on the new stream instead of dropped.
- only propagate to listeners on terminal errors: auth, protocol, exhausted budget. the handler at `:100-110` is the right gate.
- while in here, `inputPublisher.awaitSubscription(...)` ignores its boolean return at `:121-128`. the new `subscriptionTimeout` doesn't actually fail fast today, it just waits and continues. either honour the return or drop the knob.
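a budgeted retry loop of the shape asked for above might look roughly like this. purely an illustrative sketch, not the plugin's code: `attemptConnect` stands in for the real `invokeEndpointWithBidirectionalStream(...)` call and `isRetryable` for the classifier.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Illustrative budgeted-retry loop: exponential backoff, a wall-clock budget,
// and a classifier deciding which failures are worth retrying internally.
final class RetryLoopSketch {
    static <T> T runWithRetry(Callable<T> attemptConnect,
                              Predicate<Throwable> isRetryable,
                              int maxRetries,
                              Duration initialBackoff,
                              Duration maxBackoff,
                              Duration retryBudget) throws Exception {
        long deadline = System.nanoTime() + retryBudget.toNanos();
        long backoffMs = initialBackoff.toMillis();
        for (int attempt = 0; ; attempt++) {
            try {
                return attemptConnect.call();
            } catch (Exception e) {
                boolean budgetLeft = System.nanoTime() < deadline;
                if (attempt >= maxRetries || !budgetLeft || !isRetryable.test(e)) {
                    throw e; // terminal: only now would errorListeners fire
                }
                Thread.sleep(backoffMs);                          // back off before retrying
                backoffMs = Math.min(backoffMs * 2, maxBackoff.toMillis()); // exponential, capped
            }
        }
    }
}
```

the key property: a transient failure consumes budget and attempts silently; listeners only ever see the terminal outcome.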
once the plugin absorbs storm handling internally, the SDK-side change becomes tiny. basically just the listener fixes I asked for on #45 (configurable connect timeout + clean maxRetries(0) semantics). no reconnect(boolean) flag, no auto-disable heuristic, no edits to generated WS clients. that keeps transportFactory() as the only thing the SDK knows about SageMaker, which is where I'd like to land.
the 30s / 60s / 60s / 500 defaults look fine. if anything the missing config surface here is retry budget + backoff, not looser timeouts.
approving the netty wiring + new knobs in principle, but I'd like the retry/classification work landed in this PR (or one stacked on it) before we touch #45.
…ent pending queue

Bundles the four storm-handling asks from @lukeocodes' review of #14 into the existing PR. The plugin now owns end-to-end retry/backoff/classification for transient AWS failures so they never reach `transport.onError(...)`, and the SDK's wrapper-level reconnect can be disabled via the new `DeepgramTransportFactory.reconnectOptions()` hook (paired with the SDK fix in deepgram/deepgram-java-sdk#45).

`SageMakerConfig` — new retry knobs (defaults tuned for high-burst workloads):

- `maxRetries` = 5 (set 0 to disable internal retry)
- `initialBackoff` = 100 ms
- `maxBackoff` = 5 s
- `backoffMultiplier` = 2.0 (validated >= 1.0)
- `retryBudget` = 30 s wall-clock cap across all retries
- `build()` rejects `initialBackoff` > `maxBackoff`

`SageMakerTransport` — internal storm absorption:

- `ensureConnected()` wraps `attemptConnect()` in a budgeted retry loop with exponential backoff. Successful subscription resets the budget.
- `handleStreamError()`: the response-handler `onError` gate from line 100 is now a classify-and-decide step. RETRYABLE + budget left triggers an internal reset (cancel future, complete publisher, mark disconnected) so the next send re-enters the loop on a fresh stream. TERMINAL or budget-exhausted surfaces to `errorListeners` as before.
- `classify(Throwable)` walks the cause chain. Retryable: `TimeoutException`, `ConnectException`, `IOException`, `AwsServiceException` with status 429 or 5xx or error code containing "throttl", and `SdkException` whose message contains "acquire"/"pool"/"throttl"/"timeout". Terminal: 4xx other than 429, anything unmatched (default-deny for retry).
- `StreamPublisher.pending` is hoisted up to a `SageMakerTransport` instance field so events queued during an internal reset survive across the publisher swap. The new publisher drains the shared queue on subscribe.
- `awaitSubscription` now returns the boolean. Timeout throws `TimeoutException`, which flows into the retry loop as RETRYABLE — the new `subscriptionTimeout` knob now actually fails fast as advertised.
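The pending-queue hoisting described above can be pictured like this. A deliberately simplified sketch: the real `StreamPublisher` is a Reactive Streams publisher, reduced here to a plain queue plus a consumer so the ownership change (transport owns the queue, publishers come and go) is visible.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

// Sketch: the transport owns the pending queue; each replacement publisher
// drains it on subscribe, so events queued during an internal reset survive.
final class PendingQueueSketch {
    private final Queue<String> pending = new ConcurrentLinkedQueue<>(); // transport-owned
    private Consumer<String> subscriber; // null while disconnected/resetting

    void send(String event) {
        if (subscriber != null) {
            subscriber.accept(event); // fast path: forward immediately
        } else {
            pending.add(event);       // buffered across the publisher swap
        }
    }

    // Called when the replacement publisher's stream subscribes.
    void onSubscribe(Consumer<String> newSubscriber) {
        subscriber = newSubscriber;
        String queued;
        while ((queued = pending.poll()) != null) {
            subscriber.accept(queued); // drain everything queued during the reset
        }
    }

    void onReset() {
        subscriber = null; // events sent from here on are queued, not dropped
    }
}
```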
`SageMakerTransportFactory`:

- Overrides the new `reconnectOptions()` default to return `ReconnectOptions.builder().maxRetries(0).build()`. Combined with the SDK's auto-wiring in `TransportWebSocketFactory`, this disables the SDK's wrapper-level reconnect for any per-resource WebSocket client constructed against this factory — no user wiring required.

`build.gradle`:

- Bumps the deepgram-java-sdk dep from 0.2.1 to 0.3.0 to pick up the new `DeepgramTransportFactory.reconnectOptions()` default method, `ReconnectOptions.connectionTimeoutMs`, the `maxRetries(0)` bug fix, and the `applyOptionsOverride` hook.

Tests:

- New `SageMakerTransportRetryTest` covers `classify()` across all branches (Timeout/Connect/IOException; AWS 401/403/429/5xx/throttling code; SdkException pool-keyword; cause-chain walking; default-terminal) and `StreamPublisher` behaviour (`awaitSubscription` false-on-timeout, true-on-subscribe; pending-queue persistence across publisher instances; immediate forward when subscriber present).
- `SageMakerTransportFactoryTest` gains: retry-config defaults + customisation + validation, `initialBackoff > maxBackoff` rejection, and `factoryDeclaresMaxRetriesZeroForReconnectOptions` verifying the storm-suppression policy is published correctly.

README:

- Bumps the SDK dep version reference and notes the v0.3.0+ requirement.
- Adds the new retry knobs to the configuration table.
- New "Retry & storm absorption" section explaining the internal classification/budget design and how `factory.reconnectOptions()` auto-wires the SDK's reconnect-disable.

End-to-end mock-AWS retry coverage isn't included here — the AWS reactive streams handler indirection makes deterministic stubbing fragile. Those paths are exercised by the existing burst test described in the PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
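The `classify()` cause-chain walk those tests cover can be sketched with JDK types alone. A sketch under stated assumptions: the AWS-specific branches (`AwsServiceException` status codes, `SdkException`) need the AWS SDK on the classpath, so they are represented here only by the keyword match on the message; everything else mirrors the description above, including the default-deny fallthrough.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.concurrent.TimeoutException;

// Sketch of default-deny classification: walk the cause chain, retry only on
// explicitly recognised transient failures, treat everything else as terminal.
final class ClassifySketch {
    enum Category { RETRYABLE, TERMINAL }

    static Category classify(Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof TimeoutException
                    || t instanceof ConnectException
                    || t instanceof IOException) {
                return Category.RETRYABLE;
            }
            String msg = t.getMessage();
            if (msg != null) {
                String m = msg.toLowerCase();
                // keyword branch standing in for the SdkException message check
                if (m.contains("acquire") || m.contains("pool")
                        || m.contains("throttl") || m.contains("timeout")) {
                    return Category.RETRYABLE;
                }
            }
        }
        return Category.TERMINAL; // default-deny for retry
    }
}
```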
… connection establishment

Added robust connection-error and buffer handling. Tested at high concurrency.
## Summary

- Add `ClientOptions.Builder.reconnect(boolean)` (default `true`). When `false`, the per-resource WebSocket clients connect once via a plain `okhttp3.WebSocketListener` and propagate failures directly, bypassing `ReconnectingWebSocketListener` (and its hardcoded `connectionFuture.get(4000, MILLISECONDS)` ceiling).
- Auto-disable for custom transports: when `DeepgramClientBuilder.transportFactory(...)` (or its async counterpart) is set, `setAdditional()` also calls `builder.reconnect(false)`. Custom transports like `SageMakerTransportFactory` already manage their own connection lifecycle; wrapping them in another retry layer creates double-retry storms under burst load.
- Patch applied to all four per-resource WebSocket clients: `listen/v1`, `listen/v2`, `agent/v1`, `speak/v1`. Same structural change in each: `connect()` branches on `clientOptions.reconnect()`; `disconnect()`, `sendMedia()`, `sendMessage()`, `assertSocketIsOpen()` use `directWebSocket` when the listener is null.
- The four files are added to the "temporarily frozen" section of `.fernignore` so the patch survives the next Fern regen. Unfreeze once the change has been pushed upstream into the Fern generator template.

## Why

Validated against a 400-concurrent-stream burst test on a 10× `ml.g6.2xlarge` endpoint (Deepgram on SageMaker, replicating a customer-reported failure):

| Metric | Before | After (this PR + transport-side timeouts) | Δ |
|---|---:|---:|---:|
| `ThrottlingException` line-count | 68,909 | 240 | −99.7 % |
| `ModelStreamError` line-count | 30,442 | 1,128 | −96.3 % |
| Streams hitting AWS SDK `Attempt Count: 4` | 29,030 | 96 | −99.7 % |
| 4-second `connection timeout after 4000` | 13 | 0 | eliminated |
| Total error log lines | 1,322,666 | 33,972 | −97.4 % |
| Wall time | 313.57 s | 312.61 s | unchanged |
| Transcript count | 41,337 | 40,467 | unchanged |

Same work done, ~97 % less error noise.
Pairs with [`deepgram/deepgram-java-sdk-transport-sagemaker#14`](deepgram/deepgram-java-sdk-transport-sagemaker#14) (Layer 1 transport-side timeouts), but each provides independent value.

## Backwards compatibility

**Existing callers that do NOT use `transportFactory(...)` see zero behavior change.** The new `reconnect()` field defaults to `true`, the `ReconnectingWebSocketListener`-wrapped code path is unchanged for the OkHttp WebSocket transport (Deepgram cloud, self-hosted Deepgram via raw WS), and the lower-level `ClientOptions.Builder.webSocketFactory(...)` seam is also unaffected (only the explicit `DeepgramClientBuilder.transportFactory(...)` API triggers the auto-disable).

| Customer shape | `transportFactory != null`? | `reconnect()` | Wraps in `ReconnectingWebSocketListener`? | Behavior change |
|---|---|---|---|---|
| Deepgram Cloud (`wss://api.deepgram.com`) | no | `true` | yes (unchanged) | none |
| Self-hosted Deepgram (raw WS) | no | `true` | yes (unchanged) | none |
| Custom `WebSocketFactory` via `ClientOptions` (lower-level seam) | no | `true` | yes (unchanged) | none |
| `transportFactory(SageMakerTransportFactory)` | yes | `false` (auto-set) | no (direct path) | yes — desired |
| `transportFactory(<future custom transport>)` | yes | `false` (auto-set) | no (direct path) | yes; opt back in with `.reconnect(true)` |

## Edge case worth calling out

A SageMaker caller who has been tuning retry behavior via `wsClient.reconnectOptions(customOpts)` will find those options **silently ignored** after this change (`reconnect=false` skips the listener entirely). Their tuning was almost certainly a workaround for the very storm we are now eliminating, so the right migration is to delete the `reconnectOptions(...)` call.
Customers who genuinely want both a custom transport AND SDK-side reconnect can re-enable explicitly with `.reconnect(true)` after `.transportFactory(...)`:

```java
DeepgramClient.builder()
    .transportFactory(sagemakerFactory)
    .reconnect(true) // explicit override; not recommended for SageMaker
    .build();
```

This is a runtime-quality change, not an API break — no compile errors for any existing customer. Recommend mentioning in CHANGELOG and migration notes for the next minor release.

## Test plan

- [x] `./gradlew build` — passes (spotless + compile + tests).
- [x] `./gradlew test` — all existing tests pass.
- [x] End-to-end validation with patched JAR linked into a customer load-test harness against a real SageMaker endpoint (numbers above).
- [ ] CI passes.
- [ ] Push template change upstream into the Fern generator and unfreeze the four per-resource WebSocket client files in `.fernignore`.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Greg Holmes <greg.holmes@deepgram.com>
Align the transport metadata and Flux examples with the latest typed Listen V2 model API so the library continues to build cleanly against the current SDK release.
🤖 I have created a release *beep* *boop*

## [0.1.3](v0.1.2...v0.1.3) (2026-05-06)

### Features

* configurable timeouts and concurrency with lenient defaults for high-burst workloads ([#14](#14)) ([78b6338](78b6338))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
## Summary

Expose `connectionTimeout`, `connectionAcquireTimeout`, `subscriptionTimeout`, and `maxConcurrency` on `SageMakerConfig.Builder`. Each is validated `> 0` / non-null at build time. Public `DEFAULT_*` constants on `SageMakerConfig` document the chosen defaults. Wired through `SageMakerTransportFactory` (Netty `connectionTimeout` / `connectionAcquisitionTimeout` / `maxConcurrency`) and `SageMakerTransport` (input-publisher subscription wait at `ensureConnected()`).

New knobs: `connectionTimeout`, `connectionAcquireTimeout`, `subscriptionTimeout`, `maxConcurrency`.

## Why this matters

Empirically validated against a 400-concurrent-stream burst test on a 10× `ml.g6.2xlarge` endpoint, replicating a customer report. Error categories seen with the old defaults:

- `Connection pool exhausted (Acquire took longer)`
- `TCP connect timed out`
- `TCP Connection reset`

Out of the box, callers no longer have to know that AWS's general-purpose Netty defaults bite hard on the SageMaker streaming path; the transport ships sane-for-its-use-case defaults. Anyone who wants fail-fast behavior can tighten the knobs explicitly:
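For example, a fail-fast configuration might look like the following. This is a hedged sketch: the builder method names match the knobs described in this PR, but the parameter types (`Duration` vs. raw millis) and the `endpointName`/`region` setters are assumptions about the builder's surface, and the tightened values are illustrative.

```java
// Sketch: tightening the new knobs below the shipped high-burst defaults
// (30s / 60s / 60s / 500) for callers who prefer to fail fast.
SageMakerConfig config = SageMakerConfig.builder()
        .endpointName("my-endpoint")
        .region(Region.US_EAST_1)
        .connectionTimeout(Duration.ofSeconds(2))        // default 30s
        .connectionAcquireTimeout(Duration.ofSeconds(5)) // default 60s
        .subscriptionTimeout(Duration.ofSeconds(10))     // default 60s
        .maxConcurrency(100)                             // default 500
        .build();
```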
## Scope and what this does NOT fix

This PR addresses the transport layer only. The SDK-level retry storm in `com.deepgram:deepgram-java-sdk` (the hardcoded `connectionFuture.get(4000, MILLISECONDS)` in `ReconnectingWebSocketListener.connect()` plus the listener's default `maxRetries = Integer.MAX_VALUE`) is not touched here and was empirically observed to remain even with this PR's transport fixes — the bottleneck simply moves inward to the SDK ceiling.

The full burst-handling fix requires pairing this PR with the corresponding change in `deepgram-java-sdk` that auto-disables `ReconnectingWebSocketListener` when a custom transport is configured (`ClientOptions.reconnect(false)` + auto-set in `DeepgramClientBuilder.transportFactory(...)`). With both layers in place, end-to-end error counts drop ~97 % at 400 concurrent streams.

## Backwards compatibility

Adds new builder methods only; no existing API is removed or modified. Existing callers using `SageMakerConfig.builder().endpointName(...).region(...).build()` get the new defaults automatically — same calling convention, more lenient runtime behavior. No code changes required for existing applications.

## Test plan

- `./gradlew :sagemaker-transport:test` — all tests pass (18/18; 4 new tests for defaults, custom values, and validation of non-positive arguments).
- Burst test: `Connection pool exhausted` / `TCP connect timed out` reduction to ~0.

🤖 Generated with Claude Code