SageMaker transport for the Deepgram Java SDK. Uses AWS SageMaker's HTTP/2 bidirectional streaming API as an alternative to WebSocket, allowing transparent switching between Deepgram Cloud and Deepgram on SageMaker.
dependencies {
implementation 'com.deepgram:deepgram-java-sdk:0.4.0'
implementation 'com.deepgram:deepgram-sagemaker:0.1.3' // x-release-please-version
}<dependency>
<groupId>com.deepgram</groupId>
<artifactId>deepgram-sagemaker</artifactId>
<version>0.1.3</version> <!-- x-release-please-version -->
</dependency>- Java 11+
- Deepgram Java SDK v0.4.0+ (the
default ReconnectOptions reconnectOptions()hook onDeepgramTransportFactoryis required for storm absorption) - AWS credentials configured (environment variables, shared credentials file, or IAM role)
- A Deepgram model deployed to an AWS SageMaker endpoint
This transport uses AWS credentials, not Deepgram API keys. Authentication is handled by the AWS SDK's default credential provider chain:
- Environment variables (
AWS_ACCESS_KEY_ID,AWS_SECRET_ACCESS_KEY) - Shared credentials file (
~/.aws/credentials) - IAM instance role (when running on EC2/ECS/Lambda)
The apiKey parameter on the Deepgram client builder is unused when a SageMaker transport is configured, but a value must be provided (the builder requires it).
import com.deepgram.DeepgramClient;
import com.deepgram.sagemaker.SageMakerConfig;
import com.deepgram.sagemaker.SageMakerTransportFactory;
import com.deepgram.resources.listen.v1.websocket.V1ConnectOptions;
import com.deepgram.resources.listen.v1.websocket.V1WebSocketClient;
import com.deepgram.types.ListenV1Model;
// 1. Configure the SageMaker transport
SageMakerTransportFactory factory = new SageMakerTransportFactory(
SageMakerConfig.builder()
.endpointName("my-deepgram-endpoint")
.region("us-west-2")
.build()
);
// 2. Create the Deepgram client with the transport factory
DeepgramClient client = DeepgramClient.builder()
.apiKey("unused")
.transportFactory(factory)
.build();
// 3. Use the SDK exactly as normal
V1WebSocketClient ws = client.listen().v1().v1WebSocket();
ws.onResults(results -> {
String transcript = results.getChannel()
.getAlternatives().get(0)
.getTranscript();
System.out.println(transcript);
});
ws.connect(V1ConnectOptions.builder()
.model(ListenV1Model.NOVA3)
.build());
ws.sendMedia(audioBytes);
ws.close();The transport is transparent — the SDK API is identical whether using Deepgram Cloud or Deepgram on SageMaker.
| Parameter | Required | Default | Description |
|---|---|---|---|
endpointName |
Yes | — | SageMaker endpoint name |
region |
No | us-west-2 |
AWS region |
connectionTimeout |
No | 30s |
Max time for the underlying TCP/TLS connect (AWS Netty default is 2 s — bumped here so cold-start endpoints under burst load have time to accept TLS handshakes). |
connectionAcquireTimeout |
No | 60s |
Max time to acquire a connection from the Netty pool (AWS Netty default is 10 s — bumped so a 200–500-stream burst doesn't drain the acquire pool). |
subscriptionTimeout |
No | 60s |
Max time the transport waits for the AWS SDK to subscribe to the bidi-stream input publisher before failing. A timeout here is treated as a transient connect failure and counts against maxRetries / retryBudget. |
maxConcurrency |
No | 500 |
Max simultaneous in-flight HTTP/2 streams across the shared Netty pool. With maxStreams=1 this is the cap on simultaneous bidirectional streams. |
maxRetries |
No | 5 |
Max retries on transient AWS errors (throttling, pool-exhausted, transient connect/timeout). Set to 0 to disable internal retry. Terminal errors (auth, validation) bypass this. |
initialBackoff |
No | 100ms |
First backoff delay applied after the initial failure. |
maxBackoff |
No | 5s |
Cap on the per-attempt backoff delay regardless of multiplier. |
backoffMultiplier |
No | 2.0 |
Exponential growth factor between retry attempts. Must be >= 1.0. |
retryBudget |
No | 30s |
Total wall-clock cap across all retry attempts before giving up and surfacing the error to listeners. |
SageMakerConfig config = SageMakerConfig.builder()
.endpointName("my-deepgram-endpoint")
.region("us-east-2")
.build();The transport's defaults are tuned for high-burst workloads (large numbers of
streams opened in a tight loop against an endpoint that may need to scale up).
If you're opening 200–500 streams simultaneously against a cold endpoint,
the AWS Netty defaults (2 s connect / 10 s acquire) will fire before
the load balancer has accepted all of the inbound TLS handshakes — you'll
see a wave of connection acquire and connect timed out errors that look
like server-side problems but are really client-side fail-fast tripping early.
This transport ships with more lenient defaults (30 s / 60 s) so the common high-concurrency path works out of the box. Tighten them if you need fail-fast behavior in low-latency pipelines:
SageMakerConfig config = SageMakerConfig.builder()
.endpointName("my-deepgram-endpoint")
.region("us-east-2")
.connectionTimeout(Duration.ofSeconds(5))
.connectionAcquireTimeout(Duration.ofSeconds(15))
.build();Transient AWS-side failures (ThrottlingException, connection-pool exhaustion, transient
connect/timeout failures) are absorbed by the transport itself: classified as retryable, retried
with exponential backoff up to maxRetries and retryBudget, with messages enqueued during the
reset window persisted across the reconnect so audio isn't dropped. Only terminal errors (auth,
validation) and budget-exhausted retryable errors propagate to transport.onError(...) and reach
the application's error handler.
This means the SDK's wrapper-level reconnect (ReconnectingWebSocketListener) would compound the
plugin's internal retries into a Throttling-on-Throttling storm under burst load, so the plugin
declares ReconnectOptions.builder().maxRetries(0).build() via the
DeepgramTransportFactory.reconnectOptions() hook. The SDK applies it automatically when it sees
a transportFactory in use; no user wiring required.
To tune retry behavior:
SageMakerConfig config = SageMakerConfig.builder()
.endpointName("my-deepgram-endpoint")
.maxRetries(10)
.initialBackoff(Duration.ofMillis(200))
.maxBackoff(Duration.ofSeconds(10))
.retryBudget(Duration.ofMinutes(1))
.build();Set maxRetries(0) to disable internal retry entirely (every transient AWS error then surfaces
immediately to the application).
The default new SageMakerTransportFactory(config) constructor backs every factory instance with
a process-wide shared SageMakerRuntimeHttp2AsyncClient, keyed by the parts of
SageMakerConfig that affect the underlying Netty HTTP/2 client (region, max concurrency,
connect/acquire timeouts). Multiple factories built with the same config fingerprint reuse one
Netty event loop group and one connection pool — so naive code that constructs a fresh factory
per stream still gets a single, well-behaved client underneath.
Without sharing, every factory instantiates its own Netty pool, and a burst of N factories triggers N simultaneous TLS handshakes from N distinct Netty clients against the same SageMaker endpoint. Under high concurrency (100+ streams) the SageMaker HTTP/2 frontline silently drops a large fraction of those streams before they ever reach the model container — verified end-to-end with CloudWatch logs from a 400-stream burst test against a 1× ml.g6.2xlarge endpoint: without sharing, ~65% of streams never appeared in the Deepgram container's listen log; with sharing, the burst behaves the same as the canonical Python load-test harness.
Lifecycle:
| Constructor | Client backing | factory.shutdown() |
|---|---|---|
SageMakerTransportFactory(config) |
shared (lazy-init, keyed by config fingerprint) | no-op — call SageMakerTransportFactory.shutdownAllSharedClients() once at app shutdown to release Netty resources |
SageMakerTransportFactory(config, smClient) |
caller-provided (BYO, used for testing or custom credential providers) | no-op — caller owns the client lifecycle |
// At app shutdown — releases all shared Netty pools the plugin lazily created.
Runtime.getRuntime().addShutdownHook(new Thread(SageMakerTransportFactory::shutdownAllSharedClients));For custom credential providers, proxy configuration, or testing:
import software.amazon.awssdk.services.sagemakerruntimehttp2.SageMakerRuntimeHttp2AsyncClient;
SageMakerRuntimeHttp2AsyncClient customClient = SageMakerRuntimeHttp2AsyncClient.builder()
.region(Region.US_WEST_2)
.credentialsProvider(myCredentialsProvider)
.build();
SageMakerTransportFactory factory = new SageMakerTransportFactory(config, customClient);The standard Deepgram SDK connects via WebSocket to wss://api.deepgram.com. This transport replaces that connection with HTTP/2 streaming to your SageMaker endpoint:
Standard: SDK → WebSocket → api.deepgram.com → Deepgram cloud
SageMaker: SDK → HTTP/2 → SageMaker endpoint → Your Deepgram model
Under the hood, the transport uses the AWS SDK v2's InvokeEndpointWithBidirectionalStream API for true bidirectional HTTP/2 streaming — audio chunks are sent and transcript responses received concurrently over a single connection.
This is a multi-module Gradle project:
deepgram-sagemaker-java/
├── sagemaker-transport/ # SageMaker implementation (SageMakerTransport, SageMakerConfig)
└── examples/ # Usage examples
# AWS credentials must be configured
export AWS_REGION=us-east-2
export SAGEMAKER_ENDPOINT=my-deepgram-endpoint
./gradlew :examples:run -PmainClass=com.deepgram.examples.SageMakerTransportExampleexport AWS_REGION=us-east-2
export SAGEMAKER_ENDPOINT=my-deepgram-endpoint
./gradlew :examples:run -PmainClass=com.deepgram.examples.LiveMicSageMakerExampleexport AWS_REGION=us-east-2
export SAGEMAKER_ENDPOINT=my-deepgram-flux-endpoint
./gradlew :examples:run -PmainClass=com.deepgram.examples.LiveMicFluxSageMakerExampleThe SageMaker transport adds these dependencies (not included in the core SDK):
| Dependency | Version | Purpose |
|---|---|---|
software.amazon.awssdk:sagemakerruntimehttp2 |
2.42.x | SageMaker HTTP/2 bidirectional streaming client |
software.amazon.awssdk:netty-nio-client |
2.42.x | HTTP/2 via Netty |
Requires Java 11+. If Gradle can't find your JDK, set it in gradle.properties:
org.gradle.java.home=/path/to/your/jdkRun tests:
./gradlew buildThis project is licensed under the MIT License - see the LICENSE file for details.