deepgram/deepgram-java-sdk-transport-sagemaker

Deepgram SageMaker Transport for Java

SageMaker transport for the Deepgram Java SDK. Uses AWS SageMaker's HTTP/2 bidirectional streaming API as an alternative to WebSocket, allowing transparent switching between Deepgram Cloud and Deepgram on SageMaker.

Installation

Gradle

dependencies {
    implementation 'com.deepgram:deepgram-java-sdk:0.4.0'
    implementation 'com.deepgram:deepgram-sagemaker:0.1.3' // x-release-please-version
}

Maven

<dependency>
    <groupId>com.deepgram</groupId>
    <artifactId>deepgram-sagemaker</artifactId>
    <version>0.1.3</version> <!-- x-release-please-version -->
</dependency>

Requirements

  • Java 11+
  • Deepgram Java SDK v0.4.0+ (the default ReconnectOptions reconnectOptions() hook on DeepgramTransportFactory is required for storm absorption)
  • AWS credentials configured (environment variables, shared credentials file, or IAM role)
  • A Deepgram model deployed to an AWS SageMaker endpoint

Authentication

This transport uses AWS credentials, not Deepgram API keys. Authentication is handled by the AWS SDK's default credential provider chain, which checks these sources among others:

  1. Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
  2. Shared credentials file (~/.aws/credentials)
  3. IAM role (instance profile on EC2, task role on ECS, execution role on Lambda)

The apiKey parameter on the Deepgram client builder is unused when a SageMaker transport is configured, but a value must be provided (the builder requires it).
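
Because the Deepgram API key is ignored, a missing AWS credential source only shows up when the first stream is opened. The following is a minimal pre-flight sketch that looks for the first two sources listed above. The class and method names are hypothetical, it is not part of this transport or the AWS SDK, and it does not replicate the SDK's full provider chain; a result of "none" may simply mean credentials come from an IAM role.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

/** Hypothetical pre-flight helper; not part of the transport or the AWS SDK. */
final class AwsCredentialPreflight {

    /** Returns a label for the first credential source found, or "none". */
    static String detectSource(Map<String, String> env, Path homeDir) {
        // 1. Both environment variables must be present and non-empty.
        String keyId = env.get("AWS_ACCESS_KEY_ID");
        String secret = env.get("AWS_SECRET_ACCESS_KEY");
        if (keyId != null && !keyId.isEmpty() && secret != null && !secret.isEmpty()) {
            return "environment";
        }
        // 2. Shared credentials file at <home>/.aws/credentials.
        Path shared = homeDir.resolve(".aws").resolve("credentials");
        if (Files.isRegularFile(shared)) {
            return "shared-file";
        }
        // 3. IAM roles are not probed here; that is left to the AWS SDK at connect time.
        return "none";
    }

    public static void main(String[] args) {
        Path home = Path.of(System.getProperty("user.home"));
        System.out.println("Credential source: " + detectSource(System.getenv(), home));
    }
}
```

Logging the result at startup can make a misconfigured environment easier to spot before any audio is sent.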

Quickstart

import com.deepgram.DeepgramClient;
import com.deepgram.sagemaker.SageMakerConfig;
import com.deepgram.sagemaker.SageMakerTransportFactory;
import com.deepgram.resources.listen.v1.websocket.V1ConnectOptions;
import com.deepgram.resources.listen.v1.websocket.V1WebSocketClient;
import com.deepgram.types.ListenV1Model;

// 1. Configure the SageMaker transport
SageMakerTransportFactory factory = new SageMakerTransportFactory(
    SageMakerConfig.builder()
        .endpointName("my-deepgram-endpoint")
        .region("us-west-2")
        .build()
);

// 2. Create the Deepgram client with the transport factory
DeepgramClient client = DeepgramClient.builder()
    .apiKey("unused")
    .transportFactory(factory)
    .build();

// 3. Use the SDK exactly as normal
V1WebSocketClient ws = client.listen().v1().v1WebSocket();

ws.onResults(results -> {
    String transcript = results.getChannel()
        .getAlternatives().get(0)
        .getTranscript();
    System.out.println(transcript);
});

ws.connect(V1ConnectOptions.builder()
    .model(ListenV1Model.NOVA3)
    .build());

// audioBytes holds the audio to transcribe, loaded elsewhere
ws.sendMedia(audioBytes);
ws.close();

The transport is transparent — the SDK API is identical whether using Deepgram Cloud or Deepgram on SageMaker.

Configuration

SageMakerConfig

| Parameter | Required | Default | Description |
|---|---|---|---|
| endpointName | Yes | | SageMaker endpoint name |
| region | No | us-west-2 | AWS region |
| connectionTimeout | No | 30s | Max time for the underlying TCP/TLS connect. The AWS Netty default is 2 s; it is raised here so cold-start endpoints under burst load have time to accept TLS handshakes. |
| connectionAcquireTimeout | No | 60s | Max time to acquire a connection from the Netty pool. The AWS Netty default is 10 s; it is raised so a 200–500-stream burst doesn't drain the acquire pool. |
| subscriptionTimeout | No | 60s | Max time the transport waits for the AWS SDK to subscribe to the bidi-stream input publisher before failing. A timeout here is treated as a transient connect failure and counts against maxRetries / retryBudget. |
| maxConcurrency | No | 500 | Max simultaneous in-flight HTTP/2 streams across the shared Netty pool. With maxStreams=1 this is the cap on simultaneous bidirectional streams. |
| maxRetries | No | 5 | Max retries on transient AWS errors (throttling, pool-exhausted, transient connect/timeout). Set to 0 to disable internal retry. Terminal errors (auth, validation) bypass this. |
| initialBackoff | No | 100ms | First backoff delay applied after the initial failure. |
| maxBackoff | No | 5s | Cap on the per-attempt backoff delay regardless of multiplier. |
| backoffMultiplier | No | 2.0 | Exponential growth factor between retry attempts. Must be >= 1.0. |
| retryBudget | No | 30s | Total wall-clock cap across all retry attempts before giving up and surfacing the error to listeners. |

SageMakerConfig config = SageMakerConfig.builder()
    .endpointName("my-deepgram-endpoint")
    .region("us-east-2")
    .build();

High-concurrency notes

The transport's defaults are tuned for high-burst workloads: large numbers of streams opened in a tight loop against an endpoint that may need to scale up. If you open 200–500 streams simultaneously against a cold endpoint, the AWS Netty defaults (2 s connect / 10 s acquire) fire before the load balancer has accepted all of the inbound TLS handshakes. The result is a wave of connection-acquire and connect-timed-out errors that look like server-side problems but are really the client's fail-fast timeouts tripping early.

This transport ships with more lenient defaults (30 s / 60 s) so the common high-concurrency path works out of the box. Tighten them if you need fail-fast behavior in low-latency pipelines:

import java.time.Duration;

SageMakerConfig config = SageMakerConfig.builder()
    .endpointName("my-deepgram-endpoint")
    .region("us-east-2")
    .connectionTimeout(Duration.ofSeconds(5))          // default 30s
    .connectionAcquireTimeout(Duration.ofSeconds(15))  // default 60s
    .build();

Retry & storm absorption

Transient AWS-side failures (ThrottlingException, connection-pool exhaustion, transient connect/timeout failures) are absorbed by the transport itself. Each is classified as retryable and retried with exponential backoff, bounded by maxRetries and retryBudget. Messages enqueued during the reset window are kept and carried across the reconnect, so audio isn't dropped. Only terminal errors (auth, validation) and retryable errors that exhaust the budget propagate to transport.onError(...) and reach the application's error handler.

Because the transport already retries internally, the SDK's wrapper-level reconnect (ReconnectingWebSocketListener) would compound those retries into a Throttling-on-Throttling storm under burst load. To prevent this, the plugin declares ReconnectOptions.builder().maxRetries(0).build() via the DeepgramTransportFactory.reconnectOptions() hook. The SDK applies it automatically when a transportFactory is in use; no user wiring is required.
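
The idea of carrying enqueued messages across a reconnect can be illustrated with a small buffer that queues chunks while the stream is down and replays them in order once it is back. This is only a sketch under simplifying assumptions: ReconnectBuffer and its methods are hypothetical names, the queue is unbounded, and the transport's real internals may differ.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

/** Hypothetical sketch: hold outgoing chunks while the stream is down, replay them in order. */
final class ReconnectBuffer {
    private final Deque<byte[]> pending = new ArrayDeque<>();
    private Consumer<byte[]> sink; // null while the stream is being (re-)established

    /** Sends immediately when connected, otherwise queues the chunk. */
    synchronized void send(byte[] chunk) {
        if (sink == null) {
            pending.addLast(chunk);
        } else {
            sink.accept(chunk);
        }
    }

    /** Called when the stream drops; later sends are queued instead of lost. */
    synchronized void onDisconnected() {
        sink = null;
    }

    /** Called after a successful (re)connect; flushes queued chunks in FIFO order first. */
    synchronized void onConnected(Consumer<byte[]> newSink) {
        while (!pending.isEmpty()) {
            newSink.accept(pending.removeFirst());
        }
        sink = newSink;
    }
}
```

Flushing before installing the new sink, all under one lock, keeps chunks in their original order even if another thread calls send during the reconnect.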

To tune retry behavior:

import java.time.Duration;

SageMakerConfig config = SageMakerConfig.builder()
    .endpointName("my-deepgram-endpoint")
    .maxRetries(10)                          // default 5
    .initialBackoff(Duration.ofMillis(200))  // default 100ms
    .maxBackoff(Duration.ofSeconds(10))      // default 5s
    .retryBudget(Duration.ofMinutes(1))      // default 30s
    .build();

Set maxRetries(0) to disable internal retry entirely (every transient AWS error then surfaces immediately to the application).
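
To see how the retry parameters interact, it can help to compute the delay schedule they imply. The sketch below assumes plain capped exponential backoff with no jitter and counts only the planned delays against the budget; the transport's retryBudget is a wall-clock cap, so time spent in the attempts themselves would also count. BackoffSchedule is a hypothetical helper, not part of the library.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of capped exponential backoff under a total budget. */
final class BackoffSchedule {

    /** Delay before retry number {@code attempt} (1-based), capped at maxBackoff. */
    static Duration delayFor(int attempt, Duration initial, Duration max, double multiplier) {
        double millis = initial.toMillis() * Math.pow(multiplier, attempt - 1);
        return Duration.ofMillis((long) Math.min(millis, max.toMillis()));
    }

    /** Delays that fit within both maxRetries and the cumulative budget. */
    static List<Duration> plan(int maxRetries, Duration initial, Duration max,
                               double multiplier, Duration budget) {
        List<Duration> delays = new ArrayList<>();
        Duration total = Duration.ZERO;
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            Duration next = delayFor(attempt, initial, max, multiplier);
            if (total.plus(next).compareTo(budget) > 0) {
                break; // budget exhausted: the error would surface to the application
            }
            delays.add(next);
            total = total.plus(next);
        }
        return delays;
    }
}
```

With the defaults listed under SageMakerConfig (5 retries, 100 ms initial, 5 s cap, multiplier 2.0), this yields delays of 100, 200, 400, 800 and 1600 ms, about 3.1 s of backoff in total, well inside the 30 s budget.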

Connection-pool sharing

The default new SageMakerTransportFactory(config) constructor backs every factory instance with a process-wide shared SageMakerRuntimeHttp2AsyncClient, keyed by the parts of SageMakerConfig that affect the underlying Netty HTTP/2 client (region, max concurrency, connect/acquire timeouts). Multiple factories built with the same config fingerprint reuse one Netty event loop group and one connection pool — so naive code that constructs a fresh factory per stream still gets a single, well-behaved client underneath.

Without sharing, every factory instantiates its own Netty pool, and a burst of N factories triggers N simultaneous TLS handshakes from N distinct Netty clients against the same SageMaker endpoint. Under high concurrency (100+ streams), the SageMaker HTTP/2 frontline silently drops a large fraction of those streams before they ever reach the model container. This was verified end-to-end with CloudWatch logs from a 400-stream burst test against a 1× ml.g6.2xlarge endpoint: without sharing, ~65% of streams never appeared in the Deepgram container's listen log; with sharing, the burst behaves the same as the canonical Python load-test harness.
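
The sharing scheme described above amounts to a process-wide cache keyed by the config fields that affect the HTTP client. A minimal sketch of that pattern follows, assuming a ConcurrentHashMap with lazy creation. SharedClientRegistry and Fingerprint are hypothetical names, and the generic type C stands in for the real client; the plugin's actual implementation may differ.

```java
import java.time.Duration;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical sketch of a client cache keyed by a config fingerprint. */
final class SharedClientRegistry<C> {

    /** The fields that affect the underlying HTTP client; names are illustrative. */
    static final class Fingerprint {
        final String region;
        final int maxConcurrency;
        final Duration connectionTimeout;
        final Duration connectionAcquireTimeout;

        Fingerprint(String region, int maxConcurrency,
                    Duration connectionTimeout, Duration connectionAcquireTimeout) {
            this.region = region;
            this.maxConcurrency = maxConcurrency;
            this.connectionTimeout = connectionTimeout;
            this.connectionAcquireTimeout = connectionAcquireTimeout;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Fingerprint)) return false;
            Fingerprint f = (Fingerprint) o;
            return maxConcurrency == f.maxConcurrency
                && region.equals(f.region)
                && connectionTimeout.equals(f.connectionTimeout)
                && connectionAcquireTimeout.equals(f.connectionAcquireTimeout);
        }

        @Override
        public int hashCode() {
            return Objects.hash(region, maxConcurrency, connectionTimeout, connectionAcquireTimeout);
        }
    }

    private final ConcurrentMap<Fingerprint, C> clients = new ConcurrentHashMap<>();
    private final Function<Fingerprint, C> clientFactory;

    SharedClientRegistry(Function<Fingerprint, C> clientFactory) {
        this.clientFactory = clientFactory;
    }

    /** Returns the cached client for this fingerprint, creating it on first use. */
    C clientFor(Fingerprint fingerprint) {
        return clients.computeIfAbsent(fingerprint, clientFactory);
    }
}
```

Because computeIfAbsent is atomic per key, concurrent callers with equal fingerprints all receive the same client instance, which is what keeps a burst of factories on one connection pool.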

Lifecycle:

| Constructor | Client backing | factory.shutdown() |
|---|---|---|
| SageMakerTransportFactory(config) | shared (lazy-init, keyed by config fingerprint) | no-op; call SageMakerTransportFactory.shutdownAllSharedClients() once at app shutdown to release Netty resources |
| SageMakerTransportFactory(config, smClient) | caller-provided (BYO, used for testing or custom credential providers) | no-op; caller owns the client lifecycle |

// At app shutdown — releases all shared Netty pools the plugin lazily created.
Runtime.getRuntime().addShutdownHook(new Thread(SageMakerTransportFactory::shutdownAllSharedClients));

Custom AWS Client

For custom credential providers, proxy configuration, or testing:

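import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.sagemakerruntimehttp2.SageMakerRuntimeHttp2AsyncClient;

// myCredentialsProvider is a credentials provider you supply;
// config is a SageMakerConfig built as shown under Configuration.
SageMakerRuntimeHttp2AsyncClient customClient = SageMakerRuntimeHttp2AsyncClient.builder()
    .region(Region.US_WEST_2)
    .credentialsProvider(myCredentialsProvider)
    .build();

// The caller owns customClient and is responsible for closing it.
SageMakerTransportFactory factory = new SageMakerTransportFactory(config, customClient);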

How It Works

The standard Deepgram SDK connects via WebSocket to wss://api.deepgram.com. This transport replaces that connection with HTTP/2 streaming to your SageMaker endpoint:

Standard:   SDK → WebSocket → api.deepgram.com → Deepgram cloud
SageMaker:  SDK → HTTP/2    → SageMaker endpoint → Your Deepgram model

Under the hood, the transport uses the AWS SDK v2's InvokeEndpointWithBidirectionalStream API for true bidirectional HTTP/2 streaming — audio chunks are sent and transcript responses received concurrently over a single connection.

Project Structure

This is a multi-module Gradle project:

deepgram-sagemaker-java/
├── sagemaker-transport/    # SageMaker implementation (SageMakerTransport, SageMakerConfig)
└── examples/               # Usage examples

Examples

File Transcription (STT)

# AWS credentials must be configured
export AWS_REGION=us-east-2
export SAGEMAKER_ENDPOINT=my-deepgram-endpoint
./gradlew :examples:run -PmainClass=com.deepgram.examples.SageMakerTransportExample

Live Microphone (STT — Nova-3)

export AWS_REGION=us-east-2
export SAGEMAKER_ENDPOINT=my-deepgram-endpoint
./gradlew :examples:run -PmainClass=com.deepgram.examples.LiveMicSageMakerExample

Live Microphone (STT — Flux V2)

export AWS_REGION=us-east-2
export SAGEMAKER_ENDPOINT=my-deepgram-flux-endpoint
./gradlew :examples:run -PmainClass=com.deepgram.examples.LiveMicFluxSageMakerExample

Dependencies

The SageMaker transport adds these dependencies (not included in the core SDK):

| Dependency | Version | Purpose |
|---|---|---|
| software.amazon.awssdk:sagemakerruntimehttp2 | 2.42.x | SageMaker HTTP/2 bidirectional streaming client |
| software.amazon.awssdk:netty-nio-client | 2.42.x | HTTP/2 via Netty |

Development

Requires Java 11+. If Gradle can't find your JDK, set it in gradle.properties:

org.gradle.java.home=/path/to/your/jdk

Run tests:

./gradlew build

License

This project is licensed under the MIT License - see the LICENSE file for details.
