Labels
bug (Something isn't working)
Description
Checks
- I have updated to the latest minor and patch version of Strands
- I have checked the documentation and this is not expected behavior
- I have searched the existing issues and there are no duplicates of my issue
Strands Version
1.33.0
Python Version
3.13.12
Operating System
Ubuntu
Installation Method
pip
Steps to Reproduce
- Deploy any model to a SageMaker endpoint with `AsyncInferenceConfig`:

```python
from sagemaker.core.inference_config import AsyncInferenceConfig

async_config = AsyncInferenceConfig(
    output_path="s3://my-bucket/async-outputs/",
)

model_builder.deploy(
    instance_type="ml.g5.xlarge",
    initial_instance_count=1,
    endpoint_name="my-async-endpoint",
    inference_config=async_config,
)
```
- Use `SageMakerAIModel` to invoke it:

```python
from strands import Agent
from strands.models.sagemaker import SageMakerAIModel

model = SageMakerAIModel(
    endpoint_config={
        "endpoint_name": "my-async-endpoint",
        "region_name": "us-west-2",
    },
    payload_config={"max_tokens": 100, "stream": False},
)

agent = Agent(model=model)
result = agent("Say hello.")  # raises ValidationError
```
- Both `stream=True` and `stream=False` fail with the same `ValidationError`
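For context, async endpoints are invoked through a different runtime API, `InvokeEndpointAsync`, which takes an S3 input location instead of an inline request body. A minimal sketch of the request shape an async-capable provider would build (the bucket and key names here are hypothetical):

```python
def build_async_request(endpoint_name: str, input_s3_uri: str) -> dict:
    """Keyword arguments for sagemaker-runtime's invoke_endpoint_async.

    Unlike InvokeEndpoint, the payload is not sent inline: it must be
    staged in S3 and referenced via InputLocation.
    """
    return {
        "EndpointName": endpoint_name,
        "InputLocation": input_s3_uri,
        "ContentType": "application/json",
    }


kwargs = build_async_request(
    "my-async-endpoint",
    "s3://my-bucket/async-inputs/request.json",  # hypothetical staged payload
)
# In a real client this would then be submitted with:
#   runtime = boto3.client("sagemaker-runtime")
#   response = runtime.invoke_endpoint_async(**kwargs)
# and response["OutputLocation"] would point at the eventual result in S3.
```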
Expected Behavior
The agent call completes and returns the model's response.
Actual Behavior
```
FAILED: ValidationError: An error occurred (ValidationError) when calling the InvokeEndpoint operation: Endpoint XXXX does not support this inference type.
```
Motivation
SageMaker Async Inference appears to be the recommended approach for long-running inference (>60 s), which is common for large LLMs with long prompts. The 60-second real-time timeout on `invoke_endpoint` makes the current implementation unusable for many production workloads.
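Supporting this in a provider would also mean waiting for the result: async invocation returns immediately with an S3 output location, and the client must poll until the output object appears. A minimal polling sketch, where the hypothetical `check_ready` hook stands in for an `s3.head_object`/`get_object` call against the output path:

```python
import time


def wait_for_async_output(check_ready, poll_interval=1.0, max_attempts=60):
    """Poll until the async inference result appears in S3.

    check_ready: callable returning the result payload once the output
    object exists, or None while the job is still running (hypothetical
    hook; in practice this would check the OutputLocation in S3).
    """
    for _ in range(max_attempts):
        result = check_ready()
        if result is not None:
            return result
        time.sleep(poll_interval)
    raise TimeoutError("async inference output never appeared")
```

In practice, SageMaker can also publish completion events via SNS, which avoids polling entirely; the loop above is just the simplest client-side approach.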