feat(toolkit-lib): Implement Hotswap with CCAPIs for broader resource coverage and improve intrinsic function resolution for Get::Attr#1310
Conversation
| * The processed template from CloudFormation with all intrinsic functions | ||
| * resolved to their concrete values from the last successful deployment. |
There was a problem hiding this comment.
That is not what "processed" is, unfortunately, so I hope you're not basing too much functionality off of this assumption.
|
|
||
| // The processed template has all intrinsics resolved by CloudFormation from the last deployment. | ||
| // Used as a fallback when local evaluation of Fn::GetAtt fails for unsupported resource types. | ||
| this.processedTemplate = props.processedTemplate ?? {}; |
There was a problem hiding this comment.
The order should be
{ Ref }-- get from CFN'sDescribeStackResources{ GetAtt }-- either get from a custom handler, otherwise from CCAPI, otherwise fail
We could potentially get a { GetAtt } from the template, if it is an input property that is set. But if we can parallellize all the necessary { GetAtt }s I don't see why this should add a lot of time, and the extremely fast path is very speculative and rare so probably not worth it.
| // This will fail for resource types with compound primaryIdentifiers because | ||
| // CloudFormation's PhysicalResourceId only contains a single value. | ||
| try { | ||
| const cloudControl = this.sdk.cloudControl(); |
There was a problem hiding this comment.
Do we really want a new SDK client for every single invocation? I imagine there will be a bunch.
| // This will fail for resource types with compound primaryIdentifiers because | ||
| // CloudFormation's PhysicalResourceId only contains a single value. |
There was a problem hiding this comment.
So... introduce a mapper!
type CfnToCcapiIdentifierMapper = (cfnPhysicalId: string) => string;Have a table of these, with (x) => x as a default for resources that don't occur in the table.
Aaannndd... it's probably fine for 90% of resources?
| return props[attribute]; | ||
| } | ||
| } catch { | ||
| // Cloud Control lookup failed — fall through to the error below |
There was a problem hiding this comment.
Give a more descriptive error here, include the CCAPI one perhaps. How otherwise will we know what to fix?
| * built by joining each component with "|". CloudFormation's PhysicalResourceId | ||
| * only returns a single value, which doesn't work for compound keys. |
There was a problem hiding this comment.
9 times out of 10 the PhysicalResourceId will already be that string with components separated by |.
It should be only for certain legacy resources that they aren't (singular in CFN, compound in CCAPI).
In fact, that list is here. I would say we add a dependency on the service spec, and at build time extract that information and dump it into a JSON file that we use to decide what to do here (or that's the list we must have custom identifier mappers for, as mentioned before).
| const delaySeconds = functionIsInVpcOrUsesDockerForCode ? 5 : 1; | ||
| // otherwise, the update will be quick. This is reflected in the delay times we | ||
| // set for exponential backoff while waiting | ||
| const minDelaySeconds = functionIsInVpcOrUsesDockerForCode ? 3 : 1, maxDelaySeconds = functionIsInVpcOrUsesDockerForCode ? 5 : 3; |
There was a problem hiding this comment.
This is not a readable way of writing this (at least imo).
More code but more readable:
let minDelaySeconds;
let maxDelaySeconds;
if (functionIsInVpcOrUsesDockerForCode) {
minDelaySeconds = 3;
maxDelaySeconds = 5;
} else {
minDelaySeconds = 1;
maxDelaySeconds = 3;
}Btw the difference between these number is so small that we might as well not bother.
The nice thing about exponential backoff is that it will automatically find its own stable point. Do not try to outsmart it, do not go "oh but what if the update finished between poll 13 and 14, then the user is waiting 30 seconds for nothing, better make the poll shorter so they don't have to wait..."
If you've already been waiting 30 seconds, chances are high you'll be waiting 30 seconds longer. Either things are fast and your polling intervals don't matter, or the service is busted and quicker polling won't make (much of) a difference and just degrades the service for everyone.
Relatedly, I was a little confused while reading this:
Lambda Function hotswapping takes advantage of exponential backoff: waiters in AWS SDK V3 support exponential back off and we were not taking advantage of that while hotswapping Lambda functions, hotswapping Lambda functions that involve Docker or VPCs should be a bit faster.
Exponential backoff usually makes things slower (for good reason!). So why would the benefit of using it be that we are faster for Docker or VPCs?
| 'AWS::AppSync::GraphQLSchema': isHotswappableAppSyncChange, | ||
| 'AWS::AppSync::ApiKey': isHotswappableAppSyncChange, | ||
|
|
||
| 'AWS::BedrockAgentCore::Runtime': isHotswappableBedrockAgentCoreRuntimeChange, |
There was a problem hiding this comment.
Should we not be deleting these functions now then?
| service: 'cloudcontrol', | ||
| apply: async () => { | ||
| const cloudControl = sdk.cloudControl(); | ||
| const currentResource = await cloudControl.getResource({ |
There was a problem hiding this comment.
I'm in doubt about this.
Upstairs, we already decided we had a "change" that's hotswappable. That means we already look at a "current state" and a "new state", and made a calculation based on the diff between them.
(It might be that that state is inaccurate, but that's a different problem!)
If we already have a current state that we calculated a diff on, it seems dangerous to me to fetch another current state, and do another diff calculation based on that.
The primary change in this PR is the introduction of hotswap with CCAPIs. This allows us to support resource hotswapping for new resource types without having to write custom implementations for each of them.
Other changes in this PR:
Fn::GetAttrfor a small number of resources (11). This stems from us having hardcoded implementations for each of those resources. Now if a resource is not one we have a hardcoded attribute resolution for, we try to get it from the deployed Cloudformation template. If we cannot get it from the deployed Cloudformation template, we attempt to resolve it using its live state through CCAPIs. Only if we cannot resolve it with CCAPIs, do we throw an error that we cannot resolve that attribute.Resource types that are now supported by hotswap through CCAPIs:
AWS::ApiGateway::RestApiAWS::ApiGateway::DeploymentAWS::ApiGateway::MethodAWS::ApiGatewayV2::ApiAWS::ApiGatewayV2::IntegrationAWS::Bedrock::AgentAWS::Events::RuleAWS::DynamoDB::TableAWS::DynamoDB::GlobalTableAWS::SQS::QueueAWS::CloudWatch::AlarmAWS::CloudWatch::CompositeAlarmAWS::CloudWatch::DashboardAWS::StepFunctions::StateMachine- previously supported by custom SDK implementationAWS::BedrockAgentCore::Runtime- previously supported by custom SDK implementationBy submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license