[Do not Merge] POC for ACO#13158

Draft
nidhiii-27 wants to merge 11 commits into main from aco-in-otel

Conversation

@nidhiii-27
Contributor

No description provided.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements a background metadata fetching mechanism to enrich OpenTelemetry spans with bucket-specific resource IDs and locations. It introduces a BucketMetadataCache and decorators for Span and SpanBuilder to handle attribute extraction and application across both gRPC and HTTP transports. Feedback highlights several critical areas for improvement: the use of fragile reflection to access internal gRPC channels, the risks associated with an unbounded thread pool for background fetches, and the omission of certain SpanBuilder method overrides which could bypass the metadata logic. Additionally, it is recommended to improve error handling for background tasks and replace brittle sleep statements in tests with robust polling.

Comment on lines +131 to +132
} catch (Exception e) {
}
Contributor

medium

Exceptions during the background metadata fetch are swallowed without logging. This makes it difficult to diagnose why bucket attributes might be missing or stuck on default values. At a minimum, these exceptions should be logged.
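
A minimal sketch of what logging the failure could look like, assuming the fetch runs as a best-effort background task. `LoggingMetadataFetch`, `refresh`, and `locationOrNull` are hypothetical names for illustration and are not taken from this PR; `java.util.logging` is used only to keep the example dependency-free.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical stand-in for the background fetch; names are illustrative only. */
final class LoggingMetadataFetch {
  private static final Logger LOGGER = Logger.getLogger(LoggingMetadataFetch.class.getName());

  private final ConcurrentMap<String, String> locations = new ConcurrentHashMap<>();

  /** Runs the fetch and stores its result; a failure is logged instead of dropped. */
  void refresh(String bucketName, Callable<String> fetchLocation) {
    try {
      locations.put(bucketName, fetchLocation.call());
    } catch (Exception e) {
      // Enrichment is best-effort, so the exception is not rethrown,
      // but it is recorded so missing attributes can be diagnosed.
      LOGGER.log(Level.WARNING, "Background metadata fetch failed for bucket " + bucketName, e);
    }
  }

  /** Returns the cached location, or null if no fetch has succeeded yet. */
  String locationOrNull(String bucketName) {
    return locations.get(bucketName);
  }
}
```

The caller's behavior is unchanged: a failed fetch still leaves the bucket without cached attributes, but the cause now shows up in the logs.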

    storage.getOptions().toBuilder().setOpenTelemetry(openTelemetrySdk).build();
try (Storage storage = storageOptions.getService()) {
  storage.create(BlobInfo.newBuilder(bucket, generator.randomObjectName()).build());
  Thread.sleep(800);
Contributor

medium

Using Thread.sleep() in integration tests is brittle and can lead to flaky tests or unnecessary delays in the CI pipeline. It is better to use a polling mechanism like Awaitility to wait for the background metadata fetch to complete and the cache to be updated.
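
To illustrate the polling idea without adding a dependency, here is a small hand-rolled helper. It is a stand-in for what Awaitility offers, not Awaitility's API, and `Poll.until` is a hypothetical name introduced for this sketch.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Minimal polling helper; a stand-in for a library such as Awaitility. */
final class Poll {
  private Poll() {}

  /**
   * Re-checks the condition every {@code interval} until it holds or {@code timeout} elapses.
   * Returns true if the condition held, false on timeout or interruption.
   */
  static boolean until(BooleanSupplier condition, Duration timeout, Duration interval) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(interval.toMillis());
      } catch (InterruptedException e) {
        // Preserve the interrupt flag and stop waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }
}
```

In the test, the fixed `Thread.sleep(800)` would be replaced by a call that polls for the expected state (for example, the cache entry or the span attribute being present) and fails the test if the helper returns false. The test then finishes as soon as the fetch completes instead of always waiting the full 800 ms.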

@nidhiii-27
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a bucket metadata caching mechanism to enrich OpenTelemetry spans with resource attributes, specifically 'gcp.resource.destination.id' and 'gcp.resource.destination.location'. It adds new classes like AcoSpan, AcoSpanBuilder, and BucketMetadataCache, and implements the GetStorageLayout RPC for both gRPC and HTTP transports. Review feedback identifies several areas for improvement: the cache invalidation logic for 404 errors is currently too aggressive and may trigger redundant fetches; the use of an unbounded cached thread pool should be replaced with a bounded one for better resource management; and exceptions during background fetches are being swallowed without logging. Additionally, it is recommended to make the hardcoded cache capacity configurable and to use a more graceful shutdown sequence for the executor service.
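
One possible shape for the bounded pool mentioned above, sketched with `java.util.concurrent.ThreadPoolExecutor`. The class name `BoundedFetchExecutor`, the sizing parameters, and the choice to silently drop excess work via `DiscardPolicy` are assumptions for illustration; the PR may want a different rejection behavior.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical factory for a bounded executor used for background metadata fetches. */
final class BoundedFetchExecutor {
  private BoundedFetchExecutor() {}

  /**
   * Creates a pool with at most {@code maxThreads} threads and at most
   * {@code queueCapacity} queued tasks. Work submitted beyond that is dropped,
   * which is acceptable here only because enrichment is best-effort.
   */
  static ThreadPoolExecutor create(int maxThreads, int queueCapacity) {
    ThreadPoolExecutor executor =
        new ThreadPoolExecutor(
            maxThreads,
            maxThreads,
            60L,
            TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(queueCapacity),
            new ThreadPoolExecutor.DiscardPolicy());
    // Let idle threads exit so an unused pool holds no threads.
    executor.allowCoreThreadTimeOut(true);
    return executor;
  }
}
```

With `create(1, 1)`, one task runs, one waits in the queue, and any further submissions made while both slots are occupied are discarded rather than spawning new threads.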

Comment on lines +63 to +70
if (exception instanceof StorageException
    && parent != null
    && parent.delegate instanceof StorageInternal) {
  StorageException se = (StorageException) exception;
  if (se.getCode() == 404) {
    ((StorageInternal) parent.delegate).getBucketMetadataCache().remove(bucketName);
  }
}
Contributor

high

The logic to remove the bucket from the metadata cache on any 404 error is too aggressive. A 404 status code can be returned for object-level operations (e.g., getObject) when the object is missing but the bucket still exists. This results in unnecessary cache invalidation and redundant background fetches. Consider checking if the error specifically indicates that the bucket itself is not found before invalidating the cache.
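
A rough sketch of the narrower invalidation, under the assumption that the call site can tell whether the failed operation targeted the bucket itself. How that is determined in practice (operation type, error reason, or message) is not specified in this review; `NotFoundInvalidation` and the `bucketLevelOperation` flag are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical cache wrapper that only invalidates on bucket-level 404s. */
final class NotFoundInvalidation {
  private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

  void put(String bucket, String location) {
    cache.put(bucket, location);
  }

  boolean contains(String bucket) {
    return cache.containsKey(bucket);
  }

  /**
   * Drops the cached entry only when a 404 came from an operation on the bucket itself.
   * An object-level 404 means the object is missing, so the bucket entry is kept.
   */
  void onError(String bucket, int statusCode, boolean bucketLevelOperation) {
    if (statusCode == 404 && bucketLevelOperation) {
      cache.remove(bucket);
    }
  }
}
```

With this guard, a missing object no longer evicts a valid bucket entry, so it does not trigger another background fetch.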

@nidhiii-27 added the kokoro:force-run label (forces Kokoro to re-run the tests) on May 19, 2026
@yoshi-kokoro removed the kokoro:force-run label on May 19, 2026
@nidhiii-27
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces an asynchronous bucket metadata enrichment feature for OpenTelemetry spans in the Google Cloud Storage client. It adds AcoSpan, AcoSpanBuilder, and BucketMetadataCache to cache and inject bucket location and resource ID attributes into spans. The review identified several opportunities for improvement, including refactoring duplicated exception handling in AcoSpan, using a bounded queue for the cache executor to prevent memory exhaustion, replacing the synchronized keyword with explicit locks for better performance, fixing naming conventions, reducing the shutdown timeout for the cache executor, and defining constants for location types.

try {
  bucketMetadataCache.clear();
  cacheExecutor.shutdownNow();
  cacheExecutor.awaitTermination(5, TimeUnit.MINUTES);
Contributor

medium

Waiting for 5 minutes for the cache executor to terminate is excessive for non-critical telemetry enrichment tasks. A shorter timeout (e.g., 10 seconds) would be more appropriate to avoid delaying application shutdown.

Suggested change
- cacheExecutor.awaitTermination(5, TimeUnit.MINUTES);
+ cacheExecutor.awaitTermination(10, TimeUnit.SECONDS);
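
An earlier review round also asked for a more graceful shutdown sequence. A common pattern, following the usage example in the `ExecutorService` Javadoc, is to call `shutdown()` first and fall back to `shutdownNow()` only if the short wait expires. The helper below is a sketch; `ExecutorShutdown.shutdownGracefully` is a hypothetical name, not part of this PR.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper showing a two-phase executor shutdown. */
final class ExecutorShutdown {
  private ExecutorShutdown() {}

  /**
   * Stops accepting new tasks, waits up to {@code timeout} for running tasks to finish,
   * then interrupts whatever is left. Returns true if the executor terminated.
   */
  static boolean shutdownGracefully(ExecutorService executor, long timeout, TimeUnit unit) {
    executor.shutdown();
    try {
      if (executor.awaitTermination(timeout, unit)) {
        return true;
      }
      // Tasks did not finish in time: cancel them and wait once more.
      executor.shutdownNow();
      return executor.awaitTermination(timeout, unit);
    } catch (InterruptedException e) {
      executor.shutdownNow();
      // Preserve the interrupt status for the caller.
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

Called as `shutdownGracefully(cacheExecutor, 10, TimeUnit.SECONDS)`, this bounds the worst case at roughly twice the timeout while still letting in-flight fetches complete normally.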
