Skip to content

feat(datastore): Decorate and log errors thrown during DataStore entrypoint initialization#27327

Open
markfields wants to merge 6 commits into
microsoft:mainfrom
markfields:entrypoint-load-failures
Open

feat(datastore): Decorate and log errors thrown during DataStore entrypoint initialization#27327
markfields wants to merge 6 commits into
microsoft:mainfrom
markfields:entrypoint-load-failures

Conversation

@markfields
Copy link
Copy Markdown
Member

@markfields markfields commented May 16, 2026

Description

Component load failures are one of the top scenarios our partners need to be able to monitor and diagnose in production. This PR closes a long-standing gap where errors thrown during entryPoint initialization (e.g. for PureDataObject, that includes the initalization hooks) are not logged to FF tables.

While host applications can and do log these errors, there is context that's only readily available within FF that is really important for investigation - All the common ContainerRuntime props (notably clientType - is it the Summarizer?), as well as the DataStore's packagePath (indicates the component type) and id.

This PR logs an error event with all these props, and additionally annotates the error object* with the dataStore package/id info.

* Note: In testing I found that these telemetry props get stripped off on the way up by roundtripping through exceptionToResponse and responseToException. A bit curious to look into that flow and if we could cut out those tranforms.

Copilot AI review requested due to automatic review settings May 16, 2026 21:11
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 16, 2026

Hi! Thank you for opening this PR. Want me to review it?

Based on the diff (252 lines, 7 files), I've queued these reviewers:

  • Correctness — logic errors, race conditions, lifecycle issues
  • Security — vulnerabilities, secret exposure, injection
  • API Compatibility — breaking changes, release tags, type design
  • Performance — algorithmic regressions, memory leaks
  • Testing — coverage gaps, hollow tests

How this works

  • Adjust the reviewer set by ticking/unticking boxes above. Reviewer toggles alone don't trigger anything.

  • Tick Start review below to dispatch the review fleet.

  • After review finishes, tick Start review again to request another run — it auto-resets after each dispatch.

  • This comment updates as new commits land; your reviewer selections are preserved.

  • Start review

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds telemetry for DataStore entry point initialization failures so component load issues include Fluid-specific context like DataStore identity and package path.

Changes:

  • Adds a shared helper for code-artifact-tagged DataStore telemetry properties.
  • Logs and annotates errors thrown while resolving FluidDataStoreRuntime.entryPoint.
  • Reuses the shared telemetry helper for DataStore realization errors and adds tests for entry point failure logging.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
packages/utils/telemetry-utils/src/error.ts Aligns DataProcessingError wrapping factory usage.
packages/runtime/runtime-utils/src/index.ts Exports the new DataStore telemetry helper.
packages/runtime/runtime-utils/src/dataStoreHelpers.ts Adds shared DataStore identity telemetry property helper.
packages/runtime/datastore/src/test/dataStoreRuntime.spec.ts Adds coverage for lazy entry point initialization and failure telemetry.
packages/runtime/datastore/src/dataStoreRuntime.ts Wraps, annotates, logs, and rethrows entry point initialization failures.
packages/runtime/container-runtime/src/dataStoreContext.ts Reuses the shared DataStore telemetry helper for realization errors.

Comment thread packages/runtime/container-runtime/src/dataStoreContext.ts
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 16, 2026

Fleet Review — Clean

No issues found across the reviewer fleet for this run.

View run

error,
"entryPointInitialization",
);
errorWrapped.addTelemetryProperties(
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: In testing I found that these telemetry props get stripped off on the way up by roundtripping through exceptionToResponse and responseToException. A bit curious to look into that flow and if we could cut out those tranforms.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have to do with privacy concerns? I know there can sometimes be policies to strip all error information away in case it contains user data or identifiable information.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, it has to do with legacy architectural direction (URI-based routing within and across containers)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These properties are correctly tagged as "code artifacts", so hosts can decide if they're safe to log under different circumstances

Copy link
Copy Markdown
Contributor

@ChumpChief ChumpChief left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love wrapping at the exit to custom code! Few organizational/API suggestions.

fluidDataStoreId: this.id,
}),
);
errorWrapped.addTelemetryProperties(dataStoreLoadTelemetryProps(this));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing this tends to be a code smell, probably prefer either passing the specific members needed or else make it a member method if it truly needs access to this.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't want to muck up the legacy-beta IFluidDataStoreContext interface. I can consider passing in just the props we need off the context.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be my preference too, passing big/powerful objects for small tasks is also usually a code smell :-D

Comment thread packages/runtime/runtime-utils/src/dataStoreHelpers.ts
Comment thread packages/runtime/runtime-utils/src/dataStoreHelpers.ts
new LazyPromise(async () => provideEntryPoint(this)),
new LazyPromise(async () =>
provideEntryPoint(this).catch((error) => {
const errorWrapped = DataProcessingError.wrapIfUnrecognized(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea!

new LazyPromise(async () => provideEntryPoint(this)),
new LazyPromise(async () =>
provideEntryPoint(this).catch((error) => {
const errorWrapped = DataProcessingError.wrapIfUnrecognized(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if DataProcessingError is appropriate here. This runs application entry point logic and has nothing to do with Fluid right? So, maybe this is a UsageError or just a generic error?

Copy link
Copy Markdown
Contributor

@agarwal-navin agarwal-navin May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more about this - we should separate out the path that loads the data store / DDS data from the path that loads the application logic. And we should be able to tell these apart from telemetry. I would suggest to not use data process error for the latter because that should not cause any problems with the data (it does fail summaries today but that is an unexpected scenario and will be fixed soon)

summarizeNow(createSummarizerResult.summarizer, "test"),
"Summarizing should not throw",
);
// Create a summarizer for the container and do a summary shouldn't throw.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting test. I know you are not changing this but this comment isn't clear. I would suggest updating this to say that the data store is realized / loaded during summarization but is not initialized and hence summarization succeeds.

"Non interactive/summarizer client's data object should not be initialized",
"Loading data store in summarizer did not throw as it should, or threw an unexpected error.",
);
// In summarizer, load the data store should fail.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update this comment to say that this throws an error and also logs the error in itExpects

"Initial creation of container and data store should succeed.",
);
});
// Load second container, load the data store will also call initializingFromExisting and succeed.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An observation, don't need to change this I guess - This step seems unnecessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants