
Datadog tracer issue with clear route cache and refreshTracing #43532

@bvandewalle

Description


Title: Since refreshTracing was introduced in Envoy 1.36, the Datadog tracer treats a tracing refresh differently from the initial startSpan.

Description:

Upon receiving a request, Envoy decides whether it should be traced and issues a tracing_decision, which is either True or False (based on several factors such as headers that force tracing, sampling rates, etc.).
That tracing_decision is passed down to the DD tracer through startSpan.

For some reason (there is even a TODO in the code about this behavior), the Datadog tracer interprets the tracing_decision asymmetrically:

  • True is treated as optional and left for Datadog itself to evaluate later (stored as nullopt).
  • False is a definite drop (stored as USER_DROP).

The DD tracer's implementation tracks this through the sampling_decision() flag.
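A minimal sketch of the asymmetric mapping described above, assuming a simplified model rather than the real Envoy or dd-trace-cpp code. The enum, its member names, and `decisionAtStartSpan` are hypothetical; the numeric values follow Datadog's sampling-priority convention (-1 user drop, 0 auto drop, 1 auto keep, 2 user keep).

```cpp
#include <cassert>
#include <optional>

// Hypothetical model of Datadog sampling priorities (values follow Datadog's
// convention); not the tracer's actual type.
enum class SamplingPriority { UserDrop = -1, AutoDrop = 0, AutoKeep = 1, UserKeep = 2 };

// Hypothetical stand-in for how startSpan is described as mapping Envoy's
// boolean tracing_decision onto the tracer's sampling decision.
std::optional<SamplingPriority> decisionAtStartSpan(bool traced) {
  if (!traced) {
    return SamplingPriority::UserDrop;  // False: a definite drop
  }
  return std::nullopt;  // True: left for Datadog's own sampler to decide later
}
```

Note that only False produces a concrete priority; True carries no decision at all, which is what makes it "optional".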

However, if a filter clears the route cache, this triggers refreshTracing, which now marks the trace as USER_KEEP, so the sampling decision is no longer optional at all.

This is an inconsistency between the initial evaluation and the refreshed one: the same True decision is optional at startSpan but forced after a refresh. It appears to affect only the Datadog tracer.
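The inconsistency can be reproduced in a small, hypothetical model. `SpanModel`, its `sampling_decision` member, and the free function `refreshTracing` are illustrative names, not Envoy's or dd-trace-cpp's real classes. The sketch assumes the refresh path passes the request's tracing decision to `setSampled`, and assumes `setSampled(false)` maps to a user drop.

```cpp
#include <cassert>
#include <optional>

// Hypothetical sampling priorities (values follow Datadog's convention).
enum class SamplingPriority { UserDrop = -1, AutoDrop = 0, AutoKeep = 1, UserKeep = 2 };

// Hypothetical, simplified span; not a real Envoy or Datadog class.
struct SpanModel {
  std::optional<SamplingPriority> sampling_decision;  // nullopt = Datadog decides

  // Mirrors the startSpan behavior described in the issue.
  static SpanModel start(bool traced) {
    SpanModel span;
    if (!traced) span.sampling_decision = SamplingPriority::UserDrop;
    return span;
  }

  // Mirrors setSampled as described: True becomes a forced USER_KEEP
  // (False mapping to USER_DROP is an assumption of this sketch).
  void setSampled(bool sampled) {
    sampling_decision = sampled ? SamplingPriority::UserKeep : SamplingPriority::UserDrop;
  }
};

// Hypothetical stand-in for the route-cache-clear path.
void refreshTracing(SpanModel& span, bool traced) { span.setSampled(traced); }
```

With the same True decision, the span starts with no sampling decision but ends up as `UserKeep` after a refresh.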

Possible resolutions:

  • Respect Envoy's initial decision: change startSpan to map True to USER_KEEP and use Envoy's sampling parameter to decide how many traces to keep. The drawback is that this bypasses the DD agent's sampling decision; all sampling is done in Envoy.
  • Change setSampled to use the same logic as startSpan, i.e. treat True as optional. The drawback is that setSampled is called in many other places that expect USER_KEEP.
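Both resolutions amount to making startSpan and the refresh path share one mapping. The sketch below expresses that idea with a hypothetical `Policy` switch and a hypothetical `mapDecision` function; neither exists in Envoy or dd-trace-cpp, and the sketch does not address the drawbacks listed above (for example, other setSampled callers that expect USER_KEEP).

```cpp
#include <cassert>
#include <optional>

// Hypothetical sampling priorities (values follow Datadog's convention).
enum class SamplingPriority { UserDrop = -1, AutoDrop = 0, AutoKeep = 1, UserKeep = 2 };

// Hypothetical switch covering the two proposed resolutions.
enum class Policy {
  EnvoyDecides,    // option 1: True becomes USER_KEEP; Envoy's sampling rate rules
  DatadogDecides,  // option 2: True stays optional for Datadog's sampler
};

// A single mapping used by both startSpan and refresh, so the two paths
// cannot disagree for the same tracing decision.
std::optional<SamplingPriority> mapDecision(bool traced, Policy policy) {
  if (!traced) return SamplingPriority::UserDrop;
  if (policy == Policy::EnvoyDecides) return SamplingPriority::UserKeep;
  return std::nullopt;
}
```

Under either policy False is still a definite drop; the choice only determines who owns the True case.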

Metadata

  • Assignees: No one assigned
  • Labels: bug, stale
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests