Title: After the introduction of `refreshTracing` in Envoy 1.36, the Datadog tracer treats a refresh differently from the initial `startSpan`.
Description:
Upon receiving a request, Envoy decides whether it should be traced and produces a `tracing_decision`, which is either true or false (based on several factors such as trace-forcing headers, sampling rates, etc.).
That `tracing_decision` is passed down to the Datadog tracer through `startSpan`.
For some reason (there is even a TODO addressing this behavior), the Datadog tracer interprets `tracing_decision` as follows:
- True is optional: Datadog evaluates it later on its own (stored as `nullopt`).
- False is a definite drop (stored as `USER_DROP`).
The Datadog tracer's implementation tracks this through the `sampling_decision()` flag.
However, if a filter triggers a route cache refresh, that refresh calls `refreshTracing`, which now sets the trace to `USER_KEEP`, so tracing is no longer optional.
This is a clear inconsistency between the initial evaluation and the refreshed one. It only seems to happen with the Datadog tracer.
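A simplified model of the two code paths described above may make the mismatch easier to see. This is a sketch, not Envoy's code: `priority_at_start_span`, `priority_after_refresh` and `is_inconsistent` are hypothetical names, the enum only mirrors Datadog's standard sampling priority values, and it assumes the refresh path ends in a `setSampled(traced)`-style call that always pins the priority.

```cpp
#include <optional>

// Mirrors Datadog's sampling priority values
// (USER_DROP = -1, AUTO_DROP = 0, AUTO_KEEP = 1, USER_KEEP = 2).
enum class SamplingPriority { USER_DROP = -1, AUTO_DROP = 0, AUTO_KEEP = 1, USER_KEEP = 2 };

// Hypothetical model of the startSpan path: a positive decision is left unset
// so Datadog's own sampler can decide later; a negative one is a hard drop.
std::optional<SamplingPriority> priority_at_start_span(bool traced) {
  if (!traced) {
    return SamplingPriority::USER_DROP;
  }
  return std::nullopt;  // deferred to Datadog sampling
}

// Hypothetical model of the refresh path (assumed to behave like
// setSampled(traced)): both outcomes pin the priority.
std::optional<SamplingPriority> priority_after_refresh(bool traced) {
  return traced ? SamplingPriority::USER_KEEP : SamplingPriority::USER_DROP;
}

// True when the same Envoy decision yields different Datadog state depending
// on whether a route cache refresh happened.
bool is_inconsistent(bool traced) {
  return priority_at_start_span(traced) != priority_after_refresh(traced);
}
```

In this model only a positive decision diverges: `false` maps to `USER_DROP` on both paths, while `true` goes from "unset" to `USER_KEEP` once a refresh occurs.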
Possible resolutions:
- Respect Envoy's initial decision: change `startSpan` to map true to `USER_KEEP` and use Envoy's sampling parameter to decide how many traces to keep. The drawback is that this bypasses the Datadog agent's sampling decision; everything is done in Envoy.
- Change `setSampled` to use the same logic as `startSpan`, treating true as optional. The drawback is that `setSampled` is called in many other places that expect `USER_KEEP`.
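Both resolutions above can be sketched with the same simplified types to check that each one makes the initial and refreshed paths agree. All function names here are hypothetical and the enum only mirrors Datadog's sampling priority values; the "current" functions restate the behavior described in this issue, not Envoy's actual source.

```cpp
#include <optional>

enum class SamplingPriority { USER_DROP = -1, AUTO_DROP = 0, AUTO_KEEP = 1, USER_KEEP = 2 };
using Priority = std::optional<SamplingPriority>;

// Current behavior as described in the issue (hypothetical model).
Priority current_start_span(bool traced) {
  return traced ? Priority{} : Priority{SamplingPriority::USER_DROP};
}
Priority current_set_sampled(bool traced) {
  return traced ? SamplingPriority::USER_KEEP : SamplingPriority::USER_DROP;
}

// Option 1: startSpan honors Envoy's decision and pins USER_KEEP, so it matches
// the existing setSampled. Volume is then controlled only by Envoy's sampling
// settings; the Datadog agent's sampler is bypassed.
Priority option1_start_span(bool traced) {
  return traced ? SamplingPriority::USER_KEEP : SamplingPriority::USER_DROP;
}

// Option 2: setSampled mirrors the existing startSpan and leaves a positive
// decision unset. Callers relying on setSampled(true) forcing USER_KEEP lose
// that guarantee.
Priority option2_set_sampled(bool traced) {
  return traced ? Priority{} : Priority{SamplingPriority::USER_DROP};
}

// A resolution is consistent if start and refresh agree for both decisions.
bool consistent(Priority (*start)(bool), Priority (*refresh)(bool)) {
  return start(true) == refresh(true) && start(false) == refresh(false);
}
```

Here `consistent(current_start_span, current_set_sampled)` is false, while `consistent(option1_start_span, current_set_sampled)` and `consistent(current_start_span, option2_set_sampled)` are both true; the choice between them is about who owns the final sampling decision, Envoy or Datadog.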