- Add telemetry_logger to loggers.py (always JSON, toggleable via FLYTE_TELEMETRY_ENABLED)
- Enhance timeit to emit structured JSON telemetry with step name, wall/process time, status, task context
- Wrap all uninstrumented steps in entrypoint._dispatch_execute: load_task, download_inputs, deserialize_inputs, task_dispatch_execute, output_offloading, upload_outputs, output_deck
- Wrap uninstrumented steps in PythonTask.dispatch_execute: pre_execute, post_execute, write_decks
Co-Authored-By: ryan@exa.ai <ryanjwong007@gmail.com>
- test_timeit_telemetry_success: verify structured log on success
- test_timeit_telemetry_error: verify status=error and error_type on exception
- test_timeit_telemetry_extras: verify custom kv pairs pass through
- test_timeit_telemetry_context_enrichment: verify context is pulled from FlyteContext
- test_telemetry_disabled: verify FLYTE_TELEMETRY_ENABLED=0 silences logger
- test_telemetry_enabled_by_default: verify logger is INFO when env unset
- test_timeit_telemetry_json_format: verify output parses as valid JSON
- test_timeit_all_steps_in_task_execution: verify pre_execute, execute, post_execute all emit
Co-Authored-By: ryan@exa.ai <ryanjwong007@gmail.com>
…l exceptions Co-Authored-By: ryan@exa.ai <ryanjwong007@gmail.com>
Routes telemetry to ClickHouse via HTTP POST (FORMAT JSONEachRow) when FLYTE_TELEMETRY_CLICKHOUSE_URL is set. Falls back to structured JSON logs on stderr when ClickHouse is not configured.
- ClickHouseTelemetrySink: thread-safe buffer, background flush, atexit
- Zero new dependencies (stdlib urllib only)
- 10 new tests (31 total), all passing
Co-Authored-By: ryan@exa.ai <ryanjwong007@gmail.com>
…ow inserts
Remove buffering, locks, atexit, flush threshold. Each telemetry event now fires a single background HTTP POST immediately, so every step is visible in ClickHouse as soon as it completes.
Co-Authored-By: ryan@exa.ai <ryanjwong007@gmail.com>
…ted JSON under params) Co-Authored-By: ryan@exa.ai <ryanjwong007@gmail.com>
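The per-event insert described in the commits above can be sketched roughly as follows. This is a minimal illustration and not the PR's actual code: the function names `build_insert_request`, `post_row`, and `send`, the 5-second timeout, and the injectable `opener` parameter are all assumptions made for the example, and authentication is left out. It relies only on ClickHouse's HTTP interface accepting an `INSERT ... FORMAT JSONEachRow` statement in the `query` parameter with the row as the POST body.

```python
import json
import threading
import urllib.parse
import urllib.request


def build_insert_request(base_url: str, database: str, table: str, row: dict) -> urllib.request.Request:
    """Build one HTTP POST that inserts a single row via FORMAT JSONEachRow."""
    query = f"INSERT INTO {database}.{table} FORMAT JSONEachRow"
    url = f"{base_url}/?{urllib.parse.urlencode({'query': query})}"
    # JSONEachRow expects one JSON object per line in the request body.
    body = (json.dumps(row) + "\n").encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def post_row(request, opener=urllib.request.urlopen) -> None:
    """Send the request and swallow every error so telemetry never breaks a task."""
    try:
        with opener(request, timeout=5) as response:  # timeout value is an assumption
            response.read()
    except Exception:
        pass


def send(request, opener=urllib.request.urlopen) -> threading.Thread:
    """Fire-and-forget: run the POST on a daemon thread and return immediately."""
    thread = threading.Thread(target=post_row, args=(request, opener), daemon=True)
    thread.start()
    return thread
```

One trade-off of this shape is that a daemon thread does not keep the interpreter alive, so an event fired just before process exit may never be delivered; the earlier buffered design with an atexit flush traded latency for that guarantee.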
jld-adriano approved these changes on Feb 19, 2026
Comment on lines +26 to +31
LOGGING_TELEMETRY_ENV_VAR = "FLYTE_TELEMETRY_ENABLED"
CLICKHOUSE_URL_ENV_VAR = "FLYTE_TELEMETRY_CLICKHOUSE_URL"
CLICKHOUSE_USER_ENV_VAR = "FLYTE_TELEMETRY_CLICKHOUSE_USER"
CLICKHOUSE_PASSWORD_ENV_VAR = "FLYTE_TELEMETRY_CLICKHOUSE_PASSWORD"
CLICKHOUSE_DATABASE_ENV_VAR = "FLYTE_TELEMETRY_CLICKHOUSE_DATABASE"
CLICKHOUSE_TABLE_ENV_VAR = "FLYTE_TELEMETRY_CLICKHOUSE_TABLE"
if these don't exist does it fail correctly by not emitting any metrics?
Yes. If the env vars aren't set, it degrades gracefully:
- ClickHouseTelemetrySink.__init__ defaults self._url to "" (line 222)
- self._enabled = bool(self._url) → False when URL is empty (line 227)
- send() returns immediately if not enabled (line 235-236)
- If FLYTE_TELEMETRY_ENABLED is explicitly set to "0"/"false", the sink is set to None entirely (line 271)
So no metrics emitted, no errors, no HTTP calls: completely silent when unconfigured.
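The gating logic described in this reply could look roughly like the sketch below. It is a simplified stand-in, not the code under review: the `TelemetrySink` class, the `make_sink` factory, passing the environment in as a mapping, and the case-insensitive comparison are assumptions for illustration, and the HTTP POST is replaced by appending to a list so the behaviour is easy to observe.

```python
from typing import Mapping, Optional


class TelemetrySink:
    """Hypothetical stand-in for ClickHouseTelemetrySink (no real HTTP calls)."""

    def __init__(self, env: Mapping[str, str]):
        # An unset URL defaults to "", which leaves the sink disabled.
        self._url = env.get("FLYTE_TELEMETRY_CLICKHOUSE_URL", "")
        self._enabled = bool(self._url)
        self.sent = []  # records rows instead of POSTing them

    @property
    def enabled(self) -> bool:
        return self._enabled

    def send(self, row: dict) -> None:
        if not self._enabled:
            return  # unconfigured: no error, no network call
        self.sent.append(row)


def make_sink(env: Mapping[str, str]) -> Optional[TelemetrySink]:
    """Return None when telemetry is switched off, otherwise a (possibly disabled) sink."""
    # Case-insensitive matching is an assumption; the reply only names "0" and "false".
    if env.get("FLYTE_TELEMETRY_ENABLED", "1").lower() in ("0", "false"):
        return None
    return TelemetrySink(env)
```

With an empty environment, `make_sink({})` yields a sink whose `send()` does nothing, which matches the "completely silent when unconfigured" behaviour described above.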
Carlos-Marques approved these changes on Feb 19, 2026
Why are the changes needed?
Flytekit's task execution pipeline has many distinct phases (task loading, input download, deserialization, user code execution, output conversion, upload, etc.) but only a few are instrumented with timing today. This makes it hard to diagnose where time is spent during task execution.
These changes emit structured telemetry at each step boundary. When FLYTE_TELEMETRY_CLICKHOUSE_URL is set, each event is immediately inserted as a row in ClickHouse via a background HTTP POST. Otherwise, structured JSON logs are emitted to stderr for Loki/LogQL ingestion as a fallback.
What changes were proposed in this pull request?
5 files changed:
- flytekit/loggers.py: New telemetry_logger (flytekit.telemetry), controlled by the FLYTE_TELEMETRY_ENABLED env var (default: on). New ClickHouseTelemetrySink class that inserts each event as a single row via a fire-and-forget background thread HTTP POST (FORMAT JSONEachRow). No buffering, no batching: every step is immediately visible in ClickHouse. Zero new dependencies (stdlib urllib + threading only).
- flytekit/core/utils.py: Enhanced timeit context manager:
  - _emit_telemetry() method emits structured data with event, step, wall_time_s, process_time_s, status, error_type, execution_id, task_name, project, domain.
  - Routes to ClickHouseTelemetrySink when enabled, falls back to telemetry_logger (stderr JSON) otherwise.
  - **extras kwarg for passing additional metadata (e.g. input_size_bytes).
  - _emit_telemetry body is wrapped in try/except Exception: pass so telemetry can never break execution or mask real exceptions from __exit__.
- flytekit/bin/entrypoint.py: Wrapped 7 previously uninstrumented steps in _dispatch_execute: load_task, download_inputs, deserialize_inputs, task_dispatch_execute, output_offloading, upload_outputs, output_deck.
- flytekit/core/base_task.py: Wrapped 3 previously uninstrumented steps in PythonTask.dispatch_execute: pre_execute, post_execute, write_decks.
- tests/flytekit/unit/core/test_utils.py: 17 new tests (30 total, all passing): 8 for stderr fallback path, 9 for ClickHouse sink (per-row POST, background thread firing, routing, error silencing, enable/disable).
ClickHouse env vars
- FLYTE_TELEMETRY_ENABLED: "1"
- FLYTE_TELEMETRY_CLICKHOUSE_URL: https://host:8443 (enables ClickHouse sink)
- FLYTE_TELEMETRY_CLICKHOUSE_USER: "default"
- FLYTE_TELEMETRY_CLICKHOUSE_PASSWORD: ""
- FLYTE_TELEMETRY_CLICKHOUSE_DATABASE: "default"
- FLYTE_TELEMETRY_CLICKHOUSE_TABLE: "flytekit_telemetry"
Example telemetry event (ClickHouse row)
{"event": "flytekit_step", "step": "execute_user_code", "wall_time_s": 1.234, "process_time_s": 0.89, "status": "success", "task_name": "my_task", "execution_id": "f12abc", "project": "ml", "domain": "production", "timestamp": "2025-02-12 10:30:45.123"}
Fallback: LogQL queries (when ClickHouse not configured)
How was this patch tested?
30 unit tests in tests/flytekit/unit/core/test_utils.py, all passing:
- test_timeit_telemetry_success_fallback_to_logger
- test_timeit_telemetry_error_fallback_to_logger: status="error" and error_type set on exception
- test_timeit_telemetry_extras: **extras pass through
- test_timeit_telemetry_context_enrichment
- test_telemetry_disabled: FLYTE_TELEMETRY_ENABLED=0 silences telemetry
- test_telemetry_enabled_by_default
- test_timeit_telemetry_json_format
- test_timeit_all_steps_in_task_execution: pre_execute, Execute user level code, post_execute
- test_clickhouse_sink_disabled_without_url
- test_clickhouse_sink_enabled_with_url
- test_clickhouse_sink_send_noop_when_disabled: send() is no-op when disabled
- test_clickhouse_sink_send_fires_background_thread: send() spawns daemon thread targeting _post_row
- test_clickhouse_sink_post_row_sends_json
- test_timeit_routes_to_clickhouse_sink: timeit routes to sink when enabled
- test_timeit_routes_to_clickhouse_on_error
- test_clickhouse_sink_post_row_silences_errors
- test_default_clickhouse_sink_is_disabled

FLYTE_TELEMETRY_ENABLED defaults to "1". This will add telemetry overhead for all existing deployments. Confirm this is acceptable or change the default to off.
_emit_telemetry and _post_row swallow all errors with except Exception: pass. Telemetry configuration issues and ClickHouse connectivity problems will be completely silent, so there is no way to know whether telemetry is actually reaching ClickHouse.
The flytekit_telemetry table must exist before telemetry works. Consider adding a migration script or documenting the required schema.
Check all the applicable boxes
Link to Devin run: https://app.devin.ai/sessions/c6fb4462bb7e4072b175e7fc664aad4b
Requested by: @ryanjwong
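For reviewers who want a concrete picture of the enhanced timeit behaviour summarised in the description, here is a minimal sketch under stated assumptions. `StepTimer` is a hypothetical stand-in, not flytekit's `timeit`: it omits the FlyteContext enrichment (execution_id, task_name, project, domain), measures wall time with `time.perf_counter`, and writes the fallback JSON directly to a stream rather than through `telemetry_logger`. The sink is assumed to expose an `enabled` attribute and a `send()` method.

```python
import json
import sys
import time


class StepTimer:
    """Hypothetical sketch of a timing context manager that emits one event per step."""

    def __init__(self, step, sink=None, stream=None, **extras):
        self.step = step
        self.sink = sink
        self.stream = stream if stream is not None else sys.stderr
        self.extras = extras  # extra key/value pairs, e.g. input_size_bytes

    def __enter__(self):
        self._wall = time.perf_counter()
        self._cpu = time.process_time()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._emit(exc_type)
        return False  # never suppress the exception raised by the wrapped step

    def _emit(self, exc_type):
        try:
            event = {
                "event": "flytekit_step",
                "step": self.step,
                "wall_time_s": time.perf_counter() - self._wall,
                "process_time_s": time.process_time() - self._cpu,
                "status": "error" if exc_type else "success",
                "error_type": exc_type.__name__ if exc_type else "",
                **self.extras,
            }
            if self.sink is not None and getattr(self.sink, "enabled", False):
                self.sink.send(event)  # primary path: ClickHouse-style sink
            else:
                self.stream.write(json.dumps(event) + "\n")  # fallback: JSON line
        except Exception:
            pass  # telemetry failures must never break or mask the step itself
```

A step would then be wrapped as `with StepTimer("download_inputs", input_size_bytes=1024): ...`, and an exception inside the block still propagates after the event is emitted with `status` set to `"error"`.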