Note
Ocular is currently in its very early development phase (literally a day old!). The telemetry formats, export mechanisms, and analysis features are experimental and subject to rapid iteration.
Ocular’s telemetry engine is built to aggregate millions of instruction-level execution events without stalling the interpreter thread that holds the Python Global Interpreter Lock (GIL). To achieve this, Ocular shifts the heavy lifting of trace reconstruction, hardware cycle averaging, and JSON serialization to a dedicated background Rust thread.
When sys.monitoring fires an event, Ocular's FFI callbacks immediately push a lightweight TraceEvent enum payload to a lock-free queue (EVENT_QUEUE) and return control to CPython. The telemetry_worker thread continuously drains this queue in the background.
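The hand-off described above can be sketched in plain Python. This is only an illustration of the producer/consumer shape, not Ocular's Rust implementation: queue.SimpleQueue stands in for the lock-free queue, the event is a tuple rather than a TraceEvent enum, and on_instruction, run_demo, and the _STOP sentinel are hypothetical names.

```python
import queue
import threading

# Hypothetical stand-in for Ocular's EVENT_QUEUE (the real one is a lock-free Rust queue).
EVENT_QUEUE = queue.SimpleQueue()
_STOP = object()  # sentinel that tells the worker to exit


def on_instruction(code_id, offset, tsc):
    """Hot-path callback: enqueue a small tuple and return immediately."""
    EVENT_QUEUE.put(("Instruction", code_id, offset, tsc))


def telemetry_worker(processed):
    """Background thread: drain events until the stop sentinel arrives."""
    while True:
        event = EVENT_QUEUE.get()
        if event is _STOP:
            break
        # Averaging, symbol lookup, and serialization would happen here.
        processed.append(event)


def run_demo(n_events=1000):
    """Push n_events fake events through the queue and return what the worker saw."""
    processed = []
    worker = threading.Thread(target=telemetry_worker, args=(processed,))
    worker.start()
    for i in range(n_events):
        on_instruction(code_id=1, offset=i * 2, tsc=i * 10)
    EVENT_QUEUE.put(_STOP)
    worker.join()
    return processed
```

The point of the design is that the callback does a single enqueue and nothing else; every expensive step lives on the consumer side.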
The worker performs several critical tasks entirely off the hot path:
- Symbol Resolution & Caching: It lazily interrogates Python's dis module to map raw instruction offsets to human-readable opcode names. This metadata is cached in a CodeMeta registry to avoid repeated lookups.
- Call Stack Tracking: It maintains an internal mirror of the Python call stack to accurately pair function entries (PyStart) with function exits (PyReturn).
- Cycle Averaging: It calculates the hardware Time Stamp Counter (TSC) clock cycles elapsed between instructions. It averages these cycles across loop iterations to provide highly stable, nanosecond-accurate performance metrics.
- Zero-Allocation Recycling: As batches of events are processed, the worker pushes the empty vectors to a FREE_QUEUE so the frontend callbacks can reuse them, completely eliminating heap allocation during tracing.
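Two of the tasks above, lazy symbol resolution and cycle averaging, can be approximated in a few lines of Python. This is a rough sketch under assumptions: the CodeMeta and WorkerState classes here are hypothetical illustrations rather than Ocular's Rust types, and the cycle counts are supplied by the caller instead of being read from the TSC.

```python
import dis
from collections import defaultdict


class CodeMeta:
    """Hypothetical cache entry: offset -> opcode name for one code object."""

    def __init__(self, code):
        self.names = {ins.offset: ins.opname for ins in dis.get_instructions(code)}


class WorkerState:
    """Hypothetical worker-side state: a symbol cache plus running cycle totals."""

    def __init__(self):
        self.meta = {}                             # id(code) -> CodeMeta, built lazily
        self.totals = defaultdict(lambda: [0, 0])  # (id(code), offset) -> [cycle sum, hits]

    def opname(self, code, offset):
        entry = self.meta.get(id(code))
        if entry is None:                          # first sighting: disassemble once
            entry = self.meta[id(code)] = CodeMeta(code)
        return entry.names.get(offset, "<unknown>")

    def record(self, code, offset, cycles):
        slot = self.totals[(id(code), offset)]
        slot[0] += cycles
        slot[1] += 1

    def average(self, code, offset):
        total, hits = self.totals[(id(code), offset)]
        return total / hits if hits else 0.0


def sample(n):
    total = 0
    for i in range(n):
        total += i
    return total
```

Calling state.opname(sample.__code__, offset) repeatedly disassembles the function only once, and state.average(...) returns the mean of whatever cycle counts were recorded for that offset.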
At the end of a tracing session, Ocular analyzes its internal TraceStats map to identify the hottest execution paths (loops). It then outputs a highly detailed disassembly directly to the console:
[Ocular] Trace Disassembly with Hardware CPU Cycles (Base -> Specialized):
[000] offset 94 : FOR_ITER -> FOR_ITER_RANGE | ~1567 cycles
[001] offset 98 : STORE_FAST | ~582 cycles
[002] offset 104: STORE_SUBSCR -> STORE_SUBSCR_LIST_INT | ~458 cycles
- Base -> Specialized Transitions: Because modern CPython uses a Specializing Adaptive Interpreter, Ocular tracks both the original opcode (e.g., STORE_SUBSCR) and the specialized, quickened opcode Python swapped in at runtime (e.g., STORE_SUBSCR_LIST_INT).
- Cycle Latency: Next to each micro-operation, you will see the hardware cycles consumed. If Ocular is running in adaptive mode, this represents the average block latency. If running in precise mode, this represents the exact instruction-to-instruction cycle cost.
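The base and specialized opcode pairs can also be inspected without Ocular, using the standard dis module. The sketch below is not how Ocular collects them; it only compares the original bytecode with the quickened bytecode that dis exposes through adaptive=True (Python 3.11+). The helper base_and_live_opnames and the fill function are hypothetical names, and which specializations actually appear depends on the interpreter version and warm-up.

```python
import dis
import sys


def base_and_live_opnames(func):
    """Return (offset, base opname, live opname) for each instruction of func."""
    base = {ins.offset: ins.opname for ins in dis.get_instructions(func)}
    if sys.version_info >= (3, 11):
        # adaptive=True reads the quickened bytecode (parameter added in Python 3.11).
        live = {ins.offset: ins.opname
                for ins in dis.get_instructions(func, adaptive=True)}
    else:
        live = dict(base)  # older interpreters do not specialize
    return [(off, name, live.get(off, name)) for off, name in sorted(base.items())]


def fill(buf):
    for i in range(len(buf)):
        buf[i] = i


for _ in range(100):  # warm up so the interpreter has a chance to specialize
    fill([0] * 64)

for offset, base_name, live_name in base_and_live_opnames(fill):
    arrow = f" -> {live_name}" if live_name != base_name else ""
    print(f"offset {offset:<4}: {base_name}{arrow}")
```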
When tracing is initialized with perfetto=True, Ocular translates the raw hardware execution events into the open-source Chrome Trace Event Format.
Once ocular.stop_tracing() is called, the telemetry worker flushes this data into a file named ocular_trace.json in your current working directory.
- Open a Chromium-based browser (Chrome, Edge, Brave).
- Navigate to ui.perfetto.dev or chrome://tracing.
- Drag and drop the ocular_trace.json file into the viewer.
Ocular structures the JSON payload using PerfettoEvent structs. The timeline represents the following distinct phases:
- Function Boundaries (ph: "B" and ph: "E"): Function invocations are tracked via PyStart and PyReturn events. These form the macro-level flame graph of your application's call tree. Ocular attaches the raw memory code_ptr as an argument to these events to help disambiguate calls.
- Loop Analytics (ph: "i"): When a jump backwards is detected (signifying a loop body repeating), Ocular emits an Instant event categorized under "loop". This event embeds the number of micro-ops (uOps) executed in the loop body and the total Hardware Cycles consumed by that specific iteration.
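For orientation, here is a minimal sketch of what such events look like in the Chrome Trace Event Format. The top-level traceEvents array and the ph, ts, pid, tid, s, and args fields come from that format. The event names, the "function" category, the uops and cycles argument keys, and all values are illustrative assumptions, not Ocular's actual output.

```python
import json


def begin(name, ts_us, code_ptr):
    """Duration-begin event for a function entry (PyStart)."""
    return {"name": name, "cat": "function", "ph": "B", "ts": ts_us,
            "pid": 1, "tid": 1, "args": {"code_ptr": hex(code_ptr)}}


def end(name, ts_us):
    """Duration-end event for a function exit (PyReturn)."""
    return {"name": name, "cat": "function", "ph": "E", "ts": ts_us,
            "pid": 1, "tid": 1}


def loop_instant(ts_us, uops, cycles):
    """Thread-scoped instant event ("s": "t") marking one loop iteration."""
    return {"name": "loop iteration", "cat": "loop", "ph": "i", "s": "t",
            "ts": ts_us, "pid": 1, "tid": 1,
            "args": {"uops": uops, "cycles": cycles}}  # hypothetical key names


def build_trace():
    events = [
        begin("fill", 0, 0x7F00DEAD),   # made-up pointer value
        loop_instant(5, 3, 2607),       # made-up counts
        end("fill", 12),
    ]
    return json.dumps({"traceEvents": events}, indent=2)
```

Writing the string returned by build_trace() to a .json file gives a file that trace viewers accepting this format can load, which is a quick way to check how B/E pairs and instant events render.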