
Fix a bug in measuring running times in a few places, more fine-grained running time report#1124

Open
aqjune-aws wants to merge 9 commits into main from jlee/profile2

Conversation

@aqjune-aws
Contributor

Lean sometimes reorders the invocation of the current-time function relative to other code:

let t1 ← IO.monoNanosNow
let res := (some heavyweight work)
let t2 ← IO.monoNanosNow
let elapsedTime := t2 - t1

into:

let res := (some heavyweight work)
let t1 ← IO.monoNanosNow
let t2 ← IO.monoNanosNow
let elapsedTime := t2 - t1 -- This is trivially zero!

This made the SMT encoding time (the ProofObligation object -> SMT Term, Definition, etc.) appear as zero ms, when it was actually taking around 20% of the whole VC discharge process.

Also, this pull request adds much more fine-grained reports on running times: it reports (1) each Core transform, and (2) each step of SMT query generation.

[profile]     ProcedureInlining: 0ms             
[profile]     FilterProcedures: 0ms           
[profile]     CallElim: 0ms                   
[profile]     PrecondElim: 1ms  
[profile]     FilterProcedures: 0ms       
[profile]     LoopElim: 0ms
[profile]     typeCheck: 2ms
[profile]     symbolicEval: 2ms             
[profile]     ANFEncoder: 0ms  
[profile]   Program transformations: 5ms
[profile]   Build axiom relevance cache: 0ms
[profile]     Preprocess obligations: 3ms
[profile]     SMT encoding: 8ms
[profile]         defineFunTerms: 2ms
[profile]         emitDatatypes: 1ms
[profile]         encodeAssumptions: 0ms
[profile]         encodeAxioms: 0ms
[profile]         encodeFunctions: 1ms
[profile]         encodeObligation: 0ms
[profile]         encodeUFs: 3ms
[profile]         epilog: 3ms
[profile]         prelude: 0ms
[profile]       encodeCore: 24ms
[profile]       runSolver: 0ms
[profile]     Solver/file writing: 26ms
[profile]     Obligations: 20 total, 0 resolved by evaluator
[profile]   VC discharge: 38ms

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@@ -301,29 +302,37 @@ When two-sided checking is enabled, the generated SMT file will contain two
and the return value includes both decisions.
Contributor

Could you update the docstring to discuss the HashMap addition (and maybe use an alias as well)?

Comment thread Strata/Util/Profile.lean
Comment on lines +12 to +22
/-- Measure the wall-clock nanoseconds taken by a pure expression.
Takes a thunk to ensure the expression is not evaluated before timing starts.
The `t1 == 0` trick prevents hoisting `result` above t1.
Writing `result` to an `IO.Ref` forces its evaluation before t2 is read. -/
public def measureNanos [Inhabited α] (expr : Unit → α) : BaseIO (α × Nat) := do
let ref ← IO.mkRef (default : α)
let t1 ← IO.monoNanosNow
let result := if t1 == 0 then default else expr ()
ref.set result
let t2 ← IO.monoNanosNow
pure (result, t2 - t1)
Contributor

This looks really hacky. How about something annotated with noinline?

/-- Measure the wall-clock nanoseconds taken by an IO action.
    Writing `result` to an `IO.Ref` forces its evaluation before t2 is read. -/
@[noinline] def measureNanosM (act : BaseIO α) : BaseIO (α × Nat) := do
  let t1 ← IO.monoNanosNow
  let result ← act
  let t2 ← IO.monoNanosNow
  pure (result, t2 - t1)

Then measureNanos would become:

/-- Measure the wall-clock nanoseconds taken by a pure expression.
    Takes a thunk to ensure the expression is not evaluated before timing starts.
    The `t1 == 0` trick prevents hoisting `result` above t1.
    Writing `result` to an `IO.Ref` forces its evaluation before t2 is read. -/
@[noinline]
public def measureNanos (expr : Unit → α) : BaseIO (α × Nat) :=
  measureNanosM (pure (expr ()))

The second noinline is necessary so that the expr doesn't get pre-evaluated.

Contributor Author

It seems this version still has the problem with measuring running times: lake exe strata verify --profile Examples/HeapReasoning.core.st still prints 0ms for the Core transformations.
I tried another version of measureNanos that uses @[noinline] but also relies on IO.mkRef. I think this looks better than the previous one because there is an explicit ordering between IO.monoNanosNow and ref.set.
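A sketch of the shape being described; this matches the version of measureNanos that appears verbatim later in this thread:

```lean
-- Sketch of the revised approach: @[noinline] stops the compiler from
-- hoisting `expr ()` out of the function, and `ref.set` is an IO action,
-- so it is sequenced between the two `IO.monoNanosNow` reads.
@[noinline] public def measureNanos [Inhabited α] (expr : Unit → α) : BaseIO (α × Nat) := do
  let ref ← IO.mkRef (default : α)
  let t1 ← IO.monoNanosNow
  ref.set (expr ())
  let t2 ← IO.monoNanosNow
  let result ← ref.get
  pure (result, t2 - t1)
```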

Comment thread Strata/Util/Profile.lean Outdated

/-- Run an action and return its result along with the elapsed nanoseconds. -/
@[inline] public def recordNanos {m α} [Monad m] [MonadLiftT BaseIO m]
(key : String) (timing : Std.HashMap String Nat) (action : m α) : m (α × Std.HashMap String Nat) := do
Contributor

I think using an abbreviation with a name (NanoTimings?) instead of a hashmap directly would clarify code.

Contributor Author

Sounds great; I named it TimingInfo. As with the collected stats, I added a simple inductive data structure that represents the items and created ToString instances using the existing #derive_prefixed_toString.
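A hypothetical sketch of the shape described; the alias and constructor names are drawn from other snippets in this PR, and the exact `#derive_prefixed_toString` invocation is assumed:

```lean
-- Named alias instead of a bare HashMap (name from this thread).
public abbrev TimingInfo := Std.HashMap String Nat

-- Inductive enumerating the timed phases (constructors from this PR's diff).
inductive Timing where
  | preprocessObligations
  | smtEncoding
  | solverFileWriting

-- Assumed invocation; yields keys such as "verifySingleEnv.solverFileWriting".
#derive_prefixed_toString Timing "verifySingleEnv"
```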

@aqjune-aws
Contributor Author

Diffs on GitHub will look better if ignoring whitespace changes is toggled on.

Contributor

@MikaelMayer MikaelMayer left a comment

🤖 Good PR — the reordering bug is real and the fix via measureNanos with @[noinline] + IO.Ref is sound. The fine-grained timing breakdown is useful.

A few observations below.

Comment thread Strata/Util/Profile.lean
let result ← action
let t1 ← IO.monoNanosNow
pure (result, timing.insert key (t1 - t0))

Contributor

🤖 recordNanos is marked @[inline] and measures a monadic action using the same t0/t1 bracketing pattern that the PR description says Lean can reorder for pure code. Since action here is already monadic (sequenced by >>=), the reordering issue shouldn't apply — but the @[inline] means the compiler could inline this and then reorder the timestamp reads relative to the inlined body in theory. Is there a reason this isn't also @[noinline] like measureNanos? If the monadic bind is sufficient to prevent reordering, a brief comment explaining why would help future readers understand the asymmetry.


Good point, removed @[inline].


-- If evaluator resolved both checks, we're done, unless we always want to generate SMT queries
if not options.alwaysGenerateSMT then
if let (some peSat, some peVal) := (peSatResult?, peValResult?) then
Contributor

🤖 The pattern timingNs.insert key (timingNs.getD key 0 + val) is repeated 4 times in this function. Consider a small helper like:

def TimingInfo.add (t : TimingInfo) (key : String) (ns : Nat) : TimingInfo :=
  t.insert key (t.getD key 0 + ns)

This would reduce noise and make the accumulation intent clearer.


Fixed, thanks

let mut state : Transform.CoreTransformState := { Transform.CoreTransformState.emp with factory := some factory }
let mut step := 0
have : Inhabited (Except Transform.Err Program × Transform.CoreTransformState) :=
⟨(.error default, Transform.CoreTransformState.emp)⟩
Contributor

🤖 This have : Inhabited ... instance is needed because measureNanos requires [Inhabited α]. Since Transform.runWith is presumably a pure function being timed, this makes sense — but the Inhabited instance uses .error default which means if the IO.Ref default value somehow leaks through, you'd get a silent error result rather than a crash. Worth a brief comment explaining this is only needed to satisfy measureNanos's constraint and the default is never observed.
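A sketch of the requested comment at the instance site (the `have` itself is copied from the diff above):

```lean
-- This `Inhabited` instance exists only to satisfy `measureNanos`'s
-- `[Inhabited α]` constraint; the `.error default` value is never observed.
have : Inhabited (Except Transform.Err Program × Transform.CoreTransformState) :=
  ⟨(.error default, Transform.CoreTransformState.emp)⟩
```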


Added a comment.

| preprocessObligations
| smtEncoding
| solverFileWriting

Contributor

🤖 Nit: The key "verifySingleEnv.solverFileWriting" is misleading — it measures the entire getObligationResult call which includes both SMT file writing and solver execution. The sub-timings (dischargeObligation.encodeSMT, dischargeObligation.runSolver) break it down correctly, but the parent key name suggests only file I/O. Consider renaming to something like solverTotal or dischargeTotal.


This is intentional: once we switch to FFI, there won't be any file writing :)

Contributor

@MikaelMayer MikaelMayer left a comment

🔍 LGTM — all comments addressed. The TimingInfo.add helper, the Inhabited comment, and the @[inline] removal from recordNanos all look good. The solverFileWriting naming nit was minor and I'm fine leaving it as-is since the sub-timings break it down correctly.

Contributor

@MikaelMayer MikaelMayer left a comment

🔍 LGTM — all comments addressed. The TimingInfo.add helper is used in all 4 spots, @[inline] removed from recordNanos, and the Inhabited comment clarifies intent.

Comment thread Strata/Util/Profile.lean
Comment on lines +19 to +29
/-- Measure the wall-clock nanoseconds taken by a pure expression.
The `@[noinline]` prevents the compiler from hoisting `expr ()` out of
this function. The `IO.Ref.set` forces evaluation of `expr ()` between
the two timestamp reads (IO actions are sequenced). -/
@[noinline] public def measureNanos [Inhabited α] (expr : Unit → α) : BaseIO (α × Nat) := do
let ref ← IO.mkRef (default : α)
let t1 ← IO.monoNanosNow
ref.set (expr ())
let t2 ← IO.monoNanosNow
let result ← ref.get
pure (result, t2 - t1)
Contributor

The docstring could be more precise about what evaluation is actually forced. IO.Ref.set stores whatever value is passed to it; Lean's evaluation strategy will reduce expr () to WHNF before the set call runs, but anything lazy inside the returned value (e.g., unevaluated thunks in a List, an Array of unforced elements, a lazy closure field) will not be touched by this function. For simple types like Nat or small ADTs in the leaf position this is fine, but callers measuring e.g. the cost of building a large structure need to either ensure the structure is strict or arrange deep evaluation themselves. Worth a one-line caveat:

Suggested change
/-- Measure the wall-clock nanoseconds taken by a pure expression.
The `@[noinline]` prevents the compiler from hoisting `expr ()` out of
this function. The `IO.Ref.set` forces evaluation of `expr ()` between
the two timestamp reads (IO actions are sequenced). -/
@[noinline] public def measureNanos [Inhabited α] (expr : Unit → α) : BaseIO (α × Nat) := do
let ref ← IO.mkRef (default : α)
let t1 ← IO.monoNanosNow
ref.set (expr ())
let t2 ← IO.monoNanosNow
let result ← ref.get
pure (result, t2 - t1)
/-- Measure the wall-clock nanoseconds taken by a pure expression.
The `@[noinline]` prevents the compiler from hoisting `expr ()` out of
this function. The `IO.Ref.set` forces evaluation of `expr ()` between
the two timestamp reads (IO actions are sequenced).
Note: only WHNF of `expr ()` is forced. Lazy sub-structures of the returned
value are not evaluated and their cost is not counted. If you need to time
deep evaluation, make the returned value strict (or force it yourself before
the second timestamp read). -/
@[noinline] public def measureNanos [Inhabited α] (expr : Unit → α) : BaseIO (α × Nat) := do
let ref ← IO.mkRef (default : α)
let t1 ← IO.monoNanosNow
ref.set (expr ())
let t2 ← IO.monoNanosNow
let result ← ref.get
pure (result, t2 - t1)

Comment thread Strata/Util/Profile.lean
Comment on lines +31 to +37
/-- Run an action and record its elapsed nanoseconds into a `TimingInfo` under the given key. -/
public def recordNanos {m α} [Monad m] [MonadLiftT BaseIO m]
(key : String) (timing : TimingInfo) (action : m α) : m (α × TimingInfo) := do
let t0 ← IO.monoNanosNow
let result ← action
let t1 ← IO.monoNanosNow
pure (result, timing.insert key (t1 - t0))
Contributor

Two concerns here, both about the contract:

  1. Reordering safety comment still owed. @MikaelMayer asked in the previous round why this one doesn't need @[noinline] while measureNanos does. The annotation was removed but the explanation wasn't added. A short comment would help future readers understand the asymmetry — the answer, I believe, is that because action : m α is a monadic bind, the IO primitives it wraps are sequenced by >>=, so reordering cannot move IO.monoNanosNow across action. But that's a non-obvious property of the compiler.

  2. Latent pure-thunk hazard, symmetric with the measureNanos bug. If a caller writes

    recordNanos "foo" t (pure expensivePureComputation)

    then expensivePureComputation is a pure thunk inside pure; the bind sequences the IO part but doesn't force the thunk. The recorded time will be near-zero even though the actual work hasn't happened yet. All current call sites are fine (they wrap IO-heavy work), but this is exactly the trap this PR exists to close elsewhere, so documenting it — or, better, forcing via IO.Ref with [Inhabited α] like measureNanos does — would prevent future regressions.

Minor: recordNanos uses timing.insert (overwrite), but the parallel merge loop in verifySingleEnv uses TimingInfo.add (accumulate). If the same key ever gets measured twice in the same timing scope (currently not the case, but easy to introduce), recordNanos would silently throw away the earlier reading. Prefer TimingInfo.add here too; for a first-write .getD k 0 = 0, so semantics are preserved, and repeated-key usage becomes safe.

Suggested change
/-- Run an action and record its elapsed nanoseconds into a `TimingInfo` under the given key. -/
public def recordNanos {m α} [Monad m] [MonadLiftT BaseIO m]
(key : String) (timing : TimingInfo) (action : m α) : m (α × TimingInfo) := do
let t0 ← IO.monoNanosNow
let result ← action
let t1 ← IO.monoNanosNow
pure (result, timing.insert key (t1 - t0))
/-- Run an action and record its elapsed nanoseconds into a `TimingInfo` under
the given key. The measurement is accumulated via `TimingInfo.add`, so
repeated calls with the same key sum their durations.
Unlike `measureNanos`, this does not need `@[noinline]`: `action` is a
monadic computation, so its IO primitives are sequenced by `>>=` and cannot
be reordered across `IO.monoNanosNow`.
Caveat: if the value returned by `action` is a lazy pure value (e.g.
`pure (expensivePureComputation)`), the thunk is not forced here, and the
recorded time will reflect only the IO sequencing, not the pure work. For
timing pure computations, use `measureNanos` (which forces via `IO.Ref`). -/
public def recordNanos {m α} [Monad m] [MonadLiftT BaseIO m]
(key : String) (timing : TimingInfo) (action : m α) : m (α × TimingInfo) := do
let t0 ← IO.monoNanosNow
let result ← action
let t1 ← IO.monoNanosNow
pure (result, timing.add key (t1 - t0))

Comment thread Strata/Util/Profile.lean
Comment on lines +13 to +15
/-- Accumulate nanoseconds into an existing key (defaulting to 0). -/
@[inline] public def TimingInfo.add (t : TimingInfo) (key : String) (ns : Nat) : TimingInfo :=
t.insert key (t.getD key 0 + ns)
Contributor

Two adjacent requests:

(a) Add a mergeAdd helper. The codebase now has two places merging sub-timings into a parent TimingInfo, using different shapes:

  • Strata/DL/Imperative/SMTUtils.lean:336: encTiming.fold (init := timing) fun acc k v => acc.insert k v (overwrites on collision)
  • Strata/Languages/Core/Verifier.lean:1160: for (key, val) in timing do timingNs := timingNs.add key val (accumulates on collision)

These two behaviours are not equivalent (the first loses information when a key repeats across obligations). The Verifier.lean version is the semantically correct one for an accumulator spanning multiple calls. A shared helper would make the intent explicit and eliminate the asymmetry:

/-- Merge `b` into `a` by summing values for overlapping keys. -/
@[inline] public def TimingInfo.mergeAdd (a b : TimingInfo) : TimingInfo :=
  b.fold (init := a) fun acc k v => acc.add k v

(b) Pin down add's semantics with a couple of tiny theorems. Since TimingInfo.add is load-bearing for every per-phase total reported to the user, it's worth stating — and, ideally, proving — its key lemmas. Rough shape (names and exact Std.HashMap lemma references will need adjustment):

@[simp] theorem TimingInfo.add_getD_self (t : TimingInfo) (k : String) (v : Nat) :
    (t.add k v).getD k 0 = t.getD k 0 + v := by
  simp [TimingInfo.add, Std.HashMap.getD_insert_self]

@[simp] theorem TimingInfo.add_getD_of_ne (t : TimingInfo) (k k' : String) (v : Nat)
    (h : k ≠ k') : (t.add k v).getD k' 0 = t.getD k' 0 := by
  simp [TimingInfo.add, Std.HashMap.getD_insert_of_ne h]

theorem TimingInfo.mergeAdd_getD (a b : TimingInfo) (k : String) :
    (TimingInfo.mergeAdd a b).getD k 0 = a.getD k 0 + b.getD k 0 := by
  -- induction on the fold; uses the two lemmas above
  sorry

Even just the first two @[simp] lemmas would catch future typos in a rewrite of add and are a natural place to put #guard-style sanity tests in StrataTest/Util/.

Comment thread Strata/Util/Profile.lean
ref.set (expr ())
let t2 ← IO.monoNanosNow
let result ← ref.get
pure (result, t2 - t1)
Contributor

Missing regression test for the PR's headline fix. The whole point of this PR is that the previous naïve pattern silently reported 0 ms for heavyweight pure work. Nothing in StrataTest/ would catch if a future edit to measureNanos (e.g. dropping the @[noinline] or the IO.Ref indirection) reintroduced the bug.

Proposed test in a new StrataTest/Util/Profile.lean:

import Strata.Util.Profile

/-- Regression: `measureNanos` must actually observe time for CPU-bound pure
    work. If this reports 0 ns, the `@[noinline]` / `IO.Ref.set` trick has
    regressed and timing will silently be lost again. -/
def measureNanos_nonzero : BaseIO Bool := do
  let (_, ns) ← measureNanos fun () =>
    -- Large enough to comfortably exceed monoNanosNow resolution on any host.
    (List.range 200000).foldl (· + ·) 0
  pure (ns > 0)

/-- `#guard` an `IO Bool` by running it. -/
#eval show IO Unit from do
  unless (← measureNanos_nonzero.toIO) do
    throw <| IO.userError "measureNanos regressed to 0 ns"

And a small guard-level test for TimingInfo.add:

#guard
  let t := ({} : TimingInfo).add "a" 5 |>.add "b" 7 |>.add "a" 3
  t.getD "a" 0 == 8 && t.getD "b" 0 == 7 && t.getD "missing" 0 == 0

recordNanos (toString Timing.encodeSMT) timing do
encodeSMT.run solver
-- Merge encodeCore's internal timing into ours
let timing := encTiming.fold (init := timing) fun acc k v => acc.insert k v
Contributor

If the TimingInfo.mergeAdd helper I suggested on Profile.lean lands, this becomes:

Suggested change
let timing := encTiming.fold (init := timing) fun acc k v => acc.insert k v
-- Merge encodeCore's internal timing into ours
let timing := TimingInfo.mergeAdd timing encTiming

Semantic note: using insert here (overwrite) happens to be OK today because each encodeCore key is set exactly once per call and the outer timing doesn't yet contain any encodeCore.* key. But it's the wrong operation for a generic merge — it silently loses the parent value on any key collision — and mergeAdd would stay correct if the invariant ever changes.

| preprocessObligations
| smtEncoding
| solverFileWriting

Contributor

Revisiting the earlier thread on this name: I understand the forward-looking argument (post-FFI there is no file writing), but the current code makes this key cover a t4…t5 window around getObligationResult, which encompasses SMT encoding and solver execution and file I/O. A reader of the profile output today sees Solver/file writing: 1605ms next to runSolver: 1313ms and reasonably wonders how file writing alone could take ~300ms.

Concrete suggestion: pick a name that is already accurate today and stays accurate post-FFI, e.g. dischargeTotal or obligationTotal. The print label can be changed independently later. This is a small rename; doesn't need to block the PR, but leaving it makes the output confusing in the meantime.

Comment on lines +1158 to +1161
timingNs := timingNs.add (toString Verifier.Timing.solverFileWriting) (t5 - t4)

for (key, val) in timing do
timingNs := timingNs.add key val
Contributor

With a TimingInfo.mergeAdd helper (see comment on Profile.lean), this becomes a one-liner and matches the pattern used in dischargeObligation:

Suggested change
timingNs := timingNs.add (toString Verifier.Timing.solverFileWriting) (t5 - t4)
for (key, val) in timing do
timingNs := timingNs.add key val
timingNs := timingNs.add (toString Verifier.Timing.solverFileWriting) (t5 - t4)
timingNs := TimingInfo.mergeAdd timingNs timing

Comment on lines +1180 to 1192
let _ ← (IO.println s!"[profile] Preprocess obligations: {nsToMs (timingNs.getD (toString Verifier.Timing.preprocessObligations) 0)}ms" |>.toBaseIO)
let _ ← (IO.println s!"[profile] SMT encoding: {nsToMs (timingNs.getD (toString Verifier.Timing.smtEncoding) 0)}ms" |>.toBaseIO)
-- Print encodeCore.* sub-timings
let encodeCoreEntries := timingNs.toList.filter (·.1.startsWith "encodeCore.")
|>.mergeSort (fun a b => a.1 < b.1)
for (key, val) in encodeCoreEntries do
let suffix := key.drop "encodeCore.".length
let _ ← (IO.println s!"[profile] {suffix}: {nsToMs val}ms" |>.toBaseIO)
-- Print dischargeObligation-level timings (encodeCore total, runSolver)
let _ ← (IO.println s!"[profile] encodeCore: {nsToMs (timingNs.getD (toString Imperative.SMT.Timing.encodeSMT) 0)}ms" |>.toBaseIO)
let _ ← (IO.println s!"[profile] runSolver: {nsToMs (timingNs.getD (toString Imperative.SMT.Timing.runSolver) 0)}ms" |>.toBaseIO)
let _ ← (IO.println s!"[profile] Solver/file writing: {nsToMs (timingNs.getD (toString Verifier.Timing.solverFileWriting) 0)}ms" |>.toBaseIO)
let _ ← (IO.println s!"[profile] Obligations: {obligations.size} total, {peResolvedCount} resolved by evaluator" |>.toBaseIO)
Contributor

The printed hierarchy is confusing. On HeapReasoning.core.st this produces:

[profile]     SMT encoding: 300ms
[profile]         defineFunTerms: 152ms
[profile]         emitDatatypes: 0ms
[profile]         encodeAssumptions: 2ms
...
[profile]       encodeCore: 267ms
[profile]       runSolver: 1313ms
[profile]     Solver/file writing: 1605ms

The encodeCore.* sub-entries appear under — and indented deeper than — SMT encoding, but they actually belong to encodeCore (which belongs to Solver/file writing). SMT encoding (= ProofObligation.toSMTTerms, building Term objects in memory) is a sibling, not a parent, of encodeCore (= emitting SMT-LIB from those Term objects). The current order is also children-before-parent, which is unusual.

Suggested reordering, parent-first with children indented consistently beneath:

Suggested change
let _ ← (IO.println s!"[profile] Preprocess obligations: {nsToMs (timingNs.getD (toString Verifier.Timing.preprocessObligations) 0)}ms" |>.toBaseIO)
let _ ← (IO.println s!"[profile] SMT encoding: {nsToMs (timingNs.getD (toString Verifier.Timing.smtEncoding) 0)}ms" |>.toBaseIO)
-- Print encodeCore.* sub-timings
let encodeCoreEntries := timingNs.toList.filter (·.1.startsWith "encodeCore.")
|>.mergeSort (fun a b => a.1 < b.1)
for (key, val) in encodeCoreEntries do
let suffix := key.drop "encodeCore.".length
let _ ← (IO.println s!"[profile] {suffix}: {nsToMs val}ms" |>.toBaseIO)
-- Print dischargeObligation-level timings (encodeCore total, runSolver)
let _ ← (IO.println s!"[profile] encodeCore: {nsToMs (timingNs.getD (toString Imperative.SMT.Timing.encodeSMT) 0)}ms" |>.toBaseIO)
let _ ← (IO.println s!"[profile] runSolver: {nsToMs (timingNs.getD (toString Imperative.SMT.Timing.runSolver) 0)}ms" |>.toBaseIO)
let _ ← (IO.println s!"[profile] Solver/file writing: {nsToMs (timingNs.getD (toString Verifier.Timing.solverFileWriting) 0)}ms" |>.toBaseIO)
let _ ← (IO.println s!"[profile] Obligations: {obligations.size} total, {peResolvedCount} resolved by evaluator" |>.toBaseIO)
if profile then
let _ ← (IO.println s!"[profile] Preprocess obligations: {nsToMs (timingNs.getD (toString Verifier.Timing.preprocessObligations) 0)}ms" |>.toBaseIO)
let _ ← (IO.println s!"[profile] SMT term construction: {nsToMs (timingNs.getD (toString Verifier.Timing.smtEncoding) 0)}ms" |>.toBaseIO)
let _ ← (IO.println s!"[profile] Solver/file writing: {nsToMs (timingNs.getD (toString Verifier.Timing.solverFileWriting) 0)}ms" |>.toBaseIO)
let _ ← (IO.println s!"[profile] encodeCore: {nsToMs (timingNs.getD (toString Imperative.SMT.Timing.encodeSMT) 0)}ms" |>.toBaseIO)
let encodeCorePrefix := "encodeCore."
let encodeCoreEntries := timingNs.toList.filter (·.1.startsWith encodeCorePrefix)
|>.mergeSort (fun a b => a.1 < b.1)
for (key, val) in encodeCoreEntries do
let suffix := key.drop encodeCorePrefix.length
let _ ← (IO.println s!"[profile] {suffix}: {nsToMs val}ms" |>.toBaseIO)
let _ ← (IO.println s!"[profile] runSolver: {nsToMs (timingNs.getD (toString Imperative.SMT.Timing.runSolver) 0)}ms" |>.toBaseIO)
let _ ← (IO.println s!"[profile] Obligations: {obligations.size} total, {peResolvedCount} resolved by evaluator" |>.toBaseIO)
return (results, stats)

I also renamed the user-visible label from SMT encoding to SMT term construction to disambiguate it from encodeCore (both phases are forms of "SMT encoding"). And I lifted "encodeCore." into a local to avoid the literal being duplicated on two lines.

let _ ← (IO.println s!"[profile] Preprocess obligations: {nsToMs (timingNs.getD (toString Verifier.Timing.preprocessObligations) 0)}ms" |>.toBaseIO)
let _ ← (IO.println s!"[profile] SMT encoding: {nsToMs (timingNs.getD (toString Verifier.Timing.smtEncoding) 0)}ms" |>.toBaseIO)
-- Print encodeCore.* sub-timings
let encodeCoreEntries := timingNs.toList.filter (·.1.startsWith "encodeCore.")
Contributor

The string "encodeCore." is also the prefix argument to #derive_prefixed_toString Timing "encodeCore" at line 47. If the enum prefix is ever renamed (say, to smtEncode), the profiler silently drops its sub-timings and no one notices until someone eyeballs the output. Either:

  • factor out a def Timing.prefix : String := "encodeCore." next to the inductive Timing, or
  • derive the prefix from a representative constructor's toString at runtime.

The #derive_prefixed_toString elaborator could also emit such a constant alongside the ToString instance — worth considering if any other Stats enum in the codebase develops the same coupling.
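A minimal sketch of the first option, using a hypothetical `Timing.keyPrefix` name:

```lean
/-- Single source of truth for the profile-key prefix; keep in sync with
    `#derive_prefixed_toString Timing "encodeCore"`. -/
def Timing.keyPrefix : String := "encodeCore."
```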

inductive Timing where
| encodeSMT
| runSolver

Contributor

Naming nit: this local Timing enum lives in the same file-local scope as the other SMTUtils helpers and shares a plain name with Strata.SMT.Encoder.Timing (at Verifier.lean:34) and Strata.Languages.Core.Verifier.Timing (at Verifier.lean:1053). All three are disambiguated by namespace and by their #derive_prefixed_toString prefix, so there's no correctness issue, but callers end up writing Imperative.SMT.Timing.encodeSMT / Verifier.Timing.smtEncoding / bare Timing.prelude depending on context, which is noisy. If you want to collapse the noise, a single module Strata.Util.ProfileKeys collecting all profile enums (or at least more specific per-use names like DischargeTiming / EncodeCoreTiming / VerifyTiming) would help.
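A hypothetical sketch of such a module; the per-use names follow the suggestion above, and the constructors are copied from this PR's snippets:

```lean
-- Hypothetical Strata/Util/ProfileKeys.lean collecting the profile enums
-- under more specific names.
namespace Strata.Util.ProfileKeys

inductive DischargeTiming where
  | encodeSMT
  | runSolver

inductive VerifyTiming where
  | preprocessObligations
  | smtEncoding
  | solverFileWriting

end Strata.Util.ProfileKeys
```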
