Name	Name	Last commit message	Last commit date
parent directory ..
search	search
README.md	README.md
perf_deepseekv2_layer4_ep4_pp2.py	perf_deepseekv2_layer4_ep4_pp2.py
perf_deepseekv2_layer4_ep4_pp2_full_recompute.py	perf_deepseekv2_layer4_ep4_pp2_full_recompute.py
perf_deepseekv2_layer4_ep4_pp2_selective_recompute.py	perf_deepseekv2_layer4_ep4_pp2_selective_recompute.py
perf_deepseekv2_layer4_ep8_pp1.py	perf_deepseekv2_layer4_ep8_pp1.py
perf_llama3_70b_layer12_tp1_pp2.py	perf_llama3_70b_layer12_tp1_pp2.py
perf_llama3_70b_layer12_tp2.py	perf_llama3_70b_layer12_tp2.py
perf_llama3_70b_layer12_tp2_full_recompute.py	perf_llama3_70b_layer12_tp2_full_recompute.py
perf_llama3_70b_layer12_tp2_selective_recompute.py	perf_llama3_70b_layer12_tp2_selective_recompute.py
perf_llama3_70b_layer12_tp4.py	perf_llama3_70b_layer12_tp4.py
perf_llama3_70b_layer12_tp8.py	perf_llama3_70b_layer12_tp8.py
perf_llama3_8b_tp1_pp2.py	perf_llama3_8b_tp1_pp2.py
perf_llama3_8b_tp2.py	perf_llama3_8b_tp2.py
perf_llama3_8b_tp4.py	perf_llama3_8b_tp4.py
perf_llama3_8b_tp8.py	perf_llama3_8b_tp8.py
run_all.sh	run_all.sh
search_strategy_llama3_8b.py	search_strategy_llama3_8b.py
show_simu_avaliable_modes.py	show_simu_avaliable_modes.py
simulator_trace_snapshot.py	simulator_trace_snapshot.py

Name

Last commit message

Last commit date

perf_deepseekv2_layer4_ep4_pp2.py

perf_deepseekv2_layer4_ep4_pp2_full_recompute.py

perf_deepseekv2_layer4_ep4_pp2_selective_recompute.py

perf_deepseekv2_layer4_ep8_pp1.py

perf_llama3_70b_layer12_tp1_pp2.py

perf_llama3_70b_layer12_tp2.py

perf_llama3_70b_layer12_tp2_full_recompute.py

perf_llama3_70b_layer12_tp2_selective_recompute.py

perf_llama3_70b_layer12_tp4.py

perf_llama3_70b_layer12_tp8.py

perf_llama3_8b_tp1_pp2.py

perf_llama3_8b_tp2.py

perf_llama3_8b_tp4.py

perf_llama3_8b_tp8.py

run_all.sh

search_strategy_llama3_8b.py

show_simu_avaliable_modes.py

simulator_trace_snapshot.py

SimuMax Examples

This directory contains examples that use the shipped configs under configs/.

For the basic PerfLLM and simulate() API flow, see docs/tutorial.md. This page only records the runnable example entrypoints and the simulator artifact fields that users usually parse.

Perf Examples

Run a standard perf example from this directory:

python perf_llama3_8b_tp1_pp2.py

The perf examples call run_estimate() and write analytical cost and memory outputs such as compute_result.json and mem_result.json.

Simulator Trace And Snapshot Example

Run the small simulator example:

python simulator_trace_snapshot.py

The example uses llama2-tiny, trims it to two layers for speed, runs run_estimate(), then calls simulate() and prints a short summary of the exported artifacts. Pass --output DIR to choose a different artifact directory.

Trace

tracing_logs.json is always exported. It is a Chrome trace style event list that can be opened by Perfetto or Chrome tracing.

Common parsing fields:

ph == "X": duration event.
ts, dur: start timestamp and duration in microseconds.
pid: simulated rank, for example rank0.
tid: display lane, such as forward compute, backward compute, comm, wait, or pipeline detail.
name: operation name.
cat: broad category when available.
args.call_stack: model/module context.
args.base_name, args.direction, args.stream_type: useful grouping keys.

Minimal trace parser:

events = payload.get("traceEvents", payload) if isinstance(payload, dict) else payload
slices = [event for event in events if event.get("ph") == "X"]
duration_us = max(event["ts"] + event.get("dur", 0.0) for event in slices)
comm_slices = [
    event for event in slices
    if event.get("cat") == "comm" or event.get("args", {}).get("stream_type") == "comm"
]

Snapshot

The current simulator memory timeline exports memory artifacts for pp_size == 1 or for the sync pipeline memory path (pp_comm_async == false). This example uses pp_size == 1, so it writes:

simu_memory_result.json: compact static and peak allocated memory by rank.
simu_memory_snapshot.json: detailed memory timeline.
simu_memory_viz_snapshot.pickle: memory-visualization adapter.

simu_memory_snapshot.json currently uses schema simumax_memory_snapshot_v1.

Common snapshot fields:

rank, ts_us: simulated rank and timestamp.
static_bytes: long-lived model memory.
cached_bytes: live activation cache retained for backward.
temp_bytes: op-local temporary peak memory.
allocated_bytes: static_bytes + cached_bytes + temp_bytes.
phase, op_name: lifecycle phase and associated operation.
microbatch, chunk: data microbatch and VPP chunk metadata when present.
cache_tokens: activation-cache lifetime records.

Minimal snapshot parser:

events = snapshot["events"]
peak = max(events, key=lambda event: event["allocated_bytes"])
alloc_tokens = [
    token for token in snapshot.get("cache_tokens", [])
    if token.get("action") == "alloc"
]

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

SimuMax Examples

Perf Examples

Simulator Trace And Snapshot Example

Trace

Snapshot

FilesExpand file tree

examples

Directory actions

More options

Directory actions

More options

Latest commit

History

examples

Folders and files

parent directory

README.md

SimuMax Examples

Perf Examples

Simulator Trace And Snapshot Example

Trace

Snapshot