tools/mllm-llm-benchmark: add llama benchmark template #617

Open
huangzhenhua111 wants to merge 2 commits into UbiquitousLearning:main from huangzhenhua111:fix/llm-benchmark-mv

Conversation

@huangzhenhua111

@huangzhenhua111 huangzhenhua111 commented Jan 30, 2026

  • What

    • Add a Llama_Benchmark template for mllm-llm-benchmark.
    • Register llama/tinyllama in models/All.hpp so -n tiny_llama works.
  • Why

    • Current mllm-llm-benchmark registry only supports Qwen3_W4A32_KAI. This enables running the benchmark with the existing llama example config and TinyLlama V1 weights.
  • Notes

    • This template loads .mllm as ModelFileVersion::kV1 to match TinyLlama example weights (loading as V2 asserts on magic mismatch). V2 support can be added in a follow-up PR (CLI flag or file version probe; a possible probe shape is sketched after the test commands below).
  • Scope (change scope):

    • Only adds Llama.hpp and registers it in the benchmark registry; no changes to the core runtime.
  • Compatibility:

    • Does not affect the existing Qwen3 benchmark; LLaMA is only enabled when -n llama / -n tinyllama is specified.
  • How to test

    ninja -C build -v mllm-llm-benchmark
    ./build/bin/mllm-llm-benchmark \
      -n tiny_llama \
      -m /home/huangzhenhua/models/mllm_tinyllama/tinyllama-fp32.mllm \
      -c examples/llama/config_tiny_llama.json \
      -pp 8 -tg 4 -t 4 -cl 2048
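
Regarding the Notes above on kV1 vs. kV2: a follow-up file-version probe could look roughly like the sketch below. This is a hypothetical illustration rather than mllm's actual loader API; the magic constant, function name, and local enum are placeholders.

    // Hypothetical sketch: read the leading magic bytes of a .mllm file and
    // pick a ModelFileVersion. kV2Magic below is a placeholder, not the real value.
    #include <cstdint>
    #include <fstream>
    #include <string>

    enum class ModelFileVersion { kV1, kV2 };

    inline ModelFileVersion probeModelFileVersion(const std::string& path) {
      std::ifstream f(path, std::ios::binary);
      uint32_t magic = 0;
      f.read(reinterpret_cast<char*>(&magic), sizeof(magic));
      constexpr uint32_t kV2Magic = 0x4D4C4C4D;  // placeholder magic for illustration
      return (magic == kV2Magic) ? ModelFileVersion::kV2 : ModelFileVersion::kV1;
    }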

Summary by CodeRabbit

  • New Features

    • Added benchmarking support for LLaMA and TinyLLaMA variants with streaming generation metrics (time-to-first-token, prefill and decode speeds).
  • Bug Fixes

    • Safer model-name handling to avoid character-processing issues and make model selection more reliable.
    • Retained fallback when no model match is found.


@coderabbitai
Contributor

coderabbitai bot commented Jan 30, 2026

📝 Walkthrough


Adds a new Llama_Benchmark implementation and updates the benchmark factory: reorders includes, makes createBenchmark inline, uses a safe unsigned-char cast for std::tolower, refines model-name checks (including a llama branch), and falls back to nullptr when no match is found.

Changes

Cohort / File(s) Summary
LLaMA Benchmark Implementation
tools/mllm-llm-benchmark/models/Llama.hpp
Adds Llama_Benchmark implementing BenchmarkTemplate with init(), printModelInfo(), warmup(), clear(), and run() that performs streaming generation, measures prefill/decode timings, and returns BenchmarkTemplateResult.
Factory & Includes Update
tools/mllm-llm-benchmark/models/All.hpp
Added <string> and <cctype> includes, reordered includes, changed createBenchmark to inline std::shared_ptr<BenchmarkTemplate> createBenchmark(const std::string&), cast to unsigned char before std::tolower, tightened model-name checks (Qwen3) and added a branch returning Llama_Benchmark for llama/tinyllama/tiny_llama; fallback remains nullptr.
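
Put together, the updated factory could look roughly like the sketch below; it is reconstructed from this summary (the class names Qwen3_W4A32_KAI_Benchmark, Llama_Benchmark, and BenchmarkTemplate are assumed from the PR text), not verbatim PR code.

    #include <algorithm>
    #include <cctype>
    #include <memory>
    #include <string>

    inline std::shared_ptr<BenchmarkTemplate> createBenchmark(const std::string& model_name) {
      std::string normalized = model_name;
      // Cast to unsigned char so std::tolower is well-defined for negative char values.
      std::transform(normalized.begin(), normalized.end(), normalized.begin(),
                     [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
      if (normalized.find("qwen3") != std::string::npos) return std::make_shared<Qwen3_W4A32_KAI_Benchmark>();
      // "llama" also matches "tinyllama" and "tiny_llama".
      if (normalized.find("llama") != std::string::npos) return std::make_shared<Llama_Benchmark>();
      return nullptr;  // fallback when nothing matches
    }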

Sequence Diagram

sequenceDiagram
    participant Caller as Caller
    participant Factory as createBenchmark
    participant Bench as Llama_Benchmark
    participant Model as LlamaForCausalLM
    participant Result as BenchmarkResult

    Caller->>Factory: createBenchmark(model_name)
    Factory-->>Caller: returns Bench or nullptr

    Caller->>Bench: init(cfg_path, model_path, cache_length)
    Bench->>Model: construct & load model
    Model-->>Bench: model loaded

    Caller->>Bench: warmup()
    Bench->>Model: run short generation
    Model-->>Bench: warmup done

    Caller->>Bench: run(prefill_tokens, generate_tokens)
    Bench->>Model: start streaming generation (callbacks)
    Model-->>Bench: stream tokens & timing events
    Bench->>Result: compute ttft, prefill_speed, decode_speed
    Result-->>Caller: return metrics
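
As caller-side code, the flow above might read as follows; this is a hedged sketch using the method names from the summaries and the parameter values from the PR's test command, not actual tool source.

    auto bench = createBenchmark("tiny_llama");   // factory from All.hpp
    if (!bench) return 1;                         // unknown model name
    bench->init(cfg_path, model_path, /*cache_length=*/2048);
    bench->printModelInfo();
    bench->warmup();                              // short untimed generation
    auto r = bench->run(/*prefill_tokens=*/8, /*generate_tokens=*/4);
    // r is a BenchmarkTemplateResult carrying ttft, prefill_speed, decode_speed.
    bench->clear();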

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Poem

🐇 I hopped through headers, neat and spry,
I taught a llama to answer "why",
I timed the prefill, then timed the stream,
Small paws tapping on a benchmarking scheme,
Hop—new models sprint beneath the sky!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: adding a llama benchmark template to the mllm-llm-benchmark tool.
Description check ✅ Passed The description is comprehensive with well-organized sections covering What, Why, Notes, Scope, Compatibility, and testing instructions, exceeding the basic template requirements.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

🤖 Fix all issues with AI agents
In `@mllm/backends/cpu/CPUBackend.cpp`:
- Around line 86-90: The CPUPagedAttnOpFactory inclusion is conditionally
guarded by architecture macros but not by the MLLM_BUILD_ARM_BACKEND build flag,
which causes missing symbols when PagedAttn is excluded; update the preprocessor
guard around CPUPagedAttnOpFactory to require MLLM_BUILD_ARM_BACKEND (e.g., #if
defined(MLLM_BUILD_ARM_BACKEND) && (defined(__aarch64__) || defined(__arm__) ||
defined(__ANDROID__))) so the factory is only compiled when the backend is
enabled, and remove trailing whitespace at the ends of the
CPUMeanOpFactory/CPUPagedAttnOpFactory/CPURadixAttnOpFactory lines.

In `@mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp`:
- Around line 108-147: The subnormal handling in mllm_fp16_bits_to_fp32 uses an
unsigned exp (uint32_t) and decrements it in the normalization loop which can
underflow; change exp to a signed type (e.g., int32_t or int) before the while
loop (and use a separate unsigned mant as currently) so decrementing works
correctly, update subsequent uses (exp32 calculation and any casts) to use the
adjusted signed exp value when computing exp32, and ensure the normalization
loop condition and final exp-to-float conversion produce the correct FP32
exponent for subnormals.

In `@mllm/mllm.hpp`:
- Around line 385-387: The early return inside the conditional #if
defined(__MLLM_SIGNAL_ANDROID) && defined(MLLM_DISABLE_ANDROID_STACKTRACE)
removes signal handler installation on Android; remove that return so the signal
handlers (SIGSEGV/SIGABRT) are still registered even when
MLLM_DISABLE_ANDROID_STACKTRACE is defined, and rely on existing guards around
stacktrace generation instead of skipping handler setup; ensure the conditional
only suppresses stacktrace-related calls and not the registration logic in the
function that installs the handlers.

In `@mllm/utils/Argparse.hpp`:
- Around line 199-209: The help-flag early exit (loop over inst.args_, checking
param->flags() for "-h"/"--help" and param->isSet(), then calling printHelp()
and return) occurs after required positional validation so -h/--help still
errors; move that help check to run before the positional required checks (or
short-circuit those checks when a help flag is set) by relocating the loop that
inspects param->flags() and param->isSet()—and calls printHelp()—to execute
prior to the positional validation block (or add a guard that skips positional
validation when any param->isSet() for flags containing "-h" or "--help").

In `@tasks/build_android_nostack.yaml`:
- Around line 6-16: The nostack variant currently doesn't disable Android
stacktraces; update the cmake_extra_args block (the cmake_extra_args list in
this file) to explicitly disable stacktrace by adding the flag
"-DMLLM_DISABLE_ANDROID_STACKTRACE=ON" (or, if you prefer C/CXX define style,
add it to the quoted CPU_BACKEND_COMPILE_OPTIONS), so the build matches the
“nostack” intent; alternatively rename the variant if you intend to keep
stacktrace enabled.

In `@tools/bench/run_suite.sh`:
- Around line 21-30: The warmup loop assumes PROMPTS has at least one entry and
uses a grep pattern that isn't fully portable; update the mapfile invocation to
use grep -v '^[[:space:]]*$' to handle all whitespace portably when populating
PROMPTS from PROMPTS_FILE, then add an explicit check after mapfile (e.g., test
"${#PROMPTS[@]}" -gt 0) and if the array is empty either exit with a clear error
or skip the warmup and main run, and ensure the warmup loop that calls
tools/bench/run_once.sh "${PROMPTS[0]}" "$OUT_DIR" "${TAG}_warmup" only runs
when PROMPTS is non-empty.

In `@tools/mllm-llm-benchmark/models/All.hpp`:
- Around line 5-17: The inline lambda tolower inside createBenchmark is unsafe
and missing the <cctype> header; add #include <cctype> and change the call to
::tolower to cast characters to unsigned char (e.g.,
::tolower(static_cast<unsigned char>(ch))) so negative char values are handled
correctly; update the lambda in createBenchmark accordingly to use the
unsigned-char cast when transforming result.
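
For reference, the safe transform could look like this minimal sketch (result is the lowercased copy assumed by the prompt above); plain ::tolower on a char holding a negative value is undefined behavior, hence the cast:

    std::transform(result.begin(), result.end(), result.begin(),
                   [](unsigned char ch) { return static_cast<char>(::tolower(ch)); });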

In `@tools/mllm-llm-benchmark/models/Llama.hpp`:
- Around line 103-121: The decode-speed calculation currently includes the first
token even though decode_start is set when the first token arrives; adjust the
calculation by computing decode_token_count = std::max(0, token_count - 1) (or
subtract 1 with a non-negative clamp) and use decode_token_count when computing
r.decode_speed (and any related logic), keeping decode_us defined from
decode_end - decode_start; update references to token_count in the
r.decode_speed expression (the variables to change are token_count,
decode_start, decode_end, and r.decode_speed in the block around
model_->streamGenerate).
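
A concrete shape for that fix might be the fragment below; variable names follow the comment above and may not match Llama.hpp exactly.

    // The first streamed token's latency is TTFT (prefill), so exclude that token
    // from the decode-rate calculation; clamp to avoid a negative count.
    int decode_token_count = std::max(0, token_count - 1);
    double decode_us = static_cast<double>(decode_end - decode_start);
    r.decode_speed = decode_us > 0.0 ? decode_token_count / (decode_us / 1e6) : 0.0;  // tokens/s
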
🧹 Nitpick comments (11)
tasks/build_android_noomp.yaml (1)

1-18: Configuration looks reasonable for Android no-threading build.

The CMake configuration correctly sets up cross-compilation for Android arm64-v8a with appropriate architecture flags. A few observations:

  1. Hardcoded install prefix: Line 12 uses /root/mllm-install-android-arm64-v8a which assumes root access and won't work in non-root CI environments or developer machines. Consider using a relative path or a configurable variable.

  2. Naming vs. behavior: The filename build_android_noomp.yaml suggests "no OpenMP," but lines 13-15 disable all threading options (MLLM_KERNEL_USE_THREADS, MLLM_KERNEL_THREADS_VENDOR_OPENMP, and MLLM_KERNEL_USE_THREADS_VENDOR_MLLM). If the intent is to disable only OpenMP while keeping other threading, the configuration should be adjusted. If disabling all threading is intentional, consider renaming to build_android_nothreads.yaml for clarity.

  3. Minor: Missing trailing newline at end of file.

💡 Suggested improvements
         - "-DMLLM_KERNEL_USE_THREADS=OFF"
         - "-DMLLM_KERNEL_THREADS_VENDOR_OPENMP=OFF"
         - "-DMLLM_KERNEL_USE_THREADS_VENDOR_MLLM=OFF"
-
+        
   - CMakeBuildTask:
       cmake_cfg_path: "build-android-arm64-v8a"
+

For the install prefix, consider:

-        - "-DCMAKE_INSTALL_PREFIX=/root/mllm-install-android-arm64-v8a"
+        - "-DCMAKE_INSTALL_PREFIX=$MLLM_INSTALL_PREFIX"

Or use a relative path like ./install-android-arm64-v8a.

out/bench/tinyllama_fp32_warmup_20260128_202414_3416.time (1)

1-3: Consider excluding generated benchmark outputs from the repo.

These look like run artifacts; if they aren’t meant to be versioned, add a .gitignore rule and keep them as CI artifacts instead.

out/bench/tinyllama_fp32_20260128_202643_1914.time (1)

1-3: Consider excluding generated benchmark outputs from the repo.

If these are run artifacts, prefer .gitignore + CI artifact storage to avoid churn.

out/bench/tinyllama_fp32_20260128_202737_11645.time (1)

1-3: Consider excluding generated benchmark outputs from the repo.

If these are run artifacts, prefer .gitignore + CI artifact storage to avoid churn.

out/bench/tinyllama_fp32_20260128_202718_10725.time (1)

1-3: Consider excluding generated benchmark outputs from the repo.

If these are run artifacts, prefer .gitignore + CI artifact storage to avoid churn.

out/bench/tinyllama_fp32_20260128_202600_18024.time (1)

1-3: Consider excluding generated benchmark outputs from the repo.

If these are run artifacts, prefer .gitignore + CI artifact storage to avoid churn.

out/bench/tinyllama_fp32_20260128_191820_5461.time (1)

1-3: Consider excluding generated benchmark outputs from the repo.

If these are run artifacts, prefer .gitignore + CI artifact storage to avoid churn.

out/bench/tinyllama_fp32_20260128_202513_23515.time (1)

1-3: Consider excluding generated benchmark outputs from the repo.

If these are run artifacts, prefer .gitignore + CI artifact storage to avoid churn.

out/bench/tinyllama_fp32_20260128_202744_26783.time (1)

1-3: Consider excluding generated benchmark artifacts from version control.
If these .time files are run outputs, it may be better to keep them out of the repo (e.g., via .gitignore) unless you intentionally want sample results checked in.

tools/bench/parse.py (1)

57-66: Rename variable l to log_info to fix Ruff E741 and improve clarity.

The variable name l is flagged by Ruff E741 as ambiguous. Renaming to log_info resolves the lint error and enhances readability.

♻️ Proposed diff
-        l = parse_log(str(log_path))
+        log_info = parse_log(str(log_path))
@@
-            **l
+            **log_info
tools/mllm-llm-benchmark/models/Llama.hpp (1)

78-81: Consider exposing a KV-cache reset hook.
If the model retains KV cache across generations, warmup may affect subsequent benchmark runs. I can help wire a reset once the API exists.
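
If such a hook lands, one possible shape (purely illustrative; this API does not exist yet):

    struct BenchmarkTemplate {
      // ... existing init()/warmup()/run()/clear() ...
      virtual void resetKVCache() = 0;  // drop cached K/V between warmup and timed runs
    };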

Comment on lines 86 to 90
               CPUMeanOpFactory, CPUKVCacheOpFactory,
#if defined(__aarch64__) || defined(__arm__) || defined(__ANDROID__)
               CPUPagedAttnOpFactory,
#endif
               CPUScatter2ShardsOpFactory, CPURadixAttnOpFactory,

⚠️ Potential issue | 🔴 Critical

Align the PagedAttn guard with the build option, and remove trailing whitespace.
__ANDROID__ is true on Android x86, but PagedAttnOp.cpp is excluded when MLLM_BUILD_ARM_BACKEND is OFF, which can yield missing symbols at link time. Also, Lines 86 and 89 appear to end with trailing whitespace.

🔧 Proposed fix
-               CPUMeanOpFactory, CPUKVCacheOpFactory, 
-               #if defined(__aarch64__) || defined(__arm__) || defined(__ANDROID__)
+               CPUMeanOpFactory, CPUKVCacheOpFactory,
+               #if defined(MLLM_BUILD_ARM_BACKEND) || defined(__aarch64__) || defined(__arm__)
                CPUPagedAttnOpFactory,
-               #endif 
+               #endif

As per coding guidelines, no line may end with trailing whitespace.

📝 Committable suggestion


Suggested change
CPUMeanOpFactory, CPUKVCacheOpFactory,
#if defined(__aarch64__) || defined(__arm__) || defined(__ANDROID__)
CPUPagedAttnOpFactory,
#endif
CPUScatter2ShardsOpFactory, CPURadixAttnOpFactory,
CPUMeanOpFactory, CPUKVCacheOpFactory,
#if defined(MLLM_BUILD_ARM_BACKEND) || defined(__aarch64__) || defined(__arm__)
CPUPagedAttnOpFactory,
#endif
CPUScatter2ShardsOpFactory, CPURadixAttnOpFactory,
🤖 Prompt for AI Agents
In `@mllm/backends/cpu/CPUBackend.cpp` around lines 86 - 90, The
CPUPagedAttnOpFactory inclusion is conditionally guarded by architecture macros
but not by the MLLM_BUILD_ARM_BACKEND build flag, which causes missing symbols
when PagedAttn is excluded; update the preprocessor guard around
CPUPagedAttnOpFactory to require MLLM_BUILD_ARM_BACKEND (e.g., #if
defined(MLLM_BUILD_ARM_BACKEND) && (defined(__aarch64__) || defined(__arm__) ||
defined(__ANDROID__))) so the factory is only compiled when the backend is
enabled, and remove trailing whitespace at the ends of the
CPUMeanOpFactory/CPUPagedAttnOpFactory/CPURadixAttnOpFactory lines.

Comment on lines 108 to 147
// fp16 bits -> fp32 fallback (no F16C)
static inline float mllm_fp16_bits_to_fp32(uint16_t h) {
  uint32_t sign = (uint32_t)(h & 0x8000) << 16;
  uint32_t exp  = (h >> 10) & 0x1F;
  uint32_t mant = h & 0x03FF;

  uint32_t f;
  if (exp == 0) {
    if (mant == 0) {
      f = sign; // zero
    } else {
      // subnormal
      exp = 1;
      while ((mant & 0x0400) == 0) { mant <<= 1; exp--; }
      mant &= 0x03FF;
      uint32_t exp32 = (exp + (127 - 15)) << 23;
      uint32_t mant32 = mant << 13;
      f = sign | exp32 | mant32;
    }
  } else if (exp == 31) {
    // inf / nan
    uint32_t exp32 = 0xFFu << 23;
    uint32_t mant32 = mant << 13;
    f = sign | exp32 | mant32;
  } else {
    uint32_t exp32 = (exp + (127 - 15)) << 23;
    uint32_t mant32 = mant << 13;
    f = sign | exp32 | mant32;
  }

  float out;
  __builtin_memcpy(&out, &f, sizeof(out));
  return out;
}

#if defined(__F16C__) && (defined(__x86_64__) || defined(_M_X64))
#define MLLM_COMPUTE_FP16_TO_FP32(x) _cvtsh_ss((uint16_t)(x))
#else
#define MLLM_COMPUTE_FP16_TO_FP32(x) mllm_fp16_bits_to_fp32((uint16_t)(x))
#endif

⚠️ Potential issue | 🟠 Major

Fix subnormal handling: unsigned exponent underflows.
exp is uint32_t and is decremented in the subnormal normalization loop. For subnormals requiring multiple shifts, this underflows and yields incorrect FP32 results.

🔧 Proposed fix
 static inline float mllm_fp16_bits_to_fp32(uint16_t h) {
   uint32_t sign = (uint32_t)(h & 0x8000) << 16;
-  uint32_t exp  = (h >> 10) & 0x1F;
+  int32_t  exp  = (h >> 10) & 0x1F;
   uint32_t mant = h & 0x03FF;
@@
-      uint32_t exp32 = (exp + (127 - 15)) << 23;
+      uint32_t exp32 = (uint32_t)(exp + (127 - 15)) << 23;
       uint32_t mant32 = mant << 13;
       f = sign | exp32 | mant32;
📝 Committable suggestion


Suggested change
// fp16 bits -> fp32 fallback (no F16C)
static inline float mllm_fp16_bits_to_fp32(uint16_t h) {
  uint32_t sign = (uint32_t)(h & 0x8000) << 16;
  uint32_t exp = (h >> 10) & 0x1F;
  uint32_t mant = h & 0x03FF;
  uint32_t f;
  if (exp == 0) {
    if (mant == 0) {
      f = sign; // zero
    } else {
      // subnormal
      exp = 1;
      while ((mant & 0x0400) == 0) { mant <<= 1; exp--; }
      mant &= 0x03FF;
      uint32_t exp32 = (exp + (127 - 15)) << 23;
      uint32_t mant32 = mant << 13;
      f = sign | exp32 | mant32;
    }
  } else if (exp == 31) {
    // inf / nan
    uint32_t exp32 = 0xFFu << 23;
    uint32_t mant32 = mant << 13;
    f = sign | exp32 | mant32;
  } else {
    uint32_t exp32 = (exp + (127 - 15)) << 23;
    uint32_t mant32 = mant << 13;
    f = sign | exp32 | mant32;
  }
  float out;
  __builtin_memcpy(&out, &f, sizeof(out));
  return out;
}
#if defined(__F16C__) && (defined(__x86_64__) || defined(_M_X64))
#define MLLM_COMPUTE_FP16_TO_FP32(x) _cvtsh_ss((uint16_t)(x))
#else
#define MLLM_COMPUTE_FP16_TO_FP32(x) mllm_fp16_bits_to_fp32((uint16_t)(x))
#endif

// fp16 bits -> fp32 fallback (no F16C)
static inline float mllm_fp16_bits_to_fp32(uint16_t h) {
  uint32_t sign = (uint32_t)(h & 0x8000) << 16;
  int32_t exp = (h >> 10) & 0x1F;
  uint32_t mant = h & 0x03FF;
  uint32_t f;
  if (exp == 0) {
    if (mant == 0) {
      f = sign; // zero
    } else {
      // subnormal
      exp = 1;
      while ((mant & 0x0400) == 0) { mant <<= 1; exp--; }
      mant &= 0x03FF;
      uint32_t exp32 = (uint32_t)(exp + (127 - 15)) << 23;
      uint32_t mant32 = mant << 13;
      f = sign | exp32 | mant32;
    }
  } else if (exp == 31) {
    // inf / nan
    uint32_t exp32 = 0xFFu << 23;
    uint32_t mant32 = mant << 13;
    f = sign | exp32 | mant32;
  } else {
    uint32_t exp32 = (exp + (127 - 15)) << 23;
    uint32_t mant32 = mant << 13;
    f = sign | exp32 | mant32;
  }
  float out;
  __builtin_memcpy(&out, &f, sizeof(out));
  return out;
}
#if defined(__F16C__) && (defined(__x86_64__) || defined(_M_X64))
#define MLLM_COMPUTE_FP16_TO_FP32(x) _cvtsh_ss((uint16_t)(x))
#else
#define MLLM_COMPUTE_FP16_TO_FP32(x) mllm_fp16_bits_to_fp32((uint16_t)(x))
#endif
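
If it helps while verifying the fix, a standalone sanity check of the subnormal path could look like this; it assumes the fixed function above is in scope, and the expected values follow from the fp16 format.

    #include <cassert>
    #include <cmath>
    #include <cstdint>

    int main() {
      // 0x0001: smallest fp16 subnormal = 2^-24 (needs ten normalization shifts).
      assert(mllm_fp16_bits_to_fp32(0x0001) == std::ldexp(1.0f, -24));
      // 0x03FF: largest fp16 subnormal = 1023 * 2^-24.
      assert(mllm_fp16_bits_to_fp32(0x03FF) == 1023.0f * std::ldexp(1.0f, -24));
      // 0x3C00: 1.0 through the normal path.
      assert(mllm_fp16_bits_to_fp32(0x3C00) == 1.0f);
      return 0;
    }
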
🤖 Prompt for AI Agents
In `@mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp` around lines 108
- 147, The subnormal handling in mllm_fp16_bits_to_fp32 uses an unsigned exp
(uint32_t) and decrements it in the normalization loop which can underflow;
change exp to a signed type (e.g., int32_t or int) before the while loop (and
use a separate unsigned mant as currently) so decrementing works correctly,
update subsequent uses (exp32 calculation and any casts) to use the adjusted
signed exp value when computing exp32, and ensure the normalization loop
condition and final exp-to-float conversion produce the correct FP32 exponent
for subnormals.

mllm/mllm.hpp Outdated
Comment on lines 385 to 387
#if defined(__MLLM_SIGNAL_ANDROID) && defined(MLLM_DISABLE_ANDROID_STACKTRACE)
  return;
#endif

⚠️ Potential issue | 🟠 Major

Don’t skip signal handler registration when stacktrace is disabled.

Returning early on Android removes all signal handling, so SIGSEGV/SIGABRT won’t trigger the existing shutdown path. The macro name implies only stacktrace suppression, and you already gate stacktrace calls elsewhere, so handlers can remain installed.

🛠️ Proposed fix
 inline void __setup_signal_handler() {
-#if defined(__MLLM_SIGNAL_ANDROID) && defined(MLLM_DISABLE_ANDROID_STACKTRACE)
-  return;
-#endif
 #if defined(__MLLM_SIGNAL_WINDOWS)
   signal(SIGINT, __signal_handler);
📝 Committable suggestion


Suggested change
#if defined(__MLLM_SIGNAL_ANDROID) && defined(MLLM_DISABLE_ANDROID_STACKTRACE)
  return;
#endif
inline void __setup_signal_handler() {
#if defined(__MLLM_SIGNAL_WINDOWS)
  signal(SIGINT, __signal_handler);
🤖 Prompt for AI Agents
In `@mllm/mllm.hpp` around lines 385 - 387, The early return inside the
conditional #if defined(__MLLM_SIGNAL_ANDROID) &&
defined(MLLM_DISABLE_ANDROID_STACKTRACE) removes signal handler installation on
Android; remove that return so the signal handlers (SIGSEGV/SIGABRT) are still
registered even when MLLM_DISABLE_ANDROID_STACKTRACE is defined, and rely on
existing guards around stacktrace generation instead of skipping handler setup;
ensure the conditional only suppresses stacktrace-related calls and not the
registration logic in the function that installs the handlers.

Comment on lines 199 to 209
    // Early exit on help: skip required checks
    for (auto& param : inst.args_) {
      const auto& flags = param->flags();
      const bool is_help =
          std::find(flags.begin(), flags.end(), "-h") != flags.end() ||
          std::find(flags.begin(), flags.end(), "--help") != flags.end();
      if (is_help && param->isSet()) {
        printHelp();
        return;
      }
    }

⚠️ Potential issue | 🟠 Major

Help flag still errors when required positional args are missing

The early-exit check runs after positional required validation (Line 184-193), so -h/--help still fails if required positionals are absent. Move the help check before positional validation (or bypass those checks when help is set).

✅ Suggested fix (reorder help check before positional validation)
-    // Then parse positional arguments
-    size_t pos_idx = 0;
-    for (auto& param : inst.args_) {
-      if (param->isPositional()) {
-        if (pos_idx >= positional_args.size()) {
-          if (param->isRequired()) {
-            printHelp();
-            MLLM_ERROR_EXIT(ExitCode::kCoreError, "Missing positional argument");
-          }
-          continue;
-        }
-        param->parse(positional_args[pos_idx++]);
-      }
-    }
-
-    // Early exit on help: skip required checks
-    for (auto& param : inst.args_) {
+    // Early exit on help: skip required checks (including positional)
+    bool help_requested = false;
+    for (auto& param : inst.args_) {
       const auto& flags = param->flags();
       const bool is_help =
           std::find(flags.begin(), flags.end(), "-h") != flags.end() ||
           std::find(flags.begin(), flags.end(), "--help") != flags.end();
       if (is_help && param->isSet()) {
-        printHelp();
-        return;
+        help_requested = true;
+        break;
       }
     }
+    if (help_requested) {
+      printHelp();
+      return;
+    }
+
+    // Then parse positional arguments
+    size_t pos_idx = 0;
+    for (auto& param : inst.args_) {
+      if (param->isPositional()) {
+        if (pos_idx >= positional_args.size()) {
+          if (param->isRequired()) {
+            printHelp();
+            MLLM_ERROR_EXIT(ExitCode::kCoreError, "Missing positional argument");
+          }
+          continue;
+        }
+        param->parse(positional_args[pos_idx++]);
+      }
+    }
🤖 Prompt for AI Agents
In `@mllm/utils/Argparse.hpp` around lines 199 - 209, The help-flag early exit
(loop over inst.args_, checking param->flags() for "-h"/"--help" and
param->isSet(), then calling printHelp() and return) occurs after required
positional validation so -h/--help still errors; move that help check to run
before the positional required checks (or short-circuit those checks when a help
flag is set) by relocating the loop that inspects param->flags() and
param->isSet()—and calls printHelp()—to execute prior to the positional
validation block (or add a guard that skips positional validation when any
param->isSet() for flags containing "-h" or "--help").

Comment on lines 6 to 16
      cmake_extra_args:
        - "-DMLLM_CROSS_COMPILE=ON"
        - "-DMLLM_BUILD_ARM_BACKEND=ON"
        - "-DANDROID_PLATFORM=android-28"
        - "-DANDROID_ABI=arm64-v8a"
        - '-DMLLM_CPU_BACKEND_COMPILE_OPTIONS="-march=armv8.2-a+fp16+fp16fml+dotprod+i8mm;-ffast-math;-Wno-nan-infinity-disabled"'
        - "-DCMAKE_INSTALL_PREFIX=/root/mllm-install-android-arm64-v8a"
        - "-DMLLM_KERNEL_USE_THREADS=ON"
        - "-DMLLM_KERNEL_THREADS_VENDOR_OPENMP=ON"
        - "-DMLLM_KERNEL_USE_THREADS_VENDOR_MLLM=OFF"


⚠️ Potential issue | 🟠 Major

Add stacktrace-disable flags to match the “nostack” intent.

This variant doesn’t set MLLM_DISABLE_ANDROID_STACKTRACE, so it still builds with stacktrace enabled. If “nostack” means stacktrace off (as the no-OMP nostack config does), please add the C/CXX flags here (or rename the variant if stacktrace is intended).

🔧 Proposed fix
       cmake_extra_args:
         - "-DMLLM_CROSS_COMPILE=ON"
         - "-DMLLM_BUILD_ARM_BACKEND=ON"
         - "-DANDROID_PLATFORM=android-28"
         - "-DANDROID_ABI=arm64-v8a"
         - '-DMLLM_CPU_BACKEND_COMPILE_OPTIONS="-march=armv8.2-a+fp16+fp16fml+dotprod+i8mm;-ffast-math;-Wno-nan-infinity-disabled"'
         - "-DCMAKE_INSTALL_PREFIX=/root/mllm-install-android-arm64-v8a"
         - "-DMLLM_KERNEL_USE_THREADS=ON"
         - "-DMLLM_KERNEL_THREADS_VENDOR_OPENMP=ON"
         - "-DMLLM_KERNEL_USE_THREADS_VENDOR_MLLM=OFF"
+        - '-DCMAKE_C_FLAGS=-DMLLM_DISABLE_ANDROID_STACKTRACE=1'
+        - '-DCMAKE_CXX_FLAGS=-DMLLM_DISABLE_ANDROID_STACKTRACE=1'
📝 Committable suggestion


Suggested change
      cmake_extra_args:
        - "-DMLLM_CROSS_COMPILE=ON"
        - "-DMLLM_BUILD_ARM_BACKEND=ON"
        - "-DANDROID_PLATFORM=android-28"
        - "-DANDROID_ABI=arm64-v8a"
        - '-DMLLM_CPU_BACKEND_COMPILE_OPTIONS="-march=armv8.2-a+fp16+fp16fml+dotprod+i8mm;-ffast-math;-Wno-nan-infinity-disabled"'
        - "-DCMAKE_INSTALL_PREFIX=/root/mllm-install-android-arm64-v8a"
        - "-DMLLM_KERNEL_USE_THREADS=ON"
        - "-DMLLM_KERNEL_THREADS_VENDOR_OPENMP=ON"
        - "-DMLLM_KERNEL_USE_THREADS_VENDOR_MLLM=OFF"

      cmake_extra_args:
        - "-DMLLM_CROSS_COMPILE=ON"
        - "-DMLLM_BUILD_ARM_BACKEND=ON"
        - "-DANDROID_PLATFORM=android-28"
        - "-DANDROID_ABI=arm64-v8a"
        - '-DMLLM_CPU_BACKEND_COMPILE_OPTIONS="-march=armv8.2-a+fp16+fp16fml+dotprod+i8mm;-ffast-math;-Wno-nan-infinity-disabled"'
        - "-DCMAKE_INSTALL_PREFIX=/root/mllm-install-android-arm64-v8a"
        - "-DMLLM_KERNEL_USE_THREADS=ON"
        - "-DMLLM_KERNEL_THREADS_VENDOR_OPENMP=ON"
        - "-DMLLM_KERNEL_USE_THREADS_VENDOR_MLLM=OFF"
        - '-DCMAKE_C_FLAGS=-DMLLM_DISABLE_ANDROID_STACKTRACE=1'
        - '-DCMAKE_CXX_FLAGS=-DMLLM_DISABLE_ANDROID_STACKTRACE=1'
🤖 Prompt for AI Agents
In `@tasks/build_android_nostack.yaml` around lines 6 - 16, The nostack variant
currently doesn't disable Android stacktraces; update the cmake_extra_args block
(the cmake_extra_args list in this file) to explicitly disable stacktrace by
adding the flag "-DMLLM_DISABLE_ANDROID_STACKTRACE=ON" (or, if you prefer C/CXX
define style, add it to the quoted CPU_BACKEND_COMPILE_OPTIONS), so the build
matches the “nostack” intent; alternatively rename the variant if you intend to
keep stacktrace enabled.

Comment on lines 21 to 30
mapfile -t PROMPTS < <(grep -v '^\s*$' "$PROMPTS_FILE")

echo "Prompts: ${#PROMPTS[@]}, warmup=$WARMUP, repeat=$REPEAT"
mkdir -p "$OUT_DIR"

# warmup: run only the first prompt WARMUP times; excluded from stats (still written to disk for troubleshooting)
for ((i=1;i<=WARMUP;i++)); do
  echo "== Warmup $i/$WARMUP =="
  tools/bench/run_once.sh "${PROMPTS[0]}" "$OUT_DIR" "${TAG}_warmup"
done

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n tools/bench/run_suite.sh

Repository: UbiquitousLearning/mllm

Length of output: 1324


🏁 Script executed:

#!/bin/bash
# Test grep patterns for blank line filtering

echo "=== Testing grep patterns ==="

# Create test file with blank lines and whitespace
cat > /tmp/test_prompts.txt << 'EOF'
prompt1

prompt2
   
prompt3
EOF

echo "--- Original pattern (grep -v '^\s*$') ---"
grep -v '^\s*$' /tmp/test_prompts.txt 2>&1 || echo "[Pattern 1 failed - exit code: $?]"

echo ""
echo "--- With -P flag (grep -vP '^\s*$') ---"
grep -vP '^\s*$' /tmp/test_prompts.txt 2>&1 || echo "[Pattern 2 failed - exit code: $?]"

echo ""
echo "--- POSIX character class (grep -v '^[[:space:]]*$') ---"
grep -v '^[[:space:]]*$' /tmp/test_prompts.txt 2>&1 || echo "[Pattern 3 failed - exit code: $?]"

echo ""
echo "--- With -E flag (grep -vE '^\s*$') ---"
grep -vE '^\s*$' /tmp/test_prompts.txt 2>&1 || echo "[Pattern 4 failed - exit code: $?]"

Repository: UbiquitousLearning/mllm

Length of output: 375


🏁 Script executed:

# Test empty array behavior in bash
bash << 'BASH_EOF'
echo "=== Testing array bounds behavior ==="

# Simulate empty PROMPTS array
declare -a PROMPTS=()
echo "Empty array length: ${#PROMPTS[@]}"

# Try accessing [0]
if [[ -z "${PROMPTS[0]}" ]]; then
  echo "PROMPTS[0] is empty/unset"
fi

# Test with set -e behavior
bash -c '
set -e
declare -a PROMPTS=()
echo "Array size: ${#PROMPTS[@]}"
if [[ ${#PROMPTS[@]} -gt 0 ]]; then
  echo "Running with: ${PROMPTS[0]}"
else
  echo "Array is empty - accessing [0] would be unsafe"
fi
'
BASH_EOF

Repository: UbiquitousLearning/mllm

Length of output: 213


Fix unsafe array access and improve grep portability.
The script doesn't validate that the prompts array is non-empty before accessing PROMPTS[0] on line 29. If the prompts file is blank or contains only whitespace, grep -v '^\s*$' will filter all content, leaving an empty array; subsequent access to ${PROMPTS[0]} will pass an empty string to run_once.sh, likely causing it to fail. While the current grep pattern works in practice, using [[:space:]] improves portability across grep versions. Add an explicit empty-array check before the warmup loop.

Suggested fix
-mapfile -t PROMPTS < <(grep -v '^\s*$' "$PROMPTS_FILE")
+mapfile -t PROMPTS < <(grep -v '^[[:space:]]*$' "$PROMPTS_FILE")
+
+if (( ${#PROMPTS[@]} == 0 )); then
+  echo "ERROR: prompts file is empty or contains only whitespace: $PROMPTS_FILE"
+  exit 1
+fi
📝 Committable suggestion


Suggested change
mapfile -t PROMPTS < <(grep -v '^\s*$' "$PROMPTS_FILE")
echo "Prompts: ${#PROMPTS[@]}, warmup=$WARMUP, repeat=$REPEAT"
mkdir -p "$OUT_DIR"
# warmup: run only the first prompt WARMUP times; excluded from stats (still written to disk for troubleshooting)
for ((i=1;i<=WARMUP;i++)); do
  echo "== Warmup $i/$WARMUP =="
  tools/bench/run_once.sh "${PROMPTS[0]}" "$OUT_DIR" "${TAG}_warmup"
done

mapfile -t PROMPTS < <(grep -v '^[[:space:]]*$' "$PROMPTS_FILE")
if (( ${#PROMPTS[@]} == 0 )); then
  echo "ERROR: prompts file is empty or contains only whitespace: $PROMPTS_FILE"
  exit 1
fi
echo "Prompts: ${#PROMPTS[@]}, warmup=$WARMUP, repeat=$REPEAT"
mkdir -p "$OUT_DIR"
# warmup: run only the first prompt WARMUP times; excluded from stats (still written to disk for troubleshooting)
for ((i=1;i<=WARMUP;i++)); do
  echo "== Warmup $i/$WARMUP =="
  tools/bench/run_once.sh "${PROMPTS[0]}" "$OUT_DIR" "${TAG}_warmup"
done
🤖 Prompt for AI Agents
In `@tools/bench/run_suite.sh` around lines 21 - 30, The warmup loop assumes
PROMPTS has at least one entry and uses a grep pattern that isn't fully
portable; update the mapfile invocation to use grep -v '^[[:space:]]*$' to
handle all whitespace portably when populating PROMPTS from PROMPTS_FILE, then
add an explicit check after mapfile (e.g., test "${#PROMPTS[@]}" -gt 0) and if
the array is empty either exit with a clear error or skip the warmup and main
run, and ensure the warmup loop that calls tools/bench/run_once.sh
"${PROMPTS[0]}" "$OUT_DIR" "${TAG}_warmup" only runs when PROMPTS is non-empty.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tools/mllm-llm-benchmark/models/Llama.hpp`:
- Line 48: The mllm::print call in Llama.hpp divides cfg_->hidden_size by
cfg_->num_attention_heads without checking for zero; change it to compute
head_dim only after checking cfg_->num_attention_heads (e.g., if
cfg_->num_attention_heads == 0 set head_dim to 0 or print "N/A"), then pass that
safe value to mllm::print so cfg_->hidden_size and cfg_->num_attention_heads are
referenced but never used in an actual divide-by-zero operation.
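
One way the guarded computation might look (sketch only; the mllm::print usage here is illustrative):

    const int head_dim = (cfg_->num_attention_heads != 0)
                             ? cfg_->hidden_size / cfg_->num_attention_heads
                             : 0;  // guard against divide-by-zero on a malformed config
    mllm::print("head_dim:", head_dim != 0 ? std::to_string(head_dim) : std::string("N/A"));
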
🧹 Nitpick comments (1)
tools/mllm-llm-benchmark/models/All.hpp (1)

28-32: Redundant substring checks.

Since both "tinyllama" and "tiny_llama" contain the substring "llama", the first condition find("llama") != npos will already match them. The additional checks are redundant.

♻️ Simplified version
-  if (normalized_model_name.find("llama") != std::string::npos ||
-      normalized_model_name.find("tinyllama") != std::string::npos ||
-      normalized_model_name.find("tiny_llama") != std::string::npos) {
+  if (normalized_model_name.find("llama") != std::string::npos) {
     return std::make_shared<Llama_Benchmark>();
   }

@huangzhenhua111
Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Jan 30, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@huangzhenhua111
Author

@coderabbitai help

@coderabbitai
Contributor

coderabbitai bot commented Jan 30, 2026

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

CodeRabbit commands

These commands are invoked using PR/Issue comments.

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai evaluate custom pre-merge check --instructions <custom-checks-instructions> --name <custom-checks-title> [--mode <error|warning>] to validate the custom pre-merge checks instructions. Defaults to error when --mode is omitted.
    • @coderabbitai ignore pre-merge checks to override pre-merge checks and get an approval on PR.
    • @coderabbitai run pre-merge checks to run pre-merge checks on the pull request.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai generate configuration to create a PR that adds the current resolved configuration as .coderabbit.yaml (or show it if already present).
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit configuration file (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, support, documentation and community

  • Visit our status page to check the current availability of CodeRabbit.
  • Create a ticket on our support page for assistance with any issues or questions.
  • Visit our documentation site for detailed information on how to use CodeRabbit.
  • Join our Discord community to connect with other users and get help from the community.
  • Follow us on X/Twitter for updates and announcements.

@huangzhenhua111
Author

@coderabbitai full review

I pushed updates in commit d95911a. Please re-review the entire PR from scratch and regenerate the review comments based on the latest changes.

@coderabbitai
Contributor

coderabbitai bot commented Jan 30, 2026

@huangzhenhua111: I'll perform a full review of the PR from scratch based on the latest changes in commit d95911a. This will regenerate all review comments for the entire PR.

✅ Actions performed

Full review triggered.

@huangzhenhua111
Author

@coderabbitai ignore pre-merge checks

@coderabbitai
Contributor

coderabbitai bot commented Jan 30, 2026

✅ Actions performed

Pre-merge checks override command executed successfully.

@huangzhenhua111
Author

@coderabbitai run pre-merge checks

@coderabbitai
Contributor

coderabbitai bot commented Jan 30, 2026

Important

Pre-merge checks are overridden. No further action needed.

@huangzhenhua111
Author

@coderabbitai full review
@coderabbitai run pre-merge checks

@coderabbitai
Contributor

coderabbitai bot commented Jan 30, 2026

Important

Pre-merge checks are overridden. No further action needed.

@huangzhenhua111
Author

@chenghuaWang @yirongjie @oreomaker CI checks are still "Checks 0" for this fork PR. Could a maintainer please approve workflows so CI can run? Thanks!

@chenghuaWang
Collaborator

chenghuaWang commented Jan 30, 2026

Hello, @huangzhenhua111

Thanks for your contribution!

CodeRabbit is not related to C++ CI. The CI does not run on the ./tools/ directory; it only runs on ./mllm/. We'll take a look at this PR. Which platform did you check this benchmark on?

@huangzhenhua111
Author

huangzhenhua111 commented Jan 30, 2026

Hello, @huangzhenhua111

Thanks for your contribution!

CodeRabbit is not related to C++ CI. The CI does not run on the ./tools/ directory; it only runs on ./mllm/. We'll take a look at this PR. Which platform did you check this benchmark on?

Sorry, I accidentally closed the PR earlier — it’s reopened now. Platform details are in my next comment.

@huangzhenhua111
Author

Hello, @huangzhenhua111

Thanks for your contribution!

CodeRabbit is not related to C++ CI. The CI does not run on the ./tools/ directory; it only runs on ./mllm/. We'll take a look at this PR. Which platform did you check this benchmark on?

Hi Chenghua,

Thanks for taking a look!

I verified this benchmark on x86_64 Linux under WSL (Ubuntu) on CPU (no GPU/NPU). I built it with CMake + Ninja and ran build/bin/mllm-llm-benchmark using the TinyLlama FP32 .mllm (ModelFileVersion V1) and examples/llama/config_tiny_llama.json, e.g. PP=8, TG=4.

Understood on CodeRabbit/CI scope (tools/ is not covered by CI). I’ll continue to keep the changes isolated to tools and make sure it compiles cleanly.

Best,
huangzhenhua111

@chenghuaWang
Collaborator

@jialilve, can you take a look at this PR?

@chenghuaWang chenghuaWang added the enhancement New feature or request label Jan 30, 2026