tools/mllm-llm-benchmark: add llama benchmark template #617
huangzhenhua111 wants to merge 2 commits into UbiquitousLearning:main from …
Conversation
📝 Walkthrough
Adds a new Llama_Benchmark implementation and updates the benchmark factory: reorders includes and makes createBenchmark recognize llama model names.
Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller as Caller
    participant Factory as createBenchmark
    participant Bench as Llama_Benchmark
    participant Model as LlamaForCausalLM
    participant Result as BenchmarkResult
    Caller->>Factory: createBenchmark(model_name)
    Factory-->>Caller: returns Bench or nullptr
    Caller->>Bench: init(cfg_path, model_path, cache_length)
    Bench->>Model: construct & load model
    Model-->>Bench: model loaded
    Caller->>Bench: warmup()
    Bench->>Model: run short generation
    Model-->>Bench: warmup done
    Caller->>Bench: run(prefill_tokens, generate_tokens)
    Bench->>Model: start streaming generation (callbacks)
    Model-->>Bench: stream tokens & timing events
    Bench->>Result: compute ttft, prefill_speed, decode_speed
    Result-->>Caller: return metrics
```
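To make the flow concrete, here is a minimal sketch of a caller driving this interface. The type and method names (createBenchmark, init, warmup, run, BenchmarkResult) are taken from the diagram; the exact signatures, result fields, and the stub factory are illustrative assumptions, not the PR's actual API.

```cpp
#include <cstdio>
#include <memory>
#include <string>

// Result fields assumed from the diagram's "compute ttft, prefill_speed, decode_speed".
struct BenchmarkResult {
  double ttft_ms = 0.0;        // time to first token
  double prefill_speed = 0.0;  // prefill tokens per second
  double decode_speed = 0.0;   // decode tokens per second
};

// Interface shape assumed from the diagram's call sequence.
struct Benchmark {
  virtual ~Benchmark() = default;
  virtual void init(const std::string& cfg_path, const std::string& model_path,
                    int cache_length) = 0;
  virtual void warmup() = 0;
  virtual BenchmarkResult run(int prefill_tokens, int generate_tokens) = 0;
};

// Illustrative stub; the real factory lives in models/All.hpp.
std::shared_ptr<Benchmark> createBenchmark(const std::string& /*model_name*/) {
  return nullptr;
}

int main() {
  auto bench = createBenchmark("tiny_llama");
  if (!bench) {  // factory returns nullptr for unknown names
    std::fprintf(stderr, "unknown model name\n");
    return 1;
  }
  bench->init("config_tiny_llama.json", "tinyllama.mllm", /*cache_length=*/2048);
  bench->warmup();  // untimed short generation
  BenchmarkResult r = bench->run(/*prefill_tokens=*/8, /*generate_tokens=*/4);  // PP=8, TG=4
  std::printf("ttft=%.2f ms, prefill=%.2f tok/s, decode=%.2f tok/s\n",
              r.ttft_ms, r.prefill_speed, r.decode_speed);
  return 0;
}
```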
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Poem
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 8
🤖 Fix all issues with AI agents
In `@mllm/backends/cpu/CPUBackend.cpp`:
- Around line 86-90: The CPUPagedAttnOpFactory inclusion is conditionally
guarded by architecture macros but not by the MLLM_BUILD_ARM_BACKEND build flag,
which causes missing symbols when PagedAttn is excluded; update the preprocessor
guard around CPUPagedAttnOpFactory to require MLLM_BUILD_ARM_BACKEND (e.g., `#if`
defined(MLLM_BUILD_ARM_BACKEND) && (defined(__aarch64__) || defined(__arm__) ||
defined(__ANDROID__))) so the factory is only compiled when the backend is
enabled, and remove trailing whitespace at the ends of the
CPUMeanOpFactory/CPUPagedAttnOpFactory/CPURadixAttnOpFactory lines.
In `@mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp`:
- Around line 108-147: The subnormal handling in mllm_fp16_bits_to_fp32 uses an
unsigned exp (uint32_t) and decrements it in the normalization loop which can
underflow; change exp to a signed type (e.g., int32_t or int) before the while
loop (and use a separate unsigned mant as currently) so decrementing works
correctly, update subsequent uses (exp32 calculation and any casts) to use the
adjusted signed exp value when computing exp32, and ensure the normalization
loop condition and final exp-to-float conversion produce the correct FP32
exponent for subnormals.
In `@mllm/mllm.hpp`:
- Around line 385-387: The early return inside the conditional `#if`
defined(__MLLM_SIGNAL_ANDROID) && defined(MLLM_DISABLE_ANDROID_STACKTRACE)
removes signal handler installation on Android; remove that return so the signal
handlers (SIGSEGV/SIGABRT) are still registered even when
MLLM_DISABLE_ANDROID_STACKTRACE is defined, and rely on existing guards around
stacktrace generation instead of skipping handler setup; ensure the conditional
only suppresses stacktrace-related calls and not the registration logic in the
function that installs the handlers.
In `@mllm/utils/Argparse.hpp`:
- Around line 199-209: The help-flag early exit (loop over inst.args_, checking
param->flags() for "-h"/"--help" and param->isSet(), then calling printHelp()
and return) occurs after required positional validation so -h/--help still
errors; move that help check to run before the positional required checks (or
short-circuit those checks when a help flag is set) by relocating the loop that
inspects param->flags() and param->isSet()—and calls printHelp()—to execute
prior to the positional validation block (or add a guard that skips positional
validation when any param->isSet() for flags containing "-h" or "--help").
In `@tasks/build_android_nostack.yaml`:
- Around line 6-16: The nostack variant currently doesn't disable Android
stacktraces; update the cmake_extra_args block (the cmake_extra_args list in
this file) to explicitly disable stacktrace by adding the flag
"-DMLLM_DISABLE_ANDROID_STACKTRACE=ON" (or, if you prefer C/CXX define style,
add it to the quoted CPU_BACKEND_COMPILE_OPTIONS), so the build matches the
“nostack” intent; alternatively rename the variant if you intend to keep
stacktrace enabled.
In `@tools/bench/run_suite.sh`:
- Around line 21-30: The warmup loop assumes PROMPTS has at least one entry and
uses a grep pattern that isn't fully portable; update the mapfile invocation to
use grep -v '^[[:space:]]*$' to handle all whitespace portably when populating
PROMPTS from PROMPTS_FILE, then add an explicit check after mapfile (e.g., test
"${`#PROMPTS`[@]}" -gt 0) and if the array is empty either exit with a clear error
or skip the warmup and main run, and ensure the warmup loop that calls
tools/bench/run_once.sh "${PROMPTS[0]}" "$OUT_DIR" "${TAG}_warmup" only runs
when PROMPTS is non-empty.
In `@tools/mllm-llm-benchmark/models/All.hpp`:
- Around line 5-17: The inline lambda tolower inside createBenchmark is unsafe
and missing the <cctype> header; add `#include` <cctype> and change the call to
::tolower to cast characters to unsigned char (e.g.,
::tolower(static_cast<unsigned char>(ch))) so negative char values are handled
correctly; update the lambda in createBenchmark accordingly to use the
unsigned-char cast when transforming result.
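As a concrete illustration of the cast this prompt asks for, here is a minimal sketch; the helper name normalizeName is hypothetical, and only the unsigned-char cast pattern is the point.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper showing the safe pattern: ::tolower has undefined
// behavior for negative char values, so cast through unsigned char first.
std::string normalizeName(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(),
                 [](unsigned char ch) { return static_cast<char>(::tolower(ch)); });
  return s;
}
```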
In `@tools/mllm-llm-benchmark/models/Llama.hpp`:
- Around line 103-121: The decode-speed calculation currently includes the first
token even though decode_start is set when the first token arrives; adjust the
calculation by computing decode_token_count = std::max(0, token_count - 1) (or
subtract 1 with a non-negative clamp) and use decode_token_count when computing
r.decode_speed (and any related logic), keeping decode_us defined from
decode_end - decode_start; update references to token_count in the
r.decode_speed expression (the variables to change are token_count,
decode_start, decode_end, and r.decode_speed in the block around
model_->streamGenerate).
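To make that fix concrete, here is a small sketch of the corrected metric math. The variable names (token_count, decode_start, decode_end) follow the comment above, but the function shape, the BenchmarkResult fields, and the microsecond timestamps are illustrative assumptions.

```cpp
#include <algorithm>
#include <cstdint>

struct BenchmarkResult {
  double ttft_ms = 0.0;
  double prefill_speed = 0.0;  // tokens per second
  double decode_speed = 0.0;   // tokens per second
};

// All timestamps in microseconds; decode_start_us is stamped when the first
// generated token arrives (as described in the comment above).
BenchmarkResult computeMetrics(int64_t prefill_tokens, int64_t token_count,
                               int64_t prefill_start_us, int64_t decode_start_us,
                               int64_t decode_end_us) {
  BenchmarkResult r;
  const int64_t prefill_us = decode_start_us - prefill_start_us;
  r.ttft_ms = prefill_us / 1000.0;
  r.prefill_speed = prefill_us > 0 ? prefill_tokens * 1e6 / double(prefill_us) : 0.0;
  // The first token's latency is already counted in ttft, so exclude it from
  // the decode phase (clamped so 0 or 1 tokens can't go negative).
  const int64_t decode_token_count = std::max<int64_t>(0, token_count - 1);
  const int64_t decode_us = decode_end_us - decode_start_us;
  r.decode_speed = decode_us > 0 ? decode_token_count * 1e6 / double(decode_us) : 0.0;
  return r;
}
```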
🧹 Nitpick comments (11)
tasks/build_android_noomp.yaml (1)
1-18: Configuration looks reasonable for an Android no-threading build. The CMake configuration correctly sets up cross-compilation for Android arm64-v8a with appropriate architecture flags. A few observations:
- Hardcoded install prefix: Line 12 uses /root/mllm-install-android-arm64-v8a, which assumes root access and won't work in non-root CI environments or on developer machines. Consider using a relative path or a configurable variable.
- Naming vs. behavior: The filename build_android_noomp.yaml suggests "no OpenMP," but lines 13-15 disable all threading options (MLLM_KERNEL_USE_THREADS, MLLM_KERNEL_THREADS_VENDOR_OPENMP, and MLLM_KERNEL_USE_THREADS_VENDOR_MLLM). If the intent is to disable only OpenMP while keeping other threading, the configuration should be adjusted. If disabling all threading is intentional, consider renaming to build_android_nothreads.yaml for clarity.
- Minor: Missing trailing newline at end of file.
💡 Suggested improvements
```diff
   - "-DMLLM_KERNEL_USE_THREADS=OFF"
   - "-DMLLM_KERNEL_THREADS_VENDOR_OPENMP=OFF"
   - "-DMLLM_KERNEL_USE_THREADS_VENDOR_MLLM=OFF"
+  - CMakeBuildTask:
+      cmake_cfg_path: "build-android-arm64-v8a"
```
For the install prefix, consider:
```diff
-  - "-DCMAKE_INSTALL_PREFIX=/root/mllm-install-android-arm64-v8a"
+  - "-DCMAKE_INSTALL_PREFIX=$MLLM_INSTALL_PREFIX"
```
Or use a relative path like ./install-android-arm64-v8a.

out/bench/tinyllama_fp32_warmup_20260128_202414_3416.time (1)
1-3: Consider excluding generated benchmark outputs from the repo. These look like run artifacts; if they aren't meant to be versioned, add a .gitignore rule and keep them as CI artifacts instead.
out/bench/tinyllama_fp32_20260128_202643_1914.time (1)
1-3: Consider excluding generated benchmark outputs from the repo. If these are run artifacts, prefer .gitignore + CI artifact storage to avoid churn.
out/bench/tinyllama_fp32_20260128_202737_11645.time (1)
1-3: Consider excluding generated benchmark outputs from the repo. If these are run artifacts, prefer .gitignore + CI artifact storage to avoid churn.
out/bench/tinyllama_fp32_20260128_202718_10725.time (1)
1-3: Consider excluding generated benchmark outputs from the repo. If these are run artifacts, prefer .gitignore + CI artifact storage to avoid churn.
out/bench/tinyllama_fp32_20260128_202600_18024.time (1)
1-3: Consider excluding generated benchmark outputs from the repo. If these are run artifacts, prefer .gitignore + CI artifact storage to avoid churn.
out/bench/tinyllama_fp32_20260128_191820_5461.time (1)
1-3: Consider excluding generated benchmark outputs from the repo. If these are run artifacts, prefer .gitignore + CI artifact storage to avoid churn.
out/bench/tinyllama_fp32_20260128_202513_23515.time (1)
1-3: Consider excluding generated benchmark outputs from the repo. If these are run artifacts, prefer .gitignore + CI artifact storage to avoid churn.
out/bench/tinyllama_fp32_20260128_202744_26783.time (1)
1-3: Consider excluding generated benchmark artifacts from version control. If these .time files are run outputs, it may be better to keep them out of the repo (e.g., via .gitignore) unless you intentionally want sample results checked in.
tools/bench/parse.py (1)
57-66: Rename variable l to log_info to fix Ruff E741 and improve clarity. The variable name l is flagged by Ruff E741 as ambiguous. Renaming it to log_info resolves the lint error and enhances readability.
♻️ Proposed diff
```diff
-    l = parse_log(str(log_path))
+    log_info = parse_log(str(log_path))
@@
-        **l
+        **log_info
```
tools/mllm-llm-benchmark/models/Llama.hpp (1)
78-81: Consider exposing a KV-cache reset hook. If the model retains KV cache across generations, warmup may affect subsequent benchmark runs. I can help wire a reset once the API exists.
mllm/backends/cpu/CPUBackend.cpp
Outdated
```cpp
    CPUMeanOpFactory, CPUKVCacheOpFactory,
#if defined(__aarch64__) || defined(__arm__) || defined(__ANDROID__)
    CPUPagedAttnOpFactory,
#endif
    CPUScatter2ShardsOpFactory, CPURadixAttnOpFactory,
```
Align the PagedAttn guard with the build option, and remove trailing whitespace.
__ANDROID__ is true on Android x86, but PagedAttnOp.cpp is excluded when MLLM_BUILD_ARM_BACKEND is OFF, which can yield missing symbols at link time. Also, Lines 86 and 89 appear to end with trailing whitespace.
🔧 Proposed fix
```diff
-  CPUMeanOpFactory, CPUKVCacheOpFactory, 
-#if defined(__aarch64__) || defined(__arm__) || defined(__ANDROID__)
+  CPUMeanOpFactory, CPUKVCacheOpFactory,
+#if defined(MLLM_BUILD_ARM_BACKEND) || defined(__aarch64__) || defined(__arm__)
   CPUPagedAttnOpFactory,
-#endif 
+#endif
```
As per coding guidelines, no line may end with trailing whitespace.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
    CPUMeanOpFactory, CPUKVCacheOpFactory,
#if defined(MLLM_BUILD_ARM_BACKEND) || defined(__aarch64__) || defined(__arm__)
    CPUPagedAttnOpFactory,
#endif
    CPUScatter2ShardsOpFactory, CPURadixAttnOpFactory,
```
🤖 Prompt for AI Agents
In `@mllm/backends/cpu/CPUBackend.cpp` around lines 86 - 90, The
CPUPagedAttnOpFactory inclusion is conditionally guarded by architecture macros
but not by the MLLM_BUILD_ARM_BACKEND build flag, which causes missing symbols
when PagedAttn is excluded; update the preprocessor guard around
CPUPagedAttnOpFactory to require MLLM_BUILD_ARM_BACKEND (e.g., `#if`
defined(MLLM_BUILD_ARM_BACKEND) && (defined(__aarch64__) || defined(__arm__) ||
defined(__ANDROID__))) so the factory is only compiled when the backend is
enabled, and remove trailing whitespace at the ends of the
CPUMeanOpFactory/CPUPagedAttnOpFactory/CPURadixAttnOpFactory lines.
```cpp
// fp16 bits -> fp32 fallback (no F16C)
static inline float mllm_fp16_bits_to_fp32(uint16_t h) {
  uint32_t sign = (uint32_t)(h & 0x8000) << 16;
  uint32_t exp = (h >> 10) & 0x1F;
  uint32_t mant = h & 0x03FF;

  uint32_t f;
  if (exp == 0) {
    if (mant == 0) {
      f = sign;  // zero
    } else {
      // subnormal
      exp = 1;
      while ((mant & 0x0400) == 0) { mant <<= 1; exp--; }
      mant &= 0x03FF;
      uint32_t exp32 = (exp + (127 - 15)) << 23;
      uint32_t mant32 = mant << 13;
      f = sign | exp32 | mant32;
    }
  } else if (exp == 31) {
    // inf / nan
    uint32_t exp32 = 0xFFu << 23;
    uint32_t mant32 = mant << 13;
    f = sign | exp32 | mant32;
  } else {
    uint32_t exp32 = (exp + (127 - 15)) << 23;
    uint32_t mant32 = mant << 13;
    f = sign | exp32 | mant32;
  }

  float out;
  __builtin_memcpy(&out, &f, sizeof(out));
  return out;
}

#if defined(__F16C__) && (defined(__x86_64__) || defined(_M_X64))
#define MLLM_COMPUTE_FP16_TO_FP32(x) _cvtsh_ss((uint16_t)(x))
#else
#define MLLM_COMPUTE_FP16_TO_FP32(x) mllm_fp16_bits_to_fp32((uint16_t)(x))
#endif
```
Fix subnormal handling: unsigned exponent underflows.
exp is uint32_t and is decremented in the subnormal normalization loop. For subnormals requiring multiple shifts, this underflows and yields incorrect FP32 results.
🔧 Proposed fix
```diff
 static inline float mllm_fp16_bits_to_fp32(uint16_t h) {
   uint32_t sign = (uint32_t)(h & 0x8000) << 16;
-  uint32_t exp = (h >> 10) & 0x1F;
+  int32_t exp = (h >> 10) & 0x1F;
   uint32_t mant = h & 0x03FF;
@@
-      uint32_t exp32 = (exp + (127 - 15)) << 23;
+      uint32_t exp32 = (uint32_t)(exp + (127 - 15)) << 23;
       uint32_t mant32 = mant << 13;
       f = sign | exp32 | mant32;
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
// fp16 bits -> fp32 fallback (no F16C)
static inline float mllm_fp16_bits_to_fp32(uint16_t h) {
  uint32_t sign = (uint32_t)(h & 0x8000) << 16;
  int32_t exp = (h >> 10) & 0x1F;
  uint32_t mant = h & 0x03FF;

  uint32_t f;
  if (exp == 0) {
    if (mant == 0) {
      f = sign;  // zero
    } else {
      // subnormal
      exp = 1;
      while ((mant & 0x0400) == 0) { mant <<= 1; exp--; }
      mant &= 0x03FF;
      uint32_t exp32 = (uint32_t)(exp + (127 - 15)) << 23;
      uint32_t mant32 = mant << 13;
      f = sign | exp32 | mant32;
    }
  } else if (exp == 31) {
    // inf / nan
    uint32_t exp32 = 0xFFu << 23;
    uint32_t mant32 = mant << 13;
    f = sign | exp32 | mant32;
  } else {
    uint32_t exp32 = (exp + (127 - 15)) << 23;
    uint32_t mant32 = mant << 13;
    f = sign | exp32 | mant32;
  }

  float out;
  __builtin_memcpy(&out, &f, sizeof(out));
  return out;
}

#if defined(__F16C__) && (defined(__x86_64__) || defined(_M_X64))
#define MLLM_COMPUTE_FP16_TO_FP32(x) _cvtsh_ss((uint16_t)(x))
#else
#define MLLM_COMPUTE_FP16_TO_FP32(x) mllm_fp16_bits_to_fp32((uint16_t)(x))
#endif
```
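Because conversions like this are easy to get subtly wrong, an exhaustive check is cheap: fp16 has only 65,536 bit patterns. Below is a hypothetical verification harness, not part of the PR; fp16_reference is an independent decoder built on std::ldexp, and the include path is assumed from the review header above.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

#include "mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp"  // function under test (path assumed)

// Independent reference: decode the fp16 fields arithmetically so no bit
// tricks are shared with the implementation being checked.
static float fp16_reference(uint16_t h) {
  int sign = (h >> 15) & 1;
  int exp = (h >> 10) & 0x1F;
  int mant = h & 0x3FF;
  float mag;
  if (exp == 0) {
    mag = std::ldexp(static_cast<float>(mant), -24);  // subnormal: mant * 2^-24
  } else if (exp == 31) {
    mag = mant ? NAN : INFINITY;  // inf / nan
  } else {
    mag = std::ldexp(static_cast<float>(mant + 1024), exp - 25);  // 1.mant * 2^(exp-15)
  }
  return sign ? -mag : mag;
}

int main() {
  int mismatches = 0;
  for (uint32_t bits = 0; bits <= 0xFFFF; ++bits) {
    float got = mllm_fp16_bits_to_fp32(static_cast<uint16_t>(bits));
    float want = fp16_reference(static_cast<uint16_t>(bits));
    uint32_t g, w;
    std::memcpy(&g, &got, sizeof(g));
    std::memcpy(&w, &want, sizeof(w));
    // Compare bit-exactly, except NaNs, whose payloads may legitimately differ.
    if (g != w && !(std::isnan(got) && std::isnan(want))) {
      std::printf("mismatch at 0x%04X: got %a, want %a\n", bits, got, want);
      ++mismatches;
    }
  }
  std::printf("%d mismatches\n", mismatches);
  return mismatches != 0;
}
```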
🤖 Prompt for AI Agents
In `@mllm/backends/cpu/kernels/common/ggml/quantize/quantize.hpp` around lines 108
- 147, The subnormal handling in mllm_fp16_bits_to_fp32 uses an unsigned exp
(uint32_t) and decrements it in the normalization loop which can underflow;
change exp to a signed type (e.g., int32_t or int) before the while loop (and
use a separate unsigned mant as currently) so decrementing works correctly,
update subsequent uses (exp32 calculation and any casts) to use the adjusted
signed exp value when computing exp32, and ensure the normalization loop
condition and final exp-to-float conversion produce the correct FP32 exponent
for subnormals.
mllm/mllm.hpp
Outdated
```cpp
#if defined(__MLLM_SIGNAL_ANDROID) && defined(MLLM_DISABLE_ANDROID_STACKTRACE)
  return;
#endif
```
Don’t skip signal handler registration when stacktrace is disabled.
Returning early on Android removes all signal handling, so SIGSEGV/SIGABRT won’t trigger the existing shutdown path. The macro name implies only stacktrace suppression, and you already gate stacktrace calls elsewhere, so handlers can remain installed.
🛠️ Proposed fix
```diff
 inline void __setup_signal_handler() {
-#if defined(__MLLM_SIGNAL_ANDROID) && defined(MLLM_DISABLE_ANDROID_STACKTRACE)
-  return;
-#endif
 #if defined(__MLLM_SIGNAL_WINDOWS)
   signal(SIGINT, __signal_handler);
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
inline void __setup_signal_handler() {
#if defined(__MLLM_SIGNAL_WINDOWS)
  signal(SIGINT, __signal_handler);
```
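For context, here is a minimal sketch of the shape this comment argues for, with hypothetical names rather than mllm's actual internals: registration always happens, and only the stacktrace emission is compiled out.

```cpp
#include <csignal>
#include <cstdio>

// Hypothetical handler: the stacktrace is the only part gated by the macro.
// (A production handler must restrict itself to async-signal-safe calls; the
// fprintf here is purely illustrative.)
inline void handler_sketch(int sig) {
#if !defined(MLLM_DISABLE_ANDROID_STACKTRACE)
  std::fprintf(stderr, "signal %d: stacktrace would be emitted here\n", sig);
#endif
  // Run shutdown work, then re-raise with the default disposition so the
  // process still terminates with the expected signal status.
  std::signal(sig, SIG_DFL);
  std::raise(sig);
}

inline void setup_signal_handlers_sketch() {
  std::signal(SIGSEGV, handler_sketch);  // registered unconditionally
  std::signal(SIGABRT, handler_sketch);
}
```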
🤖 Prompt for AI Agents
In `@mllm/mllm.hpp` around lines 385 - 387, The early return inside the
conditional `#if` defined(__MLLM_SIGNAL_ANDROID) &&
defined(MLLM_DISABLE_ANDROID_STACKTRACE) removes signal handler installation on
Android; remove that return so the signal handlers (SIGSEGV/SIGABRT) are still
registered even when MLLM_DISABLE_ANDROID_STACKTRACE is defined, and rely on
existing guards around stacktrace generation instead of skipping handler setup;
ensure the conditional only suppresses stacktrace-related calls and not the
registration logic in the function that installs the handlers.
mllm/utils/Argparse.hpp
Outdated
```cpp
// Early exit on help: skip required checks
for (auto& param : inst.args_) {
  const auto& flags = param->flags();
  const bool is_help =
      std::find(flags.begin(), flags.end(), "-h") != flags.end() ||
      std::find(flags.begin(), flags.end(), "--help") != flags.end();
  if (is_help && param->isSet()) {
    printHelp();
    return;
  }
}
```
There was a problem hiding this comment.
Help flag still errors when required positional args are missing
The early-exit check runs after positional required validation (Line 184-193), so -h/--help still fails if required positionals are absent. Move the help check before positional validation (or bypass those checks when help is set).
✅ Suggested fix (reorder help check before positional validation)
```diff
-    // Then parse positional arguments
-    size_t pos_idx = 0;
-    for (auto& param : inst.args_) {
-      if (param->isPositional()) {
-        if (pos_idx >= positional_args.size()) {
-          if (param->isRequired()) {
-            printHelp();
-            MLLM_ERROR_EXIT(ExitCode::kCoreError, "Missing positional argument");
-          }
-          continue;
-        }
-        param->parse(positional_args[pos_idx++]);
-      }
-    }
-
-    // Early exit on help: skip required checks
-    for (auto& param : inst.args_) {
+    // Early exit on help: skip required checks (including positional)
+    bool help_requested = false;
+    for (auto& param : inst.args_) {
       const auto& flags = param->flags();
       const bool is_help =
           std::find(flags.begin(), flags.end(), "-h") != flags.end() ||
           std::find(flags.begin(), flags.end(), "--help") != flags.end();
       if (is_help && param->isSet()) {
-        printHelp();
-        return;
+        help_requested = true;
+        break;
       }
     }
+    if (help_requested) {
+      printHelp();
+      return;
+    }
+
+    // Then parse positional arguments
+    size_t pos_idx = 0;
+    for (auto& param : inst.args_) {
+      if (param->isPositional()) {
+        if (pos_idx >= positional_args.size()) {
+          if (param->isRequired()) {
+            printHelp();
+            MLLM_ERROR_EXIT(ExitCode::kCoreError, "Missing positional argument");
+          }
+          continue;
+        }
+        param->parse(positional_args[pos_idx++]);
+      }
+    }
```
+ }🤖 Prompt for AI Agents
In `@mllm/utils/Argparse.hpp` around lines 199 - 209, The help-flag early exit
(loop over inst.args_, checking param->flags() for "-h"/"--help" and
param->isSet(), then calling printHelp() and return) occurs after required
positional validation so -h/--help still errors; move that help check to run
before the positional required checks (or short-circuit those checks when a help
flag is set) by relocating the loop that inspects param->flags() and
param->isSet()—and calls printHelp()—to execute prior to the positional
validation block (or add a guard that skips positional validation when any
param->isSet() for flags containing "-h" or "--help").
tasks/build_android_nostack.yaml
Outdated
```yaml
cmake_extra_args:
  - "-DMLLM_CROSS_COMPILE=ON"
  - "-DMLLM_BUILD_ARM_BACKEND=ON"
  - "-DANDROID_PLATFORM=android-28"
  - "-DANDROID_ABI=arm64-v8a"
  - '-DMLLM_CPU_BACKEND_COMPILE_OPTIONS="-march=armv8.2-a+fp16+fp16fml+dotprod+i8mm;-ffast-math;-Wno-nan-infinity-disabled"'
  - "-DCMAKE_INSTALL_PREFIX=/root/mllm-install-android-arm64-v8a"
  - "-DMLLM_KERNEL_USE_THREADS=ON"
  - "-DMLLM_KERNEL_THREADS_VENDOR_OPENMP=ON"
  - "-DMLLM_KERNEL_USE_THREADS_VENDOR_MLLM=OFF"
```
Add stacktrace-disable flags to match the “nostack” intent.
This variant doesn’t set MLLM_DISABLE_ANDROID_STACKTRACE, so it still builds with stacktrace enabled. If “nostack” means stacktrace off (as the no-OMP nostack config does), please add the C/CXX flags here (or rename the variant if stacktrace is intended).
🔧 Proposed fix
```diff
 cmake_extra_args:
   - "-DMLLM_CROSS_COMPILE=ON"
   - "-DMLLM_BUILD_ARM_BACKEND=ON"
   - "-DANDROID_PLATFORM=android-28"
   - "-DANDROID_ABI=arm64-v8a"
   - '-DMLLM_CPU_BACKEND_COMPILE_OPTIONS="-march=armv8.2-a+fp16+fp16fml+dotprod+i8mm;-ffast-math;-Wno-nan-infinity-disabled"'
   - "-DCMAKE_INSTALL_PREFIX=/root/mllm-install-android-arm64-v8a"
   - "-DMLLM_KERNEL_USE_THREADS=ON"
   - "-DMLLM_KERNEL_THREADS_VENDOR_OPENMP=ON"
   - "-DMLLM_KERNEL_USE_THREADS_VENDOR_MLLM=OFF"
+  - '-DCMAKE_C_FLAGS=-DMLLM_DISABLE_ANDROID_STACKTRACE=1'
+  - '-DCMAKE_CXX_FLAGS=-DMLLM_DISABLE_ANDROID_STACKTRACE=1'
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
cmake_extra_args:
  - "-DMLLM_CROSS_COMPILE=ON"
  - "-DMLLM_BUILD_ARM_BACKEND=ON"
  - "-DANDROID_PLATFORM=android-28"
  - "-DANDROID_ABI=arm64-v8a"
  - '-DMLLM_CPU_BACKEND_COMPILE_OPTIONS="-march=armv8.2-a+fp16+fp16fml+dotprod+i8mm;-ffast-math;-Wno-nan-infinity-disabled"'
  - "-DCMAKE_INSTALL_PREFIX=/root/mllm-install-android-arm64-v8a"
  - "-DMLLM_KERNEL_USE_THREADS=ON"
  - "-DMLLM_KERNEL_THREADS_VENDOR_OPENMP=ON"
  - "-DMLLM_KERNEL_USE_THREADS_VENDOR_MLLM=OFF"
  - '-DCMAKE_C_FLAGS=-DMLLM_DISABLE_ANDROID_STACKTRACE=1'
  - '-DCMAKE_CXX_FLAGS=-DMLLM_DISABLE_ANDROID_STACKTRACE=1'
```
🤖 Prompt for AI Agents
In `@tasks/build_android_nostack.yaml` around lines 6 - 16, The nostack variant
currently doesn't disable Android stacktraces; update the cmake_extra_args block
(the cmake_extra_args list in this file) to explicitly disable stacktrace by
adding the flag "-DMLLM_DISABLE_ANDROID_STACKTRACE=ON" (or, if you prefer C/CXX
define style, add it to the quoted CPU_BACKEND_COMPILE_OPTIONS), so the build
matches the “nostack” intent; alternatively rename the variant if you intend to
keep stacktrace enabled.
tools/bench/run_suite.sh
Outdated
```bash
mapfile -t PROMPTS < <(grep -v '^\s*$' "$PROMPTS_FILE")

echo "Prompts: ${#PROMPTS[@]}, warmup=$WARMUP, repeat=$REPEAT"
mkdir -p "$OUT_DIR"

# warmup: run the first prompt WARMUP times; excluded from stats (still written to disk for debugging)
for ((i=1;i<=WARMUP;i++)); do
  echo "== Warmup $i/$WARMUP =="
  tools/bench/run_once.sh "${PROMPTS[0]}" "$OUT_DIR" "${TAG}_warmup"
done
```
🧩 Analysis chain
🏁 Script executed:
```bash
cat -n tools/bench/run_suite.sh
```
Repository: UbiquitousLearning/mllm
Length of output: 1324
🏁 Script executed:
```bash
#!/bin/bash
# Test grep patterns for blank line filtering
echo "=== Testing grep patterns ==="
# Create test file with blank lines and whitespace
cat > /tmp/test_prompts.txt << 'EOF'
prompt1

prompt2
   
prompt3
EOF
echo "--- Original pattern (grep -v '^\s*$') ---"
grep -v '^\s*$' /tmp/test_prompts.txt 2>&1 || echo "[Pattern 1 failed - exit code: $?]"
echo ""
echo "--- With -P flag (grep -vP '^\s*$') ---"
grep -vP '^\s*$' /tmp/test_prompts.txt 2>&1 || echo "[Pattern 2 failed - exit code: $?]"
echo ""
echo "--- POSIX character class (grep -v '^[[:space:]]*$') ---"
grep -v '^[[:space:]]*$' /tmp/test_prompts.txt 2>&1 || echo "[Pattern 3 failed - exit code: $?]"
echo ""
echo "--- With -E flag (grep -vE '^\s*$') ---"
grep -vE '^\s*$' /tmp/test_prompts.txt 2>&1 || echo "[Pattern 4 failed - exit code: $?]"
```
Repository: UbiquitousLearning/mllm
Length of output: 375
🏁 Script executed:
# Test empty array behavior in bash
bash << 'BASH_EOF'
echo "=== Testing array bounds behavior ==="
# Simulate empty PROMPTS array
declare -a PROMPTS=()
echo "Empty array length: ${`#PROMPTS`[@]}"
# Try accessing [0]
if [[ -z "${PROMPTS[0]}" ]]; then
echo "PROMPTS[0] is empty/unset"
fi
# Test with set -e behavior
bash -c '
set -e
declare -a PROMPTS=()
echo "Array size: ${`#PROMPTS`[@]}"
if [[ ${`#PROMPTS`[@]} -gt 0 ]]; then
echo "Running with: ${PROMPTS[0]}"
else
echo "Array is empty - accessing [0] would be unsafe"
fi
'
BASH_EOFRepository: UbiquitousLearning/mllm
Length of output: 213
Fix unsafe array access and improve grep portability.
The script doesn't validate that the prompts array is non-empty before accessing PROMPTS[0] on line 29. If the prompts file is blank or contains only whitespace, grep -v '^\s*$' will filter all content, leaving an empty array; subsequent access to ${PROMPTS[0]} will pass an empty string to run_once.sh, likely causing it to fail. While the current grep pattern works in practice, using [[:space:]] improves portability across grep versions. Add an explicit empty-array check before the warmup loop.
Suggested fix
```diff
-mapfile -t PROMPTS < <(grep -v '^\s*$' "$PROMPTS_FILE")
+mapfile -t PROMPTS < <(grep -v '^[[:space:]]*$' "$PROMPTS_FILE")
+
+if (( ${#PROMPTS[@]} == 0 )); then
+  echo "ERROR: prompts file is empty or contains only whitespace: $PROMPTS_FILE"
+  exit 1
+fi
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```bash
mapfile -t PROMPTS < <(grep -v '^[[:space:]]*$' "$PROMPTS_FILE")

if (( ${#PROMPTS[@]} == 0 )); then
  echo "ERROR: prompts file is empty or contains only whitespace: $PROMPTS_FILE"
  exit 1
fi

echo "Prompts: ${#PROMPTS[@]}, warmup=$WARMUP, repeat=$REPEAT"
mkdir -p "$OUT_DIR"

# warmup: run the first prompt WARMUP times; excluded from stats (still written to disk for debugging)
for ((i=1;i<=WARMUP;i++)); do
  echo "== Warmup $i/$WARMUP =="
  tools/bench/run_once.sh "${PROMPTS[0]}" "$OUT_DIR" "${TAG}_warmup"
done
```
🤖 Prompt for AI Agents
In `@tools/bench/run_suite.sh` around lines 21 - 30, The warmup loop assumes
PROMPTS has at least one entry and uses a grep pattern that isn't fully
portable; update the mapfile invocation to use grep -v '^[[:space:]]*$' to
handle all whitespace portably when populating PROMPTS from PROMPTS_FILE, then
add an explicit check after mapfile (e.g., test "${#PROMPTS[@]}" -gt 0) and if
the array is empty either exit with a clear error or skip the warmup and main
run, and ensure the warmup loop that calls tools/bench/run_once.sh
"${PROMPTS[0]}" "$OUT_DIR" "${TAG}_warmup" only runs when PROMPTS is non-empty.
b492e70 to f5c0006 (force-push)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@tools/mllm-llm-benchmark/models/Llama.hpp`:
- Line 48: The mllm::print call in Llama.hpp divides cfg_->hidden_size by
cfg_->num_attention_heads without checking for zero; change it to compute
head_dim only after checking cfg_->num_attention_heads (e.g., if
cfg_->num_attention_heads == 0 set head_dim to 0 or print "N/A"), then pass that
safe value to mllm::print so cfg_->hidden_size and cfg_->num_attention_heads are
referenced but never used in an actual divide-by-zero operation.
🧹 Nitpick comments (1)
tools/mllm-llm-benchmark/models/All.hpp (1)
28-32: Redundant substring checks. Since both "tinyllama" and "tiny_llama" contain the substring "llama", the first condition find("llama") != npos will already match them. The additional checks are redundant.
♻️ Simplified version
```diff
-  if (normalized_model_name.find("llama") != std::string::npos ||
-      normalized_model_name.find("tinyllama") != std::string::npos ||
-      normalized_model_name.find("tiny_llama") != std::string::npos) {
+  if (normalized_model_name.find("llama") != std::string::npos) {
     return std::make_shared<Llama_Benchmark>();
   }
```
@coderabbitai review
✅ Actions performed: Review triggered.
@coderabbitai help
Chat: There are 3 ways to chat with CodeRabbit:
CodeRabbit commands
Other keywords and placeholders
CodeRabbit configuration file (…)
@coderabbitai full review
I pushed updates in commit d95911a. Please re-review the entire PR from scratch and regenerate the review comments based on the latest changes.
✅ Actions performed: Full review triggered.
d95911a to 1cb7439 (force-push)
@coderabbitai ignore pre-merge checks
✅ Actions performed: Pre-merge checks override command executed successfully.
@coderabbitai run pre-merge checks
Important: Pre-merge checks are overridden. No further action needed.
@coderabbitai full review
Important: Pre-merge checks are overridden. No further action needed.
@chenghuaWang @yirongjie @oreomaker CI checks are still "Checks 0" for this fork PR. Could a maintainer please approve workflows so CI can run? Thanks!
Hello, @huangzhenhua111 Thanks for your contribution! CodeRabbit is not related to C++ CI. The CI does not run on the …
Sorry, I accidentally closed the PR earlier — it’s reopened now. Platform details are in my next comment.
Hi Chenghua, Thanks for taking a look! I verified this benchmark on x86_64 Linux under WSL (Ubuntu) on CPU (no GPU/NPU). I built it with CMake + Ninja and ran build/bin/mllm-llm-benchmark using the TinyLlama FP32 .mllm (ModelFileVersion V1) and examples/llama/config_tiny_llama.json, e.g. PP=8, TG=4. Understood on CodeRabbit/CI scope (tools/ is not covered by CI). I'll continue to keep the changes isolated to tools and make sure it compiles cleanly. Best,
@jialilve, can you take a look at this PR?
What
- Adds a Llama_Benchmark template for mllm-llm-benchmark.
- Registers llama/tinyllama in models/All.hpp so -n tiny_llama works.
Why
- The mllm-llm-benchmark registry currently only supports Qwen3_W4A32_KAI. This enables running the benchmark with the existing llama example config and TinyLlama V1 weights.
Notes
- Loads the .mllm as ModelFileVersion::kV1 to match the TinyLlama example weights (loading as V2 asserts on magic mismatch). V2 support can be added in a follow-up PR (CLI flag or file version probe).
Scope (change scope):
- Adds Llama.hpp and registers it in the benchmark registry; no changes to the core runtime.
Compatibility:
- Only takes effect when -n llama / -n tinyllama is specified.
How to test
New Features
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.