Note: SVFA now provides both Bash and Python versions of these scripts. Python versions offer enhanced maintainability, better error handling, and cross-platform compatibility. See PYTHON_SCRIPTS.md for details.
| Feature | Bash Scripts | Python Scripts |
|---|---|---|
| Files | `run-securibench-tests.sh`, `compute-securibench-metrics.sh` | `run_securibench_tests.py`, `compute_securibench_metrics.py` |
| Dependencies | Bash, SBT | Python 3.6+, SBT |
| Maintainability | ⭐⭐ | ⭐⭐⭐⭐⭐ |
| Features | Basic | Enhanced (colors, verbose, better errors) |
| Cross-Platform | Unix/Linux/macOS | Windows/macOS/Linux |
./scripts/run-securibench-tests.sh [suite] [callgraph] [clean|--help]
Purpose: Run SVFA analysis on specified test suite(s) and save results to disk (no metrics computation).
What it does:
- Executes specific test suite or all 12 Securibench test suites (Inter, Basic, Aliasing, Arrays, Collections, Datastructures, Factories, Pred, Reflection, Sanitizers, Session, StrongUpdates)
- Saves results as JSON files in `target/test-results/`
- Shows execution summary and timing
- Does NOT compute accuracy metrics
Get help: Run ./scripts/run-securibench-tests.sh --help for detailed usage information.
Usage:
# Execute all tests with default SPARK call graph
./scripts/run-securibench-tests.sh
./scripts/run-securibench-tests.sh all
# Execute specific test suite with SPARK call graph
./scripts/run-securibench-tests.sh inter
./scripts/run-securibench-tests.sh basic
./scripts/run-securibench-tests.sh session
# ... (all 12 suites supported)
# Execute with different call graph algorithms
./scripts/run-securibench-tests.sh inter cha # CHA call graph
./scripts/run-securibench-tests.sh basic spark_library # SPARK_LIBRARY call graph
./scripts/run-securibench-tests.sh inter rta # RTA call graph
./scripts/run-securibench-tests.sh basic vta # VTA call graph
./scripts/run-securibench-tests.sh all cha # All suites with CHA
# Clean previous data and execute all tests
./scripts/run-securibench-tests.sh clean

Call Graph Algorithms:
- `spark` (default): SPARK points-to analysis - most precise, slower
- `cha`: Class Hierarchy Analysis - fastest, least precise
- `spark_library`: SPARK with library support - comprehensive coverage
- `rta`: Rapid Type Analysis via SPARK - fast, moderately precise
- `vta`: Variable Type Analysis via SPARK - balanced speed/precision
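To compare algorithms systematically, the same script can be driven in a loop. A minimal, hypothetical Python sketch (the `sweep` helper and its injectable `run` callback are illustrative, not part of SVFA):

```python
# Hypothetical helper (not part of SVFA): sweep one suite across every
# documented call-graph algorithm by invoking the bash script once per algorithm.
import subprocess

CALL_GRAPHS = ["spark", "cha", "spark_library", "rta", "vta"]

def sweep(suite="inter", run=None):
    """Build one command per algorithm and execute it; `run` is injectable
    so the loop can be exercised without actually launching SBT."""
    run = run or (lambda cmd: subprocess.run(cmd, check=True))
    commands = [["./scripts/run-securibench-tests.sh", suite, cg]
                for cg in CALL_GRAPHS]
    for cmd in commands:
        run(cmd)
    return commands
```

Each run writes its own JSON results, so a full sweep leaves one result set per algorithm for later metrics comparison.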
Output:
=== EXECUTING ALL SECURIBENCH TESTS ===
🚀 Starting test execution for all suites...
🔄 Executing Inter tests (securibench.micro.inter)...
=== PHASE 1: EXECUTING TESTS FOR securibench.micro.inter ===
Executing: Inter1
Inter1: 1/1 conflicts - ✅ PASS (204ms)
Executing: Inter2
Inter2: 2/2 conflicts - ✅ PASS (414ms)
...
=== EXECUTION COMPLETE: 14 tests executed ===
Results: 9 passed, 5 failed
📊 EXECUTION SUMMARY:
Total tests: 14
Passed: 9
Failed: 5
Success rate: 64%
ℹ️ Note: SBT 'success' indicates technical execution completed.
Individual test results show SVFA analysis accuracy.
✅ Inter test execution completed (technical success)
📊 14 tests executed in 12s
ℹ️ Individual test results show SVFA analysis accuracy
🏁 ALL TEST EXECUTION COMPLETED
✅ All test suites executed successfully!
📊 Total: 56 tests executed in 33s
./scripts/compute-securibench-metrics.sh [suite] [callgraph] [clean|--help]
Purpose: Compute accuracy metrics for Securibench test suites with automatic test execution.
Default behavior: Processes all suites and creates CSV + summary reports.
Get help: Run ./scripts/compute-securibench-metrics.sh --help for detailed usage information.
What it does:
- Auto-executes missing tests (no need to run tests separately!)
- Loads saved JSON test results
- Computes TP, FP, FN, Precision, Recall, F-score
- Creates timestamped CSV file with detailed metrics
- Creates summary report with overall statistics
- Displays summary on console
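The exact per-test counting scheme is not spelled out here, but the reported scores follow the standard definitions. A minimal sketch, assuming the TP/FP/FN counts are already known (the sample numbers are the aggregate Inter figures from the summary shown later in this document):

```python
def scores(tp: int, fp: int, fn: int):
    """Standard definitions used in the report:
    precision = TP/(TP+FP), recall = TP/(TP+FN),
    F-score = harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

# Aggregate Inter numbers from the sample summary: TP=12, FP=0, FN=6
p, r, f = scores(12, 0, 6)  # p=1.0, r=0.666..., f=0.8
```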
Usage:
# Process all suites with the default SPARK call graph
./scripts/compute-securibench-metrics.sh
./scripts/compute-securibench-metrics.sh all
# Process specific suite with SPARK call graph
./scripts/compute-securibench-metrics.sh inter
./scripts/compute-securibench-metrics.sh basic
./scripts/compute-securibench-metrics.sh session
./scripts/compute-securibench-metrics.sh aliasing
# ... (all 12 suites supported)
# Process with different call graph algorithms
./scripts/compute-securibench-metrics.sh inter cha # CHA call graph
./scripts/compute-securibench-metrics.sh basic spark_library # SPARK_LIBRARY call graph
./scripts/compute-securibench-metrics.sh inter rta # RTA call graph
./scripts/compute-securibench-metrics.sh basic vta # VTA call graph
./scripts/compute-securibench-metrics.sh all cha # All suites with CHA
# Clean all previous test data and metrics
./scripts/compute-securibench-metrics.sh clean

Output Files:
- `target/metrics/securibench_metrics_[callgraph]_YYYYMMDD_HHMMSS.csv` - Detailed CSV data
- `target/metrics/securibench_summary_[callgraph]_YYYYMMDD_HHMMSS.txt` - Summary report
CSV Format:
Suite,Test,Found,Expected,Status,TP,FP,FN,Precision,Recall,F-score,Execution_Time_ms
inter,Inter1,1,1,PASS,1,0,0,1.000,1.000,1.000,196
inter,Inter2,2,2,PASS,2,0,0,1.000,1.000,1.000,303
basic,Basic1,1,1,PASS,1,0,0,1.000,1.000,1.000,203
...

./scripts/compute-securibench-metrics.sh

Auto-executes missing tests and generates metrics - that's it!
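The CSV is easy to post-process with the standard library alone. A stdlib-only sketch that sums per-test counts into suite totals (the inline sample mirrors the rows above; real files live under `target/metrics/`):

```python
import csv
import io
from collections import defaultdict

# Inline sample mirroring the CSV format documented above.
SAMPLE = """Suite,Test,Found,Expected,Status,TP,FP,FN,Precision,Recall,F-score,Execution_Time_ms
inter,Inter1,1,1,PASS,1,0,0,1.000,1.000,1.000,196
inter,Inter2,2,2,PASS,2,0,0,1.000,1.000,1.000,303
basic,Basic1,1,1,PASS,1,0,0,1.000,1.000,1.000,203
"""

def suite_totals(text):
    """Sum TP/FP/FN per suite and derive overall precision/recall."""
    totals = defaultdict(lambda: {"TP": 0, "FP": 0, "FN": 0})
    for row in csv.DictReader(io.StringIO(text)):
        for key in ("TP", "FP", "FN"):
            totals[row["Suite"]][key] += int(row[key])
    for t in totals.values():
        t["Precision"] = t["TP"] / (t["TP"] + t["FP"]) if t["TP"] + t["FP"] else 0.0
        t["Recall"] = t["TP"] / (t["TP"] + t["FN"]) if t["TP"] + t["FN"] else 0.0
    return dict(totals)
```

Swapping `io.StringIO(text)` for an `open(...)` call reads a real metrics file the same way.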
# Step 1: Execute all tests at once (optional)
./scripts/run-securibench-tests.sh
# Step 2: Compute metrics (fast, uses cached results)
./scripts/compute-securibench-metrics.sh

- Open the CSV file in Excel, Google Sheets, or analysis tools
- Use the summary report for quick overview
- Re-run metrics computation anytime (uses cached results when available)
📊 METRICS SUMMARY:
--- Inter Test Suite ---
Tests: 14 total, 9 passed, 5 failed
Success Rate: 64.2%
Vulnerabilities: 12 found, 18 expected
Metrics: TP=12, FP=0, FN=6
Overall Precision: 1.000
Overall Recall: .666
--- Basic Test Suite ---
Tests: 42 total, 38 passed, 4 failed
Success Rate: 90.4%
Vulnerabilities: 60 found, 60 expected
Metrics: TP=58, FP=2, FN=2
Overall Precision: .966
Overall Recall: .966
target/
├── test-results/securibench/micro/
│ ├── inter/
│ │ ├── Inter1.json
│ │ ├── Inter2.json
│ │ └── ...
│ └── basic/
│ ├── Basic1.json
│ ├── Basic2.json
│ └── ...
└── metrics/
├── securibench_metrics_20251215_214119.csv
└── securibench_summary_20251215_214119.txt
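The per-test JSON schema is not documented here, but the directory layout alone is enough to index results per suite. A small `pathlib` sketch (it builds a throwaway tree in a temp directory purely for demonstration):

```python
import json
import tempfile
from pathlib import Path

def results_by_suite(root: Path):
    """Map each suite directory under `root` to its sorted JSON result files."""
    return {d.name: sorted(p.name for p in d.glob("*.json"))
            for d in root.iterdir() if d.is_dir()}

# Demonstrate against a throwaway tree mirroring the layout above;
# the JSON payload is a placeholder, not the real result schema.
root = Path(tempfile.mkdtemp()) / "test-results" / "securibench" / "micro"
for suite, tests in {"inter": ["Inter1", "Inter2"], "basic": ["Basic1"]}.items():
    (root / suite).mkdir(parents=True)
    for name in tests:
        (root / suite / f"{name}.json").write_text(json.dumps({"test": name}))

index = results_by_suite(root)
```

Pointing `root` at `target/test-results/securibench/micro` indexes the real results instead.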
Both scripts include comprehensive help:
# Detailed help for metrics computation
./scripts/compute-securibench-metrics.sh --help
./scripts/compute-securibench-metrics.sh -h
# Detailed help for test execution
./scripts/run-securibench-tests.sh --help
./scripts/run-securibench-tests.sh -h

Help includes:
- Complete usage instructions
- All available options and test suites
- Performance expectations
- Output file locations
- Practical examples
Remove all previous test results and metrics for a fresh start:
# Clean using metrics script
./scripts/compute-securibench-metrics.sh clean
# Clean and execute all tests
./scripts/run-securibench-tests.sh clean

What gets cleaned:
- All JSON test result files (`target/test-results/`)
- All CSV and summary reports (`target/metrics/`)
- Temporary log files (`/tmp/executor_*.log`, `/tmp/metrics_*.log`)
When to use clean:
- Before important analysis runs
- When debugging test issues
- To free up disk space
- To ensure completely fresh results
- Create executor and metrics classes (see existing Basic/Inter examples)
- Add the suite to both scripts' `SUITE_KEYS` and `SUITE_NAMES` arrays
- Scripts will automatically include the new suite
- Use the CSV file for custom analysis in R, Python, Excel
- Filter by suite, test name, or status
- Create visualizations and trend analysis
- Compare results across different SVFA configurations
# In CI pipeline
./scripts/run-securibench-tests.sh
if [ $? -eq 0 ]; then
./scripts/compute-securibench-metrics.sh
# Upload CSV to artifact storage
fi

When running Securibench tests, you'll see two different success indicators:
✅ Aliasing test execution completed (technical success)
[info] All tests passed.
Meaning: SBT successfully executed all test cases without crashes or technical errors.
Executing: Aliasing1
Aliasing1: 1/1 conflicts - ✅ PASS (269ms)
Executing: Aliasing2
Aliasing2: 0/1 conflicts - ❌ FAIL (301ms)
Meaning: Whether SVFA found the expected number of vulnerabilities in each test case.
- Technical Success: Ensures the testing infrastructure works correctly
- Analysis Results: Shows how well SVFA detects vulnerabilities
- A suite can be "technically successful" even if many analysis tests fail
📊 EXECUTION SUMMARY:
Total tests: 6
Passed: 2 ← SVFA analysis accuracy
Failed: 4 ← SVFA analysis accuracy
Success rate: 33% ← SVFA analysis accuracy
This approach provides maximum flexibility for SVFA analysis and metrics computation! 🎯