Commit f032a0a

Merge pull request #18 from SiyuanSun0736/perf_pr
Add `@perf_event` program type with full attach/detach/count support
2 parents 82279e5 + 0f41619 commit f032a0a

24 files changed

Lines changed: 1931 additions & 315 deletions

BUILTINS.md

Lines changed: 45 additions & 13 deletions
@@ -83,21 +83,27 @@ fn main() -> i32 {
8383

8484
---
8585

86-
#### `attach(handle, target, flags)`
86+
#### `attach(handle, target, flags)` / `attach(handle, opts, flags)`
8787
**Signature:** `attach(handle: ProgramHandle, target: str(128), flags: u32) -> u32`
88+
**Signature:** `attach(handle: ProgramHandle, opts: perf_options, flags: u32) -> PerfAttachment`
8889
**Variadic:** No
8990
**Context:** Userspace only
9091

91-
**Description:** Attach a loaded eBPF program to a target interface or attachment point.
92+
**Description:** Attach a loaded eBPF program to a target interface or attachment point, or to a perf event counter described by `perf_options`. Both forms take three arguments, keeping a uniform call shape across all program types.
9293

9394
**Parameters:**
94-
- `handle`: Program handle returned from `load()`
95-
- `target`: Target interface name (e.g., "eth0", "lo") or attachment point
96-
- `flags`: Attachment flags (context-dependent)
95+
- Standard form:
96+
- `handle`: Program handle returned from `load()`
97+
- `target`: Target interface name (e.g., "eth0", "lo") or attachment point
98+
- `flags`: Attachment flags (context-dependent)
99+
- Perf event form:
100+
- `handle`: Program handle returned from `load()`
101+
- `opts`: `perf_options` value — only `perf_type` and `perf_config` are required; all other fields have defaults
102+
- `flags`: Reserved (pass `0`)
97103

98104
**Return Value:**
99-
- Returns `0` on success
100-
- Returns error code on failure
105+
- Standard form returns `0` on success and an error code on failure
106+
- Perf event form returns a `PerfAttachment` value that identifies the open counter and its BPF link, plus an internal token used to detect stale handles
101107

102108
**Examples:**
103109
```kernelscript
@@ -106,24 +112,33 @@ var result = attach(prog, "eth0", 0)
106112
if (result != 0) {
107113
print("Failed to attach program")
108114
}
115+
116+
// Minimal perf attach — all non-perf_type/perf_config fields use defaults:
117+
// pid=-1 (all procs), cpu=0, period=1_000_000, wakeup=1, flags=false
118+
var perf_prog = load(on_branch_miss)
119+
var perf_att = attach(perf_prog, perf_options { perf_type: perf_type_hardware, perf_config: branch_misses }, 0)
120+
var count = read(perf_att)
121+
detach(perf_att)
122+
detach(perf_prog)
109123
```
110124

111125
**Context-specific implementations:**
112126
- **eBPF:** Not available
113-
- **Userspace:** Uses `bpf_prog_attach` system call
127+
- **Userspace:** Uses `attach_bpf_program_by_fd` for standard targets and `ks_attach_perf_event` for perf events
114128
- **Kernel Module:** Not available
115129

116130
---
117131

118132
#### `detach(handle)`
119133
**Signature:** `detach(handle: ProgramHandle) -> void`
134+
**Signature:** `detach(handle: PerfAttachment) -> void`
120135
**Variadic:** No
121136
**Context:** Userspace only
122137

123-
**Description:** Detach a loaded eBPF program from its current attachment point.
138+
**Description:** Detach a loaded eBPF program from its current attachment point, or tear down one perf attachment.
124139

125140
**Parameters:**
126-
- `handle`: Program handle returned from `load()`
141+
- `handle`: Program handle returned from `load()`, or a `PerfAttachment` returned from perf `attach()`
127142

128143
**Return Value:**
129144
- No return value (void)
@@ -138,11 +153,28 @@ detach(prog) // Clean up
138153

139154
**Context-specific implementations:**
140155
- **eBPF:** Not available
141-
- **Userspace:** Uses `detach_bpf_program_by_fd` function
156+
- **Userspace:** Uses `detach_bpf_program_by_fd` for program handles and `ks_detach_perf_attachment` for perf attachments
142157
- **Kernel Module:** Not available
143158

144159
---
145160

161+
#### `read(handle)`
162+
**Signature:** `read(handle: PerfAttachment) -> i64`
163+
**Variadic:** No
164+
**Context:** Userspace only
165+
166+
**Description:** Read the current hardware/software counter value from a perf attachment.
167+
168+
**Parameters:**
169+
- `handle`: Perf attachment returned from `attach(handle, perf_options, flags)`
170+
171+
**Return Value:**
172+
- Returns the raw 64-bit counter value on success
173+
- Returns `-1` on invalid/stale attachment or read failure
174+
- Reads use the attachment's `perf_fd` directly; the internal token detects copied handles used after detach.
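
The stale-handle behaviour described above can be illustrated with a small C sketch. It is not the generated `ks_perf_attachment_read` helper: the `sketch_*` names, the slot table, and the struct layout are all assumptions made for illustration. The only points taken from the text are that a read goes straight to the attachment's `perf_fd` and that a token lets copies of a handle be rejected after detach.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

#define SKETCH_MAX_SLOTS 16

/* Hypothetical stand-in for PerfAttachment: an fd plus a generation token. */
typedef struct {
    int perf_fd;
    uint32_t slot;
    uint32_t generation;
} sketch_attachment;

/* Current generation and liveness of every slot (zero-initialised). */
static uint32_t sketch_generation[SKETCH_MAX_SLOTS];
static int sketch_live[SKETCH_MAX_SLOTS];

/* Track an fd in a slot and return a handle stamped with the slot's generation. */
static sketch_attachment sketch_track(int perf_fd, uint32_t slot) {
    sketch_attachment att = { perf_fd, slot, 0 };
    assert(slot < SKETCH_MAX_SLOTS);
    sketch_live[slot] = 1;
    att.generation = sketch_generation[slot];
    return att;
}

/* A handle is current only if its slot is live and the generations match. */
static int sketch_is_current(sketch_attachment att) {
    return att.slot < SKETCH_MAX_SLOTS &&
           sketch_live[att.slot] &&
           sketch_generation[att.slot] == att.generation;
}

/* Retire the slot; bumping the generation invalidates every copy of the handle.
 * A real detach would also disable the counter, destroy the link, and close the fd. */
static void sketch_retire(sketch_attachment att) {
    if (!sketch_is_current(att)) return;
    sketch_live[att.slot] = 0;
    sketch_generation[att.slot]++;
}

/* Read the 64-bit counter directly from the fd; -1 on stale handle or short read. */
static int64_t sketch_read(sketch_attachment att) {
    uint64_t value = 0;
    if (!sketch_is_current(att)) return -1;
    if (read(att.perf_fd, &value, sizeof value) != (ssize_t)sizeof value) return -1;
    return (int64_t)value;
}
```

Because validity is checked against a per-slot generation, a read needs no scan over a global attachment list, and a copy made before detach fails the check even if the slot is later reused.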
175+
176+
---
177+
146178
### 3. Struct Operations (struct_ops)
147179

148180
#### `register(impl_instance)`
@@ -340,7 +372,7 @@ fn main() -> i32 {
340372
|----------|------|-----------|---------------|-------|
341373
| `print()` |||| Different output destinations |
342374
| `load()` |||| Program management only |
343-
| `attach()` |||| Program management only |
375+
| `attach()` |||| Standard attach and perf_options attach |
344376
| `detach()` |||| Program management only |
345377
| `register()` |||| struct_ops registration |
346378
| `test()` |||| Testing framework only |
@@ -393,4 +425,4 @@ if (result != 0) {
393425
## See Also
394426

395427
- **SPEC.md**: Language specification and features
396-
- **examples/**: Example programs demonstrating builtin function usage
428+
- **examples/**: Example programs demonstrating builtin function usage

README.md

Lines changed: 61 additions & 0 deletions
@@ -121,6 +121,13 @@ fn traffic_shaper(ctx: *__sk_buff) -> i32 {
121121
// Trace system call entry
122122
return 0
123123
}
124+
125+
// Perf event program for hardware counter sampling
126+
@perf_event
127+
fn on_branch_miss(ctx: *bpf_perf_event_data) -> i32 {
128+
// Runs on each branch-miss sample, i.e. once per sampling period
129+
return 0
130+
}
124131
```
125132

126133
### Type System
@@ -285,6 +292,59 @@ fn main() -> i32 {
285292
}
286293
```
287294

295+
### Hardware Performance Counter Programs
296+
297+
Use `@perf_event` to attach eBPF programs to hardware or software performance counters. `perf_options` keeps the kernel's tagged `perf_type + perf_config` model, so adding new perf event families does not require flattening everything into one enum. Only `perf_type` and `perf_config` are required; all other fields have sensible defaults. Perf attaches return a first-class attachment value, so if you need the current count in userspace, call `read(att)`:
298+
299+
```kernelscript
300+
// eBPF program fires on every hardware branch-miss sample
301+
@perf_event
302+
fn on_branch_miss(ctx: *bpf_perf_event_data) -> i32 {
303+
return 0
304+
}
305+
306+
fn main() -> i32 {
307+
var prog = load(on_branch_miss)
308+
309+
// Minimal form — defaults: pid=-1 (all procs), cpu=0,
310+
// period=1_000_000, wakeup=1, all flags=false
311+
var att = attach(prog, perf_options { perf_type: perf_type_hardware, perf_config: branch_misses }, 0)
312+
var count = read(att)
313+
print("branch misses: %lld", count)
314+
315+
detach(att) // disables counter, destroys BPF link, closes fd
316+
detach(prog) // safe cleanup for the loaded program handle
317+
return 0
318+
}
319+
```
320+
321+
**Available `perf_type` values:**
322+
323+
| Enum value | Linux constant |
324+
|---|---|
325+
| `perf_type_hardware` | `PERF_TYPE_HARDWARE` |
326+
| `perf_type_software` | `PERF_TYPE_SOFTWARE` |
327+
| `perf_type_tracepoint` | `PERF_TYPE_TRACEPOINT` |
328+
| `perf_type_hw_cache` | `PERF_TYPE_HW_CACHE` |
329+
| `perf_type_raw` | `PERF_TYPE_RAW` |
330+
| `perf_type_breakpoint` | `PERF_TYPE_BREAKPOINT` |
331+
332+
**Common `perf_config` constants:**
333+
334+
| Constant | Intended `perf_type` | Linux config |
335+
|---|---|---|
336+
| `cpu_cycles` | `perf_type_hardware` | `PERF_COUNT_HW_CPU_CYCLES` |
337+
| `instructions` | `perf_type_hardware` | `PERF_COUNT_HW_INSTRUCTIONS` |
338+
| `cache_references` | `perf_type_hardware` | `PERF_COUNT_HW_CACHE_REFERENCES` |
339+
| `cache_misses` | `perf_type_hardware` | `PERF_COUNT_HW_CACHE_MISSES` |
340+
| `branch_instructions` | `perf_type_hardware` | `PERF_COUNT_HW_BRANCH_INSTRUCTIONS` |
341+
| `branch_misses` | `perf_type_hardware` | `PERF_COUNT_HW_BRANCH_MISSES` |
342+
| `page_faults` | `perf_type_software` | `PERF_COUNT_SW_PAGE_FAULTS` |
343+
| `context_switches` | `perf_type_software` | `PERF_COUNT_SW_CONTEXT_SWITCHES` |
344+
| `cpu_migrations` | `perf_type_software` | `PERF_COUNT_SW_CPU_MIGRATIONS` |
345+
346+
For families with a richer config space, such as `perf_type_hw_cache`, pass the kernel-compatible encoded `perf_config` value directly.
347+
288348
📖 **For detailed language specification, syntax reference, and advanced features, please read [`SPEC.md`](SPEC.md).**
289349

290350
🔧 **For complete builtin functions reference, see [`BUILTINS.md`](BUILTINS.md).**
@@ -328,6 +388,7 @@ my_project/
328388
- `tc` - Traffic control programs
329389
- `probe` - Kernel function probing
330390
- `tracepoint` - Kernel tracepoint programs
391+
- `perf_event` - Hardware/software performance counter programs
331392

332393
**Available struct_ops:**
333394
- `tcp_congestion_ops` - TCP congestion control

SPEC.md

Lines changed: 129 additions & 1 deletion
@@ -35,7 +35,7 @@ var flows : hash<IpAddress, PacketStats>(1024)
3535
KernelScript uses a simple and clear scoping model that eliminates ambiguity:
3636

3737
- **`@helper` functions**: Kernel-shared functions - accessible by all eBPF programs, compile to eBPF bytecode
38-
- **Attributed functions** (e.g., `@xdp`, `@tc`, `@tracepoint`): eBPF program entry points - compile to eBPF bytecode
38+
- **Attributed functions** (e.g., `@xdp`, `@tc`, `@tracepoint`, `@perf_event`): eBPF program entry points - compile to eBPF bytecode
3939
- **Regular functions**: User space - functions and data structures compile to native executable
4040
- **Maps and global configs**: Shared resources accessible from both kernel and user space
4141
- **No wrapper syntax**: Direct, flat structure without unnecessary nesting
@@ -440,6 +440,134 @@ kernelscript init tracepoint/syscalls/sys_enter_read my_syscall_tracer
440440
# appropriate KernelScript templates with correct context types
441441
```
442442

443+
#### 3.1.3 Perf Event Programs
444+
445+
`@perf_event` programs attach eBPF logic to hardware or software performance counters via `perf_event_open(2)`. The eBPF function is invoked for every counter sample; the userspace side controls which counter to monitor through a `perf_options` struct literal passed to the standard 3-argument `attach()`.
446+
447+
**Syntax:**
448+
```kernelscript
449+
@perf_event
450+
fn <handler_name>(ctx: *bpf_perf_event_data) -> i32 {
451+
// runs on every sample
452+
return 0
453+
}
454+
```
455+
456+
The context type is always `*bpf_perf_event_data` (from `vmlinux.h`).
457+
458+
**Userspace lifecycle:**
459+
```kernelscript
460+
fn main() -> i32 {
461+
var prog = load(my_handler)
462+
463+
// Only perf_type + perf_config are required; all other fields use language-level defaults:
464+
// pid=-1, cpu=0, period=1_000_000, wakeup=1, inherit/exclude_*=false
465+
var misses = attach(prog, perf_options { perf_type: perf_type_hardware, perf_config: branch_misses }, 0)
466+
467+
// Override specific fields as needed:
468+
var cache = attach(prog, perf_options {
469+
perf_type: perf_type_hardware,
470+
perf_config: cache_misses,
471+
cpu: 2,
472+
period: 500000,
473+
exclude_kernel: true,
474+
}, 0)
475+
476+
print("misses=%lld cache=%lld", read(misses), read(cache))
477+
478+
detach(cache) // IOC_DISABLE → bpf_link__destroy → close(perf_fd)
479+
detach(misses)
480+
detach(prog)
481+
return 0
482+
}
483+
```
484+
485+
**`perf_options` fields and defaults:**
486+
487+
| Field | Type | Default | Description |
488+
|---|---|---|---|
489+
| `perf_type` | `perf_type` | *(required)* | `perf_event_attr.type` tag |
490+
| `perf_config` | `u64` | *(required)* | `perf_event_attr.config` value for that type |
491+
| `pid` | `i32` | `-1` | -1 = all processes; ≥0 = specific PID |
492+
| `cpu` | `i32` | `0` | ≥0 = specific CPU; -1 = any CPU (pid must be ≥0) |
493+
| `period` | `u64` | `1000000` | Sample after this many events |
494+
| `wakeup` | `u32` | `1` | Wake userspace after N samples |
495+
| `inherit` | `bool` | `false` | Inherit to forked children |
496+
| `exclude_kernel` | `bool` | `false` | Exclude kernel-mode samples |
497+
| `exclude_user` | `bool` | `false` | Exclude user-mode samples |
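
As a rough illustration of how the defaults in the table could translate into a `struct perf_event_attr`, here is a minimal C sketch. The `sketch_perf_options` struct and both functions are hypothetical and are not the generated `ks_perf_options` code. Mapping `period` to `sample_period` and `wakeup` to `wakeup_events` is an assumption based on the field descriptions.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C mirror of perf_options; field names follow the table above. */
typedef struct {
    uint32_t perf_type;
    uint64_t perf_config;
    int32_t pid;
    int32_t cpu;
    uint64_t period;
    uint32_t wakeup;
    bool inherit;
    bool exclude_kernel;
    bool exclude_user;
} sketch_perf_options;

/* Fill in the documented defaults; only type and config come from the caller. */
static sketch_perf_options sketch_defaults(uint32_t perf_type, uint64_t perf_config) {
    sketch_perf_options o = {
        .perf_type = perf_type,
        .perf_config = perf_config,
        .pid = -1,          /* all processes */
        .cpu = 0,
        .period = 1000000,
        .wakeup = 1,
        /* inherit, exclude_kernel, exclude_user stay false */
    };
    return o;
}

/* Translate options into a perf_event_attr (assumed mapping, see lead-in).
 * pid and cpu are not part of the attr; they are perf_event_open(2) arguments. */
static void sketch_fill_attr(const sketch_perf_options *o, struct perf_event_attr *attr) {
    memset(attr, 0, sizeof *attr);
    attr->size = sizeof *attr;
    attr->type = o->perf_type;
    attr->config = o->perf_config;
    attr->sample_period = o->period;
    attr->wakeup_events = o->wakeup;
    attr->inherit = o->inherit;
    attr->exclude_kernel = o->exclude_kernel;
    attr->exclude_user = o->exclude_user;
    attr->disabled = 1;     /* open stopped; enable only after the program is attached */
}
```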
498+
499+
**`pid` / `cpu` rules enforced at runtime:**
500+
501+
| `pid` | `cpu` | Meaning |
502+
|---|---|---|
503+
| ≥ 0 | ≥ 0 | Specific process on specific CPU |
504+
| ≥ 0 | -1 | Specific process on any CPU |
505+
| -1 | ≥ 0 | All processes on specific CPU (system-wide) |
506+
| -1 | -1 | **Invalid** — rejected with error |
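
The rules in this table amount to a small runtime check, sketched below in C. The function name and the choice of `-EINVAL` as the error value are assumptions; the source only says that the invalid combination is rejected with an error.

```c
#include <assert.h>
#include <errno.h>

/* Returns 0 when the pid/cpu pair is acceptable, a negative errno otherwise. */
static int sketch_validate_pid_cpu(int pid, int cpu) {
    if (pid < -1 || cpu < -1)
        return -EINVAL;             /* values below -1 have no meaning */
    if (pid == -1 && cpu == -1)
        return -EINVAL;             /* "all processes on any CPU" is not allowed */
    return 0;
}
```

With the documented defaults (`pid=-1`, `cpu=0`) the check passes, which corresponds to the system-wide row of the table.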
507+
508+
**`perf_type` enum:**
509+
510+
| Value | Linux constant |
511+
|---|---|
512+
| `perf_type_hardware` | `PERF_TYPE_HARDWARE` |
513+
| `perf_type_software` | `PERF_TYPE_SOFTWARE` |
514+
| `perf_type_tracepoint` | `PERF_TYPE_TRACEPOINT` |
515+
| `perf_type_hw_cache` | `PERF_TYPE_HW_CACHE` |
516+
| `perf_type_raw` | `PERF_TYPE_RAW` |
517+
| `perf_type_breakpoint` | `PERF_TYPE_BREAKPOINT` |
518+
519+
**Common `perf_config` constants:**
520+
521+
| Value | Intended `perf_type` | Linux constant |
522+
|---|---|---|
523+
| `cpu_cycles` | `perf_type_hardware` | `PERF_COUNT_HW_CPU_CYCLES` |
524+
| `instructions` | `perf_type_hardware` | `PERF_COUNT_HW_INSTRUCTIONS` |
525+
| `cache_references` | `perf_type_hardware` | `PERF_COUNT_HW_CACHE_REFERENCES` |
526+
| `cache_misses` | `perf_type_hardware` | `PERF_COUNT_HW_CACHE_MISSES` |
527+
| `branch_instructions` | `perf_type_hardware` | `PERF_COUNT_HW_BRANCH_INSTRUCTIONS` |
528+
| `branch_misses` | `perf_type_hardware` | `PERF_COUNT_HW_BRANCH_MISSES` |
529+
| `page_faults` | `perf_type_software` | `PERF_COUNT_SW_PAGE_FAULTS` |
530+
| `context_switches` | `perf_type_software` | `PERF_COUNT_SW_CONTEXT_SWITCHES` |
531+
| `cpu_migrations` | `perf_type_software` | `PERF_COUNT_SW_CPU_MIGRATIONS` |
532+
533+
For event families with a richer config space, such as `perf_type_hw_cache`, provide the encoded kernel `perf_config` value directly instead of relying on a flattened enum.
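
For reference, the kernel encodes a `PERF_TYPE_HW_CACHE` config as cache id, operation id shifted left by 8, and result id shifted left by 16, as documented in `perf_event_open(2)`. The C sketch below computes that value; the helper name is hypothetical, and the resulting number is what would be supplied as `perf_config`.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>

/* Encode a hardware cache event: cache_id | (op_id << 8) | (result_id << 16). */
static uint64_t sketch_hw_cache_config(uint64_t cache_id, uint64_t op_id, uint64_t result_id) {
    return cache_id | (op_id << 8) | (result_id << 16);
}
```

For example, L1 data cache read misses are `sketch_hw_cache_config(PERF_COUNT_HW_CACHE_L1D, PERF_COUNT_HW_CACHE_OP_READ, PERF_COUNT_HW_CACHE_RESULT_MISS)`, which evaluates to `0x10000`.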
534+
535+
**Generated C helpers (emitted when `attach(prog, perf_options{...}, flags)` is used):**
536+
537+
| Function | Signature | Description |
538+
|---|---|---|
539+
| `ks_open_perf_event` | `int (ks_perf_options)` | Calls `perf_event_open(2)`, returns fd |
540+
| `ks_attach_perf_event` | `PerfAttachment (int prog_fd, ks_perf_options, int flags)` | Full open-reset-attach-enable lifecycle |
541+
| `ks_read_perf_count` | `int64_t (int perf_fd)` | Reads current 64-bit counter via `read()` |
542+
| `ks_perf_attachment_read` | `int64_t (PerfAttachment)` | Direct fd read through the attachment value with stale-handle detection |
543+
544+
**Attach sequence (compiler-generated, inside `ks_attach_perf_event`):**
545+
1. `ks_attr.attr.disabled = 1` — open counter without starting it
546+
2. `syscall(SYS_perf_event_open, ...)` → `perf_fd`
547+
3. `ioctl(perf_fd, PERF_EVENT_IOC_RESET, 0)` — zero the counter
548+
4. `bpf_program__attach_perf_event(prog, perf_fd)` — link BPF program
549+
5. `ioctl(perf_fd, PERF_EVENT_IOC_ENABLE, 0)` → **start counting**
550+
551+
**Detach sequence (compiler-generated):**
552+
1. `ioctl(perf_fd, PERF_EVENT_IOC_DISABLE, 0)` — stop counting
553+
2. `bpf_link__destroy(link)` — unlink BPF program
554+
3. `close(perf_fd)` — release the kernel perf event
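
The attach and detach sequences above can be sketched in plain C without libbpf. This is not the compiler's output: the `sketch_*` functions are hypothetical, the BPF link steps are left as comments, and no sample period is set, so the counter runs in counting mode only.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Attach steps 1-3 and 5: open disabled, reset, enable. Returns perf_fd or -1. */
static int sketch_open_counter(uint32_t type, uint64_t config, pid_t pid, int cpu) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = type;
    attr.config = config;
    attr.disabled = 1;                                   /* step 1: open stopped */

    int fd = (int)syscall(SYS_perf_event_open, &attr, pid, cpu,
                          -1 /* no group */, PERF_FLAG_FD_CLOEXEC);   /* step 2 */
    if (fd < 0)
        return -1;

    if (ioctl(fd, PERF_EVENT_IOC_RESET, 0) != 0) {       /* step 3: zero the counter */
        close(fd);
        return -1;
    }
    /* step 4 would link the BPF program here (bpf_program__attach_perf_event) */
    if (ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) != 0) {      /* step 5: start counting */
        close(fd);
        return -1;
    }
    return fd;
}

/* Detach steps 1 and 3: disable, then close. Returns 0 on success, -1 on error. */
static int sketch_close_counter(int fd) {
    int rc = ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);       /* stop counting */
    /* bpf_link__destroy(link) would go here */
    if (close(fd) != 0)
        rc = -1;
    return rc != 0 ? -1 : 0;
}
```

Opening the counter disabled and enabling it only after the link step means the program never misses events counted before it was attached. Whether `perf_event_open` succeeds at all depends on privileges and the `perf_event_paranoid` setting, so callers should expect `-1` in restricted environments.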
555+
556+
**Compiler implementation:**
557+
- Detects `attach(prog, perf_options_value, flags)` (three-argument form with `perf_options` second arg) and routes to `ks_attach_perf_event`
558+
- Returns a first-class `PerfAttachment` value for perf attaches so one program can hold multiple live counters
559+
- `PerfAttachment` carries `perf_fd` plus an internal generation token; `read(attachment)` avoids global attachment-list scans and rejects copied handles after detach
560+
- Exposes omitted `perf_options` fields as language-level defaults (partial struct literal)
561+
- Validates `pid ≥ -1`, `cpu ≥ -1`, and rejects `pid == -1 && cpu == -1` at runtime
562+
- Emits `PERF_FLAG_FD_CLOEXEC` so the perf fd is closed on `exec` instead of leaking into exec'd programs
563+
- BPF program section is `SEC("perf_event")`
564+
565+
**Project Initialization:**
566+
```bash
567+
# Initialize a perf_event project
568+
kernelscript init perf_event my_perf_monitor
569+
```
570+
443571
### 3.2 Named Configuration Blocks
444572
```kernelscript
445573
// Named configuration blocks - globally accessible

examples/perf_cache_miss.ks

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
1+
// perf_cache_miss.ks
2+
// Demonstrates @perf_event program type in KernelScript.
3+
// The eBPF program runs on each hardware counter sample, i.e. once per sampling period.
4+
// The userspace side opens the perf event and attaches the BPF program.
5+
6+
@perf_event
7+
fn on_cache_miss(ctx: *bpf_perf_event_data) -> i32 {
8+
return 0
9+
}
10+
11+
fn main() -> i32 {
12+
var prog = load(on_cache_miss)
13+
14+
// Only perf_type + perf_config are required. Omitted fields default to
15+
// pid=-1 (all procs), cpu=0, wakeup=1, exclude_kernel/exclude_user=false.
16+
// period and inherit are overridden below (their defaults are 1_000_000 and false).
17+
var cache = attach(prog, perf_options { perf_type: perf_type_hardware, perf_config: cache_misses, period: 10000000, inherit: true }, 0)
18+
var branch = attach(prog, perf_options { perf_type: perf_type_hardware, perf_config: branch_misses, period: 10000000, inherit: true }, 0)
19+
print("Cache-miss and branch-miss perf_event demo attached")
20+
var cache_count = read(cache)
21+
print("Cache-miss count: %lld", cache_count)
22+
var branch_count = read(branch)
23+
print("Branch-miss count: %lld", branch_count)
24+
25+
detach(cache)
26+
detach(branch)
27+
detach(prog)
28+
print("Cache-miss and branch-miss perf_event demo detached")
29+
return 0
30+
}
