Java performance profiling for Kubernetes services. Find where a HotSpot JVM is spending CPU, allocating memory, waiting on locks, pausing for GC, or blocking on Java I/O, using real async-profiler/JFR-derived data and a service-focused UI.
Docs · 中文文档 · Quickstart · Analyze a service · Contributing
Most observability stacks tell you that a Java service is slow. java-profiler is for the next question: which Java stack is responsible?
- Kubernetes-native opt-in: enable profiling with annotations or labels. No application code changes.
- Real JVM profile data: CPU, Wall Clock, allocation, lock-delay, Java I/O wait, and GC evidence come from async-profiler/JFR-derived collection.
- Expert Java workbench: Top Table, Flame Graph, Both mode, selected-frame details, allocation summary, target status, deadlocks, profile evidence guidance, and ingestion health in one workflow.
- Ownable storage: profile data lands in ClickHouse with retention bounded to 7 days or less.
- Focused scope: no required Pyroscope, Parca, or Grafana backend.
- Built for proof: real acceptance requires non-empty CPU, Wall Clock, Java I/O wait, GC, allocation, lock, ClickHouse, ingestion, and browser UI evidence.
Enable temporary profiling on a workload pod template:
metadata:
  annotations:
    java-profiler.io/profile-mode: temporary
    java-profiler.io/profile-duration: 15m

Open the Web UI, select the namespace, service, and time range, then start with:
- status to confirm the JVM was accepted.
- cpu to find expensive Java methods.
- wall when latency is not explained by CPU alone.
- io to isolate Java-owned socket or file blocking paths.
- gc to correlate JVM pause evidence with allocation pressure.
- memory to inspect allocation pressure with Allocation Summary, Top allocating paths, Top self allocating frames, and flamegraph context.
- locks and deadlocks to investigate contention.
- ingestion to confirm profile batches were accepted.
See the Quickstart and Performance Analysis Manual.
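As a rough illustration of how the temporary-profiling annotations above could be interpreted, the following Go sketch turns the mode and duration into a time window and checks whether that window has expired. It is not the collector's actual code: the `window` type and the `temporaryWindow` and `expired` names are hypothetical, and it assumes the duration value follows Go's duration syntax (for example `15m`).

```go
package main

import (
	"fmt"
	"time"
)

// Annotation keys as they appear in the pod template example.
const (
	modeKey     = "java-profiler.io/profile-mode"
	durationKey = "java-profiler.io/profile-duration"
)

// window is a hypothetical type describing when temporary profiling is active.
type window struct {
	Start, End time.Time
}

// temporaryWindow returns the profiling window requested by the annotations,
// measured from start. Assumption: the duration uses Go syntax such as "15m".
func temporaryWindow(annotations map[string]string, start time.Time) (window, error) {
	if mode := annotations[modeKey]; mode != "temporary" {
		return window{}, fmt.Errorf("%s must be %q, got %q", modeKey, "temporary", mode)
	}
	d, err := time.ParseDuration(annotations[durationKey])
	if err != nil {
		return window{}, fmt.Errorf("invalid %s: %w", durationKey, err)
	}
	if d <= 0 {
		return window{}, fmt.Errorf("%s must be positive, got %s", durationKey, d)
	}
	return window{Start: start, End: start.Add(d)}, nil
}

// expired reports whether the window has ended at time now.
func (w window) expired(now time.Time) bool {
	return !now.Before(w.End)
}

func main() {
	start := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	w, err := temporaryWindow(map[string]string{
		modeKey:     "temporary",
		durationKey: "15m",
	}, start)
	if err != nil {
		panic(err)
	}
	// Twenty minutes after the start, a 15-minute window is over.
	fmt.Println(w.End.Sub(start), w.expired(start.Add(20*time.Minute)))
}
```

A window that has ended corresponds to the "expired temporary windows" state that the status view reports; how the real collector tracks that state may differ.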
- CPU hotspots: high-cost Java methods, self time, total time, and sampled stack context.
- Wall Clock latency: Java stack time spent runnable, blocked, waiting, sleeping, or doing I/O.
- Java I/O wait: socket or file blocking paths when JVM/JFR evidence preserves Java ownership.
- GC pauses: JVM GC event evidence correlated with allocation profiles and the incident window.
- Allocation hotspots: methods and call paths creating allocation pressure.
- Allocation summary: scoped sampled-allocation totals, top allocating paths, top self allocating frames, insight categories, partial-result limits, and clear empty-state reasons.
- Lock delay: synchronized or monitor paths that block under contention.
- Thread evidence: snapshots for CPU, lock, sleep, blocked, and waiting states.
- Deadlock evidence: deadlock cycles reported by the target JVM.
- Profiling health: accepted, disabled, unsupported, attach failure, profiler conflict, expired temporary windows, missing matching targets, rejected upload, or dropped ingestion data.
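The distinction between self time and total time in the CPU hotspot view can be made concrete with a small aggregation over sampled stacks. The Go sketch below is an illustration only, not the backend's query logic: the `sample` and `frameCost` types and the `topTable` function are hypothetical names, and the frames are made-up method names. A frame's self count covers samples where it was the leaf of the stack; its total count covers samples where it appeared anywhere on the stack, counted once per stack even if it recurses.

```go
package main

import (
	"fmt"
	"sort"
)

// sample is one sampled stack, ordered root first and leaf last,
// with the number of times that exact stack was observed.
type sample struct {
	Frames []string
	Count  int
}

// frameCost holds the sample counts attributed to one method.
type frameCost struct {
	Frame string
	Self  int // samples where the method was the leaf
	Total int // samples where the method appeared anywhere on the stack
}

// topTable aggregates samples into per-frame self and total counts,
// sorted by self count descending, with ties broken by frame name.
func topTable(samples []sample) []frameCost {
	costs := map[string]*frameCost{}
	get := func(name string) *frameCost {
		c, ok := costs[name]
		if !ok {
			c = &frameCost{Frame: name}
			costs[name] = c
		}
		return c
	}
	for _, s := range samples {
		if len(s.Frames) == 0 {
			continue
		}
		get(s.Frames[len(s.Frames)-1]).Self += s.Count
		seen := map[string]bool{} // count a recursive frame once per stack
		for _, f := range s.Frames {
			if !seen[f] {
				seen[f] = true
				get(f).Total += s.Count
			}
		}
	}
	rows := make([]frameCost, 0, len(costs))
	for _, c := range costs {
		rows = append(rows, *c)
	}
	sort.Slice(rows, func(i, j int) bool {
		if rows[i].Self != rows[j].Self {
			return rows[i].Self > rows[j].Self
		}
		return rows[i].Frame < rows[j].Frame
	})
	return rows
}

func main() {
	rows := topTable([]sample{
		{Frames: []string{"Server.handle", "Json.parse"}, Count: 7},
		{Frames: []string{"Server.handle", "Db.query"}, Count: 2},
		{Frames: []string{"Server.handle"}, Count: 1},
	})
	for _, r := range rows {
		fmt.Printf("%-14s self=%d total=%d\n", r.Frame, r.Self, r.Total)
	}
}
```

In this toy input, `Server.handle` has a low self count but the highest total, which is the typical shape of an entry point whose cost lies in its callees.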
Kubernetes metadata
|
v
Node-local collector DaemonSet
|
v
async-profiler/JFR + thread diagnostics
|
v
Backend API -> ClickHouse
|
v
Service diagnosis UI
The first version targets Java services running on Kubernetes, starting with HotSpot-compatible JVMs. Profiling is controlled through Kubernetes metadata, collected node-locally, stored in ClickHouse, and exposed through a compact UI for service owners and platform engineers.
These screenshots come from a real Kubernetes acceptance environment, not mocked UI state. The allocation screenshot reflects the current wide analysis layout with summary cards, Top allocating paths, Top self allocating frames, and flamegraph context.
- Target status evidence
- CPU profile analysis
- Allocation evidence
- Wall Clock latency evidence
- Java I/O wait evidence
- GC pause and allocation correlation
- Deadlock diagnosis surface
- Ingestion health evidence
Regenerate them from a port-forwarded real UI:
export REAL_ACCEPTANCE_BASE_URL=http://127.0.0.1:18081
export REAL_ACCEPTANCE_NAMESPACE=java-profiler-qa
export REAL_ACCEPTANCE_SERVICE=jdk17-http-demo
node scripts/capture-doc-screenshots.mjs

Run local checks before changing profiling, ingestion, backend APIs, or UI behavior:
go test ./...
javac --release 11 java-helper/thread-diagnostics/src/main/java/com/ebpfjava/threads/*.java
cd examples/jdk17-http-demo && mvn test
cd ../../web && npm ci && npm test && npm run build

Build the docs site:
cd docs
npm install
npm run docs:build

For changes touching collector profiling, ingestion, ClickHouse storage, backend query APIs, deployment, the demo service, or profile UI, run real Kubernetes acceptance. See Contributing and the Real Profiling Acceptance Standard.
If you are validating an existing non-demo workload, keep the same acceptance workflow but set JAVA_PROFILER_ACCEPTANCE_LOAD_PATHS to one or more HTTP paths that actually exist on that service.
- Online docs
- 中文文档
- Quickstart
- Analyze a Java service
- Enable profiling
- Deploy and operate the platform
- Development setup
- Localization policy
- Architecture
The first version does not include non-Java profiling, OpenJ9 support, heap dump analysis, distributed ClickHouse, tracing, log analysis, service maps, dashboarding, alerting, or Prometheus metric storage.
Metrics may be exposed by collector/backend exporters, but Prometheus-compatible systems own metric storage, dashboards, alerting, and retention.
