java-profiler

Java performance profiling for Kubernetes services. Find where a HotSpot JVM is spending CPU, allocating memory, waiting on locks, pausing for GC, or blocking on Java I/O, using real async-profiler/JFR-derived data and a service-focused UI.

Docs · Chinese documentation · Quickstart · Analyze a service · Contributing

Why java-profiler

Most observability stacks tell you that a Java service is slow. java-profiler is for the next question: which Java stack is responsible?

  • Kubernetes-native opt-in: enable profiling with annotations or labels. No application code changes.
  • Real JVM profile data: CPU, Wall Clock, allocation, lock-delay, Java I/O wait, and GC evidence come from async-profiler/JFR-derived collection.
  • Expert Java workbench: Top Table, Flame Graph, Both mode, selected-frame details, allocation summary, target status, deadlocks, profile evidence guidance, and ingestion health in one workflow.
  • Ownable storage: profile data lands in ClickHouse with retention bounded to 7 days or less.
  • Focused scope: no required Pyroscope, Parca, or Grafana backend.
  • Built for proof: real acceptance requires non-empty CPU, Wall Clock, Java I/O wait, GC, allocation, lock, ClickHouse, ingestion, and browser UI evidence.

Quickstart

Enable temporary profiling on a workload pod template:

metadata:
  annotations:
    java-profiler.io/profile-mode: temporary
    java-profiler.io/profile-duration: 15m

Open the Web UI, select the namespace, service, and time range, then start with:

  • status to confirm the JVM was accepted.
  • cpu to find expensive Java methods.
  • wall when latency is not explained by CPU alone.
  • io to isolate Java-owned socket or file blocking paths.
  • gc to correlate JVM pause evidence with allocation pressure.
  • memory to inspect allocation pressure with Allocation Summary, Top allocating paths, Top self allocating frames, and flamegraph context.
  • locks and deadlocks to investigate contention.
  • ingestion to confirm profile batches were accepted.

See the Quickstart and Performance Analysis Manual.

What it analyzes

  • CPU hotspots: high-cost Java methods, self time, total time, and sampled stack context.
  • Wall Clock latency: Java stack time spent runnable, blocked, waiting, sleeping, or doing I/O.
  • Java I/O wait: socket or file blocking paths when JVM/JFR evidence preserves Java ownership.
  • GC pauses: JVM GC event evidence correlated with allocation profiles and the incident window.
  • Allocation hotspots: methods and call paths creating allocation pressure.
  • Allocation summary: scoped sampled-allocation totals, top allocating paths, top self allocating frames, insight categories, partial-result limits, and clear empty-state reasons.
  • Lock delay: synchronized or monitor paths that block under contention.
  • Thread evidence: snapshots for CPU, lock, sleep, blocked, and waiting states.
  • Deadlock evidence: deadlock cycles reported by the target JVM.
  • Profiling health: accepted, disabled, unsupported, attach failure, profiler conflict, expired temporary windows, missing matching targets, rejected upload, or dropped ingestion data.
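
The difference between self time and total time in the CPU view can be sketched in a few lines of Go. The example assumes samples arrive as collapsed-stack text (`root;...;leaf count`), a common flame graph input format; how java-profiler actually stores and aggregates samples is not described here, and the frame names are made up.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// frameCost holds sample counts for one frame.
type frameCost struct {
	Self  int // samples where the frame was the leaf (top of stack)
	Total int // samples where the frame appeared anywhere in the stack
}

// aggregate parses collapsed-stack lines ("root;...;leaf count") and
// returns per-frame self and total sample counts.
func aggregate(lines []string) (map[string]*frameCost, error) {
	costs := make(map[string]*frameCost)
	get := func(name string) *frameCost {
		if costs[name] == nil {
			costs[name] = &frameCost{}
		}
		return costs[name]
	}
	for _, line := range lines {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		i := strings.LastIndex(line, " ")
		if i < 0 {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		n, err := strconv.Atoi(line[i+1:])
		if err != nil {
			return nil, fmt.Errorf("bad sample count in %q: %w", line, err)
		}
		frames := strings.Split(line[:i], ";")
		seen := make(map[string]bool) // count a recursive frame once per stack
		for _, f := range frames {
			if !seen[f] {
				seen[f] = true
				get(f).Total += n
			}
		}
		get(frames[len(frames)-1]).Self += n
	}
	return costs, nil
}

func main() {
	// Hypothetical samples; frame names are illustrative only.
	costs, err := aggregate([]string{
		"Server.handle;OrderService.price;TaxTable.lookup 6",
		"Server.handle;OrderService.price 3",
		"Server.handle;Json.encode 1",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, name := range []string{"Server.handle", "OrderService.price", "TaxTable.lookup"} {
		c := costs[name]
		fmt.Printf("%s self=%d total=%d\n", name, c.Self, c.Total)
	}
}
```

In this sample, `Server.handle` appears in every stack (total 10) but is never the leaf (self 0), so the method worth optimizing is further down the call path, here `TaxTable.lookup` with 6 self samples.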

How it works

Kubernetes metadata
        |
        v
Node-local collector DaemonSet
        |
        v
async-profiler/JFR + thread diagnostics
        |
        v
Backend API -> ClickHouse
        |
        v
Service diagnosis UI

The first version targets Java services running on Kubernetes, starting with HotSpot-compatible JVMs. Profiling is controlled through Kubernetes metadata, collected node-locally, stored in ClickHouse, and exposed through a compact UI for service owners and platform engineers.

Screenshots

These screenshots come from a real Kubernetes acceptance environment, not mocked UI state. The allocation screenshot reflects the current wide analysis layout with summary cards, Top allocating paths, Top self allocating frames, and flamegraph context.

[Screenshot: Real allocation profile analysis from the acceptance environment]

Regenerate them from a port-forwarded real UI:

export REAL_ACCEPTANCE_BASE_URL=http://127.0.0.1:18081
export REAL_ACCEPTANCE_NAMESPACE=java-profiler-qa
export REAL_ACCEPTANCE_SERVICE=jdk17-http-demo
node scripts/capture-doc-screenshots.mjs

Develop

Run local checks before changing profiling, ingestion, backend APIs, or UI behavior:

go test ./...
javac --release 11 java-helper/thread-diagnostics/src/main/java/com/ebpfjava/threads/*.java
cd examples/jdk17-http-demo && mvn test
cd ../../web && npm ci && npm test && npm run build

Build the docs site:

cd docs
npm install
npm run docs:build

For changes touching collector profiling, ingestion, ClickHouse storage, backend query APIs, deployment, the demo service, or profile UI, run real Kubernetes acceptance. See Contributing and the Real Profiling Acceptance Standard.

If you are validating an existing non-demo workload, keep the same acceptance workflow but set JAVA_PROFILER_ACCEPTANCE_LOAD_PATHS to one or more HTTP paths that actually exist on that service.

Documentation

Scope

The first version does not include non-Java profiling, OpenJ9 support, heap dump analysis, distributed ClickHouse, tracing, log analysis, service maps, dashboarding, alerting, or Prometheus metric storage.

Metrics may be exposed by collector/backend exporters, but Prometheus-compatible systems own metric storage, dashboards, alerting, and retention.