Heap Dump Analysis Tutorial

This tutorial teaches you how to use JAFAR's heap dump analysis capabilities for exploring and diagnosing Java heap dumps (HPROF files). You'll learn the HdumpPath query language and common analysis patterns.

Table of Contents

  1. Prerequisites
  2. Creating a Heap Dump
  3. Opening a Heap Dump
  4. Basic Queries
  5. Object Analysis
  6. Class Analysis
  7. GC Root Analysis
  8. Memory Leak Detection
  9. What-If Simulation
  10. Object Age Estimation
  11. Cache Profiling
  12. Heap Health Report
  13. Common Analysis Patterns
  14. Tips and Best Practices

Prerequisites

  • JAFAR shell installed (see main README for installation)
  • A Java heap dump file (.hprof format)
  • Basic understanding of Java memory concepts

Creating a Heap Dump

If you don't have a heap dump, create one from a running Java process:

Using jcmd (Recommended)

# Find Java process ID
jps -l

# Create heap dump
jcmd <pid> GC.heap_dump /path/to/dump.hprof

Using jmap

jmap -dump:format=b,file=/path/to/dump.hprof <pid>

Programmatically

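import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean bean = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Second argument: true dumps only live (reachable) objects
        bean.dumpHeap("/path/to/dump.hprof", true);
    }
}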

On OutOfMemoryError

Add these JVM flags to dump the heap automatically when an OutOfMemoryError occurs:

java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dumps/ MyApp

Opening a Heap Dump

Start the shell and open your heap dump:

# Start with a heap dump
jafar-shell /path/to/dump.hprof

# Or start empty and open later
jafar-shell
jafar> open /path/to/dump.hprof

When a heap dump is opened, you'll see basic statistics:

Opened heap dump: dump.hprof
  Size: 158 MB
  Objects: 1,923,456
  Classes: 16,831
  GC Roots: 4,200

Basic Queries

Count Objects

# Total objects
hdump> objects | count

# String objects
hdump> objects/java.lang.String | count

# Objects larger than 1KB
hdump> objects[shallow > 1KB] | count

View Sample Objects

# First 10 objects
hdump> objects | head(10)

# First 10 strings
hdump> objects/java.lang.String | head(10)

Get Statistics

# Shallow size statistics for all objects
hdump> objects | stats(shallow)

| count   | sum         | min | max      | avg   |
|---------|-------------|-----|----------|-------|
| 1923456 | 78,345,280  | 16  | 1048576  | 40.7  |

Object Analysis

Find Large Objects

# Top 10 by shallow size
hdump> objects | top(10, shallow)

# Large objects (>1MB)
hdump> objects[shallow > 1MB] | select(class, shallow)

Analyze by Class

# Group objects by class
hdump> objects | groupBy(class, agg=count) | top(10, count)

| class                      | count   |
|---------------------------|---------|
| java.util.HashMap$Node    | 249,734 |
| java.lang.String          | 238,750 |
| java.lang.Object[]        | 156,234 |
| char[]                    | 145,632 |

Memory by Class

# Total memory by class (sum shallow sizes)
hdump> objects | groupBy(class, agg=sum) | top(10, sum)

| class                | sum         |
|---------------------|-------------|
| byte[]              | 45,234,567  |
| char[]              | 12,345,678  |
| java.lang.String    | 9,072,500   |

String Analysis

Strings often dominate heap usage:

# String statistics
hdump> objects/java.lang.String | stats(shallow)

# Large strings
hdump> objects/java.lang.String[shallow > 1KB] | top(10, shallow)

# Count of strings with a shallow size above 100 bytes
hdump> objects/java.lang.String[shallow > 100] | count

Array Analysis

Arrays can consume significant memory:

# Large arrays
hdump> objects[arrayLength > 10000] | top(10, shallow)

# Byte arrays (common for buffers)
hdump> objects/byte[] | stats(shallow)

# Object arrays
hdump> objects/java.lang.Object[] | top(10, shallow)

Class Analysis

Class Overview

# All classes
hdump> classes | count

# Classes with most instances
hdump> classes | top(10, instanceCount)

| name                        | instanceCount |
|----------------------------|---------------|
| java.util.HashMap$Node     | 249,734       |
| java.lang.String           | 238,750       |
| java.lang.Object[]         | 156,234       |

Find Specific Classes

# Search by name pattern
hdump> classes/java.util.* | select(name, instanceCount)

# Classes with many instances
hdump> classes[instanceCount > 1000] | sortBy(instanceCount desc)

Class Hierarchy

# Find implementations of Map
hdump> objects/instanceof/java.util.Map | groupBy(class) | top(10, count)

| class                          | count  |
|-------------------------------|--------|
| java.util.HashMap             | 12,345 |
| java.util.LinkedHashMap       | 5,678  |
| java.util.concurrent.ConcurrentHashMap | 1,234 |

Collection Analysis

# All collections
hdump> objects/instanceof/java.util.Collection | groupBy(class) | top(10, count)

# List implementations
hdump> objects/instanceof/java.util.List | groupBy(class) | top(5, count)

# Set implementations
hdump> objects/instanceof/java.util.Set | groupBy(class) | top(5, count)

GC Root Analysis

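GC roots are the starting points of reachability (thread stacks, thread objects, JNI global references, system classes). Any object reachable from a root stays alive.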

Root Distribution

# GC roots by type
hdump> gcroots | groupBy(type, agg=count) | sortBy(count desc)

| type         | count |
|-------------|-------|
| JAVA_FRAME  | 2,345 |
| THREAD_OBJ  | 1,234 |
| JNI_GLOBAL  | 456   |
| STICKY_CLASS| 165   |

Thread Roots

# Thread-related roots
hdump> gcroots/THREAD_OBJ | head(10)

# Java frame roots (stack variables)
hdump> gcroots/JAVA_FRAME | head(10)

JNI References

# JNI global references (potential leaks)
hdump> gcroots/JNI_GLOBAL | head(20)

hdump> gcroots/JNI_GLOBAL | count

Memory Leak Detection

Built-in Leak Detectors

The shell includes six built-in leak detectors, which you run with checkLeaks:

# Run all detectors (interactive wizard)
hdump> checkLeaks()

# Run a specific detector
hdump> checkLeaks(detector="threadlocal-leak")

Available detectors:

| Detector            | Description                                                              |
|---------------------|--------------------------------------------------------------------------|
| threadlocal-leak    | Find ThreadLocal instances with large retained sizes attached to threads |
| classloader-leak    | Detect ClassLoader instances that should be GC'd but are retained        |
| duplicate-strings   | Find identical string values with high instance counts                   |
| growing-collections | Detect collections (HashMap, ArrayList, etc.) with large retained sizes  |
| listener-leak       | Find event listeners that may not have been deregistered                 |
| finalizer-queue     | Detect objects stuck in the finalizer queue                              |

Detector parameters:

# Set minimum size threshold (default varies by detector)
hdump> checkLeaks(detector="growing-collections", minSize=10MB)

# Set count threshold
hdump> checkLeaks(detector="duplicate-strings", threshold=100)

Retained Size Analysis

Retained size is the total memory that would be freed if an object were garbage collected: its own shallow size plus the size of everything reachable only through it.

# Top objects by retained size
hdump> objects | top(10, retained)

# Classes by total retained memory
hdump> objects | groupBy(class) | top(10, retained)

# Large retained objects of specific type
hdump> objects/java.util.HashMap[retained > 10MB] | top(10, retained)
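
To make the definition of retained size concrete, here is a minimal Python sketch on a toy object graph. It measures what becomes unreachable once an object is removed. The graph, the sizes, and the function names are hypothetical, and this is not how JAFAR computes retained sizes (the report section describes them as approximate).

```python
from collections import deque

def reachable(roots, refs, removed=None):
    """Return the set of object ids reachable from roots, skipping `removed`."""
    seen = set()
    queue = deque(r for r in roots if r != removed)
    while queue:
        obj = queue.popleft()
        if obj in seen:
            continue
        seen.add(obj)
        for target in refs.get(obj, ()):
            if target != removed and target not in seen:
                queue.append(target)
    return seen

def retained_size(obj, roots, refs, shallow):
    """Shallow size of obj plus everything reachable only through obj."""
    before = reachable(roots, refs)
    if obj not in before:
        return 0  # already unreachable, nothing is retained
    after = reachable(roots, refs, removed=obj)
    return sum(shallow[o] for o in before - after)

# Hypothetical toy heap: root -> cache -> {a, b}, root -> shared, a -> shared
refs = {"root": ["cache", "shared"], "cache": ["a", "b"], "a": ["shared"]}
shallow = {"root": 16, "cache": 48, "a": 24, "b": 24, "shared": 32}
cache_retained = retained_size("cache", ["root"], refs, shallow)
```

Here the cache retains itself plus a and b, but not shared, because shared is also referenced directly from the root.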

Path to GC Root

Find why an object is kept alive:

# Find retention path for a specific object
hdump> objects[id = 0x12345678] | pathToRoot

# Find path for largest HashMap
hdump> objects/java.util.HashMap | top(1, retained) | pathToRoot
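
Conceptually, a retention path is a chain of references from a GC root down to the object. The sketch below finds one shortest chain with a breadth-first search over inbound references. It is only an illustration, assuming a hypothetical in-memory graph. The tutorial does not describe how pathToRoot is implemented.

```python
from collections import deque

def path_to_root(target, roots, refs):
    """Return one shortest reference chain from a GC root to target, or None."""
    # Build reverse edges: referent -> list of referrers
    inbound = {}
    for src, targets in refs.items():
        for dst in targets:
            inbound.setdefault(dst, []).append(src)

    root_set = set(roots)
    parent = {target: None}  # maps each visited object to the next hop toward target
    queue = deque([target])
    while queue:
        obj = queue.popleft()
        if obj in root_set:
            # Follow the recorded hops from the root back down to the target
            path = []
            while obj is not None:
                path.append(obj)
                obj = parent[obj]
            return path
        for referrer in inbound.get(obj, ()):
            if referrer not in parent:
                parent[referrer] = obj
                queue.append(referrer)
    return None

# Hypothetical graph: thread -> session -> map -> entry
refs = {"thread": ["session"], "session": ["map"], "map": ["entry"]}
chain = path_to_root("entry", ["thread"], refs)
```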

Dominator Tree

See which objects dominate others, that is, which objects sit on every path from a GC root to the objects they keep alive:

# Dominator tree for an object
hdump> objects[id = 0x12345678] | dominators

# Dominators grouped by class
hdump> objects[id = 0x12345678] | dominators(groupBy="class")

# Full dominator tree view
hdump> objects[id = 0x12345678] | dominators("tree")

Collection Waste Analysis

Find over-allocated collections wasting memory:

# Analyze HashMap waste — find over-allocated instances
hdump> objects/java.util.HashMap | waste() | sortBy(wastedBytes desc) | top(20)

# Find collections with large capacity but few entries
hdump> objects/java.util.HashMap | waste() | filter(loadFactor < 0.1) | top(20)

# Aggregate waste by collection class (analyze each supported type separately)
hdump> objects/java.util.HashMap | waste() | groupBy(class, agg=sum, value=wastedBytes)
hdump> objects/java.util.ArrayList | waste() | groupBy(class, agg=sum, value=wastedBytes)

Automatic Leak Cluster Detection

The clusters root uses graph-based analysis to detect unknown leak patterns by identifying densely connected subgraphs with high retained size but few GC root anchors.

# List all detected clusters ranked by suspiciousness
hdump> clusters | sortBy(score desc) | head(10)

# Filter for significant clusters
hdump> clusters | filter(retainedSize > 10MB)

# Drill into a specific cluster to see its member objects
hdump> clusters[id = 3] | objects | sortBy(retained desc) | head(10)

# Find clusters anchored by thread objects
hdump> clusters | filter(anchorType = "THREAD_OBJ") | sortBy(retainedSize desc)

The leak score (retainedSize / rootPathCount) indicates how suspicious a cluster is: fewer GC root paths holding more memory is more suspicious.
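
The score formula can be tried out on made-up numbers. The following Python sketch ranks two hypothetical clusters by retainedSize / rootPathCount. The field names mirror the columns used above, and the handling of a zero path count is an assumption.

```python
def leak_score(retained_size, root_path_count):
    """Score from the text: retainedSize / rootPathCount."""
    if root_path_count <= 0:
        # Assumption: a cluster with no root path gets score 0
        return 0.0
    return retained_size / root_path_count

MB = 1024 * 1024
clusters = [
    {"id": 1, "retainedSize": 50 * MB, "rootPathCount": 25},
    {"id": 2, "retainedSize": 20 * MB, "rootPathCount": 1},
]
ranked = sorted(
    clusters,
    key=lambda c: leak_score(c["retainedSize"], c["rootPathCount"]),
    reverse=True,
)
```

Cluster 2 ranks first even though it retains less memory, because a single root path holds all of it.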

Manual Leak Patterns

Pattern 1: Unusually Large Instance Counts

# Classes with most instances
hdump> classes | top(20, instanceCount)

# Focus on application classes
hdump> classes/com.myapp.* | top(10, instanceCount)

Pattern 2: Growing Collections

# Large HashMaps by retained size
hdump> objects/java.util.HashMap | top(10, retained)

# Large ArrayLists
hdump> objects/java.util.ArrayList[arrayLength > 1000] | head(10)

Pattern 3: Duplicate Strings

# Use the built-in detector
hdump> checkLeaks(detector="duplicate-strings", threshold=50)

# Or start manually with overall String statistics
hdump> objects/java.lang.String | stats(shallow)

Pattern 4: Cache Analysis

# Map implementations by instance count
hdump> objects/instanceof/java.util.Map | groupBy(class, agg=count)

# Classes whose names suggest a cache
hdump> classes/*Cache* | select(name, instanceCount)

What-If Simulation

The whatif() pipeline operator answers prioritization questions such as "If I fix this leak, how much memory would be freed?" Pipe any object or GC root query into whatif() to get a simulation result.

Output columns: action, targetCount, freedBytes, freedObjects, freedPct, remainingRetained.

# How much memory would be freed by fixing LeakyCache?
hdump> objects/com.example.LeakyCache | whatif()

# Focus on the key metrics
hdump> objects/com.example.LeakyCache | whatif() | select(freedBytes, freedPct)

# Simulate freeing thread-owned objects
hdump> gcroots/THREAD_OBJ | whatif()

# Compare impact of multiple candidates
hdump> objects/com.example.LeakyCache | whatif()
hdump> objects/com.example.SessionStore | whatif()

# Filter before simulating — only large instances
hdump> objects/com.example.Foo[retained > 1MB] | whatif()

# Target a specific object by ID
hdump> objects[id = 12345] | whatif()

Note: freedBytes is the sum of retained sizes of the input objects. When a full dominator tree has been computed, freedObjects reflects the total dominated subtree; otherwise it equals targetCount.
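
As a rough illustration of the arithmetic in the note, the sketch below sums the retained sizes of the input objects. The function name and the definition of freedPct (freed bytes relative to total heap bytes) are assumptions, since the tutorial does not define that column.

```python
def whatif(retained_sizes, total_heap_bytes):
    """Summarize the effect of freeing objects with the given retained sizes."""
    freed_bytes = sum(retained_sizes)
    # Assumption: freedPct is relative to the total heap size in bytes
    freed_pct = 100.0 * freed_bytes / total_heap_bytes if total_heap_bytes else 0.0
    return {
        "targetCount": len(retained_sizes),
        "freedBytes": freed_bytes,
        "freedPct": freed_pct,
    }

result = whatif([1_000_000, 3_000_000], total_heap_bytes=16_000_000)
```

A plain sum like this can over-count when one input object is already retained by another, so treat the result as an upper bound in that case.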

Object Age Estimation

The ages root and estimateAge() operator estimate how long objects have been alive using structural signals from a single heap snapshot — no second dump required.

Scoring signals (0–100):

  • GC root type: STICKY_CLASS (system classes) → +30, JNI_GLOBAL → +25, THREAD_OBJ → +5
  • Inbound reference count: min(25, refCount × 2)
  • BFS depth from GC root: max(0, 10 − depth) (shallower = older)

Age buckets: ephemeral (0–25), medium (26–50), tenured (51–75), permanent (76–100)
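
The signals and buckets above can be sketched in a few lines of Python. This assumes the three signals are simply added and clamped to 0–100, and that root types not listed contribute nothing. Both are assumptions, not documented behavior. The three listed signals sum to at most 65, so this sketch never reaches the permanent bucket, and the real scorer presumably combines or weighs its inputs differently.

```python
# Points per GC root type, as listed above; other types assumed to add 0
ROOT_TYPE_POINTS = {"STICKY_CLASS": 30, "JNI_GLOBAL": 25, "THREAD_OBJ": 5}

def estimate_age(root_type, inbound_refs, depth):
    """Combine the three listed signals into a 0-100 score (assumed: plain sum)."""
    score = ROOT_TYPE_POINTS.get(root_type, 0)
    score += min(25, inbound_refs * 2)   # inbound reference count
    score += max(0, 10 - depth)          # shallower BFS depth scores higher
    return max(0, min(100, score))

def age_bucket(score):
    """Map a score to the bucket names used by ageBucket."""
    if score <= 25:
        return "ephemeral"
    if score <= 50:
        return "medium"
    if score <= 75:
        return "tenured"
    return "permanent"

system_class_score = estimate_age("STICKY_CLASS", inbound_refs=20, depth=0)
thread_local_score = estimate_age("THREAD_OBJ", inbound_refs=3, depth=4)
```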

# Distribution of all objects by age bucket
hdump> ages | groupBy(ageBucket, agg=count)

# Oldest objects by retained size
hdump> ages | filter(ageBucket = "tenured") | top(10, retained)

# Class-filtered age view
hdump> ages/com.example.SessionStore | select(id, estimatedAge, ageBucket)

# As a pipeline operator on any object stream
hdump> objects/java.util.HashMap | estimateAge() | sortBy(estimatedAge desc) | top(20)

# Suspicious: WeakReferences that should be short-lived but are old
hdump> objects/java.lang.ref.WeakReference | estimateAge() | filter(ageBucket = "tenured") | top(20)

Note: Age scores are heuristic estimates based on structural signals, not GC generation metadata. Objects not reachable from any GC root default to score 20 (ephemeral). The first ages or estimateAge() call triggers a two-pass scan; subsequent calls use the cached result.

Cache Profiling

The cacheStats() operator enriches Map-based object rows with five additional columns:

| Field        | Type    | Description                                 |
|--------------|---------|---------------------------------------------|
| entryCount   | int     | Number of entries currently in the map      |
| maxSize      | int     | Internal table capacity; -1 if unknown      |
| fillRatio    | double  | entryCount / capacity; 0.0 if unknown       |
| costPerEntry | long    | retainedSize / entryCount; 0 if empty       |
| isLruMode    | boolean | true if LinkedHashMap with accessOrder=true |
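
The derived columns follow directly from the field definitions. A minimal Python sketch of that arithmetic, with a hypothetical function name and integer division assumed for costPerEntry because the column is a long:

```python
def cache_stats(entry_count, capacity, retained_size, access_order=False):
    """Compute the five cacheStats() columns from raw map properties."""
    known = capacity is not None and capacity > 0
    return {
        "entryCount": entry_count,
        "maxSize": capacity if known else -1,
        "fillRatio": entry_count / capacity if known else 0.0,
        # Assumption: integer division, since costPerEntry is typed as long
        "costPerEntry": retained_size // entry_count if entry_count > 0 else 0,
        "isLruMode": bool(access_order),
    }

row = cache_stats(entry_count=12, capacity=16, retained_size=1200, access_order=True)
empty = cache_stats(entry_count=0, capacity=None, retained_size=64)
```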

Basic Usage

# Analyze all LinkedHashMap instances
hdump> objects/java.util.LinkedHashMap | cacheStats()

# Find LRU caches (LinkedHashMap with accessOrder=true)
hdump> objects/java.util.LinkedHashMap | cacheStats() | filter(isLruMode = true)

# Top 10 most expensive caches by retained bytes per entry
hdump> objects/instanceof/java.util.Map | cacheStats() | top(10, costPerEntry)

# Find under-utilized maps (< 10% fill)
hdump> objects/java.util.HashMap | cacheStats() | filter(fillRatio < 0.1) | sortBy(entryCount desc)

Supported Types

cacheStats() recognizes HashMap, LinkedHashMap, and WeakHashMap directly. It also attempts to analyze any object that exposes size and table fields, covering most Map subclasses. Rows for unrecognized types receive null values in all cache columns.

Heap Health Report

The report command runs a set of non-destructive analyses and produces a severity-ranked narrative. It is the quickest way to get an overview of a heap dump.

Usage

# Plain-text report
hdump> report

# Markdown format
hdump> report --format=markdown

# Restrict to specific areas
hdump> report --focus=duplicates,waste

Report Structure

=== Heap Health Report ===
Heap: myapp.hprof (158 MB, 1,923,456 objects, 16,831 classes)

--- WARNING ---

[W1] Duplicate subgraphs: 48 groups, ~3.2 MB reclaimable
     Most wasteful: com.example.Config (12 copies)
     Action: Run: duplicates | sortBy(wastedBytes desc)
     Query: duplicates | sortBy(wastedBytes desc)

--- INFO ---

[I1] Heap summary: 1,923,456 objects, 16,831 classes, 158 MB total shallow
[I2] Top class by instance count: java.util.HashMap$Node (249,734 instances, 1.9 MB shallow)
[I3] GC roots: 3 STICKY_CLASS, 1 THREAD_OBJ

Severity Thresholds

  • CRITICAL: retained size > 100 MB
  • WARNING: retained size > 10 MB, or duplicate waste > 1 MB
  • INFO: all other findings
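
The thresholds amount to a small decision rule. The sketch below encodes them in Python, assuming 1 MB = 1,048,576 bytes and strict greater-than comparisons. The function and parameter names are hypothetical.

```python
MB = 1024 * 1024  # assumed binary megabyte

def severity(retained_bytes=0, duplicate_waste_bytes=0):
    """Classify a finding using the thresholds listed above."""
    if retained_bytes > 100 * MB:
        return "CRITICAL"
    if retained_bytes > 10 * MB or duplicate_waste_bytes > 1 * MB:
        return "WARNING"
    return "INFO"

# The sample report's duplicate finding (~3.2 MB reclaimable) lands in WARNING
duplicates_level = severity(duplicate_waste_bytes=int(3.2 * MB))
```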

Analysis Contributors

By default, report runs all contributors, triggering expensive computations as needed. Progress is shown in the console for long-running steps.

| Contributor                   | Notes                        |
|-------------------------------|------------------------------|
| Heap overview                 | O(1)                         |
| Class histogram               | O(classes)                   |
| GC root summary               | O(roots)                     |
| Approximate retained sizes    | computed if not already done |
| Leak detectors (6 built-in)   | uses retained sizes          |
| Collection waste estimate     | uses retained sizes          |
| Duplicate subgraphs (depth 3) | computed if not already done |

Use --focus= to restrict which contributors run:

| Focus value | Contributors                        |
|-------------|-------------------------------------|
| leaks       | retained sizes + all leak detectors |
| waste       | retained sizes + collection waste   |
| duplicates  | duplicate subgraph fingerprinting   |

Common Analysis Patterns

Memory Footprint by Package

# Group by package prefix
hdump> classes/com.myapp.* | top(10, instanceCount)
hdump> classes/org.springframework.* | top(10, instanceCount)

Heap Summary

Create a comprehensive overview:

# Object count
hdump> objects | count

# Total shallow size
hdump> objects | sum(shallow)

# Class count
hdump> classes | count

# Top consumers
hdump> objects | groupBy(class, agg=sum) | top(10, sum)

Compare Before/After (Heap Diff)

Open two heap dumps and use join to diff them:

hdump> open before.hprof
hdump> open after.hprof

# Diff class histograms — session 1 is before.hprof (the baseline)
hdump> classes | join(session=1) | sortBy(instanceCountDelta desc) | head(20)

# Find classes with growing instance counts
hdump> classes | join(session=1) | filter(instanceCountDelta > 0) | top(10, instanceCountDelta)

# New classes not present in the baseline
hdump> classes | join(session=1) | filter(baseline.exists = false)

# GC root type changes
hdump> gcroots | groupBy(type, agg=count) | join(session=1) | sortBy(countDelta desc)

The join operator adds baseline.* and *Delta columns for every numeric field. The join key is auto-inferred (name for classes, className for objects, type for GC roots) and can be overridden with by=field.
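
To show what the baseline.* and *Delta columns look like, here is a minimal Python sketch of a keyed diff over two class histograms. The function name is hypothetical, and treating a missing baseline value as 0 when computing the delta is an assumption.

```python
def join_diff(current, baseline, key="name"):
    """Join current rows to baseline rows and add baseline.* and *Delta columns."""
    base_by_key = {row[key]: row for row in baseline}
    joined_rows = []
    for row in current:
        base = base_by_key.get(row[key])
        joined = dict(row)
        joined["baseline.exists"] = base is not None
        for field, value in row.items():
            # Only numeric, non-key fields get baseline and delta columns
            if field == key or isinstance(value, bool) or not isinstance(value, (int, float)):
                continue
            base_value = base.get(field) if base else None
            joined[f"baseline.{field}"] = base_value
            # Assumption: a missing baseline counts as 0 for the delta
            joined[f"{field}Delta"] = value - (base_value or 0)
        joined_rows.append(joined)
    return joined_rows

before = [{"name": "com.example.Session", "instanceCount": 100}]
after = [
    {"name": "com.example.Session", "instanceCount": 150},
    {"name": "com.example.NewThing", "instanceCount": 10},
]
rows = join_diff(after, before)
```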

JFR + Heap Dump Correlation

When you have both a JFR recording and a heap dump from the same application, you can correlate allocation activity (from JFR) with heap state (from the dump). Use the root= parameter to specify the JFR event type:

# Open both sources
hdump> open recording.jfr
hdump> open dump.hprof

# Enrich class histogram with allocation data from JFR
hdump> classes | join(session="recording.jfr", root="jdk.ObjectAllocationSample", by=class)

# Find high-churn classes: many allocations but few survivors in the heap
hdump> classes | join(session=1, root="jdk.ObjectAllocationSample", by=class) | filter(allocCount > 1000) | sortBy(survivalRatio asc) | head(20)

# Top classes by total allocation weight
hdump> classes | join(session=1, root="jdk.ObjectAllocationSample", by=class) | sortBy(allocWeight desc) | top(10)

The JFR correlation adds enrichment columns: allocCount, allocWeight, allocRate, topAllocSite, and survivalRatio (instanceCount / allocCount). Classes with no matching JFR events get null values for all enrichment columns.
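
The survivalRatio column can be reproduced by hand. The sketch below applies instanceCount / allocCount to a few made-up rows and sorts the high-churn candidates. The class names and numbers are hypothetical, and returning None for classes without JFR data mirrors the null behavior described above.

```python
def survival_ratio(instance_count, alloc_count):
    """instanceCount / allocCount, or None when there is no JFR allocation data."""
    if not alloc_count:
        return None
    return instance_count / alloc_count

rows = [
    {"class": "com.example.Event", "instanceCount": 50, "allocCount": 5000},
    {"class": "com.example.Session", "instanceCount": 900, "allocCount": 1200},
    {"class": "com.example.Config", "instanceCount": 3, "allocCount": None},
]
for row in rows:
    row["survivalRatio"] = survival_ratio(row["instanceCount"], row["allocCount"])

# Many allocations, few survivors: lowest ratio first
high_churn = sorted(
    (r for r in rows if r["survivalRatio"] is not None and r["allocCount"] > 1000),
    key=lambda r: r["survivalRatio"],
)
```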

Export for Further Analysis

# Export to JSON for external tools
hdump> objects | groupBy(class, agg=count) | top(100, count) --format json > classes.json

# CSV output
hdump> classes | top(20, instanceCount) --format csv > top-classes.csv

# Limit output rows
hdump> objects | top(100, shallow) --limit 10

Tips and Best Practices

1. Start with Overview

Always begin with high-level statistics:

objects | count
objects | stats(shallow)
classes | count
gcroots | groupBy(type)

2. Use instanceof for Interface Analysis

Find all implementations:

objects/instanceof/java.io.Closeable | groupBy(class)

3. Focus on Your Application

Filter to your packages:

classes/com.myapp.* | top(10, instanceCount)

4. Compare Heap Dumps

Take dumps at different times and use join to diff them:

open before.hprof
open after.hprof
classes | join(session=1) | filter(instanceCountDelta > 0) | top(20, instanceCountDelta)

Good scenarios for comparison:

  • Before and after an operation suspected of leaking
  • Under normal load vs. high load
  • Fresh start vs. after extended run

5. Look for Patterns

Common leak indicators:

  • Steadily growing instance counts
  • Large collections (Maps, Lists)
  • Many JNI_GLOBAL references
  • Duplicate strings

6. Use Size Units

Make queries readable:

# Instead of
objects[shallow > 1048576]

# Use
objects[shallow > 1MB]
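
For post-processing exported results, the same unit suffixes can be converted to bytes with a small helper. This sketch assumes binary units. Only 1MB = 1048576 is confirmed by the example above, and the KB and GB values and the function name are assumptions.

```python
import re

# Assumed binary multipliers; only the MB value appears in this tutorial
UNITS = {"": 1, "B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}

def parse_size(text):
    """Convert a size literal such as '1MB' or '512' to a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMG]?B?)\s*", text.upper())
    if not match:
        raise ValueError(f"not a size literal: {text!r}")
    number, unit = match.groups()
    if unit in ("K", "M", "G"):
        unit += "B"  # accept the short forms K, M, G
    return int(float(number) * UNITS[unit])

one_mb = parse_size("1MB")
```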

7. Chain Operations Efficiently

Filter early to reduce data:

# Good: filter first
objects/java.lang.String[shallow > 1KB] | groupBy(class) | top(10)

# Less efficient: groupBy all, then filter
objects/java.lang.String | groupBy(class) | filter(count > 100)

8. Start Leak Analysis with checkLeaks

Use built-in detectors for quick wins:

# Interactive wizard runs all detectors
checkLeaks()

# Or target specific leak types
checkLeaks(detector="threadlocal-leak")
checkLeaks(detector="classloader-leak")

9. Use Retained Size for Impact Analysis

Shallow size is an object's own memory; retained size is the total memory it keeps alive:

# Find objects with highest memory impact
objects | top(10, retained)

# Then trace why they're retained
objects[id = 0x12345] | pathToRoot

Next Steps