PInsight is a performance tool that help developers trace and measure the performance of the execution of parallel applications, and to analyze and visualize the tracing to identify performance issues. It includes PInsight shared library developed using LTTng UST for dynamic and asynchronous tracing, and Eclipse/Trace Compass plugins for analyzing and visualizing traces.
The PInsight trace library is a shared library called pinsight. The library implemnts OpenMP OMPT and CUDA CUPTI event
callbacks for tracing OpenMP and CUDA application, and MPI PMPI wrapper for tracing MPI applications.
The OMPT/CUPTI callbacks and PMPI wrappers redirect tracing to LTTng UST for processing tracing asynchronously. The library also
implements flexible runtime configuration of tracing options such as tracing rate, allowing 1) for optimizing dynamic tracing
to reduce unnecessnary and redundant tracing, 2) for runtime reconfiguration for both tracing and application for runtime performance optimization, and 3)
for performance debugging such that the application can introspect its own performance during execution, and resume with optimized configuration.
-
Install the essential dependencies. On Ubuntu systems:
sudo apt-get install build-essential cmake git -
Then install LTTng, which has some special setup instructions.
sudo apt-get install lttng-tools lttng-modules-dkms liblttng-ust-dev babeltrace -
Clone the repo and build the
pinsightshared library. It uses the cmake and make utilities. The option to enable features of PInsight includes:option(PINSIGHT_OPENMP "Build with OpenMP support" TRUE) option(PINSIGHT_MPI "Build with MPI support" FALSE) option(PINSIGHT_CUDA "Build with CUDA support" TRUE) option(PINSIGHT_ENERGY "Build with Energy tracing" FALSE) option(PINSIGHT_BACKTRACE "Build with Backtrace enabled" TRUE)
You can pass each option to cmake, e.g. cmake -DPINSIGHT_MPI=TRUE | FALSE <other options> to turn on or off each feature.
For OpenMP tracing, building the library requires omp.h and omp-tools.h headers. The PInsight repo src folder contains copies of these two header files
from Clang/LLVM 14.0. If you need to use the two headers from other locations, pass to cmake via OPENMP_INCLUDE_PATH setting.
For CUDA tracing, which depends on CUPTI, CUDA/CUPTI installation folder should be provided via CUDA_INSTALL with default to be /usr/local/cuda.
Example:
git clone https://github.com/passlab/pinsight.git
cd pinsight && mkdir build && cd build
cmake -DOPENMP_INCLUDE_PATH=/home/yanyh/tools/llvm-openmp-install/include ..
make
The instructions above will result in build/libpinsight.so to be created.
To build and trace OpenMP programs which uses the OMPT event tracing library, LLVM OpenMP runtime needs to be installed and OMPT
should be enabled to support runtime tracing. LLVM has combined modules into
one github repo. If you already have Clang/LLVM installed with OpenMP/OMPT enabled, you only need to provide the path to the OpenMP headers (omp.h and omp-tools.h) for
building the pinsight library. You will need the libomp.so library for PInsight to trace an OpenMP application.
If you do not have Clang/LLVM installed, you will need to clone the whole repo,
but only need to install the OpenMP library. The following is an example of building LLVM OpenMP runtime library (taken from our Dockerfile):
git clone https://github.com/llvm-project/openmp.git && \
cd openmp && \
git remote update && \
cd .. && \
mkdir -p openmp-build && \
cd openmp-build && \
cmake -G "Unix Makefiles" \
-DCMAKE_BUILD_TYPE=Debug \
-DCMAKE_INSTALL_PREFIX=/home/yanyh/tools/llvm-openmp-install \
-DOPENMP_ENABLE_LIBOMPTARGET=off \
-DLIBOMP_OMPT_SUPPORT=on \
../openmp
make && make install
For CUDA CUPTI tracing, PINSIGHT_CUDA should be set to TRUE for cmake. CUDA SDK with CUPTI SDK should be installed.
On Ubuntu 22.04, to install CUDA 12.2, follow steps in the
NVIDIA CUDA 12.2 download page, such as:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda
The above command will install CUDA at /usr/local/cuda, with that, the pinsight library can be built with support for CUDA as:
git clone https://github.com/passlab/pinsight.git
cd pinsight && mkdir build && cd build
cmake -DPINSIGHT_CUDA=TRUE ..
make
If CUDA is not installed at /usr/local/cuda, The CUDA_INSTALL option should pass to cmake to provide the path to a CUDA install folder.
To be completed
In the scripts/ directory, a script called trace.sh is provided.
This script helps make generating LTTng traces for MPI/OpenMP/CUDA programs easier.
OpenMP OMP_NUM_THREADS should be set for the needed number of threads for OpenMP program and
mpirun can be used as command to launch MPI applications. The instruction of using trace.sh is as follows:
Usage for this tracing script for use with PInsight:
trace.sh PINSIGHT_LIB LD_LIBRARY_PATH_PREPEND TRACE_NAME TRACEFILE_DEST PROG_AND_ARGS...
Arguments:
PINSIGHT_LIB Full-path PInsight shared library file name to use with user application.
LD_LIBRARY_PATH_PREPEND A list of paths separated by :. The list is prepended to
the LD_LIBRARY_PATH env. This argument can be used to provide
path for the libraries used by the pinsight tracing, such as
path for libomp.so or libmpi.so. If none is needed, e.g. the path are
already set in the LD_LIBRARY_PATH, : should be provided for this arg.
TRACE_NAME Give a proper name for the trace to be displayed in tracecompass.
TRACEFILE_DEST Where to write the LTTng traces.
PROG_AND_ARGS... The program and its argument to be traced
Examples:
trace.sh /opt/pinsight/lib/libpinsight.so /opt/llvm-install/lib jacobi ./traces/jacobi ./test/jacobi 2048 2048
trace.sh /opt/pinsight/lib/libpinsight.so /opt/llvm-install/lib:/opt/openmpi-install/lib LULESH ./traces/LULESH mpirun -np 8 test/LULESH/build/lulesh2.0 -s 20
Example using the jacobi application with 8 threads:
yyan7@yyan7-Ubuntu:~/tools/pinsight$ export OMP_NUM_THREADS=8
yyan7@yyan7-Ubuntu:~/tools/pinsight$ ./scripts/trace.sh ./build/libpinsight.so /home/yyan7/compiler/llvm-openmp-install/lib jacobi ./traces/jacobi test/jacobi/jacobi 512
Session jacobi-tracing-session created.
Traces will be written in /home/yyan7/tools/pinsight/traces/jacobi
UST event lttng_pinsight_enter_exit:* created in channel channel0
UST event lttng_pinsight_ompt:* created in channel channel0
UST event lttng_pinsight_pmpi:* created in channel channel0
UST event lttng_pinsight_cuda:* created in channel channel0
Tracing started for session jacobi-tracing-session
Usage: jacobi [<n> <m> <alpha> <tol> <relax> <mits>]
n - grid dimension in x direction, default: 256
m - grid dimension in y direction, default: n if provided or 256
alpha - Helmholtz constant (always greater than 0.0), default: 0.0543
tol - error tolerance for iterative solver, default: 1e-10
relax - Successice over relaxation parameter, default: 1
mits - Maximum iterations for iterative solver, default: 5000
jacobi 512 512 0.0543 1e-10 1 5000
------------------------------------------------------------------------------------------------------
Total Number of Iterations: 5001
Residual: 2.35602684028891e-08
seq elasped time(ms): 13806
MFLOPS: 1224.58
Total Number of Iterations: 5001
Residual: 0.0269720666110516
OpenMP (8 threads) elasped time(ms): 4064
MFLOPS: 4160.06
Solution Error: 0.000921275
============================================================
Lexgion report from thread 0: total 13 lexgions
# codeptr_ra count trace count type end_codeptr_ra
1 0xffffff 1 1 3 0xffffff
2 0xffffff 1 0 7 (nil)
3 0x402405 1 1 3 0x402405
4 0x402405 1 1 7 (nil)
5 0x402544 1 1 20 0x402544
6 0x40257e 1 1 23 0x40257e
7 0x4010bd 5000 50 3 0x4010bd
8 0x4010bd 5000 50 7 (nil)
9 0x4012d9 5000 50 20 0x4013c5
10 0x401176 5000 50 3 0x401176
11 0x401176 5000 50 7 (nil)
12 0x40159d 5000 50 20 0x4017c7
13 0x401803 5000 50 23 0x401803
-------------------------------------------------------------
parallel lexgions (type 3) from thread 0
# codeptr_ra count trace count type end_codeptr_ra
1 0xffffff 1 1 3 0xffffff
2 0x402405 1 1 3 0x402405
3 0x4010bd 5000 50 3 0x4010bd
4 0x401176 5000 50 3 0x401176
-------------------------------------------------------------
sync lexgions (type 23) from thread 0
# codeptr_ra count trace count type end_codeptr_ra
1 0x40257e 1 1 23 0x40257e
2 0x401803 5000 50 23 0x401803
-------------------------------------------------------------
work lexgions (type 20) from thread 0
# codeptr_ra count trace count type end_codeptr_ra
1 0x402544 1 1 20 0x402544
2 0x4012d9 5000 50 20 0x4013c5
3 0x40159d 5000 50 20 0x4017c7
-------------------------------------------------------------
master lexgions (type 21) from thread 0
# codeptr_ra count trace count type end_codeptr_ra
============================================================
Waiting for data availability.
Tracing stopped for session jacobi-tracing-session
Session jacobi-tracing-session destroyed
yyan7@yyan7-Ubuntu:~/tools/pinsight$
scripts/trace.sh ./lib/libpinsight.so /home/yanyh/tools/llvm-openmp-install:/opt/openmpi-install/lib \
LULESH-MPI-8npX4th traces/LULESH-MPI-8npX4th mpirun -np 8 ./test/LULESH/build/lulesh2.0
After tracing complete, you can use babeltrace tools to dump the trace data to text on the terminal
babeltrace <tracefolder>
If not using the provided trace.sh script, one can follow LTTng document to set up tracing session, and then enable some event rules before starting tracing:
# Create a userspace session.
lttng create ompt-tracing-session --output=/tmp/ompt-trace
# Create and enable event rules.
lttng enable-event --userspace lttng_pinsight_ompt:'*'
lttng enable-event --userspace lttng_pinsight_pmpi:'*'
# Start LTTng tracing.
lttng start
Once the session has started, LTTng's daemon will be ready to accept the traces our application generates.
In order to generate traces about OMPT events, our libpinsight.so library needs to be preloaded by the application. Latest OpenMP standard provide OMP_TOOL_LIBRARIES
env to point the tool library (pinsight.so), thus no need to use LD_PRELOAD. The OpenMP runtime library is also need to provided for the execution.
In order to ensure the 2 shared libraries are loaded correctly, we use the LD_PRELOAD environment variable, like so:
# Run an OpenMP application with our shared libraries.
LD_PRELOAD=/opt/openmp-install/lib/libomp.so:/opt/pinsight/lib/libpinsight.so ./your_application
Once the application has run, we can tell LTTng to stop tracing:
# Stop LTTng tracing.
lttng stop
lttng destroy
The resulting trace files will be located at /tmp/ompt-trace, and can be loaded into TraceCompass or Babeltrace for analysis.
On larger trace datasets, the debug printouts from PInsight can be quite massive. By default, these printouts are disabled for sake of disk usage.
To enable the printouts, set the PINSIGHT_DEBUG_ENABLE environment variable to 1 before running a trace.
Shell variable example:
export PINSIGHT_DEBUG_ENABLE=1
# Normal tracing stuff from here on...
PInsight supports fine-grained control of tracing overhead through three domain trace modes and flexible runtime configuration.
Each domain (OpenMP, MPI, CUDA) can operate in one of three modes:
| Mode | Description | Overhead |
|---|---|---|
| OFF | Domain callbacks deregistered; zero per-event overhead | None |
| MONITORING | Bookkeeping active (lexgion creation, counters); no LTTng output | Low |
| TRACING | Full tracing with LTTng tracepoint emission (default) | Normal |
- OFF mode uses native deregistration APIs (
ompt_set_callback(event, NULL)for OpenMP) so the runtime never dispatches the event — truly zero overhead. - MONITORING mode maintains internal bookkeeping (lexgion tables, execution counters, rate sampling) without emitting traces, enabling later analysis of program structure.
- TRACING mode emits full LTTng tracepoints. Overhead depends on whether an LTTng session is active.
For detailed benchmark results, see doc/domain_trace_modes.md.
PINSIGHT_TRACE_OPENMP=<MODE> # OFF | MONITORING | TRACING (default: TRACING)
PINSIGHT_TRACE_MPI=<MODE> # OFF | MONITORING | TRACING (default: TRACING)
PINSIGHT_TRACE_CUDA=<MODE> # OFF | MONITORING | TRACING (default: TRACING)Accepted values for each mode:
| Mode | Accepted Values |
|---|---|
| OFF | OFF, FALSE, 0 |
| MONITORING | MONITORING, MONITOR |
| TRACING | ON, TRACING, TRUE, 1 (default) |
Examples:
# Full tracing (default behavior)
./myapp
# Monitor OpenMP regions without trace output
PINSIGHT_TRACE_OPENMP=MONITORING ./myapp
# Completely disable OpenMP tracing (zero overhead)
PINSIGHT_TRACE_OPENMP=OFF ./myapp
# Mix: trace MPI, monitor OpenMP, disable CUDA
PINSIGHT_TRACE_MPI=TRUE PINSIGHT_TRACE_OPENMP=MONITORING PINSIGHT_TRACE_CUDA=OFF ./myappPINSIGHT_TRACE_RATE=<trace_starts_at>:<max_num_traces>:<tracing_rate>[:<mode_after>]Controls how frequently lexgion executions are traced. The optional 4th field enables automatic mode switching when max_num_traces is reached (see Automatic Mode Switching below).
| Example | Meaning |
|---|---|
0:-1:1 |
Record all traces (system default) |
10:20:100 |
Record 20 traces starting from 10th iteration, 1 per 100 |
0:-1:10 |
Record 1 trace per 10 executions, indefinitely |
0:20:1 |
Record the first 20 executions |
0:0:0 |
Do not record any traces |
0:100:1:MONITORING |
Trace first 100, then switch all domains to MONITORING |
0:100:1:OpenMP:MONITORING,MPI:OFF |
Trace first 100, then set OpenMP to MONITORING, MPI to OFF |
PINSIGHT_TRACE_ENERGY=TRUE|FALSE # Energy tracing via RAPL
PINSIGHT_TRACE_BACKTRACE=TRUE|FALSE # Backtrace capture
PINSIGHT_DEBUG_ENABLE=0|1 # Debug printouts (default: 0)PInsight can automatically switch domain trace modes after a lexgion has been traced a specified number of times (max_num_traces). This eliminates the need for manual SIGUSR1 intervention and is useful for workflows like: trace the first N executions to capture behavior, then switch to low-overhead monitoring.
Append the mode (or domain-specific modes) as a 4th field to PINSIGHT_TRACE_RATE:
# Shorthand: switch ALL domains to MONITORING after 100 traces
PINSIGHT_TRACE_RATE=0:100:1:MONITORING
# Per-domain: switch OpenMP to MONITORING and MPI to OFF
PINSIGHT_TRACE_RATE=0:100:1:OpenMP:MONITORING,MPI:OFFUse the trace_mode_after key in any Lexgion section:
[Lexgion.default]
max_num_traces = 100
trace_mode_after = MONITORING # shorthand: all domains
[Lexgion(0x400500)]
max_num_traces = 50
trace_mode_after = OpenMP:MONITORING, MPI:OFF # per-domainNote: Both the config file and the env var use
:as the domain-mode separator (e.g.,OpenMP:MONITORING). In the env var, the first three:delimit the rate triple fields; everything after the third:is the mode_after string.
Key behaviors:
- The trigger fires once per domain per configuration load (
auto_triggeredflag prevents repeated mode switches) - After the mode switch, PInsight sets
mode_change_requested, which causes callback re-registration at the next safe point (same mechanism as SIGUSR1) - Sending
SIGUSR1to reload the config file resets theauto_triggeredflags, allowing triggers to fire again
A config file provides domain-specific and lexgion-specific configuration. PInsight looks for the config file in this order:
- Explicit path via environment variable:
export PINSIGHT_TRACE_CONFIG_FILE=/path/to/trace_config.txt - Default fallback:
pinsight_trace_config.txtin the current working directory
If neither exists, PInsight runs with default settings. The fallback means users can enable runtime reconfiguration at any time by simply creating a pinsight_trace_config.txt file — no env var needed.
See doc/PINSIGHT_TRACE_CONFIG_FORMAT.md for how to create a config file, doc/trace_config_example.txt for a complete example, and doc/trace_config_design.md for the overall configuration design.
Domain modes and tracing options can be changed at runtime by editing the config file and signaling the process:
- Create or edit the config file (e.g.,
pinsight_trace_config.txtin the working directory) - Send
kill -USR1 <pid>to the running application - PInsight re-reads the config and re-registers/deregisters domain callbacks accordingly
Note: Environment variables are read at process launch and cannot be changed from outside a running process. For runtime reconfiguration, use the config file.
This enables workflows such as:
- Start with OFF mode for warm-up, switch to TRACING for the region of interest
- Start with TRACING to capture initial behavior, switch to MONITORING to reduce overhead
- Runtime performance introspection: trace, analyze, reconfigure, resume
After tracing complete, you can use babeltrace tool to dump the trace data to text on the terminal
babeltrace <tracefolder>