Commit c0728c3

Merge pull request #2649 from Walnut356/debuginfo
Add Debug Info section
2 parents 3e64409 + 69b464d commit c0728c3

13 files changed

+1340
-1
lines changed

src/SUMMARY.md

Lines changed: 12 additions & 1 deletion
@@ -232,11 +232,22 @@
232232
- [Debugging LLVM](./backend/debugging.md)
233233
- [Backend Agnostic Codegen](./backend/backend-agnostic.md)
234234
- [Implicit caller location](./backend/implicit-caller-location.md)
235+
- [Debug Info](./debuginfo/intro.md)
236+
- [Rust Codegen](./debuginfo/rust-codegen.md)
237+
- [LLVM Codegen](./debuginfo/llvm-codegen.md)
238+
- [Debugger Internals](./debuginfo/debugger-internals.md)
239+
- [LLDB Internals](./debuginfo/lldb-internals.md)
240+
- [GDB Internals](./debuginfo/gdb-internals.md)
241+
- [Debugger Visualizers](./debuginfo/debugger-visualizers.md)
242+
- [LLDB - Python Providers](./debuginfo/lldb-visualizers.md)
243+
- [GDB - Python Providers](./debuginfo/gdb-visualizers.md)
244+
- [CDB - Natvis](./debuginfo/natvis-visualizers.md)
245+
- [Testing](./debuginfo/testing.md)
246+
- [(Lecture Notes) Debugging support in the Rust compiler](./debugging-support-in-rustc.md)
235247
- [Libraries and metadata](./backend/libs-and-metadata.md)
236248
- [Profile-guided optimization](./profile-guided-optimization.md)
237249
- [LLVM source-based code coverage](./llvm-coverage-instrumentation.md)
238250
- [Sanitizers support](./sanitizers.md)
239-
- [Debugging support in the Rust compiler](./debugging-support-in-rustc.md)
240251

241252
---
242253

src/debuginfo/CodeView.pdf

209 KB
Binary file not shown.
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
1+
# Debugger Internals
2+
3+
It is the debugger's job to convert the debug info into an in-memory representation. Both the
4+
interpretation of the debug info and the in-memory representation are arbitrary; anything will do
5+
so long as meaningful information can be reconstructed while the program is running. The pipeline
6+
from raw debug info to usable types can be quite complicated.
7+
8+
Once the information is in a workable format, the debugger front-end then must provide a way to
9+
interpret and display the data, a way for users to interact with it, and an API for extensibility.
10+
11+
Debuggers are vast systems and cannot be covered completely here. This section will provide a brief
12+
overview of the subsystems directly relevant to the Rust debugging experience.
13+
14+
Microsoft's debugging engine is closed source, so it will not be covered here.
Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
1+
# Debugger Visualizers
2+
3+
These are typically the last step before the debugger displays the information, but the results may
4+
be piped through a debug adapter such as an IDE's debugger API.
5+
6+
The term "Visualizer" is a bit of a misnomer. The real goal isn't just to prettify the output, but
7+
to provide an interface for the user to interact with that is as useful as possible. In many cases
8+
this means reconstructing the original type as closely as possible to its Rust representation, but
9+
not always.
10+
11+
The visualizer interface allows generating "synthetic children" - fields that don't exist in the
12+
debug info, but can be derived from invariants about the language and the type itself. A simple
13+
example is allowing one to interact with the elements of a `Vec<T>` instead of just its `*mut u8`
14+
heap pointer, length, and capacity.
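
To make the idea of synthetic children concrete, here is a toy model in plain Python. It is only a sketch: `RawVec`, `VecU32Synthetic`, and the fake `memory` buffer are hypothetical stand-ins invented for illustration, and the children are returned as plain integers rather than debugger value objects. The method names `num_children` and `get_child_at_index` mirror the ones LLDB's synthetic provider interface expects, but nothing here touches a real debugger API.

```python
import struct


class RawVec:
    """Hypothetical stand-in for what the debug info exposes for a Vec<u32>."""

    __slots__ = ("ptr", "len", "cap")

    def __init__(self, ptr, length, cap):
        self.ptr = ptr      # offset of the heap buffer inside the fake memory
        self.len = length   # number of initialized elements
        self.cap = cap      # number of allocated element slots


class VecU32Synthetic:
    """Derives one synthetic child per element from (ptr, len) alone."""

    ELEM_SIZE = 4  # size_of::<u32>()

    def __init__(self, raw, memory):
        self.raw = raw
        self.memory = memory

    def num_children(self):
        # Only `len` elements are initialized; spare capacity is not shown.
        return self.raw.len

    def get_child_at_index(self, index):
        if not 0 <= index < self.raw.len:
            return None
        offset = self.raw.ptr + index * self.ELEM_SIZE
        (value,) = struct.unpack_from("<I", self.memory, offset)
        return value


# Fake debuggee memory: 8 unrelated bytes, then three live u32s and one spare slot.
memory = bytes(8) + struct.pack("<4I", 10, 20, 30, 0)
provider = VecU32Synthetic(RawVec(ptr=8, length=3, cap=4), memory)
elements = [provider.get_child_at_index(i) for i in range(provider.num_children())]
```

None of the elements exist as fields in the debug info; they are derived from the invariant that `ptr` points at `len` contiguous, initialized values of the element type.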
15+
16+
## `rust-lldb`, `rust-gdb`, and `rust-windbg.cmd`
17+
18+
These support scripts are distributed with Rust toolchains. They locate the appropriate debugger and
19+
the toolchain's visualizer scripts, then launch the debugger with the appropriate arguments to load
20+
the visualizer scripts before a debuggee is launched/attached to.
21+
22+
## `#![debugger_visualizer]`
23+
24+
[This attribute][dbg_vis_attr] allows Rust library authors to include pretty printers for their
25+
types within the library itself. These pretty printers are of the same format as typical
26+
visualizers, but are embedded directly into the compiled binary. These scripts are loaded
27+
automatically by the debugger, allowing a seamless experience for users. This attribute currently
28+
works for GDB and natvis scripts.
29+
30+
[dbg_vis_attr]: https://doc.rust-lang.org/reference/attributes/debugger.html#the-debugger_visualizer-attribute
31+
32+
GDB python scripts are embedded in the `.debug_gdb_scripts` section of the binary. More information
33+
can be found [here](https://sourceware.org/gdb/current/onlinedocs/gdb.html/dotdebug_005fgdb_005fscripts-section.html). Rustc accomplishes this in [`rustc_codegen_llvm/src/debuginfo/gdb.rs`][gdb_rs].
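
As a rough illustration of the section layout described in the GDB documentation, the sketch below walks the raw bytes of a `.debug_gdb_scripts`-style section. Each entry starts with a type byte (`1` for a Python file name, `4` for inline Python text whose first line is the script's name) and ends with a NUL byte. The names `ScriptEntry` and `parse_script_section` are hypothetical, and skipping stray zero bytes as padding is an assumption of this sketch, not something taken from GDB or rustc.

```python
from typing import Iterator, NamedTuple, Optional

# Entry type bytes documented for GDB's `.debug_gdb_scripts` section.
PYTHON_FILE = 1
PYTHON_TEXT = 4


class ScriptEntry(NamedTuple):
    kind: int
    name: str
    text: Optional[str]  # None for file entries, script source for inline entries


def parse_script_section(raw: bytes) -> Iterator[ScriptEntry]:
    """Yield the Python entries found in the raw section bytes."""
    pos = 0
    end = len(raw)
    while pos < end:
        kind = raw[pos]
        pos += 1
        if kind == 0:
            continue  # assumption: tolerate zero padding between entries
        nul = raw.find(b"\0", pos)
        if nul == -1:
            raise ValueError("unterminated entry in script section")
        body = raw[pos:nul].decode("utf-8")
        pos = nul + 1
        if kind == PYTHON_FILE:
            yield ScriptEntry(kind, body, None)
        elif kind == PYTHON_TEXT:
            # The first line is the script's name; the rest is its source.
            name, _, text = body.partition("\n")
            yield ScriptEntry(kind, name, text)
        # Other entry kinds (e.g. Guile scripts) are ignored in this sketch.


raw = b"\x01printers.py\x00\x04inline-name\nprint('hi')\n\x00"
entries = list(parse_script_section(raw))
```

Obtaining the raw bytes from a real binary is left out on purpose; this only shows how little structure the section itself has.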
34+
35+
[gdb_rs]: https://github.com/rust-lang/rust/blob/main/compiler/rustc_codegen_llvm/src/debuginfo/gdb.rs
36+
37+
Natvis files can be embedded in the PDB debug info using the [`/NATVIS` linker option][linker_opt],
38+
and have the [highest priority][priority] when the debugger resolves which visualizer to use for a type. The
39+
files specified by the attribute are collected into
40+
[`CrateInfo::natvis_debugger_visualizers`][natvis] which are then added as linker arguments in
41+
[`rustc_codegen_ssa/src/back/linker.rs`][linker_rs].
42+
43+
[linker_opt]: https://learn.microsoft.com/en-us/cpp/build/reference/natvis-add-natvis-to-pdb?view=msvc-170
44+
[priority]: https://learn.microsoft.com/en-us/visualstudio/debugger/create-custom-views-of-native-objects?view=visualstudio#BKMK_natvis_location
45+
[natvis]: https://github.com/rust-lang/rust/blob/e0e204f3e97ad5f79524b9c259dc38df606ed82c/compiler/rustc_codegen_ssa/src/lib.rs#L212
46+
[linker_rs]: https://github.com/rust-lang/rust/blob/main/compiler/rustc_codegen_ssa/src/back/linker.rs#L1106
47+
48+
LLDB is not currently supported, but there are a few methods that could potentially allow support in
49+
the future. Officially, the intended method is via a [formatter bytecode][bytecode]. This was
50+
created to offer a comparable experience to GDB's, but without the safety concerns associated with
51+
embedding an entire python script. The opcodes are limited, but it works with `SBValue` and `SBType`
52+
in roughly the same way as python visualizer scripts. Implementing this would require writing some
53+
sort of DSL/mini compiler.
54+
55+
[bytecode]: https://lldb.llvm.org/resources/formatterbytecode.html
56+
57+
Alternatively, it might be possible to copy GDB's strategy entirely: create a bespoke section in the
58+
binary and embed a python script in it. LLDB will not load it automatically, but the python API does
59+
allow one to access the [raw sections of the debug info][SBSection]. With this, it may be possible
60+
to extract the python script from our bespoke section and then load it in during the startup of
61+
Rust's visualizer scripts.
62+
63+
[SBSection]: https://lldb.llvm.org/python_api/lldb.SBSection.html#sbsection
64+
65+
## Performance
66+
67+
Before tackling the visualizers themselves, it's important to note that these are part of a
68+
performance-sensitive system. Please excuse the break in formality, but: if I have to spend
69+
significant time debugging, I'm annoyed. If I have to *wait on my debugger*, I'm pissed.
70+
71+
Every millisecond spent in these visualizers is a millisecond longer for the user to see output.
72+
This can be especially painful for large stackframes that contain many/large container types.
73+
Debugger GUIs such as VSCode will request the whole stack frame at once, and this can result in
74+
delays of tens of seconds (or even minutes) before being able to interact with any variables in the
75+
frame.
76+
77+
There is a tendency to balk at the idea of optimizing Python code, but it really can have a
78+
substantial impact. Remember, there is no compiler to help keep the code fast. Even simple
79+
transformations are not done for you. It can be difficult to find Python performance tips through
80+
all the noise of people suggesting you don't bother optimizing Python, so here are some things to
81+
keep in mind that are relevant to these scripts:
82+
83+
* Everything allocates, even `int`
84+
* Use tuples when possible. `list` is effectively `Vec<Box<[Any]>>`, whereas tuples are equivalent
85+
to `Box<[Any]>`. They have one less layer of indirection, don't carry extra capacity and can't
86+
grow/shrink which can be advantageous in many cases. An additional benefit is that Python caches and
87+
recycles the underlying allocations of all tuples up to size 20.
88+
* Regexes are slow and should be avoided when simple string manipulation will do
89+
* Strings are immutable, thus many string operations implicitly copy the contents.
90+
* When concatenating large lists of strings, `"".join(iterable_of_strings)` is typically the fastest
91+
way to do it.
92+
* f-strings are generally the fastest way to do small, simple string transformations such as
93+
surrounding a string with parentheses.
94+
* The act of calling a function is somewhat slow (even if the function is completely empty). If the
95+
code section is very hot, consider inlining the function manually.
96+
* Local variable access is significantly faster than global and built-in function access
97+
* Member/method access via the `.` operator is also slow; consider reassigning deeply nested values
98+
to local variables to avoid this cost (e.g. `h = a.b.c.d.e.f.g.h`).
99+
* Accessing inherited methods and fields is about 2x slower than base-class methods and fields.
100+
Avoid inheritance whenever possible.
101+
* Use [`__slots__`](https://wiki.python.org/moin/UsingSlots) wherever possible. `__slots__` is a way
102+
to indicate to Python that your class's fields won't change and speeds up field access by a
103+
noticeable amount. This does require you to name your fields in advance and initialize them in
104+
`__init__`, but it's a small price to pay for the benefits.
105+
* Match statements/if..elif..else are not optimized in any way. The conditions are checked in order,
106+
1 by 1. If possible, use an alternative such as dictionary dispatch or a table of values
107+
* Compute lazily when possible
108+
* List comprehensions are typically faster than loops. Generator comprehensions are a bit slower
109+
than list comprehensions, but use less memory. You can think of comprehensions as equivalent to
110+
Rust's `iter.map()`. List comprehensions effectively call `collect::<Vec<_>>` at the end, whereas
111+
generator comprehensions do not.
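
A few of the tips above can be combined in one small sketch. Everything in it (`Child`, `_FORMATTERS`, `format_value`, `summarize`) is a hypothetical name made up for illustration and is not taken from Rust's actual visualizer scripts. It shows `__slots__`, dictionary dispatch in place of an `if..elif` chain, a local alias for a hot function, f-strings, and `str.join` over a list comprehension.

```python
class Child:
    """A named, typed value. `__slots__` fixes the field set up front."""

    __slots__ = ("name", "kind", "value")

    def __init__(self, name, kind, value):
        self.name = name
        self.kind = kind
        self.value = value


def _fmt_int(value):
    return f"{value}"


def _fmt_str(value):
    return f'"{value}"'


def _fmt_bool(value):
    return "true" if value else "false"


# Dictionary dispatch: one hash lookup instead of conditions checked in order.
_FORMATTERS = {"int": _fmt_int, "str": _fmt_str, "bool": _fmt_bool}


def format_value(kind, value):
    fmt = _FORMATTERS.get(kind)
    return fmt(value) if fmt is not None else "<unknown>"


def summarize(children):
    fmt = format_value  # local alias avoids a global lookup on every iteration
    body = ", ".join([f"{c.name}: {fmt(c.kind, c.value)}" for c in children])
    return f"{{{body}}}"


# A tuple is enough here: the collection never grows or shrinks.
fields = (Child("len", "int", 3), Child("name", "str", "abc"), Child("ok", "bool", True))
summary = summarize(fields)
```

Whether each of these pays off depends on how hot the code path is, so it is worth measuring (for example with the standard `timeit` module) before and after a change.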

src/debuginfo/gdb-internals.md

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
# (WIP) GDB Internals
2+
3+
GDB's Rust support lives at `gdb/rust-lang.h` and `gdb/rust-lang.c`. The expression parsing support
4+
can be found in `gdb/rust-exp.h` and `gdb/rust-parse.c`.

src/debuginfo/gdb-visualizers.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
# (WIP) GDB - Python Providers
2+
3+
Below are links to relevant parts of the GDB documentation:
4+
5+
* [Overview on writing a pretty printer](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Writing-a-Pretty_002dPrinter.html#Writing-a-Pretty_002dPrinter)
6+
* [Pretty Printer API](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Pretty-Printing-API.html#Pretty-Printing-API) (equivalent to LLDB's `SyntheticProvider`)
7+
* [Value API](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Values-From-Inferior.html#Values-From-Inferior) (equivalent to LLDB's `SBValue`)
8+
* [Type API](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Types-In-Python.html#Types-In-Python) (equivalent to LLDB's `SBType`)
9+
* [Type Printing API](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Type-Printing-API.html#Type-Printing-API) (equivalent to LLDB's `SyntheticProvider.get_type_name`)

src/debuginfo/intro.md

Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
1+
# Debug Info
2+
3+
Debug info is a collection of information generated by the compiler that allows debuggers to
4+
correctly interpret the state of a program while it is running. That includes things like mapping
5+
instruction addresses to lines of code in the source file, and type layout information so that
6+
bytes in memory can be read and displayed in a meaningful way.
7+
8+
Debug info can be a slightly overloaded term, covering all the layers between Rust MIR and the
9+
end-user seeing the output of their debugger onscreen. In brief, the stack from beginning to end is
10+
as follows:
11+
12+
1. Rustc inspects the MIR and communicates the relevant source, symbol, and type information to LLVM
13+
2. LLVM translates this information into a target-specific debug info format during compilation
14+
3. A debugger reads and interprets the debug info, mapping source-lines and allowing the debuggee's
15+
variables in memory to be located and read with the correct layout
16+
4. Built-in debugger formatting and styling is applied to variables
17+
5. User-defined scripts are run, formatting and styling the variables further
18+
6. The debugger frontend displays the variable to the user, possibly through the means of additional
19+
API layers (e.g. VSCode extension by way of the
20+
[Debug Adapter Protocol](https://microsoft.github.io/debug-adapter-protocol/))
21+
22+
23+
> NOTE: This subsection of the dev guide is perhaps more detailed than necessary. It aims to collect
24+
> a large amount of scattered information into one place and equip the reader with as firm a grasp of
25+
> the entire debug stack as possible.
26+
>
27+
> If you are only interested in working on the visualizer
28+
> scripts, the information in the [debugger-visualizers](./debugger-visualizers.md) and
29+
> [testing](./testing.md) will suffice. If you need to make changes to Rust's debug node generation,
30+
> please see [rust-codegen](./rust-codegen.md). All other sections are supplementary, but can be
31+
> vital to understanding some of the compromises the visualizers or codegen need to make. It can
32+
> also be valuable to know when a problem might be better solved in LLVM or the debugger itself.
33+
34+
# DWARF
35+
36+
This is the primary debug info format for `*-gnu` targets. It is typically bundled in with the
37+
binary, but it [can be generated as a separate file](https://gcc.gnu.org/wiki/DebugFission). The
38+
DWARF standard is available [here](https://dwarfstd.org/).
39+
40+
> NOTE: To inspect DWARF debug info, [gimli](https://crates.io/crates/gimli) can be used
41+
> programmatically. If you prefer a GUI, the author recommends [DWEX](https://github.com/sevaa/dwex).
42+
43+
# PDB/CodeView
44+
45+
The primary debug info format for `*-msvc` targets. PDB is a proprietary container format created by
46+
Microsoft that, unfortunately,
47+
[has multiple meanings](https://docs.rs/ms-pdb/0.1.10/ms_pdb/taster/enum.Flavor.html).
48+
We are concerned with ordinary PDB files, as Portable PDB is used mainly for .NET applications. PDB
49+
files are separate from the compiled binary and use the `.pdb` extension.
50+
51+
PDB files contain CodeView objects, equivalent to DWARF's tags. CodeView, the debugger that
52+
consumed CodeView objects, was originally released in 1985. It was originally intended for C debugging,
53+
and was later extended to support Visual C++. There are still minor alterations to the format to
54+
support modern architectures and languages, but many of these changes are undocumented and/or
55+
sparsely used.
56+
57+
It is important to keep this context in mind when working with CodeView objects. Due to its origins,
58+
the "feature-set" of these objects is very limited, and focused around the core features of C. It
59+
does not have many of the conveniences or features of modern DWARF standards. A fair number of
60+
workarounds exist within the debug info stack to compensate for CodeView's shortcomings.
61+
62+
Due to its proprietary nature, it is very difficult to find information about PDB and CodeView. Many
63+
of the sources were made at vastly different times and contain incomplete or somewhat contradictory
64+
information. As such this page will aim to collect as many sources as possible.
65+
66+
* [CodeView 1.0 specification](./CodeView.pdf)
67+
* LLVM
68+
* [CodeView Overview](https://llvm.org/docs/SourceLevelDebugging.html#codeview-debug-info-format)
69+
* [PDB Overview and technical details](https://llvm.org/docs/PDB/index.html)
70+
* Microsoft
71+
* [microsoft-pdb](https://github.com/microsoft/microsoft-pdb) - A C/C++ implementation of a PDB
72+
reader. The implementation does not contain the full PDB or CodeView specification, but does
73+
contain enough information for other PDB consumers to be written. At time of writing (Nov 2025),
74+
this repo has been archived for several years.
75+
* [pdb-rs](https://github.com/microsoft/pdb-rs/) - A Rust-based PDB reader and writer based on
76+
other publicly-available information. Does not guarantee stability or spec compliance. Also
77+
contains `pdbtool`, which can dump PDB files (`cargo install pdbtool`)
78+
* [Debug Interface Access SDK](https://learn.microsoft.com/en-us/visualstudio/debugger/debug-interface-access/getting-started-debug-interface-access-sdk).
79+
While it does not document the PDB format directly, details can be gleaned from the interface
80+
itself.
81+
82+
# Debuggers
83+
84+
Rust supports 3 major debuggers: GDB, LLDB, and CDB. Each has its own set of requirements,
85+
limitations, and quirks. This unfortunately creates a large surface area to account for.
86+
87+
> NOTE: CDB is a proprietary debugger created by Microsoft. The underlying engine also powers
88+
> WinDbg, KD, the Microsoft C/C++ extension for VSCode, and part of the Visual Studio Debugger. In
89+
> these docs, it will be referred to as CDB for consistency.
90+
91+
While GDB and LLDB do offer facilities to natively support Rust's value layout, this isn't
92+
completely necessary. Rust currently outputs debug info very similar to that of C++, allowing
93+
debuggers without Rust support to work with a slightly degraded experience. More detail will be
94+
included in later sections, but here is a quick reference for the capabilities of each debugger:
95+
96+
| Debugger | Debug Info Format | Native Rust support | Expression Style | Visualizer Scripts |
97+
| --- | --- | --- | --- | --- |
98+
| GDB | DWARF | Full | Rust | Python |
99+
| LLDB | DWARF and PDB | Partial | C/C++ | Python |
100+
| CDB | PDB | None | C/C++ | Natvis |
101+
102+
> IMPORTANT: CDB can be assumed to run only on Windows. No assumptions can be made about the OS
103+
> running GDB or LLDB.
104+
105+
## Unsupported
106+
107+
Below are several unsupported debuggers that are of particular note due to their potential impact
108+
in the future.
109+
110+
* [BugStalker](https://github.com/godzie44/BugStalker) is an x86-64 Linux debugger written in Rust,
111+
specifically to debug Rust programs. While promising, it is still in early development.
112+
* [RAD Debugger](https://github.com/EpicGamesExt/raddebugger) is a Windows-only GUI debugger. It has
113+
a custom debug info format that PDB is translated into. The project also includes a linker that can
114+
generate their new debug info format during the linking phase.
