title: Annotated Debugger Implementation Bibliography ...

This is a hierarchical annotated bibliography of resources related to the development and functioning of debuggers, with a particular emphasis on debugging Go executables and, more specifically, on the Delve debugger.

Table of Contents

  1. Introductory material
  2. Delve specific resources
  3. Target layer resources
    1. x86-64 / AMD64
    2. Windows
    3. Linux
    4. Other
  4. Symbolic layer resources
  5. Miscellaneous stuff

Introductory material

  • Backtrace.io series on implementing a debugger

    Blog series about writing a debugger for linux using ptrace and DWARF debug symbols.

  • Michał Łowicki: making a debugger for Golang

    Blog series about writing a simple debugger for linux using ptrace and the Go debugging symbols (gosymtab and gopclntab). In a real debugger you will probably want to use DWARF debug symbols instead of those.

  • Liz Rice: Debuggers from Scratch (Gophercon UK 2018)

    Recording of a talk (and written version of the talk) about writing a simple debugger for linux using ptrace and the Go debugging symbols (gosymtab and gopclntab). In a real debugger you will probably want to use DWARF debug symbols instead of those.

    Both this talk and Michał Łowicki's blog series above suffer from a relatively common pitfall: golang/go#28315.

  • Microsoft: Creating a Basic Debugger

    Microsoft tutorial on creating a debugger for Windows using ContinueDebugEvent/WaitForDebugEvent and other related Win32 API.

  • Jonathan B. Rosenberg: How Debuggers Work

    The only book I could find about writing debuggers. It explains how to write a debugger for Windows using Win32 API calls and the STI debug format (aka CodeView debug format, which is what Microsoft compilers used to produce until Visual Studio 4). It's pretty outdated, having been written in 1996, and its "step over" algorithm is, AFAICT, needlessly complicated and wrong.

    Not recommended.

     How Debuggers Work (Algorithms, Data Structures, and Architecture)
     Jonathan B. Rosenberg, 1996
     Wiley Computer Publishing (John Wiley & Sons, Inc.)
     ISBN 0-471-14966-7
    
  • David J. Agans: Debugging

    This doesn't have anything to do with writing debuggers. Instead it's a book about debugging that I really like. It isn't even about using debuggers in particular, it just talks about how to approach a debugging problem in general. The 9 rules it lays out are crucial to using debuggers effectively.

     Debugging—The Nine Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems
     David J. Agans, 2002
     American Management Association
     ISBN 0-8144-7168-4
    
  • Tim Misiak's stuff

    A former Microsoft Debugger Platform engineer who worked on WinDbg and KD. He has a blog where he talks about debuggers (including a tutorial on implementing a toy debugger for Windows in Rust) and a YouTube channel with interviews (also about debuggers).

Delve specific resources

  • Derek Parker: Advanced Go debugging with Delve (Fosdem 2018)

    Details of the Go runtime that make Delve necessary.

  • Alessandro Arzilli: Architecture of Delve (Gophercon Iceland 2018)

    My talk about Delve internals. Also describes the three layer architecture of Delve (UI, symbolic, target) which is appropriate for other debuggers as well.

    If you want to contribute to Delve this is probably the quickest introduction there is.

    Also contains a description of a Step Over algorithm that actually works, unlike other algorithms you might find on the internet.

  • How to write a Delve Client

    Tutorial on how to write a client for Delve (for example an editor plugin using Delve to debug code).

Target Layer Resources

This section lists references useful for writing the "target layer" of a debugger, i.e. the part of the debugger that is responsible for managing the target process and manipulating its memory.

x86-64 / AMD64

Windows

  • Microsoft: Basic Debugging

    MSDN entry point for documentation about the Win32 debugging API.

  • Minidump file format

    This is the file format used on Windows to record core dumps of running applications. It's called minidump to distinguish it from crashdumps, which are full-system kernel dumps.

    Unlike linux and macOS, which use the same file format for executables and core dumps, Windows uses a core dump file format that is completely different from the one it uses for executables.

    It can be produced automatically by Windows, by a WinDbg command or by using the Procdump utility.

    A minidump is divided into streams: it has a header, followed by a stream directory and then a bunch of streams. Each stream either describes things about the process in general, about a thread in particular or contains a chunk of memory from the dumped process.

    To read a minidump start by reading the header, get the offset of the stream directory from it, then read the stream directory and from that read the streams you need.

  • Thread Naming on Windows

    Windows has a couple of facilities to give names to threads to aid debugging. Even if you don't care about supporting this, you should know about it, or you might get an exception that you don't know what to do with.

Linux

  • Linux Multithreading implementation

    Linux has a weird way of implementing threads. Basically, there are no threads. Instead a multithreaded process is actually a group of processes that share memory, file handles and signal handling.

    As a linux user you don't need to care about this, because the user-space utilities and libc do a decent job of hiding the complexity. If you are writing a debugger backend for linux, however, there is no way to avoid all the weirdness.

  • Ptrace

    Ptrace is the name of the syscall used on Linux and other Unix-like systems to control a process you want to debug (it is not part of POSIX, and its behavior differs between systems). It's very powerful but also very complicated and janky. Use with caution.

Other

  • Gdb Remote Serial Protocol

    This protocol was originally devised to debug programs running in environments that were too constrained to host the full gdb program, such as embedded processors or operating system kernels. The idea was that you would embed a small assembly level debugger, implementing only what I call the "target layer", and then connect gdb to it using this protocol and end up with a full symbolic debugger.

    Notable programs implementing this protocol are gdbserver, lldb-server, debugserver (a stripped down version of lldb-server available on macOS) and Mozilla RR.

    Beware that there are two different wire encodings for packets, the "binary" encoding and the "not-binary" encoding that differ on whether RLE compression is available and which character is the escape code. There is no good way to tell which packet uses which encoding and sometimes it isn't even documented.

  • Notes on hardware breakpoints/watchpoints
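
To make the Gdb Remote Serial Protocol wire format mentioned above more concrete, here is a small Go sketch of its three basic mechanisms as described in the GDB manual: packet framing with a modulo-256 checksum, the `}` escape used for binary data, and run-length expansion. The function names (`frame`, `escape`, `expandRLE`) are invented for this example, and the sketch deliberately ignores the question of which packets use which encoding.

```go
package main

import "fmt"

// frame wraps an already-encoded payload as "$payload#xx", where xx is
// the sum of the payload bytes modulo 256 as two hex digits.
func frame(payload string) string {
	var sum byte
	for i := 0; i < len(payload); i++ {
		sum += payload[i] // byte arithmetic wraps modulo 256
	}
	return fmt.Sprintf("$%s#%02x", payload, sum)
}

// escape encodes binary data: each of '#', '$', '}' and '*' is sent as
// '}' followed by the original byte XORed with 0x20.
func escape(data []byte) []byte {
	out := make([]byte, 0, len(data))
	for _, c := range data {
		if c == '#' || c == '$' || c == '}' || c == '*' {
			out = append(out, '}', c^0x20)
		} else {
			out = append(out, c)
		}
	}
	return out
}

// expandRLE undoes run-length encoding: '*' followed by a count byte n
// means "repeat the previous byte n-29 more times".
func expandRLE(s string) string {
	out := make([]byte, 0, len(s))
	for i := 0; i < len(s); i++ {
		if s[i] == '*' && len(out) > 0 && i+1 < len(s) {
			last := out[len(out)-1]
			for n := int(s[i+1]) - 29; n > 0; n-- {
				out = append(out, last)
			}
			i++ // skip the count byte
			continue
		}
		out = append(out, s[i])
	}
	return string(out)
}

func main() {
	fmt.Println(frame("OK"))                 // $OK#9a
	fmt.Println(expandRLE("0* "))            // 0000
	fmt.Println(string(escape([]byte("}")))) // }]
}
```

The checksum is computed over the bytes actually sent, so escaping has to happen before framing.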

Symbolic Layer Resources

This section contains anything pertaining to interpreting debug symbols and extracting them from executable files.

For a modern debugger you only need to be concerned with three executable formats (PE, ELF and Mach-O) and two debug formats (DWARF and PDB). Anything else is of historical interest only at this point.

  • Practical Binary Analysis

    This book has a good introduction to both ELF and PE, as well as other interesting things.

     Practical Binary Analysis
     Build Your Own Linux Tools for Binary Instrumentation, Analysis, and Disassembly
     by Dennis Andriesse
     no starch press, December 2018, 456 pp.
     ISBN-13: 978-1-59327-912-7
    
  • Portable Executable (PE)

    This is the executable file format used on Windows. It obsoletes the MS-DOS MZ file format and is derived from COFF (Common Object File Format), an older UNIX executable file format.

  • Program Database (PDB)

    This is the debug format currently used in Windows. It is supported by Visual Studio, WinDbg and the DbgHelp library. Unfortunately it's also largely undocumented. Gcc, LLVM and Go do not produce debug symbols in this format, instead they opt for embedding DWARF symbols inside PE files even on Windows.

    Unlike stabs and DWARF this debug format is not embedded inside the executable file and lives instead in separate .PDB files.

  • Executable and Linkable Format (ELF) and System V release 4 Application Binary Interface

    This is the executable format used on Linux and most other unix-like operating systems. Originally introduced by UNIX System V release 4, it replaces COFF and the older a.out. It is used to represent executables, object files, shared objects and core dumps.

    If you start reading the source code of GDB you'll come across a file called solib-svr4.c. The name is a reference to the document introducing the ELF file format: "Shared Object LIBrary - System V Release 4".

    • man elf: Linux manpage for ELF
    • System V Release 4 Application Binary Interface, in particular:
      • "Chapter 4: Object Files", original description of ELF
      • "Chapter 5: Program Loading and Dynamic Linking", also contains a description of the PT_DYNAMIC segment, which is used to locate dynamically loaded libraries.
    • System V Application Binary Interface AMD64 Architecture Processor Supplement: supplement describing the architecture-specific parts of svr4 ABI for the amd64 architecture. Of particular interest to debuggers:
      • "Section 3.4.3: Auxiliary Vector" describes the format of the auxiliary vector on amd64
      • "Section 3.6: DWARF Definition" and Table 3.36 describe a mapping between DWARF register numbers and actual amd64 registers.
      • "Section 4.2.4: EH_FRAME sections" describes the format of the .eh_frame section. The document claims that the formats of eh_frame and debug_frame (defined by DWARF) are identical but this is not true.
    • ELF Handling For Thread-Local Storage: describes how Thread-Local Storage should be handled in ELF files. If you ever hear the words "local exec model" this is what you are looking for.
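
As an illustration of the DWARF register mapping referenced in the AMD64 supplement above, the following Go sketch encodes the general purpose part of that table. The identifiers (`amd64DwarfRegs`, `dwarfRegName`) are made up for this example. Note that the DWARF numbering does not follow the hardware register encoding order (rdx comes before rcx, and rsi/rdi come before rbp/rsp), which is why a debugger needs an explicit table.

```go
package main

import "fmt"

// amd64DwarfRegs maps DWARF register numbers 0-16 to amd64 registers,
// following the System V psABI. Number 16 is what the psABI calls the
// return address column; it is labeled "rip" here for convenience.
var amd64DwarfRegs = [...]string{
	"rax", "rdx", "rcx", "rbx", "rsi", "rdi", "rbp", "rsp",
	"r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
	"rip",
}

// dwarfRegName returns the register name for a DWARF register number,
// or false if the number is outside the range covered by this sketch
// (vector, x87 and segment registers are omitted).
func dwarfRegName(n uint64) (string, bool) {
	if n >= uint64(len(amd64DwarfRegs)) {
		return "", false
	}
	return amd64DwarfRegs[n], true
}

func main() {
	for _, n := range []uint64{0, 1, 2, 6, 7, 16} {
		name, _ := dwarfRegName(n)
		fmt.Printf("DWARF register %d = %s\n", n, name)
	}
}
```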
  • Mach-O

    The file format used on macOS to represent executables, dynamic libraries and core dumps.

  • Examining executable files

    To examine executable files you can use objdump on Linux or otool on macOS. My diexplorer can show the debug sections inside a browser window, with cross-references. Sometimes it is useful to examine executable files for an architecture other than the one you are using. diexplorer can do that; objdump from GNU's binutils can also do that, but only if it is built in a special way, which Linux distributions usually don't do. See compiling a cross-platform objdump.

  • DWARF debug format

    This is the debug format used on most unix-like systems, including Linux and macOS. It obsoletes stabs.

    • Michael Eager: Introduction to the DWARF Debugging Format. An introduction to the DWARF format, mostly focuses on the debug_info section but also briefly touches on the other two main DWARF sections: debug_line and debug_frame.
    • DWARF version 2: the first version of the DWARF standard (version 1 was retconned out of existence because nobody liked it).
    • DWARF version 3: introduces the 64-bit version of DWARF to handle huge executable files.
    • DWARF version 4: adds debug_types section
    • DWARF version 5: removes debug_types section, major backwards-incompatible changes to the location and ranges sections, minor backwards incompatible changes to debug_info.
    • Ian Lance Taylor: .eh_frame: another description of the eh_frame section, a section derived from the format of debug_frame (see also "System V Application Binary Interface AMD64 Architecture Processor Supplement" above)

Miscellaneous stuff

  • Software Exorcism: A Handbook for Debugging and Optimizing Legacy Code

    If you are interested in obsolete things, like me.

     Software Exorcism: A Handbook for Debugging and Optimizing Legacy Code
     Reverend Bill Blunden, 2003
     Apress
     ISBN: 978-1-4302-0788-7
    
  • Embecosm: Howto: Porting the GNU Debugger

    A detailed document describing how to port GDB to a different CPU architecture. Also a good introduction to the GDB architecture.

  • r_debug and link_map

    The DT_DEBUG entry in the .dynamic section can be used to find out which shared libraries are used by a Linux program. Code using this entry is not portable.

    • /usr/include/elf/link.h
  • Mozilla RR Project

    A time traveling debugger backend. Can be used as a backend of any debugger that speaks the Gdb Remote Serial Protocol. By default it starts GDB as its frontend.

  • Peter B. Kessler: Fast Breakpoints

    Details of an implementation for fast conditional breakpoints using jumps to generated code.

     Fast Breakpoints
     Peter B. Kessler, 1990
     Proceedings of the ACM SIGPLAN '90
     White Plains, New York, June 20-22, 1990.
     DOI: 10.1.1.90.2322
    
  • Acid: A Debugger Built From A Language

    Plan 9's debugger, built around a programming language. The same programming language is used as the command line of Acid, to build most of Acid's functionality and by the compiler to describe debug symbols.

     Acid: A Debugger Built From A Language
     Phil Winterbottom, Lucent Technologies Inc.
     Proc. of the Winter 1994 USENIX Conf., pp. 211-222, San Francisco, CA
     DOI: 10.1.1.472.8070