Menno Markus

Part 3: Software Instruction Breakpoints

May 02, 2026 · 11 min read

The code for this part can be found here. Feeling lost? Consider starting at part 1.

Last time we managed to set our first breakpoint using hardware registers. While quite powerful, we could only set 4 of them at the same time. In fact, because we depend on hardware, if you are on ARM you might only get 2 or even just 1! This is why nearly all debuggers use software breakpoints instead, or at least they do for instruction breakpoints. Data breakpoints are often still left to the hardware registers and their limitations. In this part we’ll look at implementing software instruction breakpoints. Then, in the next part, we’ll see if we can come up with a data breakpoint equivalent.

As the name implies, software breakpoints are implemented in software, by overwriting part of the code we are executing. x86 has a special instruction for this called interrupt 3, or int3. When the CPU executes an interrupt, it looks up the corresponding entry in the interrupt table (on x86, the interrupt descriptor table) to find the handler to call. This table is normally used for events such as “illegal instruction” or “invalid memory access”. In the case of int3, the OS has installed a handler that raises a breakpoint exception.

The int3 instruction is very convenient, as its encoding is the single byte 0xCC[1]. This means we can overwrite the first byte of any other instruction, turning it into a breakpoint when the CPU tries to execute it. On x86, instructions are between 1 and 15 bytes long, so if int3 were 2 bytes or more, placing it on a 1-byte instruction would overwrite part of the next instruction.
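To make the size argument concrete, here’s a small sketch that writes a breakpoint over the first instruction of a common x86-64 prologue, held in a local array rather than a real process. It compares 0xCC with the 2-byte encoding of the same interrupt, `int 3` as 0xCD 0x03. The `patched` helper and the choice of prologue are purely illustrative.

```zig
const std = @import("std");

/// A common x86-64 prologue: push rbp (55) followed by mov rbp, rsp (48 89 E5).
const prologue = [4]u8{ 0x55, 0x48, 0x89, 0xE5 };

/// Hypothetical helper: returns a copy of `code` with `patch` written over its
/// first bytes, mimicking a breakpoint placed on the first instruction.
fn patched(code: [4]u8, patch: []const u8) [4]u8 {
    var out = code;
    @memcpy(out[0..patch.len], patch);
    return out;
}

pub fn main() void {
    const with_int3 = patched(prologue, &[_]u8{0xCC}); // int3: 1 byte
    const with_int_imm8 = patched(prologue, &[_]u8{ 0xCD, 0x03 }); // int 3 (imm8 form): 2 bytes
    std.debug.print("0xCC:    {any}\n", .{with_int3});
    std.debug.print("0xCD 03: {any}\n", .{with_int_imm8});
}
```

With 0xCC only the push changes. The 2-byte form also clobbers the 0x48 prefix of the following mov, which would break any code path that jumps straight to that mov.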

The plan is to write int3 to the address we want to break on. Then, when the breakpoint exception hits, we step the debuggee’s instruction pointer back, restore the original instruction byte and let that instruction execute. Afterwards we reapply the int3 so the breakpoint hits again next time.
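The byte-level part of this plan can be modelled without another process at all. The sketch below uses a local byte array as a stand-in for the debuggee’s code and a hypothetical `swapByte` helper as a stand-in for real cross-process memory access, then walks through set, restore and re-arm. It only tracks bytes; the CPU side (the exception and the single step) is described in comments.

```zig
const std = @import("std");

const int3: u8 = 0xCC;

/// Hypothetical in-process stand-in for patching the debuggee:
/// writes `new_byte` at `offset` in `code` and returns the byte it replaced.
fn swapByte(code: []u8, offset: usize, new_byte: u8) u8 {
    const old = code[offset];
    code[offset] = new_byte;
    return old;
}

pub fn main() void {
    // push rbp; mov rbp, rsp; ret
    var code = [_]u8{ 0x55, 0x48, 0x89, 0xE5, 0xC3 };

    // 1. Set: save the first byte of the target instruction and write int3.
    const original = swapByte(&code, 0, int3);

    // 2. Hit: the CPU would execute 0xCC here and raise a breakpoint exception.
    // 3. Restore: put the original byte back so the real instruction can run once.
    _ = swapByte(&code, 0, original);

    // 4. Re-arm: after single-stepping that instruction, write int3 again.
    _ = swapByte(&code, 0, int3);

    std.debug.print("original byte 0x{X}, code now {any}\n", .{ original, code });
}
```

Returning the replaced byte from the same call that writes the new one keeps the “read original, write int3” step to a single operation.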

Overwriting memory

To replace an instruction with int3, we must read and write the memory of our debuggee process. Windows provides ReadProcessMemory and WriteProcessMemory for this. These might give a sense of déjà vu: just like when we set the thread context in the previous part, these functions require that we open a process handle with the right privileges. Additionally, code pages are usually marked executable and read-only (for example PAGE_EXECUTE_READ) as a security feature, so we’ll first have to change the page protection to allow writing before we can patch anything.

Recall that the DEBUG_EVENT gave us the debuggee’s process id[2]. We obtain a process handle with 3 permissions:

const process_access = win.PROCESS_VM_READ | win.PROCESS_VM_WRITE | win.PROCESS_VM_OPERATION;
const process_handle = win.OpenProcess(process_access, win.FALSE, process_id);
defer _ = win.CloseHandle(process_handle);

Once we’re done, we have to close the handle again. If you are not familiar with Zig’s defer keyword, it just means: “Execute this code when leaving the scope.” This ensures we close the handle when the function returns.

The last permission requires some explanation. The OS allocates memory in pages, typically 4 KiB each. Even if you allocate something smaller, an allocator somewhere has requested at least a full page from the OS and divides it into the smaller pieces it hands out to you. Code lives in pages too. Each of these pages has a set of protection flags, and PROCESS_VM_OPERATION gives us the privilege to change them.
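VirtualProtectEx works on whole pages: Microsoft’s documentation says the affected region includes every page containing at least one byte of the range from lpAddress to lpAddress + dwSize. So even a 1-byte request changes the protection of the entire page around it. A sketch of that arithmetic, assuming 4 KiB pages (`pageBase` and `pagesTouched` are hypothetical helpers, not Windows APIs):

```zig
const std = @import("std");

/// Assumption: 4 KiB pages, the usual size on x86-64 Windows. Real code should
/// ask the OS (for example via GetSystemInfo's dwPageSize) instead of hard-coding it.
const page_size: u64 = 4096;

/// Start address of the page that contains `address`.
fn pageBase(address: u64) u64 {
    return address & ~(page_size - 1);
}

/// Number of pages touched by the byte range [address, address + len), for len > 0.
fn pagesTouched(address: u64, len: u64) u64 {
    const first = pageBase(address);
    const last = pageBase(address + len - 1);
    return (last - first) / page_size + 1;
}

pub fn main() void {
    // doWork() address taken from the sample run at the end of this part.
    const do_work: u64 = 0x7FF6FC771720;
    std.debug.print("page base 0x{X}, pages touched {d}\n", .{ pageBase(do_work), pagesTouched(do_work, 1) });
}
```

A write that straddles a page boundary touches two pages, so both would have their protection changed.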

Changing those flags is exactly what we’ll do next, using VirtualProtectEx. We’ll give the page the PAGE_READWRITE protection before we read from or write to it. When done, we restore the original page protection (using defer) so it’s as if nothing happened.

Notice the Windows API asks us to pass the address as a pointer. It’s important to remember, though, that this address points into the debuggee’s address space, not ours, so dereferencing it in our own process is not valid.

// Instruction memory is likely write-protected; make sure we can read and write it.
var old_page_protection: win.DWORD = undefined;
if (win.VirtualProtectEx(
    process_handle,         // hProcess
    @ptrFromInt(address),   // lpAddress
    @sizeOf(u8),            // dwSize
    win.PAGE_READWRITE,     // flNewProtect
    &old_page_protection,   // lpflOldProtect
) == win.FALSE) {
    return error.VirtualProtectFailed;
}

// Ensure we restore the memory protection after we are done writing to it.
var ignore: win.DWORD = undefined;
defer _ = win.VirtualProtectEx(
    process_handle,         // hProcess
    @ptrFromInt(address),   // lpAddress
    @sizeOf(u8),            // dwSize
    old_page_protection,    // flNewProtect
    &ignore,                // lpflOldProtect
);
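The value of defer here is that the restore runs on every path out of the function, including the early `return error...` branches that follow. A minimal in-process sketch of that shape, using a hypothetical `Page` struct and `Protection` enum in place of the real PAGE_* constants and Windows calls:

```zig
const std = @import("std");

/// Hypothetical stand-ins for PAGE_EXECUTE_READ / PAGE_READWRITE.
const Protection = enum { execute_read, read_write };

const Page = struct { protection: Protection };

/// Mirrors the shape of the code above: switch the protection, then `defer` the
/// restore so it runs whether we return normally or with an error.
fn writeWithTempProtection(page: *Page, simulate_failure: bool) !void {
    const old_protection = page.protection;
    page.protection = .read_write;
    defer page.protection = old_protection;

    // The ReadProcessMemory/WriteProcessMemory calls would go here.
    if (simulate_failure) return error.WriteProcessMemoryFailed;
}
```

Either way the page ends up with its original protection, without repeating the restore call before every `return`.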

We now have permission to actually read and write the memory. We’ll implement this as swapProcessByte, which replaces 1 byte at the target address and returns whatever it overwrote. This lets us read the original instruction byte and then replace it with int3.

var byte_read: u8 = undefined;
if (win.ReadProcessMemory(
    process_handle,         // hProcess
    @ptrFromInt(address),   // lpBaseAddress
    &byte_read,             // lpBuffer
    @sizeOf(u8),            // nSize
    null,                   // lpNumberOfBytesRead
) == win.FALSE) {
    return error.ReadProcessMemoryFailed;
}

if (win.WriteProcessMemory(
    process_handle,         // hProcess
    @ptrFromInt(address),   // lpBaseAddress
    &byte_to_write,         // lpBuffer
    @sizeOf(u8),            // nSize
    null,                   // lpNumberOfBytesWritten
) == win.FALSE) {
    return error.WriteProcessMemoryFailed;
}

Finally, an important detail is to flush the instruction cache. Because code is stored in main memory, it can be slow to fetch and decode. To speed up execution, the CPU copies code into a fast instruction cache. If we don’t tell it the cache is outdated, it might not pick up the changes we made.

_ = win.FlushInstructionCache(process_handle, @ptrFromInt(address), @sizeOf(u8));
return byte_read;

Setting a breakpoint

Using our new swapProcessByte function it becomes trivial to replace the implementation of setBreakpoint. We’ll write the int3 instruction and record the original byte and location we replaced.

const breakpoint_instruction: u8 = 0xCC; // Equal to int3 instruction.
var breakpoint_address: usize = undefined;
var orginal_instruction_byte: u8 = undefined;

pub fn setBreakpoint(process_id: win.DWORD, break_at_address: usize) !void {
    orginal_instruction_byte = try swapProcessByte(process_id, break_at_address, breakpoint_instruction);
    breakpoint_address = break_at_address;
    log.info("Successfully set breakpoint at address 0x{X}", .{break_at_address});
}
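These globals track exactly one breakpoint, which is enough for this part. Since software breakpoints are no longer limited by debug registers, a natural extension would be a table of them. A hypothetical sketch (`Breakpoint`, `BreakpointTable` and the capacity of 16 are illustrative choices, not the code from the repository):

```zig
const std = @import("std");

/// One software breakpoint: where int3 was written and the byte it replaced.
const Breakpoint = struct {
    address: u64,
    original_byte: u8,
};

/// Hypothetical fixed-capacity table replacing the single pair of globals.
/// The limit of 16 is arbitrary; only memory bounds it, not debug registers.
const BreakpointTable = struct {
    entries: [16]Breakpoint = undefined,
    len: usize = 0,

    fn add(self: *BreakpointTable, address: u64, original_byte: u8) !void {
        // Safety net: a second int3 at the same address would have read back
        // 0xCC as the "original" byte, so duplicates are rejected.
        if (self.find(address) != null) return error.BreakpointAlreadySet;
        if (self.len == self.entries.len) return error.TooManyBreakpoints;
        self.entries[self.len] = .{ .address = address, .original_byte = original_byte };
        self.len += 1;
    }

    fn find(self: *const BreakpointTable, address: u64) ?*const Breakpoint {
        for (self.entries[0..self.len]) |*bp| {
            if (bp.address == address) return bp;
        }
        return null;
    }
};
```

A caller would check `find` before calling swapProcessByte, and handleBreakpoint would use `find(exception_address)` instead of comparing against a single global.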

Handling a breakpoint hit

Handling the breakpoint is more complicated. Let’s start by replacing the EXCEPTION_SINGLE_STEP check for hardware breakpoints with the more appropriately named EXCEPTION_BREAKPOINT raised when int3 is hit. We also grab the address triggering the exception to check if it’s our breakpoint.

if (debug_event.dwDebugEventCode == win.EXCEPTION_DEBUG_EVENT) {
    const exception_code = debug_event.u.Exception.ExceptionRecord.ExceptionCode;
    const exception_addr = debug_event.u.Exception.ExceptionRecord.ExceptionAddress;

    // Received the signal the debugee is ready.
    if (exception_code == 0xE0000001) {
        // ...
        try setBreakpoint(process_id, break_at_address);
    }

    // A software breakpoint has been hit.
    if (exception_code == win.EXCEPTION_BREAKPOINT) {
        log.info("Breakpoint hit! Continue?", .{});
        _ = try stdin.takeDelimiter('\n') orelse unreachable;

        try handleBreakpoint(process_id, thread_id, exception_addr);
    }
}

To continue from our breakpoint we must restore and execute the original instruction we replaced.

pub fn handleBreakpoint(process_id: win.DWORD, thread_id: win.DWORD, exception_address: win.PVOID) !void {
    if (breakpoint_address != @intFromPtr(exception_address)) {
        return; // Make sure it's our breakpoint.
    }

    // Restore the original instruction so we can execute it.
    _ = try swapProcessByte(process_id, breakpoint_address, orginal_instruction_byte);

If you remember from the previous part, an exception is either a fault, a trap or an abort. The breakpoint exception is a trap, meaning it is reported after the int3 instruction has executed, leaving the instruction pointer 1 byte past it. Therefore we must move the instruction pointer back 1 byte to execute the original instruction.

We must also re-apply the int3 after executing the original instruction, so the breakpoint is set again. x86 has another feature that can help here: the trap flag in the EFLAGS register. It tells the CPU to execute only 1 instruction and then raise a single-step exception, which Windows reports as EXCEPTION_SINGLE_STEP. This is the same exception code our hardware breakpoints used, and its name makes a bit more sense now.

We use the getThreadContext/setThreadContext helpers implemented in the previous part to modify the registers. Then we execute the original instruction and re-apply the breakpoint.

    const thread_handle, var thread_ctx = try getThreadContext(thread_id);
    {
        // Step back one instruction, to right before our breakpoint got executed.
        thread_ctx.Rip = breakpoint_address;

        // Tell the CPU to only execute 1 instruction before raising an exception again.
        const trap_flag_bit = 0x00000100;
        thread_ctx.EFlags |= trap_flag_bit;
    }
    try setThreadContext(thread_handle, thread_ctx);

    // Execute the restored instruction.
    var debug_event = std.mem.zeroes(win.DEBUG_EVENT);
    _ = win.ContinueDebugEvent(process_id, thread_id, win.DBG_CONTINUE);
    _ = win.WaitForDebugEvent(&debug_event, win.INFINITE);

    std.debug.assert(debug_event.dwDebugEventCode == win.EXCEPTION_DEBUG_EVENT);
    const exception_code = debug_event.u.Exception.ExceptionRecord.ExceptionCode;
    std.debug.assert(exception_code == win.EXCEPTION_SINGLE_STEP);

    // Reinsert the breakpoint.
    orginal_instruction_byte = try swapProcessByte(process_id, breakpoint_address, breakpoint_instruction);

    log.info("Resumed execution.", .{});
}
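The two register edits in handleBreakpoint can be isolated and checked without a debuggee. In this sketch `FakeContext` is a hypothetical stand-in holding only the two CONTEXT fields we touch. Note the `|=`, which sets the trap flag while leaving every other flag untouched.

```zig
const std = @import("std");

/// Hypothetical stand-in for the two CONTEXT fields handleBreakpoint modifies.
const FakeContext = struct {
    Rip: u64,
    EFlags: u32,
};

/// Trap flag: bit 8 of EFLAGS, the same 0x00000100 used above.
const trap_flag_bit: u32 = 1 << 8;

/// Rewind to the breakpoint address and request a single step.
fn prepareSingleStep(ctx: *FakeContext, breakpoint_address: u64) void {
    ctx.Rip = breakpoint_address;
    ctx.EFlags |= trap_flag_bit;
}
```

Assigning `ctx.EFlags = trap_flag_bit` instead would silently clear flags such as IF and ZF, which is an easy mistake to make.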

Result

Trying out this code, you’ll notice something strange: we immediately hit a breakpoint that’s not ours! This is actually a Windows feature called the initial breakpoint. Once Windows has loaded our program into memory, but before our code starts executing, it triggers a breakpoint. This is normally the point at which real debuggers set up any breakpoints they need. We’ll simply ignore it, though.

// A software breakpoint has been hit.
if (exception_code == win.EXCEPTION_BREAKPOINT) {
    if (!initial_breakpoint_hit) {
        initial_breakpoint_hit = true;
    } else {
        log.info("Breakpoint hit! Continue?", .{});
        _ = try stdin.takeDelimiter('\n') orelse unreachable;

        try handleBreakpoint(process_id, thread_id, exception_addr);
    }
}
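The decision the event loop makes for each EXCEPTION_BREAKPOINT, combined with the address check at the top of handleBreakpoint, fits in one small function. `Action` and `classifyBreakpoint` are hypothetical names for this sketch; the flag is passed by pointer because it has to persist across debug events.

```zig
const std = @import("std");

/// What the event loop should do with an EXCEPTION_BREAKPOINT.
const Action = enum { ignore_initial, handle_ours, not_ours };

/// Hypothetical helper: the initial-breakpoint check from the loop above plus
/// the address check at the top of handleBreakpoint.
fn classifyBreakpoint(initial_breakpoint_hit: *bool, exception_address: u64, breakpoint_address: u64) Action {
    if (!initial_breakpoint_hit.*) {
        initial_breakpoint_hit.* = true;
        return .ignore_initial;
    }
    return if (exception_address == breakpoint_address) .handle_ours else .not_ours;
}
```

Keeping this as a pure function makes the ordering easy to test: the very first breakpoint is always skipped, whatever its address.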

Now let’s see if our software breakpoint works…

info(debugee): Address of doWork() == 0x7FF6FC771720
info(debugee): Address of counter == 0x7FF6FC839600
info(debugee): Waiting for debugger...
info(debugger): Set software breakpoint at address?
0x7FF6FC771720
info(debugger): Successfully set breakpoint at address 0x7FF6FC771720
info(debugee): Starting work!
info(debugger): Breakpoint hit! Continue?
y
info(debugger): Resumed execution.
info(debugee): 0
info(debugger): Breakpoint hit! Continue?
y
info(debugger): Resumed execution.
info(debugee): 1
info(debugger): Breakpoint hit! Continue?

Identical to our hardware breakpoints, but without the same limitations! Of course, hardware breakpoints still have their strengths. They don’t require modifying code (which might not be possible), they work per thread, and they can be used as data breakpoints too. Therefore most debuggers allow you to set both types, but any mainstream debugger is likely to use software instruction breakpoints the most.

In the next part we’ll explore whether we can replace data breakpoints with a software version too. We’ll also look into overcoming their hardware limitations, allowing us to place more than 4 at once and to watch more than 8 bytes of memory.

Footnotes

  1. You might often see 0xCC used to initialise undefined memory; you can probably see why now. Anything trying to execute it would immediately break in a debugger.

  2. When we called CreateProcessA, the PROCESS_INFORMATION struct also returned a process handle. However, just like the thread handle it returns, it might not have the right privileges.
