Menno Markus

Part 4: Software Data Breakpoints

May 02, 2026 • 9 min read

The code for this part can be found here. Feeling lost? Consider starting at part 1.

Data breakpoints allow us to break on data reads and writes. Most mainstream debuggers stick to hardware breakpoints for this, as discussed in part 2. However, this ties us to the limits of the architecture: on x86_64 that means only 4 breakpoints, each watching up to 8 bytes. If your debugger has ever complained to you about this, you know how annoying it can be! Maybe you are trying to catch a memory stomp on an array, or observe when a vec3 structure is accessed!

In the previous part we overcame similar limitations for stepping through code by adding software instruction breakpoints. With some creativity we can implement a software equivalent for data breakpoints too, a feature I wish more debuggers would implement.

Guard pages

By now you’ll have noticed that breakpoints almost always work by triggering an exception the debugger can catch. So what we need is an exception we can trigger on data access. Luckily there is a feature for this called guard pages. Recall from the previous part that all memory is split into pages, often 4 KB each. Each page has protection flags that control how it can be accessed. The PAGE_GUARD modifier causes any access to the page to trigger an exception. The OS commonly uses this to catch errors or to implement dynamic growth of memory such as the stack.

We use this to implement our data breakpoint, marking the page holding the data we want to watch as a guard page [1]. Do note that we can only mark entire 4 KB pages. Any access outside the data we want to watch, but on the same page(s), will also trigger an exception. We’ll have to deal with these false positives when we go to handle our breakpoint.
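To make the page granularity concrete, here is a minimal sketch of how the pages covering a watched range could be computed. The `pageSpan` helper and the `PageSpan` struct are hypothetical names that are not part of this series' code, and the 4 KB page size is assumed rather than queried from the OS.

```zig
const std = @import("std");

// Assumption: 4 KB pages, as in the text. A real tool would ask the OS for the page size.
const page_size: usize = 4096;

/// Hypothetical helper type: the page-aligned span covering a watched byte range.
const PageSpan = struct {
    base: usize, // address of the first page
    page_count: usize, // number of pages that need the guard flag
};

/// Rounds the watched range outwards to whole pages.
fn pageSpan(address: usize, byte_count: usize) PageSpan {
    std.debug.assert(byte_count > 0);
    const mask = ~(page_size - 1);
    const first_page = address & mask;
    const last_page = (address + byte_count - 1) & mask;
    return PageSpan{
        .base = first_page,
        .page_count = (last_page - first_page) / page_size + 1,
    };
}

pub fn main() void {
    // Watching 16 bytes still guards the full page around them.
    const span = pageSpan(0x7FF73AC79600, 16);
    std.debug.print("base=0x{X} pages={d}\n", .{ span.base, span.page_count });
}
```

Every byte of the span outside the watched range is a potential source of false positives, and a range that straddles a page boundary needs the guard flag on both pages.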

Changing the page protection using VirtualProtectEx should be familiar from the previous part. VirtualQueryEx is new, but simply returns the page information used to add/remove the guard flag. Both these functions require a process handle with the correct permissions to call.

var breakpoint_address: usize = undefined;
var breakpoint_byte_count: usize = undefined;
var breakpoint_memory_info = std.mem.zeroes(win.MEMORY_BASIC_INFORMATION);

pub fn setBreakpoint(
    process_id: win.DWORD, 
    break_at_address: usize, 
    break_byte_count: usize
) !void {
    // Ensure we obtain the handle with page protection and query info permissions.
    const process_access = win.PROCESS_QUERY_INFORMATION | win.PROCESS_VM_OPERATION;
    const process_handle = win.OpenProcess(process_access, win.FALSE, process_id);
    defer _ = win.CloseHandle(process_handle);

    // Get the current page protection of the memory we want to watch.
    // VirtualQueryEx returns the number of bytes written to the buffer, or 0 on failure.
    if (win.VirtualQueryEx(
        process_handle,                                     // hProcess
        @ptrFromInt(break_at_address),                      // lpAddress
        &breakpoint_memory_info,                            // lpBuffer
        @sizeOf(win.MEMORY_BASIC_INFORMATION),              // dwLength
    ) == 0) {
        return error.VirtualQueryExFailed;
    }

    // Change the page protection to add the guard page flag, prohibiting any access.
    var ignore: win.DWORD = undefined;
    if (win.VirtualProtectEx(
        process_handle,                                     // hProcess
        breakpoint_memory_info.BaseAddress,                 // lpAddress
        breakpoint_memory_info.RegionSize,                  // dwSize
        breakpoint_memory_info.Protect | win.PAGE_GUARD,    // flNewProtect
        &ignore,                                            // lpflOldProtect
    ) == win.FALSE) {
        return error.VirtualProtectFailed;
    }

    breakpoint_address = break_at_address;
    breakpoint_byte_count = break_byte_count;
    log.info("Successfully set breakpoint at address 0x{X}", .{break_at_address});
}
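As the footnote points out, a real debugger should check whether the page already carries the guard flag before adding its own. A minimal sketch of that bookkeeping could look like the following. `WatchedPage` and its methods are hypothetical names, and the constants are the Win32 values of `PAGE_READWRITE` and `PAGE_GUARD` written out so the snippet stands alone. The value passed to `init` would be the `Protect` field returned by VirtualQueryEx.

```zig
const std = @import("std");

// Win32 memory protection constants (values from winnt.h).
const PAGE_READWRITE: u32 = 0x04;
const PAGE_GUARD: u32 = 0x100;

/// Hypothetical bookkeeping for one watched page.
const WatchedPage = struct {
    /// Protection reported by VirtualQueryEx before we changed anything.
    original_protect: u32,
    /// True when the guard flag was added by the debugger, not by the OS or the program.
    guard_is_ours: bool,

    fn init(current_protect: u32) WatchedPage {
        return WatchedPage{
            .original_protect = current_protect,
            .guard_is_ours = (current_protect & PAGE_GUARD) == 0,
        };
    }

    /// Protection to apply while the breakpoint is set.
    fn armed(self: WatchedPage) u32 {
        return self.original_protect | PAGE_GUARD;
    }

    /// Protection to apply when the breakpoint is removed.
    /// A guard flag that was there before us is left in place.
    fn restored(self: WatchedPage) u32 {
        return self.original_protect;
    }
};
```

Keeping the original protection around means removing the breakpoint never has to guess whether the guard flag should stay.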

Guard page exceptions

When a guard page is accessed it raises the STATUS_GUARD_PAGE_VIOLATION exception. The parameters of this exception do not appear to be documented by Microsoft, but some searching around reveals it carries the same information as EXCEPTION_ACCESS_VIOLATION. The first element holds the type of access that caused the exception: read (0), write (1) or execute (8). The second element gives the address that was being accessed. We can use this to filter out any false positives that fall outside the data we want to watch.

if (exception_code == win.STATUS_GUARD_PAGE_VIOLATION) {
    const has_params = debug_event.u.Exception.ExceptionRecord.NumberParameters >= 2;
    const access_type = debug_event.u.Exception.ExceptionRecord.ExceptionInformation[0];
    const access_addr = debug_event.u.Exception.ExceptionRecord.ExceptionInformation[1];

    const execute_access = 8;
    //const write_access = 1;
    //const read_access = 0;

    const start_addr = breakpoint_address;
    const end_addr = breakpoint_address + breakpoint_byte_count;
    const breakpoint_hit = start_addr <= access_addr and access_addr < end_addr;

    if (has_params and access_type != execute_access and breakpoint_hit) {
        log.info("Breakpoint hit! Continue?", .{});
        _ = try stdin.takeDelimiter('\n') orelse unreachable;
    }

    // If this is not our data watch, immediately continue.
    try handleBreakpoint(process_id, thread_id);
}
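The access type could also be used to offer a write-only watch, similar to what hardware breakpoints provide. The sketch below is one possible way to decode the first exception parameter and filter on it. `AccessKind`, `WatchMode`, `decodeAccess` and `shouldBreak` are hypothetical names, and the numeric values are the ones used for EXCEPTION_ACCESS_VIOLATION, which we assume apply here as well.

```zig
const std = @import("std");

/// Access kinds as reported in ExceptionInformation[0].
const AccessKind = enum { read, write, execute, unknown };

/// Hypothetical watch modes a debugger could expose to the user.
const WatchMode = enum { write_only, read_write };

/// Maps the raw exception parameter to an access kind.
fn decodeAccess(raw: usize) AccessKind {
    return switch (raw) {
        0 => .read,
        1 => .write,
        8 => .execute,
        else => .unknown,
    };
}

/// Decides whether an access of the given kind should stop the program.
fn shouldBreak(mode: WatchMode, kind: AccessKind) bool {
    return switch (kind) {
        .write => true,
        .read => mode == .read_write,
        .execute, .unknown => false,
    };
}
```

Whatever the filter decides, the guard flag still has to be re-applied afterwards, so the call to handleBreakpoint stays unconditional.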

When the guard page exception triggers, the OS automatically removes the guard page flag. The exception is classified as a fault which, if you remember, means the instruction that caused it has not finished executing. This means we can single-step the CPU to let that instruction’s access to our page go through. In the previous part we showed how this is done with the trap flag.

Finally we re-apply the guard page flag so our breakpoint remains set.

pub fn handleBreakpoint(process_id: win.DWORD, thread_id: win.DWORD) !void {
    // Ensure we obtain the process handle with page protection permissions.
    const process_access = win.PROCESS_VM_OPERATION;
    const process_handle = win.OpenProcess(process_access, win.FALSE, process_id);
    defer _ = win.CloseHandle(process_handle);

    // The guard page flag is automatically removed upon triggering it.
    // Allow the CPU to execute the memory access instruction, then re-apply the guard flag.

    const thread_handle, var thread_ctx = try getThreadContext(thread_id);
    {
        // Tell the CPU to only execute 1 instruction before raising an exception again.
        const trap_flag_bit = 0x00000100;
        thread_ctx.EFlags |= trap_flag_bit;
    }
    try setThreadContext(thread_handle, thread_ctx);

    // Execute the instruction.
    var debug_event = std.mem.zeroes(win.DEBUG_EVENT);
    _ = win.ContinueDebugEvent(process_id, thread_id, win.DBG_CONTINUE);
    _ = win.WaitForDebugEvent(&debug_event, win.INFINITE);

    std.debug.assert(debug_event.dwDebugEventCode == win.EXCEPTION_DEBUG_EVENT);
    const exception_code = debug_event.u.Exception.ExceptionRecord.ExceptionCode;
    std.debug.assert(exception_code == win.EXCEPTION_SINGLE_STEP);

    // Reapply the guard flag.
    var ignore: win.DWORD = undefined;
    if (win.VirtualProtectEx(
        process_handle,                                     // hProcess
        breakpoint_memory_info.BaseAddress,                 // lpAddress
        breakpoint_memory_info.RegionSize,                  // dwSize
        breakpoint_memory_info.Protect | win.PAGE_GUARD,    // flNewProtect
        &ignore,                                            // lpflOldProtect
    ) == win.FALSE) {
        return error.VirtualProtectFailed;
    }
    log.info("Resumed execution.", .{});
}

Result

To test whether we succeeded in lifting the restriction, let’s increase our debuggee’s counter variable from u64 (8 bytes) to u128 (16 bytes), a size hardware breakpoints can’t watch. Running the debugger…

info(debugee): Address of doWork() == 0x7FF73ABB1720
info(debugee): Address of counter == 0x7FF73AC79600
info(debugee): Waiting for debugger...
info(debugger): Set software data breakpoint at address?
0x7FF73AC79600
info(debugger): Memory byte size?
16
info(debugger): Successfully set breakpoint at address 0x7FF73AC79600
info(debugee): Starting work!
info(debugger): Breakpoint hit! Continue?
y
info(debugger): Resumed execution.
info(debugger): Breakpoint hit! Continue?
y
info(debugger): Resumed execution.
info(debugee): 0
info(debugger): Breakpoint hit! Continue?
y
info(debugger): Resumed execution.
info(debugger): Breakpoint hit! Continue?
y
info(debugger): Resumed execution.
info(debugger): Breakpoint hit! Continue?
y
info(debugger): Resumed execution.
info(debugger): Breakpoint hit! Continue?
y
info(debugger): Resumed execution.
info(debugger): Breakpoint hit! Continue?
y
info(debugger): Resumed execution.
info(debugger): Breakpoint hit! Continue?
y
info(debugger): Resumed execution.
info(debugee): 1
info(debugger): Breakpoint hit! Continue?

It works but… aren’t we hitting the breakpoint a lot? Let’s consider the code we are running:

var counter: u128 = 0;
fn doWork() void {
    log.info("{}", .{counter});
    counter += 1;
}

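You might break this down into 3 operations:

  1. Read counter to log it.
  2. Read counter to increment it.
  3. Write the incremented value back to counter.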

But there are 6 hits of our breakpoint for each call to doWork(). This starts to make sense if you consider that the compiler might have had to split reading/writing a 128-bit value into two 64-bit accesses. Each access to counter involves first touching the lower 64 bits, then touching the upper 64 bits, and both halves lie within the range we are watching. A look at the actual assembly output confirms this theory.
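The count can be reproduced with a small model. The sketch below assumes each of the three operations on counter is split into a low and a high 64-bit access, in that order. The real instruction sequence depends on the compiler, and `doWorkAccesses` and `countHits` are hypothetical names used only for this illustration.

```zig
const std = @import("std");

/// Addresses touched by one call to doWork(), assuming every 128-bit
/// access is split into two 64-bit halves.
fn doWorkAccesses(counter_addr: usize) [6]usize {
    const lo = counter_addr; // lower 64 bits
    const hi = counter_addr + 8; // upper 64 bits
    return [6]usize{
        lo, hi, // read counter for log.info
        lo, hi, // read counter for the increment
        lo, hi, // write the incremented value back
    };
}

/// Counts the accesses that fall inside the watched range,
/// using the same check as the exception handler.
fn countHits(accesses: []const usize, watch_addr: usize, watch_len: usize) usize {
    var hits: usize = 0;
    for (accesses) |addr| {
        if (watch_addr <= addr and addr < watch_addr + watch_len) hits += 1;
    }
    return hits;
}

pub fn main() void {
    const counter_addr: usize = 0x7FF73AC79600;
    const accesses = doWorkAccesses(counter_addr);
    std.debug.print("hits per doWork() call: {d}\n", .{countHits(&accesses, counter_addr, 16)});
}
```

With all 16 bytes watched, the model gives 6 hits per call. Watching only the first 8 bytes would give 3, since the upper halves would then count as false positives.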

So great! We managed to create software data breakpoints! Of course, just like with software instruction breakpoints, hardware breakpoints aren’t obsolete. A lot of debuggers still prefer them for data breakpoints. You might not be able to change page protection flags, and having to watch an entire 4 KB page can generate a lot of false positives, slowing down the program. But it’s certainly a useful tool to be aware of. Even in the absence of debugger support, it’s not difficult to implement this in our own programs.

In the next part we’ll look into how we can reduce some of the overhead of repeatedly breaking.

Footnotes

  1. The OS or the program may already use guard pages to implement specific features. As such, a real debugger should check whether a page is already a guard page before trying to change it. It would be bad if we accidentally removed a guard flag that was not ours.
