Overview
Memory monitoring and debugging involves identifying common memory access errors and tracing memory-related events in both user space and kernel space.
Common memory access errors
- Out-of-bounds access
- Use after free
- Double free
- Memory leak
- Stack overflow
Event sources for tracing memory activity
| Event type | Source |
|---|---|
| User-space memory allocation | uprobes to track allocator functions; USDT probes in libc |
| Kernel-space memory allocation | kprobes for allocator functions and kmem tracepoints |
| Heap growth | brk syscall tracepoint |
| Shared memory functions | system call tracepoints |
| Page faults | kprobes, software events, exception tracepoints |
| Page migration | migration tracepoints |
| Page compaction | compaction tracepoints |
| VM scanner | vmscan tracepoints |
| Memory access cycles | PMC |
USDT probes in libc
Processes using the libc allocator call functions such as malloc() and free(). libc includes USDT probes that can be used to observe allocator behavior from the application level.
# sudo bpftrace -l usdt:/lib/x86_64-linux-gnu/libc-2.31.so
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:setjmp
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:longjmp
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:longjmp_target
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:lll_lock_wait_private
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_mallopt_arena_max
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_mallopt_arena_test
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_tunable_tcache_max_bytes
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_tunable_tcache_count
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_tunable_tcache_unsorted_limit
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_mallopt_trim_threshold
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_mallopt_top_pad
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_mallopt_mmap_threshold
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_mallopt_mmap_max
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_mallopt_perturb
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_mallopt_mxfast
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_heap_new
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_arena_reuse_free_list
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_arena_reuse
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_arena_reuse_wait
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_arena_new
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_arena_retry
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_sbrk_less
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_heap_free
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_heap_less
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_tcache_double_free
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_heap_more
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_sbrk_more
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_malloc_retry
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_memalign_retry
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_mallopt_free_dyn_thresholds
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_realloc_retry
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_calloc_retry
usdt:/lib/x86_64-linux-gnu/libc-2.31.so:libc:memory_mallopt
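To have something to observe with these probes, a target process needs to exercise the corresponding allocator paths. The following is a minimal sketch, assuming glibc built with its probes enabled: calling mallopt() with M_MMAP_THRESHOLD should reach the memory_mallopt and memory_mallopt_mmap_threshold probe sites listed above. The helper name tune_and_allocate is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <malloc.h>   /* mallopt(), M_MMAP_THRESHOLD (glibc-specific) */
#include <stdlib.h>

/*
 * Hypothetical helper: set the allocator's mmap threshold, then make one
 * allocation above it. Returns 1 if mallopt() accepted the value, 0 if not.
 */
int tune_and_allocate(int threshold_bytes)
{
    assert(threshold_bytes > 0);

    /* On glibc this call passes through the mallopt probe sites. */
    int accepted = mallopt(M_MMAP_THRESHOLD, threshold_bytes);

    /* A request above the threshold is normally served by mmap(). */
    void *p = malloc((size_t)threshold_bytes * 2);
    free(p);

    return accepted;
}
```

Running a program that calls tune_and_allocate(64 * 1024) while a tracer is attached to the libc USDT probes is one way to check that the probes fire on a given system.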
OOM killer tracing
Use kprobes to trace the oom_kill_process() function in order to capture OOM killer events. Reading /proc/loadavg provides system load averages which give context about overall system activity when OOM occurs.
static void oom_kill_process(struct oom_control *oc, const char *message)
# cat /proc/loadavg
0.05 0.10 0.13 1/875 23359
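The fields of /proc/loadavg are the 1-, 5-, and 15-minute load averages, the number of currently runnable scheduling entities over the total number, and the most recently created PID. A small sketch of parsing that line in C is shown below, for example to log it next to an OOM event. The struct and function names are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative container for one /proc/loadavg sample. */
struct loadavg {
    double load1, load5, load15; /* 1-, 5-, 15-minute load averages */
    int runnable;                /* currently runnable scheduling entities */
    int total;                   /* total scheduling entities */
    int last_pid;                /* PID most recently created */
};

/* Parse one line in /proc/loadavg format. Returns 0 on success, -1 otherwise. */
int parse_loadavg(const char *line, struct loadavg *out)
{
    assert(line != NULL && out != NULL);
    int n = sscanf(line, "%lf %lf %lf %d/%d %d",
                   &out->load1, &out->load5, &out->load15,
                   &out->runnable, &out->total, &out->last_pid);
    return n == 6 ? 0 : -1;
}

/* Read and parse the live file. Returns 0 on success, -1 otherwise. */
int read_loadavg(struct loadavg *out)
{
    char buf[128];
    FILE *f = fopen("/proc/loadavg", "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, sizeof buf, f);
    fclose(f);
    return line != NULL ? parse_loadavg(line, out) : -1;
}
```

Applied to the sample line above, parse_loadavg() fills total with 875 and last_pid with 23359.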
memleak
memleak can trace allocation and free events and capture associated call stacks. Over time it can reveal memory that is not freed. For user-space processes, memleak tracks user allocators such as malloc(), calloc(), and free(). For kernel memory it uses kmem tracepoints:
kmem:kfree [Tracepoint event]
kmem:kmalloc [Tracepoint event]
kmem:kmalloc_node [Tracepoint event]
kmem:kmem_cache_alloc [Tracepoint event]
kmem:kmem_cache_alloc_node [Tracepoint event]
kmem:kmem_cache_free [Tracepoint event]
kmem:mm_page_alloc [Tracepoint event]
kmem:mm_page_free [Tracepoint event]
percpu:percpu_alloc_percpu [Tracepoint event]
percpu:percpu_free_percpu [Tracepoint event]
Example: simulate a leak
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>   /* sleep() */

/* Return a newly allocated buffer whose first element is *n0 + *n1. */
long long *fibonacci(long long *n0, long long *n1) {
    /* allocate space for observation (deliberately oversized) */
    long long *v = (long long *) calloc(1024, sizeof(long long));
    if (v == NULL)
        exit(EXIT_FAILURE);
    *v = *n0 + *n1;
    return v;
}

void *child(void *arg) {
    long long n0 = 0;
    long long n1 = 1;
    long long *v = NULL;
    int n = 2;
    (void) arg;   /* unused */
    for (n = 2; n > 0; n++) {
        v = fibonacci(&n0, &n1);
        n0 = n1;
        n1 = *v;
        printf("%dth => %lld\n", n, *v);
        /* v is never freed here: this is the simulated leak */
        sleep(1);
    }
    return NULL;
}

int main(void) {
    pthread_t tid;
    pthread_create(&tid, NULL, child, NULL);
    pthread_join(tid, NULL);
    printf("main thread exit\n");
    return 0;
}
Run the program and monitor system memory with vmstat:
# vmstat 3
The "free", "buff", and "cache" columns are shown in KB and represent free memory, buffer memory used for block I/O, and file system cache, respectively. The "si" and "so" columns show memory swapped in from disk and swapped out to disk, if any swapping occurs. The first line is averaged since system boot; subsequent lines show per-interval statistics.
When monitored, free memory may slowly decrease while buff and cache fluctuate less. To profile the leaking process, find its PID and run memleak with the PID option:
# ps aux | grep app
# sudo /usr/sbin/memleak-bpfcc -p 6867
memleak output can show the leaking call stack, for example:
fibonacci+0x23 [leak]
child+0x5a [leak]
This indicates that the buffer allocated in fibonacci() and returned to child() through v is never freed, causing a leak. After adding the appropriate free() call in the code and re-running the process, the leak report should no longer appear.
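One possible shape of that fix is sketched below. It keeps the fibonacci() allocation from the example but releases each buffer once its value has been copied out. The run_steps() function is a hypothetical, bounded stand-in for the endless loop in child(), introduced here so the logic can be exercised without threads or sleeps.

```c
#include <assert.h>
#include <stdlib.h>

/* Same allocation as the example, with a NULL check added. */
long long *fibonacci(const long long *n0, const long long *n1)
{
    long long *v = calloc(1024, sizeof(long long));
    if (v != NULL)
        *v = *n0 + *n1;
    return v;
}

/*
 * Hypothetical bounded loop: run `steps` iterations and free every buffer.
 * Returns the last value computed, or -1 if an allocation fails.
 */
long long run_steps(int steps)
{
    assert(steps > 0);
    long long n0 = 0, n1 = 1, last = -1;

    for (int i = 0; i < steps; i++) {
        long long *v = fibonacci(&n0, &n1);
        if (v == NULL)
            return -1;
        n0 = n1;
        n1 = *v;
        last = *v;
        free(v);   /* the call missing from the leaking version */
    }
    return last;
}
```

With the free() in place, every calloc() seen by memleak is matched by a release, so no outstanding allocation should accumulate for this call stack.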
Note: memleak alone cannot determine whether allocations are true leaks (unreferenced and never freed) versus intended long-lived allocations. Distinguishing these requires understanding the intended code paths.
If memleak is run without the -p PID option, it traces kernel memory allocation events globally.
mmapsnoop
mmapsnoop uses the syscalls:sys_enter_mmap tracepoint to observe mmap() system calls system-wide and print the details of each mapping request. Applications may call mmap() explicitly to load files or create large segments, and allocators may use mmap() instead of brk() for large allocations. munmap() returns those mappings to the system.
syscalls:sys_enter_mmap [Tracepoint event]
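To generate a mapping request that such a tracer would report, a process can create and release an anonymous mapping directly. The sketch below assumes Linux; the function name map_touch_unmap is hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict -std modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Map `len` bytes of anonymous memory, write to all of it, then unmap it.
 * Returns 0 on success, -1 on failure.
 */
int map_touch_unmap(size_t len)
{
    assert(len > 0);

    /* This call enters the kernel through the mmap system call. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xA5, len);   /* touching the pages makes them resident */
    int ok = ((unsigned char *)p)[len - 1] == 0xA5;

    /* munmap() hands the region back to the system. */
    if (munmap(p, len) != 0)
        return -1;
    return ok ? 0 : -1;
}
```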
brkstack
Heap memory expansion is performed via brk. Tracing brk and showing the user-space call stack that triggered growth can be valuable for analysis. sbrk is a libc wrapper that uses brk internally.
# sudo bpftrace -e 'tracepoint:syscalls:sys_enter_brk { @[ustack, comm] = count(); }'
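A process can trigger heap growth directly to verify that a brk tracer sees it. The following sketch, assuming Linux with glibc, moves the program break with sbrk() and reports how far it moved. The function name grow_heap is hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes sbrk() under strict -std modes */
#endif
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Grow the heap by `increment` bytes using sbrk().
 * Returns the number of bytes the program break moved, or -1 on failure.
 */
long grow_heap(long increment)
{
    assert(increment > 0);

    char *before = sbrk(0);               /* current program break */
    if (sbrk((intptr_t)increment) == (void *)-1)
        return -1;                        /* the kernel refused the growth */
    char *after = sbrk(0);                /* break after growth */

    return (long)(after - before);
}
```

In ordinary applications the allocator makes these calls on the program's behalf, which is why the user-space stack recorded at each brk is the interesting part of the trace.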
shmsnoop
shmsnoop traces System V shared memory system calls: shmget, shmat, shmdt, and shmctl. It helps debug shared memory usage by showing which processes allocate and operate on shared segments and their parameters. Shared memory allows unrelated processes to access the same logical memory region, but it does not provide synchronization; separate synchronization primitives are required to coordinate access.
asmlinkage long sys_shmget(key_t key, size_t size, int flag);
asmlinkage long sys_shmctl(int shmid, int cmd, struct shmid_ds __user *buf);
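From user space, the four calls usually appear together over the life of a segment. The sketch below assumes a Linux system with System V IPC available and walks through one full cycle; each call is something a shared memory tracer would be expected to show. The function name shm_roundtrip is hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Create a private segment, attach it, write and read back a string,
 * detach it, and mark it for removal. Returns 0 on success, -1 on failure.
 */
int shm_roundtrip(size_t size)
{
    assert(size >= 6);   /* room for "hello" plus the terminator */

    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (id < 0)
        return -1;

    char *p = shmat(id, NULL, 0);
    if (p == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);   /* do not leave the segment behind */
        return -1;
    }

    memcpy(p, "hello", 6);
    int ok = strcmp(p, "hello") == 0;

    int rc_detach = shmdt(p);
    int rc_remove = shmctl(id, IPC_RMID, NULL);

    return (ok && rc_detach == 0 && rc_remove == 0) ? 0 : -1;
}
```

Nothing in this sequence synchronizes access; two processes attaching the same segment would still need a semaphore or similar primitive, as noted above.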
faults
Tracing page faults and the associated call stacks provides insight into memory usage growth. Page faults directly contribute to RSS growth, so captured stacks can explain increases in a process's resident set size.
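The link between page faults and resident memory can also be observed from inside a process. The sketch below, assuming Linux, maps fresh anonymous memory, touches one byte per page, and uses getrusage() to report how many minor faults that caused. The function name faults_from_touching is hypothetical, and the exact count depends on the system (transparent huge pages, for instance, can lower it).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Touch `pages` freshly mapped pages and return the number of minor page
 * faults recorded for the process while doing so, or -1 on failure.
 */
long faults_from_touching(size_t pages)
{
    assert(pages > 0);

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = pages * page;
    struct rusage before, after;

    volatile unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    if (getrusage(RUSAGE_SELF, &before) != 0) {
        munmap((void *)p, len);
        return -1;
    }

    /* The first write to each page is what faults it into the RSS. */
    for (size_t i = 0; i < len; i += page)
        p[i] = 1;

    int rc = getrusage(RUSAGE_SELF, &after);
    munmap((void *)p, len);
    if (rc != 0)
        return -1;

    return after.ru_minflt - before.ru_minflt;
}
```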
vmscan and drsnoop
The vmscan tracepoints expose the activity of the page reclaim daemon (kswapd), which frees memory when the system comes under memory pressure.
drsnoop traces direct reclaim using mm_vmscan_direct_reclaim_begin and mm_vmscan_direct_reclaim_end tracepoints to show which processes were affected and the latency caused by direct reclaim. This helps quantify the performance impact of memory pressure on applications.
The direct reclaim path is:
__alloc_pages_slowpath() -> __alloc_pages_direct_reclaim() -> __perform_reclaim() -> try_to_free_pages() -> do_try_to_free_pages() -> shrink_zones() -> shrink_zone()
Within __alloc_pages_slowpath(), the per-node kswapd threads may also be woken. Whether this happens depends on the allocation's flags, for example whether background reclaim is permitted (__GFP_KSWAPD_RECLAIM).