Posted on Jun 1

From reversed linked-list to registers, memory and assembly

Reversing a linked-list is like the "Hello World" from programmers. In this article I'm gonna use it as an example (written below in C) to showcase how C code gets translated into assembly, with a specific focus on stack frames, register usage, memory layout, and the power of the popular debugging tool gdb.

#include <stdio.h>  typedef struct Node { int val; int id; struct Node* next; } Node; // Reverses a singly linked list and returns the new head Node* reverse(Node* head) { Node* prev = NULL; Node* curr = head; Node* next; while (curr != NULL) { next = curr->next; curr->next = prev; prev = curr; curr = next; } return prev; } // Prints the list void print_list(Node* head) { while (head != NULL) { printf("Node ID: %d, Value: %d\n", head->id, head->val); head = head->next; } } int main() { // Declare four nodes on the stack and manually connect them Node node_4 = {40, 4, NULL}; Node node_3 = {30, 3, &node_4}; Node node_2 = {20, 2, &node_3}; Node node_1 = {10, 1, &node_2}; printf("Original list:\n"); print_list(&node_1); // Reverse the list Node* new_head = reverse(&node_1); printf("\nReversed list:\n"); print_list(new_head); return 0; }

The code above should be pretty straightforward. Before we dig into this piece of code, let's refresh our mind with some basic knowledge on CPU registers, memory and assembly.

Registers and Memory in a 64-bit World

When writing low-level programs like C, the compiler transforms our code into assembly instructions that run on the CPU. Each CPU has a small set of registers, which are fast-access storage units. On a 64-bit machine, these registers are 64 bits wide — i.e., they can hold a number as large as 2⁶⁴.

Computer memory is essentially a giant array of bytes. Every byte in memory has a unique address (think of it like array index), which is also a 64-bit number on modern architectures. So, each register can natually point to a particular memory address.

Notice that the values in the registers don't always have to be memory address. They can also simply be numbers like -5, 42, etc. The CPU doesn't care about the meaning—it just sees binary values. It's up to us (and the compiler) to interpret them correctly. This dual interpretation — value or address — is what makes assembly programming flexible yet confusing to beginners.

Essentially all data inside a machine is just bits. When we see two bytes 0x48 0x69, we can either intepret them as string "Hi", or as decimal number 18537, and the computer itself wouldn't know the difference. Note that the prefix '0x' means this a hex number (0x10 is 16 in decimal). You'll run into hex numbers all the time in the low-level world, because memory addresses are 64-bit and decimal is just too clumsy - you would definitely prefer using 0x401256to 010 000 000 001 001 001 010 110 to denote an address. Hex is compact and maps neatly to binary.

There are 16 general-purpose registers in x86-64 systems:

%rax, %rbx, %rcx, %rdx, %rsi, %rdi, %rbp, %rsp, %r8 through %r15

Some registers have conventional purposes:

%rsp: stack pointer, always points to the top of the stack
%rax: return value from a function
%rdi, %rsi, %rdx, %rcx, %r8, %r9: used to pass the first 6 function arguments

Common Assembly Instructions

Assembly instructions manipulate the data between registers and memory. These instructions are where we begin to see how higher-level constructs like linked lists are implemented under the hood.

Here are a few of the most common instructions you'll run into all the time. These might seem overwhelming for now, but don't worry, we will see them work in action later.

mov: Copy value from source to destination (register-to-register, memory-to-register, etc.). For example, say register %rax is now 0x402000
- mov $0x28, %rax updates register %rax to value 0x28.
- mov $0x28, (%rax) puts integer 0x28 into memory starting at address 0x402000
lea: Load effective address. Usually used to perform simple arthmetic operations.
- Say %rax is 0x402000, and %rbx is 0x2. Instruction lea 0x1(%rax, %rbx, 4), %rdx sets %rdx = 0x402000 + 2 * 4 + 0x1 = 0x402009.
call: Push return address onto stack and jump to the function
ret: Pop return address and jump back to where the function call was made
push: Decrease %rsp and store a value at the new top of the stack
pop: Opposite of push: Load value from top of stack and increase %rsp
sub, add: Perform arithmetic (often used to grow/shrink stack frame)
cmp: Compare two values
jmp, je, jne, jg, etc.: Control flow via unconditional or conditional jumps. Usually used together with cmp instruction, for example, cmp $0x3, %rax
- je equal: %rax == 3
- jne not equak: %rax != 3
- jg greater: %rax > 3

Dissecting the main Function

I use the following code to compile the C file and then disassemble it:

gcc -g -no-pie -Og -o linked_list linked_list.c objdump -d --no-show-raw-insn linked_list > linked_list.s

Let's look at the disassembled version of our C program’s main function. I've added lots of annotations to help reading through. At each line, the leftmost hex number is the address of the instruction.

00000000004011c5 <main>: 4011c5: endbr64 # Control-flow enforcement. Let's ignore it 4011c9: push %rbx # Register rbx will be modified in the process. We push it onto the stack so that we can retrieve later 4011ca: sub $0x50,%rsp # Move the top of the stack to allocate 80 bytes on stack(0x50 is 80 in decimal) 4011ce: mov %fs:0x28,%rax 4011d7: mov %rax,0x48(%rsp) # Put special value in the stack to detect stack buffer overflows. We can ignore for now. 4011dc: xor %eax,%eax # Clear %eax # === Initialize 4 linked list nodes === 4011de: movl $0x28,(%rsp) # node4.val = 40 4011e5: movl $0x4,0x4(%rsp) # node4.id = 4 4011ed: movq $0x0,0x8(%rsp) # node4.next = NULL 4011f6: movl $0x1e,0x10(%rsp) # node3.val = 30 4011fe: movl $0x3,0x14(%rsp) # node3.id = 3 401206: mov %rsp,%rax # address of node4 401209: mov %rax,0x18(%rsp) # node3.next = &node4 40120e: movl $0x14,0x20(%rsp) # node2.val = 20 401216: movl $0x2,0x24(%rsp) # node2.id = 2 40121e: lea 0x10(%rsp),%rax # address of node3 401223: mov %rax,0x28(%rsp) # node2.next = &node3 401228: movl $0xa,0x30(%rsp) # node1.val = 10 401230: movl $0x1,0x34(%rsp) # node1.id = 1 401238: lea 0x20(%rsp),%rax # address of node2 40123d: mov %rax,0x38(%rsp) # node1.next = &node2 # === Print Original List === 401242: lea 0xdd3(%rip),%rdi # Load "Original list:" format string 401249: call 401060 <puts@plt> # Print string to our screen 40124e: lea 0x30(%rsp),%rbx # Set rbp to point to node1 401253: mov %rbx,%rdi # Set pointer to node1 as the first argument to function call `print_list` so that we're ready for the next line 401256: call 401195 <print_list> # === Reverse the List === 40125b: mov %rbx,%rdi # first arg: head node 40125e: call 401176 <reverse> # returns new head 401263: mov %rax,%rbx # save new head # === Print Reversed List === 401266: lea 0xdbe(%rip),%rdi # Load "Reversed list:" string 40126d: call 401060 <puts@plt> 401272: mov %rbx,%rdi 401275: call 401195 <print_list> # === Stack Canary Check === 40127a: mov 0x48(%rsp),%rax 40127f: sub %fs:0x28,%rax 401288: jne 401295 <main+0xd0> # if canary changed → stack overflow → crash # === Return Normally === 40128a: mov $0x0,%eax # 0 is the success return code 40128f: add $0x50,%rsp # Clean up stack frame 401293: pop %rbx # Retrieve the original value of register rbx 401294: ret # Return 0 401295: call 401070 <__stack_chk_fail@plt> # stack corrupted → crash

A very important concept is stack frame - a block of memory to use whenever a function is called. It stores everything that the function needs while it runs, such as return address (where to jump back after the function finishes), local variables, function arguments (if passed via the stack) and sometimes, saved registers (like %rbp, %rbx) to restore later. It’s created at the start of a function and destroyed at the end.

The register %rsp is the top of the stack frame, and is extremely important since it marks the boundary of the stack frame. When we say the stack frame is destroyed, we don't mean physically destroying the hardware. Instead, it means the previous chunk of memory is no longered considered part of the current stack frame, and other programs are now allowed to write new values to this memory space to overwrite the existing ones.

Now let’s run GDB (GNU Debugger) to execute the program. add a breakpoint to address 401256 (you can find this number in the disassembled code above), right before calling print_list function.

$ gdb linked_list GNU gdb (Ubuntu 12.1-0ubuntu1~22.04.2) 12.1 Copyright (C) 2022 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. (gdb) break *0x401256 (gdb) run Starting program: /home/ubuntu/linked_list [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Now let’s examine the stack frame. The GDB command to examine memory (a very powerful tool!) is x/<count><format><size> <address>.

codes for [format]: x = hex, d = decimal, u = unsigned, c = char, s = string
codes for [size]: b = byte (1 byte), h = halfword (2 bytes), w = word (4 bytes), g = giant (8 bytes)

For example, the following code examines 20 words (4 bytes each) at the current stack pointer:

(gdb) x/20wx $rsp 0x7fffffffe350: 0x00000028 0x00000004 0x00000000 0x00000000 0x7fffffffe360: 0x0000001e 0x00000003 0xffffe350 0x00007fff 0x7fffffffe370: 0x00000014 0x00000002 0xffffe360 0x00007fff 0x7fffffffe380: 0x0000000a 0x00000001 0xffffe370 0x00007fff 0x7fffffffe390: 0x00000000 0x00000000 0xfbeca800 0x9629f2d4

The first 4 lines correspond to the 4 linked-list nodes we just created, and we can clearly see each node’s value, id and the pointer to next node. Each node consumes 16bytes (two integers, 4 bytes for each, and a 8-byte pointer), and that’s why their memory addresses all differ by 0x10.

Here’s how the stack looks during execution.

|------------------------| <- at 0x50(%rsp), 0x7fffffffe3a0 | stack overflow check | |------------------------| | | |------------------------| <- at 0x40(%rsp), 0x7fffffffe390 | node1 (next) | |------------------------| <- at 0x38(%rsp), 0x7fffffffe368 | node1 (val, id) | |------------------------| <- at 0x30(%rsp), 0x7fffffffe370 | node2 (next) | |------------------------| <- at 0x28(%rsp), 0x7fffffffe368 | node2 (val, id) | |------------------------| <- at 0x20(%rsp), 0x7fffffffe370 | node3 (next) | |------------------------| <- at 0x18(%rsp), 0x7fffffffe368 | node3 (val, id) | |------------------------| <- at 0x10(%rsp), 0x7fffffffe360 | node4 (next) | |------------------------| <- at 0x8(%rsp), 0x7fffffffe358 | node4 (val, id) | |------------------------| <- %rsp, 0x7fffffffe350

The stack frame contains exactly 0x50 bytes of memory, as allocated by the instruction 4011ca: sub $0x50,%rsp above. You may notice that the 8 bytes at address 0x40(%rsp) are simply unused, and you may wonder why did we bother to allocate space for it. On x86-64, the stack must be aligned to 16 bytes before a call instruction, and we need those “0x00” bytes for padding.

Dissecting the print_list Function

Now let's resume from we set the breakpoint, and dig into the print_list function.

0000000000401195 <print_list>: 401195: endbr64 # === Function Prologue === 401199: push %rbx # Save %rbx 40119a: mov %rdi,%rbx # Move first arg (head pointer) into %rbx # === Loop Condition === 40119d: jmp 4011be # Jump to address 4011be for loop condition check # === Loop Body === 40119f: mov (%rbx),%ecx # Load val into %ecx 4011a1: mov 0x4(%rbx),%edx # Load id into %edx 4011a4: lea 0xe59(%rip),%rsi # Load the address of format string "Node ID: %d, Value: %d\n" into %rsi 4011ab: mov $0x1,%edi # First arg to printf (fd = stdout) 4011b0: mov $0x0,%eax # Clear %eax before variadic call 4011b5: call 401080 <__printf_chk@plt> # secure version of function printf 4011ba: mov 0x8(%rbx),%rbx # Move to next node (follow .next) # === Loop Condition === 4011be: test %rbx,%rbx # Check if current node is NULL 4011c1: jne 40119f # If not NULL, jump to address 40119f to continue loop # === Function Epilogue === 4011c3: pop %rbx # Restore %rbx 4011c4: ret # Return

We can add a second breakpoint to the function to examine the stack

(gdb) break *0x4011ba Breakpoint 2 at 0x4011ba: file linked_list.c, line 30. (gdb) continue Continuing. Node ID: 1, Value: 10 Breakpoint 2, print_list (head=head@entry=0x7fffffffe380) at linked_list.c:30 30 head = head->next; (gdb) x/20wx $rsp 0x7fffffffe340: 0xffffe380 0x00007fff 0x0040125b 0x00000000 0x7fffffffe350: 0x00000028 0x00000004 0x00000000 0x00000000 0x7fffffffe360: 0x0000001e 0x00000003 0xffffe350 0x00007fff

We notice that the stack pointer register %rsp has been changed from 0x7fffffffe350 to 0x7fffffffe340, even though the print_list function doesn't allocate any space to the stack frame. So, why did the stack frame grow by 0x10 (16bytes)?

In the main function, when we execute instruction 401256: call 401195 <print_list>, we push the address of next instruction 40125b: mov %rbx,%rdi onto the stack, so that when function call print_list finishes, we can return to the correct location of main function and continue executing. This return addreses takes 8 bytes.
Inside the print_list function , we run 401199: push %rbx to push the value of register %rbx onto the stack. That takes another 8 bytes.

So, even though register %rsp isn't directly called, it was modified by instructions call and push. Notice that instructions ret and pop are counterparts of them, and will do the opposite - pop data out of the stack.

Dissecting the reverse Function

Now let's move on to the last function reverse.

0000000000401176 <reverse>: 401176: endbr64 # === Function Prologue === 40117a: mov $0x0,%eax # Set return register %rax = NULL (this will track the new head) # === Loop Condition (initial jump) === 40117f: jmp 40118f # Jump to loop condition before first iteration # === Loop Body === 401181: mov 0x8(%rdi),%rdx # Load next node: %rdx = current->next 401185: mov %rax,0x8(%rdi) # Reverse the link: current->next = previous 401189: mov %rdi,%rax # Update previous: previous = current 40118c: mov %rdx,%rdi # Move to next: current = next # === Loop Condition === 40118f: test %rdi,%rdi # Check if current (now in %rdi) is NULL 401192: jne 401181 # If not, continue the loop # === Function Epilogue === 401194: ret # Return; %rax holds new head of reversed list

This piece of assembly should be rather straightforward now.

One thing interesting here is that in my C function reverse , I declare 3 local variables prev, curr and next, but in the assembly code there isn't any stack space allocated for them. It's because the compiler tries to optimize away unnecessary memory operations, and it's smart enough to realize that we don't actually need any additional memory - simply using registers to update pointers is enough. And this process is just beautiful.

Thanks

I just started learning fundamentals about machines recently, although I've been a developer for many years. This article is my personal attempt to internalize my learnings and share them with others. If you're just getting into the low-level world, I hope you're just as excited as I'm.

One last note: I'm using my M2 Macbook Air for daily task and it doesn't work well with GDB and all those x86-64 stuff. I ended up getting an AWS ECS Ubuntu t2.micro machine.

$ uname -a Linux ip-172-31-18-227 6.8.0-1024-aws #26~22.04.1-Ubuntu SMP Wed Feb 19 06:54:57 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux $ gcc --version gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0