Hacking: The Art of Exploitation

Hacking: The Art of Exploitation, 2nd Edition · Jon Erickson · 480 pages

Low-level exploitation from first principles — C programming, x86 memory layout, buffer overflows, format strings, shellcode writing, network hacking, and countermeasure bypasses. Erickson explains the 'why' behind techniques rather than just the 'how'.

Capabilities (10)
  • Analyze stack frame layout to identify buffer overflow offset to return address
  • Classify vulnerability type: stack overflow, heap overflow, BSS overflow, format string
  • Explain format string %n write primitive and arbitrary memory write technique
  • Describe shellcode requirements: position-independent, null-free, size-constrained
  • Explain return-to-libc as NX/DEP bypass without shellcode
  • Map OSI layers to relevant attack surface (hijacking, spoofing, ARP poisoning)
  • Explain TCP/IP hijacking via sequence number prediction
  • Analyze WEP FMS attack — RC4 KSA weakness and IV-based key recovery
  • Describe ASLR bypass conditions: 32-bit brute force, heap spray, info leak chaining
  • Apply password probability matrices vs brute force vs rainbow table tradeoffs
How to use

Install this skill and Claude can analyze stack frame layouts to calculate buffer overflow offsets, explain format string and heap exploitation mechanics, reason through shellcode constraints and countermeasure bypass strategies, and map network attacks like TCP hijacking to their underlying protocol weaknesses.

Why it matters

Understanding exploitation at the implementation level, not just conceptually, is the foundation of both offensive security and meaningful defense. Practitioners who can reason from first principles about memory corruption and countermeasure bypasses build better mitigations and evaluate vulnerability reports more accurately.

Example use cases
  • Analyzing a vulnerable C program to identify the buffer overflow offset to the return address, determine NX and stack canary status, and construct a return-to-libc payload
  • Reviewing shellcode for null bytes, position-dependence, or other properties that would cause it to fail in specific delivery contexts such as strcpy-based overflow exploits
  • Explaining the mechanics of TCP session hijacking or ARP poisoning and identifying which defensive controls (TLS, DNSSEC, DHCP snooping) mitigate each attack

Hacking: The Art of Exploitation Skill

Core Philosophy

Hacking is creative problem solving — finding unintended uses of a system’s own rules. Security researchers must understand attacks at the implementation level to build meaningful defenses. Understanding exploitation is prerequisite to understanding how to prevent it.

The co-evolutionary model: attacking hackers find weaknesses → defending hackers build mitigations → attacking hackers develop evasion → better mitigations emerge. Understanding both sides produces smarter security.


Memory Layout Fundamentals

Process Memory Segments

High addresses
┌────────────────┐
│   Stack        │ ← grows DOWN — local vars, return addresses, saved frames
│   (grows ↓)    │
├────────────────┤
│   ...          │
│   (grows ↑)    │
│   Heap         │ ← dynamic allocation (malloc/new)
├────────────────┤
│   BSS          │ ← uninitialized globals
├────────────────┤
│   Data         │ ← initialized globals
├────────────────┤
│   Text/Code    │ ← read-only executable instructions
└────────────────┘
Low addresses

Stack Frame Layout (x86)

When a function is called:

  1. Arguments pushed onto stack (right to left in cdecl)
  2. Return address pushed (EIP saved)
  3. Previous frame pointer pushed (EBP saved)
  4. ESP moved to create space for locals
Higher addresses
┌──────────────────┐
│   arg2           │
│   arg1           │
│   return address │ ← address of the instruction after the call
│   saved EBP      │ ← old base pointer
│   local var 1    │
│   local var 2    │  ← ESP points here
└──────────────────┘
Lower addresses

Key registers:

  • EIP: instruction pointer — what executes next
  • ESP: stack pointer — top of stack
  • EBP: base pointer — reference for current frame
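
As a rough illustration of the frame layout above, the distance from a local buffer to the saved return address can be worked out from two addresses read in a debugger. The function name and the sample addresses are hypothetical, and the sketch assumes a 32-bit frame with a saved EBP and no canary or compiler padding between EBP and the return address.

```python
WORD = 4  # bytes per stack slot on 32-bit x86


def offset_to_return_address(buffer_addr, ebp):
    """Bytes from the start of a local buffer to the saved return address.

    Assumed layout (low to high): [buffer ... locals][saved EBP][return address]
    """
    if buffer_addr >= ebp:
        raise ValueError("a local buffer should sit below EBP")
    # Distance up to the saved EBP, plus the saved EBP slot itself.
    return (ebp - buffer_addr) + WORD


# Hypothetical values as they might be read from a debugger session.
print(offset_to_return_address(0xBFFFF6C0, 0xBFFFF708))  # 0x48 + 4 = 76
```

Real compilers may reorder locals or add alignment padding, so a computed offset is best confirmed empirically.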

Exploitation Vulnerability Classes

Stack Buffer Overflow

When a fixed-size buffer on the stack is written past its end, an attacker can overwrite:

  1. Adjacent local variables
  2. Saved EBP (frame pointer)
  3. Saved return address ← the primary target

Exploitability check:

  • Is the destination buffer on the stack?
  • Is there an unbounded copy (strcpy, gets, scanf %s)?
  • Can attacker control input length and content?

Classic payload structure:

[NOP sled][shellcode][padding][new_return_address → NOP sled]
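
The layout above can be sketched as a byte string. This is only an illustration of the structure: the function name, the offset, and the return address are hypothetical, and the "code" is a run of int3 placeholder bytes instead of real shellcode.

```python
import struct

NOP = b"\x90"  # x86 single-byte no-op


def build_layout(offset, ret_addr, code, sled_len):
    """Return bytes laid out as [NOP sled][code][padding][return address].

    offset is the distance from the buffer start to the saved return address.
    """
    body = NOP * sled_len + code
    if len(body) > offset:
        raise ValueError("sled + code do not fit before the return address")
    padding = b"A" * (offset - len(body))
    # 32-bit little-endian address, as x86 stores it in memory.
    return body + padding + struct.pack("<I", ret_addr)


placeholder = b"\xcc" * 24  # int3 bytes standing in for real code
layout = build_layout(offset=76, ret_addr=0xBFFFF6D0,
                      code=placeholder, sled_len=32)
print(len(layout), layout[-4:].hex())
```

The return address is chosen to land somewhere inside the sled, which is why a longer sled tolerates a less precise guess.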

Off-by-one errors: writing exactly one byte past the end of an array can still overwrite the null terminator of an adjacent string or the low byte of a saved pointer.

Heap Buffer Overflow

Overflows on the heap can corrupt adjacent chunk metadata (the prev_size and size fields in dlmalloc-style allocators such as glibc’s ptmalloc) or neighboring heap objects, allowing:

  • Arbitrary write primitive via free() unlink operation
  • Overwriting function pointers stored on heap
  • Use-after-free conditions
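
To make the unlink write primitive concrete, here is a small simulation under stated assumptions: memory is modeled as a Python dict of 32-bit words, the fd and bk pointers sit at offsets 8 and 12 of a free chunk (the classic 32-bit dlmalloc layout), and the allocator performs no integrity checks. Current glibc verifies the list pointers before unlinking, so this models only the historical behavior. All names and addresses are hypothetical.

```python
FD_OFF, BK_OFF = 8, 12  # offsets of fd/bk within a 32-bit free chunk


def unlink(mem, chunk):
    """Unchecked unlink of a free chunk: FD->bk = BK; BK->fd = FD."""
    fd = mem[chunk + FD_OFF]
    bk = mem[chunk + BK_OFF]
    mem[fd + BK_OFF] = bk  # write #1: the useful arbitrary write
    mem[bk + FD_OFF] = fd  # write #2: an unavoidable side effect


def corrupt_chunk(mem, chunk, target, value):
    """Model an overflow that has replaced the chunk's fd and bk fields."""
    mem[chunk + FD_OFF] = target - BK_OFF  # so that fd + BK_OFF == target
    mem[chunk + BK_OFF] = value


mem = {}
corrupt_chunk(mem, chunk=0x1000, target=0x2000, value=0x3000)
unlink(mem, 0x1000)
print(hex(mem[0x2000]))  # the chosen value now sits at the chosen address
```

The second write is why the value written must itself point at writable memory in the classic technique.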

BSS/Data Segment Overflows

Global/static variables in BSS/Data can be overflowed into adjacent variables. Often more reliable than stack overflows (no ASLR for BSS in older systems, no stack canary).

Format String Vulnerabilities

printf(user_input);         // vulnerable
printf("%s", user_input);   // safe

The %n format specifier writes the number of bytes printed so far to the address supplied as its argument. If no matching argument was passed, printf still takes the next value from where its arguments would be on the stack and treats it as that address.

Capabilities:

  • Read: %x or %s to read stack values
  • Write: %n to write arbitrary 4-byte values to arbitrary addresses
  • Arbitrary read/write: combine %[n]$x parameter field with controlled stack layout

Technique — writing to an address:

  1. Put target address in the input string (it lands on the stack)
  2. Use %[offset]$n to write to that address
  3. Control the write value via padding in the format string
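
Step 3 is mostly arithmetic. The sketch below assumes the common byte-at-a-time approach: four successive writes to address, address+1, address+2, and address+3, each relying on the low byte of the running output count. The function name is hypothetical, and it assumes every write is preceded by a padding directive that can emit any count of at least one character.

```python
def byte_write_padding(value, already_printed):
    """Padding needed before each of four successive %n writes.

    Returns a list of four counts: the extra characters to print before
    the write for byte 0 (lowest) through byte 3 of the 32-bit value, so
    that the low byte of the running total equals the wanted byte.
    """
    pads = []
    count = already_printed
    for i in range(4):
        want = (value >> (8 * i)) & 0xFF
        pad = (want - count) % 256
        if pad == 0:
            pad = 256  # wrap a full cycle; assumes zero padding is not possible
        count += pad
        pads.append(pad)
    return pads


# Example: 16 characters (say, four 4-byte addresses) are already printed.
print(byte_write_padding(0xDDCCBBAA, already_printed=16))
```

Because the count only ever increases, a byte smaller than the previous one is reached by wrapping past 256, which the modulo handles.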

Shellcode Fundamentals

Shellcode Requirements

  • Position-independent: no hardcoded addresses
  • No null bytes: strcpy stops at \x00
  • Small: must fit in available buffer space
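
The null-byte requirement is easy to check mechanically. The helper below is a minimal sketch with a hypothetical name; the two byte strings are the standard x86 encodings of `mov eax, 0xb` (which embeds three zero bytes in its 32-bit immediate) and of the null-free alternative `xor eax, eax` followed by `mov al, 0xb`.

```python
def bad_byte_offsets(payload, bad=b"\x00"):
    """Return the offsets of any bytes that the delivery path would mangle.

    bad defaults to the null byte; other contexts may add bytes such as
    newline (0x0a) for line-based input.
    """
    return [i for i, b in enumerate(payload) if b in bad]


mov_eax_imm32 = bytes.fromhex("b80b000000")  # mov eax, 0xb
xor_then_mov_al = bytes.fromhex("31c0b00b")  # xor eax, eax ; mov al, 0xb

print(bad_byte_offsets(mov_eax_imm32))    # offsets of the embedded zeros
print(bad_byte_offsets(xor_then_mov_al))  # empty list: no null bytes
```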

Shell-Spawning Shellcode (x86 Linux)

Uses execve syscall (int 0x80) with:

  • EAX = 0x0b (execve syscall number)
  • EBX = pointer to “/bin/sh” string
  • ECX = pointer to argv array
  • EDX = pointer to envp (NULL)

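Key technique: place the “/bin/sh” string directly after a call instruction. The call pushes its return address, which is the address of the string data that follows it, so the code can pop the string's address off the stack at runtime without hardcoding it.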

Port-Binding Shellcode

Creates a socket, binds it to a port, listens, and accepts a connection. It then uses dup2() to copy the connected socket fd over stdin, stdout, and stderr, and calls execve on /bin/sh. The result is a remote shell on the target machine.

Connect-Back Shellcode

Connects back to attacker’s IP:port using socket + connect syscalls, then dup2 + execve. Bypasses firewalls that block inbound connections.


Network Layer Knowledge

OSI Model for Security Analysis

7  Application  ← HTTP, FTP, SSH — protocol vulns here
6  Presentation ← encoding/encryption issues
5  Session      ← session hijacking
4  Transport    ← TCP/UDP — SYN floods, TCP hijacking
3  Network      ← IP — spoofing, routing attacks
2  Data Link    ← ARP — ARP poisoning
1  Physical     ← physical access

TCP/IP Hijacking

  1. Sniff a TCP connection (get seq/ack numbers)
  2. Wait for silence (no data flowing)
  3. Inject packet with correct seq/ack numbers and spoofed source IP
  4. The server accepts it as legitimate traffic from the original client

Prevention: encrypted transport (TLS/SSH) makes hijacking useless even if sequence numbers are guessed.
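
Why sniffed sequence numbers are enough can be shown with a simplified model of the receiver's acceptance test. This is a sketch, not a TCP implementation: the function names are hypothetical, and the check is reduced to "the segment's first byte falls inside the receive window", with 32-bit wraparound.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def next_expected(seq, payload_len):
    """Sequence number the receiver expects after a segment is accepted."""
    return (seq + payload_len) % MOD


def receiver_accepts(segment_seq, rcv_nxt, rcv_wnd):
    """True if the segment starts inside the receiver's current window."""
    return (segment_seq - rcv_nxt) % MOD < rcv_wnd


# Last sniffed client segment: seq=1000 carrying 20 bytes of data.
rcv_nxt = next_expected(1000, 20)
print(receiver_accepts(rcv_nxt, rcv_nxt, 65535))  # in-window: accepted
print(receiver_accepts(5, rcv_nxt, 65535))        # stale value: rejected
```

Nothing in this check authenticates the sender, which is the protocol weakness the attack relies on.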

Port Scanning Technique (SYN scan)

Send SYN packets; analyze responses:

  • SYN-ACK → port open
  • RST → port closed
  • No response → filtered
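
The response rules above map directly onto the TCP flag bits. The classifier below is a sketch that only interprets a reply. It sends nothing, and the function name and the "unknown" fallback are assumptions added for illustration.

```python
SYN, RST, ACK = 0x02, 0x04, 0x10  # TCP header flag bits


def classify_port(reply_flags):
    """Interpret the reply to a SYN probe.

    reply_flags is the TCP flags byte of the reply, or None on timeout.
    """
    if reply_flags is None:
        return "filtered"
    if reply_flags & RST:
        return "closed"
    if reply_flags & SYN and reply_flags & ACK:
        return "open"
    return "unknown"


print(classify_port(SYN | ACK))  # open
print(classify_port(RST | ACK))  # closed
print(classify_port(None))       # filtered
```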

Countermeasures and Bypass Techniques

Nonexecutable Stack (NX/DEP)

What it does: marks stack pages with the NX (No-Execute) bit, so an attempt to run shellcode on the stack triggers a fault.

Bypass — Return-to-libc: Instead of jumping to shellcode, overwrite return address with address of system() in libc, with “/bin/sh” as the argument on the stack. No shellcode needed.

Address Space Layout Randomization (ASLR)

What it does: randomizes base addresses of stack, heap, libraries.

Bypass conditions:

  • 32-bit systems: limited entropy (on the order of 16 bits for some randomized regions), so addresses are brute-forceable
  • Heap spray: allocate large blocks of NOP+shellcode, increases hit probability
  • Info leak: find a read primitive to leak addresses first, then calculate offsets
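
The brute-force condition can be quantified. The sketch below assumes each attempt is independent with a fresh random layout, for example a service that is re-executed after every crash, and treats the entropy figure as an input instead of a fact about any particular system. Function names are hypothetical.

```python
import math


def success_probability(entropy_bits, attempts):
    """Chance that at least one of the independent guesses hits."""
    p = 1 / 2 ** entropy_bits
    return 1 - (1 - p) ** attempts


def attempts_for(entropy_bits, target=0.5):
    """Smallest number of attempts reaching the target success probability."""
    p = 1 / 2 ** entropy_bits
    return math.ceil(math.log(1 - target) / math.log(1 - p))


for bits in (8, 16, 28):
    print(bits, attempts_for(bits))
```

If the layout is not re-randomized between attempts (a forking server, for instance), the search becomes exhaustive and needs at most 2^n guesses for n bits.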

Stack Canaries

What they do: place a random value between the local variables and the saved return address; the value is verified before the function returns, and a mismatch aborts the program.

Bypass conditions:

  • Format string read can leak the canary value
  • Heap overflows are unaffected, since the canary only guards stack frames
  • Off-by-one that only corrupts EBP (some implementations)

IDS Evasion

  • Fragment packets below IDS reassembly threshold
  • Send out-of-order fragments (IDS may not handle overlap correctly)
  • Use shellcode encodings/stubs that decode at runtime
  • Use polymorphic shellcode (different bytes, same behavior)

Cryptography Fundamentals

Symmetric vs Asymmetric

          Symmetric            Asymmetric
Keys      One shared secret    Public/private pair
Speed     Fast                 Slow
Problem   Key distribution     Computationally expensive
Use       Bulk data            Key exchange + signatures

Hybrid ciphers (TLS, PGP): use asymmetric to exchange a symmetric session key, then symmetric for data.

Password Cracking Approaches

  1. Dictionary attack: try known words and variants
  2. Brute force: exhaustive — infeasible for long passwords
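  3. Probability matrix: a lossy, compressed hash lookup structure that correlates parts of the hash with likely plaintext characters, trading storage for a much smaller search space than brute force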
  4. Rainbow tables: precomputed hash→password mappings (defeated by salt)
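
The effect of salt on precomputed tables can be shown in a few lines. This sketch uses a plain dictionary as a stand-in for a rainbow table and SHA-256 purely for illustration; the word list, salt value, and function names are hypothetical, and real password storage should use a deliberately slow key-derivation function.

```python
import hashlib


def hash_password(password, salt=b""):
    """Hash salt + password (illustrative only, not a storage scheme)."""
    return hashlib.sha256(salt + password.encode()).hexdigest()


# Precomputed once by the attacker, with no knowledge of any salt.
WORDS = ["password", "letmein", "hunter2"]
TABLE = {hash_password(w): w for w in WORDS}


def lookup(stored_hash):
    """Return the matching password if the hash is in the table."""
    return TABLE.get(stored_hash)


print(lookup(hash_password("letmein")))                 # found
print(lookup(hash_password("letmein", b"\x13\x37xy")))  # not found
```

With a per-user salt the attacker would need a separate table for every salt value, which removes the benefit of precomputation.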

WEP Weakness (FMS Attack)

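WEP encrypts each packet with RC4, keyed by a 3-byte IV prepended to the secret key, and RC4’s key scheduling algorithm (KSA) is weak. When an IV of the form (A+3, 255, X) is used, where A is the index of the key byte under attack and X is any value:

  • The first byte of the keystream output reveals information about that key byte
  • With 4–6 million packets, enough weak IVs accumulate to reconstruct the key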

Lesson: IV reuse + weak KSA = statistical key recovery. Never design stream ciphers with predictable, small IVs.
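
For reference, RC4's two stages and the weak-IV test fit in a short sketch. The RC4 code follows the published algorithm; the `is_fms_weak_iv` helper and its zero-based indexing of A are assumptions made for illustration, and the block stops at identifying weak IVs without performing any key recovery.

```python
def rc4_ksa(key):
    """Key scheduling: permute the 256-entry state using the key bytes."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    return s


def rc4_keystream(key, n):
    """Generate the first n keystream bytes for the given key."""
    s = rc4_ksa(key)
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)


def is_fms_weak_iv(iv, a):
    """True if a 3-byte IV has the form (a + 3, 255, X) for key byte a."""
    return len(iv) == 3 and iv[0] == a + 3 and iv[1] == 255


# WEP keys RC4 with IV || secret, so the first three key bytes are public.
print(rc4_keystream(b"Key", 10).hex())
print(is_fms_weak_iv(bytes([3, 255, 7]), a=0))
```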


Debugging for Exploitation

GDB Commands for Exploit Development

info registers                      # dump all register values
x/20x $esp                          # examine 20 words at ESP in hex
x/s <address>                       # examine memory as a string
break *<address>                    # breakpoint at an exact address
run $(python -c 'print("A"*100)')   # pass generated input (Python 2 or 3)
disassemble <function>              # show disassembly

Finding Buffer Offset to Return Address

  1. Generate a De Bruijn sequence (unique 4-byte patterns at every position)
  2. Run program, let it crash
  3. EIP contains a unique 4-byte sequence
  4. Look up position in the De Bruijn sequence → that’s the offset
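
The four steps above can be sketched as a pattern generator plus a lookup. The generator uses the standard recursive construction of a De Bruijn sequence over lowercase letters; the function names `cyclic` and `cyclic_find` are chosen for this example, and the lookup assumes a 32-bit little-endian target.

```python
import itertools
import string
import struct


def de_bruijn(alphabet, n):
    """Lazily yield a De Bruijn sequence: every n-gram appears once."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)


def cyclic(length, n=4):
    """First `length` bytes of the pattern; each n-byte window is unique."""
    seq = de_bruijn(string.ascii_lowercase, n)
    return "".join(itertools.islice(seq, length)).encode()


def cyclic_find(eip, pattern):
    """Offset of the 4 bytes found in EIP (little-endian), or -1."""
    return pattern.find(struct.pack("<I", eip))


pattern = cyclic(200)
crashed_eip = struct.unpack("<I", pattern[76:80])[0]  # simulated crash value
print(cyclic_find(crashed_eip, pattern))
```

Here the crash value is simulated by reading four bytes back out of the pattern; in practice it would be the EIP value reported by the debugger.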

Finding the Right Return Address

  • On vulnerable local binary: use GDB to find ESP at the overflow point
  • On remote target: estimate based on binary info, use NOP sled to increase margin
  • For ret-to-libc: find which libc the binary loads with ldd binary | grep libc, then look up system in that library with nm -D libc.so | grep ' system'