Simple in this case means that it’s easy to understand and very difficult to solve. We have the ability to swap bytes relative to the program base but… there isn’t really much there.
__dso_handle?
tl;dr it's just a uuid lol
It is initialized at runtime to a pointer to itself (program base + 0xf008) and is used by __cxa_atexit/__cxa_finalize to filter which atexit handlers run when an object is unloaded. It is a pointer because that makes it implicitly unique per loaded object, but it is never actually dereferenced.
I was stuck at this point for a long time — we had an obvious and fairly strong primitive but nothing to do with it. This challenge is running under ASLR so we don’t know the location of any memory segments (besides the program itself, which can be leaked from __dso_handle).
Not all areas of memory are randomized the same way. The offset between .data and the heap is randomized by ASLR but it's not… that… random? I knew from staring at memory maps that it was always in the same general area, tested it experimentally with gdb, and then after the fact looked it up in the kernel source code. The heap on x86-64 Linux starts between 0 and 8192 pages (32MiB) after the end of the program (with ASLR disabled the offset is always 0; the heap starts directly after the program).
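As a sanity check, here's a rough Python model of that placement. The 0x02000000 window comes from my reading of the x86-64 kernel source (arch_randomize_brk adds a random page-aligned offset of up to 32MiB); the rest is illustrative:

```python
import random

PAGE = 0x1000
RAND_WINDOW = 0x02000000  # 32 MiB brk randomization window on x86-64
N_OFFSETS = RAND_WINDOW // PAGE  # 8192 possible page offsets

def randomize_brk(program_end: int) -> int:
    """Model of heap placement: the heap starts somewhere in
    [program_end, program_end + 32 MiB), page-aligned. With ASLR
    off the offset is always 0."""
    offset = random.randrange(N_OFFSETS) * PAGE
    return program_end + offset

print(N_OFFSETS)  # 8192
```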
To be quite honest this is enough on its own. A 1-in-8192 brute isn't exactly fast but frankly I've done stupider things for a flag than a three hour brute (sry not sry infra; someone actually took it down doing this and got a POW added).
In the end though there was a pretty easy optimization that cuts that down to merely a couple hundred throws. The heap is (in this program, at the current state) 33 pages long and all we need to do is land somewhere inside it. Once we know a valid heap offset, we can walk back page by page until the tcache_perthread_struct header is found, bringing a 1/8192 chance down to 1/250-ish.
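Back-of-the-envelope math for that optimization (the 33 pages is specific to this program's heap state):

```python
PAGES_TOTAL = 8192  # possible page offsets for the heap start (x86-64 brk ASLR)
HEAP_PAGES = 33     # size of this program's heap at this point

# One blind guess now only has to land *somewhere* inside the heap...
p_hit = HEAP_PAGES / PAGES_TOTAL

# ...so on average we need about 1/p throws (geometric distribution),
# then at most 33 backwards page-steps to find the heap base.
expected_throws = 1 / p_hit
print(round(expected_throws))  # 248
```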
Usually it's fairly straightforward to get pointers into libc onto the heap. Free a chunk into the unsorted bin and both of its list pointers (fd and bk) will point at main_arena in libc.
Unfortunately, in this case we don't have much ability to work with the heap in this binary. There is (as far as I'm aware) a single relevant primitive: scanf allocates a scratch buffer and then frees it at the end. However, the lifetime of this chunk (allocated, used, freed) usually just means it gets consolidated back into its neighbor (the top chunk in this case).
So, then, how can we prevent this consolidation? We don’t have enough control over the ordering of the heap chunks to prevent it from consolidating naturally — but we do have a very strong write primitive. Can the heap be corrupted in such a way so as to prevent consolidation? Keeping in mind that we have no control between the allocation and corresponding free?
There isn’t really much on the heap to work with but the first place to look is the top chunk — where our allocated chunk is split off from and then consolidated against.
There are two cases when allocating a chunk without pulling from the bins. If the top chunk has sufficient size then a chunk is split off from the top chunk. Otherwise, it will call into sysmalloc to handle “system-dependent cases”.
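In pseudocode, that decision looks something like this (a heavy simplification of _int_malloc's fallthrough; MINSIZE is the smallest possible chunk):

```python
MINSIZE = 0x20  # smallest possible chunk on amd64

def serve_request(req_chunk_size: int, top_size: int) -> str:
    """Sketch of malloc's last resort once the bins are exhausted:
    carve the request out of the top chunk if it's big enough,
    otherwise fall into sysmalloc to grow the heap."""
    if top_size >= req_chunk_size + MINSIZE:
        return "split from top"
    return "sysmalloc"

print(serve_request(0x100, 0x20550))  # split from top
print(serve_request(0x1000, 0x550))   # sysmalloc
```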
Sysmalloc has a lot of weird alternate cases! Allocations of sufficient size are fulfilled with mmap ("sufficient" being a sliding scale: the threshold starts at 128KiB and caps out at 32MiB on amd64 libc 2.35). Otherwise, it will attempt to use sbrk to extend the heap. The key to our problem lies in how malloc handles an edge case during heap extension: new heap pages which are not contiguous with the old heap (either because the address space is noncontiguous or because non-libc code called sbrk). In that case malloc starts a new top chunk in the fresh segment, inserts fencepost chunks to prevent consolidation, and frees the old top chunk.
This is very promising, but we don't have the ability to actually force sbrk to return a noncontiguous page, right? Right, but it turns out to be unnecessary! Contiguity is checked naively: the old heap end is computed from the top chunk address plus the top chunk size, not from anything the kernel reports.
We don't need to force sbrk to return a noncontiguous page, just convince malloc that it did. By using our byte swap primitive to shrink the size of the top chunk (from 0x20550 to 0x550) and then making an allocation larger than the new top chunk size (which extends the heap), we end up with the old top chunk in the unsorted bin with two pointers into libc sitting on the heap.
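A toy model of the sysmalloc path makes the trick concrete (the address is made up; only the 0x20550 → 0x550 sizes come from the challenge):

```python
def sysmalloc_extend(top_addr: int, top_size: int, real_heap_end: int):
    """Toy model of sysmalloc's contiguity check: glibc computes the old
    heap end as top_addr + top_size and never asks the kernel where the
    break actually is. sbrk() extends from real_heap_end."""
    believed_end = top_addr + top_size  # where malloc thinks the heap ends
    new_mem = real_heap_end             # where sbrk actually returns memory
    if new_mem != believed_end:
        # "Noncontiguous" case: a new top chunk starts at new_mem and the
        # old top chunk is freed (fenceposts elided here), landing in the
        # unsorted bin with libc pointers in its fd/bk.
        return ("new top at", new_mem, "old top freed at", top_addr)
    return ("top extended at", top_addr)

TOP = 0x55555555aab0  # made-up address for the sketch

# Honest heap: the top chunk's size really does reach the break.
print(sysmalloc_extend(TOP, 0x20550, TOP + 0x20550))

# After swapping the top chunk size down from 0x20550 to 0x550, malloc's
# computed heap end falls 0x20000 bytes short of the real break, so the
# extension looks noncontiguous and the old top chunk gets freed.
print(sysmalloc_extend(TOP, 0x550, TOP + 0x20550))
```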
With arbitrary write (ish: it's a swap, but we can stage arbitrary bytes in the stdin buffer if needed) it's basically over. I chose to replace a saved return address (and rbp, as rbp-0x78 needed to be writable) with a one gadget.
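To illustrate the "arbitrary write via swaps" point, a minimal simulation (the offsets and gadget value are made up; in the real exploit the staging area is the stdin buffer and the target is the saved return address):

```python
def swap(mem: bytearray, a: int, b: int) -> None:
    """The challenge primitive: swap two bytes at offsets relative to a
    known base."""
    mem[a], mem[b] = mem[b], mem[a]

mem = bytearray(0x100)
STDIN_BUF, RET_ADDR = 0x00, 0x80  # hypothetical offsets for the sketch

# Stage the bytes we want (e.g. a one_gadget address) in memory we
# control, then swap them into the target one byte at a time.
gadget = (0xdeadbeefcafe).to_bytes(8, "little")
mem[STDIN_BUF:STDIN_BUF + 8] = gadget
for i in range(8):
    swap(mem, STDIN_BUF + i, RET_ADDR + i)

print(bytes(mem[RET_ADDR:RET_ADDR + 8]) == gadget)  # True
```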