Skip to content

Latest commit

 

History

History
183 lines (151 loc) · 7.48 KB

File metadata and controls

183 lines (151 loc) · 7.48 KB

tile

pwn, 2 solves, 916 points

An old IoT device. attachment

http://111.186.59.27:28088

TL;DR / links

  • TILE-Gx ELF binary implementing a HTTP server.
  • ISA reference.
  • Build QEMU v5 (TILE-Gx is removed in v6) with --target-list=tilegx-linux-user, play with the binary.
  • Build afl++ with this patch, find crashes in the basic authentication handling.
  • QEMU GDB stub is broken, use tracing with -singlestep -d in_asm,cpu -strace instead.
  • Build binutils with --target=tilegx-linux-gnu, decompile objdump output by hand - the code appears to be built with -O0, so it's simple, but tedious.
  • Understand the crashes: username and password are essentially strcpy()ed into alloca(32)ed buffers.
  • Check mitigations: no ASLR (binary, heap and stack are all at known locations), no NX (heap is executable), stack canary present.
  • Use the username overflow to overwrite the password pointer with the address of the return address, bypassing the canary.
  • Overwrite the return address with the shellcode address (on heap).
  • Realize that the remote server may have slightly different stack addresses (due to argv, envp, and especially auxv) and brute force the delta.
  • Exploit.
  • Flag: flag{rop_on_t1111111le-gx_is_funny_27b7d3}.

Fuzzing

As I learned from CFZone 2021's FuzzAN-US challenge, afl++ supports foreign architectures with its QEMU instrumentation, so I tried building it for TILE-Gx:

qemu_mode$ CPU_TARGET=tilegx ./build_qemu_support.sh

Unfortunately, afl-fuzz refused to run, stating that handshake with forkserver has failed. I had to debug and figured out that the instrumented code does not call shmat(getenv(SHM_ENV_VAR)). QEMU's tilegx implementation does not use the common translator_loop(), to which afl++ adds its initialization, but rather implements a similar loop on its own. Adding the similar logic there fixed the issue.

By looking at strings output it's possible to figure out what subset of HTTP the binary implements:

  • GET, HEAD and POST methods.
  • Content-Length, Host and Authorization (Basic) headers.

Admin credentials (admin:70p_s3cr37_!@#) are also there. That's enough to give afl++ some initial input data to work with, and indeed, after several minutes it finds a bunch of crashes by passing weird data with the Authorization header.

Debugging

When trying to connect to QEMU GDB stub, GDB (both from Ubuntu 20.04 and binutils master) thinks the other side is sending garbage, and bails. I've decided against trying to fix that and went with inspecting the trace that QEMU writes when invoked with -singlestep -d in_asm,cpu -strace flags. -singlestep makes it trace each instruction, instead of each basic block, -d in_asm shows correspondence of program counter values to instructions, -d cpu shows register values after each instruction, and -strace shows the syscalls.

Traces produced this way aren't too large and can be navigated using vim or even grep -C. Knowing that it's better to debug in the remote-like environment, I also added these flags to the xinetd config and used docker exec grep to look at the data.

Static analysis

There don't seem to be good tools to decompile TILE-Gx (there is a plugin from Talos, but it's for the Linux version of IDA, which I don't have), so I've decided to manually work with the objdump output. The local variables in the binary are constantly spilled and loaded back (most likely because it's built with -O0). Even though it means that a single line of code corresponds to like 10 assembly instructions, it's easy to keep track of frame offsets (as opposed to register juggling in optimized code) and recognize many patterns (e.g. loading constants with moveli + 2 x shl16insli or function calls).

Knowing the crash location, I probably could have gotten away with not decompiling as much code as I did, but I wanted to be sure I understood what was happening. The code works with the following structures:

struct req {
  /* 0x00 */ void *;
  /* 0x08 */ void *method;
  /* 0x10 */ void *uri;
  /* 0x18 */ hdr *hdr;
  /* 0x20 */ char *content;
  /* 0x28 */ long content_length;
  /* 0x30 */ char *host;
  /* 0x38 */
};

struct hdr {
  /* 0x00 */ char *name;
  /* 0x08 */ void *value;
  /* 0x10 */ hdr *next;
  /* 0x18 */
};

First, sub_1001378 reads the request from stdin, then sub_1003750 handles it and prints the response to stdout, and finally sub_1002270 frees the request memory. sub_1003750 calls sub_1002f98 to check the Authorization header. It uses sub_10054b0 to decode base64-encoded credentials, splits them by :, and ...

Bug

... does alloca(32) two times and copies the username and the password to the resulting buffers without any length checks whatsoever.

Exploitation

The stack layout is as follows:

+-----------+  <- stack top
| password  |
+-----------+
| username  |
+-----------+
| &password |  <- username overflow -\
+-----------+                        |
| &username |                        |
+-----------+                        |
| ......... |                        |
+-----------+                        |
|  cookie   |                        |
+-----------+                        |
|  retaddr  |  <---------------------/
+-----------+

The code copies the username first, so we can use the overflow there to overwrite the password pointer. The new value can point to the return address, bypassing the cookie. From here on one could just write a ROP, but there is an even easier way: shellcode.

A few years back I made a similar challenge - Learning the Ropes, where the goal was to make players write ROP for IBM Z. However, some of them realized that the version of QEMU the challenge was running on did not implement all the memory protections, and simply jumped to shellcode on stack.

Here it's not that simple, because the copying is done until '\0', and writing shellcode without '\0's in it might be more challenging than writing a ROP chain. However, there is still a base64-decoded array on the heap. Since there is no ASLR either, it has a fixed address.

The last problem was that the exploit failed very early on the remote system. Of course, my approach with hardcoding a bunch of addresses is fragile, but on the other hand the docker environments should be identical. Finally, I realized, that while I tried to keep argv and envp the same, auxv is managed by the kernel and cannot be easily manipulated. So I ran exploit in a loop, trying different stack offsets, and at 16 it clicked. So, there must have been one extra auxv element on the server.

Conclusion

Quite an interesting challenge - a tribute to a dying architecture (the TILE-Gx support is being removed from everywhere). The bug is not that complicated, but this is compensated by the lack of reversing tools - it's nice that afl++ could be easily made to work though. Judging by the flag, the intended solution must have been a ROP chain, but a known QEMU deficiency allowed me to get away with the shellcode.