[Reverse] Cursed App

This is a puzzle from the ASIS CTF Final 2019.

cursed_app.elf

This puzzle comes with a single ELF file. Executing it produces the following line of output:

$ chmod +x cursed_app.elf
$ ./cursed_app.elf
please locate license file, run ./app license_key
Bash

We open this in radare2 and quickly realise that the program consists of a huge main function containing approximately 1000 x86-64 instructions long, calling only C standard library functions (hence the puzzle title). Near the top of main we see several instructions that reads a file given in argv[1].

After a successful call to read, the code starts to execute the main instructions that are of interest to us. The program structure is quite clear for this puzzle but it was even easier to see with radare2's graph view. Essentially, the program consisted of similar blocks of instructions that always ended in a conditional branch to the address 0x1f1f.

0x00001177      4889ef         mov rdi, rbp                 ; 1st block
0x0000117a      e8d1feffff     call sym.imp.fclose
0x0000117f      0fbe03         movsx eax, byte [rbx]
0x00001182      be00010000     mov esi, 0x100
0x00001187      69c0af000000   imul eax, eax, 0xaf
0x0000118d      0591000000     add eax, 0x91
0x00001192      99             cdq
0x00001193      f7fe           idiv esi
0x00001195      89d0           mov eax, edx
0x00001197      69caeb000000   imul ecx, edx, 0xeb
0x0000119d      0fafc2         imul eax, edx
0x000011a0      6bd267         imul edx, edx, 0x67
0x000011a3      69c064010000   imul eax, eax, 0x164
0x000011a9      8d8408020200.  lea eax, [rax + rcx + 0x202]
0x000011b0      8d4a02         lea ecx, [rdx + 2]
0x000011b3      99             cdq
0x000011b4      f7f9           idiv ecx
0x000011b6      85d2           test edx, edx
0x000011b8      0f85910d0000   jne 0x1f4f
0x000011be      0fbe4301       movsx eax, byte [rbx + 1]    ; 2nd block
0x000011c2      6bc05d         imul eax, eax, 0x5d
0x000011c5      05da000000     add eax, 0xda
0x000011ca      99             cdq
0x000011cb      f7fe           idiv esi
0x000011cd      89d0           mov eax, edx
0x000011cf      6bca3c         imul ecx, edx, 0x3c
0x000011d2      0fafc2         imul eax, edx
0x000011d5      6bd256         imul edx, edx, 0x56
0x000011d8      69c0da030000   imul eax, eax, 0x3da
0x000011de      8d8408540300.  lea eax, [rax + rcx + 0x354]
0x000011e5      8d8a5f030000   lea ecx, [rdx + 0x35f]
0x000011eb      99             cdq
0x000011ec      f7f9           idiv ecx
0x000011ee      85d2           test edx, edx
0x000011f0      0f85590d0000   jne 0x1f4f
(Excerpt of the output by radare2 for cursed_app.elf, adapted)

We see that at the beginning of each block, just after the previous block's conditional jump instruction, the assembly reads from memory a byte at increasing offsets from the value stored in the rbx register.

Around the address 0x1f4f, we discover that at the end of all the branches is a message congratulating us on getting the correct flag. On the other hand, taking any of the conditional branches leads to another message stating that our flag is wrong. This leads us to believe that the goal is to find the string that yields the former when passed in by a file given in argv[1] as stated above.

0x00001f3b      488d3dfe1000.  lea rdi, str.Congratz__You_got_the_correct_flag
0x00001f42      e8e9f0ffff     call sym.imp.puts
0x00001f47      c74424040100.  mov dword [var_4h], 1
0x00001f4f      837c240400     cmp dword [var_4h], 0
0x00001f54      7421           je 0x1f77
0x00001f56      4881c4e00000.  add rsp, 0xe0
0x00001f5d      31c0           xor eax, eax
0x00001f5f      5b             pop rbx
0x00001f60      5d             pop rbp
0x00001f61      415c           pop r12
0x00001f63      c3             ret
...
0x00001f77      488d3dea1000.  lea rdi, str.Bummer__You_got_the_wrong_flag
0x00001f7e      e8adf0ffff     call sym.imp.puts
0x00001f83      ebd1           jmp 0x1f56
(Excerpt of the output by radare2 for cursed_app.elf, adapted)

One way to achieve this is to utilise a fuzzer. Inspection of the blocks indicate that the test for each character of the input is independent of the others, so that instead of the whole search space of 256N256^N characters, one can find the correct string in at most 256N256N trials. However, we actually used angr, a symbolic execution tool written in Python.

pip install angr
31.3s
Cursed_App (Bash in Python)

The gist is that we can pass in a symbolic argument to a simulator that will operate on it as an unknown rather than a concrete value. At each branch, the process will fork into two separate ones, each adding a constraint opposite to the other corresponding to whether the branch was taken or not. When the execution is complete, a solver can deduce what values will cause the execution to take a certain path. This suffers from exponential growth in the number of non-nested branches, so symbolic execution engines may also use a constraint solver to determine if a path is feasible, discarding it if not.

angr has a pretty helpful feature that allows us to begin simulation in the middle of a program. This enables us to skip past the cumbersome calls to reading a file and directly inject our symbolic variable into memory. Furthermore, we can also prune paths from execution, which in this case would be those that pass through address 0x1f4f. Naturally, we can also choose to terminate the search before execution normally finishes. We choose this to be the address 0x1f3b, which is after all the conditional checks have passed.

# Constants discovered during static analysis using r2 (or any other decompiler)
# Our .text (code) section will be loaded at offset 0x400000
start = 0x40117f    # Skip past the file reading stuff
target = 0x401f3b   # Earliest point(s) where we know we've succeeded
avoid = 0x401f4f    # Place(s) where we know we've failed
0.2s
Cursed_App (Python)

(angr has the capability to prune and terminate execution based on any information obtainable from the state rather than just addresses. However, this suffices for our purpose.)

One downside to beginning execution in the middle is that we have to set up the state manually to what the program expects or it will fail to work correctly. In this case it was easy to inspect the assembly and determine that the only register used which does not have its value initialised after the start address is rbx. It is assigned in the instruction mov rbx, rax at address 0x115c, following the call to malloc. A few instructions down, this is used as the first argument to fread, so it means that our input string should be loaded into the address at rbx. It does not actually matter what the value of rbx is as long as it matches with our input—indeed, it might even differ between different executions.

We now have all the information needed to reverse engineer the flag.

import angr
# We require no library functions
proj = angr.Project(
cursed_app.elf
, auto_load_libs=False)
myaddr = 0x600000   # Arbitrary address for our string
# Setting up the starting state
state = proj.factory.blank_state(addr=start)
chars = []  # The symbolic string
for i in range(0x3b):   # The last reference is [rbx+0x3a]
  bvs = state.solver.BVS('c{}'.format(i), 8)  # 8-bits, name arbitrary
  state.memory.store(myaddr + i, bvs)
  state.add_constraints(bvs >= 32)            # We only accept printable chars
  state.add_constraints(bvs < 127)
  chars.append(bvs)
state.regs.rbx = myaddr   # Initially rbx is set to the address of our string
# Begin symbolic execution
import logging
logging.getLogger('angr').setLevel('INFO')
simgr = proj.factory.simulation_manager(state)
simgr.explore(find=target, avoid=avoid)
# Final states are in simgr.found
118.5s
Cursed_App (Python)
<SimulationMa...und, 59 avoid>

We are almost done. The final step is to solve for the given input. angr's backend for constraint solving is claripy, which we can invoke directly from the final state object through the solver property. At first, we did the simplest thing, which was to call the eval function on each character, as in the following listing.

# There is only one execution path to finish, so there should only be one state
# in simgr.found (unless it failed completely), which is the one we want
if len(simgr.found) == 0:
  print("Flag not found!")
  exit()
final = simgr.found[0]
flag = b''
for c in chars:
  flag += final.solver.eval(c, cast_to=bytes)
print(flag.decode())   # Yay?
0.6s
Cursed_App (Python)

Unfortunately, this did not give us the correct flag. It was of the correct format, but on further inspection the l33+ string we generated appeared to have spelling errors. The problem was that some characters did not have unique solutions. The sha256 hash of the correct flag was given, so we generate all possible solutions instead and print the one with the correct hash. We can do this with the eval_upto function, since we know that there are only 95 possible characters within the range (fewer if we take into account the flag format).

import hashlib
import itertools
target_hash = 'dd791346264fc02ddcd2466b9e8b2b2bba8057d61fbbc824d25c7957c127250d'
c_possibilities = []
for c in chars:
  c_possibilities.append(final.solver.eval_upto(c, 95, cast_to=bytes))
flag_possibilities = itertools.product(*c_possibilities)
for flag in flag_possibilities:
  flag = b''.join(list(flag))   # flag is originally a tuple of bytes
  # In case the flag is in the middle of the string
  flag = flag[flag.find(b'ASIS{'):flag.find(b'}')+1]
  if hashlib.sha256(flag).hexdigest() == target_hash:
    print(flag.decode())  # Yay!
83.1s
Cursed_App (Python)

And we are done!

solve.py

N.B. Initially we made the mistake of creating one large bit-vector for the entire string. It resulted in a huge drop in the execution speed. In particular, the execution of each basic block took longer over time. That was possibly due to the complexity in checking that the constraints were satisfiable at each branch.

Runtimes (1)