1. Create an angr project.
2. Create an entry to start execution.
3. Create a simulation manager initialized with the starting state.
4. Explore the binary.
5. Find the input.
import angr import sys
defmain(argv): # Create an Angr project. # If you want to be able to point to the binary from the command line, you can # use argv[1] as the parameter. Then, you can run the script from the command # line as follows: # python ./scaffold00.py [binary] # (!) path_to_binary = argv[1] # m:string project = angr.Project(path_to_binary)
# Tell Angr where to start executing (should it start from the main() # function or somewhere else?) For now, use the entry_state function # to instruct Angr to start from the main() function. initial_state = project.factory.entry_state()
# Create a simulation manager initialized with the starting state. It provides # a number of useful tools to search and execute the binary. simulation = project.factory.simgr(initial_state)
# Explore the binary to attempt to find the address that prints "Good Job." # You will have to find the address you want to find and insert it here. # This function will keep executing until it either finds a solution or it # has explored every possible path through the executable. # (!) print_good_address = ??? # :integer (probably in hexadecimal) simulation.explore(find=print_good_address)
# Check that we have found a solution. The simulation.explore() method will # set simulation.found to a list of the states that it could find that reach # the instruction we asked it to search for. Remember, in Python, if a list # is empty, it will be evaluated as false, otherwise true. if simulation.found: # The explore method stops after it finds a single state that arrives at the # target address. solution_state = simulation.found[0]
# Print the string that Angr wrote to stdin to follow solution_state. This # is our solution. print solution_state.posix.dumps(sys.stdin.fileno()) else: # If Angr could not find a path that reaches print_good_address, throw an # error. Perhaps you mistyped the print_good_address? raise Exception('Could not find the solution')
target_addr = 0x08048678 simulation.explore(find=target_addr) if simulation.found: print(simulation.found[0].posix.dumps(sys.stdin.fileno()).decode("utf-8"))
else: raise Exception("Cound't find it!")
if __name__ == "__main__": main(sys.argv)
01_angr_avoid
Tick stash “stash” forward (up to “n” times or until “num_find” states are found), looking for condition “find”, avoiding condition “avoid”. Stores found states into “find_stash’ and avoided states into “avoid_stash”.
The “find” and “avoid” parameters may be any of:
An address to find
A set or list of addresses to find
A function that takes a state and returns whether or not it matches.
If an angr CFG is passed in as the “cfg” parameter and “find” is either a number or a list or a set, then any states which cannot possibly reach a success state without going through a failure state will be preemptively avoided.
The “avoid” argument will not store the found state. This will help us save the time.
When I fixed the problem, IDA suggested that the function was too large.
When I solved this challenge and used ida to decompile, IDA showed that the function was too big.
Solution: change the config file.
MacOS: IDA_HOME/idabin/cfg/hexrays.cfg
MAX_FUNCSIZE = 64// Functions over 64K are not decompiled
Change that to the following:
MAX_FUNCSIZE = 1024// Functions over 1024K are not decompiled
Although we fixed the configuration file, this function is too big and we will spend too much time decompiling it. So we can read the assembly code directly.
import angr import sys
defmain(argv): # Explore the binary, but this time, instead of only looking for a state that # reaches the print_good_address, also find a state that does not reach # will_not_succeed_address. The binary is pretty large, to save you some time, # everything you will need to look at is near the beginning of the address # space. # (!) binary_path = argv[1] proj = angr.Project(binary_path, auto_load_libs=False)
if simulation.found: print(simulation.found[0].posix.dumps(sys.stdin.fileno()).decode('utf-8')) else: raise Exception("Don't find the target.")
if __name__ == "__main__": main(sys.argv)
02_angr_find_condition
It is very useful to be able to search for a state that reaches a certain
instruction. However, in some cases, you may not know the address of the
specific instruction you want to reach (or perhaps there is no single
instruction goal.) In this challenge, you don’t know which instruction
grants you success. Instead, you just know that you want to find a state where
the binary prints “Good Job.”
Angr is powerful in that it allows you to search for a states that meets an
arbitrary condition that you specify in Python, using a predicate you define
as a function that takes a state and returns True if you have found what you
are looking for, and False otherwise.
Although we can find the address of the target instruction, in order to train the writing of angr, we use the recommended method of the script, which is to find the conditions we hope to achieve by writing functions.
if simulation.found: print(simulation.found[0].posix.dumps(sys.stdin.fileno()).decode("utf-8")) else: raise Exception("Can't find it.")
if __name__ == "__main__": main(sys.argv)
03_angr_symbolic_registers
Through this challenge we can learn how to control registers and manually set entries through angr.
# Angr doesn't currently support reading multiple things with scanf (Ex: # scanf("%u %u).) You will have to tell the simulation engine to begin the # program after scanf is called and manually inject the symbols into registers.
# Sometimes, you want to specify where the program should start. The variable # start_address will specify where the symbolic execution engine should begin. # Note that we are using blank_state, not entry_state. # (!) start_address = 0x8048980# :integer (probably hexadecimal) initial_state = project.factory.blank_state(addr=start_address)
# Create a symbolic bitvector (the datatype Angr uses to inject symbolic # values into the binary.) The first parameter is just a name Angr uses # to reference it. # You will have to construct multiple bitvectors. Copy the two lines below # and change the variable names. To figure out how many (and of what size) # you need, dissassemble the binary and determine the format parameter passed # to scanf. # (!) password_size_in_bits = 20 * 8# :integer password0 = claripy.BVS('password0', password_size_in_bits) password1 = claripy.BVS('password1', password_size_in_bits) password2 = claripy.BVS('password2', password_size_in_bits) ...
# Set a register to a symbolic value. This is one way to inject symbols into # the program. # initial_state.regs stores a number of convenient attributes that reference # registers by name. For example, to set eax to password0, use: # # initial_state.regs.eax = password0 # # You will have to set multiple registers to distinct bitvectors. Copy and # paste the line below and change the register. To determine which registers # to inject which symbol, dissassemble the binary and look at the instructions # immediately following the call to scanf. # (!) initial_state.regs.eax = password0 initial_state.regs.ebx = password1 initial_state.regs.edx = password2
if simulation.found: solution_state = simulation.found[0]
# Solve for the symbolic values. If there are multiple solutions, we only # care about one, so we can use eval, which returns any (but only one) # solution. Pass eval the bitvector you want to solve for. # (!) solution0 = solution_state.solver.eval(password0) solution1 = solution_state.solver.eval(password1) solution2 = solution_state.solver.eval(password2) ...
# Aggregate and format the solutions you computed above, and then print # the full string. Pay attention to the order of the integers, and the # expected base (decimal, octal, hexadecimal, etc). # solution = solution0 + " " + solution1 + " " + solution2 # :string print("{:x} {:x} {:x}".format(solution0, solution1, solution2)) else: raise Exception('Could not find the solution')
This challenge exercises our ability to construct stacks using angr.
Tips:
Since we are starting after scanf, we are skipping this stack construction step. To make up for this, we need to construct the stack ourselves. Let us start by initializing ebp in the exact same way the program does.
# This challenge will be more challenging than the previous challenges that you # have encountered thus far. Since the goal of this CTF is to teach symbolic # execution and not how to construct stack frames, these comments will work you # through understanding what is on the stack. # ! ! ! # IMPORTANT: Any addresses in this script aren't necessarily right! Dissassemble # the binary yourself to determine the correct addresses! # ! ! !
In ida, the stack memory for this function is 0x18 bytes.
“v1” is located at the position of “ebp - 0x10” and the “v2” is located at the position of “ebp - 0xc”. So when the address is indexed by ESP, the offset is 0x18-0x10, which is exactly the position of the first parameter.
keywords: initial_state.stack_push()
05_angr_symbolic_memory
Determine the address of the global variable to which scanf writes the user input. The function ‘initial_state.memory.store(address, value)’ will write ‘value’ (a bitvector) to ‘address’ (a memory location, as an integer.) The ‘address’ parameter can also be a bitvector (and can be symbolic!).
Instead of telling the binary to write to the address of the memory allocated with malloc, we can simply fake an address to any unused block of memory and overwrite the pointer to the data. This will point the pointer with the address of pointer_to_malloc_memory_address0 to fake_heap_address. Be aware, there is more than one pointer! Analyze the binary to determine global location of each pointer. Note: by default, Angr stores integers in memory with big-endianness. To specify to use the endianness of your architecture, use the parameter endness=project.arch.memory_endness. On x86, this is little-endian.
The key to this challenge is that we skipped malloc at the address at the beginning, so we have to manually allocate a fake heap address for it.
# Store our symbolic values at our fake_heap_address. Look at the binary to # determine the offsets from the fake_heap_address where scanf writes. # (!) initial_state.memory.store(fake_heap_address0, password0) initial_state.memory.store(fake_heap_address1, password1) ...
The binary asks for a 16 character password to which is applies a complex function and then compares with a reference string with the function checkequals[reference string]. (Decompile the binary and take a look at it!)
The source code for this function is provided here. However, the reference string in your version will be different than AABBCCDDEEFFGGHH.
The function checks if *to_check == “AABBCCDDEEFFGGHH”. Verify this yourself. While you, as a human, can easily determine that this function is equivalent to simply comparing the strings, the computer cannot. Instead the computer would need to branch every time the if statement in the loop was called (16 times), resulting in 2^16 = 65,536 branches, which will take too long of a time to evaluate for our needs.
We do not know how the complex_function works, but we want to find an input that, when modified by complex_function, will produce the string: AABBCCDDEEFFGGHH.
In this puzzle, your goal will be to stop the program before this function is called and manually constrain the to_check variable to be equal to the password you identify by decompiling the binary. Since, you, as a human, know that if the strings are equal, the program will print “Good Job.”, you can be assured that if the program can solve for an input that makes them equal, the input will be the correct password.
This challenge is quite practical. I have seen many ctf challenges using this writing method: compare the input string with the target string character by character. This is easy for us humans, but not for computers. When we use symbols to execute, this challenge has 16 characters, so we have to compare 2 to the 16th power, which is obviously unbearable. In order to solve this problem, we need to manually specify constraints.
Save the result of check_equals_AABBCCDDEEFFGGHH (or similar)
Get another 16 bytes from the user and encrypt it.
Check that it’s equal to a predefined password.
The ONLY part of this program that we have to worry about is #2.We will be replacing the call to checkequals with our own version, using a hook, since checkequals will run too slowly otherwise.
You may be thinking: Why can’t I just use hooks? The function is called many times, but if I hook the address of the function itself (rather than the addresses where it is called), I can replace its behavior everywhere. Furthermore, I can get the parameters by reading them off the stack (with memory.load(regs.esp + xx)), and return a value by simply setting eax! Since I know the length of the function in bytes, I can return from the hook just before the ‘ret’ instruction is called, which will allow the program to jump back to where it was before it called my hook. If you thought that, then congratulations! You have just invented the idea of SimProcedures! Instead of doing all of that by hand, you can let the already-implemented SimProcedures do the boring work for you so that you can focus on writing a replacement function in a Pythonic way. As a bonus, SimProcedures allow you to specify custom calling conventions, but unfortunately it is not covered in this CTF.
A SimProcedure replaces a function in the binary with a simulated one written in Python. Other than it being written in Python, the function acts largely the same as any function written in C. Any parameter after 'self' will be treated as a parameter to the function you are replacing.
The parameters will be bitvectors. Additionally, the Python can return in the ususal Pythonic way. Angr will treat this in the same way it would treat a native function in the binary returning. An example:
int add_if_positive(int a, int b) {
if (a >= 0 && b >= 0) return a + b;
else return 0;
}
could be simulated with...
class ReplacementAddIfPositive(angr.SimProcedure):
def run(self, a, b):
if a >= 0 and b >=0:
return a + b
else:
return 0
Finish the parameters to the check_equals_ function. Reminder: int check_equals_AABBCCDDEEFFGGHH(char* to_check, int length) { ...
(!)
classReplacementCheckEquals(angr.SimProcedure): defrun(self, to_check, length): # We can almost copy and paste the solution from the previous challenge. # Hint: Don't look up the address! It's passed as a parameter. # (!) user_input_buffer_address = to_check user_input_buffer_length = length
# Note the use of self.state to find the state of the system in a SimProcedure. user_input_string = self.state.memory.load( user_input_buffer_address, user_input_buffer_length )
check_against_string = "ORSDDWXHZURJRBDH" # Finally, instead of setting eax, we can use a Pythonic return statement # to return the output of this function. # Hint: Look at the previous solution. return claripy.If(user_input_string == check_against_string, claripy.BVV(1, 32), claripy.BVV(0, 32))
# Hook the check_equals symbol. Angr automatically looks up the address # associated with the symbol. Alternatively, you can use 'hook' instead # of 'hook_symbol' and specify the address of the function. To find the # correct symbol, disassemble the binary. # (!) check_equals_symbol = "check_equals_ORSDDWXHZURJRBDH"# :string project.hook_symbol(check_equals_symbol, ReplacementCheckEquals())
if simulation.found: solution_state = simulation.found[0] print(solution_state.posix.dumps(sys.stdin.fileno())) else: raise Exception('Could not find the solution')
if __name__ == '__main__': main(sys.argv)
11_angr_sim_scanf
# This time, the solution involves simply replacing scanf with our own version, # since Angr does not support requesting multiple parameters with scanf.
classReplacementScanf(angr.SimProcedure): # Finish the parameters to the scanf function. Hint: 'scanf("%u %u", ...)'. # (!) defrun(self, format_string, scanf0_address, scanf1_address): scanf0 = claripy.BVS('scanf0', 32) scanf1 = claripy.BVS('scanf1', 32) ...
# The scanf function writes user input to the buffers to which the parameters point. self.state.memory.store(scanf0_address, scanf0, endness=project.arch.memory_endness) self.state.memory.store(scanf1_address, scanf1, endness=project.arch.memory_endness) ...
# Now, we want to 'set aside' references to our symbolic values in the # globals plugin included by default with a state. You will need to # store multiple bitvectors. You can either use a list, tuple, or multiple # keys to reference the different bitvectors. # (!) self.state.globals['solution0'] = scanf0 self.state.globals['solution1'] = scanf1
if simulation.found: solution_state = simulation.found[0]
# Grab whatever you set aside in the globals dict. stored_solutions0 = solution_state.globals['solution0'] stored_solutions1 = solution_state.globals['solution1'] ...
print(solution_state.solver.eval(stored_solutions0), solution_state.solver.eval(stored_solutions1)) else: raise Exception('Could not find the solution')
if __name__ == '__main__': main(sys.argv)
12_angr_veritesting
# When you construct a simulation manager, you will want to enable Veritesting: # project.factory.simgr(initial_state, veritesting=True) # Hint: use one of the first few levels' solutions as a reference.
simulation = project.factory.simgr(initial_state, veritesting=True) simulation.explore(find=0x804867C) if simulation.found: print(simulation.found[0].posix.dumps(0))
if __name__ == '__main__': main(sys.argv)
13_angr_static_binary
Additionally, note that, when the binary is executed, the main function is not
the first piece of code called. In the _start function, __libc_start_main is
called to start your program. The initialization that occurs in this function
can take a long time with Angr, so you should replace it with a SimProcedure.
angr.SIM_PROCEDURES[‘glibc’][‘__libc_start_main’]
Note ‘glibc’ instead of ‘libc’.
# This challenge is the exact same as the first challenge, except that it was # compiled as a static binary. Normally, Angr automatically replaces standard # library functions with SimProcedures that work much more quickly.
# Here are a few SimProcedures Angr has already written for you. They implement # standard library functions. You will not need all of them: # angr.SIM_PROCEDURES['libc']['malloc'] # angr.SIM_PROCEDURES['libc']['fopen'] # angr.SIM_PROCEDURES['libc']['fclose'] # angr.SIM_PROCEDURES['libc']['fwrite'] # angr.SIM_PROCEDURES['libc']['getchar'] # angr.SIM_PROCEDURES['libc']['strncmp'] # angr.SIM_PROCEDURES['libc']['strcmp'] # angr.SIM_PROCEDURES['libc']['scanf'] # angr.SIM_PROCEDURES['libc']['printf'] # angr.SIM_PROCEDURES['libc']['puts'] # angr.SIM_PROCEDURES['libc']['exit'] # # As a reminder, you can hook functions with something similar to: # project.hook(malloc_address, angr.SIM_PROCEDURES['libc']['malloc']) # # There are many more, see: # https://github.com/angr/angr/tree/master/angr/procedures/libc # # Additionally, note that, when the binary is executed, the main function is not # the first piece of code called. In the _start function, __libc_start_main is # called to start your program. The initialization that occurs in this function # can take a long time with Angr, so you should replace it with a SimProcedure. # angr.SIM_PROCEDURES['glibc']['__libc_start_main'] # Note 'glibc' instead of 'libc'.
if simulation.found: print(simulation.found[0].posix.dumps(0))
if __name__ == '__main__': main(sys.argv)
14_angr_shared_library
The shared library has the function validate, which takes a string and returns either true (1) or false (0). The binary calls this function. If it returns true, the program prints “Good Job.” otherwise, it prints “Try again.”
Note: When you run this script, make sure you run it on lib14_angr_shared_library.so, not the executable. This level is intended to teach how to analyse binary formats that are not typical executables.
# The shared library has the function validate, which takes a string and returns # either true (1) or false (0). The binary calls this function. If it returns # true, the program prints "Good Job." otherwise, it prints "Try again."
# Note: When you run this script, make sure you run it on # lib14_angr_shared_library.so, not the executable. This level is intended to # teach how to analyse binary formats that are not typical executables.
import angr import claripy import sys
defmain(argv): path_to_binary = argv[1]
# The shared library is compiled with position-independent code. You will need # to specify the base address. All addresses in the shared library will be # base + offset, where offset is their address in the file. # (!) base = 0x4000000 project = angr.Project(path_to_binary, load_options={ 'main_opts' : { 'custom_base_addr' : base } })
# Initialize any symbolic values here; you will need at least one to pass to # the validate function. ...
# Begin the state at the beginning of the validate function, as if it was # called by the program. Determine the parameters needed to call validate and # replace 'parameters...' with bitvectors holding the values you wish to pass. # Recall that 'claripy.BVV(value, size_in_bits)' constructs a bitvector # initialized to a single value. # Remember to add the base value you specified at the beginning to the # function address! # Hint: int validate(char* buffer, int length) { ... # Another hint: the password is 8 bytes long. # (!) validate_function_address = 0x6d7 buffer_pointer = claripy.BVV(0x3000000, 32) initial_state = project.factory.call_state(validate_function_address, buffer_pointer, claripy.BVV(8, 32)) password = claripy.BVS('password', 8*8) initial_state.memory.store(buffer_pointer, password)
# You will need to add code to inject a symbolic value into the program at the # end of the function that constrains eax to equal true (value of 1) just # before the function returns. There are multiple ways to do this: # 1. Use a hook. # 2. Search for the address just before the function returns and then # constrain eax (this may require putting code elsewhere) ...
simulation = project.factory.simgr(initial_state)
success_address = base + 0x783 simulation.explore(find=success_address)
if simulation.found: solution_state = simulation.found[0] # Determine where the program places the return value, and constrain it so # that it is true. Then, solve for the solution and print it. # (!) pirnt(solution_state.solver.eval(password, cast_to=bytes)) else: raise Exception('Could not find the solution')
# You can either use a blank state or an entry state; just make sure to start # at the beginning of the program. # (!) initial_state = project.factory.entry_state()
# The bitvector.chop(bits=n) function splits the bitvector into a Python # list containing the bitvector in segments of n bits each. In this case, # we are splitting them into segments of 8 bits (one byte.) for char in scanf1.chop(bits=8): # Ensure that each character in the string is printable. An interesting # experiment, once you have a working solution, would be to run the code # without constraining the characters to the printable range of ASCII. # Even though the solution will technically work without this, it's more # difficult to enter in a solution that contains character you can't # copy, paste, or type into your terminal or the web form that checks # your solution. # (!) self.state.add_constraints(char >= 'A', char <= 'Z')
# Warning: Endianness only applies to integers. If you store a string in # memory and treat it as a little-endian integer, it will be backwards. scanf0_address = param0 self.state.memory.store(scanf0_address, scanf0, endness=project.arch.memory_endness) scanf1_address = param1 self.state.memory.store(scanf1_address, scanf1)
# We will call this whenever puts is called. The goal of this function is to # determine if the pointer passed to puts is controllable by the user, such # that we can rewrite it to point to the string "Good Job." defcheck_puts(state): # Treat the implementation of this function as if puts was just called. # The stack, registers, memory, etc should be set up as if the x86 call # instruction was just invoked (but, of course, the function hasn't copied # the buffers yet.) # The stack will look as follows: # ... # esp + 7 -> /----------------\ # esp + 6 -> | puts | # esp + 5 -> | parameter | # esp + 4 -> \----------------/ # esp + 3 -> /----------------\ # esp + 2 -> | return | # esp + 1 -> | address | # esp -> \----------------/ # (!) puts_parameter = state.memory.load(state.regs.esp + 4, 4, endness=project.arch.memory_endness)
# The following function takes a bitvector as a parameter and checks if it # can take on more than one value. While this does not necessary tell us we # have found an exploitable path, it is a strong indication that the # bitvector we checked may be controllable by the user. # Use it to determine if the pointer passed to puts is symbolic. # (!) if state.se.symbolic(puts_parameter): # Determine the location of the "Good Job.\n" string. We want to print it # out, and we will do so by attempting to constrain the puts parameter to # equal it. (Hint: look at .rodata). # (!) good_job_string_address = 0x594e4257# :integer, probably hexadecimal
# Create an expression that will test if puts_parameter equals # good_job_string_address. If we add this as a constraint to our solver, # it will try and find an input to make this expression true. # (!) is_vulnerable_expression = puts_parameter == good_job_string_address # :boolean bitvector expression
# Have Angr evaluate the state to determine if all the constraints can # be met, including the one we specified above. If it can be satisfied, # we have found our exploit! copied_state = state.copy()
copied_state.add_constraints(is_vulnerable_expression) if copied_state.satisfiable(): # Before we return, let's add the constraint to the solver for real, # instead of just querying whether the constraint _could_ be added. state.add_constraints(is_vulnerable_expression) returnTrue else: returnFalse else: # not path.state.se.symbolic(???) returnFalse
simulation = project.factory.simgr(initial_state)
defis_successful(state): # We are looking for puts. Check that the address is at the (very) beginning # of the puts function. Warning: while, in theory, you could look for # any address in puts, if you execute any instruction that adjusts the stack # pointer, the stack diagram above will be incorrect. Therefore, it is # recommended that you check for the very beginning of puts. # (!) puts_address = 0x8048370 if state.addr == puts_address: # Return True if we determine this call to puts is exploitable. return check_puts(state) else: # We have not yet found a call to puts; we should continue! returnFalse
# Determine the situation in which you should avoid. Optionally, you can # remove the avoid parameter, although it may cause the program to run more # slowly. # (!) simulation.explore(find=is_successful)
if simulation.found: solution_state = simulation.found[0]