通过angr-ctf学习angr符号执行

angr_ctf: https://github.com/jakespringer/angr_ctf
angr: https://angr.io/

00_angr_find

Step

1. Create an angr project.
 2. Create an entry to start execution.
 3. Create a simulation manager initialized with the starting state.
 4. Explore the binary.
 5. Find the input.

import angr
import sys

def main(argv):
  # Create an Angr project.
  # If you want to be able to point to the binary from the command line, you can
  # use argv[1] as the parameter. Then, you can run the script from the command
  # line as follows:
  # python ./scaffold00.py [binary]
  # (!)
  path_to_binary = argv[1]  # m:string
  project = angr.Project(path_to_binary)

  # Tell Angr where to start executing (should it start from the main()
  # function or somewhere else?) For now, use the entry_state function
  # to instruct Angr to start from the main() function.
  initial_state = project.factory.entry_state()

  # Create a simulation manager initialized with the starting state. It provides
  # a number of useful tools to search and execute the binary.
  simulation = project.factory.simgr(initial_state)

  # Explore the binary to attempt to find the address that prints "Good Job."
  # You will have to find the address you want to find and insert it here. 
  # This function will keep executing until it either finds a solution or it 
  # has explored every possible path through the executable.
  # (!)
  print_good_address = ???  # :integer (probably in hexadecimal)
  simulation.explore(find=print_good_address)

  # Check that we have found a solution. The simulation.explore() method will
  # set simulation.found to a list of the states that it could find that reach
  # the instruction we asked it to search for. Remember, in Python, if a list
  # is empty, it will be evaluated as false, otherwise true.
  if simulation.found:
    # The explore method stops after it finds a single state that arrives at the
    # target address.
    solution_state = simulation.found[0]

    # Print the string that Angr wrote to stdin to follow solution_state. This 
    # is our solution.
    print solution_state.posix.dumps(sys.stdin.fileno())
  else:
    # If Angr could not find a path that reaches print_good_address, throw an
    # error. Perhaps you mistyped the print_good_address?
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

The problem solving script:

import angr
import sys

def main(argv):

  bin_path = argv[1]
  project = angr.Project(bin_path)
  initial_state = project.factory.entry_state()
  simulation = project.factory.simgr(initial_state)

  target_addr = 0x08048678
  simulation.explore(find=target_addr)
  
  if simulation.found:
    print(simulation.found[0].posix.dumps(sys.stdin.fileno()).decode("utf-8"))

  else:
    raise Exception("Cound't find it!")


if __name__ == "__main__":
  main(sys.argv)

01_angr_avoid

Tick stash “stash” forward (up to “n” times or until “num_find” states are found), looking for condition “find”, avoiding condition “avoid”. Stores found states into “find_stash’ and avoided states into “avoid_stash”.

The “find” and “avoid” parameters may be any of:

An address to find

A set or list of addresses to find

A function that takes a state and returns whether or not it matches.

If an angr CFG is passed in as the “cfg” parameter and “find” is either a number or a list or a set, then any states which cannot possibly reach a success state without going through a failure state will be preemptively avoided.

The “avoid” argument will not store the found state. This will help us save the time.

When I fixed the problem, IDA suggested that the function was too large.

When I solved this challenge and used ida to decompile, IDA showed that the function was too big.

Solution: change the config file.

MacOS: IDA_HOME/idabin/cfg/hexrays.cfg

MAX_FUNCSIZE            = 64        // Functions over 64K are not decompiled

Change that to the following:

MAX_FUNCSIZE            = 1024        // Functions over 1024K are not decompiled

Although we fixed the configuration file, this function is too big and we will spend too much time decompiling it. So we can read the assembly code directly.

import angr
import sys

def main(argv):
  # Explore the binary, but this time, instead of only looking for a state that
  # reaches the print_good_address, also find a state that does not reach 
  # will_not_succeed_address. The binary is pretty large, to save you some time,
  # everything you will need to look at is near the beginning of the address 
  # space.
  # (!)
  binary_path = argv[1]
  proj = angr.Project(binary_path, auto_load_libs=False)

  initial_state = proj.factory.entry_state()
  simulation = proj.factory.simulation_manager(initial_state)

  target_address = 0x80485E0
  avoid_address = [0x080485A8, 0x080485F2]

  simulation.explore(find=target_address, avoid=avoid_address)

  if simulation.found:
    print(simulation.found[0].posix.dumps(sys.stdin.fileno()).decode('utf-8'))
  else:
    raise Exception("Don't find the target.")


if __name__ == "__main__":
  main(sys.argv)

02_angr_find_condition

It is very useful to be able to search for a state that reaches a certain

instruction. However, in some cases, you may not know the address of the

specific instruction you want to reach (or perhaps there is no single

instruction goal.) In this challenge, you don’t know which instruction

grants you success. Instead, you just know that you want to find a state where

the binary prints “Good Job.”

Angr is powerful in that it allows you to search for a states that meets an

arbitrary condition that you specify in Python, using a predicate you define

as a function that takes a state and returns True if you have found what you

are looking for, and False otherwise.

Although we can find the address of the target instruction, in order to train the writing of angr, we use the recommended method of the script, which is to find the conditions we hope to achieve by writing functions.

import angr
import sys

def main(argv):
  bin_path = argv[1]
  project = angr.Project(bin_path)
  
  initial_state = project.factory.entry_state()
  simulation = project.factory.simulation_manager(initial_state)
  
  def is_successful(state):
    return True if "Good Job." in state.posix.dumps(sys.stdout.fileno()).decode("utf-8") else False
  
  def should_abort(state):
    return True if "Try again." in state.posix.dumps(sys.stdout.fileno()).decode("utf-8") else False

  simulation.explore(find=is_successful, avoid=should_abort)

  if simulation.found:
    print(simulation.found[0].posix.dumps(sys.stdin.fileno()).decode("utf-8"))
  else:
    raise Exception("Can't find it.")
  

if __name__ == "__main__":
  main(sys.argv)

03_angr_symbolic_registers

Through this challenge we can learn how to control registers and manually set entries through angr.

# Angr doesn't currently support reading multiple things with scanf (Ex: 
# scanf("%u %u).) You will have to tell the simulation engine to begin the
# program after scanf is called and manually inject the symbols into registers.

import angr
import claripy
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)

  # Sometimes, you want to specify where the program should start. The variable
  # start_address will specify where the symbolic execution engine should begin.
  # Note that we are using blank_state, not entry_state.
  # (!)
  start_address = 0x8048980  # :integer (probably hexadecimal)
  initial_state = project.factory.blank_state(addr=start_address)

  # Create a symbolic bitvector (the datatype Angr uses to inject symbolic
  # values into the binary.) The first parameter is just a name Angr uses
  # to reference it. 
  # You will have to construct multiple bitvectors. Copy the two lines below
  # and change the variable names. To figure out how many (and of what size)
  # you need, dissassemble the binary and determine the format parameter passed
  # to scanf.
  # (!)
  password_size_in_bits = 20 * 8  # :integer
  password0 = claripy.BVS('password0', password_size_in_bits)
  password1 = claripy.BVS('password1', password_size_in_bits)
  password2 = claripy.BVS('password2', password_size_in_bits)
  ...

  # Set a register to a symbolic value. This is one way to inject symbols into
  # the program.
  # initial_state.regs stores a number of convenient attributes that reference
  # registers by name. For example, to set eax to password0, use:
  #
  # initial_state.regs.eax = password0
  #
  # You will have to set multiple registers to distinct bitvectors. Copy and
  # paste the line below and change the register. To determine which registers
  # to inject which symbol, dissassemble the binary and look at the instructions
  # immediately following the call to scanf.
  # (!)
  initial_state.regs.eax = password0
  initial_state.regs.ebx = password1
  initial_state.regs.edx = password2

  ...

  simulation = project.factory.simgr(initial_state)

  def is_successful(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return True if "Good Job." in stdout_output.decode("utf-8") else False

  def should_abort(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return True if "Good Job." in stdout_output.decode("utf-8") else False

  simulation.explore(find=is_successful, avoid=should_abort)

  if simulation.found:
    solution_state = simulation.found[0]

    # Solve for the symbolic values. If there are multiple solutions, we only
    # care about one, so we can use eval, which returns any (but only one)
    # solution. Pass eval the bitvector you want to solve for.
    # (!)
    solution0 = solution_state.solver.eval(password0)
    solution1 = solution_state.solver.eval(password1)
    solution2 = solution_state.solver.eval(password2)
    ...

    # Aggregate and format the solutions you computed above, and then print
    # the full string. Pay attention to the order of the integers, and the
    # expected base (decimal, octal, hexadecimal, etc).
    # solution = solution0 + " " + solution1 + " " + solution2  # :string
    print("{:x} {:x} {:x}".format(solution0, solution1, solution2))
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

Keywords: simulation.found[0].solver.eval(), initial_state.regs.register

04_angr_symbolic_stack

This challenge exercises our ability to construct stacks using angr.

Tips:

Since we are starting after scanf, we are skipping this stack construction step. To make up for this, we need to construct the stack ourselves. Let us start by initializing ebp in the exact same way the program does.

# This challenge will be more challenging than the previous challenges that you
# have encountered thus far. Since the goal of this CTF is to teach symbolic
# execution and not how to construct stack frames, these comments will work you
# through understanding what is on the stack.
#   ! ! !
# IMPORTANT: Any addresses in this script aren't necessarily right! Dissassemble
#            the binary yourself to determine the correct addresses!
#   ! ! !


import angr
import claripy
import sys
def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)

  start_address = 0x08048697
  initial_state = project.factory.blank_state(addr=start_address)


  initial_state.regs.ebp = initial_state.regs.esp
  print(initial_state.regs.ebp)


  password0 = claripy.BVS('password0', 160)
  password1 = claripy.BVS('password1', 160)
  ...

  
  padding_length_in_bytes = 8   # :integer
  initial_state.regs.esp -= padding_length_in_bytes


  initial_state.stack_push(password0)  # :bitvector (claripy.BVS, claripy.BVV, claripy.BV)
  initial_state.stack_push(password1)
  ...

  simulation = project.factory.simgr(initial_state)

  def is_successful(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return b"Good" in stdout_output

  def should_abort(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return b"Try" in stdout_output

  simulation.explore(find=is_successful, avoid=should_abort)

  if simulation.found:
    solution_state = simulation.found[0]

    solution0 = solution_state.solver.eval(password0)
    solution1 = solution_state.solver.eval(password1)
    ...
    print(solution0, solution1)
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

padding_length_in_bytes = 8

Note that the value of padding in the above script is calculated by stack.

.text:08048679                 push    ebp
.text:0804867A                 mov     ebp, esp
.text:0804867C                 sub     esp, 18h

In ida, the stack memory for this function is 0x18 bytes.

“v1” is located at the position of “ebp - 0x10” and the “v2” is located at the position of “ebp - 0xc”. So when the address is indexed by ESP, the offset is 0x18-0x10, which is exactly the position of the first parameter.

keywords: initial_state.stack_push()

05_angr_symbolic_memory

Determine the address of the global variable to which scanf writes the user input. The function ‘initial_state.memory.store(address, value)’ will write ‘value’ (a bitvector) to ‘address’ (a memory location, as an integer.) The ‘address’ parameter can also be a bitvector (and can be symbolic!).

import angr
import claripy
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)

  start_address = 0x08048601
  initial_state = project.factory.blank_state(addr=start_address)

  # The binary is calling scanf("%8s %8s %8s %8s").
  # (!)
  password0 = claripy.BVS('password0', 160)
  password1 = claripy.BVS('password1', 160)
  password2 = claripy.BVS('password2', 160)
  password3 = claripy.BVS('password3', 160)
  ...

  password0_address = 0xA1BA1C0
  password1_address = 0xA1BA1C8
  password2_address = 0xA1BA1D0
  password3_address = 0xA1BA1D8

  initial_state.memory.store(password0_address, password0)
  initial_state.memory.store(password1_address, password1)
  initial_state.memory.store(password2_address, password2)
  initial_state.memory.store(password3_address, password3)
  ...

  simulation = project.factory.simgr(initial_state)

  def is_successful(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return b'Good' in stdout_output

  def should_abort(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return b'Try' in stdout_output

  simulation.explore(find=is_successful, avoid=should_abort)

  if simulation.found:
    solution_state = simulation.found[0]
    solution0 = solution_state.solver.eval(password0, cast_to=bytes).decode("utf-8")
    solution1 = solution_state.solver.eval(password1, cast_to=bytes).decode("utf-8")
    solution2 = solution_state.solver.eval(password2, cast_to=bytes).decode("utf-8")
    solution3 = solution_state.solver.eval(password3, cast_to=bytes).decode("utf-8")
    ...
    print(solution0, solution1, solution2, solution3)

  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

Keyword: initial_state.memory.store(address, value)

06_angr_symbolic_dynamic_memory

Instead of telling the binary to write to the address of the memory allocated with malloc, we can simply fake an address to any unused block of memory and overwrite the pointer to the data. This will point the pointer with the address of pointer_to_malloc_memory_address0 to fake_heap_address.
Be aware, there is more than one pointer! Analyze the binary to determine global location of each pointer.
Note: by default, Angr stores integers in memory with big-endianness. To specify to use the endianness of your architecture, use the parameter endness=project.arch.memory_endness. On x86, this is little-endian.

The key to this challenge is that we skipped malloc at the address at the beginning, so we have to manually allocate a fake heap address for it.

import angr
import claripy
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)

  start_address = 0x08048699
  initial_state = project.factory.blank_state(addr=start_address)

  password0 = claripy.BVS('password0', 64)
  password1 = claripy.BVS('password1', 64)
  ...

  fake_heap_address0 = 0xABCC800
  pointer_to_malloc_memory_address0 = 0xABCC8A4
  initial_state.memory.store(pointer_to_malloc_memory_address0, fake_heap_address0, endness=project.arch.memory_endness)

  fake_heap_address1 = 0xABCC7e0
  pointer_to_malloc_memory_address1 = 0xABCC8AC
  initial_state.memory.store(pointer_to_malloc_memory_address1, fake_heap_address1, endness=project.arch.memory_endness)
  ...

  # Store our symbolic values at our fake_heap_address. Look at the binary to
  # determine the offsets from the fake_heap_address where scanf writes.
  # (!)
  initial_state.memory.store(fake_heap_address0, password0)
  initial_state.memory.store(fake_heap_address1, password1)
  ...

  simulation = project.factory.simgr(initial_state)

  def is_successful(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return b'Good' in stdout_output

  def should_abort(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return b'Try' in stdout_output

  simulation.explore(find=is_successful, avoid=should_abort)

  if simulation.found:
    solution_state = simulation.found[0]

    solution0 = solution_state.solver.eval(password0,cast_to=bytes).decode('utf-8')
    solution1 = solution_state.solver.eval(password1,cast_to=bytes).decode('utf-8')
    ...

    print(solution0, solution1)
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

Keyword: state.memory.store(address, value, endness=project.arch.memory_endness)

07_angr_symbolic_file

This challenge could, in theory, be solved in multiple ways. However, for the

sake of learning how to simulate an alternate filesystem, please solve this

challenge according to structure provided below. As a challenge, once you have

an initial solution, try solving this in an alternate way.

Problem description and general solution strategy:

The binary loads the password from a file using the fread function. If the

password is correct, it prints “Good Job.” In order to keep consistency with

the other challenges, the input from the console is written to a file in the

ignore_me function. As the name suggests, ignore it, as it only exists to

maintain consistency with other challenges.

We want to:

\1. Determine the file from which fread reads.

\2. Use Angr to simulate a filesystem where that file is replaced with our own

simulated file.

\3. Initialize the file with a symbolic value, which will be read with fread

and propogated through the program.

\4. Solve for the symbolic input to determine the password.

Construct a bitvector for the password and then store it in the file’s

backing memory. The store method works exactly the same as the store method

you have already used. In fact, it’s the exact same method! That means that

memory.store(address, bitvector) will write bitvector to the address we

specify. In this memory, unlike our program’s memory, we want to write to

the beginning, as the Linux file will stream data from the beginning of the

file. For example, imagine a simple file, ‘hello.txt’:

Hello world, my name is John.

^ ^

^ address 0 ^ address 24 (count the number of characters)

In order to represent this in memory, we would want to write the string to

the beginning of the file:

hello_txt_contents = claripy.BVV(‘Hello world, my name is John.’, 30*8)

hello_txt_backing_memory.store(0, hello_txt_contents)

Perhaps, then, we would want to replace John with a

symbolic variable. We would call:

name_bitvector = claripy.BVS(‘symbolic_name’, 4*8)

hello_txt_backing_memory.store(24, name_bitvector)

Then, after the program calls fopen(‘hello.txt’, ‘r’) and then

fread(buffer, sizeof(char), 30, hello_txt_file), the buffer would contain

the string from the file, except four symbolic bytes where the name would be

stored.

import angr
import sys
import claripy

def main(argv):
  bin_path = argv[1]
  project = angr.Project(bin_path)

  initial_address = 0x080488D6
  initial_state = project.factory.blank_state(addr=initial_address)

  password = claripy.BVS('password', 64 * 8)
  
  filename = 'OJKSQYDP.txt'
  simfile = angr.storage.SimFile(name=filename, content=password, size=64)
  initial_state.fs.insert(filename, simfile)

  simulation = project.factory.simulation_manager(initial_state)

  def good(state):
    return b'Good' in state.posix.dumps(sys.stdout.fileno())
  
  def bad(state):
    return b'Try' in state.posix.dumps(sys.stdout.fileno())

  simulation.explore(find=good, avoid=bad)
  if simulation.found:
    print(simulation.found[0].solver.eval(password, cast_to=bytes).decode('utf-8'))
  else:
    print("not found")

if __name__ == '__main__':
  main(sys.argv)

Keyword: angr.storage.SimFile(name=filename, content=symbolic_vector, size=file_size), initial_state.fs.insert(filename, simfile)

08_angr_constraints

The binary asks for a 16 character password to which is applies a complex function and then compares with a reference string with the function checkequals[reference string]. (Decompile the binary and take a look at it!)

The source code for this function is provided here. However, the reference string in your version will be different than AABBCCDDEEFFGGHH.

The function checks if *to_check == “AABBCCDDEEFFGGHH”. Verify this yourself. While you, as a human, can easily determine that this function is equivalent to simply comparing the strings, the computer cannot. Instead the computer would need to branch every time the if statement in the loop was called (16 times), resulting in 2^16 = 65,536 branches, which will take too long of a time to evaluate for our needs.

We do not know how the complex_function works, but we want to find an input that, when modified by complex_function, will produce the string: AABBCCDDEEFFGGHH.

In this puzzle, your goal will be to stop the program before this function is called and manually constrain the to_check variable to be equal to the password you identify by decompiling the binary. Since, you, as a human, know that if the strings are equal, the program will print “Good Job.”, you can be assured that if the program can solve for an input that makes them equal, the input will be the correct password.

Source code:

#define REFERENCE_PASSWORD = "AABBCCDDEEFFGGHH";
int check_equals_AABBCCDDEEFFGGHH(char* to_check, size_t length) {
  uint32_t num_correct = 0;
  for (int i=0; i<length; ++i) {
    if (to_check[i] == REFERENCE_PASSWORD[i]) {
      num_correct += 1;
    }
  }
  return num_correct == length;
}

...

char* input = user_input();
char* encrypted_input = complex_function(input);
if (check_equals_AABBCCDDEEFFGGHH(encrypted_input, 16)) {
  puts("Good Job.");
} else {
  puts("Try again.");
}

This challenge is quite practical. I have seen many ctf challenges using this writing method: compare the input string with the target string character by character. This is easy for us humans, but not for computers. When we use symbols to execute, this challenge has 16 characters, so we have to compare 2 to the 16th power, which is obviously unbearable. In order to solve this problem, we need to manually specify constraints.

import angr
import claripy
import sys

def main(argv):
  bin_path = argv[1]
  project = angr.Project(bin_path)

  initial_state = project.factory.entry_state()
  simulation_manager = project.factory.simgr(initial_state)

  target_address = 0x0804866E
  simulation_manager.explore(find=target_address)

  if simulation_manager.found:
    solution_state = simulation_manager.found[0]

  constrained_parameter_address = 0x0804A050
  constrained_parameter_size = 16
  constrained_parameter_bitvector = solution_state.memory.load(constrained_parameter_address, constrained_parameter_size)

  solution_state.add_constraints(constrained_parameter_bitvector == 'AUPDNNPROEZRJWKB')
  print(solution_state.posix.dumps(sys.stdin.fileno()).decode('utf-8'))


if __name__ == '__main__':
  main(sys.argv)

Keyword: solution_state.memory.load(target_address, size), solution_state.add_constraints(input == target)

09_angr_hooks

This level performs the following computations:

Get 16 bytes of user input and encrypt it.

Save the result of check_equals_AABBCCDDEEFFGGHH (or similar)

Get another 16 bytes from the user and encrypt it.

Check that it’s equal to a predefined password.

The ONLY part of this program that we have to worry about is #2.We will be replacing the call to checkequals with our own version, using a hook, since checkequals will run too slowly otherwise.

import angr
import sys
import claripy

def main(argv):
  bin_path = argv[1]
  project = angr.Project(bin_path)

  initial_state = project.factory.entry_state()
  
  instrction_step_address = 0x80486B3
  jump_size = 5

  @project.hook(instrction_step_address, length=jump_size)
  def skip_check_equals(state):
    user_input_string = state.memory.load(0x0804A054, 16)
    state.regs.eax = claripy.If(
      user_input_string == 'XYMKBKUHNIQYNQXE',
      claripy.BVV(1, 32),
      claripy.BVV(0, 32)
    )

  simulation = project.factory.simulation_manager(initial_state)

  def good(state):
    return b'Good' in state.posix.dumps(sys.stdout.fileno())
  
  def bad(state):
    return b'Try' in state.posix.dumps(sys.stdout.fileno())

  simulation.explore(find=good, avoid=bad)
  
  if simulation.found:
    print(simulation.found[0].posix.dumps(sys.stdin.fileno()).decode('utf-8'))
  else:
    print('cannot find it!')


if __name__ == '__main__':
  main(sys.argv)

Detailed explanation of hook: https://docs.angr.io/extending-angr/simprocedures#user-hooks

Keyword: @project.hook(address, length) func(state): …

10_angr_simprocedures

You may be thinking:
Why can’t I just use hooks? The function is called many times, but if I hook the address of the function itself (rather than the addresses where it is called), I can replace its behavior everywhere. Furthermore, I can get the parameters by reading them off the stack (with memory.load(regs.esp + xx)), and return a value by simply setting eax! Since I know the length of the function in bytes, I can return from the hook just before the ‘ret’ instruction is called, which will allow the program to jump back to where it was before it called my hook.
If you thought that, then congratulations! You have just invented the idea of SimProcedures! Instead of doing all of that by hand, you can let the already-implemented SimProcedures do the boring work for you so that you can focus on writing a replacement function in a Pythonic way.
As a bonus, SimProcedures allow you to specify custom calling conventions, but unfortunately it is not covered in this CTF.
A SimProcedure replaces a function in the binary with a simulated one   written in Python. Other than it being written in Python, the function    acts largely the same as any function written in C. Any parameter after  'self' will be treated as a parameter to the function you are replacing.
The parameters will be bitvectors. Additionally, the Python can return in    the ususal Pythonic way. Angr will treat this in the same way it would    treat a native function in the binary returning. An example:

int add_if_positive(int a, int b) {
  if (a >= 0 && b >= 0) return a + b;
  else return 0;
}

could be simulated with...

class ReplacementAddIfPositive(angr.SimProcedure):
  def run(self, a, b):
    if a >= 0 and b >=0:
      return a + b
    else:
      return 0

Finish the parameters to the check_equals_ function. Reminder:   int check_equals_AABBCCDDEEFFGGHH(char* to_check, int length) { ...
(!)

import angr
import claripy
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)

  initial_state = project.factory.entry_state()

  class ReplacementCheckEquals(angr.SimProcedure):
    def run(self, to_check, length):
      # We can almost copy and paste the solution from the previous challenge.
      # Hint: Don't look up the address! It's passed as a parameter.
      # (!)
      user_input_buffer_address = to_check
      user_input_buffer_length = length

      # Note the use of self.state to find the state of the system in a SimProcedure.
      user_input_string = self.state.memory.load(
        user_input_buffer_address,
        user_input_buffer_length
      )

      check_against_string = "ORSDDWXHZURJRBDH"
      
      # Finally, instead of setting eax, we can use a Pythonic return statement
      # to return the output of this function. 
      # Hint: Look at the previous solution.
      return claripy.If(user_input_string ==  check_against_string, claripy.BVV(1, 32), claripy.BVV(0, 32))


  # Hook the check_equals symbol. Angr automatically looks up the address 
  # associated with the symbol. Alternatively, you can use 'hook' instead
  # of 'hook_symbol' and specify the address of the function. To find the 
  # correct symbol, disassemble the binary.
  # (!)
  check_equals_symbol = "check_equals_ORSDDWXHZURJRBDH" # :string
  project.hook_symbol(check_equals_symbol, ReplacementCheckEquals())

  simulation = project.factory.simgr(initial_state)

  def is_successful(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return b'Good' in stdout_output

  def should_abort(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return b'Try' in stdout_output

  simulation.explore(find=is_successful, avoid=should_abort)

  if simulation.found:
    solution_state = simulation.found[0]
    print(solution_state.posix.dumps(sys.stdin.fileno()))
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

11_angr_sim_scanf

# This time, the solution involves simply replacing scanf with our own version,
# since Angr does not support requesting multiple parameters with scanf.

import angr
import claripy
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)

  initial_state = project.factory.entry_state()

  class ReplacementScanf(angr.SimProcedure):
    # Finish the parameters to the scanf function. Hint: 'scanf("%u %u", ...)'.
    # (!)
    def run(self, format_string, scanf0_address, scanf1_address):
      scanf0 = claripy.BVS('scanf0', 32)
      scanf1 = claripy.BVS('scanf1', 32)
      ...

      # The scanf function writes user input to the buffers to which the parameters point.
      self.state.memory.store(scanf0_address, scanf0, endness=project.arch.memory_endness)
      self.state.memory.store(scanf1_address, scanf1, endness=project.arch.memory_endness)
      ...

      # Now, we want to 'set aside' references to our symbolic values in the
      # globals plugin included by default with a state. You will need to
      # store multiple bitvectors. You can either use a list, tuple, or multiple
      # keys to reference the different bitvectors.
      # (!)
      self.state.globals['solution0'] = scanf0
      self.state.globals['solution1'] = scanf1

  scanf_symbol = "__isoc99_scanf"
  project.hook_symbol(scanf_symbol, ReplacementScanf())

  simulation = project.factory.simgr(initial_state)

  def is_successful(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return b"Good" in stdout_output

  def should_abort(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return b"Try" in stdout_output

  simulation.explore(find=is_successful, avoid=should_abort)

  if simulation.found:
    solution_state = simulation.found[0]

    # Grab whatever you set aside in the globals dict.
    stored_solutions0 = solution_state.globals['solution0']
    stored_solutions1 = solution_state.globals['solution1']
    ...

    print(solution_state.solver.eval(stored_solutions0), solution_state.solver.eval(stored_solutions1))
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

12_angr_veritesting

# When you construct a simulation manager, you will want to enable Veritesting:
# project.factory.simgr(initial_state, veritesting=True)
# Hint: use one of the first few levels' solutions as a reference.

import angr
import claripy
import sys

def main(argv):
    bin_path = argv[1]
    project = angr.Project(bin_path)
    initial_state = project.factory.entry_state()

    simulation = project.factory.simgr(initial_state, veritesting=True)
    simulation.explore(find=0x804867C)
    if simulation.found:
        print(simulation.found[0].posix.dumps(0))


if __name__ == '__main__':
    main(sys.argv)

13_angr_static_binary

Additionally, note that, when the binary is executed, the main function is not

the first piece of code called. In the _start function, __libc_start_main is

called to start your program. The initialization that occurs in this function

can take a long time with Angr, so you should replace it with a SimProcedure.

angr.SIM_PROCEDURES[‘glibc’][‘__libc_start_main’]

Note ‘glibc’ instead of ‘libc’.

# This challenge is the exact same as the first challenge, except that it was
# compiled as a static binary. Normally, Angr automatically replaces standard
# library functions with SimProcedures that work much more quickly.

# Here are a few SimProcedures Angr has already written for you. They implement
# standard library functions. You will not need all of them:
# angr.SIM_PROCEDURES['libc']['malloc']
# angr.SIM_PROCEDURES['libc']['fopen']
# angr.SIM_PROCEDURES['libc']['fclose']
# angr.SIM_PROCEDURES['libc']['fwrite']
# angr.SIM_PROCEDURES['libc']['getchar']
# angr.SIM_PROCEDURES['libc']['strncmp']
# angr.SIM_PROCEDURES['libc']['strcmp']
# angr.SIM_PROCEDURES['libc']['scanf']
# angr.SIM_PROCEDURES['libc']['printf']
# angr.SIM_PROCEDURES['libc']['puts']
# angr.SIM_PROCEDURES['libc']['exit']
#
# As a reminder, you can hook functions with something similar to: 
# project.hook(malloc_address, angr.SIM_PROCEDURES['libc']['malloc'])
#
# There are many more, see:
# https://github.com/angr/angr/tree/master/angr/procedures/libc
#
# Additionally, note that, when the binary is executed, the main function is not
# the first piece of code called. In the _start function, __libc_start_main is 
# called to start your program. The initialization that occurs in this function
# can take a long time with Angr, so you should replace it with a SimProcedure.
# angr.SIM_PROCEDURES['glibc']['__libc_start_main']
# Note 'glibc' instead of 'libc'.

import angr
import claripy
import sys

def main(argv):
    bin_path = argv[1]
    project = angr.Project(bin_path)

    project.hook(0x8048D10, angr.SIM_PROCEDURES['glibc']['__libc_start_main']())
    project.hook(0x804ED40, angr.SIM_PROCEDURES['libc']['printf']())
    project.hook(0x0804ED80, angr.SIM_PROCEDURES['libc']['scanf']())
    project.hook(0x805B450, angr.SIM_PROCEDURES['libc']['strcmp']())
    project.hook(0x804F350, angr.SIM_PROCEDURES['libc']['puts']())

    initial_state = project.factory.entry_state()
    simulation = project.factory.simgr(initial_state)
    def good(state):
        return b'Good' in state.posix.dumps(1)
    simulation.explore(find=good)

    if simulation.found:
        print(simulation.found[0].posix.dumps(0))

if __name__ == '__main__':
    main(sys.argv)

14_angr_shared_library

The shared library has the function validate, which takes a string and returns either true (1) or false (0). The binary calls this function. If it returns true, the program prints “Good Job.” otherwise, it prints “Try again.”

Note: When you run this script, make sure you run it on lib14_angr_shared_library.so, not the executable. This level is intended to teach how to analyse binary formats that are not typical executables.

# The shared library has the function validate, which takes a string and returns
# either true (1) or false (0). The binary calls this function. If it returns
# true, the program prints "Good Job." otherwise, it prints "Try again."

# Note: When you run this script, make sure you run it on 
# lib14_angr_shared_library.so, not the executable. This level is intended to 
# teach how to analyse binary formats that are not typical executables.

import angr
import claripy
import sys

def main(argv):
  path_to_binary = argv[1]

  # The shared library is compiled with position-independent code. You will need
  # to specify the base address. All addresses in the shared library will be
  # base + offset, where offset is their address in the file.
  # (!)
  base = 0x4000000
  project = angr.Project(path_to_binary, load_options={ 
    'main_opts' : { 
      'custom_base_addr' : base 
    } 
  })

  # Initialize any symbolic values here; you will need at least one to pass to
  # the validate function.
  ...

  # Begin the state at the beginning of the validate function, as if it was
  # called by the program. Determine the parameters needed to call validate and
  # replace 'parameters...' with bitvectors holding the values you wish to pass.
  # Recall that 'claripy.BVV(value, size_in_bits)' constructs a bitvector 
  # initialized to a single value.
  # Remember to add the base value you specified at the beginning to the
  # function address!
  # Hint: int validate(char* buffer, int length) { ...
  # Another hint: the password is 8 bytes long.
  # (!)
  validate_function_address = 0x6d7
  buffer_pointer = claripy.BVV(0x3000000, 32)
  initial_state = project.factory.call_state(validate_function_address, buffer_pointer, claripy.BVV(8, 32))
  password = claripy.BVS('password', 8*8)
  initial_state.memory.store(buffer_pointer, password)

  # You will need to add code to inject a symbolic value into the program at the
  # end of the function that constrains eax to equal true (value of 1) just
  # before the function returns. There are multiple ways to do this:
  # 1. Use a hook.
  # 2. Search for the address just before the function returns and then
  #    constrain eax (this may require putting code elsewhere)
  ...

  simulation = project.factory.simgr(initial_state)

  success_address =  base + 0x783
  simulation.explore(find=success_address)

  if simulation.found:
    solution_state = simulation.found[0]
  
    # Determine where the program places the return value, and constrain it so
    # that it is true. Then, solve for the solution and print it.
    # (!)
    pirnt(solution_state.solver.eval(password, cast_to=bytes))
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

15_angr_arbitrary_read


import angr
import claripy
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)

  # You can either use a blank state or an entry state; just make sure to start
  # at the beginning of the program.
  # (!)
  initial_state = project.factory.entry_state()

  # Again, scanf needs to be replaced.
  class ReplacementScanf(angr.SimProcedure):
    # Hint: scanf("%u %20s")
    def run(self, format_string, param0, param1):
      # %u
      scanf0 = claripy.BVS('scanf0', 32)
      
      # %20s
      scanf1 = claripy.BVS('scanf1', 20*8)

      # The bitvector.chop(bits=n) function splits the bitvector into a Python
      # list containing the bitvector in segments of n bits each. In this case,
      # we are splitting them into segments of 8 bits (one byte.)
      for char in scanf1.chop(bits=8):
        # Ensure that each character in the string is printable. An interesting
        # experiment, once you have a working solution, would be to run the code
        # without constraining the characters to the printable range of ASCII.
        # Even though the solution will technically work without this, it's more
        # difficult to enter in a solution that contains character you can't
        # copy, paste, or type into your terminal or the web form that checks 
        # your solution.
        # (!)
        self.state.add_constraints(char >= 'A', char <= 'Z')

      # Warning: Endianness only applies to integers. If you store a string in
      # memory and treat it as a little-endian integer, it will be backwards.
      scanf0_address = param0
      self.state.memory.store(scanf0_address, scanf0, endness=project.arch.memory_endness)
      scanf1_address = param1
      self.state.memory.store(scanf1_address, scanf1)

      self.state.globals['solutions'] = (scanf0, scanf1)

  scanf_symbol = '__isoc99_scanf'  # :string
  project.hook_symbol(scanf_symbol, ReplacementScanf())

  # We will call this whenever puts is called. The goal of this function is to
  # determine if the pointer passed to puts is controllable by the user, such
  # that we can rewrite it to point to the string "Good Job."
  def check_puts(state):
    # Treat the implementation of this function as if puts was just called.
    # The stack, registers, memory, etc should be set up as if the x86 call
    # instruction was just invoked (but, of course, the function hasn't copied
    # the buffers yet.)
    # The stack will look as follows:
    # ...
    # esp + 7 -> /----------------\
    # esp + 6 -> |      puts      |
    # esp + 5 -> |    parameter   |
    # esp + 4 -> \----------------/
    # esp + 3 -> /----------------\
    # esp + 2 -> |     return     |
    # esp + 1 -> |     address    |
    #     esp -> \----------------/
    # (!)
    puts_parameter = state.memory.load(state.regs.esp + 4, 4, endness=project.arch.memory_endness)

    # The following function takes a bitvector as a parameter and checks if it
    # can take on more than one value. While this does not necessary tell us we
    # have found an exploitable path, it is a strong indication that the 
    # bitvector we checked may be controllable by the user.
    # Use it to determine if the pointer passed to puts is symbolic.
    # (!)
    if state.se.symbolic(puts_parameter):
      # Determine the location of the "Good Job.\n" string. We want to print it
      # out, and we will do so by attempting to constrain the puts parameter to
      # equal it. (Hint: look at .rodata).
      # (!)
      good_job_string_address = 0x594e4257 # :integer, probably hexadecimal

      # Create an expression that will test if puts_parameter equals
      # good_job_string_address. If we add this as a constraint to our solver,
      # it will try and find an input to make this expression true.
      # (!)
      is_vulnerable_expression = puts_parameter == good_job_string_address # :boolean bitvector expression

      # Have Angr evaluate the state to determine if all the constraints can
      # be met, including the one we specified above. If it can be satisfied,
      # we have found our exploit!
      
      copied_state = state.copy()

      copied_state.add_constraints(is_vulnerable_expression)
       
      if copied_state.satisfiable():
        # Before we return, let's add the constraint to the solver for real,
        # instead of just querying whether the constraint _could_ be added.
        state.add_constraints(is_vulnerable_expression)
        return True
      else:
        return False
    else: # not path.state.se.symbolic(???)
      return False

  simulation = project.factory.simgr(initial_state)

  def is_successful(state):
    # We are looking for puts. Check that the address is at the (very) beginning
    # of the puts function. Warning: while, in theory, you could look for
    # any address in puts, if you execute any instruction that adjusts the stack
    # pointer, the stack diagram above will be incorrect. Therefore, it is
    # recommended that you check for the very beginning of puts.
    # (!)
    puts_address = 0x8048370
    if state.addr == puts_address:
      # Return True if we determine this call to puts is exploitable.
      return check_puts(state)
    else:
      # We have not yet found a call to puts; we should continue!
      return False

  # Determine the situation in which you should avoid. Optionally, you can
  # remove the avoid parameter, although it may cause the program to run more
  # slowly.
  # (!)
  simulation.explore(find=is_successful)

  if simulation.found:
    solution_state = simulation.found[0]

    (scanf0, scanf1) = solution_state.globals['solutions']
    solution = str(solution_state.se.eval(scanf0)) + ' ' + solution_state.se.eval(scanf1,cast_to=str)
    print solution
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

(Not end…)