AeroCTF2021 - Dummyper writeups

Introduction

Challenge Info
  • Given files: dump.
  • Description: This stupid program has encrypted our flag. We only have a dump left.
  • Category: Reverse engineering
  • Summary: We are given a dump file that looks like a valid ELF file. The task is to reverse engineer it, recover the encrypted code, and inspect the dumped memory to decrypt the flag that the program encrypted.
TL;DR
  1. Analyze the main function -> see 3 function calls, the 1st of which seems to be encrypted.
  2. Analyze the 3rd function -> learn that the given dump file is the .text, .data and heap of the running process, dumped straight from memory.
  3. Analyze the 2nd function -> learn that it encrypts the 1st function by XORing it with 32 bytes read from an unknown stream; some trash data also seems to be inserted after the real key.
  4. Use the fact that modern compilers with Intel CET enabled typically insert an endbr64 instruction at the start of each function -> XOR its opcode with the first 4 bytes of the encrypted code to get the first 4 bytes of the XOR key -> search for them in the dump and decrypt the code.
  5. Analyze the 1st function -> learn that it reads the flag into the beginning of the data dump, then encrypts it with AES-CBC; like the XOR key, the AES key and IV are stored somewhere in the data dump.
  6. Use the fact that the AES-CBC code copies the key and IV into an AES_ctx struct, which is also in the dump, to search for 16-byte sequences that appear more than once -> find the key and IV and decrypt the flag.

Analyzing main function

After loading the dump file into IDA, I noticed that IDA recognizes it as a valid ELF file rather than a core dump. In the main function, we can see 3 calls to 3 smaller functions:

  1. The first function seems to be encrypted.
  2. The second function has a XOR loop. I initially skipped it and checked the third function first.
  3. The third function looks like the place where our dump file is generated, so I started by reversing it.

Analyzing the 3rd function

This function first reads the file /proc/self/maps. On Linux, /proc/self is a symbolic link to /proc/<pid>, where <pid> is the process ID of the currently running process. This file therefore describes the virtual memory mappings of the current process.

Tip
You can run the command cat /proc/self/maps in a shell to see what this file looks like. The output shows the mappings of the cat process itself. If you are a pwn player, you may recognize this as essentially the same information that the vmmap command in pwndbg shows.

It parses the first 12 characters of the file as a hexadecimal address, which is the base of the .text section. Next, it looks for the line containing the string "[heap]" and parses the 12 characters after the "-" on that line, which is the end of the heap. After that, it opens /proc/self/mem, which exposes the actual memory contents of the running process, and reads from the .text base to the heap end. Finally, it writes that content out to a file named dump, which is the file we were given.

Therefore, the given file is not the binary itself. It is the process's memory from the start of .text to the end of the heap, captured at the moment the process reaches the 3rd function. Since the 1st function in main looked encrypted, I concluded that the 2nd function is where the code executed by the 1st function gets encrypted, so I continued my investigation there.
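To make the dumping logic concrete, here is a rough Python equivalent of the 3rd function as described above. It is my own sketch, not the challenge's code. The names parse_maps and dump_self are hypothetical. Each line is split on "-" instead of taking a fixed 12 characters, which also copes with addresses of other widths.

```python
def parse_maps(maps_text: str) -> tuple[int, int]:
    """Return (base, heap end) from the text of /proc/<pid>/maps.

    Hypothetical helper mirroring the 3rd function: the base is the start
    address of the first mapping, the end comes from the "[heap]" line.
    """
    lines = maps_text.splitlines()
    # Each line starts with "start-end" in hex, e.g. "55d4c8a3b000-55d4c8a3c000"
    base = int(lines[0].split("-", 1)[0], 16)
    for line in lines:
        if line.rstrip().endswith("[heap]"):
            start_end = line.split()[0]
            heap_end = int(start_end.split("-", 1)[1], 16)
            return base, heap_end
    raise ValueError("no [heap] mapping found")


def dump_self(out_path: str = "dump") -> None:
    """Hypothetical re-creation of the dumper: copy base..heap_end to a file."""
    with open("/proc/self/maps") as maps:
        base, heap_end = parse_maps(maps.read())
    # /proc/self/mem is indexed by virtual address. This assumes the whole
    # range is mapped; reading across an unmapped gap fails with OSError.
    with open("/proc/self/mem", "rb") as mem:
        mem.seek(base)
        data = mem.read(heap_end - base)
    with open(out_path, "wb") as out:
        out.write(data)
```

Only parse_maps is pure. dump_self touches the real /proc files of whatever process runs it, so it only makes sense on Linux.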

Analyzing the 2nd function

The 2nd function first uses mprotect to make the page containing the code in .text readable, writable and executable (RWX). It then calls a function at offset 0x13a9, which is encrypted in the dump, and reads 32 bytes into the pointer it returns. At this point I had no idea what this function really was, but I could still infer some information about it:

  1. It takes 1 parameter, which looks like a size of some sort.
  2. It returns a pointer.
  3. The 2nd function then reads from an unknown stream into the returned pointer, and the number of bytes read equals the parameter.

This information led me to conclude that it may be malloc() or some other allocation function. There was not enough information to tell what kind of allocation it is. The 2nd function then reads from the same unknown stream 64 more times, each time a random number of bytes. These bytes are not used anywhere, so I assumed they are just a distraction. Finally, it uses the first 32 bytes it read as an XOR key to encrypt 896 bytes of code starting at offset 0x13a9. The task now is to find the correct key to decrypt that code.
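The encryption step itself is a plain repeating-key XOR. The following sketch models it in Python:

* xor_repeating and encrypt_code are hypothetical names.
* os.urandom stands in for the reads from the unknown stream.
* The 64 distracting reads are left out because they do not affect the ciphertext.

It also shows why recovering the 32-byte key is enough: XORing again with the same key undoes the encryption.

```python
import os

KEY_SIZE = 32    # bytes read from the stream and used as the XOR key
CODE_SIZE = 896  # bytes of code encrypted from offset 0x13a9 onwards


def xor_repeating(key: bytes, data: bytes) -> bytes:
    """XOR data with key, cycling through the key as often as needed."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def encrypt_code(code: bytes) -> tuple[bytes, bytes]:
    """Hypothetical model of the 2nd function; returns (key, ciphertext)."""
    if len(code) != CODE_SIZE:
        raise ValueError(f"expected {CODE_SIZE} bytes of code")
    key = os.urandom(KEY_SIZE)  # stands in for the read from the unknown stream
    return key, xor_repeating(key, code)


code = bytes(range(256)) * 3 + bytes(128)  # 896 bytes of dummy "code"
key, encrypted = encrypt_code(code)
assert xor_repeating(key, encrypted) == code  # XOR with the same key undoes it
```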

Decrypting the 1st function

First, I wanted to find out what the unknown stream was. Looking at the strings in the file, I saw /dev/urandom, so I assumed the stream is simply a file stream opened on /dev/urandom.

Because I initially suspected that the function at 0x13a9 was malloc(), I dug into the hex dump of the dump file looking for data that resembles heap chunks. Unfortunately, all the random data appears to be stored contiguously rather than in separate chunks. The function is therefore more likely some kind of contiguous allocator in the data section than malloc(). Presumably the 1st function had already allocated a lot of data in that region, leaving our XOR key somewhere in between.
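To illustrate the difference, here is a toy model of the kind of contiguous allocator I suspected. BumpArena is a hypothetical class, not code recovered from the binary. Each allocation begins exactly where the previous one ended.

With glibc malloc, by contrast, the user data of every chunk is preceded by an 8-byte size field on x86-64. Buffers are therefore never directly adjacent, and small values such as 0x21 or 0x31 show up between them in a hex dump.

```python
class BumpArena:
    """Toy contiguous allocator (hypothetical, not taken from the binary).

    Each allocation starts exactly where the previous one ended, with no
    per-chunk header in between, unlike glibc malloc.
    """

    def __init__(self, size: int) -> None:
        self.memory = bytearray(size)
        self.top = 0  # offset of the next free byte

    def alloc(self, n: int) -> int:
        if self.top + n > len(self.memory):
            raise MemoryError("arena exhausted")
        offset, self.top = self.top, self.top + n
        return offset

    def write(self, offset: int, data: bytes) -> None:
        self.memory[offset:offset + len(data)] = data


arena = BumpArena(0x100)
junk = arena.alloc(13)            # some earlier, randomly sized allocation
key = arena.alloc(32)             # the 32-byte XOR key buffer
arena.write(key, bytes(range(32)))
print(key - junk, arena.memory[key:key + 4].hex())  # 13 00010203
```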

To find the XOR key, I used the fact that modern compilers with Intel CET enabled (for example GCC with -fcf-protection) typically insert an endbr64 instruction at the beginning of each function. The endbr64 instruction is 4 bytes long (F3 0F 1E FA), so it gives us the first 4 bytes of the plaintext code. We also know the first 4 bytes of the encrypted code, so XORing the two yields the first 4 bytes of the key. I then searched for that 4-byte sequence in the dump and found the key at offset 0x4ba74. Finally, I wrote a small IDA script to decrypt and patch the encrypted code:

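import ida_bytes          # IDAPython API for reading/patching bytes
from malduck import xor   # repeating-key XOR helper

# The 32-byte XOR key sits at offset 0x4ba74 of the dump
with open("./dump", "rb") as f:
    key = f.read()[0x4ba74:0x4ba74 + 0x20]

# Decrypt the 896 encrypted bytes at 0x13a9 and patch them back into the IDB
encrypted = ida_bytes.get_bytes(0x13a9, 896)
decrypted = xor(key, encrypted)
ida_bytes.patch_bytes(0x13a9, decrypted)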
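The key search that precedes this script can also be done in plain Python, which is handy for checking candidates before touching the IDB. This is an illustration using my own hypothetical helpers, key_prefix and find_key_offsets. As above, it assumes that the plaintext code starts with endbr64, and it returns every offset where the recovered 4-byte prefix occurs.

```python
ENDBR64 = bytes.fromhex("f30f1efa")  # encoding of endbr64


def key_prefix(encrypted_code: bytes) -> bytes:
    """First 4 key bytes, assuming the plaintext code begins with endbr64."""
    return bytes(c ^ p for c, p in zip(encrypted_code[:4], ENDBR64))


def find_key_offsets(dump: bytes, encrypted_code: bytes) -> list[int]:
    """Every offset in dump where the recovered 4-byte key prefix occurs."""
    prefix = key_prefix(encrypted_code)
    hits = []
    pos = dump.find(prefix)
    while pos != -1:
        hits.append(pos)
        pos = dump.find(prefix, pos + 1)
    return hits


# Synthetic check: a 32-byte key hidden at offset 100 of a fake dump
key = bytes(range(1, 33))
plain = ENDBR64 + bytes(28)
encrypted = bytes(b ^ key[i % 32] for i, b in enumerate(plain))
fake_dump = b"\x00" * 100 + key + b"\xff" * 50
print(find_key_offsets(fake_dump, encrypted))  # [100]
```

In a few hundred kilobytes of random data, a chance match of 4 specific bytes is unlikely: the expected number of spurious hits is roughly the size divided by 2^32. A single hit is therefore a strong candidate.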

Analyzing the 1st function

The first function is not very complicated. It allocates a 128-byte buffer, reads the content of flag.txt into it, and then goes into the encryption routine.

The encryption routine has the following steps:

  1. Randomly allocate and read a bunch of distracting data (just like in the second function).
  2. Allocate a 32-byte buffer.
  3. Randomly allocate and read a bunch of distracting data.
  4. Allocate a 16-byte buffer.
  5. Randomly allocate and read a bunch of distracting data.
  6. Read 32 bytes into the 32-byte buffer, and 16 bytes into the 16-byte buffer.
  7. Allocate a 192-byte buffer.
  8. Randomly allocate and read a bunch of distracting data.
  9. Make 3 calls to 3 more functions.

By looking at the 3 functions at the end and using the findcrypt plugin for IDA, I immediately recognized this as AES-CBC code, because I had encountered it before. The author borrowed the exact same AES code from this GitHub repo. That made things extremely easy, because by comparing with the source I could conclude the following:

  • The 32-byte buffer is the key, although only its first 16 bytes are used since this is AES-128.
  • The 16-byte buffer is the IV.
  • The 192-byte buffer is the AES_ctx struct.

The question then was how to find the key and the IV among all the random trash data.
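The 192-byte buffer size is a useful sanity check. AES-128 uses 10 rounds, and its key schedule produces 11 round keys of 16 bytes each, i.e. 176 bytes. Adding the 16-byte Iv gives exactly 192.

The sketch below spells out that arithmetic and splits a dumped context into its two parts. It assumes the struct is laid out as RoundKey followed by Iv with no padding, which holds because both are uint8_t arrays. split_ctx is a hypothetical helper name.

```python
AES_BLOCKLEN = 16    # block size, also the IV size, in bytes
AES128_KEYLEN = 16   # AES-128 key length in bytes
AES128_ROUNDS = 10

# The key schedule yields one 16-byte round key per round plus one for the
# initial AddRoundKey: (10 + 1) * 16 = 176 bytes.
AES128_KEYEXP_SIZE = (AES128_ROUNDS + 1) * AES_BLOCKLEN
CTX_SIZE = AES128_KEYEXP_SIZE + AES_BLOCKLEN  # RoundKey[] then Iv[]


def split_ctx(ctx: bytes) -> tuple[bytes, bytes]:
    """Return (key copy, IV copy) from a dumped AES-128 CBC context.

    Assumes the layout RoundKey[176] followed by Iv[16] with no padding.
    """
    if len(ctx) != CTX_SIZE:
        raise ValueError(f"expected {CTX_SIZE} bytes, got {len(ctx)}")
    round_key, iv = ctx[:AES128_KEYEXP_SIZE], ctx[AES128_KEYEXP_SIZE:]
    # KeyExpansion copies the raw key into the first 16 bytes of RoundKey
    return round_key[:AES128_KEYLEN], iv


print(CTX_SIZE)  # 192
```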

Finding key, IV and decrypting the flag

By looking at the AES code, I realized that the key and IV are deep-copied into the AES_ctx struct. The key goes into the start of ctx->RoundKey and the IV into ctx->Iv:

void AES_init_ctx_iv(struct AES_ctx* ctx, const uint8_t* key, const uint8_t* iv)
{
  KeyExpansion(ctx->RoundKey, key);
  memcpy (ctx->Iv, iv, AES_BLOCKLEN); /* deep copy IV */
}

static void KeyExpansion(uint8_t* RoundKey, const uint8_t* Key)
{
  unsigned i, j, k;
  uint8_t tempa[4]; // Used for the column/row operations
  
  // The first round key is the key itself.
  /* Deep copy key */
  for (i = 0; i < Nk; ++i)
  {
    RoundKey[(i * 4) + 0] = Key[(i * 4) + 0];
    RoundKey[(i * 4) + 1] = Key[(i * 4) + 1];
    RoundKey[(i * 4) + 2] = Key[(i * 4) + 2];
    RoundKey[(i * 4) + 3] = Key[(i * 4) + 3];
  }
  ...
}

Therefore, both the key and the IV appear twice in the data dump: once in their own buffers and once inside AES_ctx. Using this, I wrote a small Python script that brute-forces all 16-byte sequences appearing more than once in the data dump. I ended up finding 2 such sequences, which is perfect because one of them must be the key and the other the IV. I simply tried both orders to decrypt the flag. The encrypted flag is easy to locate because it occupies the first 128 bytes of the data dump, with no random data inserted before it.

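from malduck import aes

with open("./dump", "rb") as f:
    data = f.read()[0x5060:0x5b6e0]   # data region of the dump

enc_flag = data[:0x80]                # encrypted flag: first 128 bytes

# Collect every distinct 16-byte sequence that appears more than once,
# in order of first appearance (slow: data.count() rescans the whole buffer)
key_iv = []
for x in range(len(data) - 0x10 + 1):
    candidate = data[x:x + 0x10]
    if candidate not in key_iv and data.count(candidate) > 1:
        key_iv.append(candidate)

# aes.cbc.decrypt(key, iv, data): try both assignments of key and IV
#print(aes.cbc.decrypt(key_iv[0], key_iv[1], enc_flag))
print(aes.cbc.decrypt(key_iv[1], key_iv[0], enc_flag))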

The flag is Aero{d37fd6db2f8d562422aaf2a83dc62043}.
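Because data.count() rescans the whole buffer at every offset, the search script above is quadratic in the size of the dump. A linear-time alternative is to index every 16-byte window in a dictionary once. The sketch below is my own variant, not the script I used:

* repeated_windows is a hypothetical name.
* Skipping windows made of a single repeated byte, such as zero padding, is an extra heuristic.
* The example builds a small synthetic buffer laid out like the encryption routine: a key buffer, an IV buffer, then a 192-byte context holding copies of both.

```python
from collections import defaultdict


def repeated_windows(data: bytes, size: int = 16) -> dict[bytes, list[int]]:
    """Map every size-byte window occurring more than once to its offsets."""
    offsets = defaultdict(list)
    for off in range(len(data) - size + 1):
        offsets[data[off:off + size]].append(off)
    # Keep duplicates only, and skip uniform windows such as zero padding
    # (an extra heuristic of this sketch, not part of the original script).
    return {w: offs for w, offs in offsets.items()
            if len(offs) > 1 and len(set(w)) > 1}


# Synthetic layout: junk, 32-byte key buffer, junk, IV, junk, then a
# 192-byte "AES_ctx" = key + 160 other round-key bytes + IV, then junk.
key, iv = bytes(range(0x10, 0x20)), bytes(range(0x60, 0x70))
data = (b"\xa1" * 20 + key + b"\xee" * 16 + b"\xb2" * 20 + iv
        + b"\xc3" * 20 + key + b"\xff" * 160 + iv + b"\xd4" * 20)
for window, offs in repeated_windows(data).items():
    print(window.hex(), offs)
# 101112131415161718191a1b1c1d1e1f [20, 108]
# 606162636465666768696a6b6c6d6e6f [72, 284]
```

The trade-off is memory. Every window is stored as a dictionary key, so a few hundred kilobytes of data needs tens of megabytes, which is acceptable for a one-off script.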

Appendix

The script for decrypting the code is decrypt_code.py.

The script for decrypting the flag is decrypt_flag.py.