Practical binary analysis book: CTF writeup for last level 8
  2020-09-21

The CTF challenge

The Capture The Flag challenge offered in the book consists of finding a hidden flag (a string) in a binary, without access to its source code, by using reverse engineering techniques.

Once discovered, the flag unlocks the next level, and so on.

Only basic tools such as a hex editor, gdb, objdump, nm, readelf and strings will be used, rather than more complex tools like IDA, Ghidra or Binary Ninja, to make sure the basics are understood first.

Everything has been completed on a Kali Linux VM or on the Linux VM provided by the book author.

This post is my solution for the final level 8.

Level 8

Lorem ipsum

Coming from the previous level, we don’t start with a binary executable this time, but with a 2.6 MB text file:

$ ls -lh lvl8
-rw-rw-r-- 1 binary binary 2.6M May  8 21:23 lvl8

$ file lvl8
lvl8: ASCII text, with very long lines

Here are the first few lines:

$ head -n 10 lvl8
lOrem iPsuM doLOr SIT AmEt, conseCTETur adipIscing elit. maecenas eget augue sed leo suscipit ultrICiES sed blandit urna. sed ut risus vitAe Ligula semper scelerisque. fusce et UlTRices telluS, non commodo elit. nullA fACilisi. inteGer pharetra eu massa et ultrIces. nunc dignISsim nisl eu nulla ultricies venenatis. fusce tincidUnT NibH risuS, IN VulputaTe libero congue A. cuRabituR eST diam, lacinia vel placeRat Eget, seMpER nec enim.

proin turpis metus, finibus in porttitor sed, tempor nec ante. nulla ornare volutpat mi, siT AMET VOlUTPAT NUNC. AENEAN VITAE JUSTO IN EX SUSCIPIT PHARETRA. NAM NULLA ERAT, CONGUE eGET QUAM NON, PLACErAT CONSEqUAT RISUs. PROIN QUIS ULTRICeS ODIO. SED AT PHAREtRA QUAM, INTERDUM CoNSECTETuR EROS. SED DOLOR LACUS, VEHICuLA ALIQUeT IACULIs SIT AMET, PRETIUM ViTAE NISI. pHASELLUs AT EX SAGITTIS, ACCUMSAN IPSuM AT, BLANdIT QUAM. MaURIS FERmENTUM VEsTIBULUM vARIUS. PElLENTESQUE CONVALlIS VITAE mI SIT AMEt EUISMOD. dONEC CONsECTETUR tELLUS FAuCIBUS ACcUMSAN EUISMOD. DONeC ULTRICiES RHONCuS ARCU, UT mOLESTIE sEM CONGUe EGET. NULlAM POSUERE LECTUS aC VENENAtIS SEMPEr. PROIN AT nIBH DOLOr.

NULLA PReTIUM MOLeSTIE CONgUE. VIVAMuS EGET LIgULA QUIS tURPIS SAgITTIS RHoNCUS NON uT LECTUS. pHASELLUs id purus ac quam aliquam viverra vitae in massa. donec quis dictum ex. suspendisse ut interdum sem. quisque cursus viverra nisi ac ultrices. in hac habitasse platea dictumst. pellentesque feugiat sodales turpis, nec pretium leo efficitur a.
(...)

The Lorem ipsum text is easy to recognize, but the mixed-case pattern running through it looks unusual: it is inconsistent and follows no obvious logic, such as capitalizing the start of each sentence.

Given the size of the file and the purpose of the book (binary analysis), I suspect that an executable is somehow hidden in that text; the only catch is finding out how.

As a first try, I will assume that the binary is encoded in the case of the letters, ignoring all other characters (punctuation, spaces, newlines, …):

  • 0 for lowercased characters - abcde…z
  • 1 for uppercased characters - ABCDE…Z

For instance lOrem iPs will be read as 01000010
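To make the rule concrete, here is a minimal sketch of that mapping in Python. The helper names case_bits and bits_to_bytes are mine, and dropping a trailing group of fewer than 8 bits is an assumption made for illustration.

```python
def case_bits(text: str) -> str:
    """Map ASCII lowercase letters to '0' and uppercase letters to '1'; skip everything else."""
    bits = []
    for c in text:
        if 'a' <= c <= 'z':
            bits.append('0')
        elif 'A' <= c <= 'Z':
            bits.append('1')
    return ''.join(bits)


def bits_to_bytes(bits: str) -> bytes:
    """Pack a string of '0'/'1' into bytes, most significant bit first.

    Assumption: a trailing group of fewer than 8 bits is dropped.
    """
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


print(case_bits("lOrem iPs"))  # 01000010
```

Spaces and punctuation contribute nothing, so the nine characters of "lOrem iPs" yield exactly eight bits, i.e. one byte.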

An ELF executable can be recognized by the magic bytes at the start of its header, 7f 45 4c 46, so before spending too much time I’ll just decode the first 4 bytes of the resulting ’encoding’ to test this hypothesis.

Here is a quick and dirty Python script that reads the whole file, applies the logic above to build the new binary data, and displays the first 4 bytes:

#!/usr/bin/python3
import string

with open('lvl8') as fh:
    buf = ''
    for line in fh.readlines():
        for c in line:
            if c in string.ascii_lowercase:
                buf += '0'
            elif c in string.ascii_uppercase:
                buf += '1'
header = buf[0:32]
n = int(header, 2)
print(n.to_bytes(4, byteorder='big'))

Unfortunately this is not the ELF header:

$ ./l8.py
b'BM\xe8\x1e'

But this BM magic number may already look familiar to you: it is (one of) the BMP file signatures! Many websites list file signatures; I went to Wikipedia to confirm my hunch.
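This kind of check is easy to automate. The sketch below uses a handful of well-known signatures; the table is deliberately tiny and the function name identify is made up for this post, so treat it as an illustration rather than a replacement for the file command.

```python
# A few well-known file signatures (magic numbers) and the formats they identify.
SIGNATURES = {
    b"\x7fELF": "ELF executable",
    b"BM": "BMP image",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"PK\x03\x04": "ZIP archive",
}


def identify(header: bytes) -> str:
    """Return the name of the first signature that matches the start of `header`."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"


print(identify(b"BM\xe8\x1e"))  # BMP image
```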

Bitmap file

The next step is to write the whole decoded text to a file that can be opened as a BMP image:

#!/usr/bin/python3
import string

with open('lvl8') as fh:
    buf = ''
    for line in fh.readlines():
        for c in line:
            if c in string.ascii_lowercase:
                buf += '0'
            elif c in string.ascii_uppercase:
                buf += '1'
n = int(buf, 2)
length = (n.bit_length()+ 7 ) // 8

with open('level8.bmp', 'wb') as out:
    out.write(n.to_bytes(length, byteorder='big'))

This is the resulting image:

% file level8.bmp
level8.bmp: PC bitmap, Windows 3.x format, 300 x 300 x 24

I got stuck here for a while: I tried to find text within the file and in its metadata, looked up the image online for clues (what is this elf?), manipulated the image in Gimp, and hit all sorts of similar dead ends.

It was only while writing this post, and converting the bitmap to a PNG file, that I noticed the size of the file:

$ ls -lh level8.bmp
-rw-rw-r-- 1 binary binary 264K May 21 13:12 level8.bmp

It seemed a bit large for a 300x300 BMP file, right? After another round of attempts at patching the BMP header in different ways (width/height, offset to data, …), I was still stuck.
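In hindsight, that intuition can be checked with a bit of arithmetic. The sketch below computes the size of an uncompressed BMP from its dimensions, assuming the common 54-byte header (14-byte file header plus 40-byte info header) and no palette; the function name is mine.

```python
def bmp_expected_size(width: int, height: int, bits_per_pixel: int,
                      header_size: int = 54) -> int:
    """Size in bytes of an uncompressed BMP.

    Assumption: 14-byte file header + 40-byte info header, no color palette.
    """
    # Each row of pixels is padded to a multiple of 4 bytes.
    row_size = (width * bits_per_pixel + 31) // 32 * 4
    return header_size + row_size * height


print(bmp_expected_size(300, 300, 24))  # 270054
```

That is roughly the 264K reported by ls -lh, so the file size alone turns out to be weaker evidence than it first appeared: a 300 x 300 x 24 bitmap is simply that big when stored uncompressed.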

Then I decided to use ImageMagick to extract as much metadata as possible from this file to hopefully find something odd, with the identify -verbose level8.bmp command:

$ identify -verbose level8.bmp
Image: level8.bmp
  Format: BMP3 (Microsoft Windows bitmap image (V3))
  Class: DirectClass
  Geometry: 300x300+0+0
  Resolution: 28.34x28.34
  Print size: 10.5857x10.5857
  Units: PixelsPerCentimeter
  Type: Palette
  Endianess: Undefined
  Colorspace: sRGB
  Depth: 8-bit
  Channel depth:
    red: 8-bit
    green: 8-bit
    blue: 8-bit
  Channel statistics:
    Pixels: 90000
    Red:
      min: 0 (0)
      max: 255 (1)
      mean: 173.646 (0.680964)
      standard deviation: 109.84 (0.430743)
      kurtosis: -1.35726
      skewness: -0.702737
    Green:
      min: 0 (0)
      max: 255 (1)
      mean: 179.209 (0.702781)
      standard deviation: 104.566 (0.410062)
      kurtosis: -0.880643
      skewness: -0.950063
    Blue:
      min: 0 (0)
      max: 255 (1)
      mean: 143.475 (0.562649)
      standard deviation: 121.291 (0.475651)
      kurtosis: -1.85896
      skewness: -0.26817
  Image statistics:
    Overall:
      min: 0 (0)
      max: 255 (1)
      mean: 165.444 (0.648798)
      standard deviation: 112.116 (0.439672)
      kurtosis: -1.41505
      skewness: -0.644421
  Colors: 42
  Histogram:
     17643: (  0,  0,  0) #000000 black
       389: (  0,  0,  1) #000001 srgb(0,0,1)
       341: (  0,  1,  0) #000100 srgb(0,1,0)
       269: (  0,  1,  1) #000101 srgb(0,1,1)
       379: (  1,  0,  0) #010000 srgb(1,0,0)
       227: (  1,  0,  1) #010001 srgb(1,0,1)
       259: (  1,  1,  0) #010100 srgb(1,1,0)
       365: (  1,  1,  1) #010101 srgb(1,1,1)
       535: ( 58, 58, 58) #3A3A3A srgb(58,58,58)
        12: ( 58, 58, 59) #3A3A3B srgb(58,58,59)
        10: ( 58, 59, 58) #3A3B3A srgb(58,59,58)
         4: ( 58, 59, 59) #3A3B3B srgb(58,59,59)
        15: ( 59, 58, 58) #3B3A3A srgb(59,58,58)
         1: ( 59, 58, 59) #3B3A3B srgb(59,58,59)
         6: ( 59, 59, 58) #3B3B3A srgb(59,59,58)
      1433: ( 59, 59, 59) #3B3B3B grey23
       720: ( 84,170,  0) #54AA00 srgb(84,170,0)
        27: ( 84,170,  1) #54AA01 srgb(84,170,1)
        19: ( 84,171,  0) #54AB00 srgb(84,171,0)
        20: ( 84,171,  1) #54AB01 srgb(84,171,1)
      8662: ( 85,170,  0) #55AA00 srgb(85,170,0)
        22: ( 85,170,  1) #55AA01 srgb(85,170,1)
        20: ( 85,171,  0) #55AB00 srgb(85,171,0)
        14: ( 85,171,  1) #55AB01 srgb(85,171,1)
      1440: ( 94, 65,  6) #5E4106 srgb(94,65,6)
      3093: (254,254,  0) #FEFE00 srgb(254,254,0)
       169: (254,254,  1) #FEFE01 srgb(254,254,1)
      7157: (254,254,254) #FEFEFE srgb(254,254,254)
       381: (254,254,255) #FEFEFF srgb(254,254,255)
       147: (254,255,  0) #FEFF00 srgb(254,255,0)
       143: (254,255,  1) #FEFF01 srgb(254,255,1)
       357: (254,255,254) #FEFFFE srgb(254,255,254)
       287: (254,255,255) #FEFFFF srgb(254,255,255)
      6336: (255,213,176) #FFD5B0 srgb(255,213,176)
       199: (255,254,  0) #FFFE00 srgb(255,254,0)
       138: (255,254,  1) #FFFE01 srgb(255,254,1)
       441: (255,254,254) #FFFEFE srgb(255,254,254)
       286: (255,254,255) #FFFEFF srgb(255,254,255)
      1023: (255,255,  0) #FFFF00 yellow
       128: (255,255,  1) #FFFF01 srgb(255,255,1)
       281: (255,255,254) #FFFFFE srgb(255,255,254)
     36602: (255,255,255) #FFFFFF white
  Rendering intent: Perceptual
  Gamma: 0.454545
  Chromaticity:
    red primary: (0.64,0.33)
    green primary: (0.3,0.6)
    blue primary: (0.15,0.06)
    white point: (0.3127,0.329)
  Background color: white
  Border color: srgb(223,223,223)
  Matte color: grey74
  Transparent color: black
  Interlace: None
  Intensity: Undefined
  Compose: Over
  Page geometry: 300x300+0+0
  Dispose: Undefined
  Iterations: 0
  Compression: Undefined
  Orientation: Undefined
  Properties:
    date:create: 2020-05-25T21:20:38+02:00
    date:modify: 2020-05-25T21:20:38+02:00
    signature: 554206c801b93ce2d7bb94da43bbf49c9c13ce3f1dfcf408a92c32c946fa7cd8
  Artifacts:
    filename: level8.bmp
    verbose: true
  Tainted: False
  Filesize: 270KB
  Number pixels: 90K
  Pixels per second: 0B
  User time: 0.000u
  Elapsed time: 0:01.000
  Version: ImageMagick 6.8.9-9 Q16 x86_64 2017-07-31 http://www.imagemagick.org

The first value that looks off for this picture is the number of colors in the histogram: 42 is far more than the few visible to the naked eye.

I counted fewer than a dozen when trying to tell them apart with the Gimp color picker tool.
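One detail of the histogram helps explain the gap: many entries differ from a neighbor by only 1 in one or more channels. As a rough sketch (using a few entries copied from the histogram, and a helper name of my own), clearing the lowest bit of each channel makes such near-duplicates collapse into a single color:

```python
# A few (R, G, B) entries copied from the histogram above.
colors = [
    (0, 0, 0), (0, 0, 1), (1, 1, 1),
    (84, 170, 0), (85, 170, 0), (85, 171, 1),
    (254, 254, 254), (255, 254, 255), (255, 255, 255),
    (94, 65, 6),
]


def clear_lsb(color):
    """Zero the least significant bit of each channel."""
    return tuple(channel & 0xFE for channel in color)


base_colors = sorted({clear_lsb(c) for c in colors})
print(len(colors), "entries collapse to", len(base_colors), "colors:", base_colors)
```

Colors that differ only in their lowest bit are indistinguishable to the eye, which fits with counting far fewer than 42 with the color picker.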

Using Gimp to set the contrast to the maximum and the brightness to the minimum reveals that something is odd in the bottom region of the picture compared to the untouched image: it looks completely scrambled.

Which points us to… steganography!

Steganography

Hiding data inside an innocuous-looking file is called steganography.

Here we have a simple BMP file that seems normal at first glance, but the unexpected number of colors and the noisy bottom region suggest that something is off.

A common and simple steganography technique for hiding data within an image is to slightly modify the RGB values of its pixels: the Least Significant Bit (LSB) of the Red, Green and Blue channels is overwritten with one bit of the data we want to hide.

Extra data has been stored, but the image looks the same. Extracting that data is as simple as reading the LSB of every RGB channel of every pixel.
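As a toy illustration of the idea (not the exact scheme used in this challenge), the sketch below hides one byte in a flat list of channel values and reads it back. The function names and the sample values are hypothetical.

```python
def embed_bits(channels, bits):
    """Return a copy of `channels` whose first len(bits) values carry `bits` in their LSB."""
    out = list(channels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit
    return out


def extract_bits(channels, count):
    """Read back the LSB of the first `count` channel values."""
    return [value & 1 for value in channels[:count]]


# Hypothetical channel values (three near-white pixels) and one byte to hide: 0x7f.
cover = [255, 255, 255, 254, 255, 254, 255, 255, 255]
secret = [0, 1, 1, 1, 1, 1, 1, 1]

stego = embed_bits(cover, secret)
print(stego)
print(extract_bits(stego, len(secret)) == secret)  # True
```

No channel changes by more than 1, which is why the altered image is visually indistinguishable from the original.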

This is already well explained in many articles on the Internet, so I won’t write my own version here, but one I found well explained can be found here.

Besides the quick and dirty check with Gimp, there are tools to analyze a file and possibly spot whether such techniques have been used to hide data in it. Here is a neat list of useful stego tools and resources; I chose zsteg for a quick check:

$ zsteg level8.bmp
[?] 2 bytes of extra data after image end (IEND), offset = 0x41ee6
extradata:0         .. ["\x00" repeated 2 times]
imagedata           .. text: ":::::::::::::::::::::::::::::::;::::::::;;;::;;::::::::::::::::::::::::::::::::::::::::::::::::;::::::::;;;:::::::::::::::::::::::::::::::::::::::::::::::::::::;:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::"
b1,lsb,bY           .. file: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, interpreter *empty*, stripped
b2,msb,bY           .. text: "UUUUU]UUU"
b4,msb,bY           .. text: ["w" repeated 8 times]
b1,rgb,lsb,xY       .. file: DOS 2.0 backup id file, sequence 149
b2,r,msb,xY         .. text: "}UUUUUUUUUUU"
b2,g,msb,xY         .. text: "UUU]UUU}u"
b2,b,msb,xY         .. text: "uUUUUUWUUUUu"
b2,rgb,msb,xY       .. text: "UUUW_UUu]UU"
b3,r,lsb,xY         .. file: very old 16-bit-int big-endian archive
b3,g,msb,xY         .. file: MPEG ADTS, layer I, v2,  96 kbps, Monaural
b4,r,lsb,xY         .. file: dBase III DBT, version number 0, next free block index 4277137151
b4,r,msb,xY         .. text: ["w" repeated 22 times]
b4,g,lsb,xY         .. file: dBase III DBT, version number 0, next free block index 4009688831
b4,g,msb,xY         .. text: ["w" repeated 19 times]
b4,b,lsb,xY         .. file: dBase III DBT, version number 0, next free block index 4025483247
b4,b,msb,xY         .. file: dBase III DBT, version number 0, next free block index 4160225271
b4,rgb,msb,xY       .. file: MPEG ADTS, layer I, v2, Monaural

Check out the b1,lsb,bY line: looks like we have a hidden ELF executable!

Extract the payload to a file:

$ zsteg -E b1,lsb,bY level8.bmp > new_binary

$ file new_binary
new_binary: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=3f43c1bc1bc2d1dccc12d2fbb1cb83347e8cb3b4, not stripped

$ ./new_binary
$

The extracted ELF binary is valid and can be executed; however, it prints nothing, so it will need to be reverse engineered too in order to get the flag and complete the challenge.

But before that, to fully understand the LSB trick, I’ll extract the same payload myself with a Python script, with the help of the Python Imaging Library (PIL, through its Pillow fork) to avoid wasting time on the row padding of BMP pixel data.

I tried to make it explicit, simple and well commented:

import sys

# Python Pillow library https://python-pillow.org/
from PIL import Image

input_bmp = sys.argv[1]
output_file = sys.argv[2]

print(f"Reading from BMP file '{input_bmp}'")

bmp = Image.open(input_bmp)
width, height = bmp.size
print(f"BMP height:{height} width:{width}")

# LSB bits retrieved from the RGB channels
lsb_bits = []

# Based on https://wiki.python.org/moin/BitManipulation
def setBit(i, offset):
    mask = 1 << (7 - offset)
    return (i| mask)

# Iterates over the pixels of the image in the order in which a BMP file stores them:
# https://en.wikipedia.org/wiki/BMP_file_format#Pixel_storage
# "Usually pixels are stored "bottom-up", starting in the lower left corner, going from left to right,
# and then row by row from the bottom to the top of the image"
# Note: range(height-1, 0, -1) stops at y = 1, so the top row (y = 0) is never read;
# use range(height-1, -1, -1) to include it.
for y in range(height-1, 0, -1):
    for x in range(width):
        r, g, b = bmp.getpixel((x, y))
        lsb_r  = (r & 1)
        lsb_g  = (g & 1)
        lsb_b  = (b & 1)
        # A 24-bit BMP stores each pixel as blue, green, red, hence this order
        lsb_bits.append(lsb_b)
        lsb_bits.append(lsb_g)
        lsb_bits.append(lsb_r)

# The byte being built from the LSB bits
byte = 0
bit_index = 0

# The new bytes assembled from the LSB bits - these are the 'hidden' data
new_bytes = []

with open(output_file, 'wb') as out:
    for i in lsb_bits:
        # No need to deal with the bits set to zero as the
        # 'byte' variable is initialized/reset to zero
        if i == 1:
            byte = setBit(byte, bit_index)
        bit_index +=1

        # We have set all the bits of a byte, append to the array to be written to disk
        if bit_index == 8:
            bit_index = 0
            new_bytes.append(byte)
            byte = 0
    # Finally, write to disk all the 'hidden' data bytes
    print(f"Written {len(new_bytes)} bytes to '{output_file}'")
    out.write(bytearray(new_bytes))

Let’s make sure it produces the same binary as zsteg:

$ python3 lvl8.py level8.bmp new_binary_python

Reading from BMP file 'level8.bmp'
BMP height:300 width:300
Written 33637 bytes to 'new_binary_python'
$ sha1sum new_binary
d53ea88677bb59b1f4186fc635ca29ab96216cdc  new_binary

$ sha1sum new_binary_python
d53ea88677bb59b1f4186fc635ca29ab96216cdc  new_binary_python

Phew! The binaries extracted by zsteg and by our Python script are identical, so let’s move on to reversing that new binary.

Note: While there are a lot of steganography tools like zsteg to extract and analyze hidden data in files, I strongly recommend getting your hands dirty with the low-level stuff at least once. I personally learned a lot while working with Python and a hex editor.

Another ELF binary

Overview and static analysis

Back to reverse engineering the extracted binary. From the earlier file invocation we know it’s a dynamically linked executable.

strace doesn’t reveal anything of obvious interest:

$ strace ./new_binary
execve("./new_binary", ["./new_binary"], 0x7ffc4341d030 /* 30 vars */) = 0
brk(NULL)                               = 0x968000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=87790, ...}) = 0
mmap(NULL, 87790, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f3c77988000
close(3)                                = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0n\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1839792, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3c77986000
mmap(NULL, 1852680, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f3c777c1000
mprotect(0x7f3c777e6000, 1662976, PROT_NONE) = 0
mmap(0x7f3c777e6000, 1355776, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x25000) = 0x7f3c777e6000
mmap(0x7f3c77931000, 303104, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x170000) = 0x7f3c77931000
mmap(0x7f3c7797c000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ba000) = 0x7f3c7797c000
mmap(0x7f3c77982000, 13576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f3c77982000
close(3)                                = 0
arch_prctl(ARCH_SET_FS, 0x7f3c77987540) = 0
mprotect(0x7f3c7797c000, 12288, PROT_READ) = 0
mprotect(0x600000, 4096, PROT_READ)     = 0
mprotect(0x7f3c779c8000, 4096, PROT_READ) = 0
munmap(0x7f3c77988000, 87790)           = 0
brk(NULL)                               = 0x968000
brk(0x98b000)                           = 0x98b000
mprotect(0x969000, 4096, PROT_READ|PROT_WRITE) = 0
mprotect(0x969000, 4096, PROT_EXEC)     = 0
exit_group(0)                           = ?
+++ exited with 0 +++

With ltrace:

$ ltrace ./new_binary
__libc_start_main(0x4005d6, 1, 0x7ffd1c95a0b8, 0x4006b0 <unfinished ...>
memalign(4096, 4096, 0x7ffd1c95a0c8, 0x7fb34ecb3718)                                        = 0x1089000
mprotect(0x1089000, 4096, 3, 0)                                                             = 0
memcpy(0x1089000, "\234\235\236\232\233t\315\314\314\314s\315\314\314\314\204A\371\324\314\314\314\204G\331\376\314\314\314\303\311\223"..., 83) = 0x1089000
mprotect(0x1089000, 4096, 4, 82)                                                            = 0
+++ exited (status 0) +++

Besides the usual library calls made when an ELF binary is executed, two stand out here:

  • memalign: the obsolete function memalign() allocates size bytes and returns a pointer to the allocated memory, whose address is a multiple of alignment
  • memcpy: the memcpy() function copies n bytes from memory area src to memory area dest

Both are confirmed by the .rela.plt relocation section, which lists memcpy and memalign:

$ readelf -r ./new_binary

Relocation section '.rela.dyn' at offset 0x3e0 contains 1 entry:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000600ff8  000200000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0

Relocation section '.rela.plt' at offset 0x3f8 contains 4 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000601018  000100000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main@GLIBC_2.2.5 + 0
000000601020  000300000007 R_X86_64_JUMP_SLO 0000000000000000 memcpy@GLIBC_2.14 + 0
000000601028  000400000007 R_X86_64_JUMP_SLO 0000000000000000 memalign@GLIBC_2.2.5 + 0
000000601030  000500000007 R_X86_64_JUMP_SLO 0000000000000000 mprotect@GLIBC_2.2.5 + 0

No low-hanging fruit (i.e. a string or any other clue) in the .rodata section of the binary:

$ objdump -sj .rodata new_binary

new_binary:     file format elf64-x86-64

Contents of section .rodata:
 400730 01000200                             ....

Before moving on to dynamic analysis with gdb, we’ll try to quickly get the gist of the program through static analysis of the disassembly produced by objdump.

All the assembly snippets below come from the output of objdump -d -Mintel ./new_binary

Let’s follow the code flow, starting from the entry point of the program given by readelf:

$ readelf -h ./new_binary |grep Entry
  Entry point address:               0x4004e0

The entry point is 0x4004e0, which lies in the .text section and corresponds to the _start function:

Disassembly of section .text:

  4004e0 <_start>:
  4004e0: 31 ed                 xor    ebp,ebp
  4004e2: 49 89 d1              mov    r9,rdx
  4004e5: 5e                    pop    rsi
  4004e6: 48 89 e2              mov    rdx,rsp
  4004e9: 48 83 e4 f0           and    rsp,0xfffffffffffffff0
  4004ed: 50                    push   rax
  4004ee: 54                    push   rsp
  4004ef: 49 c7 c0 20 07 40 00  mov    r8,0x400720
  4004f6: 48 c7 c1 b0 06 40 00  mov    rcx,0x4006b0
  4004fd: 48 c7 c7 d6 05 40 00  mov    rdi,0x4005d6
  400504: e8 87 ff ff ff        call   400490 <__libc_start_main@plt>
  400509: f4                    hlt
  40050a: 66 0f 1f 44 00 00     nop    WORD PTR [rax+rax*1+0x0]

At 0x4004fd, the first parameter of __libc_start_main(), which goes in the rdi register (see the x86-64 calling convention), is loaded: it is a pointer to the main function of the program, here 0x4005d6.

Note: There is a very well documented page explaining all the low-level details of how an ELF binary gets loaded on Linux.

Below is the main function. Some interesting things can be noticed:

  • there is an interesting symbol named flag_bin_len - see # 6010b4 <flag_bin_len>

    • it is addressed relative to rip; objdump has already resolved the address to 0x6010b4, and we will read its value at runtime

  • from 0x4005e5 to 0x400639

    • allocate 4096 bytes (esi = 0x1000) aligned on a 4096-byte boundary (edi = 0x1000) with memalign()

    • make the allocated memory readable and writable with mprotect(), the 0x3 parameter stored in edx being PROT_READ | PROT_WRITE

    • then copy flag_bin_len bytes from 0x601060 into that memory chunk with memcpy(), and store a 0xc3 byte right after the copied data

  • from 0x400642 to 0x400673

    • there is an XOR-based loop starting at 0x400642, with the jb 400642 at 0x400673 branching back as long as edx is below eax (cmp edx,eax), i.e. while the counter is less than flag_bin_len. This suggests that something was obfuscated with an XOR cipher and is now being decrypted, one byte at a time (the result is stored with mov BYTE PTR [rax],dl).

  • from 0x400675 onward

    • another mprotect() call, this time with 0x4 (PROT_EXEC), to make the previously allocated memory executable

    • the call rdx suggests that what was decrypted is code that is about to be executed, but that needs to be verified.
  4005d6 <main>:
  4005d6: 55                    push   rbp
  4005d7: 48 89 e5              mov    rbp,rsp
  4005da: 48 83 ec 30           sub    rsp,0x30
  4005de: 89 7d dc              mov    DWORD PTR [rbp-0x24],edi
  4005e1: 48 89 75 d0           mov    QWORD PTR [rbp-0x30],rsi
  4005e5: be 00 10 00 00        mov    esi,0x1000
  4005ea: bf 00 10 00 00        mov    edi,0x1000
  4005ef: e8 bc fe ff ff        call   4004b0 <memalign@plt>
  4005f4: 48 89 45 f0           mov    QWORD PTR [rbp-0x10],rax
  4005f8: 48 8b 45 f0           mov    rax,QWORD PTR [rbp-0x10]
  4005fc: ba 03 00 00 00        mov    edx,0x3
  400601: be 00 10 00 00        mov    esi,0x1000
  400606: 48 89 c7              mov    rdi,rax
  400609: e8 b2 fe ff ff        call   4004c0 <mprotect@plt>
  40060e: 8b 05 a0 0a 20 00     mov    eax,DWORD PTR [rip+0x200aa0]        # 6010b4 <flag_bin_len>
  400614: 89 c2                 mov    edx,eax
  400616: 48 8b 45 f0           mov    rax,QWORD PTR [rbp-0x10]
  40061a: be 60 10 60 00        mov    esi,0x601060
  40061f: 48 89 c7              mov    rdi,rax
  400622: e8 79 fe ff ff        call   4004a0 <memcpy@plt>
  400627: 8b 05 87 0a 20 00     mov    eax,DWORD PTR [rip+0x200a87]        # 6010b4 <flag_bin_len>
  40062d: 89 c2                 mov    edx,eax
  40062f: 48 8b 45 f0           mov    rax,QWORD PTR [rbp-0x10]
  400633: 48 01 d0              add    rax,rdx
  400636: c6 00 c3              mov    BYTE PTR [rax],0xc3
  400639: c7 45 ec 00 00 00 00  mov    DWORD PTR [rbp-0x14],0x0
  400640: eb 26                 jmp    400668 <main+0x92>
  400642: 8b 45 ec              mov    eax,DWORD PTR [rbp-0x14]
  400645: 48 63 d0              movsxd rdx,eax
  400648: 48 8b 45 f0           mov    rax,QWORD PTR [rbp-0x10]
  40064c: 48 01 d0              add    rax,rdx
  40064f: 8b 55 ec              mov    edx,DWORD PTR [rbp-0x14]
  400652: 48 63 ca              movsxd rcx,edx
  400655: 48 8b 55 f0           mov    rdx,QWORD PTR [rbp-0x10]
  400659: 48 01 ca              add    rdx,rcx
  40065c: 0f b6 12              movzx  edx,BYTE PTR [rdx]
  40065f: 83 f2 cc              xor    edx,0xffffffcc
  400662: 88 10                 mov    BYTE PTR [rax],dl
  400664: 83 45 ec 01           add    DWORD PTR [rbp-0x14],0x1
  400668: 8b 55 ec              mov    edx,DWORD PTR [rbp-0x14]
  40066b: 8b 05 43 0a 20 00     mov    eax,DWORD PTR [rip+0x200a43]        # 6010b4 <flag_bin_len>
  400671: 39 c2                 cmp    edx,eax
  400673: 72 cd                 jb     400642 <main+0x6c>
  400675: 48 8b 45 f0           mov    rax,QWORD PTR [rbp-0x10]
  400679: ba 04 00 00 00        mov    edx,0x4
  40067e: be 00 10 00 00        mov    esi,0x1000
  400683: 48 89 c7              mov    rdi,rax
  400686: e8 35 fe ff ff        call   4004c0 <mprotect@plt>
  40068b: 48 8b 45 f0           mov    rax,QWORD PTR [rbp-0x10]
  40068f: 8b 15 1f 0a 20 00     mov    edx,DWORD PTR [rip+0x200a1f]        # 6010b4 <flag_bin_len>
  400695: 89 d2                 mov    edx,edx
  400697: 48 01 d0              add    rax,rdx
  40069a: 48 89 45 f8           mov    QWORD PTR [rbp-0x8],rax
  40069e: 48 8b 55 f8           mov    rdx,QWORD PTR [rbp-0x8]
  4006a2: b8 00 00 00 00        mov    eax,0x0
  4006a7: ff d2                 call   rdx
  4006a9: b8 00 00 00 00        mov    eax,0x0
  4006ae: c9                    leave
  4006af: c3                    ret
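Read as pseudocode, the listing appears to do the following. This is my own Python paraphrase of the disassembly, not the original source: the function and constant names are made up, and a bytearray stands in for the page returned by memalign().

```python
PAGE_SIZE = 0x1000   # size passed to memalign() and mprotect()
XOR_KEY = 0xCC       # low byte of the 0xffffffcc immediate at 0x40065f
RET_OPCODE = 0xC3    # byte stored by the instruction at 0x400636


def model_main(flag_bin: bytes):
    """Hypothetical model of main(): returns the buffer and the offset targeted by call rdx."""
    n = len(flag_bin)              # flag_bin_len
    buf = bytearray(PAGE_SIZE)     # memalign(0x1000, 0x1000), then mprotect(buf, 0x1000, 3)
    buf[:n] = flag_bin             # memcpy(buf, flag_bin, flag_bin_len)
    buf[n] = RET_OPCODE            # mov BYTE PTR [buf + flag_bin_len], 0xc3
    for i in range(n):             # loop from 0x400642 to 0x400673, one byte per iteration
        buf[i] ^= XOR_KEY
    call_offset = n                # call rdx, with rdx = buf + flag_bin_len
    return buf, call_offset


# First four bytes of the memcpy source, as shown by ltrace (octal \234\235\236\232).
buf, call_offset = model_main(bytes([0x9C, 0x9D, 0x9E, 0x9A]))
print(buf[:4].hex(), hex(buf[call_offset]))
```

In this model the call lands on the 0xc3 byte written just past the copied data rather than on the start of the buffer, which would be consistent with the program printing nothing. The gdb session is the place to confirm that reading.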

Time to switch to dynamic analysis with gdb to verify these assumptions!

Dynamic analysis with gdb

$ gdb -q --args ./new_binary
Reading symbols from ./new_binary...
(No debugging symbols found in ./new_binary)

(gdb) set disassembly-flavor intel

(gdb) b main
Breakpoint 1 at 0x4005da

(gdb) r
Starting program: /home/loic/shared/new_binary

Breakpoint 1, 0x00000000004005da in main ()
(gdb)

What has been done above:

  • Fire up gdb on the binary: gdb -q --args ./new_binary

  • Set a personal preference for the way the assembly is displayed: set disassembly-flavor intel

  • Set a breakpoint on main(): b main

  • Start the program: r

I won’t dump the contents of the main function again; see above for reference, or type x/50i main in gdb to print the first 50 instructions of main.

A quick word about what happens at runtime to the flag_bin_len symbol we noticed before: its memory address is 0x6010b4, and we can read its value, 0x53, which is 4 bytes long as indicated by DWORD PTR:

(gdb) x/3i 0x400609
   0x400609 <main+51>:  call   0x4004c0 <mprotect@plt>
   0x40060e <main+56>:  mov    eax,DWORD PTR [rip+0x200aa0]        # 0x6010b4 <flag_bin_len>
   0x400614 <main+62>:  mov    edx,eax

(gdb) x/4bx 0x6010b4
   0x6010b4 <flag_bin_len>: 0x53  0x00  0x00  0x00
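The four bytes are stored least significant byte first, as x86-64 is little-endian. A minimal sketch of the conversion, using the bytes from the dump above:

```python
raw = bytes([0x53, 0x00, 0x00, 0x00])   # the four bytes shown by x/4bx
flag_bin_len = int.from_bytes(raw, byteorder="little")
print(hex(flag_bin_len), flag_bin_len)  # 0x53 83
```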

How did gdb compute the memory address 0x6010b4?

The operand is [rip+0x200aa0], and rip points to the next instruction to be executed, at 0x400614, so that’s 0x200aa0 + 0x400614 = 0x6010b4
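The same arithmetic applies to every rip-relative load of flag_bin_len in main. Here is a small sketch, with addresses, instruction lengths and displacements read off the objdump listing; the helper name is mine.

```python
def rip_relative_target(insn_addr: int, insn_len: int, displacement: int) -> int:
    """Address referenced by [rip + displacement]; rip holds the address of the next instruction."""
    return insn_addr + insn_len + displacement


# (address, length in bytes, displacement) of the four loads of flag_bin_len in main.
loads = [
    (0x40060E, 6, 0x200AA0),
    (0x400627, 6, 0x200A87),
    (0x40066B, 6, 0x200A43),
    (0x40068F, 6, 0x200A1F),
]
for addr, length, disp in loads:
    print(hex(addr), "->", hex(rip_relative_target(addr, length, disp)))
```

Each displacement is different because each instruction sits at a different address, yet all four resolve to the same location.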

The memcpy call at 0x400622 copies something into the newly allocated chunk of memory, as you can see by following what happens to rax right after the memalign call.

From its manpage and the x86-64 calling convention, we know its parameters and the registers holding them:

  • rdi: void *dest
  • rsi: void *src
  • rdx: size_t n

Put a breakpoint just before the call to examine these registers:

(gdb) b *0x400622
Breakpoint 4 at 0x400622
(gdb) c
Continuing.

Breakpoint 4, 0x0000000000400622 in main ()

(gdb) p/x $rdi
$10 = 0x603000

(gdb) p/x $rsi
$11 = 0x601060

(gdb) p/x $rdx
$12 = 0x53

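memcpy will copy 0x53 bytes (rdx) from 0x601060 (rsi) to 0x603000 (rdi). We can examine the source data; note that x/53x prints 53 (decimal) units, so the dump below shows only the first 53 of the 0x53 (83) bytes, and x/83bx would show them all: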

(gdb) x/53x 0x601060
0x601060 <flag_bin>:  0x9c  0x9d  0x9e  0x9a  0x9b  0x74  0xcd  0xcc
0x601068 <flag_bin+8>:  0xcc  0xcc  0x73  0xcd  0xcc  0xcc  0xcc  0x84
0x601070 <flag_bin+16>: 0x41  0xf9  0xd4  0xcc  0xcc  0xcc  0x84  0x47
0x601078 <flag_bin+24>: 0xd9  0xfe  0xcc  0xcc  0xcc  0xc3  0xc9  0x93
0x601080 <flag_bin+32>: 0x92  0x96  0x95  0x94  0x74  0xf0  0xcc  0xcc
0x601088 <flag_bin+40>: 0xcc  0x84  0xfd  0x33  0xc3  0xc9  0xfe  0xfe
0x601090 <flag_bin+48>: 0xff  0xf9  0xad  0xfa  0xae

At this address there is also a symbol aptly named flag_bin :) Looks like we are getting closer to the goal.

Now the rough xor loop outlined before starts to make sense:

  400642: 8b 45 ec              mov    eax,DWORD PTR [rbp-0x14]
  400645: 48 63 d0              movsxd rdx,eax
  400648: 48 8b 45 f0           mov    rax,QWORD PTR [rbp-0x10]
  40064c: 48 01 d0              add    rax,rdx
  40064f: 8b 55 ec              mov    edx,DWORD PTR [rbp-0x14]
  400652: 48 63 ca              movsxd rcx,edx
  400655: 48 8b 55 f0           mov    rdx,QWORD PTR [rbp-0x10]
  400659: 48 01 ca              add    rdx,rcx
  40065c: 0f b6 12              movzx  edx,BYTE PTR [rdx]
  40065f: 83 f2 cc              xor    edx,0xffffffcc
  400662: 88 10                 mov    BYTE PTR [rax],dl
  400664: 83 45 ec 01           add    DWORD PTR [rbp-0x14],0x1
  400668: 8b 55 ec              mov    edx,DWORD PTR [rbp-0x14]
  40066b: 8b 05 43 0a 20 00     mov    eax,DWORD PTR [rip+0x200a43]        # 6010b4 <flag_bin_len>
  400671: 39 c2                 cmp    edx,eax
  400673: 72 cd                 jb     400642 <main+0x6c>

The loop ends once all 0x53 bytes (flag_bin_len, tested by the mov/cmp pair at 0x40066b and 0x400671) have been XORed with 0xcc (the xor edx,0xffffffcc at 0x40065f, of which only the low byte dl is written back). The jb 400642 <main+0x6c> jump is then no longer taken and execution continues below.

Here is a quick look at what happens to the first 4 bytes of data over a few iterations of this XOR loop, by:

  • setting a breakpoint right before the exit condition test of the loop
  • dumping the memory where the result of that operation is stored (remember we got the allocated memory address earlier)
(gdb) x/2i 0x400671
   0x400671 <main+155>: cmp    edx,eax         ; breakpoint here to examine what happened
   0x400673 <main+157>: jb     0x400642 <main+108>

(gdb) b *0x400671
Breakpoint 4 at 0x400671
(gdb) c
Continuing.


Breakpoint 6, 0x0000000000400671 in main ()
(gdb) x/4bx 0x603000
0x603000: 0x9c  0x9d  0x9e  0x9a
(gdb) c
Continuing.

Breakpoint 6, 0x0000000000400671 in main ()
(gdb) x/4bx 0x603000
0x603000: 0x50  0x9d  0x9e  0x9a
(gdb) c
Continuing.

Breakpoint 6, 0x0000000000400671 in main ()
(gdb) x/4bx 0x603000
0x603000: 0x50  0x51  0x9e  0x9a
(gdb) c
Continuing.

Breakpoint 6, 0x0000000000400671 in main ()
(gdb) x/4bx 0x603000
0x603000: 0x50  0x51  0x52  0x9a
(gdb) c
Continuing.

Breakpoint 6, 0x0000000000400671 in main ()
(gdb) x/4bx 0x603000
0x603000: 0x50  0x51  0x52  0x56

You can see the initial values 0x9c 0x9d 0x9e 0x9a being xor’ed to 0x50 0x51 0x52 0x56 iteration after iteration, same applies to all the other 0x53 bytes.

Since these values seem to fall within the printable ASCII range, let’s print them as characters:

(gdb) x/4bc 0x603000
0x603000: 80 'P' 81 'Q' 82 'R' 86 'V'

Definitely printable characters!

However, dumping memory this way is not very practical, so I will first create a handy gdb macro that dumps a memory area the way xxd does:

(gdb) define xxd
Redefine command "xxd"? (y or n) y
Type commands for definition of "xxd".
End with a line saying just "end".
>dump binary memory dump.bin $arg0 $arg0+$arg1
>shell xxd dump.bin
>end
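
For readers without xxd at hand, the layout the macro prints can be approximated in a few lines of Python. This is only a sketch of the default xxd format (16 bytes per row, hex digits grouped two bytes at a time); the function name xxd_lines is hypothetical, and it would be fed the contents of dump.bin written by the macro above.

```python
def xxd_lines(data: bytes) -> list:
    """Format data roughly like xxd's default output, 16 bytes per row."""
    lines = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        # Group the hex digits two bytes at a time.
        groups = [chunk[i:i + 2].hex() for i in range(0, len(chunk), 2)]
        hex_col = " ".join(groups)
        # Printable ASCII is shown as-is, everything else as a dot.
        ascii_col = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        # 39 columns = 8 groups of 4 hex digits plus 7 separating spaces.
        lines.append(f"{offset:08x}: {hex_col:<39}  {ascii_col}")
    return lines


sample = bytes.fromhex("5051525657b801000000bf0100000048")
print("\n".join(xxd_lines(sample)))
```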

Set a breakpoint right after the jb conditional jump, where the xor loop should be complete, and dump the relevant memory range:

(gdb) x/3i 0x400671
   0x400671 <main+155>: cmp    edx,eax
   0x400673 <main+157>: jb     0x400642 <main+108>
=> 0x400675 <main+159>: mov    rax,QWORD PTR [rbp-0x10]


(gdb) b *0x400675
Breakpoint 7 at 0x400675
(gdb) c
Continuing.

Breakpoint 7, 0x0000000000400675 in main ()
(gdb) 

(gdb) xxd 0x603000 0x53
00000000: 5051 5256 57b8 0100 0000 bf01 0000 0048  PQRVW..........H
00000010: 8d35 1800 0000 488b 1532 0000 000f 055f  .5....H..2....._
00000020: 5e5a 5958 b83c 0000 0048 31ff 0f05 3232  ^ZYX.<...H1...22
00000030: 3335 6136 6232 3132 3334 3034 3436 3966  35a6b2123404469f
00000040: 3461 6263 6537 3162 3164 6432 3966 0020  4abce71b1dd29f.
00000050: 0000 00

It looks like we got something resembling a flag in the ASCII column of rows 00000030 and 00000040 (starting with the 22 that ends row 00000020)! Trying it out:

$ ./oracle 2235a6b2123404469f4abce71b1dd29f
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
| Level 8 completed, unlocked reward.tar.gz |
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
Run oracle with -h to show a hint

That’s it, level completed! I won’t spoil the reward though :)
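
Rather than copying the flag by eye from the ASCII column, it can be pulled out of the dumped bytes programmatically. The sketch below re-enters the hex column of the xxd output above and searches it, under the assumption that a flag is a run of 32 lowercase hexadecimal characters (which is what the oracle accepted here).

```python
import re

# Hex column of the xxd dump shown above (0x53 bytes).
DUMP_HEX = """
5051 5256 57b8 0100 0000 bf01 0000 0048
8d35 1800 0000 488b 1532 0000 000f 055f
5e5a 5958 b83c 0000 0048 31ff 0f05 3232
3335 6136 6232 3132 3334 3034 3436 3966
3461 6263 6537 3162 3164 6432 3966 0020
0000 00
"""

blob = bytes.fromhex(DUMP_HEX)  # fromhex skips ASCII whitespace

# Assumption: the flag is 32 consecutive lowercase hex characters.
match = re.search(rb"[0-9a-f]{32}", blob)
flag = match.group().decode("ascii")
print(hex(match.start()), flag)
```

The match starts at offset 0x2e and is followed by a NUL byte at 0x4e.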

Even though we have found the flag, I still wonder what the instructions after the jb are doing:

   0x40066b <main+149>: mov    eax,DWORD PTR [rip+0x200a43]        # 0x6010b4 <flag_bin_len>
   0x400671 <main+155>: cmp    edx,eax
   0x400673 <main+157>: jb     0x400642 <main+108>
   0x400675 <main+159>: mov    rax,QWORD PTR [rbp-0x10]
   0x400679 <main+163>: mov    edx,0x4
   0x40067e <main+168>: mov    esi,0x1000
   0x400683 <main+173>: mov    rdi,rax
   0x400686 <main+176>: call   0x4004c0 <mprotect@plt>
   0x40068b <main+181>: mov    rax,QWORD PTR [rbp-0x10]
   0x40068f <main+185>: mov    edx,DWORD PTR [rip+0x200a1f]        # 0x6010b4 <flag_bin_len>
   0x400695 <main+191>: mov    edx,edx
   0x400697 <main+193>: add    rax,rdx
   0x40069a <main+196>: mov    QWORD PTR [rbp-0x8],rax
   0x40069e <main+200>: mov    rdx,QWORD PTR [rbp-0x8]
   0x4006a2 <main+204>: mov    eax,0x0
   0x4006a7 <main+209>: call   rdx
   0x4006a9 <main+211>: mov    eax,0x0
   0x4006ae <main+216>: leave
   0x4006af <main+217>: ret

It looks like code is about to be executed, because:

  • at 0x400679 and 0x400686
    • mprotect is called with PROT_EXEC (0x4) to make executable the memory range that holds the flag and the bytes before it
  • at 0x4006a7
    • the code pointed to by the rdx register is called
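
Both points can be checked with a little arithmetic. The sketch below assumes the Linux x86-64 PROT_* values from <sys/mman.h> (written out by hand so the example is self-contained), the buffer address 0x603000 used earlier in the gdb session, and flag_bin_len equal to 0x53; the helper name prot_names is hypothetical.

```python
# Linux x86-64 protection bits, as defined in <sys/mman.h>.
PROT_FLAGS = {0x1: "PROT_READ", 0x2: "PROT_WRITE", 0x4: "PROT_EXEC"}


def prot_names(prot: int) -> list:
    # Decode a protection bitmask into the names of the bits that are set.
    return [name for bit, name in PROT_FLAGS.items() if prot & bit]


BUFFER = 0x603000     # address of the decoded bytes in the gdb session
FLAG_BIN_LEN = 0x53   # value of flag_bin_len according to the loop analysis

print(prot_names(0x4))             # ['PROT_EXEC']
call_target = BUFFER + FLAG_BIN_LEN
print(hex(call_target))            # 0x603053
```

If these assumptions hold, the pointer stored at [rbp-0x8] and later loaded into rdx is the buffer address plus flag_bin_len, which is the first byte after the decoded data.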

To investigate, in the same gdb session I’ll put a breakpoint on this instruction, inspect the value of the rdx register, and disassemble 10 instructions from that address.

(gdb) b *0x4006a7
Breakpoint 2 at 0x4006a7
(gdb) c
Continuing.

Breakpoint 2, 0x00000000004006a7 in main ()

(gdb) i r $rdx
rdx            0x603053            6303827

(gdb) x/10i 0x603053
   0x603053:  ret
   0x603054:  add    BYTE PTR [rax],al
   0x603056:  add    BYTE PTR [rax],al
   0x603058:  add    BYTE PTR [rax],al
   0x60305a:  add    BYTE PTR [rax],al
   0x60305c:  add    BYTE PTR [rax],al
   0x60305e:  add    BYTE PTR [rax],al
   0x603060:  add    BYTE PTR [rax],al
   0x603062:  add    BYTE PTR [rax],al
   0x603064:  add    BYTE PTR [rax],al

The rdx register holds 0x603053 (that is, 0x603000 + 0x53, the first byte past the decoded buffer), and the disassembly at this address shows ret as the first instruction. So the call effectively does nothing: it returns straight to main, which then sets eax to 0 as the exit code.
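
One detail that may be relevant, offered as an observation rather than an explanation: the bytes at the start of the decoded buffer look like x86-64 machine code themselves. In the xxd dump above, 48 8d 35 18 00 00 00 at offset 0x0f is the encoding of lea rsi,[rip+0x18], and 48 8b 15 32 00 00 00 at offset 0x16 is the encoding of mov rdx,[rip+0x32]. The Python sketch below (the helper name rip_relative_target is hypothetical) resolves those two rip-relative displacements against offsets within the dump.

```python
import struct

# The two 7-byte instructions as they appear in the dump.
LEA_RSI = bytes.fromhex("488d3518000000")  # at offset 0x0f
MOV_RDX = bytes.fromhex("488b1532000000")  # at offset 0x16


def rip_relative_target(insn_offset: int, insn: bytes) -> int:
    # REX.W prefix, opcode and ModRM take 3 bytes; the last 4 bytes are a
    # signed little-endian displacement relative to the next instruction.
    (disp,) = struct.unpack("<i", insn[3:7])
    return insn_offset + len(insn) + disp


print(hex(rip_relative_target(0x0F, LEA_RSI)))  # 0x2e
print(hex(rip_relative_target(0x16, MOV_RDX)))  # 0x4f
```

Offset 0x2e is where the flag string starts in the dump, and offset 0x4f holds the bytes 20 00 00 00, i.e. 0x20 = 32, the length of the flag. Together with the mov eax,0x1 (b8 01 00 00 00) and syscall (0f 05) encodings around them, this would be consistent with a write of the flag to file descriptor 1. It does not explain why main calls the address just past the buffer instead of its start.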

As this is still a mystery, I’ll reach out to the book author and post updates here if any.

Feedback

Constructive criticism is always welcome (comments here, contact in About), as I would be more than happy to learn more!