As mentioned in the README These are the literate notes I took while following along Zachtronics’ blog post about reverse engineering a binary files.
The post details a game known as “Yoda Stories”, and includes a binary file download to follow along what he does to reverse engineer it.
In this file, I basically follow along with the post, and implement the ideas in Common Lisp, while also paraphrasing both what he describes, as well as the things I guessed and tried to go ahead what I read. (Spoiler alert: That pretty much never worked)
So, turns out the first thing you want to do when reverse engineering a binary data file is to figure out how to “skip” data…
When you first open a file, you can bash your head against every single thing you don’t understand until you get somewhere, but it’s apparently much more fruitful to try and get an overall idea about how to navigate the file itself.
Why can you assume there’s even a way to skip data? Because it sounds like usually people don’t want to scan through a file for delimiters to navigate to portions of a file.
We can use this to our advantage!
So the first thing we likely need to write code for is to handle the data types we’ll be encountering:
;; Data typesSince we’re trying to figure out the overall structure, some basic data structure handling is good enough for our purposes here.
In the case of Yoda Stories, it looks like there’s a large number of sections in the binary file that are delimited with four-letter uppercase strings of text known as “Tags”.
Since we need to read strings, I wrote a function to extract those from a stream in Lisp:
(defun read-string (stream length)
"Reads a string from the passed in stream of the specified length"
(let ((str (loop for i upto (1- length)
collect (read-byte stream))))
(format nil "~{~A~}" (mapcar #'code-char str))))And since headers are 4 characters long:
(defun read-header (stream)
"Reads a section header, a 4-character long string denoting a
section of this binary format."
(read-string stream 4))Now that we can read them in Lisp, where are these tags supposed to be?
You can hunt out individual tags in a hex editor, but chances are that there are ways to skip around the file so you don’t need to specifically scan for things that resemble four subsequent uppercase letters–since that’s unreliable and, like mentioned earlier, inefficient.
So, how do we do this? How to we figure this out. Maybe there are some hints in the first section?
The VERS header is followed by 0002 0000, which we can infer is a version number of some style, with likely each digit opposite sides of the decimal point:
5645 5253 0002 0000 VERS....
So in this case, this is likely version 2.0 of this file format. This is small enough that there weren’t any particular hints to skip things generally, but it can also be assumed to be guaranteed to be this size.
Anyway, since this means we’ll want to read numbers, I wrote this to extract numbers from the stream:
(defun read-number (stream &key (length 4) (endian :little-endian))
"Reads a number from the passed in stream, defaults to 4 bytes and
:little-endian, but those can be changed to :big-endian, or any
number of bytes"
(let* ((nums (loop for i upto (1- length) collect (read-byte stream)))
(nums (if (eq endian :big-endian) (reverse nums) nums)))
(loop for num in nums
and i from 0
summing (ash num (* 8 i)))))This is actually the final version of the function, because I found that most numbers I was interested in were little-endian, and also 4 bytes long…but this first instance involved two big-endian numbers. (As in, the 00 02 is 2; with the numbers not recorded backwards in the binary)
So, with this function:
(defun read-version (stream)
"Reads a version number from the passed in stream"
(+ (read-number stream :length 2 :endian :big-endian)
(/ (read-number stream :length 2 :endian :big-endian) 1000.0)))And that’s it for the version!
Onwards to the next section, and possibly figuring out how to skip it!
This tag sounds a bit like “setup”, but I have no idea what that means in this context, so how does the post say we deal with this?
The answer is to skip it! (Ooh! We’re finally getting to the skipping business)
It suggests to measure the size of the data between here and the next tag, which is something Emacs’ hexl can’t do, so I wrote this function to help with that:
(defun hexl-measure-region ()
"Measure how large the active region is."
(interactive)
(if (region-active-p)
(save-excursion
(let ((point (hexl-current-address)))
(exchange-point-and-mark)
(let ((diff (abs (- point (hexl-current-address)))))
(exchange-point-and-mark)
(message "Range is %d bytes (0x%08x)" diff diff))))
(message "Current address: 0x%08x" (hexl-current-address))))It’s not especially stellar code, but it was good enough to help with my following along with Zachtronic’s stuff. So, feel free to steal it if you want to use hexl to measure parts of binaries!
Anywho, measuring the size of the data revealed a number suspiciously similar to the 4 bytes immediately following the STUP tag.
Apparently this is a common-enough pattern in binary files.
Doing it again with the next tag revealed that the next section could also be skipped like this…there’s a pattern here!
Since our “skipping” is actually going to just collect the data we don’t understand yet into a structure, the next thing we need to do is write a function that will read data and return it, specifically also to progress the file pointer:
(defun read-data (stream length)
"Reads a number of bytes into a list, or until :eof"
(loop for i upto (1- length)
for byte = (read-byte stream nil :eof)
collect byte))With this function, we can finally try and parse a lot of the file with this default style!
That means, we’re gonna start dissecting the format itself!
;; Format dissectionSince we’re working with a default that might branch for a particular type of tag, I felt like pulling out CLOS with the sole purpose of making it dispatch to methods for different headers:
(defgeneric parse-section (type stream)
(:documentation "Reads a section and returns a List describing it"))This means we need to write a dispatcher that converts each tag to a Lisp keyword so each method is easy to specialize with eql definitions:
(defun parse-stream-section (stream)
"Reads a stream header from the passed-in stream, and dispatches it
to the correct reader method."
(parse-section (intern (read-header stream) :keyword) stream))And using this, we can repeatedly call this function on the file stream to progress it:
(defun parse-yodesk (&optional (file "YODESK.DTA"))
"Parses the YODESK.DTA data file"
(with-open-file (stream file :element-type '(unsigned-byte 8))
(loop for parsed = (parse-stream-section stream)
collect parsed
until (eq :endf (first parsed)))))Of course, for this to actually work, we need to write our default parsing method, which does this pattern we saw earlier–where the first 4 bytes are an offset to skip to the next section:
;; Default fallback
(defmethod parse-section (type stream)
"Handling the general section style"
(let* ((length (read-number stream))
(data (read-data stream length)))
(list type :length length :data data)))The idea is to read the length, read that amount of data, shove it into a plist, and return it.
Since we already know that VERS works differently than this default, we can write a specialization for that too:
;; Version
(defmethod parse-section ((type (eql :vers)) stream)
"Handling the version"
(list type :length 4 :data (read-version stream)))Running the parse=yodesk function works nicely, until we hit the ZONE section. This means this will probably be a different CLOS specialization, assuming we can figure out how to skip it… Which turns out to be a little more involved.
Trying to skip through the ZONE section like we usually do jumps us into the middle of nowhere… So what can we see at the starting ZONE?
;; ZonesLooking at the section shows a bunch of IZON tags, and measuring the distance between the ZONE and the first IZON tag doesn’t look like any particularly special number that can be found in the binary data around it…
So what’s our approach here? How do we skip over ZONE?
Turns out, we just ignore it for now! Maybe if we look at the IZON data, more relevant information will present itself. After all IZON sounds like “Internal Zone”, or some other possibly related data–so let’s just do that!
So, we just hardcode a skip for the number of bytes between the ZONE and first IZON, then retry our measuring strategy for the space between the first two IZON sections, which reveals a number that’s very close to what’s recorded a few bytes before the IZON tag itself! Looks like the unknown bytes are already starting to possibly become known!
So, in reverse engineering, close actually means something! And we can use that to just assume that this number is indeed the length data we’re looking for with some offset of its own.
Testing this assumption, turns out if we measure the next IZON, the length number is also off by the same number! Bingo. This looks promising indeed!
So, using this, we can write a function to parse these IZON sections based on what we know (And don’t).
We also factor in the number that the size is off by, which appears to possibly be simply the number of bytes of the other unknown data included for each record:
(defun parse-izone (stream)
"Parses a single IZON section"
(let* ((no-idea (read-number stream :length 2))
(length (read-number stream :length 2))
(no-idea2 (read-number stream))
(header (read-header stream))
(data (read-data stream (- length 6))))
(list (intern header :keyword)
:data (list :no-idea (cons no-idea no-idea2) :data data))))So, something that becomes apparent when running this function on the stream repeatedly is that the number of IZON sections in the file is the same as the number right after the ZONE marker! More unknown bytes down, and we can also tell IZON markers are indeed nested as suspected!
In fact, this is enough to write a CLOS specialization for ZONE that uses our IZON code and skips over them all!
(defmethod parse-section ((type (eql :zone)) stream)
"Handles the \"ZONE\" section of the binary file, which includes a
bunch of nested IZONs"
(let* ((zone-count (read-number stream :length 2))
(izones (loop for i upto (1- zone-count)
collect (parse-izone stream))))
(list type :data izones)))After writing this, if we run our previous parse-yodesk function, it turns out there’s no more complaints! We can finally parse the entire file!
(mapcar #'car (parse-yodesk))The next part of the post mentioned that that one of the major goals was to extract images, and that TILE sounds suspiciously like images.
So, looking at the TILE section, we see lots of bits that seem to make pretty shapes in the hex editor… Which it details that these pieces of data are likely bitmaps!
Time to bust out ZPNG and write some image files!
;; Images
(ql:quickload :zpng)Hunting around, there seems to be no sizes, but it looks like there’s 1024 bytes, followed by 4 bytes of intermission.
I couldn’t make heads or tails of this, but according to the post, all the tiles are 32x32, which seems like the right number of bytes.
That means we can write a thing that reads 1024 bytes of data, and then spits out a PNG:
(defun write-png (tile-data name &key (width 32) (height 32))
"Writes an individual PNG file given tile data and a filename"
(with-open-file (out name :direction :output
:if-exists :supersede
:element-type '(unsigned-byte 8))
(let ((png (make-instance 'zpng:pixel-streamed-png
:width width :height height)))
(zpng:start-png png out)
(loop for pixel in tile-data
do (zpng:write-pixel (list pixel pixel pixel) png))
(zpng:finish-png png))))And then something that parses each chunk, and pushes it through the above function:
(defun write-pngs (tile-data &key (image-size-bytes 1024) (flags 4))
"Parses a tile-data into smaller images and writes them according to
how big the data is in image-size-bytes and flags"
(let ((rec-size (+ image-size-bytes flags)))
(loop for image upto (1- (/ (length tile-data) rec-size))
for start = (+ flags (* rec-size image))
for data = (subseq tile-data start (+ start image-size-bytes))
for filename = (format nil "img/img~A.png" image)
do (write-png data filename))))And the result is…good!
The resulting image files are not at all the right colors (obviously), but the shapes in them are obviously art, especially later on when I see all kinds of greyscale droids and stuff.
But how the heck do we get colors?
According to the post, this likely means that there is a palette somewhere! So I opened the file in GIMP, and used a color picker to find the hex values for a few interesting sprites and looked for them in the binary file…
One caveat I had to be careful with was that the sprites on the blog post seemed to be dithered, so I couldn’t guarantee that the colors were accurate, so I didn’t use that side to look for colors–just the hex values in the files I spat out.
I tried a few different styles of color formats and eventually gave up and kept reading.
Turns out, I had the right idea, but the palette was in the .EXE file, which wasn’t included–for good reasons I imagine. Oops.
I basically just copied the palette from the post, and macro’d it into a Lisp vector:
(defconstant PALLETE
#(#x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00
#x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00
#x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00
#x00 #xFF #xFF #x8B #x00 #xC3 #xCF #x4B #x00 #x8B #xA3 #x1B #x00
#x57 #x77 #x00 #x00 #x8B #xA3 #x1B #x00 #xC3 #xCF #x4B #x00 #xFB
#xFB #xFB #x00 #xEB #xE7 #xE7 #x00 #xDB #xD3 #xD3 #x00 #xCB #xC3
#xC3 #x00 #xBB #xB3 #xB3 #x00 #xAB #xA3 #xA3 #x00 #x9B #x8F #x8F
#x00 #x8B #x7F #x7F #x00 #x7B #x6F #x6F #x00 #x67 #x5B #x5B #x00
#x57 #x4B #x4B #x00 #x47 #x3B #x3B #x00 #x33 #x2B #x2B #x00 #x23
#x1B #x1B #x00 #x13 #x0F #x0F #x00 #x00 #x00 #x00 #x00 #x00 #xC7
#x43 #x00 #x00 #xB7 #x43 #x00 #x00 #xAB #x3F #x00 #x00 #x9F #x3F
#x00 #x00 #x93 #x3F #x00 #x00 #x87 #x3B #x00 #x00 #x7B #x37 #x00
#x00 #x6F #x33 #x00 #x00 #x63 #x33 #x00 #x00 #x53 #x2B #x00 #x00
#x47 #x27 #x00 #x00 #x3B #x23 #x00 #x00 #x2F #x1B #x00 #x00 #x23
#x13 #x00 #x00 #x17 #x0F #x00 #x00 #x0B #x07 #x00 #x4B #x7B #xBB
#x00 #x43 #x73 #xB3 #x00 #x43 #x6B #xAB #x00 #x3B #x63 #xA3 #x00
#x3B #x63 #x9B #x00 #x33 #x5B #x93 #x00 #x33 #x5B #x8B #x00 #x2B
#x53 #x83 #x00 #x2B #x4B #x73 #x00 #x23 #x4B #x6B #x00 #x23 #x43
#x5F #x00 #x1B #x3B #x53 #x00 #x1B #x37 #x47 #x00 #x1B #x33 #x43
#x00 #x13 #x2B #x3B #x00 #x0B #x23 #x2B #x00 #xD7 #xFF #xFF #x00
#xBB #xEF #xEF #x00 #xA3 #xDF #xDF #x00 #x8B #xCF #xCF #x00 #x77
#xC3 #xC3 #x00 #x63 #xB3 #xB3 #x00 #x53 #xA3 #xA3 #x00 #x43 #x93
#x93 #x00 #x33 #x87 #x87 #x00 #x27 #x77 #x77 #x00 #x1B #x67 #x67
#x00 #x13 #x5B #x5B #x00 #x0B #x4B #x4B #x00 #x07 #x3B #x3B #x00
#x00 #x2B #x2B #x00 #x00 #x1F #x1F #x00 #xDB #xEB #xFB #x00 #xD3
#xE3 #xFB #x00 #xC3 #xDB #xFB #x00 #xBB #xD3 #xFB #x00 #xB3 #xCB
#xFB #x00 #xA3 #xC3 #xFB #x00 #x9B #xBB #xFB #x00 #x8F #xB7 #xFB
#x00 #x83 #xB3 #xF7 #x00 #x73 #xA7 #xFB #x00 #x63 #x9B #xFB #x00
#x5B #x93 #xF3 #x00 #x5B #x8B #xEB #x00 #x53 #x8B #xDB #x00 #x53
#x83 #xD3 #x00 #x4B #x7B #xCB #x00 #x9B #xC7 #xFF #x00 #x8F #xB7
#xF7 #x00 #x87 #xB3 #xEF #x00 #x7F #xA7 #xF3 #x00 #x73 #x9F #xEF
#x00 #x53 #x83 #xCF #x00 #x3B #x6B #xB3 #x00 #x2F #x5B #xA3 #x00
#x23 #x4F #x93 #x00 #x1B #x43 #x83 #x00 #x13 #x3B #x77 #x00 #x0B
#x2F #x67 #x00 #x07 #x27 #x57 #x00 #x00 #x1B #x47 #x00 #x00 #x13
#x37 #x00 #x00 #x0F #x2B #x00 #xFB #xFB #xE7 #x00 #xF3 #xF3 #xD3
#x00 #xEB #xE7 #xC7 #x00 #xE3 #xDF #xB7 #x00 #xDB #xD7 #xA7 #x00
#xD3 #xCF #x97 #x00 #xCB #xC7 #x8B #x00 #xC3 #xBB #x7F #x00 #xBB
#xB3 #x73 #x00 #xAF #xA7 #x63 #x00 #x9B #x93 #x47 #x00 #x87 #x7B
#x33 #x00 #x6F #x67 #x1F #x00 #x5B #x53 #x0F #x00 #x47 #x43 #x00
#x00 #x37 #x33 #x00 #x00 #xFF #xF7 #xF7 #x00 #xEF #xDF #xDF #x00
#xDF #xC7 #xC7 #x00 #xCF #xB3 #xB3 #x00 #xBF #x9F #x9F #x00 #xB3
#x8B #x8B #x00 #xA3 #x7B #x7B #x00 #x93 #x6B #x6B #x00 #x83 #x57
#x57 #x00 #x73 #x4B #x4B #x00 #x67 #x3B #x3B #x00 #x57 #x2F #x2F
#x00 #x47 #x27 #x27 #x00 #x37 #x1B #x1B #x00 #x27 #x13 #x13 #x00
#x1B #x0B #x0B #x00 #xF7 #xB3 #x37 #x00 #xE7 #x93 #x07 #x00 #xFB
#x53 #x0B #x00 #xFB #x00 #x00 #x00 #xCB #x00 #x00 #x00 #x9F #x00
#x00 #x00 #x6F #x00 #x00 #x00 #x43 #x00 #x00 #x00 #xBF #xBB #xFB
#x00 #x8F #x8B #xFB #x00 #x5F #x5B #xFB #x00 #x93 #xBB #xFF #x00
#x5F #x97 #xF7 #x00 #x3B #x7B #xEF #x00 #x23 #x63 #xC3 #x00 #x13
#x53 #xB3 #x00 #x00 #x00 #xFF #x00 #x00 #x00 #xEF #x00 #x00 #x00
#xE3 #x00 #x00 #x00 #xD3 #x00 #x00 #x00 #xC3 #x00 #x00 #x00 #xB7
#x00 #x00 #x00 #xA7 #x00 #x00 #x00 #x9B #x00 #x00 #x00 #x8B #x00
#x00 #x00 #x7F #x00 #x00 #x00 #x6F #x00 #x00 #x00 #x63 #x00 #x00
#x00 #x53 #x00 #x00 #x00 #x47 #x00 #x00 #x00 #x37 #x00 #x00 #x00
#x2B #x00 #x00 #xFF #xFF #x00 #x00 #xE3 #xF7 #x00 #x00 #xCF #xF3
#x00 #x00 #xB7 #xEF #x00 #x00 #xA3 #xEB #x00 #x00 #x8B #xE7 #x00
#x00 #x77 #xDF #x00 #x00 #x63 #xDB #x00 #x00 #x4F #xD7 #x00 #x00
#x3F #xD3 #x00 #x00 #x2F #xCF #x00 #x97 #xFF #xFF #x00 #x83 #xDF
#xEF #x00 #x73 #xC3 #xDF #x00 #x5F #xA7 #xCF #x00 #x53 #x8B #xC3
#x00 #x2B #x2B #x00 #x00 #x23 #x23 #x00 #x00 #x1B #x1B #x00 #x00
#x13 #x13 #x00 #x00 #xFF #x0B #x00 #x00 #xFF #x00 #x4B #x00 #xFF
#x00 #xA3 #x00 #xFF #x00 #xFF #x00 #x00 #xFF #x00 #x00 #x00 #x4B
#x00 #x00 #xFF #xFF #x00 #x00 #xFF #x33 #x2F #x00 #x00 #x00 #xFF
#x00 #x00 #x1F #x97 #x00 #xDF #x00 #xFF #x00 #x73 #x00 #x77 #x00
#x6B #x7B #xC3 #x00 #x57 #x57 #xAB #x00 #x57 #x47 #x93 #x00 #x53
#x37 #x7F #x00 #x4F #x27 #x67 #x00 #x47 #x1B #x4F #x00 #x3B #x13
#x3B #x00 #x27 #x77 #x77 #x00 #x23 #x73 #x73 #x00 #x1F #x6F #x6F
#x00 #x1B #x6B #x6B #x00 #x1B #x67 #x67 #x00 #x1B #x6B #x6B #x00
#x1F #x6F #x6F #x00 #x23 #x73 #x73 #x00 #x27 #x77 #x77 #x00 #xFF
#xFF #xEF #x00 #xF7 #xF7 #xDB #x00 #xF3 #xEF #xCB #x00 #xEF #xEB
#xBB #x00 #xF3 #xEF #xCB #x00 #xE7 #x93 #x07 #x00 #xE7 #x97 #x0F
#x00 #xEB #x9F #x17 #x00 #xEF #xA3 #x23 #x00 #xF3 #xAB #x2B #x00
#xF7 #xB3 #x37 #x00 #xEF #xA7 #x27 #x00 #xEB #x9F #x1B #x00 #xE7
#x97 #x0F #x00 #x0B #xCB #xFB #x00 #x0B #xA3 #xFB #x00 #x0B #x73
#xFB #x00 #x0B #x4B #xFB #x00 #x0B #x23 #xFB #x00 #x0B #x73 #xFB
#x00 #x00 #x13 #x93 #x00 #x00 #x0B #xD3 #x00 #x00 #x00 #x00 #x00
#x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00
#x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00 #x00
#x00 #x00 #x00 #x00 #x00 #x00 #xFF #xFF #xFF #x00)
"The pallete data for bitmaps (Extraxtcted from the .exe, and not
the data file in any way...)")I shoved this into another file because it looked ugly, and just included it:
(load "palette.lisp")And using that vector, I just basically reimplemented Zachtronic’s C# function that grabs 3 indexes from the vector, and returns them in a list for zpng:
(defun get-color (index)
"Fetches a color list from the pallete"
(let ((r (elt PALLETE (+ 2 (* index 4))))
(g (elt PALLETE (+ 1 (* index 4))))
(b (elt PALLETE (* index 4))))
(list r g b)))Not too much thinking here, except noticing that the colors in this pallete seemed to invert the color values–BGR, instead of RGB.
I then I shoved that into my write-png function:
(defun write-png (tile-data name &key (width 32) (height 32))
"Writes an individual PNG file given tile data and a filename"
(with-open-file (out name :direction :output
:if-exists :supersede
:element-type '(unsigned-byte 8))
(let ((png (make-instance 'zpng:pixel-streamed-png
:width width :height height)))
(zpng:start-png png out)
(loop for pixel in tile-data
do (zpng:write-pixel (get-color pixel) png))
(zpng:finish-png png))))There were also those 4 bytes of information between each image that I didn’t know what to do with, so I just shoved them into the filename:
(defun write-pngs (tile-data &key (image-size-bytes 1024) (flags 4))
"Parses a tile-data into smaller images and writes them according to
how big the data is in image-size-bytes and flags"
(let ((rec-size (+ image-size-bytes flags)))
(loop for image upto (1- (/ (length tile-data) rec-size))
for start = (+ flags (* rec-size image))
for flag = (subseq tile-data (- start flags) start)
for data = (subseq tile-data start (+ start image-size-bytes))
for filename = (format nil "img/img~A flag~A.png" image flag)
do (write-png data filename))))I mostly did this because reversing the meaning of them according to the post required an understanding of the game I didn’t have. But honestly, I could maybe have put them in a table or something based on what you could determine by just looking at the images, since some of them were obviously background images and stuff!
But with this, we have image extraction from the data structure we parsed in the first section!
For it to work, we need to create an ”img” folder for the images to go into though:
mkdir imgAnd to run it on the right part of the structure:
(write-pngs (getf (cdr (assoc :tile (parse-yodesk))) :data))