Preface

Welcome to reverse engineering! These posts will go over some of the challenges in the 2019 Flare-On CTF by FireEye. They will be as beginner-focused as possible, with the intent of showing different concepts through real examples of increasing difficulty. That being said, this is a huge topic with many, many different ways of approaching it, and these tutorials will be far from exhaustive. The best way to get better at RE is to practice. Cliche, I know, but the reality is that this is not an exact science. Your skills will come from getting more comfortable with the work, and that will only come from grinding out problems that really challenge you and force you to learn.

What is RE?

"Reverse engineering, also called back engineering, is the process by which a man-made object is deconstructed to reveal its designs, architecture, or to extract knowledge from the object". That's the definition Wikipedia gives, and it's pretty spot on. In regular contexts, reverse engineering usually refers to physical objects, which is a whole 'nother ballgame, but here it will refer to the breaking down and analysis of software. The entire goal is to take a piece of ready-to-execute software and understand how it works on the inside. To do this, we will use various tools, techniques, and insights to slowly carve our way into what isn't normally displayed to the user. To do this effectively, we first have to understand how software goes from the text you write on the screen to a file you can double-click on (or execute through the terminal :) ).

The Compiler

The text that turns into your program is called the source code, and it is of high value to someone reverse engineering the program. It is the highest-level representation of what your program does, and if you're a responsible programmer, it even contains comments kindly explaining your thought process. If the reverse engineer has that, then half of their job is pretty much done already. In most cases, however, this is not reality. The source code of a program is rarely released with the published program itself, but you will learn that through some clever techniques, we can often reconstruct a pretty damn good approximation. Even then, it will usually be a rough one: gibberish variable names, weird-looking program flow, and zero comments. This is why it is absolutely essential that you know fairly low-level programming concepts like the back of your hand. The code you get back will be a pain to understand, and if you already have a hard time understanding documented source code, this will be nearly impossible. But I digress. Back to what a compiler does.

At its most basic level, a compiler takes in your source code, does some optimizations because it knows more about speed and cares less about how shiny and cool code looks than you do, and then spits out some form of low-level instructions for the executor to read. If you're working with a native language such as C, C++, or Rust, then the instructions the compiler spits out are machine instructions. These are the commands that your CPU will directly read in and execute: the fastest, lowest-level form of instructions for your system. When your CPU reads them, they are just an array of bytes that is unreadable to humans, so we usually represent them through what's called assembly language. This is nothing but a text representation of the machine instructions, and in many cases it will be the best thing you'll get when reverse engineering something. The problem is that compilers are good at what they do. They tear your precious source down into something as fast as you allow them to make it.
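To make the "assembly is just a text representation of bytes" point concrete, here is a minimal sketch in C. It is a toy lookup table covering five x86-64 instructions, not a real disassembler, and every name in it (toy_insn, TOY_TABLE, toy_decode) is made up for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* One entry of a toy lookup table: the raw bytes of an instruction and the
 * assembly text normally used to display them (x86-64, Intel syntax). */
struct toy_insn {
    unsigned char bytes[3];
    size_t        len;
    const char   *text;
};

/* Only five instructions are covered. A real disassembler has to handle
 * prefixes, operands, and thousands of encodings. */
static const struct toy_insn TOY_TABLE[] = {
    { {0x55},             1, "push rbp"     },
    { {0x48, 0x89, 0xe5}, 3, "mov rbp, rsp" },
    { {0x31, 0xc0},       2, "xor eax, eax" },
    { {0x5d},             1, "pop rbp"      },
    { {0xc3},             1, "ret"          },
};

/* Decodes the instruction starting at code[0], looking at no more than
 * `avail` bytes. Returns its assembly text and stores its length in *len,
 * or returns NULL if the bytes are not in the toy table. */
const char *toy_decode(const unsigned char *code, size_t avail, size_t *len)
{
    for (size_t i = 0; i < sizeof TOY_TABLE / sizeof TOY_TABLE[0]; i++) {
        const struct toy_insn *e = &TOY_TABLE[i];
        if (e->len <= avail && memcmp(code, e->bytes, e->len) == 0) {
            *len = e->len;
            return e->text;
        }
    }
    return NULL;
}
```

Calling toy_decode repeatedly over the bytes 55 48 89 e5 31 c0 5d c3, advancing by the returned length each time, walks through a tiny function that sets up a stack frame, zeroes eax, and returns. Real tools do the same job with a complete instruction table.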

Disregarding languages such as Python and JavaScript, where you basically run the source code, there are also languages in the middle of the language tree, like C# and Java. These languages are "compiled", but not to machine instructions. Instead, they are compiled to what's called an intermediate representation or intermediate language (IL). These instructions are like machine code, but instead of being run directly by the CPU, they are handed to what's called a runtime, which first reads in your program and, before it can be run, spits out machine code tailored for that system. There are arguments to be made for both sides, native versus not, but for us reverse engineers, the important distinction between the two is how close IL is to the source.
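To get a feel for instructions that are "like machine code" but meant for a runtime instead of the CPU, here is a minimal sketch of a toy stack machine in C. Two loud assumptions: the opcodes (OP_PUSH, OP_ADD, OP_MUL, OP_HALT) are invented for this example and are not real CIL or Java bytecode encodings, and the sketch interprets the instructions one by one instead of translating them to machine code the way the runtime described above does.

```c
#include <stddef.h>

/* Hypothetical opcodes for a toy stack machine (not real CIL or JVM values). */
enum toy_op { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

#define TOY_STACK_MAX 16

/* Runs the program in code[0..n). On OP_HALT, stores the value on top of the
 * stack in *result and returns 0. Returns -1 for a malformed program
 * (stack overflow or underflow, unknown opcode, or no OP_HALT). */
int toy_run(const int *code, size_t n, int *result)
{
    int    stack[TOY_STACK_MAX];
    size_t sp = 0; /* number of values currently on the stack */
    size_t pc = 0; /* index of the next instruction to read */

    while (pc < n) {
        switch (code[pc++]) {
        case OP_PUSH: /* the operand follows the opcode */
            if (pc >= n || sp >= TOY_STACK_MAX) return -1;
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:  /* pop two values, push their sum */
            if (sp < 2) return -1;
            stack[sp - 2] += stack[sp - 1];
            sp--;
            break;
        case OP_MUL:  /* pop two values, push their product */
            if (sp < 2) return -1;
            stack[sp - 2] *= stack[sp - 1];
            sp--;
            break;
        case OP_HALT:
            if (sp < 1) return -1;
            *result = stack[sp - 1];
            return 0;
        default:
            return -1;
        }
    }
    return -1; /* ran off the end without hitting OP_HALT */
}
```

The program { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT } computes (2 + 3) * 4. Notice how easy it is to read the original expression back out of those instructions. That is a small taste of why IL tends to stay closer to the source than native machine code does.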

A hello world program in C.

Compiled result to x86 assembly.

A hello world program in C#.

Compiled result to CIL (Common Intermediate Language).

Pulling Source From Nothing

(WIP)

wiki/rev1.txt · Last modified: 2020/03/10 19:41 by abigpickle