Custom shellcode encoder/decoder with Intel x86
This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:
Student ID: SLAE-975
Assignment number: #4
Github repository: https://github.com/amonsec/SLAE/tree/master/assignment-4
This post is part of my SLAE series.
You can find the previous post at this address: https://amonsec.net/training/linux-assembly-x86/2018/egghunter-with-intel-x86
Encoders are useful for bypassing antivirus protections or, in more specific cases, for avoiding bad characters. Plenty of encoders and their matching decoders can be found on the Web. The aim of this post is not to create a leet haxx0r encoder that bypasses the most advanced antivirus or IDS (Intrusion Detection System), probably because I don’t have the knowledge to do so, but to understand the concept: what an encoder is, what a decoder is, and how to create your own.
- Linux distribution, in my case Ubuntu 10.04 LTS (x86);
- Python 2.7 installed;
- NASM installed; and
- A cup of coffee (we always need a good coffee).
First, we need to encode our shellcode. In my case I decided to use a basic shellcode that spawns a /bin/sh shell.
For this post we will create a basic ROT-n XOR encoder. For each byte in turn, it adds a given number (the ROT step) and then XORs the result with the previous byte of the shellcode. Note that we need an extra byte in front of our shellcode so that the first XOR operation has something to work with.
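To make the arithmetic concrete, here is a minimal sketch of the encoding of a single byte. It assumes the addition wraps around modulo 256 and that the XOR uses the previous original (not yet encoded) byte, which is what the decoder described later implies. The name `encode_byte` is hypothetical, and 0x31 and 0xff are just illustrative input bytes.

```python
KEY = 0x2a   # rotation amount, the value used later in this post
SEED = 0x66  # extra byte used for the very first XOR

def encode_byte(current, previous, key=KEY):
    """Add the key modulo 256, then XOR with the previous original byte."""
    rotated = (current + key) & 0xff  # keep the result inside one byte
    return rotated ^ previous

first = encode_byte(0x31, SEED)    # (0x31 + 0x2a) ^ 0x66
wrapped = encode_byte(0xff, 0x00)  # 0xff + 0x2a wraps around to 0x29
print(hex(first))    # 0x3d
print(hex(wrapped))  # 0x29
```

The `& 0xff` mask is what keeps the value inside a single byte when the addition overflows.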
The key variable holds the number used for the ROT operation, and the opcode array stores the encoded shellcode. The asm variable is only there to build the output string, and index is used later in the bytearray loop.
Note that we place our extra byte, here 0x66, in the opcode array for the first XOR operation.
At this point, we can XOR the byte variable with the previous byte of our shellcode and then push the encoded byte into our opcode array.
Note that in Python the ^ character is the XOR operator.
For readability, we build a sweet string that contains our final opcodes, so that we only have to copy/paste the output into our assembly program.
It’s a loop, so the script repeats the process for each byte, whatever the length of our shellcode. Finally, our encoder looks like this:
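As an illustration of that formatting step, here is a small sketch that turns raw bytes into a comma-separated list of hex values, ready to paste after a NASM `db` directive. The helper name `to_nasm_bytes` and the `Shellcode:` label are assumptions made for this example.

```python
def to_nasm_bytes(data):
    """Format bytes as '0x..,0x..' for use after a NASM db directive."""
    return ",".join("0x%02x" % b for b in bytearray(data))

line = to_nasm_bytes(b"\x66\x3d\xdb")
print("Shellcode: db " + line)  # Shellcode: db 0x66,0x3d,0xdb
```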
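A minimal sketch of such an encoder could look like the following. It is a reconstruction under assumptions, not necessarily the exact script from the repository: the function name `rot_xor_encode` is hypothetical, the seed byte is kept outside the returned array, the XOR is assumed to use the previous original byte, and `sample` is a four-byte placeholder rather than the real /bin/sh shellcode.

```python
def rot_xor_encode(shellcode, key=0x2a, seed=0x66):
    """ROT-n then XOR each byte with the previous original byte."""
    encoded = bytearray()
    previous = seed  # the extra byte feeds the first XOR
    for byte in bytearray(shellcode):
        rotated = (byte + key) & 0xff  # ROT step, modulo 256
        encoded.append(rotated ^ previous)
        previous = byte  # the next XOR uses this original byte
    return encoded

sample = b"\x31\xc0\x50\x68"  # placeholder bytes for the example
encoded = rot_xor_encode(sample)
print(",".join("0x%02x" % b for b in encoded))  # 0x3d,0xdb,0xba,0xc2
```

The output has the same length as the input, so the decoder's loop counter can be taken straight from the shellcode length.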
Now, let’s run this basic Python script to get our final encoded shellcode.
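Since avoiding bad characters is one reason to encode in the first place, it can be worth checking the output before moving on. The sketch below assumes the same encoding scheme as above and that 0x00 is the only bad character; `usable_keys` and the sample bytes are hypothetical.

```python
def rot_xor_encode(shellcode, key, seed=0x66):
    """ROT-n then XOR each byte with the previous original byte."""
    encoded = bytearray()
    previous = seed
    for byte in bytearray(shellcode):
        encoded.append(((byte + key) & 0xff) ^ previous)
        previous = byte
    return encoded

def usable_keys(shellcode, bad=(0x00,)):
    """Return every key whose encoded output contains no bad character."""
    return [key for key in range(1, 256)
            if not any(b in bad for b in rot_xor_encode(shellcode, key))]

sample = b"\x31\xc0\x50\x68"  # placeholder bytes for the example
keys = usable_keys(sample)
print(0x2a in keys)  # True: this key yields no null byte for the sample
print(0x35 in keys)  # False: (0x31 + 0x35) ^ 0x66 is 0x00
```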
JMP CALL POP, wot?
First, before starting to write our assembly decoder, let me introduce a really useful technique called JMP CALL POP. When we work with strings, we could store the string in a variable and reference that variable directly in our program, but that ties the code to a fixed address, and the program fails as soon as it is loaded somewhere else in memory.
Fortunately for us, we can implement the JMP CALL POP technique in our decoder. First we jump to a procedure, here starter. In our case it’s a short JMP, because the target lies within the signed 8-bit range of a short jump (-128 to +127 bytes, counted from the end of the JMP instruction).
After that, we use the CALL instruction, and this is where the technique becomes useful. When CALL is executed, the address of the next instruction (the return address) is first pushed onto the stack, and then the program jumps to the given procedure. In plain words, because our shellcode string sits right after the CALL, its address ends up on the stack before we jump to the procedure, in our case decoder.
Finally, we use the POP instruction, which stores the value on top of the stack into the given register. In our case the ESI register will then point to the beginning of our shellcode.
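To illustrate the range limit of a short jump, here is a small sketch that builds the two bytes of a `JMP rel8` instruction (opcode 0xEB followed by a signed 8-bit displacement, counted from the end of the instruction). The helper name `short_jmp` is hypothetical.

```python
import struct

def short_jmp(displacement):
    """Return the two bytes of a short JMP for a signed 8-bit displacement."""
    if not -128 <= displacement <= 127:
        raise ValueError("target out of range for a short jump")
    # "b" packs the displacement as a signed 8-bit integer
    return bytearray(b"\xeb") + bytearray(struct.pack("b", displacement))

forward = short_jmp(0x10)  # jump 16 bytes past the end of the JMP
to_self = short_jmp(-2)    # jump back onto the JMP itself
print(["0x%02x" % b for b in forward])  # ['0xeb', '0x10']
print(["0x%02x" % b for b in to_self])  # ['0xeb', '0xfe']
```

Anything further away than that needs a near jump with a 32-bit displacement, which is longer and more likely to contain null bytes.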
To conclude, this technique is used because:
- It avoids hard-coded addresses, because we don’t know where our shellcode is located in memory; and
- It allows the program to dynamically figure out the address of our shellcode.
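The CALL and POP half of the trick can be modelled in a few lines of Python. This is only a toy model, not real machine code: the stack is a plain list, the address 0x1000 is made up, and the CALL is assumed to be the 5-byte near form (opcode 0xE8 plus a 32-bit displacement).

```python
CALL_LENGTH = 5  # near CALL: 1 opcode byte + 4 displacement bytes

def call(stack, call_address):
    """Model CALL: push the address of whatever follows the instruction."""
    stack.append(call_address + CALL_LENGTH)

def pop(stack):
    """Model POP: take the value on top of the stack."""
    return stack.pop()

stack = []
call_address = 0x1000  # hypothetical address of the CALL instruction
call(stack, call_address)
esi = pop(stack)  # ESI now holds the address right after the CALL
print(hex(esi))  # 0x1005
```

Because the encoded shellcode is placed immediately after the CALL, that return address is exactly where the shellcode starts, wherever the code was loaded.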
The skeleton of our decoder program will look like this:
First, let’s initialise the registers we need:
Here, the ECX register will be used later by our LOOP instruction, so we move the length of our shellcode into ECX in order to process each byte of the shellcode. Then, we move the first value (0x66) into the EAX register for the first XOR instruction and, finally, we move our key (0x2a) into the EBX register for the rotation.
We can now create a new procedure, in our case called decode, that will first execute the XOR instruction.
The EAX register now contains the XORed value, so we need to overwrite the encoded byte that ESI points to with this value and then rotate that byte back by the key stored in the EBX register (the reverse of the addition done by the encoder):
We can then load our freshly decoded byte into the EAX register for the next XOR instruction, increment the ESI register to move on to the next shellcode byte and, finally, loop back to the decode procedure until the ECX register is equal to zero.
Note that the LOOP instruction decrements the ECX register by one and jumps to the given label as long as ECX is not zero. That’s why the value of the ECX register is set to the length of our shellcode.
After the decoding process, the program jumps into the decoded shellcode in order to execute it.
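The same step can be checked in Python before writing it in assembly. This sketch assumes the decoder undoes the encoder in reverse order (XOR first, then subtract the key modulo 256); `decode_byte` is a hypothetical name, and 0x3d is the encoded form of the illustrative byte 0x31.

```python
def decode_byte(encoded, previous, key=0x2a):
    """Undo the XOR, then rotate back by subtracting the key modulo 256."""
    xored = encoded ^ previous
    return (xored - key) & 0xff  # the mask handles the wrap below zero

first = decode_byte(0x3d, 0x66)    # first byte, XORed with the seed 0x66
wrapped = decode_byte(0x29, 0x00)  # 0x29 - 0x2a wraps around to 0xff
print(hex(first))    # 0x31
print(hex(wrapped))  # 0xff
```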
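Before assembling anything, the whole loop can be rehearsed in Python with variables named after the registers. This is a model of the logic described above under the same assumptions as before, not the assembly itself: `rot_xor_decode` is a hypothetical name and the input is the encoded form of the placeholder bytes 0x31, 0xc0, 0x50, 0x68.

```python
def rot_xor_decode(encoded, key=0x2a, seed=0x66):
    """Decode in place, mirroring the register usage of the decoder stub."""
    memory = bytearray(encoded)
    ecx = len(memory)  # loop counter: one pass per shellcode byte
    eax = seed         # value used by the first XOR
    ebx = key          # rotation amount
    esi = 0            # index standing in for the ESI pointer
    while ecx != 0:
        memory[esi] = ((memory[esi] ^ eax) - ebx) & 0xff
        eax = memory[esi]  # the decoded byte feeds the next XOR
        esi += 1
        ecx -= 1
    return memory

decoded = rot_xor_decode(b"\x3d\xdb\xba\xc2")
print(",".join("0x%02x" % b for b in decoded))  # 0x31,0xc0,0x50,0x68
```

One difference from the real instruction: LOOP decrements ECX before testing it, so starting with ECX at zero would wrap around instead of skipping the loop. The `while` test above hides that, which is harmless here because the shellcode is never empty.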
Our decoder assembly program should look like this:
Assemble the pieces
At this point, we can create a simple C program to execute our shellcode:
Let’s compile the program:
Finally, we execute it: