Malware Analysis Series Article 1 1638935075
Malware Analysis Series Article 1 1638935075
com
1. Introduction
Welcome to the MAS (Malware Analysis Series). Being very honest, in the last four years it was a quite
difficult stopping my research job for writing an article as well as would have impossible writing a series of
articles, but I think it’s feasible now and let’s try it. Just to give an example, last time I wrote a superficial
article was in 2017 and, certainly, I didn’t even remember until a colleague talked about it recently.
The goal is to produce a series of articles on malware analysis and explain since simple malware binaries up
to most complex ones, covering a large list of topics such as unpacking, API resolving, C2 extraction, C2
emulation and, of course, reverse engineering in addition to some dynamic analysis and, maybe, use few
de-obfuscation techniques. When it’s necessary, I’ll cover other topics such as COM (Component Object
Model), cryptography, IDC/IDA Python and everything it necessary to help readers to have a better
comprehension of analysis.
Furthermore, I will also write short articles covering topics such as malicious documents (on this time I’ve
already release one: https://exploitreversing.com/2021/11/02/malicious-document-analysis-example-1/),
programming, de-obfuscation, operating system internals and so on. Nonetheless, the main focus will be
MAS (Malware Analysis Series).
As malware analysis produces extensive articles, so I’ll break them up in parts 1, 2, and so on, when it will
be necessary.
Every article will be published on my new blog (using a different font) and, at beginning of each post,
there will be a PDF version of that article.
During this series of articles, I’m going to use several tools and try to point where you can get them to
make things simpler for you.
I am not going to propose only hard samples because, in my humble opinion, this kind of approach
wouldn’t help anyone (mainly professionals that aim to learn to something) and, at end, it would be only a
waste of time (and an useless show-off). Therefore, we’ll analyze different samples, each one with a
distinguished level of difficulty, and discuss some lines of code. As I mentioned previously, the strategy is to
break up an article in different parts if it’s necessary to avoid turning the reading so exhausting.
2. Lab Setup
Explaining about the lab setup, I usually analyze all samples using one or more of the following systems:
1|Page
https://exploitreversing.com
▪ Windows 7, Windows 8.1 or Windows 10: If you need a Windows 10 virtual machine, Microsoft
continue offering one with expiration time on this website: https://developer.microsoft.com/en-
us/windows/downloads/virtual-machines/
▪ Labeless: it’s a quite recommended plugin that provides two key-features for reversers:
▪ Label, function name, global variable and comment synchronization between x64dbg and
IDA Pro.
▪ Dynamic dumping of regions from memory for a debugged process, which will be useful, for
example, for dumping the binary after its task API resolving and/or string decoding.
▪ Labeless plugin can be downloaded from: https://github.com/a1ext/labeless/releases
▪ DbgChild: it provides an automatic detection of child processes created by the debugged process
and automatically attaches a new instance of x64dbg to if the the main process forks a new
process, so saving your time in many opportunities. DbgChild is available from:
https://github.com/David-Reguera-Garcia-Dreg/DbgChild
Other useful plugins exist, but let’s wait for the appropriate moment to talk about them. On time: many
available plugins don’t have been regularly kept by their authors and maintainers, so they could not work
anytime. Be careful!
2|Page
https://exploitreversing.com
All remaining tools will be shown during our analysis and many future articles.
Last, but not least, this article (and all the following ones) certainly will have mistakes that will be fixed and
I’ll release new PDF versions reflecting all fixes.
b. Network Analysis: it is a quite useful resource (pcap files, for example) to understand and detect
non-authorized communication (C2 – command and control channel) through traffic analysis and
makes artifact gathering (for example, binary files, malicious documents and Cobalt Strike beacons).
c. Filesystem/Disk Analysis: the last frontier of any investigation, where we can analyze and detect
side effects of a adversary invasion, breaches, frauds, leaks and, of course, malware infections.
Once again, all of them are very important and must be used in all real world investigation. However, let’s
return to the key point: why should you learn about reverse engineering and, in special, malware analysis?
Simple: through malware analysis you have the opportunity to learn from the source of the evil about
intentions and objectives of the the adversary and not only its effects. In other words, you can learn
techniques, tricks, evasion strategies and, if you’re lucky, you’ll can collect important artifacts to make the
correct attribution (most of the time, it’s a hard task) and, who knows, help to arrest the bad guys.
Therefore, before starting any reversing task, we should remember that there’re many questions that we
should to consider and ask to ourselves:
▪ Is the binary packed? If it’s, so malware is using a well-known packer or a custom one?
▪ What’s the networking communication technique/API set being used by malware? From available
techniques such as Winsock2, Wininet, COM (Component Object Model) or something in a lower
level such as WSK (Winsock Kernel) or even custom implemented technique, which is being used?
▪ Is there any code injection or hooking technique being used? Which one?
▪ What are the anti-forensic techniques used? Is there any anti-debugging technique? Anti-
disassembly? Anti-VM/Sandbox?
▪ Is there any API/DLL encoding?
▪ Are strings encrypted?
▪ What synchronization primitives are being used by the malware? Sometimes they hide important
anti-debugging techniques.
3|Page
https://exploitreversing.com
[Figure 1]
According to the Figure 1, we have some important information:
4|Page
https://exploitreversing.com
[Figure 2]
There’re other tasks id related to this sample, but let’s to focus on the first one only. Details about the first
task id can be shown by executing the following command (the output was truncated):
[Figure 3]
Of course, we could obtain additional information about the sample, but it’s enough for now because we
already have possible C2 (URLs).
Our next step is to find out whether this sample is packed (as most of the malware threats) or not.
Furthermore, if it’s packed, so we will learn how to unpack it using a debugger like x64dbg.
Nonetheless, let’s review few concepts about unpacking malware on Windows systems.
5|Page
https://exploitreversing.com
6|Page
https://exploitreversing.com
zeroed, but replaced by jump instructions using RVA to the same import address, as well known as
“IAT obfuscation”.
▪ As used in shellcodes and common malware, API names are hashed.
▪ The translation from native register to virtualized register is usually one-to-one, but not always.
Furthermore, there is a context switch component that is responsible for transferring registers and
flag information into the virtual machine context.
▪ Virtual machines handlers come from data blocks.
▪ Many native APIs are redirected to stub code that forwards the call.
▪ Obfuscation techniques such as constant unfolding, pattern-based obfuscation, control
indirection, inline functions, code duplication and mainly opaque predicate are used.
Before and during the unpacking task, there’re many observations and questions that we could think
about:
▪ Is the malware really packed?
▪ What are the evidences of having a packed code?
▪ Does the malware perform self-injection or remote injection?
▪ Does the malware perform self-overwriting?
▪ Where is the payload being written?
▪ How the payload is going to be executed?
▪ What are evidenced of having an unpacked code after the unpacking procedure?
▪ Are there additional packed layers?
The first point of the list above rises a key question: how do we know whether a malware is really
packed?
There isn’t an easy and definitive answer to this question, but eventually a set of two or more evidences
could indicate that sample is packed:
▪ The binary sample has few imported DLLs and functions.
▪ There are many obfuscated strings.
▪ Existence of specific system calls.
▪ Non-standard section names.
▪ Non-common executable binary sections (only .text/.code section should be executable)
▪ Unexpected writable sections.
▪ High entropy sections (usually above 7.0, but not always – this is a weak indicator).
▪ Substantial difference between the raw size and the virtual size of a section.
▪ Zero-sized sections.
▪ Missing APIs related to network communication.
▪ Lack of essential APIs for the malware functionalities (Crypt* functions in a ransomware, for
example).
▪ Unusual file format and headers.
▪ Entry-point pointing to other section than .text/.code section.
▪ Significant size of resource section (.rsrc section) followed by LoadResource( ) function in the code.
▪ Presence of an overlay.
▪ Opening it up on IDA Pro and observing a big amount of data or unexplored code on colored bar.
7|Page
https://exploitreversing.com
It’s very relevant and suitable to highlight one point: the occurrence of only one characteristic from the
list above doesn’t determine that the malware is packed. Thus, it’s quite important to consider two or
more of them. Furthermore, there are further observations to be considered:
▪ Most samples resolve dynamically their APIs using LoadLibrary( ) followed by GetProcAddress( ),
for example (except on reflective code injection cases).
▪ Network APIs also could be dynamically resolved.
▪ Malformed headers might be a bit difficult to detect at the first analysis.
▪ Big resource section might not be relevant because it might contain only GUI artifacts and digital
certificates.
▪ There might be a mix of encrypted/obfuscated strings and plain text strings, so making a bit
harder to decide whether the binary is or not packed.
The unpacking procedure using a debugger might bring a list of challenges to be understood and
bypassed:
▪ Anti-debugging techniques (time checking, CPUID, heap checking, debugging flag checking,
NtSetInformationThread( ), and so on), so it’s recommended to use an anti-debugger plugin such
as ScyllaHide (https://github.com/x64dbg/ScyllaHide) on x64dbg/x32dbg or even StrongOD on
OllyDbg (there’re some repositories containing OllyDbg and all associated useful plugins already
built-in. Use Google for finding them).
▪ Anti-VM tricks checking for VMware, VirtualBox, Hyper-V and Qemu artifacts, for example.
▪ Filename, hostname and account checking (avoid using the hash as filename).
▪ Available disk size on virtual machine (it’s recommended 100 GB, at least)
▪ Number of processors on the testing virtual machine (two or more would be suitable)
▪ Uptime (try to keep a virtual machine snapshot with uptime above 20 minutes).
▪ Many non-sense calls (result is not used any longer) and non-existing APIs (fake APIs).
▪ Exception handlers being used as anti-debugging technique.
▪ Software breakpoints being cleared and registers (DR#) being manipulated (anti-breakpoint
techniques)
▪ Hash functions using typical algorithms (for example, crc32, conti, add_ror13,…) being used.
▪ Malicious code checking for well-known tools such Process Hacker, Process Explorer, Process
Monitor and so on (it’s recommended to rename these executable binaries before using them).
Unfortunately, anti-VM tricks and anti-debugger techniques cannot be always handled by plugins and we
will have to manage to bypass them using the debugger. In this case, we have an interesting possibility of
using a different debugger like WinDbg to manage some malware threats expecting for ring 3 debuggers
only and not kernel debuggers (a recent case is the GuLoader malware).
Even during or after unpacking procedures, we could need to fix the resulting binary because one or more
of following issues:
▪ The DOS/PE header could have been destroyed on the memory or modified by a compress library.
▪ In many cases, when you extract a binary from memory, you need to clean it up because there’s
some garbage before its DOS header (MZ signature) and PE header.
8|Page
https://exploitreversing.com
▪ The unpacked binary might have its Import table destroyed due to the fact it has been dumped and
its address refers to virtual addresses (mapped version instead of unmapped version), so showing
unaligned sections or none section.
▪ It could be hard to determine the OEP (Original Entry Point) , which usually appears after a
transition from unpacker code using an indirect call (call [eax] or jmp [eax], for example).
Additionally, existence of non-resolved APIs could be an evidence of the malicious code hasn’t
reached the OEP yet. On time: OEP is the entry point (EP) of an executable before it being
packed. After it has been packed, a new EP is associated to packer itself.
▪ Mutexes being used as a kind of “unlock key” between two unpacking layers. In this case, the
second stage of unpacking doesn’t happen without the first stage has happened, and if it’s
happened, so the mutex existence is confirmed.
▪ The first stage of unpacked code doesn’t run from any directory, but only from a specific one.
▪ You can have extracted a decoy binary. In many real cases, malware authors packs one or more
useless executables as decoy to consume time of the analyst. Thus, it would be wise not believe
you’ve unpacked the correct binary from memory at first attempt.
This list of issues is very limited and there’re endless other possible side effects on unpacked binaries. Of
course, distinct solutions for each one of these presented issues exist and they will be explained and given
examples in the next articles of this series. Anyway, few approaches for handling some issues are:
▪ Copy a good PE header from another executable (or from the own malware sample) and align
sections considering whether the unpacked binary is unmapped (.text section usually starts on
0x400) or mapped (.text section usually starts on 0x1000).
▪ Align sections of an unpacked binary (mapped addressing) by fixing its respective Raw Address
and Raw size. This action usually fixes the Import Table and makes possible to visualize imported
functions without any issues. Pay attention to possible “traps”: some unpacked binaries don’t
show its Import Table until you’ve aligned their sections. However, other malware threats don’t
have any function in the Import Table even after you having unpacked the binary, so it doesn’t
mean you made any mistake, but it does that the malware resolve all its APIs dynamically.
▪ Reconstruct the IAT and forcing the OEP (Original Entry Point).
9|Page
https://exploitreversing.com
▪ If you’re facing problems in finding the OEP, so remember that OEP likely comes after the IAT has
been resolved. In this case, one of possible approaches would be to check whether IAT is already
resolved (check for Intermodular Calls on x64dbg or OllyDbg) or setting a breakpoint on a critical
API that would be executed during a key operation of malware (CryptoAcquireContext( ) in
ransomware threats, for example) because certainly IAT will be resolved when execution reaches
these critical APIs . Afterwards, the suggestion is looking for unconditional jumps to specific
memory addresses or even indirect calls (call [eax], for example). Another interesting approach
would be using the graphical visualization of a debugger (“g” on x64dbg) and check for these
transition points (indirect calls or unconditional jumps for memory addresses) at the last “code
blocks”. Finally, a specialized tool might help you to find out the OEP. As you’ve noticed, there isn’t
a single approach to do this.
▪ Adjust the base address to match with the segment’s base address dumped from memory.
▪ In two-stage unpacking cases, the first unpacked binary might be a DLL. Therefore, depending on
the context, it might be useful to convert the DLL binary to executable, and there’re many ways to
accomplish this task, but my favorite method is editing the PE header to alter the Characteristics
field and make the entry of the exported function as an entry-point.
To visualize, handle and fix most of issues after unpacking a binary you can use the following well-know
tools such as:
▪ PEBear is an excellent tool written by Aleksandra Doniec (a.k.a Hasherezade) that’s used to
visualize details of a PE Header and fix many binary issues. You can download this tool from:
https://github.com/hasherezade/pe-bear-releases
▪ Pestudio is a great tool written by Marc Ochsenmeier and it’s mainly used to triage and collect
different information of a potential malware. The tool (free and paid versions) are available here:
https://www.winitor.com/features
▪ CFF Explorer, which makes part of Explorer Suite, it’s an well-known PE Editor that is used to
visualize and fix PE headers. The Explorer Suite can be downloaded from:
https://ntcore.com/?page_id=388
▪ pe_unmapper is another tool written by Aleksandra Doniec (a.k.a Hasherezade) that can be used
for converting a PE binary from mapped version to unmapped version, so fixing all PE alignment
issues. This tool can be downloaded from:
https://github.com/hasherezade/libpeconv/tree/master/pe_unmapper
10 | P a g e
https://exploitreversing.com
▪ Scylla is an amazing x86/x64 Import Reconstructor that is already embedded in x64dbg. If you need
the standalone version, so you can download it from: https://github.com/NtQuery/Scylla
▪ HxD is an excellent hex-editor that we could be used, for example, to check and fix PE headers
manually. It can be downloaded from: https://mh-nexus.de/en/hxd/
▪ XVI32 Hex Editor is another interesting hex-editor that is great to clean up dumped memory
regions to isolate the unpacked binary. XVI32 Hex Editor can be downloaded from:
http://www.chmaas.handshake.de/delphi/freeware/xvi32/xvi32.htm
Once again, remember that unpacking is only the first obstacle during malware analysis, and many other
hard challenges such as string de-obfuscation, API resolving, C2 configuration extraction, C2 emulation and
other topics are also in out list. This project will cover several unpacking situations and all of these
mentioned tasks in the next articles.
Now we have a minimal knowledge about unpacking process, issues and solutions, it’s time to review
different code injection techniques, which could help you to have a better comprehension about
unpacking.
11 | P a g e
https://exploitreversing.com
▪ PE Injection: in this technique a malicious code is written and, consequently, forced to be executed
in a remote process or even in the own process (self-injection). Main related APIs: OpenThread( ),
SuspendThread( ), VirtualAllocEx( ), WriteProcessMemory( ), SetThreatContext( ) and
ResumeThreat( ) | NtResumeThread( ).
▪ Reflective Injection: this technique is similar to PE Injection, but the malicious code avoid using
LoadLibrary( ) and CreateRemoteThread( ), for example. There’re many interesting derivations of
this method and, one of them (also used on Cobalt Strike) is accomplished by the following APIs:
CreateFileMapping( ), Nt/MapViewOfFile( ), OpenProcess( ), memcpy( ) and
Nt/MapViewOfSection( ). At end, code on remote process can be executed by calling
OpenProcess( ), CreateThread( ), NtQueueApcThread( ), CreateRemoteThread( ) or
RtlCreateUserThread( ). It’s interesting to note that a variant could use VirtualQueryEx( ) and
ReadProcessMemory( ) too.
▪ APC Injection: this code injection technique allows a program to execute code in a specific thread
by attaching to an APC queue. The injected code will be executed by the thread when it exits of
alertable state (originated by calls such as SleepEx( ), SignalObjectAndWait( ),
MsgWaitForMultipleObjectsEx( ), WaitForMultipleObjectsEx( ), or WaitForSingleObjectEx( )).
Therefore, it’s common to also see APIs such as CreateToolhelp32Snapshot(), Process32First( ),
Process32Next( ), Thread32First( ), Thread32Next( ), QueueUserAPC( ) and KeInitializeAPC( )
involved into this technique.
▪ Hollowing or Process Replacement: this technique, in a nutshell, is used by the malware to “drain
out” the entire content of a process and insert into it a malicious content. Some involved APIs are
CreateProcess( ), NtQueryProcessInformation( ), GetModuleHandle( ),
Zw/NtUnmapViewOfSection( ), VirtualAllocEx( ), WriteProcessMemory( ), GetThreadContext( ),
SetThreadContext ( ) and ResumeThread( ).
▪ AtomBombing: this technique is an a variant of the previous technique (APC injection) and works
by splitting the malicious payload into separated strings, creating an Atom to each given string,
copying them into a RW segment (using GlobalGetAtomName( ) and NtQueueApcThread( )) and
setting the context by using NtSetContextThread( ). Therefore, a list of further APIs are
OpenThread( ), GlobalAddAtom( ), GlobalGetAtomName( ) and QueueUserAPC( ).
12 | P a g e
https://exploitreversing.com
▪ Process Herpaderping: this technique is similar to Process Doppelgänging, but there’s a subtle
difference in its procedure. Process Herpaderping is based on that fact that security defenses
usually monitor process creation by registering a callback routine on the kernel side using
PsSetCreateProcessNotifyRoutineEx( ) or during driver’s DispatchCleanup routines
(IRP_MJ_CLEANUP), which it is invoked after a thread being created. That’s the key issue: if the an
adversary create and map a process and, afterwards, this adversary is able to modify the file image
and then create the thread, so security products are able to detect such a malicious payload.
Nonetheless, this checking order can be comprised whether the adversary is able to create
malicious binary on disk, open a handle to it, map it as an image section using NtCreateSection
function (and including the SEC_IMAGE flag), create a process using the section handle
(NtCreateProcesEx()), modify the file content to not sounds like malicious and create a thread
(NtCreateThreadEx()) using this “good image”. That the point: when the thread is created, the
process callback is triggered and the content of the file (good one) on disk is checked, so security
defenses believes that everything is fine because image on disk is not harmful, but the true
malicious is on memory. In other words, security defenses could not be effective to detect such
image on disk that is different from image on memory. Few APIs used for this technique:
CreateFile( ), NtCreateSection( ), NtCreateProcessEx( ) and NtCreateThreadEx( ).
▪ Hooking Injection: to use this technique, we will see that functions involved with hooking activities
such as SetWindowsHookEx( ) and PostThreadMessage( ) are used to inject a malicious DLL.
▪ Extra Windows Memory Injection: using this technique, malware threats injects code into the a
process by using the Extra Windows Memory (as known as EWM), whose size is up to 40 bytes and
it’s appended the instance of a class during the registration of windows classes. The trick is that the
appended spaced is enough to store a pointer that might forward the execution to a malicious
code. Some possible APIs involved to this technique are FindWindowsA( ),
GetWindowThreadProcessId( ), OpenProcess( ), VirtualAllocEx( ), WriteProcessMemory( ),
SetWindowLongPtrA( ) and SendNotify( ).
▪ Propagate Injection: this technique has been used by malware threats such as RIG Exploit Kit and
Smoke Loader to inject malicious code into explorer.exe process (medium integrity level) and other
persistent ones, and it’s based on the approach of enumerating (EnumWindows( ) →
EnumWindowsProc → EnumChildWindows( ) → EnumChildWindowsProc → EnumProps( ) →
EnumPropsProc → GetProp) windows implementing SetWindowsSubclass( ) (this further
information on https://docs.microsoft.com/en-us/windows/win32/api/commctrl/nf-commctrl-
setwindowsubclass). As you could remember, this function install a windows subclass callback and,
as you know, callbacks are interpreted as hooking methods in the security world. How does it
works? Once subclassed windows are found (checking UxSubclassInfo and/or CC32SubclassInfo,
which provide the subclass header), it’s possible to preserve the old windows procedure, but we
can also assign a new one to the window by updating CallArray field. When an event to the target
process is sent then the new procedure is called and, afterwards, the old one is also called (keeping
13 | P a g e
https://exploitreversing.com
the previous and expected behavior). Therefore, a malware inserts a malicious payload (shellcode)
into the memory and updates subclass procedure using SetPropA( ). When this new property is
invoked (through a windows message) , the execution is forwarded to the payload. Some Windows
APIs involved to this technique are FindWindow( ), FindWindowEx( ), GetProp( ),
GetWindowThreadProcessId( ), OpenProcess( ), ReadProcessMemory( ), VirtualAllocEx( ),
WriteProcessMemory( ), SetProp( ) and PostMessage( ).
This short and quick review about code injection techniques will be useful to understand how malware try
to keep undetected and also indirectly will help you to understand unpacking techniques.
A quite usual example of a code injection sequence from malware threats is shown below (a decompiled
output from IDA Pro) and, certainly, you’ll be able to identify the technique used through information
presented previously in this section:
[Figure 4]
14 | P a g e
https://exploitreversing.com
7. Unpacking Methods
It’s quite complicated to classify and, mainly, describe unpacking techniques, but in a general way there’re
few methods to unpack a malware sample such as using a debugger, an automated tool, a web service or
even writing its own unpacking code to accomplish the task statically. The chosen methods depends on
specific contexts and situations.
▪ As mentioned previously, it’s recommended to use an anti-debugging plugin and, in few cases, to
ignore all exceptions from 0x00000000 to 0xFFFFFFFF range (on x64dbg, go to Options →
Preferences → Exceptions to include this range).
▪ Sometimes ignoring exceptions could be a bad idea because malware could be them to call the
unpacking procedure. Additionally (and out of the context in this article) there are threats that use
interruptions and exceptions to call APIs.
▪ Learning about all listed APIs and their respective arguments by using MSDN is a key knowledge to
unpack malware threats successfully.
15 | P a g e
https://exploitreversing.com
▪ If you’re using VirtualAlloc( ), it’s recommended to setup the breakpoint on its exit point (ret 10).
Additionally, sometimes it is easier to follow the allocated content on dump by setting a write
memory breakpoint.
▪ In some cases, the malware extracts its payload onto memory, but it destroys the PE Header, so
you’ll have reconstruct the entire header, though it’s simple procedure using a hex editor like HxD.
▪ The extracted payload might be in mapped or unmapped format. If it’s in mapped format, so
probably the Import table is messed up and you need to fix them by realigning sections headers
manually through PEBear (favorite method) or using a tool like pe_unmapper. You might need to
fix the base address and the entry point whether it’s zeroed.
▪ To reconstruct a destroyed IAT it’s recommended to use Scylla (embedded on x64dbg). It will be
necessary to enter the OEP and one of methods to find it is by looking for code transitions given by
instructions such as jmp eax, call eax, call [eax], and so on.
▪ Few unpacked malware samples don’t have any function in the IAT, so there’re two possibilities:
either sections are misaligned (mapped version) or the unpacked malware resolves all its
functions dynamically.
▪ Using the “g” hotkey on x64dbg might be useful for visualizing the code in blocks and finding
possible transitions to OEP.
▪ Another good alternative to find OEP is through code instrumentation like PIN
(https://www.intel.com/content/www/us/en/developer/articles/tool/pin-a-dynamic-binary-
instrumentation-tool.html).
▪ In many opportunities, the unpacked code could be only the first stage of a malware, so it’s
necessary to repeat steps to unpack the next stages.
▪ Few malware sample perform self-overwriting, so you could have to set a breakpoint on the .text
section to detect the unpacked binary execution.
▪ Depending on the extracted binary (a shellcode, for example), it might not be able to run out of a
specific process context, so it’d be necessary to inject it into a running process (for example,
explorer.exe) to perform further analysis.
▪ How can you check whether the extracted malware might be the final one? There isn’t a definitive
answer and few indications might be found by looking for network functions from DLLs such as
WS2_32.dll (Winsock) and Wininet.dll, plain text strings, crypto functions (mainly whether
16 | P a g e
https://exploitreversing.com
malware is an ransomware), and many other evidences. It’s a good approach to open up the
extracted code on IDA Pro mainly after having re-aligned sections and/or reconstructed the IAT.
Her tools has a similar approach to each other, so you should run the malware in an isolated virtual
machine and execute the appropriate command, which I show some syntax examples below that can be
used for a quick approach, though all of these tools contain useful options and it’s worth to check them:
The unpacked binaries with some additional information are saved into a directory created by the tool.
d. Process Hacker
Another trivial (and limited) way to extract binaries from memory is through Process Hacker by double-
clicking on the running process, going to “Memory” tab, looking for interesting regions/base addresses
(RWX), double-clicking it and pressing “Save” button. Of course, it’s easier finding the malicious
binary/payload in case of self-injection. In case of remote injection you’ll need to reverse the malware to
understand the target process to be inject or make an “educated guess” and look for the injected code on
well-known targets like explorer.exe or svchost.exe, for example. Once again, it’s a limited and simple
approach, but sometimes can save time.
e. Using an public/paid Internet service
17 | P a g e
https://exploitreversing.com
You can use an Internet service as the amazing Unpacme (https://www.unpac.me/#/), which offers an
automated unpacking service. There’re a free and public plan (10 submissions per month) and other paid
plans that are quite interesting for researchers and companies. Furthermore, it offers an API set to
interface your customized application with the Unpacme service (https://api.unpac.me/).
Although this approach sounds being time consuming, it’s quite usual writing Python code to accomplish
unpacking mainly in shellcode cases or while handling a case which a malware threads use several anti-vm
and anti-debugging techniques. In addition, we have an advantage to automate the unpacking process
while handling similar malware cases.
Checking it using PEBear could be useful to collect first valuable information from malware:
[Figure 5]
As we can see, there isn’t any DLL (and functions) at IAT directly related to network communication,
crypto or something really interesting. Therefore, it’s a first indication that our sample might be packed.
18 | P a g e
https://exploitreversing.com
As this malware sample is a DLL, so we can learn about its exported functions because we’ll use them to
run the malware on the x64dbg/x32dbg:
[Figure 6]
Using pestudio tool we are able to figure out a meaning difference between raw size and virtual size on
.data section, which is also marked as “writable”. It’s an additional indication that our sample might be
packed, as we expected.
[Figure 7]
19 | P a g e
https://exploitreversing.com
Our malware sample has five exported functions and, without analyzing it on IDA Pro, it’s hard to guess
what they really do. However, we could try the first one named “Callrun” that, apparently, it’s a good bet.
Therefore, using x32dbg (this binary is 32-bit), we can run it by open the rundll32.exe and changing the
command line (File → Change Command Line) to "C:\Windows\SysWOW64\rundll32.exe"
C:\Users\Administrador\Desktop\sample_1.bin,#1.
After changing it, restart/reload the debugging session and set up the following breakpoints on few
classical functions (you can do it using CTRL+G or even the x32dbg command line interface) after you have
reached the entry point:
Run (F9) and, after hitting the first breakpoint (on VirtualAlloc( )), right click on EAX and pick “Follow In
Dump”. The second hit will be in the same region and on the third hit you can right-click on the EAX and
choose “Follow in Dump 2”. If you wanted (not necessary here), you might go to the “Dump 2” tab, select
the first four hex bytes, right click → Breakpoint → Memory, Write → Singleshot (it could be useful in
case where malware take a long time to write on that region). If everything goes right (I hope) you’ll see an
image similar to the following one:
[Figure 8]
20 | P a g e
https://exploitreversing.com
As you can see, first characters on ASCII representation are “M8Z”, which suggests it’s using aPLib
compression. However, if you dump this region, you’ll find out the unpacked binary at the same region.
Therefore, right-click on bytes from the “Dump 2” tab and choose “Follow in Memory Map” option:
[Figure 9]
On the gray-highlighted base address, right click it and choose “Dump Memory to File” to save the
memory region on the Desktop. For now, keep the debugger opened (you could need to fix an eventual
destroyed IAT), and open the dumped file on the XVI Editor. Press CTRL-F and look for the string “This
program” (pay attention: this search works in this case because the PE Header wasn’t destroyed):
[Figure 10]
21 | P a g e
https://exploitreversing.com
Once you found it, you should look for “MZ” characters on two or three lines before. Now, put the cursor
at byte before the “MZ” mark, go to Edit → Delete to cursor as shown in the next couple of figures:
[Figure 11]
[Figure 12]
Save it, open it on PEBear and go to Imports tab as shown below:
[Figure 13]
22 | P a g e
https://exploitreversing.com
As you can confirm, the IAT is perfect, there is one DLL related to network communication (WININET.dll)
and you also can see a slightly different Entry Point (EP). Therefore, it seems being our first stage unpacked
(you should always assume that might have further packed stages and to repeat the same analysis unless
you have an external information from other source or report). At this point, you can close the x32dbg
because we won’t need it anymore.
Now we have the unpacked binary, so let’s open it up in IDA Pro. There’re many ways to find encrypted
configuration, but certainly one of easier (and a bit inaccurate) is by looking for functions manipulating
Data or Unexplored areas of color bars on IDA Pro (actually, the unexplored are is much bigger than the
shown below):
[Figure 14]
Clicking on the start of the Unexplored area above, you’ll see the following code:
[Figure 15]
It’s quite interesting to notice that we see cross references to specific addresses in this region:
23 | P a g e
https://exploitreversing.com
From my experience analyzing malware, I already know that pbData is an important argument to a couple
of APIs from Microsoft Crypto APIs, so it’s an indicative that we are in the right track. Another pattern
found in many well-known malware sample is the structure key + encrypted data, so even I don’t have any
further indication about this case, I could suppose that “pbData” is some key (length of 8 bytes), though
sometimes it isn’t the final key because malicious threats use KDF (Key Derivation Functions) to generate a
definitive key from the provided password. Following our analysis, the “unk10004018” would be referring
to the potentially encrypted data.
Although we’re going well in the analysis, I mentioned previously this kind of “reverse thinking” is not
precise because we don’t actually know what the nature of the encrypted data that is stored at this
address location. It would be better to analyze the malware and, from important references and functions
that take these data, so we could have a better idea and context about data type involved here.
Following the cross-reference on pbData (X hotkey) we get to the subroutine sub_10001CB7. From this
point, we clearly see our data (pbData) is being pushed onto the stack as argument to subroutine
sub_10002131, as shown below:
[Figure 16]
Observing other arguments and based on my previous experience, I know that other arguments being
passed to subroutine sub_10002131 are also involved with Cryptography. Please, we should also see that
dwBytes (from esi register, which received the value of 0x2000) is being used as an argument to
subroutine sub_100011A4, which only performs buffer allocation to receive a content, as shown below:
24 | P a g e
https://exploitreversing.com
[Figure 17]
Analyzing the subroutine sub_10002131 (a bit long) in several parts, we have the following code:
[Figure 18]
25 | P a g e
https://exploitreversing.com
[Figure 19]
This block contains three APIs and we have to analyze them one by one:
a. CryptCreateHash( ):
This function creates a CSP hash object and only one of its arguments, Algid, is quite interesting
and it set to 8004h.
Searching on https://docs.microsoft.com/en-us/windows/win32/seccrypto/alg-id we are able to
learn that 0x8004h means CALG_SHA1, so this malware is manipulating SHA1 hash (160 bits / 20
bytes). The return of this function is a handle to the CSP hash object (saved into the phHash
argument, which was initially zeroed) that is going to be used in the next function.
26 | P a g e
https://exploitreversing.com
b. CryptHashData( ):
This function adds data (given by pbData argument) of an specific size (given by dwDataLen
argument) into the hash object returned by CryptCreateHash( ). Both arguments are the third and
fourth arguments of subroutine sub_10002131 (Figure 18). Following them, these arguments
comes from subroutine sub_10001CB7 (Figure 16), which shows that the size is 8 bytes and refers
to the possible key (pbData). At this point, CryptHashData( ) function is ingesting the possible key
(8 bytes) into the hash object, so generating a SHA1 hash. In other words, the malware’s code is
generating a hash output from an entry and the returned handle from CryptCreateHash( ) refers to
the CSP hash object, which holds this hashed data.
c. CryptDeriveKey( ):
This function generates session keys derived from a given seed data value and, according to its
definition, it guarantees that same sessions key will be generated using the same base data, so it’s
completely suitable for our case because we’re decrypting a configuration data. There are two
interesting arguments here: a. Algid is 0x6801 and dwFlags is 0x280011, and we need to discuss
about them.
The second argument (Algid == 0x6801) identifies the symmetric encryption algorithm and
according to the documentation (https://docs.microsoft.com/en-us/windows/win32/seccrypto/alg-
id), 0x6801 means CALG_RC4.
The fourth argument (dwFlags == 0x280011) deserves further details. According to documentation
(https://docs.microsoft.com/en-us/windows/win32/api/wincrypt/nf-wincrypt-cryptderivekey):
“The key size, representing the length of the key modulus in bits, is set with the upper 16 bits of this
parameter. Thus, if a 128-bit RC4 session key is to be generated, the value 0x00800000 is combined
with any other dwFlags predefined value with a bitwise-OR operation”. Thus, we know the RC4 key
has 40 bits (0x28), so it has 5 bytes. Another part from MSDN documentation tells that “The lower
16 bits of this parameter can be zero or you can specify one or more of the following flags by using
the bitwise-OR operator to combine them”. Taking the possible values from MSDN page and
searching for them on wincrypt.h file (ReacOS provides us a sample:
https://doxygen.reactos.org/d7/d4a/wincrypt_8h_source.html) we found that 0x11 means
CRYPT_NO_SALT + CRYPT_EXPORTABLE (both self-explaining).
Therefore, it’s really interesting to figure out that the malware’s author is using SHA1 as a KDF (Key
Derivation Function) to generate the final decryption key, which can be explained by the following
sequence:
27 | P a g e
https://exploitreversing.com
Of course, there’re much better KDF such as Bcrypt, Scrypt and Argon2 that might be used in real
applications, but we’re sure that malware’s author wasn’t concerned to aspects of using or not a
KDF resistant to FPGA, ASIC and GPU attacks.
[Figure 20]
d. CryptDecrypt( )
This function is responsible for decrypting encrypted data by using a provided handle to the key
(hKey). Other arguments provided are hHash (handle to the hash object resulting from
CryptCreateHash( )), Final (value equal to 1 because is the last and unique section being
decrypted), pbData (buffer containing the data to be decrypted) and pbwDataLen (pointer to a
DWORD that indicates the length of the pbData buffer that, in this case, is 0x2000).
Remember that pbwDataLen (0x2000) is the second argument of the function being analyzed (
sub_10002131(BYTE *rc4_encrypted_data, DWORD pdwDataLen, BYTE *pbData, DWORD
dwDataLen) ). Therefore, the final result (RC4 decrypted data) is saved into the pbData and the
respective size is saved into pdwDataLen argument.
Another interesting point is that the pbData here is NOT the 8-byte key, but the encrypted data
coming from unk_10004018 data reference that was transferred to dword_10006264 memory
defined array within the subroutine sub_10001214, which I’ve renamed its arguments (“N”
shortcut), as shown in the next figure:
28 | P a g e
https://exploitreversing.com
[Figure 21]
The remaining functions (CryptDestroyHash, CryptDestroyKey and CryptReleaseContext) have clear
meanings and we don’t need to explain them here.
Finally we have all necessary information that we are going to use in the next step to write a configuration
extractor and decryptor:
▪ initial key: C58B00157F8E9288 (after first 16 bytes of .data section)
▪ initial data address: 0x10004018
▪ data size: 0x2000
▪ hash algorithm: SHA1 (20 bytes)
▪ decryption algorithm: RC4
▪ RC4 key size: 5 bytes
Let’s proceed to the next section and write a C2 data configuration extractor and decryptor.
29 | P a g e
https://exploitreversing.com
[Figure 22]
Let’s try to explain line by line of the code, but not at the exact order of the Python 3 script:
a. Lines 1, 2, 3 and 4 perform necessary imports because code is manipulating a PE file and this
Python 3 script aims to decrypt data involved with RC4 and SHA1 algorithms, as well as it’s needed
to handle with transformations from binary to ascii, so binascii package is required too.
b. On lines 21, 22, 23 and 24 we start the main( ) definition and the code asks for user to enter the
filename of an unpacked Hancitor binary which the encrypted data is going to extracted from. The
extracted data is stored into the datasec variable as bytes.
c. The extract_data( ) (lines 6 to 13) doesn’t have anything new, but we should to highlight three
points: i.) our target section is the “.data”, where are stored the key and encrypted data (check
Figure 15 ); ii.) we are concerned to Unicode formatting and that’s the reason of being using
decode(encoding=’utf-8’); iii.) the code is stripping possible ‘0x00’ from section names (common
while handling Unicode names); iv.) we’re using PE properties to delimit where the .data section
starts and ends.
30 | P a g e
https://exploitreversing.com
d. Line 25 (datasec2 = datasec[16:]) defines datasec2 variable that contains key + all encrypted data
except the first 16 bytes, which is likely related to a campaign ID or something similar.
e. Line 26 (key = (datasec2[:8])) defines the key variable that, as we learned previously, is composed
by the first 8 bytes of datasec2.
e. Line 29 (true_key = hashed_key[:10]) collects only the first 10 hexadecimal (5 bytes) according to
learned from CryptDeriveKey( ) explanation on page 27.
g. The data_decryptor( ) (lines 15 to 19) is quite simple and basically decrypts the encrypted data
through RC4 algorithm using the given key (true_key variable from line 29).
h. Finally, on line 32 (print(c2_config.decode('utf-8'))) the result is sent to terminal. Once again, you
should note that before printing the decrypted data we need to decode it assuming to be handling
with a possible Unicode character set.
Executing this Python 3 script against our unpacked Hancitor binary sample we have:
[Figure 23]
That’s great! We managed to extract and decrypt the Hancitor C2 configuration from the unpacked
Hancitor binary. The advantage of writing a decryptor script is that we can use it against all Hancitors
sample that follow the same binary pattern.
To confirm our script, let’s look for another Hancitor sample, unpacking it, trying to extract and decrypt its
C2 data configuration. Malware Bazaar offers an endless number of Hancitor samples and we can use
Malwoverview to list and download them, as shown on the next figure:
31 | P a g e
https://exploitreversing.com
[Figure 24]
Collecting many possible Hancitor hashes is pretty easy, as shown below (the listing has been truncated):
[Figure 25]
32 | P a g e
https://exploitreversing.com
Therefore, we can pick up one of these hashes, download the respective sample from Malware Bazaar,
unzip it (password: infected) and load it into PE Bear (Figures 26 and 27):
[Figure 26]
[Figure 27]
33 | P a g e
https://exploitreversing.com
After unpacking this sample using the same method and breakpoints (on x32dbg) as shown previously, we
should verify it on PE Bear as shown below:
[Figure 28]
Using our C2 data decryptor script against this second unpacked Hancitor we have:
[Figure 29]
As we expected, everything has worked well again and we have a good Hancitor script to extract and
decrypt its C2 configuration data.
There’re two other methods that could have used to extract the C2 data configuration from Hancitor:
Both approaches are great, but they are understood as “manual” methods and we would need to work on
one sample a time. Using a debugger is quite simple because it’s enough to set a breakpoint on
CryptDecrypt( ) and, once hit, execute until its exit point and check (Follow in Dump) its output
parameter (the 5th argument) on stack. Certainly, the reader knows how to perform these steps.
34 | P a g e
https://exploitreversing.com
To use CyberChef, we need to select the bytes of the key and export it using SHIFT+E (Edit → Export Data)
and copy this data into the Input area. Pick up From Hex recipe (because exported data are in hexadecimal
format) then the SHA1 recipe because we want a SHA1 hash from the Input, as shown in the Figure 30:
[Figure 30]
To the encrypted data (gathered from 0x10004018), repeat the same procedure by exporting it (SHIFT+E)
and copying it into the Input area. Drag RC4 recipe into Recipe area and pay attention to few points: a.)
the passphrase is composed by the first 10 hexadecimal digits (5 bytes) from SHA1 output (as we learned
from CryptDeriveKey( )); b.) the Passphrase is in hex format; c.) the Input format is also in hexadecimal
format. Finally, we got the same result of our Python 3 script as shown in the Figure 31 below:
[Figure 31]
35 | P a g e
https://exploitreversing.com
11. Conclusion
In this first article I showed how to extract and decrypt the C2 data configuration from Hancitor malware
by writing a Python 3 script. Additionally, I presented several concepts and foundations such as code
injection and unpacking that will be useful in next articles of this series. Anyway, maybe it’s quite relevant
to highlight few points here:
a. I preferred this simple malware sample due my final purpose that’s to help other professionals to
take their own steps on malware analysis and, to start a series of article, certainly it was quite
useful.
b. For now, I have hidden a lot reversing engineering details about this malware on propose because
the initial goal was focusing only on C2 data configuration extraction and decryption in this article.
c. Once again, and unlike what many beginners in reverse engineering might think about, data
extraction/decryption is not something new (not even close), and I’ve been doing it from many
years, so it’s a great topic to start with. Furthermore, getting C2 configuration is one of the main
goals while analyzing a malware ( few other ones are infection’s vector, persistence, evasion and
network communication, for example).
d. I chose Python 3 as script language because I think it’s easier to understand and most of security
researchers know well about it. No doubts, we could write programs in C or Golang and, eventually
we can use them in next articles.
e. We will study several samples and contexts in the next articles and review topics such as COM,
unpacking, code injection, C2 emulation, .NET reversing, anti-analysis techniques, API resolving,
string decryption, IDC/IDA Python, IDA AppCall and so on, so let’s take it one step at a time.
Don’t forget: this is a live document and I will update it soon I find mistakes and errors.
I have been working with reverse engineering for over a decade, I’d like having started this series
previously, but it wasn’t possible, unfortunately. Therefore, now my plan is to write a book about malware
analysis to contribute to the security community and continue this series. Let’s see what will happen.
Just in case you want to keep in touch, my public contact information follows below:
▪ Twitter: @ale_sp_brazil
▪ LinkedIn: https://www.linkedin.com/in/aleborges
▪ Blog: https://exploitreversing.com
Alexandre Borges
36 | P a g e