Saturday, November 26, 2011

Advanced DLL Injection

It has been a while since my last article. Special thanks to those who decided to stay with me despite the long break and welcome to new readers!

In this article I am going to cover a subject that may seem trivial: DLL injection. For some reason, most of the tutorials on the web cover the topic only briefly, mostly limiting themselves to invoking the LoadLibraryA/W Windows API function in the address space of another process. While this is not bad at all, it is the least flexible solution, because all the logic MUST be hardcoded in the DLL we want to inject. Alternatively, we may incorporate all the configuration management (loading config files, parsing them, etc.) into our DLL. This is better, but still fills the DLL with code which is only going to run once.

Let us try another approach. What we are going to do is write a loader (an executable that will inject our DLL into another process) and a small DLL, which will be injected. For simplicity, the loader will also create the target process. Being a Linux user, I used Flat Assembler and mingw32 for this task, but you may adjust the code for whatever environment you prefer.

A short remark for nerds before we start. The code in this article does not contain any security checks (e.g. checking the correctness of the value returned by a specific function) unless they are needed as an example. If you decide to try this code, you'll be doing so at your own risk.

So, let the fun begin.


Creation of target process

Let's assume that the loader has already passed the phase of loading and parsing configuration files and is ready to start the actual job.

Windows provides us with all the tools we need to start a process. There is more than one way of doing that, but let us pick the simplest and use the CreateProcess API function. Its declaration looks quite frightening, but we'll make it as easy as possible:

   BOOL WINAPI CreateProcess(
      __in_opt    LPCTSTR lpApplicationName,
      __inout_opt LPTSTR lpCommandLine,
      __in_opt    LPSECURITY_ATTRIBUTES lpProcessAttributes,
      __in_opt    LPSECURITY_ATTRIBUTES lpThreadAttributes,
      __in        BOOL bInheritHandles,
      __in        DWORD dwCreationFlags,
      __in_opt    LPVOID lpEnvironment,
      __in_opt    LPCTSTR lpCurrentDirectory,
      __in        LPSTARTUPINFO lpStartupInfo,
      __out       LPPROCESS_INFORMATION lpProcessInformation
   );

We only have to specify half of the parameters when calling this function and can set all the rest to NULL. This function has two variants, CreateProcessA and CreateProcessW, the ANSI and Unicode versions respectively. We are going to stick with ANSI all the way, so our code would look like this (because "CreateProcess" is a macro rather than a function name, and it expands to the W version when UNICODE is defined, we should explicitly specify the A version):

CreateProcessA(nameOfTheFile, NULL, NULL, NULL, FALSE, CREATE_SUSPENDED, NULL, NULL, &startupInfo, &processInformation);

Don't forget to zero-initialize startupInfo and set its cb field to (DWORD)sizeof(STARTUPINFO) before the call, otherwise it would not work.

If the function succeeds, we get all the information about the process (handles and IDs) in the processInformation structure, which has the following definition:

typedef struct _PROCESS_INFORMATION
{
   HANDLE hProcess;    //Handle to the process
   HANDLE hThread;     //Handle to the main thread of the process
   DWORD  dwProcessId; //ID of the new process
   DWORD  dwThreadId;  //ID of the main thread of the process
}PROCESS_INFORMATION, *LPPROCESS_INFORMATION;

By now, the process has been created, but it is suspended. This means that it has not started its execution yet and will not until we call ResumeThread(processInformation.hThread), telling the operating system to resume the main thread of the process. Note that ResumeThread takes the thread handle, not the thread ID. This is going to be the last action performed by our loader.


Lancet

One may call it a shellcode, but it has nothing to do with a viral payload or any other malicious intent (unless someone would say that breaking into the address space of another process is malicious by definition). It is the code that we are going to inject into the target process. Theoretically, it may be written in any language, as long as the result is position independent and compiled into native instructions (in our case x86 instructions), but I prefer to do such things in Assembly language.

It is always a good idea to think of what your code is intended to do before writing a single line of it. In this case it is a golden idea. The code needs to be small, preferably fast, and stable, as it is a bit of a headache to debug once it has been injected.

There are two basic tasks that you would want to assign to this code: 
  • Load our DLL
  • Call the initialization procedure exported by our DLL
and one unavoidable condition - it has to be a function declared as ThreadProc callback, due to the fact that we are going to use the CreateRemoteThread function in order to launch it. The prototype of a ThreadProc callback function looks like this:

DWORD WINAPI ThreadProc( __in LPVOID lpParameter);

which means that it has to return a value of type DWORD (a 32-bit unsigned integer, defined as unsigned long). It accepts one parameter, which may either be an actual value (but you have to cast it to the LPVOID type) or a pointer to an array of parameters. One more thing about this function (last but not least!): it is a stdcall function, as the WINAPI macro is defined as __stdcall. This means that our function has to take care of cleaning the stack before it returns. In our case it is quite easy: simply use ret 0x04 (assuming that the size of LPVOID is 4 bytes).

Another important thing to mention - you will obviously need to know how many bytes your function occupies in order to correctly allocate memory in the address space of the target process and move your code there. In addition to one block of executable memory for our function, you will also need to allocate one block for data: the configuration settings to be passed to the injected DLL. The address of that data block is then simply passed as the argument to our ThreadProc.
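
A side note on the data block: a pointer that is valid in the loader means nothing in the target process, so the block is easiest to handle when it is flat and refers to its own contents by offsets. The following is only a sketch of one possible layout. The names config_header, pack_config and config_log_path, as well as the fields themselves, are made up for illustration and are not part of the original loader.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: a fixed header followed by the string bytes.
   Offsets are relative to the start of the block, so the block stays
   valid at whatever address it ends up at in the target process. */
struct config_header {
    uint32_t total_size;   /* size of the whole block in bytes */
    uint32_t log_path_off; /* offset of a NUL-terminated string */
    uint32_t mode;         /* example scalar setting */
};

/* Packs the header and one string into buf.
   Returns the number of bytes used, or 0 if buf is too small. */
static size_t pack_config(void *buf, size_t cap,
                          const char *log_path, uint32_t mode)
{
    size_t len = strlen(log_path) + 1;   /* include the terminator */
    size_t total = sizeof(struct config_header) + len;
    struct config_header h;

    if (total > cap)
        return 0;
    h.total_size = (uint32_t)total;
    h.log_path_off = (uint32_t)sizeof(struct config_header);
    h.mode = mode;
    memcpy(buf, &h, sizeof h);
    memcpy((unsigned char *)buf + sizeof h, log_path, len);
    return total;
}

/* What the receiving side would do: turn the offset back into a pointer
   relative to wherever the block currently lives. */
static const char *config_log_path(const void *block)
{
    struct config_header h;

    memcpy(&h, block, sizeof h);
    return (const char *)block + h.log_path_off;
}
```

The value returned by pack_config is also the size to request when allocating the data block, and the same buffer can be copied as is.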

The skeleton of the function would look like this:

lancet:
   push  ebp
   mov   ebp, esp
   sub   esp, as_much_space_as_you_need_for_variables
   push  registers_you_are_planning_to_use

   ;function body

   pop   registers_you_used
   mov   esp, ebp
   pop   ebp
   ret   0x04
lancet_size = $-lancet

The last line gives us the exact size of the function in bytes. The following is the source file template:


format MS COFF ;as we are going to link this file with our loader
public lancet as '_lancet'
section '.text' readable executable
lancet:
   ;our function goes here
   ;followed by data
   loadLibraryA  db 'LoadLibraryA',0
   init          db 'name_of_the_initialization_function',0
   ourDll        db 'name_of_our_dll',0
   kernel32      db 'kernel32.dll',0
lancet_size = $-lancet
public lsize as '_lancet_size'
section '.data' readable writeable
   lsize         dd lancet_size




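So, what are we going to insert into the "function body"? First of all, as our code, once it is injected, has no idea of where in memory it is, we should save our "base address" and calculate all the offsets relative to that address. This is done in a simple manner. We call the very next instruction, which pushes its address onto the stack as the return address, and pop that address into our local variable. Subtracting the distance between that instruction and the start of the function gives the address of the function itself.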


   call @f
  @@:
   pop  dword [ebp-4]
   sub  dword [ebp-4], @b-lancet


That's it. Now the variable at [ebp-4] contains our "base address". Each time we want to call another function or access our data (strings with names, remember?) we should do the following:


   mov  ebx, [ebp-4]
   add  ebx, ourDll-lancet
   push ebx
   mov  ebx, [ebp-8] ;assume that we stored the address of LoadLibraryA at [ebp-8]
   call dword ebx


The code above is equivalent to LoadLibraryA("name_of_our_dll").


Now about the execution itself. Although we now know where we are, we have no idea what the address of LoadLibraryA is. There are at least two ways to get that address nicely. The first has been described in my "Stealth Import of Windows API" article. The second is also interesting - the PEB. Yes, we are going to access the Process Environment Block, find the LDR_MODULE structure which refers to KERNEL32.DLL and get its base address (which is also a handle to the library). Some may say that this way is not reliable, not stable and even dangerous, but I will say that statements like these are not serious. We are not going to change anything in those structures. We are only going to parse them.


How do we find the PEB? This is quite simple. In a 32-bit process, a pointer to it is stored at [FS:0x30]. Once we have it, we are on our way to the PEB_LDR_DATA structure, a pointer to which is stored at PEB+0x0C. In order to parse the PEB_LDR_DATA structure, we should declare the following in our Assembly code:


struc list_entry
{
   .flink dd ?   ;pointer to next list_entry structure
   .blink dd ?   ;pointer to previous list_entry structure
}


struc peb_ldr_data
{
   .length      dd ?
   .initialized db ?
                db ?
                db ?
                db ?
   .ssHandle    dd ?
   .inLoadOrderModuleList list_entry ;we are going to use this list
   .inMemoryOrderModuleList list_entry
   .inInitializationOrderModuleList list_entry
}


struc ldr_module
{
   .inLoadOrderModuleList list_entry ;pointers to previous and next modules in list
   .inMemoryOrderModuleList list_entry
   .inInitializationOrderModuleList list_entry
   .baseAddress   dd ?           ;This is what we need!
   .entryPoint    dd ?
   .sizeOfImage   dd ?
   .fullDllName   unicode_string ;full path to the module file
   .baseDllName   unicode_string ;name of the module file
   .flags         dd ?
   .loadCount     dw ?
   .tlsIndex      dw ?
   .hashTable     list_entry
   .timeDateStamp dd ?
}
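
If you want to double-check the field offsets before hardcoding them, the declarations above can be mirrored in C and left to the compiler. This is only a sketch: the type names with the "32" suffix are made up, and pointers are deliberately stored as uint32_t so that the layout matches the 32-bit structures above no matter which target the check is compiled for. These structures are not part of the documented API, so treat the numbers as a description of the layout declared in this article.

```c
#include <stdint.h>
#include <stddef.h>

/* 32-bit mirrors of the assembly structures; pointers become uint32_t. */
struct list_entry32 {
    uint32_t flink;
    uint32_t blink;
};

struct unicode_string32 {
    uint16_t length;          /* in bytes */
    uint16_t maximum_length;  /* in bytes */
    uint32_t buffer;
};

struct peb_ldr_data32 {
    uint32_t length;
    uint8_t  initialized;
    uint8_t  pad[3];
    uint32_t ss_handle;
    struct list_entry32 in_load_order;
    struct list_entry32 in_memory_order;
    struct list_entry32 in_init_order;
};

struct ldr_module32 {
    struct list_entry32 in_load_order;
    struct list_entry32 in_memory_order;
    struct list_entry32 in_init_order;
    uint32_t base_address;
    uint32_t entry_point;
    uint32_t size_of_image;
    struct unicode_string32 full_dll_name;
    struct unicode_string32 base_dll_name;
    uint32_t flags;
    uint16_t load_count;
    uint16_t tls_index;
    struct list_entry32 hash_table;
    uint32_t time_date_stamp;
};

/* Offsets the parsing code cares about, computed by the compiler. */
enum {
    OFF_LDR_LOAD_ORDER_LIST = offsetof(struct peb_ldr_data32, in_load_order),
    OFF_MOD_BASE_ADDRESS    = offsetof(struct ldr_module32, base_address),
    OFF_MOD_BASE_DLL_NAME   = offsetof(struct ldr_module32, base_dll_name)
};
```

With this layout the load order list starts at offset 0x0C of peb_ldr_data, while baseAddress and baseDllName sit at offsets 0x18 and 0x2C of ldr_module.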


I leave the implementation of the module list parsing function up to you. You just have to keep in mind that the strings you are going to check are represented by the UNICODE_STRING structure (described in the article referenced above). Another thing to remember is that it is better to implement a case-insensitive string comparison function.
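
To illustrate the comparison, here is a small sketch in C rather than Assembly. The ustr structure is a stand-in for UNICODE_STRING and the function names are made up. The two points worth noticing are that the length field counts bytes, not characters, and that the buffer is not guaranteed to be zero terminated. Only ASCII letters are folded, which is enough for a name like "kernel32.dll".

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in for UNICODE_STRING: length is in BYTES and the buffer
   holds 16-bit characters without a guaranteed terminator. */
struct ustr {
    uint16_t length;
    uint16_t maximum_length;
    const uint16_t *buffer;
};

/* Folds ASCII upper-case letters to lower case, leaves the rest alone. */
static uint16_t ascii_lower(uint16_t c)
{
    return (c >= 'A' && c <= 'Z') ? (uint16_t)(c + ('a' - 'A')) : c;
}

/* Returns 1 if the counted 16-bit string equals the ASCII name,
   ignoring ASCII case, and 0 otherwise. */
static int ustr_equals_ascii_nocase(const struct ustr *u, const char *name)
{
    size_t n = u->length / 2;   /* number of 16-bit characters */
    size_t i;

    for (i = 0; i < n; i++) {
        if (name[i] == '\0')
            return 0;           /* name is shorter than the string */
        if (ascii_lower(u->buffer[i]) !=
            ascii_lower((unsigned char)name[i]))
            return 0;
    }
    return name[n] == '\0';     /* name must end exactly here */
}
```

The same logic translates to Assembly almost line by line: one loop, one range check per character.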


Once you find the LDR_MODULE whose baseDllName is "kernel32.dll", you have its handle (it is simply the value of the baseAddress field). You may use the _get_proc_address function from the same article (mentioned above) in order to get the address of the LoadLibraryA function. Having that address, you are ready to load your DLL (do the actual injection). Personal suggestion - do not put lots of code into the DllMain function.
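
The shape of the search can be sketched in C on a mock list. The detail that is easy to get wrong is the end condition: the list is circular and its head lives inside PEB_LDR_DATA, so the head is not a module record itself, and the walk stops when it comes back to it. Everything below (mock_module, list_append, find_module_base) is made up for illustration and uses plain C strings to keep it short.

```c
#include <stddef.h>
#include <string.h>

struct list_entry {
    struct list_entry *flink;   /* next entry */
    struct list_entry *blink;   /* previous entry */
};

/* Mock module record. The link field comes first, as in ldr_module,
   so a pointer to the link is also a pointer to the record. */
struct mock_module {
    struct list_entry links;
    void *base_address;
    const char *name;
};

/* Appends an entry at the tail of a circular list with a sentinel head. */
static void list_append(struct list_entry *head, struct list_entry *e)
{
    e->flink = head;
    e->blink = head->blink;
    head->blink->flink = e;
    head->blink = e;
}

/* Walks the list starting after the head and stops when the walk returns
   to the head. Returns the base address of the named module or NULL. */
static void *find_module_base(struct list_entry *head, const char *name)
{
    struct list_entry *e;

    for (e = head->flink; e != head; e = e->flink) {
        struct mock_module *m = (struct mock_module *)e;

        if (strcmp(m->name, name) == 0)
            return m->base_address;
    }
    return NULL;
}
```

In the real structures the name check would be the case-insensitive comparison against baseDllName, and the value returned would be the baseAddress field.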


LoadLibraryA returns a handle to the newly loaded DLL, which you can use in order to locate your initialization function (remember, it has to be exported by your DLL and should preferably use the stdcall convention). After you _get_proc_address of your initialization function, call it and pass the address of the data block as a parameter (it was passed to our lancet function as a parameter on the stack):


   push dword [ebp+8]  ;parameter passed to lancet is here
   call dword [ebp-12] ;assume that you stored the address of the initialization 
                       ;function here


That's it. Your code may now return. The DLL has been injected and initialized.


Injection

Somehow, we have missed the exciting process of injecting our lancet code. Don't worry, I have not forgotten about it.


As I have mentioned above, we have to allocate two blocks - for code and data. This can be done by calling the VirtualAllocEx function, which allows memory allocations in the address space of another process.


LPVOID WINAPI VirtualAllocEx(
   __in     HANDLE hProcess,
   __in_opt LPVOID lpAddress,
   __in     SIZE_T dwSize,
   __in     DWORD  flAllocationType,
   __in     DWORD  flProtect
);


Use MEM_COMMIT as flAllocationType, and PAGE_EXECUTE_READWRITE and PAGE_READWRITE as flProtect for the code and data blocks respectively. This function returns the address of the allocated block in the address space of the specified process, or NULL on failure.


The WriteProcessMemory API function is used to copy your code and data into the address space of the target process.


BOOL WINAPI WriteProcessMemory(
   __in  HANDLE  hProcess,
   __in  LPVOID  lpBaseAddress,
   __in  LPCVOID lpBuffer,
   __in  SIZE_T  nSize,
   __out SIZE_T  *lpNumberOfBytesWritten
);


Once you have copied both the data and the code, you will want to call your thread function. The most straightforward way to call a function which resides in the memory of another process is the CreateRemoteThread API.


HANDLE WINAPI CreateRemoteThread(
   __in  HANDLE hProcess, //the handle to our process
   __in  LPSECURITY_ATTRIBUTES lpThreadAttributes, //may be NULL
   __in  SIZE_T dwStackSize, //may be 0
   __in  LPTHREAD_START_ROUTINE lpStartAddress, //the address of our code block
   __in  LPVOID lpParameter, //the address of our data block
   __in  DWORD  dwCreationFlags, //may be 0
   __out LPDWORD lpThreadId  //may be NULL
);


This function returns a handle to the remote thread, which, in turn, may be passed to the WaitForSingleObject API function, so that we are notified when the thread function returns.

I decided not to cover the possibilities of what your DLL can do while attached to the target process and leave this completely up to you.




I hope this article was not too muddled and, maybe, even helpful.


Have fun coding and see you at the next post.