Reposted

Published August 9, 2015

[JVM] Template Interpreter: How Is Assembly Generated from Bytecode?

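1. Background

This post covers only the JVM's template interpreter, and one question about it:

how is bytecode turned into assembly, given an opcode and an addressing mode?

The bytecode and assembly used in the examples come from the previous post, "Pass by Value or by Reference?" (按值传递还是按引用？).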

2. Addressing modes

This post does not cover addressing modes in depth. We focus on the instruction format of the Intel 64 and IA-32 architectures:

[Figure: Intel 64 and IA-32 instruction format]

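A brief overview follows; see the Intel manuals for the details:

– Prefixes: modify the opcode, giving it semantics such as lock and repeat.

– REX Prefix:

  – Specify GPRs and SSE registers.

  – Specify 64-bit operand size.

  – Specify extended control registers.

– Opcode: the operation code, e.g. mov, push.

– ModR/M: addressing-related; see the manual for details.

– SIB: combined with ModR/M to specify the addressing form.

– Displacement: used together with ModR/M and SIB to specify the address.

– Immediate: an immediate value.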

If Opcode, ModR/M, SIB, disp and imm above are still abstract, one line of assembly gives a feel for them:

mov %eax,-0x18(%rcx,%rbx,4)

If that line is still unclear, read it alongside this passage from the Intel manual:

– Base + (Index ∗ Scale) + Displacement: Using all the addressing components together allows efficient indexing of a two-dimensional array when the elements of the array are 2, 4, or 8 bytes in size.
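To make the Base + (Index ∗ Scale) + Displacement formula concrete, here is a minimal sketch of the arithmetic. The function name `effective_address` and the register values (rcx = 0x1000, rbx = 3) are assumptions chosen for illustration; they do not come from the JVM sources.

```cpp
#include <cstdint>

// Hypothetical helper (not part of HotSpot): computes
// base + index * scale + disp, the address a memory operand refers to.
constexpr std::uint64_t effective_address(std::uint64_t base, std::uint64_t index,
                                          unsigned scale, std::int32_t disp) {
    // The displacement is sign-extended to 64 bits before it is added.
    return base + index * scale +
           static_cast<std::uint64_t>(static_cast<std::int64_t>(disp));
}

// Operand -0x18(%rcx,%rbx,4) with the assumed values rcx = 0x1000, rbx = 3:
// 0x1000 + 3 * 4 - 0x18 = 0xFF4
static_assert(effective_address(0x1000, 3, 4, -0x18) == 0xFF4,
              "worked example for -0x18(%rcx,%rbx,4)");
```

In AT&T syntax the operand is written disp(base,index,scale), so the four arguments map directly onto the four parts of the operand.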

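3. Legal values (64-bit)

Note the legal values of these four components, as listed in the Intel manual (a scale factor of 1, meaning no scaling, is the implicit default):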

• Displacement — An 8-bit, 16-bit, or 32-bit value.

• Base — The value in a 64-bit general-purpose register.

• Index — The value in a 64-bit general-purpose register.

• Scale factor — A value of 2, 4, or 8 that is multiplied by the index value.

4. ModR/M (32-bit addressing)

The ModR/M byte is used later on, so the table of 32-bit addressing forms is reproduced here:

[Table: 32-bit addressing forms with the ModR/M byte]

The notes to the table follow. The first one is used in our example, so keep it in mind:

The [--][--] nomenclature means a SIB follows the ModR/M byte.
The disp32 nomenclature denotes a 32-bit displacement that follows the ModR/M byte (or the SIB byte if one is present) and that is added to the index.
The disp8 nomenclature denotes an 8-bit
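The ModR/M byte packs three fields: mod (bits 7..6), reg (bits 5..3) and r/m (bits 2..0). The sketch below splits a byte into these fields and checks the condition under which a SIB byte follows. The names `ModRM`, `decode_modrm` and `sib_follows` are hypothetical and used only for illustration.

```cpp
#include <cstdint>

// Hypothetical container for the three ModR/M fields.
struct ModRM {
    unsigned mod;  // bits 7..6: addressing form (11b = register direct)
    unsigned reg;  // bits 5..3: register operand or opcode extension
    unsigned rm;   // bits 2..0: register or memory operand selector
};

// Split a ModR/M byte into its fields.
constexpr ModRM decode_modrm(std::uint8_t b) {
    return ModRM{ (b >> 6) & 0x3u, (b >> 3) & 0x7u, b & 0x7u };
}

// With 32-bit or 64-bit addressing, r/m == 100b combined with mod != 11b
// is the [--][--] form: a SIB byte follows the ModR/M byte.
constexpr bool sib_follows(ModRM m) {
    return m.mod != 3 && m.rm == 4;
}

static_assert(decode_modrm(0x04).mod == 0, "0x04: mod = 00");
static_assert(decode_modrm(0x04).reg == 0, "0x04: reg = 000 (EAX)");
static_assert(decode_modrm(0x04).rm == 4, "0x04: r/m = 100");
static_assert(sib_follows(decode_modrm(0x04)), "0x04 is followed by a SIB byte");
```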

5. SIB (32-bit addressing)

Likewise, since the ModR/M byte is used, the SIB byte may be needed as well:

[Table: 32-bit addressing forms with the SIB byte]
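The SIB byte has the same 2-3-3 bit layout: scale (bits 7..6), index (bits 5..3) and base (bits 2..0). As a rough sketch, with the hypothetical names `SIB` and `decode_sib`, it can be taken apart like this:

```cpp
#include <cstdint>

// Hypothetical container for the decoded SIB fields.
struct SIB {
    unsigned scale;  // scale factor 1, 2, 4 or 8 (stored in the byte as log2)
    unsigned index;  // bits 5..3: index register number (100b = no index)
    unsigned base;   // bits 2..0: base register number
};

// Split a SIB byte into its fields; the two ss bits hold log2 of the factor.
constexpr SIB decode_sib(std::uint8_t b) {
    return SIB{ 1u << ((b >> 6) & 0x3), (b >> 3) & 0x7u, b & 0x7u };
}

// 0x19 = 00 011 001: factor 1, index register 3 (EBX), base register 1 (ECX)
static_assert(decode_sib(0x19).scale == 1, "0x19: scale factor 1");
static_assert(decode_sib(0x19).index == 3, "0x19: index = EBX");
static_assert(decode_sib(0x19).base == 1, "0x19: base = ECX");
```

Register numbers follow the hardware encoding: 0 = EAX, 1 = ECX, 2 = EDX, 3 = EBX.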

6. Example

6.1 Preparation

Let's look at a real example.

The code below emits the machine code for a mov:

void Assembler::movl(Address dst, Register src) {
  InstructionMark im(this);
  prefix(dst, src);                // legacy and REX prefixes, if any
  emit_int8((unsigned char)0x89);  // opcode byte
  emit_operand(src, dst);          // ModR/M, SIB and displacement bytes
}

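prefix(dst, src) handles the prefixes and the REX prefix; we ignore it here.

emit_int8((unsigned char) 0x89), as the name suggests, emits a single byte. But what does the byte 0x89 stand for?

Before answering that, there is one more statement, emit_operand(src, dst). It is a long piece of code, so we only skim it: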

void Assembler::emit_operand(Register reg, Register base, Register index,
                             Address::ScaleFactor scale, int disp,
                             RelocationHolder const& rspec,
                             int rip_relative_correction) {
  relocInfo::relocType rtype = (relocInfo::relocType) rspec.type();

  // Encode the registers as needed in the fields they are used in
  int regenc = encode(reg) << 3;
  int indexenc = index->is_valid() ? encode(index) << 3 : 0;
  int baseenc = base->is_valid() ? encode(base) : 0;

  if (base->is_valid()) {
    if (index->is_valid()) {
      assert(scale != Address::no_scale, "inconsistent address");
      // [base + index*scale + disp]
      if (disp == 0 && rtype == relocInfo::none &&
          base != rbp LP64_ONLY(&& base != r13)) {
        // [base + index*scale]
        // [00 reg 100][ss index base]
        /**************************
         * Key point: focus here
         **************************/
        assert(index != rsp, "illegal addressing mode");
        emit_int8(0x04 | regenc);
        emit_int8(scale << 6 | indexenc | baseenc);
      } else if (is8bit(disp) && rtype == relocInfo::none) {
        // ...
      } else {
        // [base + index*scale + disp32]
        // [10 reg 100][ss index base] disp32
        assert(index != rsp, "illegal addressing mode");
        emit_int8(0x84 | regenc);
        emit_int8(scale << 6 | indexenc | baseenc);
        emit_data(disp, rspec, disp32_operand);
      }
    } else if (base == rsp LP64_ONLY(|| base == r12)) {
      // ...
    } else {
      // ...
    }
  } else {
    // ...
  }
}

The point to focus on is marked in the code above. We extract it and combine it with the emit_int8((unsigned char) 0x89) call from earlier:

emit_int8((unsigned char) 0x89);
emit_int8(0x04 | regenc);
emit_int8(scale << 6 | indexenc | baseenc);

In the end these calls produce the following assembly (on a 64-bit machine):

mov    %eax,(%rcx,%rbx,1)

So here is the question:

how is this line of assembly derived?

6.2 Derivation

Suppose we are given the following values:

regenc = 0x0, scale << 6 | indexenc | baseenc = 25 (0x19)

Simple arithmetic then gives:

emit_int8((unsigned char) 0x89);             // yields 0x89
emit_int8(0x04 | regenc);                    // yields 0x04
emit_int8(scale << 6 | indexenc | baseenc);  // yields 0x19

Put together, that is three bytes:

0x89 0x04 0x19
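The same arithmetic can be reproduced outside the JVM. The sketch below is a simplified stand-in for the three emit_int8 calls, not HotSpot code: `encode_mov_store` is a hypothetical name, registers are passed as raw hardware numbers (eax = 0, rcx = 1, rbx = 3), and the scale is assumed to be log2 of the scale factor, as the `scale << 6` expression implies. It covers only the zero-displacement [base + index*scale] case with registers numbered below 8.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for the three emit_int8 calls in the
// [base + index*scale] branch; all register numbers must be below 8.
std::array<std::uint8_t, 3> encode_mov_store(unsigned reg, unsigned base,
                                             unsigned index, unsigned scale_log2) {
    const unsigned regenc   = reg << 3;    // goes into ModR/M bits 5..3
    const unsigned indexenc = index << 3;  // goes into SIB bits 5..3
    const unsigned baseenc  = base;        // goes into SIB bits 2..0
    return {{
        static_cast<std::uint8_t>(0x89),                                 // opcode
        static_cast<std::uint8_t>(0x04 | regenc),                        // ModR/M: [00 reg 100]
        static_cast<std::uint8_t>(scale_log2 << 6 | indexenc | baseenc)  // SIB: [ss index base]
    }};
}
```

Under these assumptions, encode_mov_store(0, 1, 3, 0) returns the bytes 0x89 0x04 0x19 derived above.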

1. What does 0x89 correspond to?

[Table: MOV opcode encodings]

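The table shows that the 64-bit form of the instruction is led by a REX.W prefix. In our example REX.W happens to be 0: movl stores a 32-bit register, and eax, rcx and rbx are not extended registers, so no REX prefix is needed.

The entry that matters is 89 /r:

MOV r/m32,r32 // without REX.W: store a 32-bit register into a register or memory operand
MOV r/m64,r64 // with REX.W: the same, with a 64-bit operand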
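For reference, a REX prefix is a single byte of the form 0100WRXB. The sketch below builds one from its four flag bits; `rex_prefix` is a hypothetical helper for illustration and is not how HotSpot's prefix() is written.

```cpp
#include <cstdint>

// Hypothetical helper: builds a REX prefix byte, layout 0100 W R X B.
//   w: 64-bit operand size
//   r: extends the ModR/M reg field
//   x: extends the SIB index field
//   b: extends the ModR/M r/m field or the SIB base field
constexpr std::uint8_t rex_prefix(bool w, bool r, bool x, bool b) {
    return static_cast<std::uint8_t>(0x40 | (w << 3) | (r << 2) | (x << 1) | (b << 0));
}

// REX.W alone, as a 64-bit "mov %rax,(%rcx,%rbx,1)" would need: 0x48.
static_assert(rex_prefix(true, false, false, false) == 0x48, "REX.W");
// All four bits clear gives 0x40, which carries no information here;
// an assembler can simply omit the prefix in that case.
static_assert(rex_prefix(false, false, false, false) == 0x40, "empty REX");
```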

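2. What does 0x04 stand for?

Now we need the ModR/M and SIB tables from above.

Looking up the second byte, 0x04, in the ModR/M table shows that the reg field selects EAX, the source operand, and that the addressing form is [--][--], which means: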

The [--][--] nomenclature means a SIB follows the ModR/M byte.

3. What does 0x19 stand for?

Continuing with the SIB table, the byte 0x19 corresponds to:

base = ECX
scaled index = [EBX] (scale factor 1)

4. The assembly:

// 32-bit addressing
mov %eax,(%ecx,%ebx,1)

// 64-bit addressing
mov %eax,(%rcx,%rbx,1)
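The table lookups in steps 1 to 3 can also be done mechanically. The following is a minimal sketch of a decoder for exactly this instruction shape, assuming 64-bit addressing and no REX prefix. `decode_mov_store` is a hypothetical name, and anything outside the narrow case handled here is reported as "unsupported".

```cpp
#include <cstdint>
#include <string>

// Hypothetical decoder: handles only opcode 0x89 with mod = 00 and
// r/m = 100 (SIB follows), no displacement and no REX prefix.
std::string decode_mov_store(std::uint8_t opcode, std::uint8_t modrm, std::uint8_t sib) {
    static const char* const reg32[8] = {"eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"};
    static const char* const reg64[8] = {"rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"};

    const unsigned mod = modrm >> 6, reg = (modrm >> 3) & 7, rm = modrm & 7;
    const unsigned ss = sib >> 6, index = (sib >> 3) & 7, base = sib & 7;

    // index = 100b means "no index" and base = 101b with mod = 00 means
    // "disp32, no base"; neither case is covered by this sketch.
    if (opcode != 0x89 || mod != 0 || rm != 4 || index == 4 || base == 5) {
        return "unsupported";
    }
    // AT&T syntax: mov %src,(%base,%index,scale)
    return std::string("mov %") + reg32[reg] + ",(%" + reg64[base] + ",%" +
           reg64[index] + "," + std::to_string(1u << ss) + ")";
}
```

Calling decode_mov_store(0x89, 0x04, 0x19) yields the string mov %eax,(%rcx,%rbx,1), matching the 64-bit result.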

7. Conclusion

This post briefly examined

how bytecode is turned into assembly, given an opcode and an addressing mode.

The end.
