转载

科普向——使用BrainFuck语言编写简单网页

今天花了半个小时用BrainFuck语言编写了一个只有一行字的网页(同样的事情如果用C语言大概需要花5分钟,用PHP只需要20秒钟),大概没人比我更无聊了吧。地址是 http://qing.su/cgi-bin/brainfuck.cgi

BrainFuck是世界上最精致的图灵完备的计算机语言(其编译器仅有240bytes)。它由八个字符构成:<>+-.,[]分别代表了左右位移、增减变量、输出输入以及循环开闭。如此有限的字符库决定了其编写过程的繁琐和冗长、易读性极差,几乎无法成为真正生产使用的计算机语言。或许,偶尔编写一个BrainFuck程序烧一烧脑子是不错的选择。下面介绍一下用BrainFuck语言编写网页的方式。

开始编写网页之前,需要了解一下CGI编程规范。任何一种语言编写的程序都可以成为网页。如PHP, JSP之类的程序可以通过对应的脚本解释器转换为HTML标签格式,直接呈现在浏览器上供人们访问。而如果使用其他非主流语言,比如之前提到的C语言(参考 http://qing.su/article/93.html )或者正在使用的BrainFuck语言,则可以通过CGI的方式访问,让服务器将程序转化为HTML标签提供给客户端浏览器识别。按照CGI的要求,输出到浏览器上的程序需要首先提交头信息,比如,Content-type: text/html, 并且在头信息下部有一空行。因此,只要遵循这一规范,我们就可以用任何语言的程序编写网页。

首先,编写一个BrainFuck语言程序,如下。

++++++++++   * 变量第零位+10, 储存循环次数  [>+++++++++++   * 变量第一位+11, 10*11=110 == asc('n')  >++++++++++++>+++>++++++>+++++++>++++++++>+>++++   * 类似上一步,设置8个变量方便输出字符  <<<<<<<<-]   * 循环,每次循环第一位变量-1, 直至0  >>>>>---.   * 第五位变量-3, 输出10*7-3=67 == asc('C')  <<<<+.   * 第一位变量+1, 输出10*11+1=111 == asc('o')  -.   * 第一位变量-1, 110 'n'  >----.   * 第二位变量-4, 10*12-4=116 == asc('t')  <---------.   * 第一位变量-9, 101 'e'  +++++++++.>.>>>>>>+++++.  <<<<<<.+++++.<++.-----------.  >>>--.<++.<-----.<.>++++.----.  >>>>>>++.<<<<<<<+++.>.<+++++.-.  >>>>>>..<<+++++.--<+++++++.>>..  +++++++++.<<<.>>++++++++.--<++++.++++++++++++++++++.  ------------------>>--<<<.  >>>++.<<.----.>++++++.<<+.    * 继续之前的输出

这个程序做了两件事:1,向服务器输出Content-type: text/html/n/n. 2,向服务器输出需要显示在屏幕上的句子,HAPPY NEW YEAR!

编写完毕后,我们在服务器上将其编译为可执行程序。编译器为汇编源码(链接为:http://www.muppetlabs.com/~breadbox/software/tiny/bf.asm.txt),可以用nasm程序将其编译成可执行程序。新建文件bf.asm将源码保存在其中:

;; bf.asm: Copyright (C) 1999 Brian Raiter   ;; Licensed under the terms of the GNU General Public License, either  ;; version 2 or (at your option) any later version.  ;;  ;; To build:  ;;  nasm -f bin -o bf bf.asm && chmod +x bf  ;; To use:  ;;  bf < foo.b > foo && chmod +x foo    BITS 32    ;; This is the size of the data area supplied to compiled programs.    %define arraysize   30000    ;; For the compiler, the text segment is also the data segment. The  ;; memory image of the compiler is inside the code buffer, and is  ;; modified in place to become the memory image of the compiled  ;; program. The area of memory that is the data segment for compiled  ;; programs is not used by the compiler. The text and data segments of  ;; compiled programs are really only different areas in a single  ;; segment, from the system's point of view. Both the compiler and  ;; compiled programs load the entire file contents into a single  ;; memory segment which is both writeable and executable.    %define TEXTORG     0x45E9B000  %define DATAOFFSET  0x2000  %define DATAORG     (TEXTORG + DATAOFFSET)    ;; Here begins the file image.            org TEXTORG    ;; At the beginning of the text segment is the ELF header and the  ;; program header table, the latter consisting of a single entry. The  ;; two structures overlap for a space of eight bytes. Nearly all  ;; unused fields in the structures are used to hold bits of code.    ;; The beginning of the ELF header.            db  0x7F, "ELF"     ; ehdr.e_ident    ;; The top(s) of the main compiling loop. The loop jumps back to  ;; different positions, depending on how many bytes to copy into the  ;; code buffer. After doing that, esi is initialized to point to the  ;; epilog code chunk, a copy of edi (the pointer to the end of the  ;; code buffer) is saved in ebp, the high bytes of eax are reset to  ;; zero (via the exchange with ebx), and then the next character of  ;; input is retrieved.    emitputchar:    add esi, byte (putchar - decchar) - 4  emitgetchar:    lodsd  emit6bytes: movsd  emit2bytes: movsb  emit1byte:  movsb  compile:    lea esi, [byte ecx + epilog - filesize]          xchg    eax, ebx          cmp eax, 0x00030002     ; ehdr.e_type    (0x0002)                          ; ehdr.e_machine (0x0003)          mov ebp, edi        ; ehdr.e_version          jmp short getchar    ;; The entry point for the compiler (and compiled programs), and the  ;; location of the program header table.            dd  _start          ; ehdr.e_entry          dd  proghdr - $$        ; ehdr.e_phoff    ;; The last routine of the compiler, called when there is no more  ;; input. The epilog code chunk is copied into the code buffer. The  ;; text origin is popped off the stack into ecx, and subtracted from  ;; edi to determine the size of the compiled program. This value is  ;; stored in the program header table, and then is moved into edx.  ;; The program then jumps to the putchar routine, which sends the  ;; compiled program to stdout before falling through to the epilog  ;; routine and exiting.    eof:        movsd               ; ehdr.e_shoff          xchg    eax, ecx          pop ecx          sub edi, ecx        ; ehdr.e_flags          xchg    eax, edi          stosd          xchg    eax, edx          jmp short putchar       ; ehdr.e_ehsize    ;; 0x20 == the size of one program header table entry.            dw  0x20            ; ehdr.e_phentsize    ;; The beginning of the program header table. 1 == PT_LOAD, indicating  ;; that the segment is to be loaded into memory.    proghdr:    dd  1           ; ehdr.e_phnum & phdr.p_type                          ; ehdr.e_shentsize          dd  0           ; ehdr.e_shnum & phdr.p_offset                          ; ehdr.e_shstrndx    ;; (Note that the next four bytes, in addition to containing the first  ;; two instructions of the bracket routine, also comprise the memory  ;; address of the text origin.)            db  0           ; phdr.p_vaddr    ;; The bracket routine emits code for the "[" instruction. This  ;; instruction translates to a simple "jmp near", but the target of  ;; the jump will not be known until the matching "]" is seen. The  ;; routine thus outputs a random target, and pushes the location of  ;; the target in the code buffer onto the stack.    bracket:    mov al, 0xE9          inc ebp          push    ebp         ; phdr.p_paddr          stosd          jmp short emit1byte    ;; This is where the size of the executable file is stored in the  ;; program header table. The compiler updates this value just before  ;; it outputs the compiled program. This is the only field in the two  ;; headers that differs between the compiler and its compiled  ;; programs. (While the compiler is reading input, the first byte of  ;; this field is also used as an input buffer.)    filesize:   dd  compilersize        ; phdr.p_filesz    ;; The size of the program in memory. This entry creates an area of  ;; bytes, arraysize in size, all initialized to zero, starting at  ;; DATAORG.            dd  DATAOFFSET + arraysize  ; phdr.p_memsz    ;; The code chunk for the "." instruction. eax is set to 4 to invoke  ;; the write system call. ebx, the file handle to write to, is set to  ;; 1 for stdout. ecx points to the buffer containing the bytes to  ;; output, and edx equals the number of bytes to output. (Note that  ;; the first byte of the first instruction, which is also the least  ;; significant byte of the p_flags field, encodes to 0xB3. Having the  ;; 2-bit set marks the memory containing the compiler, and its  ;; compiled programs, as writeable.)    putchar:    mov bl, 1           ; phdr.p_flags          mov al, 4          int 0x80            ; phdr.p_align    ;; The epilog code chunk. After restoring the initialized registers,  ;; eax and ebx are both zero. eax is incremented to 1, so as to invoke  ;; the exit system call. ebx specifies the process's return value.    epilog:     popa          inc eax          int 0x80    ;; The code chunks for the ">", "<", "+", and "-" instructions.    incptr:     inc ecx  decptr:     dec ecx  incchar:    inc byte [ecx]  decchar:    dec byte [ecx]    ;; The main loop of the compiler continues here, by obtaining the next  ;; character of input. This is also the code chunk for the ","  ;; instruction. eax is set to 3 to invoke the read system call. ebx,  ;; the file handle to read from, is set to 0 for stdin. ecx points to  ;; a buffer to receive the bytes that are read, and edx equals the  ;; number of bytes to read.    getchar:    mov al, 3          xor ebx, ebx          int 0x80    ;; If eax is zero or negative, then there is no more input, and the  ;; compiler proceeds to the eof routine.            or  eax, eax          jle eof    ;; Otherwise, esi is advanced four bytes (from the epilog code chunk  ;; to the incptr code chunk), and the character read from the input is  ;; stored in al, with the high bytes of eax reset to zero.            lodsd          mov eax, [ecx]    ;; The compiler compares the input character with ">" and "<". esi is  ;; advanced to the next code chunk with each failed test.            cmp al, '>'          jz  emit1byte          inc esi          cmp al, '<'          jz  emit1byte          inc esi    ;; The next four tests check for the characters "+", ",", "-", and  ;; ".", respectively. These four characters are contiguous in ASCII,  ;; and so are tested for by doing successive decrements of eax.            sub al, '+'          jz  emit2bytes          dec eax          jz  emitgetchar          inc esi          inc esi          dec eax          jz  emit2bytes          dec eax          jz  emitputchar    ;; The remaining instructions, "[" and "]", have special routines for  ;; emitting the proper code. (Note that the jump back to the main loop  ;; is at the edge of the short-jump range. Routines below here  ;; therefore use this jump as a relay to return to the main loop;  ;; however, in order to use it correctly, the routines must be sure  ;; that the zero flag is cleared at the time.)            cmp al, '[' - '.'          jz  bracket          cmp al, ']' - '.'  relay:      jnz compile    ;; The endbracket routine emits code for the "]" instruction, as well  ;; as completing the code for the matching "[". The compiler first  ;; emits "cmp dh, [ecx]" and the first two bytes of a "jnz near". The  ;; location of the missing target in the code for the "[" instruction  ;; is then retrieved from the stack, the correct target value is  ;; computed and stored, and then the current instruction's jmp target  ;; is computed and emitted.    endbracket: mov eax, 0x850F313A          stosd          lea esi, [byte edi - 8]          pop eax          sub esi, eax          mov [eax], esi          sub eax, edi          stosd          jmp short relay    ;; This is the entry point, for both the compiler and its compiled  ;; programs. The shared initialization code sets ecx to the beginning  ;; of the array that is the compiled program's data area, and edx to  ;; one. (This also clears the zero flag for the relay jump below.) The  ;; registers are then saved on the stack, to be restored at the end.    _start:          mov ecx, DATAORG          inc edx          pusha    ;; At this point, the compiler and its compiled programs diverge.  ;; Although every compiled program includes all the code in this file  ;; above this point, only the three instructions directly above are  ;; actually used by both. This point is where the compiler begins  ;; storing the generated code, so only the compiler sees the  ;; instructions below. This routine first modifies ecx to contain  ;; TEXTORG, which is stored on the stack, and then offsets it to point  ;; to filesize. edi is set equal to codebuf, and then the compiler  ;; enters the main loop.    codebuf:          mov ch, (TEXTORG >> 8) & 0xFF          push    ecx          mov cl, filesize - $$          lea edi, [byte ecx + codebuf - filesize]          jmp short relay    ;; Here ends the file image.    compilersize    equ $ - $$

执行:

yum install nasm -y  nasm -f bin -o bf_compiler bf.asm  chmod +x ./bf_compiler

将上面的BrainFuck程序保存在brainfuck.bf文件,在SSH中执行:

./bf_compiler < brainfuck.bf > brainfuck.cgi  chmod +x ./brainfuck.cgi  ./brainfuck.cgi

如果这时能够看到我们之前说的那两行输出,说明网页编写成功。然后,将这个文件复制到cgi-bin下面,通过浏览器就可以访问了。如果出现HTTP 500错误,请查看Apache日志。

毕竟是一个比较麻烦的事情,我就不再继续用BrainFuck做更多功能的网页了。大家有什么问题可以在下面留言问我。

原文  http://qing.su/article/119.html
正文到此结束
Loading...