这是我对实现一个脚本语言的一点心得和体会。 该脚本语言的实现是使用java作为实现语言 该语言的学习是根据http://www.craftinginterpreters.com/scanning.html教程实现 知识前置 学习该知识需要你有一定的java基础知识,了解oop特性,以及学过编译原理。 lox是一门脚本语言,这意味着他的代码直接运行,而不是转换成汇编代码。 复制代码
让我们开始吧
public class Lox { public static void main(String[] args) throws IOException { if (args.length > 1) { System.out.println("Usage: jlox [script]"); System.exit(64); } else if (args.length == 1) { runFile(args[0]); } else { runPrompt(); } } } 复制代码
实际上,有两种方法可以运行某些代码。如果您从命令行启动jlox并为其提供文件路径,它将读取文件并执行该文件:
private static void runFile(String path) throws IOException { byte[] bytes = Files.readAllBytes(Paths.get(path)); run(new String(bytes, Charset.defaultCharset())); } 复制代码
如果你想不带任何参数去启动他,做到交互的话,
private static void runPrompt() throws IOException { InputStreamReader input = new InputStreamReader(System.in); BufferedReader reader = new BufferedReader(input); for (;;) { System.out.print("> "); run(reader.readLine()); } } 上述两种运行方法的实现,其实都是依赖于对run方法的包装 private static void run(String source) { Scanner scanner = new Scanner(source); List<Token> tokens = scanner.scanTokens(); // For now, just print the tokens. for (Token token : tokens) { System.out.println(token); } 复制代码
}
好了,到现在我们已经完成编译器工作的第一步,这只是开始,我们一起加油。
接下来是错误处理,当输入的tokens出现错误时,编译器如何识别,并报出一个错误。
static void error(int line, String message) { | report(line, "", message); } private static void report(int line, String where, String message) { System.err.println( "[line " + line + "] Error" + where + ": " + message); hadError = true; } 复制代码
我们继续在lox类中创建一个静态error函数,静态调用report函数,report函数传入line 参数,error发生位置,错误信息。 这是一个最低要求的错误显示函数。当出现bug时候他会通知我们
Error: Unexpected "," *somewhere* in your program. Good luck finding it! 复制代码
但是这样的函数,还是无法让我们满意,我们希望一个错误显示函数,能告知我们错误在哪个地方
Error: Unexpected "," in argument list. 15 | function(first, second,); ^-- Here. 复制代码
我们在low类中定义一个boolean变量hadError,
public class Lox { static boolean hadError = false; 复制代码
然后我们在runfile函数里
private static void runFile(String path) throws IOException { byte[] bytes = Files.readAllBytes(Paths.get(path)); run(new String(bytes, Charset.defaultCharset())); //增加一个判断,出现问题就退出 if (hadError) System.exit(65); } 复制代码
runPrompt函数
private static void runPrompt() throws IOException { InputStreamReader input = new InputStreamReader(System.in); BufferedReader reader = new BufferedReader(input); for (;;) { System.out.print("> "); run(reader.readLine()); hadError = false; } } 复制代码
接下来就是词素工作 关键字是语言文法分析中一件很重要的工作, 我们的词法分析使用的是词法比较方式,这种方式其实非常缓慢,但是为了我们学习起来足够简单,我们将采用这种方法。
enum TokenType { // Single-character tokens. //单字符串 LEFT_PAREN, RIGHT_PAREN, LEFT_BRACE, RIGHT_BRACE, COMMA, DOT, MINUS, PLUS, SEMICOLON, SLASH, STAR, //> ,< ,==,!= // One or two character tokens. BANG, BANG_EQUAL, EQUAL, EQUAL_EQUAL, GREATER, GREATER_EQUAL, LESS, LESS_EQUAL, //string number // Literals. IDENTIFIER, STRING, NUMBER, // 关键字 // Keywords. AND, CLASS, ELSE, FALSE, FUN, FOR, IF, NIL, OR, PRINT, RETURN, SUPER, THIS, TRUE, VAR, WHILE, EOF } 复制代码
我们再创建一个token类,将上面那个枚举装进来
class Token { final TokenType type; final String lexeme; final Object literal; final int line; Token(TokenType type, String lexeme, Object literal, int line) { this.type = type; this.lexeme = lexeme; this.literal = literal; this.line = line; } public String toString() { return type + " " + lexeme + " " + literal; } } 复制代码
虽然在词法扫描中,正则表达式很重要,但为了入门考虑 该教程不尝试实现正则表达式功能
接下来,我们要实现一个字符串的scanner类
import java.util.ArrayList; import java.util.HashMap; import java.util.List; import java.util.Map; import static com.craftinginterpreters.lox.TokenType.*; class Scanner { private final String source; private final List<Token> tokens = new ArrayList<>(); Scanner(String source) { this.source = source; } List<Token> scanTokens() { while (!isAtEnd()) { // We are at the beginning of the next lexeme. start = current; scanToken(); } tokens.add(new Token(EOF, "", null, line)); return tokens; } } 复制代码
这个类中我们实现scantokens函数,这个函数将不断的扫描token
然后,为了我们能够有效的定位字符的扫描,我们需要在类Scanner中添加
private int start = 0; private int current = 0; private int line = 1; 复制代码
建立一个函数,帮助我们判断是否扫描完成
private boolean isAtEnd() { return current >= source.length(); } 复制代码
接下来就是scanner中的核心函数,扫描功能的实现
private void scanToken() { char c = advance(); switch (c) { case '(': addToken(LEFT_PAREN); break; case ')': addToken(RIGHT_PAREN); break; case '{': addToken(LEFT_BRACE); break; case '}': addToken(RIGHT_BRACE); break; case ',': addToken(COMMA); break; case '.': addToken(DOT); break; case '-': addToken(MINUS); break; case '+': addToken(PLUS); break; case ';': addToken(SEMICOLON); break; case '*': addToken(STAR); break; case '!': addToken(match('=') ? BANG_EQUAL : BANG); break; case '=': addToken(match('=') ? EQUAL_EQUAL : EQUAL); break; case '<': addToken(match('=') ? LESS_EQUAL : LESS); break; case '>': addToken(match('=') ? GREATER_EQUAL : GREATER); break; case '/': if (match('/')) { // A comment goes until the end of the line. while (peek() != '/n' && !isAtEnd()) advance(); } else { addToken(SLASH); } break; case ' ': case '/r': case '/t': // Ignore whitespace. break; case '/n': line++; break; case '"': string(); break; case 'o': if (peek() == 'r') { addToken(OR); } break; default: if (isDigit(c)) { number(); } else if (isAlpha(c)) { identifier(); } else { Lox.error(line, "Unexpected character."); } break; } } 复制代码
然后我们需要实现几个辅助函数
private char advance() { current++; return source.charAt(current - 1); } private void addToken(TokenType type) { addToken(type, null); } private void addToken(TokenType type, Object literal) { String text = source.substring(start, current); tokens.add(new Token(type, text, literal, line)); } 复制代码
除了这些,我们还是需要考虑一些问题,如果用户传入一些计算机不能识别的字符,如#^@这些,我们的扫描器应该就要报错 为此,我们的tokenscan方法中,我们就需要
default: if (isDigit(c)) { number(); } else if (isAlpha(c)) { identifier(); } else { Lox.error(line, "Unexpected character."); } break; 复制代码
我们tokenscan函数中,不仅对字符进行了扫描,还对操作符进行了扫描
case '-': addToken(MINUS); break; case '+': addToken(PLUS); break; case ';': addToken(SEMICOLON); break; case '*': addToken(STAR); break; case '!': addToken(match('=') ? BANG_EQUAL : BANG); break; case '=': addToken(match('=') ? EQUAL_EQUAL : EQUAL); break; case '<': addToken(match('=') ? LESS_EQUAL : LESS); break; case '>': addToken(match('=') ? GREATER_EQUAL : GREATER); break; 复制代码
我们在tokenscan函数后面添加一个 match函数
private boolean match(char expected) { if (isAtEnd()) return false; if (source.charAt(current) != expected) return false; current++; return true; } 复制代码
还有
private boolean isDigit(char c) { return c >= '0' && c <= '9'; } 复制代码
以及判断数字的函数
private void number() { while (isDigit(peek())) advance(); // Look for a fractional part. if (peek() == '.' && isDigit(peekNext())) { // Consume the "." advance(); while (isDigit(peek())) advance(); } addToken(NUMBER, Double.parseDouble(source.substring(start, current))); } 复制代码
以及判断字符串的函数
private void identifier() { while (isAlphaNumeric(peek())) advance(); addToken(IDENTIFIER); } 添加一个辅助函数 private boolean isAlpha(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_'; } 下面这个函数旁段字符串是数字还是字符 private boolean isAlphaNumeric(char c) { return isAlpha(c) || isDigit(c); } 复制代码
干完这些,我们就能把字符串进行扫描,那现在是不是大功告成了呢?还没有,我们还没有把关键字进行扫描 在scanner类中我们加入
private static final Map<String, TokenType> keywords; static { keywords = new HashMap<>(); keywords.put("and", AND); keywords.put("class", CLASS); keywords.put("else", ELSE); keywords.put("false", FALSE); keywords.put("for", FOR); keywords.put("fun", FUN); keywords.put("if", IF); keywords.put("nil", NIL); keywords.put("or", OR); keywords.put("print", PRINT); keywords.put("return", RETURN); keywords.put("super", SUPER); keywords.put("this", THIS); keywords.put("true", TRUE); keywords.put("var", VAR); keywords.put("while", WHILE); } 复制代码
在identifed方法中
private void identifier() { // See if the identifier is a reserved word. String text = source.substring(start, current); TokenType type = keywords.get(text); if (type == null) type = IDENTIFIER; addToken(type); } 复制代码
我们实现了扫描功能
最后,为了大家理解,将完整源码发出
启动器
import java.io.BufferedReader; import java.io.IOException; import java.io.InputStreamReader; import java.nio.charset.Charset; import java.nio.file.Files; import java.nio.file.Paths; import java.util.List; public class Lox { static boolean hadError = false; public static void main(String[] args) throws IOException { if (args.length > 1) { System.out.println("Usage: jlox [script]"); System.exit(64); } else if (args.length == 1) { runFile(args[0]); } else { runPrompt(); } } private static void runFile(String path) throws IOException { byte[] bytes = Files.readAllBytes(Paths.get(path)); run(new String(bytes, Charset.defaultCharset())); if (hadError) System.exit(65); } private static void runPrompt() throws IOException { InputStreamReader input = new InputStreamReader(System.in); BufferedReader reader = new BufferedReader(input); for (;;) { System.out.print("> "); run(reader.readLine()); hadError = false; } } private static void run(String source) { Scanner scanner = new Scanner(source); List<Token> tokens = scanner.scanTokens(); // For now, just print the tokens. for (Token token : tokens) { System.out.println(token); } } static void error(int line, String message) { report(line, "", message); } private static void report(int line, String where, String message) { System.err.println( "[line " + line + "] Error" + where + ": " + message); hadError = true; } } 复制代码
枚举类
enum TokenType { LEFT_PAREN, RIGHT_PAREN, LEFT_BRACE, RIGHT_BRACE, COMMA, DOT, MINUS, PLUS, SEMICOLON, SLASH, STAR, // One or two character tokens. BANG, BANG_EQUAL, EQUAL, EQUAL_EQUAL, GREATER, GREATER_EQUAL, LESS, LESS_EQUAL, // Literals. IDENTIFIER, STRING, NUMBER, // Keywords. AND, CLASS, ELSE, FALSE, FUN, FOR, IF, NIL, OR, PRINT, RETURN, SUPER, THIS, TRUE, VAR, WHILE, EOF } 复制代码
扫描器
import java.util.ArrayList; import java.util.HashMap; import java.util.List; import java.util.Map; import static com.craftinginterpreters.lox.TokenType.*; class Scanner { private static final Map<String, TokenType> keywords; static { keywords = new HashMap<>(); keywords.put("and", AND); keywords.put("class", CLASS); keywords.put("else", ELSE); keywords.put("false", FALSE); keywords.put("for", FOR); keywords.put("fun", FUN); keywords.put("if", IF); keywords.put("nil", NIL); keywords.put("or", OR); keywords.put("print", PRINT); keywords.put("return", RETURN); keywords.put("super", SUPER); keywords.put("this", THIS); keywords.put("true", TRUE); keywords.put("var", VAR); keywords.put("while", WHILE); } private final String source; private final List<Token> tokens = new ArrayList<>(); // private final List<Token> tokens = new ArrayList<>(); private int start = 0; private int current = 0; private int line = 1; Scanner(String source) { this.source = source; } List<Token> scanTokens() { while (!isAtEnd()) { // We are at the beginning of the next lexeme. start = current; scanToken(); } tokens.add(new Token(EOF, "", null, line)); return tokens; } private void scanToken() { char c = advance(); switch (c) { case '(': addToken(LEFT_PAREN); break; case ')': addToken(RIGHT_PAREN); break; case '{': addToken(LEFT_BRACE); break; case '}': addToken(RIGHT_BRACE); break; case ',': addToken(COMMA); break; case '.': addToken(DOT); break; case '-': addToken(MINUS); break; case '+': addToken(PLUS); break; case ';': addToken(SEMICOLON); break; case '*': addToken(STAR); break; case '!': addToken(match('=') ? BANG_EQUAL : BANG); break; case '=': addToken(match('=') ? EQUAL_EQUAL : EQUAL); break; case '<': addToken(match('=') ? LESS_EQUAL : LESS); break; case '>': addToken(match('=') ? GREATER_EQUAL : GREATER); break; case '/': if (match('/')) { // A comment goes until the end of the line. while (peek() != '/n' && !isAtEnd()) advance(); } else { addToken(SLASH); } break; case ' ': case '/r': case '/t': // Ignore whitespace. break; case '/n': line++; break; case '"': string(); break; case 'o': if (peek() == 'r') { addToken(OR); } break; default: if (isDigit(c)) { number(); } else if (isAlpha(c)) { identifier(); } else { Lox.error(line, "Unexpected character."); } break; } } private void identifier() { while (isAlphaNumeric(peek())) advance(); // See if the identifier is a reserved word. String text = source.substring(start, current); TokenType type = keywords.get(text); if (type == null) type = IDENTIFIER; addToken(type); } private void number() { while (isDigit(peek())) advance(); // Look for a fractional part. if (peek() == '.' && isDigit(peekNext())) { // Consume the "." advance(); while (isDigit(peek())) advance(); } addToken(NUMBER, Double.parseDouble(source.substring(start, current))); } private char peekNext() { if (current + 1 >= source.length()) return '/0'; return source.charAt(current + 1); } private boolean isAlpha(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_'; } private boolean isAlphaNumeric(char c) { return isAlpha(c) || isDigit(c); } private void string() { while (peek() != '"' && !isAtEnd()) { if (peek() == '/n') line++; advance(); } // Unterminated string. if (isAtEnd()) { Lox.error(line, "Unterminated string."); return; } // The closing ". advance(); // Trim the surrounding quotes. String value = source.substring(start + 1, current - 1); addToken(STRING, value); } private boolean match(char expected) { if (isAtEnd()) return false; if (source.charAt(current) != expected) return false; current++; return true; } private char peek() { if (isAtEnd()) return '/0'; return source.charAt(current); } private boolean isDigit(char c) { return c >= '0' && c <= '9'; } private boolean isAtEnd() { return current >= source.length(); } private char advance() { current++; return source.charAt(current - 1); } private void addToken(TokenType type) { addToken(type, null); } private void addToken(TokenType type, Object literal) { String text = source.substring(start, current); tokens.add(new Token(type, text, literal, line)); } } 复制代码
下面一张是文法分析,我们将构造词法分析树,用递归向下的方式生产语法树。