Parser Design Decisions
The Vondel parser is designed to parse the Vondel programming language, which is built on top of the Monkey language. The parser follows the Pratt parsing approach to handle expressions and statements in the program.
Pratt Parser Explanation
Pratt parsing, also known as TDOP, is a parsing technique that uses a Recursive Descent Approach to parse expressions based on their precedence and associativity.
The main idea behind Pratt parsing is to associate each token or operator with two parsing functions: a prefix parsing function and an infix parsing function. The prefix parsing function is responsible for parsing expressions that start with the token, while the infix parsing function handles expressions that involve the token as an operator.
Let's illustrate how that bitch works with this simple expression (-5 + 3) * 2
-
Tokenization: The expression is tokenized as follows:
(,-,5,+,3,),*,2 -
Precedence and Associativity: Let's assign the following precedence levels:
*(highest),+(lower),-(lower) -
Prefix Parsing: The prefix parsing function for the
(token creates a grouping expression. -
Prefix Parsing for
-: The parser encounters the-token and checks if it's a prefix operator. Since it is, the parser creates a unary expression for it. -
Infix Parsing: The parser encounters the
)token and skips it since it doesn't have an associated infix parsing function. -
Right Binding Power: The right binding power for
*is the same as its precedence level. -
Right-hand Side Expression: The parser invokes the infix parsing function for
*and creates a binary expression with the*operator and the right-hand side expression. -
Recursive Descent: The parser examines the next token (
2) and creates an integer expression for it. -
Parsing Complete: The parsing process is complete, resulting in a fully parsed expression.
Here's the visualization of the Pratt parsing process for the expression (-5 + 3) * 2:
*
/ \
+ 2
/ \
- 3
|
5
Error Handling
The parser uses the anyhow crate for error handling. It defines a custom ParserError enum to represent different parsing errors that can occur.
The ErrorWithCtx struct is used to associate the error message with the corresponding context in the source code.
Precedence
The parser defines the Precedence enum to represent the precedence levels of different operators in the language.
The precedence_of function assigns a precedence level to each token type used in Pratt parsing.
Statement Types
The parser defines the StatementType enum to represent different types of statements in the program.
This includes Let statements for variable declarations, Return statements, Expression statements, and Block statements for code blocks.
Expression
The Expression module provides the definition and functionality for handling different types of expressions in the code.
PrefixOp
The PrefixOp enum represents the prefix operators available in the language, including Bang and Minus.
InfixOp
The InfixOp enum represents the infix operators available in the language, such as Plus, Minus, Asterisk, Slash, Equal, NotEqual, LessThan, and GreaterThan. It provides methods to retrieve the operator as a string representation.
Expression
The Expression enum represents various types of expressions, including Identifier, Integer, Prefix, Infix, Boolean, If, FunctionLiteral, and Call. It also provides methods to get the type of the expression as a string and constructors for creating specific types of expressions.
Constructors
new_ident(ident: &String): Creates a newIdentifierexpression with the given identifier.new_integer(int: &String) -> Result<Self>: Creates a newIntegerexpression with the parsed integer value from the input string. Returns an error if the parsing fails.new_prefix(op: &TokenType, exp: Expression) -> Result<Self>: Creates a newPrefixexpression with the specified operator and right-hand side expression. Returns an error if the operator is not allowed.new_infix(left: Expression, op: &TokenType, right: Expression) -> Result<Self>: Creates a newInfixexpression with the specified left-hand side, operator, and right-hand side expressions. Returns an error if the operator is not allowed.new_boolean(t: &TokenType) -> Result<Self>: Creates a newBooleanexpression with the specified boolean value. Returns an error if the boolean token is not allowed.new_if(cond: Expression, cons: StatementType, alt: Option<StatementType>): Creates a newIfexpression with the condition, consequence, and optional alternative statements.new_function(params: Vec<Expression>, block: StatementType): Creates a newFunctionLiteralexpression with the specified parameters and block statement.new_call(func: Expression, args: Vec<Expression>): Creates a newCallexpression with the specified function expression and argument expressions.
Program Structure
The parser uses the Program struct to store the parsed statements and errors. The get_deez_program method is responsible for parsing the entire program and populating the Program struct.
Lexer Integration
The parser expects a slice of TokenType tokens as input, which are generated by a lexer. This allow us to do some kind of decoupling between lexing and parsing.
Usage
To use the parser follow these steps:
-
Create a vector of tokens by hand or using a lexer.
#![allow(unused)] fn main() { let input = "let tubias = 123;" let toks = Lexer::new(input).get_deez_tokens(); } -
Create a new
Parserand pass this tokens as parameters.#![allow(unused)] fn main() { let mut parser = Parser::new(&toks); let program = parser.get_deez_program(); } -
Do whatever you want with these statements or errors.
#![allow(unused)] fn main() { // Access the parsed statements and handle errors if needed for statement in program.statements { // Process each statement... } // Handle parsing errors, if any for error in program.errors { // Handle each error... } }
Testing
The approach for testing this parser was decoupling it from the lexer. So that future internal changes of the current lexer can't interfere with our parse,
unless we change the contract between then that is TokenType
Conclusion
In conclusion, the Vondel parser, based on the Pratt parsing approach, provides efficient and accurate parsing of expressions and statements in the Vondel programming language. With its error handling capabilities, well-defined precedence levels, and support for various statement types, the parser offers a solid foundation for the development and expansion of the Vondel language, ensuring reliable and effective parsing of code structures.