Vondel

Build Status Issues Pull requests License

A simple computer architecture, ISA, Assembler and Interpreter build with Rust.

Usage

First off all, we need to build our binaries using:

cargo build -r --bins

Then for testing the demonstration programs we need to assemble it and run with our uarch.

We can accomplish this by just using the following commands:

Running Factorial

cargo run -r --bin assembler -- -i programs/factorial_hardware.asm && \
cargo run -r --bin uarch -- --rom a.rom --ram a.ram

Running Power

cargo run -r --bin assembler -- -i programs/power_hardware.asm && \
cargo run -r --bin uarch -- --rom a.rom --ram a.ram

Running CSW

cargo run -r --bin assembler -- -i programs/csw.asm && \
cargo run -r --bin uarch -- --rom a.rom --ram a.ram

Running Div

cargo run -r --bin assembler -- -i programs/div_hardware.asm && \
cargo run -r --bin uarch -- --rom a.rom --ram a.ram

File Structure

The Vondel file structure follows a convetional Rust program, with a src folder that handles all of our source code.

More info on the chapter File Structure

Interpreter

The Vondel Interpreter was built on top of the Monkey Language using the book Writing an Interpreter with GO but on Rust.

More info on the chapter Interpreter

Assembler

The Vondel Assembler was based on RISC V assemblers with our custom taste for Mnemonics and operations that our Uarch can accept

More info on the chapter Assembler

Microarchitecture

The basic idea behind Vondel's design is to follow some tips from the book Structured Computer Organization - Andrew S. Tanenbaum while implementing the ability to perform operations even if the main clock is in a non-edge level in order to reduce clock cycles.

More info on the chapter Microarchitecture

File Structure of Vondel

The Vondel file structure follows a convetional Rust program, with a src folder that handles all of our source code.

vondel
├── Cargo.lock
├── Cargo.toml
├── docs
│  ├── book.toml
│  ├── diagrams
│  └── src
├── LICENSE
├── programs
│  ├── csw.asm
│  ├── div.asm
│  ├── div_hardware.asm
│  ├── factorial.asm
│  ├── factorial_hardware.asm
│  ├── power.asm
│  └── power_hardware.asm
├── README.md
└── src
   ├── assembler
   ├── assembler.rs
   ├── bin
   ├── inter
   ├── inter.rs
   ├── lib.rs
   ├── main.rs
   └── uarch

Comparision with the Proposed File Structure

The main difference between our project and the proposed one it's the amount of complexity.

For example, our assembler has it's own folder, where each step has it's own file, instead of a single file to handle lexing, parsing and enconding.

Our memory, alu, clock and computer are on a single module called uarch, this module contains all the business logic for handling instructions.

The disk module was embedded on both uarch and assembler, that are on the bin folder.

Docs

The place where all documentation of this program lives

Here we split chapters into folders where mdbook, the software responsible for this site, can generate it for us automatically

src
├── assembler
│  ├── comparision.md
│  ├── evaluator.md
│  ├── lexer.md
│  ├── parser.md
│  ├── README.md
│  └── specs.md
├── chapter_1.md
├── files.md
├── interpreter
│  ├── environment.md
│  ├── evaluator
│  ├── language.md
│  ├── lexer.md
│  ├── object.md
│  ├── parser.md
│  └── README.md
├── README.md
├── SUMMARY.md
└── uarch
   ├── implementations.md
   ├── README.md
   └── uinstruction.md

Programs

The place where lives all of our assembly code for this lecture.

The ones that has _hardware implements functions that are built on hardware on the uarch, so they are faster than iteractive versions.

.
├── csw.asm
├── div.asm
├── div_hardware.asm
├── factorial.asm
├── factorial_hardware.asm
├── power.asm
└── power_hardware.asm

Src

The source code for generating our interpreter, assembler and uarch

Each module has a unique folder that has all files for creating a library crate.

The bin folder is responsible for creating our binaries that will be executed on someone's PC

.
├── assembler
│  ├── cli.rs
│  ├── evaluator.rs
│  ├── lexer.rs
│  ├── parser.rs
│  ├── sections.rs
│  └── tokens.rs
├── assembler.rs
├── bin
│  ├── assembler.rs
│  ├── interpreter.rs
│  └── uarch.rs
├── inter
│  ├── ast
│  ├── ast.rs
│  ├── cli.rs
│  ├── doc
│  ├── environment.rs
│  ├── evaluator
│  ├── evaluator.rs
│  ├── lexer.rs
│  ├── object.rs
│  ├── repl.rs
│  └── tokens.rs
├── inter.rs
├── lib.rs
├── main.rs
└── uarch
   ├── alu.rs
   ├── cli.rs
   ├── mem.rs
   └── mod.rs

Interpreter

The Vondel Interpreter was built on top of the Monkey Language using the book Writing an Interpreter with GO but on Rust 😂

Monkey is a simple programming language designed for educational purposes, focusing on simplicity and ease of understanding. This interpreter allows you to run Monkey code and see the results in real-time.

Language Specs

The Vondel Language is simpler than a common Monkey Lang implementation because we need to implement a custom evaluator using our own microarchitecture. And doing Strings, Structs and Vectors on this UArch will not be a good experience for us.

Basically, what we implemented in the Vondel Language, which includes integers, booleans, functions, and lambda functions, you have a foundation for building a variety of programs.

More info on the sup-chapter Language Specifications.

Lexer

Our lexer don't allow UTF-8 enconding for processing text and just rely on common ASCII.

It's a simple lexer that use a lot of String allocations instead of string slices for performance optimizations. It also uses one of Rust most powerfull features enums for processing non Identifier tokens so that we can rely on Rust strong type system.

More info on the sup-chapter Lexer

Parser

The parser for the Vondel language utilizes a powerful parsing technique called Pratt Parsing. It allows for efficient and flexible parsing of expressions with varying levels of precedence so that we can parse complex expressions without too much hassle.

More info on the sup-chapter Parser

Object

Because Vondel is a dynamic language we use Objects for an intermediate representation and evaluations

More info on the sup-chapter Object

Environment

It allows for the storage and retrieval of variables and their corresponding values in a flexible and efficient manner. The module supports nested environments, enabling scoping and variable lookup in outer environments.

More info on the sup-chapter Environment

Evaluator

For the evaluation module we just define a EvaluationError enum for handling most evaluation and runtime errors. A Evaluator trait for creating custom evaluators and evaluate_buffer fn that handles receives anything that implements Evaluator, a Program and return a prints the result of this evaluation

We also define two differente evaluators for our users:

  1. Rust Evaluator - that uses Rust for doing all evaluations
  2. Custom Evaluator - that uses our custom microarchitecture for evaluting these ASTs

More info on the sup-chapter Evaluator

Vondel Language Specification

This specification outlines the syntax and behavior of the Vondel language, a simple programming language that supports arithmetic operations, functions, and lambda functions.

Basics

  • Vondel is a dynamically-typed language, meaning that variable types are inferred at runtime.
  • All statements in Vondel end with a semicolon (;).
  • Vondel is whitespace insensitive, meaning that spaces and line breaks are ignored except where necessary for separating tokens.

Data Types

Vondel Suported Data Types:

  • Integers: Represented by whole numbers without fractional or decimal parts.
  • Booleans: Represented by the keywords true and false.
  • Null: Represented by the keyword null.

Variables

Variables in Vondel are dynamically typed and declared using the let keyword.

let x = 10;
let name = "Alice";
let flag = true;

Arithmetic Operations

Vondel supports the following arithmetic operations:

  • Addition: +
  • Subtraction: -
  • Multiplication: *
  • Division: /
  • Modulo: %
  let x = 10 + 5;
  let y = x * 2 - 3;
  let z = (x + y) / 2;

Functions

Vondel allows the definition and invocation of functions.

Function Definition

Functions are defined using the fn keyword followed by the parameter list in parentheses and the function body in curly braces.

let add = fn(x, y) {
  return x + y;
};

Function Invocation

To invoke a function, use the function name followed by arguments in parentheses.

let result = add(3, 4); // result = 7

Return Statement

The return statement is used to exit a function and optionally return a value.

let multiply = fn(x, y) {
  return x * y;
};

Lambda Functions

Vondel supports lambda functions, also known as anonymous functions or function literals.

Lambda Function Definition

Lambda functions are defined using the fn keyword followed by the parameter list in parentheses and the function body in curly braces. They can be assigned to variables.

let multiply = fn(x, y) {
  return x * y;
};

let square = fn(x) {
  return multiply(x, x);
};

Lambda Function Invocation

To invoke a lambda function, use the function name followed by arguments in parentheses.

let result = square(5); // result = 25

Higher-Order Functions

Vondel supports higher-order functions, which are functions that can accept other functions as arguments or return functions as results.

let applyFunc = fn(func, x, y) {
  return func(x, y);
};

let result = applyFunc(multiply, 3, 4); // result = 12

Examples

let add = fn(x, y) {
  return x + y;
};

let result = add(3, 4); // result = 7

let multiply = fn(x, y) {
  return x * y;
};

let square = fn(x) {
  return multiply(x, x);
};

let area = square(5); // area = 25
let fibonacci = fn(x) {
  if (x == 0) {
    0
  } else {
    if (x == 1) {
      1
    } else {
      fibonacci(x - 1) + fibonacci(x - 2);
    }
  }
};

fibonacci(9) // 34

Limitations

The Vondel language described in this specification is a simplified version of a programming language and has several limitations, including but not limited to:

  • Limited data types and operations.
  • Lack of control flow statements such as loops and conditionals.
  • Limited built-in functions and standard library.
  • The language is intentionally designed to be minimalistic and educational, focusing on core concepts and syntax.

References

The Monkey programming language: monkey lang site

Lexer Design Decision

The lexer provided is responsible for tokenizing input strings into a sequence of tokens. This README explains the design decisions made in implementing the lexer.

Overall Design

The lexer is implemented as a struct called Lexer with the following fields:

  • ch: Represents the current character being processed.
  • input: Represents the input string as a vector of bytes.
  • read_position: Tracks the current position while reading the input.
  • position: Tracks the position of the current character being processed.

The Lexer struct also includes several methods to perform different tasks:

  • new: Initializes a new Lexer instance with the given input string.
  • skip_whitespace: Skips whitespace characters until a non-whitespace character is encountered.
  • peek_char: Returns the next character in the input without consuming it.
  • read_char: Reads the next character from the input and updates the lexer's internal state.
  • next_token: Returns the next token from the input.
  • parse_operator: Parses an operator token based on the current and next characters in the input.
  • tokenizer: Tokenizes the current character based on its type.
  • read_name: Reads an identifier token from the input.
  • read_number: Reads an integer number token from the input.
  • get_deez_toks: Returns a vector of all tokens found in the input.

The design follows a simple and straightforward approach to tokenizing the input string.

Tokenization

The tokenizer method is responsible for tokenizing the current character based on its type. It uses pattern matching to identify different types of characters and returns the corresponding token type.

  • Whitespace characters are skipped using the skip_whitespace method.
  • Single-character tokens like commas, semicolons, parentheses, and squirlies are identified directly.
  • Operator tokens are parsed by the parse_operator method, which takes into account both single and double-character operators.
  • Identifiers are recognized by checking if the current character is alphabetical or an underscore. The read_name method is called to read the complete identifier.
  • Integer numbers are recognized by checking if the current character is a digit. The read_number method is called to read the complete number.
  • Any other character is considered illegal and is represented by an Illegal token type.

Usage

To use the lexer, follow these steps:

  1. Import the Lexer struct and the TokenType enum from the appropriate module or file.

  2. Create a new instance of the Lexer by calling the new method and passing the input string as a parameter.

    #![allow(unused)]
    fn main() {
    let input = String::from("let x = 5 + 3;");
    let mut lexer = Lexer::new(input);
    }
  3. Retrieve tokens from the lexer by calling the next_token method. It returns the next token in the input string.

    #![allow(unused)]
    fn main() {
    let token = lexer.next_token();
    
    }
  4. Continue calling next_token to retrieve subsequent tokens until you reach the end of the input. The end of input is indicated by the TokenType::Eof token.

    #![allow(unused)]
    fn main() {
    loop {
        let token = lexer.next_token();
        if token == TokenType::Eof {
            break;
            }
        // Process the token as needed
    }
    }
  5. Or you can use get_deez_toks method to return all tokens at once as a Vec<TokenType>

    #![allow(unused)]
    fn main() {
    let all_toks = lexer.get_deez_toks();
    }
  6. Use the retrieved tokens for further processing or analysis based on your specific requirements.

Please note that this example assumes you have the appropriate code structure and dependencies in place for the lexer to work correctly. Adjust the code snippets based on your specific implementation.

Testing

The lexer includes a set of unit tests implemented using the #[cfg(test)] attribute and the mod tests block. These tests ensure the correctness of the lexer implementation by verifying the tokenization of different input strings.

The tests cover various scenarios, including skipping whitespace, reading identifiers and numbers, handling operators, and tokenizing complex input strings with multiple tokens.

Conclusion

The lexer design follows a modular and intuitive approach to tokenize input strings. It provides flexibility to extend the lexer's functionality if required. The unit tests provide confidence in the correctness of the lexer implementation.

Parser Design Decisions

The Vondel parser is designed to parse the Vondel programming language, which is built on top of the Monkey language. The parser follows the Pratt parsing approach to handle expressions and statements in the program.

Pratt Parser Explanation

Pratt parsing, also known as TDOP, is a parsing technique that uses a Recursive Descent Approach to parse expressions based on their precedence and associativity.

The main idea behind Pratt parsing is to associate each token or operator with two parsing functions: a prefix parsing function and an infix parsing function. The prefix parsing function is responsible for parsing expressions that start with the token, while the infix parsing function handles expressions that involve the token as an operator.

Let's illustrate how that bitch works with this simple expression (-5 + 3) * 2

  1. Tokenization: The expression is tokenized as follows: (, -, 5, +, 3, ), *, 2

  2. Precedence and Associativity: Let's assign the following precedence levels: * (highest), + (lower), - (lower)

  3. Prefix Parsing: The prefix parsing function for the ( token creates a grouping expression.

  4. Prefix Parsing for -: The parser encounters the - token and checks if it's a prefix operator. Since it is, the parser creates a unary expression for it.

  5. Infix Parsing: The parser encounters the ) token and skips it since it doesn't have an associated infix parsing function.

  6. Right Binding Power: The right binding power for * is the same as its precedence level.

  7. Right-hand Side Expression: The parser invokes the infix parsing function for * and creates a binary expression with the * operator and the right-hand side expression.

  8. Recursive Descent: The parser examines the next token (2) and creates an integer expression for it.

  9. Parsing Complete: The parsing process is complete, resulting in a fully parsed expression.

Here's the visualization of the Pratt parsing process for the expression (-5 + 3) * 2:

       *
      / \
     +   2
    / \
   -   3
    |
   5

Error Handling

The parser uses the anyhow crate for error handling. It defines a custom ParserError enum to represent different parsing errors that can occur. The ErrorWithCtx struct is used to associate the error message with the corresponding context in the source code.

Precedence

The parser defines the Precedence enum to represent the precedence levels of different operators in the language. The precedence_of function assigns a precedence level to each token type used in Pratt parsing.

Statement Types

The parser defines the StatementType enum to represent different types of statements in the program. This includes Let statements for variable declarations, Return statements, Expression statements, and Block statements for code blocks.

Expression

The Expression module provides the definition and functionality for handling different types of expressions in the code.

PrefixOp

The PrefixOp enum represents the prefix operators available in the language, including Bang and Minus.

InfixOp

The InfixOp enum represents the infix operators available in the language, such as Plus, Minus, Asterisk, Slash, Equal, NotEqual, LessThan, and GreaterThan. It provides methods to retrieve the operator as a string representation.

Expression

The Expression enum represents various types of expressions, including Identifier, Integer, Prefix, Infix, Boolean, If, FunctionLiteral, and Call. It also provides methods to get the type of the expression as a string and constructors for creating specific types of expressions.

Constructors

  • new_ident(ident: &String): Creates a new Identifier expression with the given identifier.
  • new_integer(int: &String) -> Result<Self>: Creates a new Integer expression with the parsed integer value from the input string. Returns an error if the parsing fails.
  • new_prefix(op: &TokenType, exp: Expression) -> Result<Self>: Creates a new Prefix expression with the specified operator and right-hand side expression. Returns an error if the operator is not allowed.
  • new_infix(left: Expression, op: &TokenType, right: Expression) -> Result<Self>: Creates a new Infix expression with the specified left-hand side, operator, and right-hand side expressions. Returns an error if the operator is not allowed.
  • new_boolean(t: &TokenType) -> Result<Self>: Creates a new Boolean expression with the specified boolean value. Returns an error if the boolean token is not allowed.
  • new_if(cond: Expression, cons: StatementType, alt: Option<StatementType>): Creates a new If expression with the condition, consequence, and optional alternative statements.
  • new_function(params: Vec<Expression>, block: StatementType): Creates a new FunctionLiteral expression with the specified parameters and block statement.
  • new_call(func: Expression, args: Vec<Expression>): Creates a new Call expression with the specified function expression and argument expressions.

Program Structure

The parser uses the Program struct to store the parsed statements and errors. The get_deez_program method is responsible for parsing the entire program and populating the Program struct.

Lexer Integration

The parser expects a slice of TokenType tokens as input, which are generated by a lexer. This allow us to do some kind of decoupling between lexing and parsing.

Usage

To use the parser follow these steps:

  1. Create a vector of tokens by hand or using a lexer.

    #![allow(unused)]
    fn main() {
    let input =  "let tubias = 123;"
    let toks = Lexer::new(input).get_deez_tokens();
    }
  2. Create a new Parser and pass this tokens as parameters.

    #![allow(unused)]
    fn main() {
    let mut parser = Parser::new(&toks);
    let program = parser.get_deez_program();
    }
  3. Do whatever you want with these statements or errors.

    #![allow(unused)]
    fn main() {
    // Access the parsed statements and handle errors if needed
    for statement in program.statements {
        // Process each statement...
    }
    
    // Handle parsing errors, if any
    for error in program.errors {
        // Handle each error...
    }
    }

Testing

The approach for testing this parser was decoupling it from the lexer. So that future internal changes of the current lexer can't interfere with our parse, unless we change the contract between then that is TokenType

Conclusion

In conclusion, the Vondel parser, based on the Pratt parsing approach, provides efficient and accurate parsing of expressions and statements in the Vondel programming language. With its error handling capabilities, well-defined precedence levels, and support for various statement types, the parser offers a solid foundation for the development and expansion of the Vondel language, ensuring reliable and effective parsing of code structures.

Object

The Object module in the Vondel language is designed to handle various types of objects used in the language, including integers, booleans, return values, null values, and functions. It provides a Object enum with associated data for each object type. The module defines methods to inspect objects and retrieve their string representation. The Object enum implements the Debug, PartialEq, and Clone traits for debugging, equality comparison, and cloning, respectively. Additionally, it provides a type_as_string() method to retrieve the type of the object as a string for error handling.

Object Enum

The Object enum represents the different types of objects in the Vondel language. It has the following variants:

  • Integer(i64): Represents an integer value.
  • Boolean(bool): Represents a boolean value.
  • ReturnValue(Box<Object>): Represents a return value from a function.
  • Null: Represents a null value.
  • Function { params: Vec<Expression>, body: StatementType, env: Rc<RefCell<Environment>> }: Represents a function object, storing the function's parameters, body, and environment.

Object Methods

The Object module provides the following methods:

  • inspect() -> String: Returns a string representation of the object.
  • type_as_string() -> &'static str: Returns the type of the object as a string for error handling.

Display Formatting

The Object enum implements the fmt::Display trait to provide a formatted string representation for display purposes. The Display implementation formats the object based on its variant, converting it to a string representation.

#![allow(unused)]
fn main() {
impl fmt::Display for Object {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Format the object based on its variant
        // ...
    }
}

}

Environment

The Environment module provides functionality for storing variables and their corresponding values.

Features

  • Variable Storage: The Environment struct uses a HashMap to store variables and their values.
  • Nested Environments: The module supports nested environments, allowing for scoping and variable lookup in outer environments.
  • Argument Matching: The set_arguments_to_env function sets arguments to the environment, matching them with corresponding parameters.
  • Error Handling: The module uses the anyhow crate for error handling, providing detailed error messages for various scenarios.

Usage

To create a new environment, use the Environment::new() function. To create an environment with an outer environment, use Environment::new_with_outer(outer).

To set arguments to the environment, use the set_arguments_to_env(args, params) function, providing a vector of arguments and parameters.

To retrieve the value of a variable, use the get(name) function, providing the variable name. It automatically looks up the variable in outer environments if it's not found in the current environment.

To set the value of a variable, use the set(name, value) function, providing the variable name and value.

Examples

use std::{cell::RefCell, collections::HashMap, rc::Rc};
use anyhow::{bail, Result};

use crate::inter::{ast::Expression, evaluator::EvaluationError};

use super::object::Object;

/// Represents an environment that stores variables and their corresponding values.
#[derive(Debug, PartialEq, Clone)]
pub struct Environment {
    store: HashMap<String, Object>,
    outer: Option<Rc<RefCell<Environment>>>,
}

impl Environment {
    // ... Implementation details ...
}

// Example usage
fn main() {
    let mut environment = Environment::new();

    environment.set(
        &Expression::Identifier("x".to_string()),
        Object::Integer(42),
    ).unwrap();

    let result = environment.get("x").unwrap();
    println!("Value of 'x': {}", result);
}

Evaluator

The evaluator module provides the definition and functionality for evaluating code using different evaluators. This document outlines the design decisions and considerations made in the implementation of the evaluator module.

EvaluationError Enum

The EvaluationError enum represents possible errors that can occur during evaluation. It includes various error variants such as MissingIntegerToInvert, MismatchedTypesInfix, UnallowedBooleanComparisonOperator, IdentifierNotFound, ExpectedIdentifier, and WrongNumberOfArguments. These errors cover different scenarios where evaluation can fail and provide descriptive error messages.

Evaluator Trait

The Evaluator trait defines the behavior of an evaluator. It includes a single method, eval, which takes an AST node and an environment and returns the resulting Object or an error if evaluation fails. Implementations of this trait provide the specific evaluation logic for different languages or custom evaluators.

Evaluator Implementations

The evaluator module includes two sub-modules, custom and rust , which contain different implementations of the Evaluator trait. These implementations can be used to evaluate code written in different languages or using custom evaluation logic.

evaluate_buffer Function

The evaluate_buffer function is a utility function that simplifies the evaluation process. It takes a boxed trait object implementing the Evaluator trait and an input string to evaluate. The function performs the lexing, parsing, and evaluation steps, and prints the resulting object or error message. It can be used as a convenient way to evaluate code using different evaluators.

Usage

To use our evalutor follow theses steps:

  1. Create a Custom Custom evaluator or just use our custom and rust builtin evaluators
    #![allow(unused)]
    fn main() {
    use my_evaluator::{Evaluator, evaluate_buffer};
    // Define a custom evaluator
    struct MyEvaluator;
    impl Evaluator for MyEvaluator {
        // Implement the `eval` method
        // ...
    }
    }
  2. Create a new Evaluator instance
    #![allow(unused)]
    fn main() {
    let my_eval = Box::new(MyEvaluator::new());
    let rust_eval = Box::new(RustEvaluator::new());
    let custom_eval = Box::new(CustomEvaluator::new();
    }
  3. Use evaluater_buffer fn or create your own logic for parsing Program
    #![allow(unused)]
    fn main() {
    let input  = String::from("777 * 777");
    println!("my_eval res: {}",  evaluate_buffer(my_eval, input)?;
    println!("rust_eval res: {}",  evaluate_buffer(rust_eval, input)?;
    println!("custom_eval res: {}",  evaluate_buffer(custom_eval, input)?;
    }

Testing

The Evaluator module provides a set of tests to ensure the correctness of the evaluators and their functionalities. The tests cover various scenarios, including error handling, function calls, and decoupling from the parser and lexer.

Conclusion

In conclusion, the Evaluator module plays a crucial role in interpreting and evaluating Rust code. It provides an essential component for executing programs, performing expressions, handling errors, and enabling function calls. By decoupling from the parser and lexer, the evaluators ensure flexibility and compatibility with various parsing and lexing implementations. The comprehensive test suite included in the module verifies the correctness of the evaluators, covering error handling, function calls, and specific behaviors of the RustEvaluator and CustomEvaluator implementations. With the Evaluator module, developers can confidently evaluate Rust code and rely on its robustness and accuracy in producing the expected results.

Rust Evaluator

The RustEvaluator is an implementation of the Evaluator trait specifically designed to evaluate Rust code. This document outlines the design decisions and considerations made in the implementation of the RustEvaluator struct.

RustEvaluator Struct

The RustEvaluator struct represents an evaluator for Rust code. It provides methods for evaluating different types of expressions and statements in Rust.

Usage

To use the RustEvaluator follow these steps:

  1. Import the necessary modules and structs in your Rust file:

    #![allow(unused)]
    fn main() {
    use std::{cell::RefCell, rc::Rc};
    use anyhow::{Result};
    use crate::inter::{
        ast::expression::{InfixOp, PrefixOp},
        environment::Environment,
        object::Object,
        // Import other necessary modules as needed
    };
    use super::ast::*;
    // Import the RustEvaluator module
    use super::RustEvaluator;
    }
  2. Create an instance of the RustEvaluator:

    #![allow(unused)]
    fn main() {
    let evaluator = RustEvaluator::new();
    }
  3. Set up the environment and input for evaluation:

    #![allow(unused)]
    fn main() {
    let mut environment = Environment::new();
    let input = "1 + 2";
    }
  4. Pass the program to the eval() method of the RustEvaluator instance to evaluate it:

    #![allow(unused)]
    fn main() {
    let result = evaluator.eval(&program, &mut environment);
    }
  5. Handle the evaluation result using Result:

    #![allow(unused)]
    fn main() {
    match result {
        Ok(object) => {
            // Handle the evaluated object
            // ...
        }
        Err(error) => {
            // Handle the evaluation error
            // ...
        }
    }
    }
  6. Repeat steps 4 and 5 as needed to evaluate multiple programs or statements.

By following these steps, you can integrate the RustEvaluator into your Rust project and evaluate Rust code.

Feel free to modify the code snippets or the instructions to fit your specific use case.

Methods

new()

The new() method creates a new instance of the RustEvaluator.

map_boolean()

The map_boolean() method maps a boolean value to the corresponding Object value used in the evaluator.

eval_bang_operator()

The eval_bang_operator() method evaluates the logical NOT (!) operator on the right operand.

eval_minus_operator()

The eval_minus_operator() method evaluates the arithmetic minus (-) operator on the right operand.

eval_prefix_expression()

The eval_prefix_expression() method evaluates a prefix expression (e.g., -, !) by applying the corresponding operator on the right operand.

eval_infix_expression()

The eval_infix_expression() method evaluates an infix expression (e.g., +, -, *, /, <, >, ==, !=) by performing the operation on the left and right operands.

is_truthy()

The is_truthy() method checks if an object is truthy, meaning it evaluates to true.

eval_if_expression()

The eval_if_expression() method evaluates an if expression by evaluating the condition and executing the appropriate branch based on the result.

eval_let_statement()

The eval_let_statement() method evaluates a let statement by evaluating the value expression and storing it in the environment with the specified name.

eval_expressions()

The eval_expressions() method evaluates a list of expressions within the given environment.

eval_fn_call()

The eval_fn_call() method evaluates a function call expression by evaluating the function expression and the argument expressions, and executing the function with the provided arguments.

eval_expression()

The eval_expression() method evaluates an expression by dispatching to the appropriate evaluation method based on the expression type.

eval_block_statements()

The eval_block_statements() method evaluates a block of statements by iterating over each statement and evaluating them in order.

eval_statements()

The eval_statements() method evaluates a statement by dispatching to the appropriate evaluation method based on the statement type.

Implementation of Evaluator Trait

The RustEvaluator struct implements the Evaluator trait, which defines the behavior of an evaluator. The eval() method is implemented to evaluate a program by iterating over each statement and evaluating them in order.

Conclusion

The RustEvaluator provides a comprehensive implementation of an evaluator for Rust code. It handles different types of expressions and statements and provides accurate evaluation results.The design decisions made in the implementation ensure flexibility and extensibility for future enhancements or modifications.

Assembler

The Vondel Assembler was based on RISC V assemblers with our custom taste for Mnemonics and operations that our Uarch can accept

This assembler produce 2 files a .ram and a .rom that can be used for our microarchitecture needs where:

  • .ram: It's the ram dump produced by .data sections
  • .rom: It's the firmware made by .text sections

Usage

To use the Vondel Assembler you can open your terminal and type

cargo run -r assembler --  -i input.asm -o output

This commmand will generate lex, parse and evaluate the input.asm and therefore produce output.ram and output.rom for further usage

If the output argument isn't provided the default output file will begin with a

Language Specifications

Our language specs are similar to RISC V, but with some tweaks

If you came from there you'll be in home

.data
    tubias: .word 420
    test: .byte 69
    res: .word 42069

.text
main:
    addi s0 <- test
    lui s1 <- 254
    add a0,a1,a2,a3 <- t1, t2
    beq t1, a0, done


done:
    read s2 <- tubias
    write res <- t1
    halt

More info on the sup-chapter Language Specs

Lexer

Just a common lexer that we'll ignore comments and parse all necessary tokens for further steps

More info on the sup-chapter Lexer

Parser

Here we define our syntax that is described on Language Specs, if any step here fail we should the user the error and context

More info on the sup-chapter Parser

Evaluator

We generate our ROM and RAM for our microarchitecture, it handles the logic of our firmware and all variables that are beeing used

More info on the sup-chapter Evaluator

Comparision

The comparision between our language and assembler with the ones that our teacher proposed at class

The key difference between then are:

  1. Our assembler is easier to extend and define a fine grained syntax than the proposed one.
  2. We have a error handling with more context for the user
  3. We build the microinstruction itself while our teacher use opcode to navegate on the firmware own microinstructions
  4. We support immediates that can speed the process a lot

More info on the sup-chapter Comparision

Registers

The assembler supports 25 registers. These registers can be used for general-purpose operations or not.

RegisterSuggested UsageA BusB BusC Bus
MarMemory AddressOutput
MdrMemory DataInputInputOutput
MbrMemory BufferInput
MbruMemory BufferInput
Mbr2Memory BufferInput
Mbr2uMemory BufferInput
PcProgram CounterInputOutput
CppCall PointerInputInputOutput
LvLocal VariableInputInputOutput
RaReturn AddressInputInputOutput
T0TemporaryInputInputOutput
T1TemporaryInputInputOutput
T2TemporaryInputInputOutput
T3TemporaryInputInputOutput
S0SavedInputInputOutput
S1SavedInputInputOutput
S2SavedInputInputOutput
S3SavedInputInputOutput
S4SavedInputInputOutput
S5SavedInputInputOutput
S6SavedInputInputOutput
A0Function ArgInputInputOutput
A1Function ArgInputInputOutput
A2Function ArgInputInputOutput
A3Function ArgInputInputOutput

Sections

The assembler supports two sections: .text and .data. The sections are defined as follows:

.text Section

The .text section contains the executable instructions.

.data Section

The .data section is used for declaring and initializing data.

Instruction Format

Most vondel instructions in follow the format: opcode dest_regs <- source1, source2. Here's a breakdown of the components:

  • opcode: Specifies the operation to be performed.
  • dest_regs: The registers where the result will be stored.
  • source1 and source2: Registers or immediate values used as operands.

Supported Instructions

The assembler supports the following instructions:

Add

Adds two registers and save the result into 1 up to 20 registers that are allowed in the C_bus

add t0, t1, s0, a2, ra <- a0, a1

Sub

Subtract x - y and store on registers

sub t0, t1, s0, a2, ra <- a0, a1

Mul

Multiplies the value of x and y and store on registers

mul s0, a2, ra <- a0, a1

WARNING: Mul cannot be used with t* registers

Mul2

Multiplies the value of x and y and store on registers

mul s0, a2, ra <- a0, a1

INFO: More performatic and can use t* registers

Div

Divides x by y and store on registers

div t0, t1, s0, a2, ra <- a0, a1

Mod

Gets the remainder of x by y and store on registers

mod t0, t1, s0, a2, ra <- a0, a1

And

Makes a bitwise and with x and y

and t0, t1, s0, a2, ra <- a0, a1

Or

Makes a bitwise or with x and y

or t0, t1, t2, s0, s1, a2, ra <- a0, a1

Xor

Makes a bitwise xor with x and y

xor t0, t1, t2, s0, s1, a2, ra <- a0, a1

Not

Stores the result of !x on the registers

not t0, t1, t2, s0, s1, a2, ra <- a0

It uses only a single register as source

Sll

Stores the result of x << 8 on the registers

sll t0, t1 <- a0

It uses only a single register as source

Sla

Stores the result of x << 1 on the registers

sla t0, t1 <- a0

It uses only a single register as source

Sra

Stores the result of x >> 1 on the registers

sra t0, t1 <- a0

It uses only a single register as source

Read

Load a value from memory on the address addr into a register

read t0, t1, t2 <- addr
read t0, t1, t2 <- 77

Addr can be both a label referencing a variable in .data section or a immediate with value of 0 to 255

Write

Store a value from a register x into memory on address addr

write addr <- t1
write 77 <- ra

It allows only a single register as source

Addr can be both a label referencing a variable in .data section or a immediate with value of 0 to 255

Jal

Jump inconditionally to a label

jal loop

Beq

Branch if equal (jump to a label if x and y are equal)

beq t0, t1, done

Bne

Branch if not equal (jump to a label if x and y are not equal)

bne t0, t1, done

Blt

Branch if less then (jump to a label if x is less then y)

blt t0, t1, done

Bgt

Branch if greater then (jump to a label if x is greater then y)

bgt t0, t1, done

Lui

Load upper immediate imm on registers

lui t0, t1,t2 <- imm
lui t0, t1,t2 <- 77

It takes only a single immediate as parameter

Imm can be both a label referencing a .byte in .data section or an immediate with value of 0 to 255

Addi

Adds a register x and a immediate imm and save this value on registers

addi t0, t1,t2 <- t0, imm
addi t0, t1,t2 <- a2, 77

The first argument must be a register and the second a immediate

Imm can be both a label referencing a .byte in .data section or an immediate with value of 0 to 255

Muli

Multiplies a register x and an immediate imm and save this value on registers

muli t0, t1,t2 <- t0, imm
muli t0, t1,t2 <- a2, 77

The first argument must be a register and the second an immediate

Imm can be both a label referencing a .byte in .data section or an immediate with value of 0 to 255

Divi

Divides a register x by an immediate imm and save this value on registers

divi t0, t1,t2 <- t0, imm
divi t0, t1,t2 <- a2, 77

The first argument must be a register and the second an immediate

Imm can be both a label referencing a .byte in .data section or an immediate with value of 0 to 255

Modi

Get the remainder of a div between a register x by an immediate imm and save this value on registers

modi t0, t1,t2 <- t0, imm
modi t0, t1,t2 <- a2, 77

The first argument must be a register and the second an immediate

Imm can be both a label referencing a .byte in .data section or an immediate with value of 0 to 255

Subi

Subtracts from register x the value of an immediate imm and save this value on registers

subi t0, t1 <- t0, imm
subi t0, t1 <- a2, 77

The first argument must be a register and the second an immediate

Imm can be both a label referencing a .byte in .data section or an immediate with value of 0 to 255

Andi

Makes a bitwise and of register x and an immediate imm, then saves this value on registers

andi t0, t1 <- t0, imm
andi t0, t1 <- a2, 77

The first argument must be a register and the second an immediate

Imm can be both a label referencing a .byte in .data section or an immediate with value of 0 to 255

Ori

Makes a bitwise or of register x and an immediate imm, then saves this value on registers

ori t0, t1 <- t0, imm
ori t0, t1 <- a2, 77

The first argument must be a register and the second an immediate

Imm can be both a label referencing a .byte in .data section or an immediate with value of 0 to 255

Xori

Makes a bitwise xor of register x and an immediate imm, then saves this value on registers

xori t0, t1 <- t0, imm
xori t0, t1 <- a2, 77

The first argument must be a register and the second an immediate

Imm can be both a label referencing a .byte in .data section or an immediate with value of 0 to 255

Mov

Store the value of the register x on the registers

mov t0, t1, t2 <- a1
mov ra, s1, s2, s3 <- t0

It supports only a single register as parameter

Halt

Stop the program execution

halt

Just halt

Data Declaration Instructions (.data section)

  • .byte: Declare a byte-sized data item
  • .word: Declare a word-sized (4 bytes) data item

Labels and Branching

Labels can be defined and used as targets for branching instructions. Here are the guidelines for labels and branching:

  • Labels are defined by placing a colon (:) after the label name (e.g., label:).
  • Branching instructions can use labels as targets (e.g., beq t1, t2, label).

Comments

Comments start with a semicolon (;) or (#) and continue until the end of the line. Comments are ignored during assembly.

Addressing Modes

The assembler supports two addressing modes: immediate and register addressing. Here's how they are used:

  • Immediate values can be specified directly in the instruction.
  • Registers have the ABI form of RISC V

This specification provides a general outline for the assembler language. Additional details, such as specific opcode mappings and assembly directives, can be added as needed.

Lexer

This lexer is specifically designed for a custom microarchitecture and includes RISC-V opcodes. It provides tokenization functionality for assembly code.

Usage

To utilize the Lexer in your project, follow these steps:

  1. Include the lexer code in your project's source files.
  2. Import the necessary modules and dependencies, such as std::{iter::Peekable, rc::Rc, str::Chars}.
  3. Instantiate a Lexer object, passing the input source code as a string to the new function.
  4. Use the provided methods to tokenize the source code
    • next_token: Retrieves the next token in the source code.
    • next_with_ctx: Retrieves the next token with context information, including line and column numbers.
    • get_deez_toks: Retrieves all tokens in the source code as a vector.
    • get_deez_toks_w_ctx: Retrieves all tokens with context information as a vector.

Example

#![allow(unused)]
fn main() {
use crate::assembler::tokens::{AsmToken, TokWithCtx};

let input_code = "...";  // Provide your assembly code here
let mut lexer = Lexer::new(&input_code);

loop {
    let token = lexer.next_token();
    if token == AsmToken::Eof {
        break;
    }
    println!("Token: {:?}", token);
}
}

Parser

This is a parser designed for the Vondel Microarchitecture. It provides functionality to parse assembly code written for the Vondel Microarchitecture.

Usage

To use the parser, follow these steps:

  1. Create an instance of the Parser struct by providing the input tokens as a Rc<[TokWithCtx]> parameter.
  2. Use the new function of the Parser struct to initialize it with the provided tokens.
  3. Use the various parsing functions provided by the Parser struct to parse the assembly code.

Example

Using our lexer

#![allow(unused)]
fn main() {
let input = r"
.data
    dividend:   .word 10    # Dividend
    divisor:    .word 3     # Divisor
    quotient:   .word 0     # Quotient
    remainder:  .word 0     # Remainder
    address:    .byte 77    # Address
";

let mut l = Lexer::new(input);
let toks = l.get_deez_toks_w_ctx();
let rc_slice = Rc::from(toks.into_boxed_slice());
let mut p = Parser::new(rc_slice);
p.get_deez_program()
}

Using a Vector of TokWithCtx

#![allow(unused)]
fn main() {
let toks = toks_from_api()?;
let mut p = Parser::new(rc_slice);
p.get_deez_program()
}

Parser Structure

The parser is implemented using the following structs:

Program

The Program struct represents the parsed assembly program. It contains the following fields:

  • sections: A vector of Sections representing the different sections of the program.
  • errors: A vector of Error representing any parsing errors encountered during the parsing process.

Parser

The Parser struct is the main parser implementation. It contains the following fields:

  • toks: A shared reference to the input tokens.
  • cur_tok: The current token being processed.
  • peek_tok: The next token to be processed.
  • idx: The index of the current token in the toks vector.
  • cur_line: The current line number.
  • cur_column: The current column number.

The Parser struct provides the following methods:

  • new(toks: Rc<[TokWithCtx]>): Creates a new instance of the Parser struct with the provided input tokens.
  • next_token(): Moves to the next token in the input.
  • expect_peek(expected: AsmToken) -> Result<()>: Checks if the next token matches the expected token.
  • curr_token_is(expected: AsmToken) -> bool: Checks if the current token matches the expected token.
  • peek_token_is(expected: AsmToken) -> bool: Checks if the next token matches the expected token.
  • get_label() -> Result<Rc<str>>: Retrieves the label from the current token.
  • get_number() -> Result<Rc<str>>: Retrieves the number from the current token.
  • get_register() -> Result<Rc<Register>>: Retrieves the register from the current token.
  • get_opcode() -> Result<Rc<Opcode>>: Retrieves the opcode from the current token.
  • get_pseudo_op() -> Result<Rc<PseudoOps>>: Retrieves the pseudo operation from the current token.
  • guard_a_bus(reg: Rc<Register>) -> Result<Rc<Register>>: Checks if the provided register can be used in the A bus.
  • guard_b_bus(reg: Rc<Register>) -> Result<Rc<Register>>: Checks if the provided register can be used in the B bus.
  • guard_c_bus(reg: Rc<Register>) -> Result<Rc<Register>>: Checks if the provided register can be used in the C bus.
  • parse_data_to_write() -> Result<DataWrited>: Parses the data to be written.
  • parse_data_directive() -> Result<Sections>: Parses the data directive section.
  • get_dest_regs() -> Result<Vec<Rc<Register>>>: Retrieves the destination registers.
  • parse_instruction_til_rs1() -> Result<(Vec<Rc<Register>>, Rc<Register>)>: Parses the instruction until the RS1 register.
  • get_instruction() -> Result<Instruction>: Retrieves the instruction.
  • parse_labeled_section() -> Result<TextSegment>: Parses the labeled section.
  • `parse_text_directive() ->.

AsmEvaluator is a custom microarchitecture evaluator implemented in Rust. It provides functionality for evaluating assembly code written in a custom instruction set architecture.

Usage

To use the AsmEvaluator, follow these steps:

  1. Create a new instance of AsmEvaluator using the new method:
#![allow(unused)]
fn main() {
let mut evaluator = AsmEvaluator::new();
}
  1. Call the evaluate_buffer method to evaluate a buffer containing assembly code:
#![allow(unused)]
fn main() {
let buffer = r"
.text
main:
    lui t1 <- 77
";
let result = evaluator.evaluate_buffer(buffer);
}

The evaluate_buffer method returns a Result containing the control store (CtrlStore) and a reference to the evaluated memory (&[u32]). You can handle the result accordingly.

  1. Alternatively, you can evaluate a Program directly by calling the eval_program method:
#![allow(unused)]
fn main() {
let program = get_program(); // Obtain the program somehow
let result = evaluator.eval_program(program);
}

The eval_program method follows a similar pattern as evaluate_buffer but takes a Program as input.

  1. You can also evaluate individual sections of the program using the eval method:
#![allow(unused)]
fn main() {
let sections = get_sections(); // Obtain the sections somehow
let control_store = evaluator.eval(&sections);
}

The eval method takes a reference to the Sections enum, which can contain either a TextSection or a DataSection. It returns the CtrlStore generated from the evaluation.

Structure

The AsmEvaluator struct has the following fields:

  • values: A HashMap mapping labels to values (u8).
  • addr: A HashMap mapping labels to addresses (u8).
  • ram: A vector representing the random access memory.
  • unreachable: A vector containing tuples of unreachable labels, their corresponding control store addresses, and microinstructions. The struct implements the Default trait using the derive attribute.

The AsmEvaluator struct provides the following methods:

  • new: Creates a new instance of AsmEvaluator.
  • evaluate_buffer: Evaluates a buffer of assembly code and returns the control store and evaluated memory.
  • eval_program: Evaluates a Program and returns the control store and evaluated memory.
  • eval: Evaluates individual sections of the program and returns the control store. Additionally, there are internal helper methods for evaluating data and text segments, resolving unreachable instructions, and evaluating different types of instructions.

Error Handling

If there are errors while parsing the program, the evaluate_buffer method will print the errors and return a Result with an error message. It's important to handle this case to ensure proper error reporting and handling.

Comparision with proposed Assembler and Language

Output

The first and more obvious difference between our project and the proposed, is that we generate the Microinstruction itself and not a tuple (Opcode, Byte_addr) that can be used by the microarchitecture.

We choose this method because of the poor performance that is accomplished using fetch operation for getting the next instruction

A common operation of goto or jal can be perfomed on a single instruction in Vondel because we know the next address without having to use fetch

IndexVondelProposed
00b000000010_100_00000000_00000000000000000000_000_11111_11111_000000000b00001110_000_00110101_001000_001_001
10b00000000_100_00010100_001000_001_010

Another difference is the way that we interact with RAM, on Vondel we produce a .ram file that it's separated from the firmware file, this, by itself, can reduce early gotos that are common in the proposed one. Because all words can be placed early and stop the program without executing anything

Combinations

Vondel has more than 700000000 possibilities for creating instructions. This is because of our 23 register that can be combined in any way on C_Bus, 24 options for A_bus and 19 for B_bus. Where every register of A or B bus can be replaced with a immediate or an address to a label.

On the proposed assembler, we only have operations with register X and everything in centered around him, limiting the way a programmer can interact with a microarchitecture.

Extensibility

Adding new Mnemonics, Registers, PseudoOps, Sections or updating previous one is extremelly easy on Vondel.

Because of Rust powerful type-system, any new changes that you make to this project will be captured by both our unit tests and Rust itself, leading to a more robust project.

The proposed one was written in python without any tests

Fine Grained Syntax

On Vondel, we can define the syntax of our Language in any way that you want. Because we created a parsing stage, our syntax will be defined there.

A syntax pattern that would be hard to do in the proposed one is our write function.

write 123 <- t0
write addr <- a1

Most Vondel instructions can receive up to 23 registers as destination, but this one only receives label or immediate for storing the result and it must have an register as input

Implementing this on the proposed one would require to remake the entire bussiness logic that consists of opcode, dest, source

On Vondel we just create a arm on the match statement and it's done

Error Handling and Diagnostics

One of the most important features of a interpreter/compiler/assembler is how well it handle errors and show this information for a user.

On the proposed one, if something goes wrong the parser still tries to generate a file and just prints that have happened an error like this:

Error of syntax on line 15

We improved this error system using Rust powerful type-system that cries until we handle it on a good manner using this approachs:

  1. Show not only a line but also which token is required and at which column
  2. If some operation is forbidden like writing to MBR we send an error message to the user
Expected "Assign" found "Comma"
Line 13, column 9

Cannot write to MAR on C Bus
Line 18, column 9

Immediate Support

On Vondel we support Immediates, that are a type of data used in microarchitecture to provide immediate values or constants as operands in instructions. They allow for the inclusion of constant values directly within the instruction itself, without the need to load them from memory or registers.

It's not implemented on the proposed one

Microarchitecture

The basic idea behind Vondel's design is to follow some tips from the book Structured Computer Organization - Andrew S. Tanenbaum while implementing the ability to perform operations even if the main clock is in a non-edge level in order to reduce clock cycles.

The implementation of this design is provided by the uarch module.

Datapath structure

The way that a datapath is structured is not far from the usual three-bus design plus a IFU, the techniques involved a rather simple for now as you can see the diagram below:

Data path diagram

This implementation provides 24 registers in total: 5 memory registers, 3 system registers (to manage function calls and variables) and 16 general 16 general purpose registers (from R0 to R15). The register list is as following:

  • Memory
    • MAR (Memory Address Register): 20 bits
    • MDR (Memory Data Register): 32 bits
    • PC (Program Counter): 20 bits
    • MBR (Memory Buffer Reader): 8 bits
    • MBR2 (Memory Buffer Reader 2): 16 bits
  • System
    • SP (Stack Pointer): 20 bits
    • LV (Local Variables): 20 bits
    • CPP (Constant Pool Pointer): 20 bits
  • General Purpose
    • R0: 32 bits
    • ...
    • R15: 32 bits

Each cycle the datapath is driven by a Microinstruction provided by the Control Store. For more info see the next chapter.

Data parallelism

For achieve data parallel processing we will use two datapaths with one ALU each, but the clock trigger of one is the opposite of the other, that is, ALU1 is falling-edge triggered and ALU2 is rising-edge triggered. Furthermore, the clock signal received in ALU2 is a function of the clock signal of ALU1.

Let's say that ALU1 has a clock α1 and the delayed version of this signal is α1'. So the ALU2 must have a clock α2 = (α1 ∧ α1') plus a delay of δ1 as shown below:

Clock relation diagram

In other words, ALU1 will start its operation cycle on the falling-edge of α1 and end the operation on the rising-edge, while ALU2 will start its operation cycle on the rising-edge of α2 and end the operation on the falling-edge, taking advantage of the main clock (α1) even when it is at high level.

The advantage of this method is that we can share the control store and all the registers without big cost in the Misconstruction size, external hardware components and additional logic steps that may cost some clock cycles. The way that the components are shared is shown below:

Shared components diagram

Those two datapaths will become a thread in a future design of this Microarchitecture that is planned to have two task parallel threads.

Microinstruction

Microinstructions are stored in the control store and are fetched with the address stode at MPC (Microprogram Counter).

A Vondel microinstruction has 62 bits the following format:

NEXTJAMALUC BUSMEMABIMMEDIATE
9 bits3 bits9 bits20 bits3 bits5 bits5 bits8 bits

You can find a more detailed version of this diagram here

NEXT

The NEXT field stores the next microinstruction address. In other words, the NEXT field stores the next value to be stored at MPC.

JAM

The JAM field stores in which conditions the program should jump. A Jump is a change in the natural flow of the microprogram that represents that the next instruction should be changed is some way.

We have 3 possible condition to jump: if the value returned by ALU is 0 (JAMZ), if is negative (JAMN) and if we want to change the next instruction using something stored at MBR (JMPC). The first bit (MSB) from JAM is JMPC, the second is JAMN and the last is JAMZ.

On JAMN and JAMZ a jump is, in fact, a OR operation on the most sigficant bit of MPC and the condition. In other words, if the condition is true, which happens when its bit is 1, that 1 is bitwise ORed with the MSB of MPC, so if the value of MPC is 000001010 the jump position is 100001010. But on JMPC, a jump is a bitwise or with MBR and the 8 LSB's from MPC.

ALU

The ALU field actually controls 2 devices: the ALU itself and the shifter connected to its autput.

The ALU has two input, A and B (that come from the A and B bus, respectively) and it's controled by the 6 LSB's from the ALU field, they are:

  • F0, F1 and F2 (Controls the ALU function)
  • ENA and ENB (Enables the input from A and B bus respectively)
  • INVA (Inverts the bits of A)
  • INC (Increments 1 to the ALU result)

from MSB to LSB. The logic and arithmetic functions that ALU can operate are managed by F0, F1 and F2 like this:

F0F1F2Function
000A AND B
001A OR B
010NOT B
011A + B
100A XOR B
101A * B
110A / B
111A % B

Where AND, OR and XOR are bitwise operations and +, *, / and % are the addition, multiplication, division and remainder arithmetic operations respectively.

Some useful combinations of ALU signal can be found below:

F0F1F2ENAENBINVAINCFunction
0011000A
0010100B
0011010not A
0101100not B
0111100A + B
0111101A + B + 1
0111001A + 1
0110101B + 1
0111111B − A
0110110B − 1
0111011−A
0001100A AND B
0011100A OR B
00100000
01100011
0110010−1
1001100A XOR B
1011100A * B
1101100A / B
1111100A % B

The shifter has 2 inputs: The value of the ALU operation (let's call it X) and the operation opcode that are which are the 2 MSB's from the ALU field. The operations and its opcode are as following:

OpcodeOperationOutput
0b00NoneX
0b01Shift Right 1 bitX >> 1
0b10Shift Left 8 bitsX << 8
0b11Shift Left 1 bitX << 1

C BUS

The C BUS field represents which registers gonna be writen with the value of the C BUS (which is the shifter output). This field has 20 bits because there are 20 registers connected to the C bus, each bit 1 represents that the register represented by that bit should be writen by the C BUS.

The relation between a bit n and which register it's represents is shown below (from MSB to LSB):

BitRegister
1MDR
2MAR
3PC
4LV
5R0
6R1
7R2
8R3
9R4
10R5
11R6
12R7
13R8
14R9
15R10
16R11
17R12
18R13
19R14
20R15

MEM

The memory field represents which memory operations gonna happen in the cycle. The bit 1 (MSB), 2 and 3 represents the operations of WRITE, READ and FETCH respectively.

Each bit 1 in the field informs that the memory operation related to that field will be executed. What is actually done in operations can be found below:

  • READ: Reads the memory word (32 bit) of the address stored in MAR into MDR.
  • WRITE: Writes the memory word stored in MDR into the memory address stored in MAR.
  • FETCH: Reads the memory word of the address stored in PC into MBR (8-bit) and MBR2 (16-bit).

The FETCH operation uses the IFU to store the remainder bytes in cache to only access memory when needed.

A and B

A and B are fields that control which register writes to the A and B bus respectively, but, since the A and B bus are connected directly to the A and B inputs of the ALU, they can also be viewed as which register goes as a entry to the ALU operation.

The values of A and B and which register they enable (NONE, represents that none of them writes to the respective BUS, so the value of the bus is 0) is shown below:

Output to BUS A:

IDBINRegister
000000MDR
100001PC
200010MBR
300011MBRU
400100MBR2
500101MBR2U
600110LV
700111CPP
801000IMMEDIATE
901001R0
1001010R1
1101011R2
1201100R3
1301101R4
1401110R5
1501111R6
1610000R7
1710001R8
1810010R9
1910011R10
2010100R11
2110101R12
2210110R13
2310111R14
2411000R15
.....NONE

Output to BUS B:

IDBINRegister
000000MDR
100001LV
200010CPP
300011IMMEDIATE
400100R0
500101R1
600110R2
700111R3
801000R4
901001R5
1001010R6
1101011R7
1201100R8
1301101R9
1401110R10
1501111R11
1610000R12
1710001R13
1810010R14
1910011R15
.....NONE

IMMEDIATE

The immediate field allows us to send a arbitrary 8 bit number to the A or B bus, i.e if we set 0x08 in the IMMEDIATE field and enable the immediate input on the A and/or B bus, that 0x08 gonna be loaded in the corresponding bus.

Assembly

Some details about the relation between the assembly code and the microarchitecture.

Register nomeclature

In the assembly code the general purpose registers were renamed for convenience in when writing code, the rename table is as following:

RegisterAssembly Name
R0Ra
R1T0
R2T1
R3T2
R4T3
R5S0
R6S1
R7S2
R8S3
R9S4
R10S5
R11S6
R12A0
R13A1
R14A2
R15A3

WARNING: The registers T0 - T3 can be used in instructions like mul to store temporary values, so the value of those can be changed in the instruction implementations, therefore they are not guaranteed to have the value that you expect using instruction like lui or read.

Assembly Implementations

You can find some implementatinos examples of assembly code into the microarchitecture in the next chapter.

Instruction Implementations

In this section we will show how some of the microinstruction of the Vondel Language are implemented in the microprogram.

Operations Summary

HALT

HALT is a special microinstruction to indicate that the program has ended. In this design this opcode is achieved by seting all 64 bits of the microinstruction to 1.

This constant is avalible as a shortcut in the CtrlStore struct.

MOV

Syntax: mov r0, ..., rn <- x.

Action: Store the value of the register x on the registers r0, ..., rn.

High Level

mov r15, r13 <- r14

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000100000001100000000000000000000101000011101111100000000
111111111111111111111111111111111111111111111111111111111111111

ADD

Syntax: add r0, ..., rn <- x, y.

Action: Stores the result of x + y on the registers r0, ..., rn.

High Level

add r0, r1 <- r2, r3

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000100000011110000001100000000000000000010110110000000000
111111111111111111111111111111111111111111111111111111111111111

SUB

Syntax: sub r0, ..., rn <- x, y.

Action: Stores the result of x - y on the registers r0, ..., rn.

High Level

sub r0 <- r2, r3

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000100000011111100001000000000000000000011000011000000000
111111111111111111111111111111111111111111111111111111111111111

AND

Syntax: and r0, ..., rn <- x, y.

Action: Stores the result of x & y on the registers r0, ..., rn.

High Level

and r1 <- r2, r3

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000100000000110000000100000000000000000010110110000000000
111111111111111111111111111111111111111111111111111111111111111

OR

Syntax: or r0, ..., rn <- x, y.

Action: Stores the result of x | y on the registers r0, ..., rn.

High Level

or r0, r1, r14, r15 <- r2, r3

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000100000001110000001100000000000011000010110110000000000
111111111111111111111111111111111111111111111111111111111111111

XOR

Syntax: xor r0, ..., rn <- x, y.

Action: Stores the result of x ^ y on the registers r0, ..., rn.

High Level

xor r0, r1, r14, r15 <- r2, r3

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000100000100110000001100000000000011000010110110000000000
111111111111111111111111111111111111111111111111111111111111111

NOT

Syntax: not r0, ..., rn <- x.

Action: Stores the result of !x on the registers r0, ..., rn.

High Level

not r0, r15 <- r3

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000100000001101000001000000000000001000011001111100000000
111111111111111111111111111111111111111111111111111111111111111

SLL

Syntax: sll r0, ..., rn <- x.

Action: Stores the result of x << 8 on the registers r0, ..., rn.

High Level

sll r0, r15 <- r3

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000100010001100000001000000000000001000011001111100000000
111111111111111111111111111111111111111111111111111111111111111

SLA

Syntax: sla r0, ..., rn <- x.

Action: Stores the result of x << 1 on the registers r0, ..., rn.

High Level

sla r0, r15 <- r3

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000100011001100000001000000000000001000011001111100000000
111111111111111111111111111111111111111111111111111111111111111

SRA

Syntax: sra r0, ..., rn <- x.

Action: Stores the result of x >> 1 on the registers r0, ..., rn.

High Level

sra r0, r15 <- r3

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000100001001100000001000000000000001000011001111100000000
111111111111111111111111111111111111111111111111111111111111111

READ

Syntax: read r0, ..., rn <- [addr].

Action: Stores the memory word at addr on the registers r0, ..., rn.

High Level

read r0, r15 <- 0x00008

Microprogram

Precondition: 0x00008 should be stored at MAR.

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000100000000000000000000000000000000010111111111100000000
100000001000000001100000001000000000000001000000001111100000000
211111111111111111111111111111111111111111111111111111111111111

WRITE

Syntax: write [addr] <- x.

Action: Stores the memory value of x at addr on the memory.

High Level

write 0x00001 <- r14

Microprogram

Precondition: 0x00001 should be store at MAR.

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000100000001100010000000000000000000000011101111100000000
100000001000000000000000000000000000000001100111111111100000000
211111111111111011111111111111111111111111111111111111111111111

JAL

Syntax: jal [addr].

Action: Jumps to the instruction of id (address) addr.

High Level

jal 0x0A

Microprogram

JMPC bit is used to nconditional branch, so we have to find some way of storing 0x0A into MBR and then use JMPC to jump to that address. For that the number 10 (0x0A) must be stored at memory and its address should be stored in PC. So, let's store 10 in the current address of the PC, by default the PC is 0, so let's store it there.

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000100000000000000000000000000000000010111111111100000000
100000001010000000000000000000000000000000000111111111100000000
---------
1011111111111111111111111111111111111111111111111111111111111111

In a nutshell the operations are the following:

  1. Fetch the memory.
  2. Set JMPC to 1.

BEQ

Syntax: beq x, y.

Action: Jump if the value of the register x is equal to the value of the register y.

High Level

beq r14, r15

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000100100011111100000000000000000000000101111001100000000
---------
25711111111111111111111111111111111111111111111111111111111111111

BNE

Syntax: bne x, y.

Action: Jump if the value of the register x is not equal to the value of the register y.

High Level

bne r14, r15

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000101000011111100000000000000000000000101111001100000000
100000001001000011111100000000000000000000000110001001000000000
---------
25711111111111111111111111111111111111111111111111111111111111111
25811111111111111111111111111111111111111111111111111111111111111

BLT

Syntax: blt x, y.

Action: Jump if the value of the register x is less than the value of the register y.

High Level

blt r14, r15

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000101000011111100000000000000000000000110001001000000000
---------
25711111111111111111111111111111111111111111111111111111111111111

BGT

Syntax: bgt x, y.

Action: Jump if the value of the register x is greater than the value of the register y.

High Level

bgt r14, r15

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000101000011111100000000000000000000000101111001100000000
---------
25711111111111111111111111111111111111111111111111111111111111111

MUL

Syntax: mul r0, ..., rn <- x, y.

Action: Multiply the value of x by y .

High Level

mul r13 <- r6, r7

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000101000011111100000000000000000000000100000101000000000
100000001000000001100000001000000000000000000100001111100000000
200000001100000011111100000100000000000100000011111111100000000
300000010000100011011000001000000000000000000010011111100000000
400000001100000011110000000000000000000100000101100010100000000
500000001100000011111100000100000000000100000100001111100000000
611111111111111111111111111111111111111111111111111111111111111
---------
25700000010100000001100000001000000000000000000011111111100000000
---------
26000000011000000000000000000000000000000000000000000000000000000

In a nutshell the operations are the following:

  1. Jump to 0b100000001 (257) if r6 - r7 is negative.
  2. Case 1 (has not jumped), therefore r6 >= r7
    1. Store the value of r7 on r0
    2. Store the value of r6 on r1 and r13
    3. Store the value of r0 - 1 into r0 or jump to the TERMINATE code (260) if equals to 0.
    4. Stores the value of r13 + r1 into r13.
  3. Case 2 (has jumped), therefore r6 < r7
    1. Store the value of r6 on r0
    2. Store the value of r7 on r1 and r13 and goto the step 3 of the case 1.

In another words, the smallest number is stored at r0 and the greater is stored in r1 and we aways make r0 * r1. This optimization is necessary because a multiplication is computed as sequential additions, and making less additions save some clock cycles.

In the real assembly implementation the registers used to store the temporary values are T0 (to store the minimum), T1 (to store the sum) and T2 (to store the maximum), so be caution when using this registers at the same time as the mul operation.

MUL2

Syntax: mul2 r0, ..., rn <- x, y.

Action: Multiply the value of x by y .

The difference between mul2 and mul are that mul is a software addition, in other words, a sequential sum of a number, on the other hand mul2 is hardware implemented (uses a multiplication circuit).

High Level

mul2 r0, r1 <- r2, r3

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000100000101110000001100000000000000000010110110000000000
111111111111111111111111111111111111111111111111111111111111111

DIV

Syntax: div r0, ..., rn <- x, y.

Action: Divide the value of x by y .

High Level

div r0, r1 <- r2, r3

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000100000110110000001100000000000000000010110110000000000
111111111111111111111111111111111111111111111111111111111111111

MOD

Syntax: mod r0, ..., rn <- x, y.

Action: Calculate the remainder of the division of x by y.

High Level

mod r0, r1 <- r2, r3

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000100000111110000001100000000000000000010110110000000000
111111111111111111111111111111111111111111111111111111111111111

LUI

Syntax: lui r0, ..., rn <- [byte].

Action: Store the the immediate value of byte on the registers r0, ..., rn.

High Level

lui r15 <- 0xFF

This program stores 0xFF (255) to r15, but 255 is not stored in any register (it is a immediate value).

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000100000001100000000000000000000001000010001111111111111
111111111111111111111111111111111111111111111111111111111111111

ADDI

Syntax: addi r0, ..., rn <- x, [byte].

Action: Stores the value of x + [byte] on the registers r0, ..., rn.

High Level

addi r14 <- r14, 0x05

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000100000011110000000000000000000010000010001001000000101
111111111111111111111111111111111111111111111111111111111111111

SUBI

Syntax: subi r0, ..., rn <- x, [byte].

Action: Stores the value of x - [byte] on the registers r0, ..., rn.

High Level

subi r14 <- r14, 0x05

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000100000011111100000000000000000010000010001001000000101
111111111111111111111111111111111111111111111111111111111111111

ANDI

Syntax: andi r0, ..., rn <- x, [byte].

Action: Stores the value of x & [byte] on the registers r0, ..., rn.

High Level

andi r14 <- r14, 0xFF

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000100000000110000000000000000000010000010001001011111111
111111111111111111111111111111111111111111111111111111111111111

ORI

Syntax: ori r0, ..., rn <- x, [byte].

Action: Stores the value of x | [byte] on the registers r0, ..., rn.

High Level

ori r14 <- r14, 0x00

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000100000001110000000000000000000010000010001001000000000
111111111111111111111111111111111111111111111111111111111111111

XORI

Syntax: xori r0, ..., rn <- x, [byte].

Action: Stores the result of x ^ [byte] on the registers r0, ..., rn.

High Level

xori r0, r1, r14, r15 <- r2, 0x02

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000100000100110000001100000000000011000010000011000000010
111111111111111111111111111111111111111111111111111111111111111

MULI

Syntax: muli r0, ..., rn <- x, [byte].

Action: Stores the result of x * [byte] on the registers r0, ..., rn.

High Level

muli r0, r1, r14, r15 <- r2, 0x02

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000100000101110000001100000000000011000010000011000000010
111111111111111111111111111111111111111111111111111111111111111

DIVI

Syntax: divi r0, ..., rn <- x, [byte].

Action: Stores the result of x / [byte] on the registers r0, ..., rn.

High Level

divi r0, r1, r14, r15 <- r2, 0x02

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000100000110110000001100000000000011000010000011000000010
111111111111111111111111111111111111111111111111111111111111111

MODI

Syntax: modi r0, ..., rn <- x, [byte].

Action: Stores the result of x % [byte] on the registers r0, ..., rn.

High Level

modi r0, r1, r14, r15 <- r2, 0x02

Microprogram

IDNEXTJAMALUC BUSMEMABIMMEDIATE
000000000100000111110000001100000000000011000010000011000000010
111111111111111111111111111111111111111111111111111111111111111

Hardware

Some hardware details about some relevant components of the microarchitecture that differ from the provided components from the book Structured Computer Organization.

ALU

Vondel's ALU have some optimizations oven the Arithmetic Logic Unit provided by the book, the main difference is that the function inputs are 3-bit lenght instead of 2-bit, in other words we have 23 possible operations that are:

  • Logic And
  • Logic Or
  • Logic Xor
  • Logic Not
  • Addition
  • Multiplication
  • Division
  • Remainder

For that we change a little bit the hardware provide a circuit to any given possible operation. The ALU model can be found below:

ALU hardware info

Adder

The adder is a simple 32-bit full-adder that uses a combination of sequential 1-bit full-adders that conects to each other by the carry, in other words, the Co of one is the Ci of the next adder.

Multiplier

The multiplier circuit uses a standard single cycle multiplier to multilpy A by B, for that we need to set the inputs and output width to 32-bit lenght.

32-bit x 32-bit sigle cycle multiplier diagram

Divider and Remainder

The divider circuit uses the division algorithm logic implemented on digital logic to divide A by B, the basic idea is:

  1. Compare the divisor with the selected bits of the dividend.
    1. If is less, then perform the subtraction, that results on a Q = 1 (Q is the Quotient).
    2. If is greater, do not perform the subtraction (Q = 0), go to step 2.
  2. Add the next bit of the dividend to the result and shift the divisor to right by 1 bit, go to step 1.
  3. Repeat the procedure until all the bits of the dividend are covered.

This circuit uses a Process Unit (PU) to perform the comparison logic part of this algorithm, this simple circuit is ilustrated below:

Divider process unit

Combining a matrix of PU's which each row determine a single Quotient bit we can buid a simple 32-bit x 32-bit adder.

32-bit by 32-bit divider

You may notice that in the first row we feed the PU inputs only with B and zeros, the reason is that in the standard divider circuit the dividend lenght is aways 2 times the divisor lenght, that way whe fill the 32 MSB's of the pseudo 64-bit input with zeros, that way we can achieve a 32-bit x 32-bit division.

You can simulate a smaller sample of this circuit here.

IMM

The IMM circuit provides pre functionality of write the IMMEDIATE field of the microinstruction in the A and/or the B buses. The circuit is fairly simple and looks like this:

IMM circuit diagram

Others

Other integrated circuits of the microarchitecture like the Logic Unit, O and High Bit are exactly the same as illustrated in the book.

Chapter 1