Skip to content

This repository hosts the source code and documentation for the MicroJava Compiler, a Java-based compiler for the MicroJava programming language. MicroJava is a small, educational programming language, and this compiler is designed to convert MicroJava source code into bytecode that can be executed on the MicroJava Runtime Machine.

zarkobabic/Microjava-compiler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

32 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MicroJava Compiler

Untitled

Navigation


Overview

This repository hosts the source code and documentation for the MicroJava Compiler, a Java-based compiler for the MicroJava programming language. MicroJava is a small, educational programming language, and this compiler is designed to convert MicroJava source code into bytecode that can be executed on the MicroJava Runtime Machine.


Project Components

The MicroJava Compiler consists of four primary components:

  1. Lexer:

    • Implemented using jFlex, the lexer analyzes the source code and tokenizes it based on the MicroJava language specification. It identifies and classifies language constructs, such as keywords, operators, and identifiers. This lexer is a crucial initial step in the compilation process.
  2. Parser:

    • Implemented using CUP, the parser takes the tokenized output from the lexer and constructs an Abstract Syntax Tree (AST) following the MicroJava language specification. It enforces the language's syntax rules, ensuring that the source code adheres to the correct structure and order of statements and expressions.
  3. Syntax Analysis:

    • In this phase, the AST generated by the parser is analyzed to enforce the constraints of the MicroJava language. Syntax analysis ensures that the code follows the correct grammar, handles operator precedence, type checking, and validates the program's structure.
  4. Code Generation:

    • The final phase of the compiler is responsible for generating executable bytecode for the MicroJava Runtime Machine. This process translates the validated AST into instructions that can be executed on the MicroJava virtual machine, enabling the execution of MicroJava programs.

Lexing in General

Lexing, also known as tokenization, is the process of breaking down the source code into a stream of tokens. Tokens are the smallest units of a programming language and include keywords, identifiers, operators, and literals. The lexer's role is to recognize and classify these tokens based on the language's grammar and rules.


Parsing in General

Parsing is a crucial step in the compilation process that follows lexing. It involves the analysis of the tokenized source code to determine its structure and relationships between different elements. In this phase, a parser constructs an Abstract Syntax Tree (AST) that represents the program's syntax according to the language's grammar. It enforces the correct order of statements, expressions, and the overall program structure.


Syntax Analysis in General

Syntax analysis, also known as parsing, is an essential part of any compiler. It ensures that the source code adheres to the rules and grammar of the target programming language. During this phase, the compiler constructs a structured representation of the code (such as an AST) and performs various checks to ensure correctness. Common tasks in syntax analysis include identifying and reporting syntax errors, handling operator precedence, and building a representation of the program that can be used for further processing and code generation.


Code Generation in General

Code generation is the final phase of a compiler, responsible for producing executable code from the parsed and analyzed source code. During this phase, the compiler generates code that can be executed on the target platform, such as machine code or bytecode. The generated code should adhere to the rules and specifications of the target platform and be semantically equivalent to the original source code.


Usage

The MicroJava Compiler can be used by students and educators as a learning tool for understanding the principles of compiler construction. To build and use the compiler, please refer to the instructions provided in the repository's documentation.


Documentation

Appendix A: MicroJava Programming Language

This appendix describes the MicroJava programming language. Microjava is similar to Java but much simpler.

A.1 General Features of the Language

  • A MicroJava program starts with a keyword program and has static fields, static methods, and inner classes that can be used as (user) data types.
  • The main method of a MicroJava program is always called main(). When a MicroJava program is called, that method is executed.
  • Since:
    • Integer, sign, and logical constants (int, char, bool).
    • Basic types: int, bool, char (ASCII).
    • Variables: global (static), local, class (fields).
      • Variables of basic types contain values.
      • Structured/reference types: one-dimensional arrays as in Java and inner classes with fields and methods.
    • Variables of reference types represent references (they contain addresses that cannot be changed explicitly).
  • Static methods in the program.
  • There is no garbage collector (allocated objects are only deallocated after the end of the program).
  • There is class inheritance and polymorphism.
  • There is a redefinition of methods.
  • Methods of inner classes are bound to the instance and have an implicit parameter this (a reference to the instance of the class for which the method was called).
    • The reference "this" is implicitly declared in the methods of inner classes as the first formal argument of the reference type to the class to which the method belongs.
    • Within instance methods, the field name refers to the instance field of the current object, assuming the field is not hidden by a method parameter. If it is hidden, we can access the instance field via this.fieldName.
  • Pre-declared procedures: ord, chr, len.
  • Method print prints the values of all basic types.
  • Control structures include conditional branching (if-else) and cycle (do-while).

A.2 Syntax

## Program Structure
Program = "program" ident {ConstDecl | VarDecl | ClassDecl } "{" {MethodDecl} "}".

## Constant Declaration
ConstDecl = "const" Type ident "=" (numConst | charConst | boolConst) {, ident "=" (numConst | charConst | boolConst)} ";".

## Variable Declaration
VarDecl = Type ident ["[" "]"] {"," ident ["[" "]"]} ";".

## Class Declaration
ClassDecl = "class" ident ["extends" Type] "{" {VarDecl} ["{" {ConstructorDecl} {MethodDecl} "}"] "}".

## Constructor Declaration
ConstructorDecl = ident "(" [FormPars] ")" {VarDecl} "{" {Statement} "}. * za C nivo

## Method Declaration
MethodDecl = (Type | "void") ident "(" [FormPars] ")" {VarDecl} "{" {Statement} "}".

## Formal Parameters
FormPars = Type ident ["[" "]"] {"," Type ident ["[" "]"]}.

## Type Definition
Type = ident.

## Statements
Statement = DesignatorStatement ";" | "if" "(" Condition ")" Statement ["else" Statement] | "while" "(" Condition ")" Statement | "break" ";" | "continue" ";" | "return" [Expr] ";" | "read" "(" Designator ")" ";" | "print" "(" Expr ["," numConst] ")" ";" | Designator "." "foreach" "(" ident "=>" Statement ")" ";" | "{" {Statement} "}".

## Designator Statement
DesignatorStatement = Designator (Assignop Expr | "(" [ActPars] ")" | "++" | "‐‐") | "[" [Designator] {"," [Designator]}"]" "=" Designator.

## Actual Parameters
ActPars = Expr {"," Expr}.

## Conditions
Condition = CondTerm {"||" CondTerm}.

## Conditional Term
CondTerm = CondFact {"&&" CondFact}.

## Conditional Factor
CondFact = Expr [Relop Expr].

## Expressions
Expr = [""] Term {Addop Term}.

## Terms
Term = Factor {Mulop Factor}.

## Factors
Factor = Designator ["(" [ActPars] ")"] | numConst | charConst | boolConst | "new" Type ( "[" Expr "]" | "(" [ActPars] ")" ) | "(" Expr ")".

## Designators
Designator = ident {"." ident | "[" Expr]"}.

## Labels
Label = ident.

## Assignment Operator
Assignop = "=".

## Relational Operators
Relop = "==" | "!=" | ">" | ">=" | "<" | "<=".

## Addition Operators
Addop = "+" | "".

## Multiplication Operators
Mulop = "*" | "/" | "%".

A.3 Semantics

All defined terms in this document are underlined to emphasize their particular meaning. Definitions of those terms are given below.

Reference type

Arrays and classes are of reference type.

Constant type

  • The type of an integer constant (e.g., 17) is int.
  • The type of a character constant (e.g., 'x') is char.
  • The type of a logical constant (e.g., true) is bool.

Equivalent Data Types

Two data types are equivalent

  • if they have the same name, or
  • if both are strings, and the types of their elements are equivalent.

Compatible data types

The two data types are compatible

  • if they are equivalent, or
  • if one of them is a reference type, and the other is of type null.

Compatibility of data types when assigning

Type heart is compatible when assigned to type dst

  • if they are heart and dst equivalent,
  • if dst is reference type, and heart is of the type null.
  • if dst is a reference to a base class and src is a reference to a derived class

Predeclared names

  • int: type of all integer values
  • char: type of all character values
  • bool: boolean type
  • null: the null value of a variable of type class or (character) string symbolically indicates a reference that does not point to any data
  • aeolian: end of character line (corresponds to the character '\n'); print(eol) makes a transition to a new line
  • chr: standard method; chr(i) performs the conversion of an integer expression and to character (char)
  • ord: standard method; ord(ch) performs character conversion ch to an integer value (int)

Scope of validity

Validity range (scope) represents the textual reach of a method or class. It extends from the beginning of the method or class definition to the closing brace at the end of that definition. A scope does not include names declared in scopes that are lexically nested within it. In a scope, the names declared within it and all scopes outside it are "seen". The assumption is that there is an artificial global scope (universe), for which the main program is local and which contains all pre-declared names.

  • Name declaration in inner scope hides the declaration of the same name in the outer scope.

Note

  • Indirect recursion is not allowed, and each name must be declared before its first use.
  • Predeclared names (e.g., int or char) can be redeclared in the inner scope (but this is not recommended).

A.4 Contextual conditions

General contextual conditions

  • Each name in the program must be declared before first use.
  • A name must not be declared multiple times within the same scope.
  • There must be a named method in the program main. It must be declared as a void method with no arguments.

Context conditions for standard methods

  • chr(s): must be an expression of type int.
  • ord(c): must be of type char.
  • len(s): must be a string or character string.

##Program Structure program = "program" + " ident {ConstDecl | VarDecl | ClassDecl | RecordDecl } {" + "{MethodDecl} }.";

Const Declaration

constDecl = "const Type ident = (numConst | charConst | boolConst);";

  • Terminal types numConst, charConst, or boolConst must be equivalent to Type.

Variable Declaration

  • varDecl = "Type ident [][, ident [][]];";

Class Declaration

classDecl = "class ident extends Type {" + "VarDecl {" + "ConstructorDecl MethodDecl}}";

  • Type when deriving a class from another class, it must be an inner class of the main program.

Method Declaration

methodDecl = "(Type | void) ident (FormPars) VarDecl {Statement}";

  • If the method is not of type void, it must have a return statement within its body.

Constructor Declaration

constructorDecl = "ident (FormPars) VarDecl {Statement}";

  • A class constructor must have the same name as the class for which it is defined.
  • There cannot be two constructors within the same class with the same formal parameters.

Formal Parameters

formPars = "Type ident [][] [, Type ident [][]]";

Type Declaration

type = "ident";

  • ident must indicate a data type.

Statements

Designator Statement

designatorStatement = "Designator Assignop Expr;";

  • Designator must denote a variable, array element, or field within an object.
  • Non-terminal type Expr must be compatible at assignment with non-terminal type Designator.

Increment/Decrement Statement

incrementDecrementStatement = "Designator (++ | --);";

  • Designator must denote a variable, array element, or field of an inner class object.
  • Designator must be of type int.

Method Call Statement

methodCallStatement = "Designator (ActPars);";

  • Designator must denote a non-static method of the inner class or a global function.

Array Element Assignment

arrayElementAssignment = "[Designator, Designator, ...] = Designator";

  • All Designator non-terminals to the left must denote a variable, array element, or field within an object.
  • Designator to the right must represent a string.
  • Array element types must be compatible at assignment.

Break Statement

breakStatement = "break;";

  • The break statement can only be used inside while or foreach loops.
  • Aborts execution immediately surrounding loops.

Continue Statement

continueStatement = "continue;";

  • The continue statement can only be used inside while or foreach loops.
  • Terminates the current iteration immediately surrounding loops.

Read Statement

readStatement = "read(Designator);";

  • Designator must denote a variable, array element, or field within an object.
  • Designator must be of type int, char, or bool.

Print Statement

printStatement = "print(Expr [, numConst]);";

  • Expr must be of type int, char, or bool.

Return Statement

returnStatement = "return [Expr];";

  • Non-terminal type Expr must be equivalent to the return type of the current method/global function.
  • If non-terminal Expr is missing, the current method must be declared void.
  • It must not exist outside the body of (static) methods, i.e., global functions.

##If Statement ifStatement = "if (Condition) Statement [else Statement];";

  • Conditional expression type Condition must be bool.

While Loop

whileLoop = "while (Condition) Statement;";

  • Conditional expression Condition must be of type bool.
  • Checks the specified condition when entering and at the end of the loop body.

Foreach Loop

foreachLoop = "Designator.foreach(ident => Statement);";

  • Designator must denote a string of arbitrary type.
  • ident must be a local or global variable of the same type as the elements of the array described by the Designator.
  • In each iteration of the loop, ident indicates the current element of the array.

Function Calls

Actual Parameters

actPars = "Expr [, Expr];";

  • The number of formal and actual arguments of a method or constructor must be the same.
  • The type of each actual argument must be compatible at assignment with the type of each formal argument.

Conditions

Conditional Expression

condition = "CondTerm || CondTerm;";

Conditional Term

condTerm = "CondFact && CondFact;";

Conditional Factor

condFact = "Expr Relop Expr;";

  • The types of both expressions must be compatible.
  • With variables of class or array type, only != and == can be used from the relational operators.

Expressions

Basic Expression

expr = "Term;";

Negation Expression

negationExpr = "- Term;";

  • Term must be of type int.

Addition Expression

additionExpr = "Expr Addop Term;";

  • Expr and Term must be of type int and compatible.

Multiplication Expression

multiplicationExpr = "Term Mullop Factor;";

  • Term and Factor must be of type int.

Factors

Designator Factor

designatorFactor = "Designator | numConst | charConst | boolConst | (Expr);";

Method Call Factor

methodCallFactor = "Designator (ActPars);";

  • Designator must denote a non-static method, a constructor, or a global function.

Array Creation Factor

arrayCreationFactor = "new Type [Expr];";

  • Expr must be of type int.

Object Creation Factor

objectCreationFactor = "new Type (ActPars);";

  • Type must denote an inner class (user-defined type).

Designators

designator1 = "Designator . ident;";

  • Non-terminal type Designator must be an inner class (ident must be either a field or a method of an object marked with a non-terminal Designator).

designator2 = "Designator [Expr];";

  • Non-terminal type Designator must be a string.
  • Non-terminal type Expr must be an int.

Operators

assignop = "=";

  • The assignment operator is right-associative.

relop = "== | != | > | >= | < | <="; addop = "+ | -";

  • Operators are left-associative.

mulop = "* | / | %";

  • Operators are left-associative.

Implementation Limitations

  • No more than 256 local variables may be used.
  • No more than 65536 global variables may be used.
  • A class cannot have more than 65536 fields.

License

This project is open source and available under the MIT License. Feel free to use, modify, and distribute the code, respecting the terms and conditions of the license.

For any questions or feedback, please don't hesitate to contact me.

Thank you for your interest in the MicroJava Compiler project. We hope it serves as a valuable resource for understanding compiler construction and the MicroJava programming language.

About

This repository hosts the source code and documentation for the MicroJava Compiler, a Java-based compiler for the MicroJava programming language. MicroJava is a small, educational programming language, and this compiler is designed to convert MicroJava source code into bytecode that can be executed on the MicroJava Runtime Machine.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published