wiki:compiler/carchitecture

Version 12 (modified by eamiri, 9 years ago) (diff)

--

Passes of the Compiler

  1. Parsing Python parser is used to parse the input. The output is an abstract syntax tree.
  1. Extracting List of Live Variables The list of live variables is extracted. The list will be used in dead code elimination.
  1. Inlining Functions calls are inlined. The functionality of this pass is restricted for now. This pass has a couple of internal phases in particular it handles all import statements.
  1. Translation The abstract syntax tree is translated to an abstract syntax tree based on bitExpr classes. The importance of this pass is that it lowers the level of abstraction by introducing carry variables.
  1. Sym-Table Construction Symbol table is constructed.
  1. Conversion to SSA The code is converted to SSA. The SSA form used in the compiler is tailored to needs of the compiler.
  1. Creating Basic Blocks Based on occurrences of optimize pragma, new conditional statements are introduced which results in creation of new basic blocks of code.
  1. Applying Pragmas All optimize pragmas are applied, this means that in proper places of the code, occurrences of optimized variables are replaced by the constant value indicated by optimize pragma. No further simplification is performed.
  1. Normalization Converts code to three address code. This also involves common subexpression elimination.
  1. Code Simplification The code is simplified using boolean algebra rules, copy propagation and constant propagation. Constant propagation has two forms, backward propagation and forward propagation. The main source of simplifications is optimize pragma.
  1. Dead Code Elimination Given the list of live variables extracted in Pass 2, we go backwards in the code, and find all variables that live variables depend on. All other variables are dead and the lines that define them can be removed.
  1. Factoring Out In order to reduce the size of code, the common code at the end of two branches of if-then-else statements, is removed from the branches and put right after that statement.
  1. While Loop Processing Extra operations for handling carry variables inside a while loop is inserted.
  1. Collecting Variable Information The code is analyzed to collect information about variable names and their type.
  1. Output Generation The abstract syntax tree is translated to the equivalent C code.

Conversion to SSA

A variable might have two definitions in two branches of an if statement.

Currently while loop bodies are not converted to SSA.

Normalization

This pass converts the code to three address code and implements common subexpression elimination on the whole program. (This means a variable defined in a previous basic block may be used in another basic block to remove a common subexpression). No common subexpression elimination is done on while loop bodies. This restriction was applied to remove a bug (what was the bug?).

Common subexpression elimination is based on comparing RHS of assignment statement. The string representation of RHSs are compared for equality. If string rep. of RHS of an assignment statement is in the predeclared list then the new assignment is removed.

Variables declared in the true and false branches of a conditional statement are not used for common subexpression elimination in the outer basic blocks. This is because we don't know which path will be taken in a certain execution of code.

For each line of code the boolean logic of RHS is simplified during conversion to three address code.

A better design of this pass will have three individual passes.

-- Boolean simplification of RHSs.
-- Conversion to three address code.
-- Optimizations including common subexpression elimination can be done in later in Simplification pass. To keep the current compilation strategy we may do a common subexpression elimination here.

The code must be in SSA form for this pass. If a variable has two definition then one expression involving that variable may have different values in different lines of code, so common subexpression elimination is not possible.

(Question: Can we move this pass forward and backward among compilation passes?)

Code Simplification

This pass simplifies the code as follows. This pass must be after normalize pass, so the code is in three address form at this point.

For one basic block the following is done:

1 - Simplify RHS of each line of code. (RHS has one operation. So this step basically is useful is one of the variables involved has a constant value, for example due to optimization.)
2- Propagate copies and constants. If RHS of any assignment statement changes then return to step 1. (Question: There is a difference in handling NOT. Is it necessary? look at the comment in the code.)

For basic blocks of if statements we have an extra step. True branch of if statements has an assumption on the value of a variable. The line of code immediately before if statement, defines that variable. We call function calc_implication() to propagate implications from left to right and find any variables that have a fixed value. Then this extra assumptions will be used to simplify the code in the true branch of if statement.

Remark Potentially left to right propagation of constants has broader use in simplification of the code. We should consider this in redesign.

Remark Currently calc_implications() works on the basic block immediately before the if statement. If the line before if is a while it does nothing. Moreover with the current implementation there seems to a bug if the line before if is while loop.