wiki:compiler/newdesign

Version 10 (modified by eamiri, 9 years ago) (diff)

--

Visitor Pattern

Usually compiler has a number of passes and each pass traverses the code line by line and each line of code has a specific type from among a few different types. The work that a pass does on each line of code depends on the type of the line. The simplest implementation for this is that each pass, checks the type of each line of code. The main drawback of this solution is that the compiler code cannot be extended easily by introducing new operations.

The solution to this problem is visitor pattern. A good explanation of visitor pattern is available here. To implement it a python ideas suggested here and here are useful.

Python AST module has two classes NodeVisitor? and NodeTransformer? that implement visitor pattern. Subclasses of the first class cannot modify the node they are visiting and for this purpose the second class should be subclassed. Visitor methods for nodes of type X are named visit_X and there is a generic_visit() method as well.

Symbol Table

1- Must support line numbering and collapsing.
2- Given varname should return uses and/or defs (how to handle use of similar names in different scopes?)
3- Given line number should return variables used and/or defined in that line.

To handle 1, the numbering will be cascaded.
numbers 1,2,3,... show lines of code with all bodies collapsed.
if line m is a while loop, m-n means line n in the body of while.
if line m is an if statement then m-0-i refers to line i in the true branch and m-1-i refers to line i in false branch. The problem with this scheme is that numbers in a tuple are not semantically homogeneous.
Implementation of line numbers will use tuples.

Common Subexpression Elimination

This should be written as an independent pass that can be called whenever needed.

Boolean Simplification

In a couple of places in the code we need to use Boolean logic to simplify the code. This happens in Normalization and in Code Simplification. In both cases we need simplify one formula. In code simplification this pretty straightforward, as the formula has only one operator. In Normalization it might be of arbitrary width. It's important to apply proper Boolean algebra rules in situation.

AST

Currently we translate python AST to our AST. Is this necessary? An alternative solution might be using python AST and potentially subclassing it.

Code Inlining

Implementation note: PyPy? supports code inlining, but after a first look at the source look, it seems that we cannot use their code. Inlining in PyPy? is considered a backend optimization. After they construct a flow graph and annotating it with information about variable types, they pass it to the code inliner. If we want to use inlining we may need to accept other data structures and passes of their translation process.

Unique Prefix for Names

Sometimes compiler needs to generate prefixes for names to make sure the name is unique. Prefixes of some names may need to be consistent with each other. (For example when modify variable names in a function to inline the function, we want a unique prefix for all variables). We need a component that generates a unique name for a given name. This unique name might be based on some extra input (like function name in inlining example).