Course Evaluation (Final grade: A)

This is definitely a course you should take if you are interested in how programs get compiled into high-performance executables. The workload is large, but this is one of the most interesting courses I have taken at CMU.

Does it help one become a better SDE? Yes! I now find myself reasoning about why one way of writing code performs better than another (e.g. dynamic dispatch vs. static dispatch in Rust).

It’s also a great opportunity to learn a new language, like Rust or OCaml. I learned Rust in this course, and I really love this programming language!

Blog Chapters

  1. Chapter 1: Overview of Compiler Design
  2. Chapter 2: Instruction Selection
  3. Chapter 3: Register Allocation
  4. Chapter 4: Elaboration (after parsing/lexing)
  5. Chapter 5: Static Semantics
  6. Chapter 6: Grammars
  7. Chapter 7: Lexical Analysis
  8. Chapter 8: Function calls
  9. Chapter 9: SSA
  10. Chapter 10: Dynamic Semantics
  11. Chapter 11: Mutable
  12. Chapter 12: Dataflow Analysis
  13. Chapter 13: Register allocation optimization
  14. Chapter 14: Peephole optimization
  15. Chapter 15: Memory optimization
  16. Chapter 16: Loop optimization
  17. Chapter 17: Garbage Collection
  18. Chapter 18: More advanced features

**Chapter 1: Overview of Compiler Design **

What makes a good Compiler: metrics

  • correctness
  • code quality: compiled code runs fast
  • efficiency: compilation runs fast
  • usability: provides errors/warnings, …

Compiler Design

  • how to structure compilers
  • applied algorithms & data structures
  • focus on sequential imperative programming languages
    • not functional, parallel, distributed, OOP…
  • code generation and optimization

Organizing a compiler

Front

  • split work into different phases
  • Lexical analysis -> Token stream
  • Parsing -> Abstract syntax tree (mark body of while loop…)
  • Semantic analysis (type check, variable initialization)

Middle

  • IR(intermediate representation) Generation -> Intermediate representations
  • Optimize (most challenging)

Back

  • Instruction selection -> Abstract assembly
  • Register allocation -> ASM
The distinction between the middle and back ends is unclear.

Back to Blog Chapters

**Chapter 2: Instruction Selection **

  • Compiler phase
  • IR tree -> abstract assembly

Example:

x = 5
return x+x+x*2

-> Instruction selection
x = 5
temp1 = x + x
temp2 = x * 2
ret_reg = temp1 + temp2
ret

IR tree (more expressions, statements...)
Programs     p ::= s1, ..., sn   (sequence of statements)
Statements   s ::= x = e
                 | return e

Expressions:
e ::= c          int const
    | x          variable
    | e1 ⊕ e2    binary op (nested)
⊕ ::= + | * | / | ...

Abstract Assembly (flat)
Programs     p ::= i1, ..., in   (instructions)
Instructions i ::= d <- s         move
                 | d <- s1 ⊕ s2   bin op (sometimes one of the sources doubles as the dst)
                 | ret            return
Operands:
    d,s ::= r   register (fixed, finite number)
          | c   int const
          | t   temp (compiler-introduced variable)
          | x   var

Translating Expressions
translate(e1 ⊕ e2) = translate(e1); translate(e2); ... but where do the results res1 and res2 live?
Better:
trans(d,e): sequence of instructions that stores the value of e in destination d.


e           trans(d,e)
x           d <- x
c           d <- c
e1 ⊕ e2     trans(t1, e1), trans(t2, e2), d<-t1⊕t2, (t1 and t2 are fresh temps)


Translating statements
trans'(s): sequence of instructions that implements s
s           trans'(s)
x = e       trans(x,e)
return e    trans(ret,e), return   (ret: return register)
Example
IR prog: 
z = (x + 1) * (y * 4)
return z

trans'(p) 
= trans'(z = (x + 1) * (y * 4)), trans'(return z)
= trans(z,(x + 1) * (y * 4)),trans(ret, z), return
= trans(t1,x+1), trans(t2,y * 4), z<- t1 * t2, ret<-z, return
= t3 <- x, t4 <- 1, t1 <- t3 + t4, t5 <- y, t6 <- 4, t2 <- t5 * t6, z <- t1*t2, ret<-z, return
Optimize? directly use x and y instead of moving them to temps
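To make this concrete, here is a minimal Rust sketch of trans(d, e) (my own toy Expr/Instr types, not the lab's IR): fresh temps come from a counter, and each call appends instructions that leave the value of e in destination d.

enum Expr {
    Const(i64),
    Var(String),
    Bin(Box<Expr>, char, Box<Expr>),
}

enum Instr {
    Move { d: String, s: String },
    BinOp { d: String, s1: String, op: char, s2: String },
}

struct Gen {
    next_temp: u32,
    code: Vec<Instr>,
}

impl Gen {
    fn fresh(&mut self) -> String {
        self.next_temp += 1;
        format!("t{}", self.next_temp)
    }

    // trans(d, e): append instructions that store the value of e in destination d
    fn trans(&mut self, d: String, e: &Expr) {
        match e {
            Expr::Const(c) => self.code.push(Instr::Move { d, s: c.to_string() }),
            Expr::Var(x) => self.code.push(Instr::Move { d, s: x.clone() }),
            Expr::Bin(e1, op, e2) => {
                let (t1, t2) = (self.fresh(), self.fresh());
                self.trans(t1.clone(), e1);
                self.trans(t2.clone(), e2);
                self.code.push(Instr::BinOp { d, s1: t1, op: *op, s2: t2 });
            }
        }
    }
}

Calling trans on (x + 1) * (y * 4) with destination z reproduces the sequence above, extra moves included; those are exactly what the later passes clean up.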

How to improve
  1. Add special cases: for example c ⊕ e2
  2. Optimization pass after the first pass of translation (common approach)
  3. Different translation
Constant propagation
  • goal: eliminate a move "t <- c; p" by replacing t with c in the rest of the program p
  • but: stop replacing t if it is written again
    Example: 
    t <- 4
    x <- t+1   ---> x <- 4+1 
    t <- 5
    ret <- t   --NO--> ret <- 4
    return 
    
Copy propagation
  • goal: eliminate a move "d <- t; p" by replacing d with t in the rest p; but: stop replacing if d or t is written


t <- 5+1
d <- t 
x <- d+1 ----> x <- t+1
t <- 5+2
ret <- d+1 ---No---> ret <- t + 1
ret
Static Single Assignment Form
- every temp is assigned at most once
- no need to check for later writes: we can replace all occurrences in propagations
- conversion to SSA -> use version numbers

Converting the copy-propagation example above to SSA:

t0 <- 5+1
d0 <- t0
x0 <- d0+1
t1 <- 5+2
...
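Since SSA removes the "stop if written again" caveat, constant propagation becomes a single forward pass. A small Rust sketch over a toy straight-line instruction list (my own types, not the lab infrastructure):

use std::collections::HashMap;

#[derive(Clone)]
enum Operand { Const(i64), Temp(String) }

enum Instr {
    Move { d: String, s: Operand },
    BinOp { d: String, s1: Operand, s2: Operand },
}

// Constant propagation on straight-line SSA code: every temp has exactly one
// definition, so a temp known to hold a constant can be replaced everywhere.
fn const_prop(code: &mut [Instr]) {
    let mut consts: HashMap<String, i64> = HashMap::new();
    let subst = |s: &mut Operand, consts: &HashMap<String, i64>| {
        if let Operand::Temp(t) = s {
            if let Some(c) = consts.get(t) {
                *s = Operand::Const(*c);
            }
        }
    };
    for instr in code.iter_mut() {
        match instr {
            Instr::Move { d, s } => {
                subst(s, &consts);
                if let Operand::Const(c) = s {
                    consts.insert(d.clone(), *c); // remember d <- c; no later write can invalidate it
                }
            }
            Instr::BinOp { s1, s2, .. } => {
                subst(s1, &consts);
                subst(s2, &consts);
            }
        }
    }
}

Copy propagation is the same idea with a temp-to-temp map; removing the now-dead moves is left to a later dead-code pass.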

Back to Blog Chapters

**Chapter 3: Register Allocation **

  • Goal: assign registers and stack locations to temps
    X86-64: 16 registers, no temps
  • stack locations, when keeping track of more variables than registers
    Strategy
    1. Store all temps on the stack (CON: inefficient, still need registers for efficiency)
IR trees(simplified syntax tree) 
--> Instruction selection --> ASM 
--> reg alloc --> ASM
--> x86 asm


Example:
d <- s1 ⊕ s2
-> reg alloc

%r11d <- %edx * 4(%rsp)

Difficulty: x86-64 has only 15 usable general-purpose registers (16 minus %rsp)
  • Goal: assign each variable a register
  • may have to use stack locations and clever use of registers for variables
Interference:
x <- 14
y <- 15 
z <- x+y

x <- 14
y <- 15 + x 
ret <- 4+y
ret
if x is not used again, we can overwrite x's register with y
Right IR for reg alloc?
3 addr abs asm
d <- s1 + s2
d <- s1 / s2
...
2 addr abs asm
d <- s1
d <- d + s2

d <- s1
d <- d / s2
...

abstract x86
MOVL s1, d
ADDL s2, d
...
IDIVL s2: edx:eax / s2 -> quotient in eax, remainder in edx

MOVL s1, %eax
CLTD (sign-extends eax into edx:eax.)
----
https://stackoverflow.com/questions/17170388/trying-to-understand-the-assembly-instruction-cltd-on-x86
What this means in practice is that edx is filled with the most significant bit of eax (the sign bit). For example, if eax is 0x7F000000 edx would become 0x00000000 after cdq. And if eax is 0x80000000 edx would become 0xFFFFFFFF.
----
IDIVL s2 (edx:eax / s2)
MOVL %eax, d

Reg Alloc at 3-Addr Assem
  • leave one register unassigned for later conversion (%r11d)
For example

d <- s1 ⊕ s2
--> 
MOVL s1, %r11d
ADDL s2, %r11d
MOVL %r11d, d
(at least one operand has to be a register;
this always works, but may not be the optimal solution)

Reg Alloc at 2-Addr Assem
For example

d <- s1 + s2
--> 
d <- s1
d <- s2 + d
How to allocate registers?
Goal: minimize spilling to the stack
One method: graph-based, greedy register allocation
Build interference graph
Vertices are registers and temps
Edges indicate interference (need diff registers)

Example: 
t1  -- t2 -- t3 -- eax
        |           |
         -----------

Complexity of deciding if two temps interfere? Undecidable
Why? Reduction from the halting problem
x <- 5
y <- 6
complex code (doesn't use x and y)
ret<- x + y
return

x and y interfere iff complex code terminates


Approximate Interference with liveness
Liveness: temp t is live at line l if t might be used in the 
following computation

For straight-line code (as in L1): work backwards
temp t is live at l if either of the following holds:
- t is read at l
- t is live at l+1 and not written at line l

Example: 
x1 <- 1            -
x2 <- 1           x1
x3 <- x1 + x2     x2,x1
x4 <- x3 + x2     x3,x2
x5 <- x4 + x3     x3,x4
ret <- x5         x5
return ret        ret register

live range of x1: lines 2-3
live range of x2: lines 3-4
..


t1 <- 1     -
t1 <- t1+1  t1
t2 <- t1    t1

Construct the interference graph:
Option 1: add an edge between t1 and t2 if they have overlapping live ranges
Option 2: rule 1: for every instruction d <- s1 ⊕ s2,
                  add an edge {d,t} for every t ≠ d that is live at the next instruction
          rule 2: for every move d <- s, add an edge {d,t} for every t ∉ {s,d}
                  that is live at the next instruction
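A sketch of Option 2 in Rust, assuming we already have the live-out set for each line (my own simplified instruction type):

use std::collections::HashSet;

enum Instr {
    BinOp { d: String, s1: String, s2: String },
    Move { d: String, s: String },
}

// live_after[l] = temps live at the next instruction (live-out of line l)
fn interference_edges(code: &[Instr], live_after: &[HashSet<String>]) -> HashSet<(String, String)> {
    let mut edges = HashSet::new();
    for (l, instr) in code.iter().enumerate() {
        for t in &live_after[l] {
            match instr {
                // rule 1: d <- s1 ⊕ s2 interferes with every t live afterwards, t != d
                Instr::BinOp { d, .. } if t != d => { edges.insert((d.clone(), t.clone())); }
                // rule 2: for a move d <- s, skip t = s so the move stays coalescable
                Instr::Move { d, s } if t != d && t != s => { edges.insert((d.clone(), t.clone())); }
                _ => {}
            }
        }
    }
    edges
}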


Find coloring with as few colors as possible: Adjacent vertices have different colors
Given: Graph G = (V,E) and k colors
Question: Is there a coloring of V with k colors such that adjacent vertices have different colors?
Complexity: NP complete for k >= 3 

What follows for register allocation?

For a Turing-complete language,
it is undecidable whether, for a given program, there is an equivalent program that uses at most k registers (and no temps).

Assign colors to registers and stack locations
Greedy graph coloring: get a minimal coloring for most programs

Assume colors are numbered 1, 2, 3, ...

Let N(v) be the neighbors of v
Input: G = (V,E) and an ordering V = v1, ..., vn
Output: coloring of V, col: V -> {1, ..., k}
For i ← 1 to n: assign vi the lowest color not used by its already-colored neighbors

- Order matters

x5 

x4-x3-x2-x1

Order 1: x5,x4,x3,x2,x1,ret

Order 2: x1,x4,x3,x2,x5,ret


Theorem: There exists an ordering that produces an optimal coloring
How do we find an optimal ordering for chordal graphs?

Def: A graph is chordal if every cycle with >= 4 vertices has a chord
(an edge between two vertices of the cycle that is not itself part of the cycle)

Example

a - b
|   |
c - d
not chordal

a - b
| / |
c - d
chordal

a - b
| x |
c - d
chordal


Intuition: how would you get a long cycle without a chord?
a's live range overlaps b's and c's,
but we would need d's live range not to overlap a's.


MCS (maximum cardinality search)
Input: G = (V,E)
Output: an ordering v1, ..., vn of V

For all v ∈ V set wt(v) ← 0
Let W ← V
For i ← 1 to n do
      Let v be a node of maximal weight in W
      Set vi ← v
      For all u ∈ W ∩ N(v) set wt(u) ← wt(u) + 1
      Set W ← W \ {v}

MCS returns a simplicial elimination ordering if G is chordal
Def: v is simplicial in G if N(v) is a clique
A simplicial vertex is one whose neighbors form a clique: every two of its neighbors are adjacent.


Def: A simplicial elimination ordering is an ordering v1, ..., vn such that vi is simplicial in
G[{v1, ..., vi}], the subgraph induced by the vertices picked so far
Theorem 1: A graph is chordal iff it has a simplicial elimination ordering
Proof..
Theorem 2: The greedy coloring algorithm finds an optimal coloring if run with a simplicial elimination ordering
Proof sketch: Let k be the number of colors used
Observations:
- k <= max_i |N_i(v_i)| + 1, where N_i(v_i) are the neighbors of v_i among v1, ..., vi
- minimum #colors >= max_i |N_i(v_i)| + 1, since v_i together with those neighbors forms a clique
-> Theorem: MCS returns a simplicial elimination ordering iff G is chordal
Spill: assign the remaining colors to stack locations
strategies:
1. count uses of each temp in the code: higher use frequency gets a register
2. incorporate loop nesting
3. spill the highest colors (a high color tends to mean the temp is used less, an approximation of 1)

Summary

  1. Build the interference graph (different ways)
  2. Order the vertices with MCS
  3. Color with greedy alg
  4. Spill if # colors > 13

Liveness analysis & inference rules

live(l,x) ~ x is live at line l

l: d <- x ⊕ y
live(l+1, u)   u != d
--------------------------
      live(l, u)


l: d<- x ⊕ y
------------------
    live(l, x)
    live(l, y)

l: return
------------------
   live(l, ret)


l: x <- c
live(l+1, u), x!=u
-------------------
      live(l,u)

l: x<-y, u!=x
live(l+1, u)
--------------
live(l,u)


l:x<-y
-------
live(l,y)

Use a derivation tree to prove that x is live at line l

General Saturation Algorithm
Can be used for all predicates
1: start with an empty set of facts (the derived predicates)
2: pick arguments (here: l and t) and a rule with live(l,t) in the conclusion
such that the premises are already facts; add the conclusion as a new fact
3: repeat until no new facts can be derived

Always terminates, since there are only finitely many possible facts
Refactoring liveness rules
use(l,x)
---------
live(l,x)

*: l' is a possible successor
live(l',x)   succ(l,l')   ¬def(l,x)
---------------------------------
      live(l,x)

Need to define:
use(l,x) ~ x is used at l 
def(l,x) ~ x is def at l
succ(l,l') ~ l' can be a succ of l
Example
l: d <- x ⊕ y
-------------
use(l,x)
use(l,y)
def(l,d)
succ(l,l+1)
Adding loops and conditionals, instructions now include:
l: d <- x ⊕ y
.
.
.
l: goto l'                       (unconditional jump)
l: if (x ? c) then lt else lf    (conditional jump)

New rules:

l: goto l'
-----------
succ(l,l')

l: if (x ? c) then lt
else lf
-------------------
succ(l,lt)
succ(l,lf)
use(l,x)


Keep iterating until nothing new can be added to the liveness sets
pass 1, pass 2, ...
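In code the saturation loop looks like this (a Rust sketch with use/def/succ precomputed per line; my own simplified types): keep re-running the two rules until a full backward pass adds nothing new.

use std::collections::HashSet;

struct Line {
    uses: HashSet<String>, // use(l, x)
    defs: HashSet<String>, // def(l, x)
    succ: Vec<usize>,      // succ(l, l')
}

// Backward liveness by saturation: iterate until no live set changes.
fn liveness(lines: &[Line]) -> Vec<HashSet<String>> {
    let mut live: Vec<HashSet<String>> = vec![HashSet::new(); lines.len()];
    let mut changed = true;
    while changed {
        changed = false;
        for (l, line) in lines.iter().enumerate().rev() {
            // live(l, x) if use(l, x), or x live at a successor and not defined at l
            let mut new_live = line.uses.clone();
            for &s in &line.succ {
                for x in &live[s] {
                    if !line.defs.contains(x) {
                        new_live.insert(x.clone());
                    }
                }
            }
            // live sets only ever grow, so a size change means the set changed
            if new_live.len() != live[l].len() {
                live[l] = new_live;
                changed = true;
            }
        }
    }
    live
}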

Interference graph
1. Overlapping live ranges

live(l,x) live(l,y)
-------------------
inter(x,y)

But this doesn't work well in the presence of dead code

Example: 
a <- 1
b <- 2
ret <- a
return


2. Assignment based
More sparse

l: x <- y ⊕ z live(l+1, u), u!=x
---------------------------------
inter(x,u)

l:x<-y, y!=u
live(l+1, u) x!=u
------------------
inter(x,u)

l: x<-c, u!=x
live(l+1, u)
--------------
inter(x,u)

**Chapter 4: Elaboration **

Goal: a minimal and clear representation of the program
  • remove syntactic sugar
  • make scope explicit
Parsing -> parse tree -> elaboration -> AST (input to semantic analysis)

Example: 
turn for loops into while loops

for (int x = 4; x < 8128; x++) {
      y = y + x
}
------> elab
using declaration(x, int, s):
{ int x = 4; while (x < 8128) ... }
decl(x, int, seq(assign(x, 4), while(x < 8128, ...)))

// Before elaboration
for (int x = 4; x < 30; x++) { y = y + x }


// After elaboration
{
int x = 4;
while (x < 30) { y = y + x; x += 1 }
}

abstract syntax

declare(x, int,seq(assign(x, 4),
                  while(x < 30,
                  seq(assign(y, y + x), assign(x, x + 1)))))

The extra scope is necessary
Rather than continuing to perform manipulations on surface syntax, we introduce the BNF of an elaborated abstract syntax


Expressions e ::= n | x | e1 ⊕ e2 | e1 /0 e2 | f(e1, . . . , en)
| e1 ? e2 | !e | e1 && e2 | e1 || e2

/0: potentially effectful operators (such as division or shift, which could raise an exception)

⊕: effect-free operators
?: comparison operators returning a boolean; !, &&, and || are logical negation, conjunction, and disjunction, respectively

Statements s ::= 
declare(x, τ, s) | 
assign(x, e) | 
if(e, s1, s2) | 
while(e, s)| 
return(e) | 
nop | 
seq(s1, s2)


Inference rule
<tp> elaborates to τ
<ident> elaborates to x
<exp1> elaborates to e1
<exp2> elaborates to e2
<stmt1> elaborates to s1
<stmt2> elaborates to s2
-----------------------------------------------------
for (<tp> <ident> = <exp1>; <exp2>; <stmt1>) <stmt2>
 elaborates to declare(x, τ, seq(assign(x, e1), while(e2, seq(s2, s1))))

When we want to translate a for loop that
matches the pattern at the bottom, we have to do six things: 
elaborate the type,
elaborate the identifier, 
elaborate both expressions, 
elaborate the afterthought, and
then elaborate the loop body. 
x++ -> assign(x, x + 1)
make sure any nested for loops in <stmt2>, the body of our for loop, are elaborated
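As a sketch, the whole translation is one constructor call once the pieces are elaborated (toy AST types of my own; the real parse-tree and AST types in the lab are richer):

struct Expr; // placeholder for elaborated expressions

enum Stmt {
    Declare(String, String, Box<Stmt>), // declare(x, τ, s), with τ as a plain string here
    Assign(String, Expr),               // assign(x, e)
    While(Expr, Box<Stmt>),             // while(e, s)
    Seq(Box<Stmt>, Box<Stmt>),          // seq(s1, s2)
}

// for (τ x = e1; e2; s1) s2
//   ~>  declare(x, τ, seq(assign(x, e1), while(e2, seq(s2, s1))))
fn elab_for(tau: String, x: String, e1: Expr, e2: Expr, s1: Stmt, s2: Stmt) -> Stmt {
    Stmt::Declare(
        x.clone(),
        tau,
        Box::new(Stmt::Seq(
            Box::new(Stmt::Assign(x, e1)),
            Box::new(Stmt::While(
                e2,
                Box::new(Stmt::Seq(Box::new(s2), Box::new(s1))),
            )),
        )),
    )
}

Here s1 is the already-elaborated afterthought (x++ becomes assign(x, x + 1)) and s2 is the already-elaborated body, so nested for loops are handled by the recursion that produced s2.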

Why IR trees
  • isolate potentially effectful expressions, making their order of execution explicit; this simplifies instruction selection and means the remaining pure expressions can be optimized much more effectively
  • make the control flow explicit in the form of conditional or unconditional branches, which is closer to the assembly-language target and allows us to apply standard program analyses based on an explicit control-flow graph
Pure Expressions p ::= n | x | p1 ⊕ p2

Commands c 
::= 
x ← p
| x ← p1 /0 p2
| x ← f(p1, . . . , pn)
| if (p1 ? p2) then lt else lf
| goto l
| l :
| return(p)
Programs r ::= c1 ; . . . ; cn


Translating Expressions
tr(e) = <ˇe, eˆ>
where eˇ is a sequence of commands r that we need to write down to compute the
effects of e and eˆ is a pure expression p that we can use to compute the value of e back up. 


tr(n) = <·, n>
tr(x) = <·, x>
tr(e1 ⊕ e2) = <(ˇe1 ; ˇe2), eˆ1 ⊕ eˆ2>?


Translating Statements
tr(assign(x, e)) = ˇe ; x ← eˆ
tr(return(e)) = ˇe ; return(ˆe)
tr(nop) = ·
tr(seq(s1, s2)) = ˇs1 ; ˇs2


tr(if(e, s1, s2)) = ˇe ;
if (ˆe != 0) then l1 else l2 ;
l1 : ˇs1 ;
goto l3 ;
l2 : ˇs2 ;
l3 :



tr(while(e, s)) = l1 : ˇe;
if (ˆe != 0) then l2 else l3 ;
l2 : ˇs ; goto l1;
l3 : 
Translating Boolean Expressions
cp(e1 ? e2, l, l') = 
ˇe1 ; ˇe2 ;
if (ˆe1 ? eˆ2) then l else l'


cp(!e, l, l') = cp(e, l', l)

cp(e1 && e2, l, l') = 
cp(e1, l2, l');
l2 : cp(e2, l, l')


cp(e1 || e2, l, l') = left to the reader

cp(0, l, l') = goto l'

cp(1, l, l') = goto l

cp(e, l, l') = 
ˇe ;
if (ˆe != 0) then l else l'


tr(if(b, s1, s2)) = cp(b, l1, l2)
l1 : tr(s1) ; goto l3
l2 : tr(s2) ; goto l3
l3 :


tr(e) = <cp(e, l1, l2);
l1 : t ← 1 ; goto l3
l2 : t ← 0 ; goto l3
l3 :
, t>
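A Rust sketch of cp(e, t, f) (simplified: comparison operands are plain temps and the effect sequences ě are ignored; my own toy types):

enum BoolExp {
    Cmp(String, String, String),      // e1 ? e2 (lhs, comparison operator, rhs)
    Not(Box<BoolExp>),
    And(Box<BoolExp>, Box<BoolExp>),
    Or(Box<BoolExp>, Box<BoolExp>),
}

enum Cmd {
    Label(u32),
    CondJump { lhs: String, op: String, rhs: String, then_l: u32, else_l: u32 },
}

struct LabelGen { next_label: u32, code: Vec<Cmd> }

impl LabelGen {
    fn fresh_label(&mut self) -> u32 { self.next_label += 1; self.next_label }

    // cp(e, t, f): emit commands that jump to t if e is true and to f otherwise
    fn cp(&mut self, e: &BoolExp, t: u32, f: u32) {
        match e {
            BoolExp::Cmp(x, op, y) => self.code.push(Cmd::CondJump {
                lhs: x.clone(), op: op.clone(), rhs: y.clone(), then_l: t, else_l: f,
            }),
            // !e: swap the targets
            BoolExp::Not(e1) => self.cp(e1, f, t),
            // e1 && e2: if e1 is false the whole expression is false
            BoolExp::And(e1, e2) => {
                let l2 = self.fresh_label();
                self.cp(e1, l2, f);
                self.code.push(Cmd::Label(l2));
                self.cp(e2, t, f);
            }
            // e1 || e2: if e1 is true the whole expression is true
            BoolExp::Or(e1, e2) => {
                let l2 = self.fresh_label();
                self.cp(e1, t, l2);
                self.code.push(Cmd::Label(l2));
                self.cp(e2, t, f);
            }
        }
    }
}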
Extended basic blocks
 instead of basic blocks being sequences of commands ending
in a jump, they are trees of commands that branch at conditional statements and
have unconditional jumps (goto or return) as their leaves

**Chapter 5: Static Semantics **

abstract syntax
After lexing and parsing, a compiler will usually apply elaboration to translate the parse tree to a high-level intermediate form often called abstract syntax. Then we verify that the abstract syntax satisfies the requirements of the static semantics.


C0:
• Initialization: variables must be defined before they are used.
• Proper returns: functions that return a value must have an explicit return statement on every control flow path starting at the beginning of the function.
• Types: the program must be well-typed.
Abstract syntax as defined in the last chapter.

We would like to raise an error if there is any possibility that
an uninitialized variable may be used.

may-property
use(e1/2, x)
-----------
use(e1 && e2, x)

use(e1/2, x)
-----------
use(e1 ⊕ e2, x)


must-property
def(s, x) if the execution of statement s will define x.


--------------------
def(assign(x, e), x)



def(s1, x) def(s2, x)
-----------------------
def(if(e, s1, s2), x)

A conditional only defines a variable if it is defined along
 both branches, and a while loop does not define any variable
 (since the body may never be executed).

def(s1/2, x)
-----------------
def(seq(s1, s2), x)

def(s, x) y != x
---------------------
def(decl(y, τ, s), x)
x is declared at most once on a control-flow
path. 

--------------------
def(return(e), x)
Since a return statement will never
pass the control flow to the next instruction, any subsequent statements are unreachable. It is therefore permissible to claim that all variables currently in scope have been defined. 
Liveness

We observe that liveness is indeed a may-property, since a variable is live in a conditional if it is used in the condition or live in one or more of the branches.


use(e, x)
----------------
live(assign(y, e), x)


use(e, x)
----------------
live(if(e, s1, s2), x)


live(s1, x)
----------------
live(if(e, s1, s2), x)


live(s2, x)
----------------
live(if(e, s1, s2), x)

use(e, x)
----------------
live(while(e, s), x)

live(s, x)
----------------
live(while(e, s), x)

use(e, x)
----------------
live(return(e), x)

no rule for
live(nop, x)

live(s, x) y != x
----------------
live(decl(y, τ, s), x)


live(s1, x)
----------------
live(seq(s1, s2), x)

¬def(s1, x) live(s2, x)
----------------
live(seq(s1, s2), x)

Initialization
The following error rule captures the general condition.


It should be read from the premises
to the conclusion.
decl(x, τ, s) in p   live(s, x)
------------------------------
error


The init rules below are read from the conclusion to the premises:

------------------------------
init(nop)


init(s1) init(s2)
------------------------------
init(seq(s1, s2))


init(s) ¬live(s, x)
------------------------------
init(decl(x, τ, s))
From Judgments to Functions
init : stm → bool
init(nop) = T
init(seq(s1, s2)) = init(s1) ∧ init(s2)
init(decl(x, τ, s)) = init(s) ∧ ¬live(s, x)

live(nop, x) = ⊥
live(seq(s1, s2), x) = live(s1, x) ∨ (¬def(s1, x) ∧ live(s2, x))
live(decl(y, τ, s), x) = y != x ∧ live(s, x)
. . .

...
init(δ, s, δ'): 
assuming all the variables in δ are defined when s is reached, 
no uninitialized variable will be referenced and after its execution 
all the variables in δ' will be defined.

use(δ, e): e will only reference variables defined in δ.

δ |- s ⇒ δ' for init(δ, s, δ').
δ |- e for use(δ, e).

δ ⊢ s1 ⇒ δ1
δ1 ⊢ s2 ⇒ δ2
______________
δ ⊢ seq(s1, s2) ⇒ δ2


δ ⊢ e
_________
δ ⊢ assign(x, e) ⇒ δ ∪ {x}


δ ⊢ e
δ ⊢ s1 ⇒ δ1
δ ⊢ s2 ⇒ δ2
_________________
δ ⊢ if(e, s1, s2) ⇒ δ1 ∩ δ2


δ ⊢ e
δ ⊢ s ⇒ δ'
_________
δ ⊢ while(e, s) ⇒ δ



In particular, declare(x, τ, s) means that the variable x is declared (only) within the statement s.
δ ⊢ s ⇒ δ'
___________
δ ⊢ decl(y, τ, s) ⇒ δ' - {y}


δ ⊢ e
_________
δ ⊢ return(e) ⇒ {x | x in scope}
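These rules translate almost one-to-one into a checker. A Rust sketch (toy statement type where each expression is just its set of used variables; return's "all variables in scope" is simplified):

use std::collections::HashSet;

type Defined = HashSet<String>;

enum Stmt {
    Nop,
    Assign(String, Vec<String>),           // assign(x, e): x plus the vars used by e
    If(Vec<String>, Box<Stmt>, Box<Stmt>), // if(e, s1, s2)
    While(Vec<String>, Box<Stmt>),         // while(e, s)
    Return(Vec<String>),                   // return(e)
    Seq(Box<Stmt>, Box<Stmt>),             // seq(s1, s2)
    Decl(String, Box<Stmt>),               // declare(x, τ, s), type elided
}

// δ ⊢ e: every variable used by e must already be defined
fn check_use(delta: &Defined, uses: &[String]) -> Result<(), String> {
    for x in uses {
        if !delta.contains(x) {
            return Err(format!("{} may be used uninitialized", x));
        }
    }
    Ok(())
}

// δ ⊢ s ⇒ δ': returns the set of variables defined after s
fn check_init(delta: &Defined, s: &Stmt) -> Result<Defined, String> {
    match s {
        Stmt::Nop => Ok(delta.clone()),
        Stmt::Assign(x, uses) => {
            check_use(delta, uses)?;
            let mut d = delta.clone();
            d.insert(x.clone());
            Ok(d)
        }
        Stmt::Seq(s1, s2) => {
            let d1 = check_init(delta, s1)?;
            check_init(&d1, s2)
        }
        Stmt::If(uses, s1, s2) => {
            check_use(delta, uses)?;
            let d1 = check_init(delta, s1)?;
            let d2 = check_init(delta, s2)?;
            Ok(d1.intersection(&d2).cloned().collect())
        }
        Stmt::While(uses, body) => {
            check_use(delta, uses)?;
            check_init(delta, body)?; // the body may never run: its definitions are discarded
            Ok(delta.clone())
        }
        Stmt::Decl(x, body) => {
            let mut d = check_init(delta, body)?;
            d.remove(x); // x goes out of scope
            Ok(d)
        }
        Stmt::Return(uses) => {
            check_use(delta, uses)?;
            Ok(delta.clone()) // simplification: the real rule yields "all variables in scope"
        }
    }
}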



Typing Judgment for Statements:
The notation Γ ⊢ s : [τ] is a typing judgment for statements,
where Γ is the type environment (context), 
s is a statement, 
and τ is the type. 
This notation signifies that the statement s is well-typed within the 
context Γ and ultimately returns a value of type τ

**Chapter 6: Grammars **

Grammars are a general way to describe languages.
For a given grammar G with start symbol S, a derivation in G is a sequence of rewritings
S → γ1 → · · · → γn = w

A context-free grammar consists of a set of productions of the form X −→ γ, where X is a non-terminal symbol and γ is a potentially mixed sequence of terminal and non-terminal symbols.

While the parse tree removes some ambiguity, it turns out that the example grammar is ambiguous in other, more important, ways.

we can add that string specifically as the new base case
It is important that programming languages be unambiguous in practice. 
 
We can usually rewrite grammars to remove ambiguity, but sometimes we extend the language of context-free grammars to resolve ambiguity. One example is explicitly stating precedence and associativity as a way of resolving ambiguities.

CYK Parsing

The Cocke-Younger-Kasami (CYK) parsing algorithm is a method used to check if a given string can be generated by a context-free grammar (CFG). 
It's especially useful in computational linguistics and computer science for parsing strings. The algorithm requires the CFG to be in Chomsky Normal Form (CNF), 
where production rules are either a non-terminal producing two non-terminals, or a non-terminal producing a terminal.

How CYK Works
Input and Initialization: The algorithm takes a string w of length n and a CFG G in CNF. It initializes a table to store which grammar symbols can generate which substrings of w.

Filling the Table: The algorithm fills a table P, where P[i,j] holds non-terminal symbols that can generate the substring w[i,j]. It starts with the smallest substrings 
(single characters) and combines them to form larger substrings.

For each terminal in w, it adds corresponding non-terminals to P as per the grammar rules.
For each possible split of substrings, it checks if their combination can be generated by any non-terminals as per the grammar rules.
Iterating Over Substrings: It considers all possible substrings of w by their length, from shortest to longest, and checks all possible splits for each substring, looking 
for non-terminals that could generate these parts.

Checking for Acceptance: After filling the table, it checks if the start symbol S is in the top-right cell of the table (P[1,n]). 
If S is present, the string w is accepted by the grammar.

Complexity
The CYK algorithm has a time complexity of O(n^3 * |G|), where n is the length of the string and |G| is the size of the grammar. This complexity comes from considering all substrings, 
all possible splits of each substring, and the grammar size.

Applications
The CYK algorithm is used in syntax checking, natural language processing, and compiler design, where it's important to determine if strings follow specific grammatical rules.

By using dynamic programming, the CYK algorithm efficiently parses strings and determines their grammatical structure under a given CFG.

https://www.youtube.com/watch?v=VTH1k-xiswM
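The algorithm is short enough to write out; a Rust sketch for a CNF grammar (my own toy grammar representation):

use std::collections::HashSet;

struct Cnf {
    term_rules: Vec<(String, char)>,          // A -> a
    bin_rules: Vec<(String, String, String)>, // A -> B C
    start: String,
}

// CYK: table[i][len] = nonterminals that derive the substring w[i .. i+len]
fn cyk(g: &Cnf, w: &[char]) -> bool {
    let n = w.len();
    let mut table = vec![vec![HashSet::<String>::new(); n + 1]; n];
    // length-1 substrings come from terminal rules
    for i in 0..n {
        for (a, c) in &g.term_rules {
            if *c == w[i] {
                table[i][1].insert(a.clone());
            }
        }
    }
    // longer substrings: try every split point and every binary rule
    for len in 2..=n {
        for i in 0..=n - len {
            for split in 1..len {
                for (a, b, c) in &g.bin_rules {
                    if table[i][split].contains(b) && table[i + split][len - split].contains(c) {
                        table[i][len].insert(a.clone());
                    }
                }
            }
        }
    }
    n > 0 && table[0][n].contains(&g.start)
}

The three nested loops over substrings and split points give the O(n^3 * |G|) bound mentioned above.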
Shift-reduce parsing
Deductive parsing involves two primary rules originally used in the CYK parsing algorithm, but when we interpret these for shift-reduce parsing, 
we make some modifications.

Rule D1: This is an assertion rule where for any terminal symbol 'a' found in the input string, we can deduce that 'a' derives itself. This is represented as:

a : a


Rule D2: This is a combination rule that states if you have a production in the grammar where a non-terminal X leads to a sequence of symbols (γ1 ... γn), 
and you have established that for substrings w1 through wn of your input string w0, each wi derives γi, 
then you can conclude that the concatenated string w1...wn is derived from X. This rule is written as:

[r] X → γ1 ... γn
w1 : γ1 ... wn : γn
------------------------
w1 ... wn : X


When we adapt these rules for shift-reduce parsing, we modify the form of the facts used:

Instead of w : γ, where γ is a sequence of terminals and non-terminals, we look to conclude facts of the form w : β, 
where β is a possibly empty series of terminals and non-terminals.

We then have two operations: shift and reduce, focusing on the rightmost elements of w or β. The rules are:

Start Rule:
This is the base case for parsing, where you start with an empty string deriving an empty string:
ε : ε

Shift Rule:
If you have a fact w : β and 'a' is the next terminal symbol in the input, you can append 'a' to both w and β:
w : β
----------
w a : β a


Reduce Rule:
If there's a production where X leads to α and you have a string w followed by α in β, you can reduce this by recognizing α as a unit derived from X:

[r] X → α
w : β α
------------
w : β X

The reduce rule is the key to recognizing higher-level structures. Each time you apply it, you replace a sequence of symbols 
(which matches the right-hand side of a production rule) with the non-terminal symbol from the left-hand side of that production rule.

If a grammar is unambiguous, this means
that, as we apply a series of rules to try to derive w : S from ε : ε, there is at most one rule we can apply that will lead us to success.

If we can successfully predict what the next step should be at every point, then we can implement this proof search with a stack 
holding the terminals and nonterminals (on the left) and a queue or array index tracking the unprocessed tokens (on the right).

What would we need to know to always make the right decision for the grammar above?



We will begin constructing a parse table, where the columns
correspond to the next unshifted token and the rows correspond to patterns that we match against the stack β.
Parse Table
Conflicts
Cases where we can make different decisions and still successfully parse the string are called conflicts.

Example:
we have decided to resolve the conflicts by giving a precedence to the operators and declaring both of them to be left-associative

It is also possible to have reduce/reduce conflicts

For many parser generators, the default behavior of a shift/reduce conflict is to shift, and for a reduce/reduce conflict to apply the textually first production in the grammar.
LR(1) and more?

The set of languages that we can parse with shift-reduce parsers that have lookahead 1 is called LR(1). But even though a language may be 
describable with an LR(1) grammar, it’s not necessarily the case that every grammar for an LR(1) language can be parsed with a shift-reduce parser. 


Even though the grammar is unambiguous, to parse it correctly, we’d need arbitrary lookahead – we’d need to look over an arbitrary 
number of b tokens to find whether they were followed by a c or a d.

**Chapter 7: Lexical Analysis **

turn a raw byte or character input stream coming from the source file into a token stream by chopping the input into pieces and skipping over irrelevant details.

classify input tokens into types like INTEGER or IDENTIFIER or WHILE-keyword or OPENINGBRACKET...

Lexers are specified by regular expressions. Classically, however, they are implemented by finite automata.


deterministic finite automaton (DFA). At every state and every input there is at most one edge enabling a transition.


But in general, finite automata can be nondeterministic finite automata (NFA). That is, for the same input, one path may lead to
an accepting state while another attempt fails.


Regular Expressions -> Nondeterministic Finite Automata
Nondeterministic Finite Automata -> Deterministic Finite Automata
 instead of a single state, we now consider the set of states in which we could be.


Another operation that is often done by lexer generator tools is to minimize the resulting DFA by merging states and reducing the number of states and transitions in the automaton.

Lexical analysis reduces the complexity of subsequent syntactical analysis by first dividing the raw input stream up into a shorter sequence of tokens, 
each classified by its type (INT, IDENTIFIER, REAL, IF, ...). 
The lexer also filters out irrelevant whitespace and comments from the input stream so that the parser does not have to deal with that anymore. 
The steps for generating a lexer are


1. Specify the token types to be recognized from the input stream by a sequence of regular expressions

2. Bear in mind that the longest possible match rule applies and the first production that matches longest takes precedence.

3. Lexical analysis is implemented by DFA.

4. Convert the regular expressions into NFAs (or directly into DFAs using derivatives).

5. Join them into a master NFA that chooses between the NFAs for each regular expression by a spontaneous ε-transition

6. Determinize the NFA into a DFA

7. Optional: minimize the DFA for space

8. Implement the DFA as a recognizer. Respect the longest-possible-match rule by storing the last accepted token and
backtracking the input to it if the DFA run cannot otherwise complete (see the sketch below).
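Here is a toy maximal-munch driver over such a DFA (hand-written transition table rather than one generated from regular expressions; the token kinds are made up for illustration):

use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Tok { Ident, Number }

struct Dfa {
    start: usize,
    accept: HashMap<usize, Tok>,        // accepting states and their token kinds
    trans: HashMap<(usize, u8), usize>, // transition table
}

// Longest possible match: run the DFA as far as it will go, remembering the
// last accepting position; when stuck, back up and emit that token.
fn next_token(dfa: &Dfa, input: &[u8], start_pos: usize) -> Option<(Tok, usize)> {
    let mut state = dfa.start;
    let mut pos = start_pos;
    let mut last_accept = None;
    while pos < input.len() {
        match dfa.trans.get(&(state, input[pos])) {
            Some(&next) => {
                state = next;
                pos += 1;
                if let Some(&tok) = dfa.accept.get(&state) {
                    last_accept = Some((tok, pos)); // longest match so far
                }
            }
            None => break, // no transition: fall back to the last accepted token
        }
    }
    last_accept // None means a lexical error at start_pos
}

A driver loop then repeatedly calls next_token with the returned end position as the new start, skipping whitespace and comment tokens.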

Back to Blog Chapters

**Chapter 8: Function Calls **

IR
two lower-level forms of intermediate representation for function calls. 
1. d ← f(s1, . . . , sn)
2. call f

calling conventions
Parameters
first six arguments are passed in registers, the remaining arguments
are passed on the stack
all arguments take 8 bytes of space on the stack, even if the type of
argument would indicate that only 4 bytes need to be passed

callee, should set up its stack frame, reserving space for local variables, spilled temps that could not be assigned to registers, 
and arguments passed to functions it calls in turn

It is recommended to calculate the total space needed statically and then decrement the stack pointer %rsp by the appropriate amount only once within the function

%rsp should be aligned 0 mod 16 before another function is called,
and may be assumed to be aligned 8 mod 16 on function entry.


Return
The result is returned in a specific return register %rax.





...
n + 16(%rsp)   argument 8
n + 8(%rsp)    argument 7
n(%rsp)        return address
               local variables   (callee)
(%rsp)         end of frame      (callee)


Register conventions
Abstract form   Register   Usage           Preserved across calls
res0            %rax       return value*   No
arg1            %rdi       argument 1      No
arg2            %rsi       argument 2      No
arg3            %rdx       argument 3      No
arg4            %rcx       argument 4      No
arg5            %r8        argument 5      No
arg6            %r9        argument 6      No
ler7            %r10       caller-saved    No
ler8            %r11       caller-saved    No
lee9            %rbx       callee-saved    Yes
lee10           %rbp       callee-saved*   Yes
lee11           %r12       callee-saved    Yes
lee12           %r13       callee-saved    Yes
lee13           %r14       callee-saved    Yes
lee14           %r15       callee-saved    Yes
                %rsp       stack pointer   Yes


Typical Calling Sequence
d ← f(s1, s2, s3):

arg3 ← s3
arg2 ← s2
arg1 ← s1
call f
d ← res0


l: call f    caller-save(r)
----------------
def(l, r)

caller-save(r) is true of register r among %rax, %rdi, %rsi, %rdx, %rcx, %r8, %r9, %r10, and %r11.

Now if a temp t is live after a function call, we have to add an interference edge connecting t with each of these fixed registers.


Callee-saved Registers
%rbx, %rbp, %r12, %r13, %r14 and %r15
The standard approach is to save those that are needed onto the stack in the function prologue and restore them from the stack in the function epilogue, just before returning.
**: saving and restoring them all is safe, but may be overkill
for small functions

If we need more than the available number of caller-saved registers, we assign callee-saved registers before we resort to spilling,
but we make sure to save them at the beginning of the function and restore them at the end.


f :
t1 ← lee9
t2 ← lee10
· · ·
function body
· · ·
lee10 ← t2
lee9 ← t1
ret


Liveness about calling convention
1. During a call, res0, arg1-6 and ler7, ler8 are defined.
2. For each line l and each temp t defined at l, we create an edge between t and any variable live in the successor.



All precolored registers implicitly interfere with each other, so we don’t include that in the interference graph.


**Chapter 9: SSA **

relabel variables in the code so that each variable is defined
only once in the program text. If the program has this form, called static single assignment (SSA),


then we can perform constant propagation immediately.
There are other program analyses and optimizations for which it is convenient to have this property.



Basic blocks
int dist(int x, int y) {
x = x * x;
y = y * y;
return isqrt(x+y);
}

dist(x0,y0):
x1 <- x0 * x0
y1 <- y0 * y0
t0 <- x1 + y1
t1 <- isqrt(t0)
return t1



Loops
int pow(int b, int e)
//@requires e >= 0;
{
int r = 1;
while (e > 0)
//@loop_invariant e >= 0;
// r*b^e remains invariant
{
r = r * b;
e = e - 1;
}
return r;
}

pow(b0,e0):
      r0 <- 1
      goto loop(b0,e0,r0)
loop(b1,e1,r1):
      if (e1 > 0)
            then body(b1,e1,r1)
            else done(b1,e1,r1)
body(b2,e2,r2):
      r3 <- r2 * b2
      e3 <- e2 - 1
      goto loop(b2,e3,r3)
done(b3,e4,r4):
      return r4

Minimizing SSA Form
From φ-function form to minimized form.
It’s quite simple: repeatedly remove trivial φ-functions of the form
ti = φ(tx1, tx2, . . . , txk)
where every argument txj is either ti itself or one single other temp tj, and replace ti by tj everywhere.

pow(b0,e0):
r0 <- 1
      goto loop
loop:
      b1 <- phi(b0,b2)
      e1 <- phi(e0,e3)
      r1 <- phi(r0,r3)
      if (e1 > 0) then body else done
body:
      b2 <- phi(b1)
      e2 <- phi(e1)
      r2 <- phi(r1)
      r3 <- r2 * b2
      e3 <- e2 - 1
      goto loop

done:
      b3 <- phi(b1)
      e4 <- phi(e1)
      r4 <- phi(r1)
      return r4

-------------------------------------------
pow(b0,e0):
      r0 <- 1
      goto loop
loop:
      e1 <- phi(e0,e3)
      r1 <- phi(r0,r3)
      if (e1 > 0) then body else done
body:
      r3 <- r1 * b0
      e3 <- e1 - 1
      goto loop
done:
      return r1


Once the remaining φ-functions are replaced by moves on the incoming edges (conversion out of SSA), the code is no longer in SSA form, and one cannot apply any SSA-based optimization afterwards. Conversion out of SSA should therefore be one of the last steps before code emission.

At this point register allocation, possibly with register coalescing, can do a good job of eliminating redundant moves.
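A Rust sketch of the "trivial φ" test used in the minimization above (toy representation; renaming the φ's destination afterwards is a simple substitution pass):

use std::collections::HashMap;

// A φ is trivial if all of its arguments are either the destination itself
// or one single other temp; the destination can then be renamed to that temp.
fn trivial_phi(dest: &str, args: &[String]) -> Option<String> {
    let mut other: Option<&String> = None;
    for a in args {
        if a == dest {
            continue;
        }
        match other {
            None => other = Some(a),
            Some(o) if o == a => {}
            Some(_) => return None, // two distinct sources: not trivial
        }
    }
    other.cloned()
}

// Apply the renamings collected from removed φ-functions to a list of temps.
fn rename(temps: &mut [String], renaming: &HashMap<String, String>) {
    for t in temps.iter_mut() {
        if let Some(new) = renaming.get(t) {
            *t = new.clone();
        }
    }
}

In the pow example, b2 = φ(b1) is trivial, so b2 becomes b1; then b1 = φ(b0, b1) is trivial too, so b1 becomes b0, which is exactly what the minimized version uses.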



**Chapter 10: Dynamics **

Denotational Semantics
  • Each part of a program is associated with a denotation
Axiomatic Semantics
  • Strongly related to program logic
  • Gives meaning to phrases using logical axioms
Operational Semantics
  • Related to interpreters and abstract machines
  • Most popular and flexible form of semantics
Expressions
  • Many different styles
    - Natural semantics (or big-step semantics or evaluation dynamics)
    - Structural operational semantics
    - Substructural operational semantics
    - Abstract machine (or small-step with continuation)

abstract machine:
Very general
Low-level and elaborate

e ▷ K
evaluate expression e and pass the result to the continuation K

Binary operations
With effects

Constant and the empty continuation, stop

Boolean, short cutting

Variable: env that maps variables to values
Never changes when evaluating expressions




Executing Statements
Don't pass values to the continuation
Usually have effects on the env
Function Calls
What needs to happen at a function call?
• Evaluate the arguments in left-to-right order
• Save the environment of the caller to continue the execution after the function call
• Save the continuation of the caller
• Execute the body of the callee in a new environment that maps the
formal parameters to the argument values
• Pass the return value to the environment of the caller

Back to Blog Chapters

**Chapter 11: Mutable **

Pointers and Arrays
• Static semantics of pointers
Extend:
types with pointer types
expressions with alloc, deref, and null ptr
use an indefinite (polymorphic) type any* for synthesis for NULL
We can compare two pointers using p == q if they have the same type


• Dynamic semantics of pointers
a value of type τ* is an address that stores a value of type τ
(or the special address 0, i.e. NULL)
Allocations return fresh (unused) addresses
Dereferencing retrieves a stored value
Need heap that maps addresses to values


• Static semantics of arrays
type[]
expression has: alloc_array(type, e) | e1[e2]
destination: d[e]

• Dynamic semantics of arrays
Array Evaluation: Access

H ; S ; η ⊢ e1[e2] ▷ K
   ⟶ H ; S ; η ⊢ e1 ▷ (_[e2], K)

H ; S ; η ⊢ a ▷ (_[e2], K)
   ⟶ H ; S ; η ⊢ e2 ▷ (a[_], K)
   // Need types.

H ; S ; η ⊢ i ▷ (a[_], K)
   ⟶ H ; S ; η ⊢ H(a + i·|τ|) ▷ K
   // a ≠ 0, 0 ≤ i < length(a), a : τ[]
   // Need array sizes.

H ; S ; η ⊢ i ▷ (a[_], K)
   ⟶ exception(mem)
   // a = 0 or i < 0 or i ≥ length(a)


Default Values of Array Type
We also need a default value for array types
• We will just use 0 as the default value again
• It represents an array of length 0
• We can never legally access an array element in the default array
• Warning: Arrays can be compared with equality
• Make sure that alloc_array(t,0) returns a fresh address different from 0
• If arrays have address a=0 then you should not access M[a-8]

* We cannot translate d1[e2] += e3 to d1[e2] = d1[e2] + e3
Effects of e2 and d1 would be evaluated twice

Heap
H : (N U {next}) -> Val


Memory exceptions -> SIGUSR2
better for debugging

Array len
Alternative 1: store the length at the front; the array address points to the start (the length field)
Alternative 2: the array address points to the first element
- simplifies address arithmetic
- allows passing pointers to C
Struct
Decl:
struct s;

Defn:
struct s {t1 f1; ... tn fn; };


Arrays are represented with pointers (but cannot be dereferenced)
-> they can be compared and stored in registers

Structs are usually also pointers but they can be dereferenced

Structs are large types that do not fit in registers
C0 restrictions
Local variables, function parameters, and return values must have
small type

Left and right-hand sides of assignments must have small type

Conditional expressions must have small type

Equality and disequality must compare expressions of small type

Expressions used as statements must have small type
Struct static semantics rules

Field names occupy their own namespace: allowed to overlap with
variable, function, or type names (but they must be distinct from
keywords)
• The same field names can be used in different struct definitions
• In a given struct definition, all field names must be distinct
• A struct may be defined at most once

Types struct s that have not (yet) been defined may be referenced as
long as their size is irrelevant

Size is relevant for
- alloc(struct s)
- alloc_array(struct s,e)
- definitions of structs if structs are types of fields

• Struct declarations are optional
‣ An occurrence of struct s in a context where its size is irrelevant
serves as an implicit declaration of the type struct s.

Types:
Extend types with struct types

Expr:
Field access

Lval

Define:
Elab

struct s* x = alloc(struct s)
e->f = (*e).f
must be defined before access

Struct sizes
Struct sizes determined by laying out the fields from left to right
Ints/bools aligned by 4
Pointers aligned by 8
Struct aligned by most restrictive fields


- Need to pick the right instructions (movl vs movq, cmpl vs cmpq)
- Could always use 8 bytes for spilling.
- Maintain size information in IRs!
- It is a good idea to keep temp/registers of different sizes separate
- If you want moves from small to large temps then make conversion
explicit
zeroextend s^32
signextend s^32

Back to Blog Chapters

**Chapter 12: Dataflow analysis **

  1. Liveness analysis -> interfere?
  2. Neededness analysis -> dead code?
  3. reaching definitions -> const/copy prop
Recap liveness
use(l,x) => instr at line l uses x 
def(l,x) => instr at line l defines x
succ(l,l') => line l' is a successor of l
live(l,x) => temp x is live at line l

use(l,x)
----------
live(l,x)

propagate backwards
only if not defined in l' 
use(l',u), succ(l,l'), -def(l', u)
----------------------------------
live(l,u)

Memory instr in IR
M[y] <- x => store x at addr y
x <- M[y] => load the value at addr y into x


for a store, both the source and the address are used:
l: M[y] <- x
-------------
use(l,x)
use(l,y)
succ(l, l+1)


for a load, the destination x is defined and the address y is used:
l: x <- M[y]
-------------
def(l,x)
use(l,y)
succ(l, l+1)
Dead code elimination
Dead code: operations that don’t influence the result of progr/function
Result of progr
- mem effects
- return val
- errors
- non-terminate ? 
Can exist in either source code
or translated ASM,... 
Usually we run it multiple times 




Example: Remove dead code in code Factorial using liveness info
l1: p <- 1
l2: p <- p * x
l3: z <- p + 1 ----------- dead
l4: x <- x-1
l5: if(x > 0) then l2 else l6
l6: return p

Liv anal


l1: p <- 1                          x
l2: p <- p * x                      p,x
l3: z <- p + 1                      p,x
l4: x <- x-1                        p,x   
l5: if(x > 0) then l2 else l6       p
l6: return p                        p      

z is not live at l4 --> can replace l3 with nop
- only if op at l3 has no effects
- like memory operations, division

But if we change that line to z <- z + 1:
because the liveness of z now propagates through the loop, we can't use this rule to remove it,
even though z <- z + 1 is dead code
l1: p <- 1                          x,z
l2: p <- p * x                      p,x,z
l3: z <- z + 1                      p,x,z
l4: x <- x-1                        p,x,z
l5: if(x > 0) then l2 else l6       p,x,z
l6: return p                        p


Conclusion: liveness alone does not help much in some cases for removing dead code.
Neededness Analysis
----Goal: remove ops that----
(a) have no effect and 
(b) don't have a "needed" result

nec(l,x) => x is necessary at l 

l: x <- y /0 z   (/0 a potentially effectful op, e.g. division)
-----------------------
nec(l,y)
nec(l,z)


l: return x
-----------------------
nec(l,x)


l: M[y] <- x
-----------------------
nec(l,x)
nec(l,y)


c0 no need to worry about this:
l: x <- M[y]
-----------------------
nec(l,y)


l: if (x?c) then l' else l"
---------------------------
nec(l,x)


~~~~~~~~~~ Like liveness ~~~~~~~~~~

nec(l,x)
----------
needed(l,x)


needed(l',x), -def(l,x), succ(l,l')
-------------------------------
needed(l,x)



A more complex rule: the sources of a definition become needed if its destination is needed later.
l: x <- y
.
.
l': return x

l: x <- y ⊕ z    needed(l', x)    succ(l, l')
---------------------------------------------
needed(l, y)    needed(l, z)    (and analogously for a move x <- y)



Note that we do not accidentally remove non-termination:
for an infinite loop:

f:
l1: goto f
l2: ret 0

Complexity O(#vars * #lines)
Look at this problematic version again using neededness:
z <- z + 1 is dead code

First pass, we start at l6, propagate
                                    Needed
l0: z <- 1
l1: p <- 1                          x,
l2: p <- p * x                      p,x (the added complex rule)
l3: z <- z + 1                      p
l4: x <- x-1                        p
l5: if(x > 0) then l2 else l6       p
l6: return p                        p


Second pass, we start at l5, propagate
                                    Needed
l0: z <- 1
l1: p <- 1                          x,
l2: p <- p * x                      p,x (the added complex rule)
l3: z <- z + 1                      p,x
l4: x <- x-1                        p,x
l5: if(x > 0) then l2 else l6       p,x
l6: return p                        p


?? TODO: Maybe merge in one pass? 

Example for optimization
s => elem size
a => addr of array
n => length of array

Step 1:
l1: i <- 0
l2: if ( 0 <0) then error else l3 -------> const propagation ----> const folding---> jump l3 ----> one more pass ---> (nop)
l3: if ( 0 >=n) then error else l4
l4: t <- 0 * s ----------------------> t <- 0 --------> then maybe t is not needed, so removed
l5: u <- a + t ----------------------> copy propagation, u <- a
l6: x <- M[u]

what if 
l7: i <- i+1
l7: if (i<n) then l2 else l9
l9: return x 

The problem: this is not in SSA form; i is defined twice,
at l1 and at l7, and both definitions reach l2.
We cannot do constant propagation, so we are not sure i is 0 at l2.

predicate we want to define: 

reach(l,x,l') 
complexity: O(#line * #line), much larger than O(#var * #line)







Back to Blog Chapters

**Chapter 13: Optimizations **

Reaching definitions
  • if we do not do SSA, then this helps us figure out what definitions reach which line
Rules of reach analysis
reaches(l,x,l') = def of temp x at line l reaches line l'


def(l,x)
succ(l,l')
----------
reaches(l,x,l')


reaches(l,x,l')
not def(l',x)
succ(l', l'')
----------------
reaches(l,x,l'')


l' could be like this: x <- x + y


Complexity O((#line) ^ 2)

Higher than O((#var) * (#line))
What is an optimization?
  • Compiler phase
  • optional
  • goal: make code more efficient
What to optimize
  • *Runtime
  • *Memory use
  • code size
  • domain specific (data center, network, …)
  • energy use (battery-powered or embedded environments, where it becomes important)
How to pick optimizations?
  • Cannot implement all
  • Compare compiled code to other compilers
Register Allocation
  1. Construct interference graph
  2. Find coloring (MCS + greedy coloring)
  3. Assign 13 colors to registers
  4. Assign remaining colors to stack locations
  5. Coalesce non-interfering moves
Coalescing
Remove a move t <- s:
- if t and s do not interfere,
      they can use the same register;
      the move becomes a self-move t <- t, which has no effect
      and can also be eliminated

Costs more compile time,
and the interference graph gets denser

Greedy Coalescing
1. Consider each move t <- s
2. If there's an edge between t and s in the interference graph, then skip
3. Otherwise: if there's a color c not in col(N(s) ∪ N(t)) that corresponds
to a register, then we coalesce t and s into a fresh temp w:
      - add w to the graph with color c
      - create edges between w and N(s) ∪ N(t)
      - remove s and t from the graph

Can this be done before stack allocation?
Splitting live ranges
 introduce a move
-> split a temp into two
-> graph gets sparser

problem: no good and simple heuristics

done:             liveness
a <- x + 8128     x
b <- a + a        x,a
return x * b      x,b

Example:
                                    live vars
x <- y                              y,u,v
n <- u + v                          x,u,v
i <- n                              x,n
l1: if i <= 0 then l2 else done     x,i
l2: i <- i-1                        x,i
      x <- x * x                    x,i
      goto l1                       x,i

where to split and when to split?
Graph:
u, v, n, i, a, b, x

u(0)-|    4     |-i(1)
v(1)-|--- x ----|-a(2)
n(2)-|          |-b(3)


make x' to replace x
                                    live vars
x <- y                              y,u,v
n <- u + v                          x,u,v
i <- n                              x,n
x' <- x
l1: if i <= 0 then l2 else done     x',i
l2: i <- i-1                        x',i
      x' <- x' * x'                 x',i
      goto l1                       x',i


u(0)-|    3 0     |-i(1)
v(1)-|--- x x'----|-a(2)
n(2)-|            |-b(3)


which to split? may use a heuristic
where to split?
One round of reg allocation -> another round of reg allocation

Back to Blog Chapters

**Chapter 14: Peephole Optimizations **

local optimizations that modify a few lines
Constant folding
Example:
replace binop with a const

l: x <-  c1 ⊕ c2 ->
l:  x <- c if c = c1 ⊕ c2

l: x <-  c1 ⊕ c2 ->
l: raise(arith) if c1 ⊕ c2 is undefined

Gen notation
l: inst0 -> inst0'
...
l: instn -> instn'

In LLVM there are >= 1000 peephole optimizations -> a source of bugs

Optimize conditionals
l: if c1 ? c2 then l1 else l2  ->  l: goto li (when c1 ? c2 can be decided at compile time); the untaken block may become dead code


Example:
l1: x <- y + c1
l2: z <- x + c2

-> 
l1: x <- y + c1  <--- dead code ?
l2: z <- y + c
where c =  c1 + c2


Strength reduction
Replace complex instruction with simpler one

Examples:
a + 0 = a 
a - 0 = a
a * 0 = 0
a * 1 = a
a * 2^n  = a << n
a * b + a * c  = a * (b + c)

Don't introduce bugs!
x + 1 > x 
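A couple of these rewrites as a Rust sketch (toy instruction type; the guards are what keep us from introducing bugs, e.g. only shifting for positive powers of two and folding with wrapping arithmetic to match the two's-complement target):

#[derive(Clone, Debug, PartialEq)]
enum Operand { Const(i32), Temp(String) }

#[derive(Clone, Debug, PartialEq)]
enum Instr {
    Move { d: String, s: Operand },
    Add { d: String, s1: Operand, s2: Operand },
    Mul { d: String, s1: Operand, s2: Operand },
    Shl { d: String, s: Operand, amount: u32 },
}

// Peephole rewrites on a single instruction; anything unmatched is returned unchanged.
fn peephole(i: Instr) -> Instr {
    match i {
        // constant folding: c1 + c2 -> c (wrapping, to match the target's arithmetic)
        Instr::Add { d, s1: Operand::Const(c1), s2: Operand::Const(c2) } =>
            Instr::Move { d, s: Operand::Const(c1.wrapping_add(c2)) },
        // strength reduction: a * 2^n -> a << n, only for positive powers of two
        Instr::Mul { d, s1, s2: Operand::Const(c) } if c > 0 && (c as u32).is_power_of_two() =>
            Instr::Shl { d, s: s1, amount: (c as u32).trailing_zeros() },
        // a + 0 -> a
        Instr::Add { d, s1, s2: Operand::Const(0) } => Instr::Move { d, s: s1 },
        other => other,
    }
}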

Null sequences
instructions that have no effect
l: x <- x -> l: nop


l1: x <- y -> l1: x <- y (may later become dead code)
l2: y <- x -> l2: nop



Example: 
a <- a + x
a <- a + y

-----------> a,x,y are on the stack

r1 <- a
r1 <- x + r1
a <- r1 ----> dead code
r1 <- a -----> dead code
r1 <- y + r1
a <- r1

l: goto l'   (where l' is the very next line)
l': instr
=>
l: nop
l': instr

Common subexpression elim
l: x <- s1 ⊕ s2
..
l': y <- s1 ⊕ s2


-> 
l: x <- s1 ⊕ s2
..
l': y <- x

problem: every code path that reaches l' needs to go through l

Example:
lab1: 
x <- a + b
if a < 0 then lab2
      else lab3

lab2:
y <- a + b
goto lab4

lab3:
z <- x + b
goto lab4

lab4:
n <- a + b    can we replace a + b? It appears three times.
1. replace n <- a + b with n <- y? No: the path through lab3 does not compute y.
2. replace n <- a + b with n <- x: yes, every path to lab4 goes through lab1.


Back to Blog Chapters

**Chapter 15: Memory Optimizations **

Dominator Graph
l > l' = l dominates l' = every path to l' goes through l

(1) compute l > l' on the CFG
- Lengauer-Tarjan (asymptotically faster)
- Cooper et al. (simpler & faster in practice) <- idea: construct the dominator tree

Dominator tree:
- nodes are lines
- edge between each line and its immediate dominator

decide l > l' by
- start at l' in the dominator tree
- traverse the tree upwards until
      - we find l -> l > l'
      - we arrive at the function entry -> l does not dominate l'

Example: 
loops

while(e,s)

l0: instr(e)
l0': if (res(e)==0) then l2 else l1
l1: insts(s)
l1': goto l0
l2: 

control flow graph
                                            ------------ l2
                                           |
k(before loop) -----> l0 - --e- - -> l0'  -| 
                        ^                   ------ l1 - s -> l1'
                        |                                      |
                         --------------------------------------

l0' has predecessor of K
l1,l2 has predecessor of l0'



High-level view:
the algorithms solve the dataflow equation:

Dom(l) =
 {l0}                                              if l = l0 (function entry)
 {l} ∪ ⋂ { Dom(l') : l' a predecessor of l }       otherwise


Naive algorithm:
1. Dom(l0) = {l0}
   Dom(l) = L (all lines) for l != l0
2. apply the second equation repeatedly until nothing changes
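The naive algorithm is a few lines of Rust (a sketch; preds is the predecessor list of the CFG, and unreachable lines are simply left as {l} here):

use std::collections::HashSet;

// Naive dominator fixpoint: preds[l] lists the CFG predecessors of line l.
fn dominators(preds: &[Vec<usize>], entry: usize) -> Vec<HashSet<usize>> {
    let n = preds.len();
    let all: HashSet<usize> = (0..n).collect();
    let mut dom: Vec<HashSet<usize>> = vec![all; n]; // Dom(l) = L initially
    dom[entry] = [entry].into_iter().collect();      // Dom(l0) = {l0}
    let mut changed = true;
    while changed {
        changed = false;
        for l in 0..n {
            if l == entry {
                continue;
            }
            // Dom(l) = {l} ∪ intersection of Dom(l') over all predecessors l'
            let mut new_dom = preds[l]
                .iter()
                .map(|&p| dom[p].clone())
                .reduce(|a, b| a.intersection(&b).cloned().collect::<HashSet<usize>>())
                .unwrap_or_default();
            new_dom.insert(l);
            if new_dom != dom[l] {
                dom[l] = new_dom;
                changed = true;
            }
        }
    }
    dom
}

Building the dominator tree then just links each line to its immediate dominator (the closest strict dominator).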


Construct dom tree during code gen
Example: conditional if(e,s1,s2)
l0: instr(e)
l'0: if(res(e)!=0) then l1 else l2
l1: instr(s1); goto l3
l2: instr(s2); goto l3
l3: 


l3 -> l0'
l1 -> l0'
l2 -> l0'

Memory Optimization
- can give significant speedups
- difficult to implement

Example:
M[q] <- 4
M[p] <- 8128
x <- M[q] ---> x <- 4

valid only if p and q don't alias (q != p)
Alias analysis!
Alias analysis: One approach..
Assume we already have 
may-alias (a,b) = temps a and b can contain the same address

l: t <- M[a]
.
.
.
l': t' <- M[a]


=>

l: t <- M[a]
.
.
.
l': t' <- t 


What are the conditions that need to hold for us to do this?
- No other def of t reaches l' (SSA, good)
- a has the same value in l and l' (SSA, good)
- M[a] is still "available" (not overwritten)
- t is still defined at l'

copy propagation applies within basic blocks


if avail(l, l') and l > l'
Easier: def unavail(l,l')

unavail(l,l') = l: t <- M[a]
M[a] is potentially defined on a path from l to l' and l > l'(for efficiency only)



l: t<- M[a]
l > l'
l': M[b] <- S
may-alias(a,S)
succ(l',k)
-----------------------
unavailable(l,k)
l > k


unavail(l,k)
succ(k,k')
l > k'
--------------
unavail(l,k')


Function calls can be handled similarly;
if the called function takes no pointers and does not use memory, it is easier



Loop optimizations
hoisting loop-invariant computation out of a loop, 
and optimizations based on induction variables.

loop(h, l): line l is in the loop with header h

Fusion 
If there are two adjacent loops that iterate over the same data then these
loops can be combined in some situations.

Interchange 
Sometimes it can improve locality to switch the position of an inner
and outer loop.


Unrolling 
Sometimes it is beneficial to place a copy of the body of the loop in a
new code block in front of the loop. This can lead to new opportunities for
optimization and in some situations we can avoid some executions for the
loop guard.

Hoisting invariant computation 
If the result of an operation in the loop body is
the same in every loop iteration then we can perform it before entering the
loop body.


Inversion 
Replacing a while loop with a do while loop can reduce the number of
jumps and improve the effectiveness of instruction pipelines.
Induction variable substitution If a variable changes by a constant between loop
iterations then it can enable optimizations to make this relationship explicit.

A (pure) expression is loop invariant if its value does not change throughout the
loop. We can then define the predicate inv(h, p), where p is a pure expression, as
follows:

c constant
----------------
inv(h,c)

def(l, x) ¬loop(h, l)
-----------------------
inv(h, x)


inv(h, s1) inv(h, s2)
-----------------------
inv(h, s1 ⊕ s2)

Since we are concerned only with programs in SSA form, it is easy to see that variables are loop invariant if they are not parameters of the header label.

l : t ← p inv(h, p) loop(h, l)
-------------------------------
inv(h, t)

In order to hoist loop invariant computations out of a loop we should have a
loop pre-header in the control-flow graph, which immediately dominates the loop
header. We then move all the loop invariant computations to the pre-header, in
order.


Hoisting Loop-Invariant Computation

Back to Blog Chapters

**Chapter 17: Garbage Collection **

Memory Management
so far: we only allocate
mem: HOW TO FREE? 
two options: manual or automatic

Autom mem mgmt:
advantages: 
- easier for the programmer
- correct: "no use after free"
- prevents memory leak(some)
disadvantages:
- some mem leaks possible
- potentially less efficient
- more mem use
- time overhead

Two main approaches to autom mem mgmt: 
1. garbage collection
2. reference counting


Tracing GC
- Identify objects on the heap that are still needed
- start with "root pointers"
- root pointers: addresses on the stack(in local variable) and registers (and global vars)
- find all objs that are reachable from the roots

1:
Conservative collector
- all words on the stack are treated as pointers
Better 
- store type info for stack frames
* Need to maintain type info for allocated obj (to ident pointers)
Mark and Sweep
  • reserve a bit for the GC in each object
  • during tracing: mark reachable objects
  • sweep: reclaim memory that is not marked (see the sketch below)
  • “Stop the world” GC, undesirable in terms of performance
  • fragmentation
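In miniature (a toy object graph indexed by integers, nothing like a real heap), mark & sweep is just a reachability traversal followed by collecting the unmarked objects:

use std::collections::HashSet;

// fields[o] lists the objects that object o points to; roots come from the
// stack, registers and globals.
fn mark_and_sweep(fields: &[Vec<usize>], roots: &[usize]) -> HashSet<usize> {
    // mark: everything reachable from the roots
    let mut marked = HashSet::new();
    let mut stack: Vec<usize> = roots.to_vec();
    while let Some(o) = stack.pop() {
        if marked.insert(o) {
            stack.extend(&fields[o]);
        }
    }
    // sweep: everything unmarked can be reclaimed
    (0..fields.len()).filter(|o| !marked.contains(o)).collect()
}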
Copying GC
divide heap into two "semi spaces"


Two semi-spaces:
A | B
Mark phase at A;
instead of sweeping, copy the reachable objects from A to B,
i.e. move the reachable objects to the other space

Can be done incrementally:
Copy some parts of A to B

Back to Blog Chapters

**Chapter 18: More advanced features **

Function pointers

typedef int optype(int, int);

typedef int (*optype_pt)(int, int);


int f(int x, int y){
      return x + y;
}

int (*g)(int x, int y) = &f;

int main(){
      (*g)(1,2);
}


rewrite local f2 by linking... 
similar idea to xz Utils Backdoor, function hooking with broken dynamic linking semantics

Static resource analysis
Information about resource consumption at development time
- upper and lower bounds, security, ...

Why not testing:
- it doesn't provide guarantees and may miss the worst case (e.g. quicksort's quadratic behavior)

Resource Usage in Safety-Critical Systems
Memory Usage -> Toyota cars
Timing -> ICE 3 Velaro D delivery delay 
Cloud computing, job scheduling if we know the resource usage

Software Security
- side channels: Meltdown, Spectre, ...
- algorithmic complexity attacks: worst-case inputs enable low-bandwidth DoS attacks

Analysis of Alg: 
Big-O
Abstracts from low level details 
Disconnected from actual code
Asymptotic bounds not useful


Automatic Amortized Resource Analysis
- Given program, worse-case resource consumption of P as a function of the 
size of the input?
- potential functions p(before) >= p(after) + cost --> p(init state) >= sum(cost)


Example:
append(x,y)
Heap-space usage is 2n if n is len of x

f(x,y,z)
let t = append(x,y) in append(t,z)

Heap usage of f(x,y,z) is 2n + 2(n+m)

Evaluator example: Emult nodes cost 2, Einc nodes cost 1, Econst nodes cost 0
Bound: 2n, where n is the number of nodes in e
Better bound: 2m + k  (m: number of Emult nodes, k: number of Einc nodes)




Back to Blog Chapters