Commit 354173d

added sctiptIt language
1 parent 4780832 commit 354173d

5 files changed

Lines changed: 4282 additions & 0 deletions

include/pythonic/REPL/INTERNALS.md

Lines changed: 387 additions & 0 deletions
@@ -0,0 +1,387 @@
# 🔬 How MyLang Works: Interpreter Internals

This document explains how the interpreter transforms your code into output, step by step. Even if you've never built a language before, you'll understand the full pipeline by the end.

---

## The Big Picture

When you write `2 + 3.`, the interpreter runs it through a three-stage pipeline, backed by the scope system described in Stage 4, to produce `5`:

```
Source Code  →  [Tokenizer]  →  [Parser]  →  [Evaluator]  →  Output
 "2 + 3."         Tokens        AST Tree      Walks tree      "5"
```

Let's walk through each stage.

---

## Stage 1: Tokenizer (Lexer)

**File**: `me_doingIt.cpp` → `class Tokenizer`

The tokenizer reads raw text character by character and breaks it into **tokens**, small meaningful pieces. Think of it like breaking a sentence into words.

### Example

Input: `var x = 10 + 3.`

Tokens produced:

```
[KeywordVar: "var"] [Identifier: "x"] [Equals: "="] [Number: "10"]
[Operator: "+"] [Number: "3"] [Dot: "."] [Eof]
```

### How It Works

The tokenizer uses a `while` loop that walks through the source string one character at a time:

```
Position:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Source:    v  a  r  ␣  x  ␣  =  ␣  1  0  ␣  +  ␣  3  .  (end)
```

For each character it asks:

1. **Is it a digit?** → Keep reading digits to form a `Number` token (`10`)
2. **Is it a letter?** → Keep reading letters to form a word, then check:
   - Is it a keyword (`var`, `fn`, `if`, `while`, etc.)? → Keyword token
   - Is it a logical operator (`and`, `or`, `not`)? → Operator token
   - Otherwise? → `Identifier` token (a variable/function name)
3. **Is it a symbol?** → Match single or double-character operators (`+`, `==`, `&&`, `<=`, etc.)
4. **Is it `-->`?** → Skip everything until `<--` (comment)
5. **Whitespace?** → Skip it

### Important Detail: The Dot Ambiguity

The character `.` serves **two purposes**:

- **Statement terminator**: `x.` means "end of statement, print x"
- **Decimal point**: `3.14` is a floating-point number

The tokenizer resolves this by checking: _Is the next character after `.` a digit?_ If yes → it's part of a decimal number. If no → it's a terminator.

```
"3.14." → [Number: "3.14"] [Dot: "."]
"10."   → [Number: "10"] [Dot: "."]
```

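The character loop and the dot rule described above can be sketched together as follows. This is a minimal illustration, not the actual `Tokenizer` class: the `Token` struct, the `tokenize` function, the keyword set, and the list of two-character operators are all assumptions, and token kinds are simplified (for example, `=` and punctuation are all labelled `Operator`). Under these assumptions, `var x = 10 + 3.` produces eight tokens, ending in `Dot` and `Eof`.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical token shape; the real Tokenizer in me_doingIt.cpp may differ.
struct Token {
    std::string kind;  // "Keyword", "Identifier", "Number", "Operator", "Dot", "Eof"
    std::string text;
};

std::vector<Token> tokenize(const std::string& src) {
    // Assumed word lists, taken from the examples in this document.
    static const std::set<std::string> keywords = {
        "var", "fn", "if", "elif", "else", "while", "for", "give", "pass"};
    static const std::set<std::string> wordOps = {"and", "or", "not"};
    static const std::set<std::string> twoCharOps = {"==", "!=", "<=", ">=", "&&", "||"};

    std::vector<Token> out;
    std::size_t i = 0;
    auto isDigit = [&src](std::size_t p) {
        return p < src.size() && std::isdigit(static_cast<unsigned char>(src[p])) != 0;
    };

    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (src.compare(i, 3, "-->") == 0) {
            // Comment: skip to just past the closing "<--" (or to end of input).
            std::size_t end = src.find("<--", i + 3);
            i = (end == std::string::npos) ? src.size() : end + 3;
            continue;
        }
        if (std::isdigit(c)) {
            std::size_t start = i;
            while (isDigit(i)) ++i;
            // Dot rule: '.' joins the number only when a digit follows it.
            if (i < src.size() && src[i] == '.' && isDigit(i + 1)) {
                ++i;
                while (isDigit(i)) ++i;
            }
            out.push_back({"Number", src.substr(start, i - start)});
            continue;
        }
        if (std::isalpha(c)) {
            std::size_t start = i;
            while (i < src.size() && std::isalpha(static_cast<unsigned char>(src[i]))) ++i;
            std::string word = src.substr(start, i - start);
            std::string kind = keywords.count(word) ? "Keyword"
                             : wordOps.count(word)  ? "Operator"
                                                    : "Identifier";
            out.push_back({kind, word});
            continue;
        }
        if (c == '.') { out.push_back({"Dot", "."}); ++i; continue; }
        // Symbols: try a two-character operator first, then fall back to one.
        std::string two = src.substr(i, 2);
        if (twoCharOps.count(two)) { out.push_back({"Operator", two}); i += 2; continue; }
        out.push_back({"Operator", std::string(1, src[i])});
        ++i;
    }
    out.push_back({"Eof", ""});
    return out;
}
```
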
---

## Stage 2: Parser

**File**: `me_doingIt.cpp` → `class Parser`

The parser reads the flat list of tokens and builds a **tree structure** called an **AST** (Abstract Syntax Tree). This tree represents the logical structure of your program.

### What the Parser Produces

For this code:

```
if x > 10:
    x.
;
```

The parser creates this tree:

```
IfStmt
├── condition: Expression [x > 10]
├── then-block: BlockStmt
│   └── ExprStmt
│       └── Expression [x]
└── else-block: (none)
```

### Statement Types

The parser knows how to recognize these statement patterns:

| Statement            | Pattern                                  | Produced Node                     |
| -------------------- | ---------------------------------------- | --------------------------------- |
| Variable declaration | `var NAME = EXPR.`                       | `AssignStmt` (isDeclaration=true) |
| Assignment           | `NAME = EXPR.`                           | `AssignStmt`                      |
| Function definition  | `fn NAME @(PARAMS): BODY ;`              | `FunctionDefStmt`                 |
| If/elif/else         | `if EXPR: BODY ; [elif...] [else...]`    | `IfStmt`                          |
| While loop           | `while EXPR: BODY ;`                     | `WhileStmt`                       |
| For loop             | `for NAME in range(from X to Y): BODY ;` | `ForStmt`                         |
| Return               | `give(EXPR).`                            | `ReturnStmt`                      |
| Pass                 | `pass.`                                  | `PassStmt`                        |
| Expression           | `EXPR.`                                  | `ExprStmt` (prints the result)    |

### How Expression Parsing Works: The Shunting-Yard Algorithm

This is the most complex part. Expressions like `2 + 3 * 4` need to respect operator precedence (`*` before `+`). The parser uses the **Shunting-Yard Algorithm** (invented by Edsger Dijkstra) to convert infix notation to **RPN** (Reverse Polish Notation).

#### What is RPN?

- Normal math (infix): `2 + 3 * 4`
- RPN (postfix): `2 3 4 * +`

In RPN, operators come **after** their operands. The beauty: **no parentheses needed** and evaluation is trivially simple with a stack.

#### The Algorithm

The algorithm uses two data structures: an **output queue** and an **operator stack**.

```
Input tokens: 2 + 3 * 4

Step 1: "2" is a number → push to output
        Output: [2]              Stack: []

Step 2: "+" is an operator → push to stack
        Output: [2]              Stack: [+]

Step 3: "3" is a number → push to output
        Output: [2, 3]           Stack: [+]

Step 4: "*" is an operator → precedence of * (6) > + (5)
        So * goes on top, + stays
        Output: [2, 3]           Stack: [+, *]

Step 5: "4" is a number → push to output
        Output: [2, 3, 4]        Stack: [+, *]

Step 6: End of input → pop all operators to output
        Output: [2, 3, 4, *, +]  Stack: []
```

Result RPN: `2 3 4 * +`

#### How Parentheses Work

`(2 + 3) * 4`:

- `(` → pushed to stack as marker
- `2 + 3` processed normally
- `)` → pop operators until `(` is found, removing the marker
- `*` → normal processing

Result: `2 3 + 4 *` ✓ (addition happens first)

#### Unary Operators

`-5` is tricky because `-` could be subtraction or negation. The parser checks: was the **previous token** an operator, opening paren, or nothing? If so, it's unary.

Unary `-` is renamed to `~` internally so the evaluator can distinguish:

- `-` with two operands = subtraction
- `~` with one operand = negation

Unary `!` stays as `!`.

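The precedence, parenthesis, and unary-minus rules above fit in one short function. The sketch below is an illustration under assumptions, not the parser's real code: `toRpn` is a hypothetical name, tokens are plain strings, only the precedences of `*` (6) and `+` (5) come from the walkthrough (the others are guesses), and unary `!` is left out for brevity.

```cpp
#include <map>
#include <string>
#include <vector>

// Converts infix tokens to RPN with the Shunting-Yard algorithm.
std::vector<std::string> toRpn(const std::vector<std::string>& tokens) {
    // Assumed precedence table; "~" is the internal name for unary minus.
    static const std::map<std::string, int> prec = {
        {"+", 5}, {"-", 5}, {"*", 6}, {"/", 6}, {"~", 7}};

    std::vector<std::string> output;
    std::vector<std::string> ops;  // operator stack
    // True at the start, after an operator, and after "(": a "-" seen here is unary.
    bool expectOperand = true;

    for (const std::string& tok : tokens) {
        if (tok == "(") {
            ops.push_back(tok);  // marker
            expectOperand = true;
        } else if (tok == ")") {
            while (!ops.empty() && ops.back() != "(") {
                output.push_back(ops.back());
                ops.pop_back();
            }
            if (!ops.empty()) ops.pop_back();  // discard the "(" marker
            expectOperand = false;
        } else if (prec.count(tok)) {
            std::string op = (tok == "-" && expectOperand) ? "~" : tok;
            if (op != "~") {
                // Binary operators are left-associative: pop anything that
                // binds at least as tightly before pushing this one.
                while (!ops.empty() && ops.back() != "(" &&
                       prec.at(ops.back()) >= prec.at(op)) {
                    output.push_back(ops.back());
                    ops.pop_back();
                }
            }
            ops.push_back(op);
            expectOperand = true;
        } else {
            output.push_back(tok);  // number or identifier
            expectOperand = false;
        }
    }
    while (!ops.empty()) {
        output.push_back(ops.back());
        ops.pop_back();
    }
    return output;
}
```

With these assumptions, the tokens of `2 + 3 * 4` come out as `2 3 4 * +`, the tokens of `(2 + 3) * 4` as `2 3 + 4 *`, and `-5` as `5 ~`.
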
### Short-Circuit Evaluation (the Tricky Part)

`&&` and `||` need **lazy evaluation**: the right side shouldn't run if the left side already determines the result. But RPN evaluates everything eagerly!

**Solution**: The parser has **three layers** beneath the `parseExpression()` entry point:

```
parseExpression()   → calls parseLogicalOr()
parseLogicalOr()    → calls parseLogicalAnd(), handles ||
parseLogicalAnd()   → calls parsePrimaryExpr(), handles &&
parsePrimaryExpr()  → Shunting-Yard for everything else
```

When `||` or `&&` appears **at the top level** (not inside parentheses), the parser **doesn't** put them in the RPN. Instead, it creates a tree node:

```
Expression
├── logicalOp: "||"
├── lhs: Expression [left side - RPN]
└── rhs: Expression [right side - RPN]
```

The evaluator then checks the LHS first, and **only evaluates RHS if needed**:

```cpp
if (logicalOp == "&&") {
    double leftVal = lhs->evaluate(scope);
    if (leftVal == 0) return 0.0;   // Short-circuit: skip RHS!
    return rhs->evaluate(scope);    // Only evaluate if LHS was true
}
```

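To see the laziness in isolation, here is a self-contained sketch of such a tree node covering both operators. `LogicalExpr`, `makeLeaf`, and `makeLogical` are hypothetical names, a `std::function` stands in for "evaluate this RPN sub-expression", and the `||` branch (returning the left value when it is non-zero) is an assumption mirroring the `&&` snippet above rather than confirmed behavior. A leaf that counts its own calls makes it easy to check that a skipped right-hand side never runs.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical node type for illustration; not the real Expression class.
struct LogicalExpr {
    std::string logicalOp;         // "&&", "||", or empty for a leaf
    std::function<double()> leaf;  // stand-in for evaluating an RPN sub-expression
    std::unique_ptr<LogicalExpr> lhs, rhs;

    double evaluate() const {
        if (logicalOp.empty()) return leaf();
        double left = lhs->evaluate();
        if (logicalOp == "&&") {
            // A zero LHS decides the result, so the RHS is never touched.
            return left == 0 ? 0.0 : rhs->evaluate();
        }
        // "||" (assumed): a non-zero LHS decides the result.
        return left != 0 ? left : rhs->evaluate();
    }
};

std::unique_ptr<LogicalExpr> makeLeaf(std::function<double()> fn) {
    auto node = std::make_unique<LogicalExpr>();
    node->leaf = std::move(fn);
    return node;
}

std::unique_ptr<LogicalExpr> makeLogical(std::string op,
                                         std::unique_ptr<LogicalExpr> l,
                                         std::unique_ptr<LogicalExpr> r) {
    auto node = std::make_unique<LogicalExpr>();
    node->logicalOp = std::move(op);
    node->lhs = std::move(l);
    node->rhs = std::move(r);
    return node;
}
```
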
---

## Stage 3: Evaluator

**File**: `me_doingIt.cpp` → `Expression::evaluate()` and `*.execute()` methods

### Expression Evaluation (RPN Stack Machine)

Evaluating RPN is beautifully simple. Use a **stack**:

```
RPN: 2 3 4 * +

Step 1: "2" → push              Stack: [2]
Step 2: "3" → push              Stack: [2, 3]
Step 3: "4" → push              Stack: [2, 3, 4]
Step 4: "*" → pop 4 and 3,
              push 3*4=12       Stack: [2, 12]
Step 5: "+" → pop 12 and 2,
              push 2+12=14      Stack: [14]

Result: 14 ✓
```

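The stack walk above translates almost line for line into code. The following is a minimal sketch under assumptions: `evalRpn` is a hypothetical name, tokens are plain strings, and only numbers, the four arithmetic operators, and the internal unary `~` are handled (no variables or function calls).

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Evaluates an RPN token list with a value stack.
double evalRpn(const std::vector<std::string>& rpn) {
    std::vector<double> stack;
    auto pop = [&stack]() {
        if (stack.empty()) throw std::runtime_error("stack underflow");
        double v = stack.back();
        stack.pop_back();
        return v;
    };

    for (const std::string& tok : rpn) {
        if (tok == "~") {  // unary negation: one operand
            stack.push_back(-pop());
        } else if (tok == "+" || tok == "-" || tok == "*" || tok == "/") {
            double rhs = pop();  // the top of the stack is the right operand
            double lhs = pop();
            if (tok == "+") stack.push_back(lhs + rhs);
            else if (tok == "-") stack.push_back(lhs - rhs);
            else if (tok == "*") stack.push_back(lhs * rhs);
            else stack.push_back(lhs / rhs);
        } else {
            stack.push_back(std::stod(tok));  // anything else is a number
        }
    }
    if (stack.size() != 1) throw std::runtime_error("malformed expression");
    return stack.back();
}
```

For example, `evalRpn({"2", "3", "4", "*", "+"})` returns `14`, matching the trace above. Popping the right operand first matters for `-` and `/`, where operand order changes the result.
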
### Statement Execution

Each statement node in the AST has an `execute()` method:

- **ExprStmt**: Evaluates the expression and **prints** the result
- **AssignStmt**: Evaluates the expression, stores the result in the scope
- **IfStmt**: Evaluates condition → if non-zero, executes the matching branch's block
- **WhileStmt**: Evaluates condition → while non-zero, executes body, re-evaluates condition
- **ForStmt**: Determines range → iterates, setting loop variable in scope for each iteration
- **FunctionDefStmt**: Stores the function definition in the scope (does not run it yet)
- **ReturnStmt**: Evaluates expression, throws `ReturnException` with the value

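One common way to give every statement node its own `execute()` is a small class hierarchy with a virtual method. The sketch below shows three of the node kinds listed above. It is illustrative only: `Env` (a flat map) stands in for the real scope, `Expr` (a `std::function`) stands in for the real `Expression`, and the constructors are assumptions.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using Env = std::map<std::string, double>;       // stand-in for the real Scope
using Expr = std::function<double(const Env&)>;  // stand-in for Expression

struct Stmt {
    virtual ~Stmt() = default;
    virtual void execute(Env& env) const = 0;
};

// Evaluates the expression and stores the result under `name`.
struct AssignStmt : Stmt {
    std::string name;
    Expr expr;
    AssignStmt(std::string n, Expr e) : name(std::move(n)), expr(std::move(e)) {}
    void execute(Env& env) const override { env[name] = expr(env); }
};

// Runs its child statements in order.
struct BlockStmt : Stmt {
    std::vector<std::unique_ptr<Stmt>> stmts;
    void execute(Env& env) const override {
        for (const auto& s : stmts) s->execute(env);
    }
};

// Re-evaluates the condition before every pass; non-zero means "keep going".
struct WhileStmt : Stmt {
    Expr condition;
    std::unique_ptr<Stmt> body;
    WhileStmt(Expr c, std::unique_ptr<Stmt> b)
        : condition(std::move(c)), body(std::move(b)) {}
    void execute(Env& env) const override {
        while (condition(env) != 0) body->execute(env);
    }
};
```

A `WhileStmt` whose condition is `i < 3` and whose body is an `AssignStmt` for `i = i + 1` leaves `i` at `3` when started from `0`.
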
### How `give` (Return) Works

`give(value)` throws a C++ exception (`ReturnException`). This exception **unwinds** through any nested loops, if-blocks, etc., until it's caught by the function call code in the evaluator. This is why `give` correctly exits from inside while loops:

```
fn find @():
    var i = 0.
    while i < 100:          ← loop running
        if i == 42:
            give(i).        ← throws ReturnException(42)
        ;                   ← exception flies through if-block
        i = i + 1.
    ;                       ← exception flies through while-loop
;                           ← caught here by function call handler
```

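The same unwinding can be reproduced in a few lines of plain C++. In this sketch only the name `ReturnException` comes from the text above; its `value` field, the `callFunction` helper, and `findFortyTwo` (a hand-translated version of the `find` example) are assumptions made for illustration.

```cpp
#include <functional>

// Carries the returned value out of arbitrarily nested statements.
struct ReturnException {
    double value;
};

// Runs a function body and turns a thrown ReturnException into a result.
double callFunction(const std::function<void()>& body) {
    try {
        body();
    } catch (const ReturnException& ret) {
        return ret.value;  // `give` was reached somewhere inside the body
    }
    return 0.0;  // body finished without `give`: implicit return of 0
}

// Hand-written equivalent of the `find` example above.
double findFortyTwo() {
    return callFunction([] {
        double i = 0;
        while (i < 100) {
            if (i == 42) {
                throw ReturnException{i};  // leaves the if-block and the loop
            }
            i = i + 1;
        }
    });
}
```

Here `findFortyTwo()` returns `42`: the throw skips the rest of the loop without any per-statement "did we return?" checks, which is the trade-off the design is making.
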
### Function Calls

When the evaluator encounters a function call in an expression:

1. **Pop arguments** from the stack
2. **Create a new scope** (child of caller's scope, with barrier)
3. **Define parameters** as local variables in the new scope
4. **Execute** the function body
5. **Catch** any `ReturnException` → push the return value onto the stack
6. If no `give` was used → push `0` (implicit return)

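The six steps above can be sketched as a single helper. Everything here is a simplified stand-in: the `Scope` struct is reduced to a map plus `parent` and `barrier` fields, `FunctionDef` stores its body as a `std::function`, and `callFunction` is a hypothetical name. In the usage the sketch targets, a `double` function with parameter `n` and a stack holding `5` ends with `10` on the stack.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

struct ReturnException {
    double value;
};

// Reduced scope: just enough to show the parent link and the barrier flag.
struct Scope {
    std::map<std::string, double> vars;
    Scope* parent = nullptr;
    bool barrier = false;
};

// Hypothetical function record: parameter names plus a runnable body.
struct FunctionDef {
    std::vector<std::string> params;
    std::function<void(Scope&)> body;
};

void callFunction(const FunctionDef& fn, Scope& caller, std::vector<double>& stack) {
    if (stack.size() < fn.params.size()) throw std::runtime_error("too few arguments");

    // 1. Pop arguments. The last argument is on top, so fill from the back.
    std::vector<double> args(fn.params.size());
    for (std::size_t i = args.size(); i-- > 0;) {
        args[i] = stack.back();
        stack.pop_back();
    }

    // 2. New scope: child of the caller's scope, with the barrier set.
    Scope local;
    local.parent = &caller;
    local.barrier = true;

    // 3. Parameters become local variables.
    for (std::size_t i = 0; i < args.size(); ++i) local.vars[fn.params[i]] = args[i];

    // 4-6. Run the body; a caught ReturnException supplies the result,
    // otherwise the implicit return value 0 is used.
    double result = 0.0;
    try {
        fn.body(local);
    } catch (const ReturnException& ret) {
        result = ret.value;
    }
    stack.push_back(result);
}
```
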
---

## Stage 4: Scope System

**File**: `me_doingIt.cpp` → `struct Scope`

The scope system controls **which variables are visible** and **which can be modified**. It's implemented as a **linked list** of scope frames.

### Scope Chain

```
Global Scope                              ← defines: x=10, PI=3.14
│
├── Function Scope (barrier=true)         ← defines: a=5 (parameter)
│   │
│   └── If-Block Scope (barrier=false)    ← defines: temp=1
│
└── For-Loop Scope (barrier=false)        ← defines: i=3 (loop var)
```

### The Barrier Mechanism

Each scope has a `barrier` flag:

- **`barrier = false`** (if/else, for, while blocks): The `set()` method **propagates** writes to the parent scope. So `x = 99` inside an if-block modifies the outer `x`.

- **`barrier = true`** (function scopes): The `set()` method **stops** at the barrier. So `x = 99` inside a function throws an error when `x` is not a local variable: the write can't reach the outer `x`.

### Variable Lookup (`get`)

When reading variable `x`, the scope walks **up** the chain:

```
Current scope → has x? → Yes → return it
                       → No  → check parent → has x? → Yes → return it
                                                     → No  → check parent → ...
                                                                          → Error!
```

There's **no barrier for reading**: functions can always read outer variables. Only writing is blocked.

### Variable Assignment (`set`)

When writing `x = value`:

```
Current scope → has x? → Yes → update it
                       → No  → barrier? → Yes → ERROR ("cannot mutate outer scope")
                                        → No  → try parent.set(x, value)
```

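The `get` and `set` rules above fit in a compact struct. The sketch below is an assumption-laden illustration rather than the real `struct Scope`: the constructor, the `define` helper, the use of `std::map`, and the exact error messages (apart from "cannot mutate outer scope", which appears in the diagram) are made up for this example. With it, a write from a non-barrier child reaches the global `x`, while the same write from a barrier scope throws and leaves `x` unchanged.

```cpp
#include <map>
#include <stdexcept>
#include <string>

struct Scope {
    std::map<std::string, double> vars;
    Scope* parent;
    bool barrier;

    explicit Scope(Scope* parentScope = nullptr, bool isBarrier = false)
        : parent(parentScope), barrier(isBarrier) {}

    // Declares (or overwrites) a variable in this frame only.
    void define(const std::string& name, double value) { vars[name] = value; }

    // Reading ignores barriers: walk up the chain until the name is found.
    double get(const std::string& name) const {
        for (const Scope* s = this; s != nullptr; s = s->parent) {
            auto it = s->vars.find(name);
            if (it != s->vars.end()) return it->second;
        }
        throw std::runtime_error("undefined variable: " + name);
    }

    // Writing updates a local if present; otherwise it may only move to the
    // parent when this frame is not a barrier.
    void set(const std::string& name, double value) {
        auto it = vars.find(name);
        if (it != vars.end()) {
            it->second = value;
            return;
        }
        if (barrier) throw std::runtime_error("cannot mutate outer scope");
        if (parent == nullptr) throw std::runtime_error("undefined variable: " + name);
        parent->set(name, value);
    }
};
```
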
---

## Putting It All Together

Here's the full journey of this program:

```
var x = 5.
fn double @(n): give(n * 2). ;
double(x).
```

### 1. Tokenizer

```
[var] [x] [=] [5] [.] [fn] [double] [@] [(] [n] [)] [:] [give] [(] [n] [*] [2] [)] [.] [;] [double] [(] [x] [)] [.]
```

### 2. Parser

```
Program (BlockStmt)
├── AssignStmt { name="x", expr=RPN[5], isDeclaration=true }
├── FunctionDefStmt { name="double", params=["n"],
│     body=BlockStmt [
│       ReturnStmt { expr=RPN[n, 2, *] }
│     ]
│   }
└── ExprStmt { expr=RPN[x, double CALL(1)] }
```

### 3. Evaluator

```
1. AssignStmt: evaluate RPN[5] → 5, store x=5 in global scope
2. FunctionDefStmt: store "double" function definition in scope
3. ExprStmt: evaluate RPN[x, double CALL(1)]
   a. Push x → stack: [5]
   b. CALL double with 1 arg
      - Pop 5 from stack
      - Create new scope with n=5
      - Execute body: evaluate RPN[n, 2, *]
        - Push n=5, push 2 → stack: [5, 2]
        - Pop 2, pop 5, push 10 → stack: [10]
      - ReturnStmt throws ReturnException(10)
      - Catch → push 10 to stack
   c. Stack: [10]
   d. Print: 10
```

**Output**: `10`

---

## Summary of Key Design Decisions

| Decision                  | Choice                                  | Why                                                                                            |
| ------------------------- | --------------------------------------- | ---------------------------------------------------------------------------------------------- |
| Expression representation | RPN (Reverse Polish Notation)           | Simple stack-based evaluation, no recursion needed                                             |
| Short-circuit `&&`/`\|\|` | Tree nodes wrapping RPN sub-expressions | Can't lazily evaluate inside flat RPN, so logical ops are lifted to tree layer                 |
| Scope model               | Dynamic scope with barriers             | Simple, satisfies "inner functions can read outer vars" while preventing mutation              |
| Return mechanism          | C++ exceptions (`ReturnException`)      | Cleanly unwinds through nested loops and blocks without adding return-checking code everywhere |
| Statement terminator      | `.` (dot)                               | Chosen by language designer as a visual alternative to `;`                                     |
| Function syntax           | `fn NAME @(PARAMS): BODY ;`             | `@` is a visual separator, `:` and `;` delimit the body                                        |