Compiler Design Project

What this project does

This project is a simple pipeline for checking Python-like code. It has three main stages:

Lexical Analysis — split code into tokens and build a symbol table.
Syntax Checking — find common problems like invalid names, bad operators, indentation issues, and unmatched brackets.
Grammar Parsing — try to parse simplified statements using LR(1) and LALR(1) parser tables.

It also includes a basic Streamlit user interface (app.py) so you can type code, load samples, and see the results in tabs.

Project files and what they do

`main.py`

The main driver for the project.
Uses the lexer to convert source code into tokens and transformed statements.
Runs the syntax checker to collect errors and warnings.
Runs both LR(1) and LALR(1) parser checks on the transformed statements.
Prints a final report showing tokens, symbol table, syntax results, grammar parse results, and line-by-line source output.
Includes two sample programs: one correct and one with errors.

Key functions:

tokenize_for_parser(transformed_stmt) — turns transformed statements like id = id + num into parser tokens such as ['id', '=', 'id', '+', 'num', '$'].
analyze(source_code, verbose=True) — runs the whole pipeline and prints the report.
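The README gives only the input and output of tokenize_for_parser. Here is a minimal sketch of how such a function could work. It assumes transformed statements are plain strings made of the placeholders id and num, keywords, and punctuation, which may appear without spaces, as in print(id). The regular expression is our own assumption, not the project's code.

```python
import re
from typing import List

# Assumed token shapes: words (id, num, if, print, ...), digit runs,
# two-character comparisons, then any other single non-space character.
_TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|==|!=|<=|>=|\S")

def tokenize_for_parser(transformed_stmt: str) -> List[str]:
    """Split a transformed statement into parser tokens and append the end marker '$'."""
    tokens = _TOKEN_RE.findall(transformed_stmt)  # whitespace is skipped implicitly
    tokens.append("$")  # end-of-input marker expected by an LR driver
    return tokens

print(tokenize_for_parser("id = id + num"))  # ['id', '=', 'id', '+', 'num', '$']
print(tokenize_for_parser("print(id)"))      # ['print', '(', 'id', ')', '$']
```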

`lexer.py`

Converts source code into tokens.
Detects identifiers, numbers, strings, keywords, operators, punctuation, indentation, and other lexemes.
Stores identifiers in a simple symbol table with line, scope, and type information.
Produces "transformed statements" that the grammar parser can consume.

Main classes:

Token — holds token type, value, and source line.
SymbolTable — stores variable names with line, scope, and type.
Lexer — tokenizes the source and transforms lines into parser-friendly statements.
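The sketch below shows one way these three pieces could fit together. It is an illustration under assumptions, not the project's lexer. Several details are hypothetical:

- the token type names
- the regex table
- treating print, input, and range as built-ins that stay out of the symbol table
- the fixed "global" scope

Indentation tracking and type inference are also left out.

```python
import keyword
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

BUILTINS = {"print", "input", "range"}  # assumption: kept as literal words, not symbols

@dataclass
class Token:
    type: str   # "STRING", "NUMBER", "IDENTIFIER", "KEYWORD", "OPERATOR", "PUNCT", "MISMATCH"
    value: str
    line: int

@dataclass
class SymbolTable:
    # name -> (first line seen, scope, type); this sketch always uses scope "global"
    entries: Dict[str, Tuple[int, str, str]] = field(default_factory=dict)

    def add(self, name: str, line: int, typ: str = "unknown") -> None:
        self.entries.setdefault(name, (line, "global", typ))  # keep the first occurrence

# Order matters: earlier patterns win when several could match at the same position.
TOKEN_SPEC = [
    ("STRING", r"\"[^\"\n]*\"|'[^'\n]*'"),
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR", r"\*\*|//|==|!=|<=|>=|[+\-*/%=<>]"),
    ("PUNCT", r"[()\[\]{}:,]"),
    ("SKIP", r"[ \t]+"),
    ("MISMATCH", r"."),
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source: str, symbols: SymbolTable) -> List[Token]:
    """Split source into tokens line by line, recording identifiers in symbols."""
    tokens: List[Token] = []
    for lineno, text in enumerate(source.splitlines(), start=1):
        for m in MASTER_RE.finditer(text):
            kind, value = m.lastgroup, m.group()
            if kind == "SKIP":
                continue
            if kind == "IDENTIFIER" and keyword.iskeyword(value):
                kind = "KEYWORD"
            elif kind == "IDENTIFIER" and value not in BUILTINS:
                symbols.add(value, lineno)
            tokens.append(Token(kind, value, lineno))
    return tokens

def transform_line(line_tokens: List[Token]) -> str:
    """Rewrite one line's tokens into a parser-friendly statement such as 'id = id + num'."""
    out = []
    for tok in line_tokens:
        if tok.type == "IDENTIFIER" and tok.value not in BUILTINS:
            out.append("id")
        elif tok.type == "NUMBER":
            out.append("num")
        else:
            out.append(tok.value)  # keywords, operators, punctuation, built-ins stay as-is
    return " ".join(out)

if __name__ == "__main__":
    table = SymbolTable()
    toks = tokenize("x = 10\ny = x + 2", table)
    print(transform_line([t for t in toks if t.line == 2]))  # id = id + num
    print(table.entries)  # {'x': (1, 'global', 'unknown'), 'y': (2, 'global', 'unknown')}
```

Grouping tokens by their line field and passing each group to transform_line gives one parser statement per source line.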

`syntax_checker.py`

Runs rule-based checks on the source code.
Looks for common Python-style errors and warnings before parsing.

Main checks include:

invalid variable names (e.g. names that start with a digit or are Python keywords)
bad operator sequences (e.g. +*, ** **)
inconsistent or wrong indentation
unmatched brackets, braces, or quotation marks
incorrect use of print() or input()
simple loop and condition warnings

Main class:

SyntaxChecker — takes source code and lexer tokens, then returns lists of errors and warnings.
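Several of these checks fit in a single pass over the lines. The function below is a hypothetical check_source, a simplified stand-in for SyntaxChecker and not the project's code. It covers:

- invalid assignment targets
- operator runs
- unmatched quotes and brackets
- indentation

The print()/input() and loop rules are omitted. String contents are blanked out first, so operators or brackets inside literals are ignored. Triple-quoted strings and comments are not handled.

```python
import keyword
import re
from typing import List, Tuple

STRING_LIT = re.compile(r"\"[^\"\n]*\"|'[^'\n]*'")
ASSIGN_TARGET = re.compile(r"\s*(\w+)\s*=(?!=)")          # 'name =' but not 'name =='
OP_RUN = re.compile(r"[+\-*/%]+(?:\s+[+\-*/%]+)*")        # arithmetic operators, maybe space-separated
VALID_PIECE = re.compile(r"(?:\*\*|//|[+\-*/%])[+\-]*")   # one operator plus optional unary signs
UNARY = re.compile(r"[+\-]+")
CLOSERS = {")": "(", "]": "[", "}": "{"}

def check_source(source: str) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) produced by a few line-based rules."""
    errors: List[str] = []
    warnings: List[str] = []
    open_brackets: List[Tuple[str, int]] = []  # (bracket, line) still waiting for a closer
    prev_opens_block, prev_indent = False, 0

    for lineno, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue
        code = STRING_LIT.sub('""', line)  # blank out string contents

        # Invalid assignment targets: leading digit or a Python keyword.
        m = ASSIGN_TARGET.match(code)
        if m:
            name = m.group(1)
            if name[0].isdigit():
                errors.append(f"line {lineno}: name '{name}' starts with a digit")
            elif keyword.iskeyword(name):
                errors.append(f"line {lineno}: '{name}' is a Python keyword")

        # Bad operator sequences such as '+*' or '** **'; 'a * -b' is allowed.
        for run in OP_RUN.findall(code):
            first, *rest = run.split()
            if not VALID_PIECE.fullmatch(first) or not all(UNARY.fullmatch(p) for p in rest):
                errors.append(f"line {lineno}: bad operator sequence '{run}'")

        # A quote left over after removing complete strings has no partner.
        if "'" in code or '"' in code.replace('""', ""):
            errors.append(f"line {lineno}: unmatched quotation mark")

        # Brackets must pair up; the stack carries over to later lines.
        for ch in code:
            if ch in "([{":
                open_brackets.append((ch, lineno))
            elif ch in CLOSERS:
                if open_brackets and open_brackets[-1][0] == CLOSERS[ch]:
                    open_brackets.pop()
                else:
                    errors.append(f"line {lineno}: unexpected '{ch}'")

        # Indentation: multiples of 4, and a block must follow a ':' line.
        indent = len(line) - len(line.lstrip(" "))
        if indent % 4:
            warnings.append(f"line {lineno}: indentation is not a multiple of 4")
        if prev_opens_block and indent <= prev_indent:
            errors.append(f"line {lineno}: expected an indented block")
        prev_opens_block, prev_indent = code.rstrip().endswith(":"), indent

    errors += [f"line {n}: '{ch}' is never closed" for ch, n in open_brackets]
    return errors, warnings
```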

`grammar_parser.py`

Implements a small grammar parser that can use either LR(1) or LALR(1) parsing tables.
Builds parser tables automatically from grammar rules.
Can show whether each transformed statement is accepted or rejected.
Builds a parse tree for accepted statements.

Grammar support includes:

assignment statements like id = E
arithmetic expressions with +, -, *, /
parentheses ( and )
simple condition headers like if COND :
print(id) calls
for id in range(num) : loop headers
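The README does not say how grammar_parser.py derives its LALR(1) table from LR(1). The standard construction merges canonical LR(1) states that share the same core, meaning the same items once lookaheads are ignored, and unions their lookaheads. The sketch below shows only that merge step. The item encoding and the name merge_lalr are assumptions.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumed LR(1) item encoding: (head, body, dot position, lookahead terminal).
Item = Tuple[str, Tuple[str, ...], int, str]
State = FrozenSet[Item]
Core = FrozenSet[Tuple[str, Tuple[str, ...], int]]

def core(state: State) -> Core:
    """The LR(0) core of a state: its items with the lookaheads dropped."""
    return frozenset((head, body, dot) for head, body, dot, _ in state)

def merge_lalr(states: List[State], transitions: Dict[Tuple[int, str], int]):
    """Merge LR(1) states with identical cores and remap the goto transitions."""
    index_of_core: Dict[Core, int] = {}
    old_to_new: List[int] = []
    merged: List[set] = []
    for state in states:
        key = core(state)
        if key not in index_of_core:
            index_of_core[key] = len(merged)
            merged.append(set())
        merged[index_of_core[key]] |= state  # union of items = union of lookaheads
        old_to_new.append(index_of_core[key])
    # States with equal cores have gotos with equal cores, so the remapping is consistent.
    new_transitions = {(old_to_new[src], sym): old_to_new[dst]
                       for (src, sym), dst in transitions.items()}
    return [frozenset(s) for s in merged], new_transitions

# Example from the classic grammar S -> C C, C -> c C | d:
# two LR(1) states hold C -> d . with different lookaheads.
s_cd = frozenset({("C", ("d",), 1, "c"), ("C", ("d",), 1, "d")})
s_end = frozenset({("C", ("d",), 1, "$")})
merged, _ = merge_lalr([s_cd, s_end], {})
print(len(merged), sorted(item[3] for item in merged[0]))  # 1 ['$', 'c', 'd']
```

Merging makes the table smaller than the canonical LR(1) table. The trade-off is that it can introduce reduce/reduce conflicts the LR(1) table did not have. It never introduces new shift/reduce conflicts.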

Important functions:

build_parse_tree(tokens, action, goto_t) — parses token lists and returns a parse tree or an error.
print_parsing_table(action, goto_t, label) — prints the ACTION and GOTO tables for debugging.
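The README lists build_parse_tree(tokens, action, goto_t) and print_parsing_table but does not show their bodies. Below is a minimal sketch of a table-driven LR driver and table printer of that kind. The names lr_parse and print_table are hypothetical stand-ins, not the project's functions. The sketch also assumes:

- a toy five-production grammar
- hand-built ACTION/GOTO tables
- tuple-encoded actions
- parse trees as nested (head, children) tuples

```python
from typing import Dict, List, Tuple

# Toy grammar (an assumed subset of the project's grammar):
#   1: S -> id = E   2: E -> E + T   3: E -> T   4: T -> id   5: T -> num
PRODUCTIONS: Dict[int, Tuple[str, int]] = {  # production -> (head, body length)
    1: ("S", 3), 2: ("E", 3), 3: ("E", 1), 4: ("T", 1), 5: ("T", 1),
}

# Hand-built tables for that grammar. For it, canonical LR(1) and LALR(1)
# yield the same 10 states, so one table serves both.
# ("s", n) = shift to state n, ("r", p) = reduce by production p, ("acc",) = accept.
ACTION = {
    (0, "id"): ("s", 2), (1, "$"): ("acc",), (2, "="): ("s", 3),
    (3, "id"): ("s", 6), (3, "num"): ("s", 7),
    (4, "+"): ("s", 8), (4, "$"): ("r", 1),
    (5, "+"): ("r", 3), (5, "$"): ("r", 3),
    (6, "+"): ("r", 4), (6, "$"): ("r", 4),
    (7, "+"): ("r", 5), (7, "$"): ("r", 5),
    (8, "id"): ("s", 6), (8, "num"): ("s", 7),
    (9, "+"): ("r", 2), (9, "$"): ("r", 2),
}
GOTO = {(0, "S"): 1, (3, "E"): 4, (3, "T"): 5, (8, "T"): 9}

def lr_parse(tokens: List[str], action, goto_t, productions=PRODUCTIONS):
    """Shift-reduce driver. tokens must end with '$'.
    Returns (tree, None) on success or (None, message) on a syntax error.
    Trees are nested (head, children) tuples; leaves are token strings."""
    states: List[int] = [0]
    nodes: list = []
    pos = 0
    while True:
        tok = tokens[pos]
        act = action.get((states[-1], tok))
        if act is None:
            return None, f"unexpected token {tok!r} at position {pos}"
        if act[0] == "s":                       # shift: consume the token
            states.append(act[1])
            nodes.append(tok)
            pos += 1
        elif act[0] == "r":                     # reduce: pop the body, push the head
            head, length = productions[act[1]]  # no epsilon rules here, so length >= 1
            children = nodes[-length:]
            del nodes[-length:]
            del states[-length:]
            nodes.append((head, children))
            states.append(goto_t[(states[-1], head)])
        else:                                   # accept
            return nodes[0], None

def print_table(action, goto_t, label: str) -> None:
    """Print ACTION and GOTO side by side, one row per state."""
    terminals = sorted({t for _, t in action})
    nonterminals = sorted({n for _, n in goto_t})
    n_states = 1 + max(s for s, _ in list(action) + list(goto_t))
    print(label)
    print("state".ljust(6) + "".join(c.ljust(6) for c in terminals + nonterminals))
    for s in range(n_states):
        cells = []
        for t in terminals:
            a = action.get((s, t))
            cells.append("" if a is None else "acc" if a[0] == "acc" else f"{a[0]}{a[1]}")
        cells += [str(goto_t.get((s, n), "")) for n in nonterminals]
        print(str(s).ljust(6) + "".join(c.ljust(6) for c in cells))

tree, err = lr_parse(["id", "=", "id", "+", "num", "$"], ACTION, GOTO)
print(tree)
# ('S', ['id', '=', ('E', [('E', [('T', ['id'])]), '+', ('T', ['num'])])])
print(lr_parse(["id", "+", "num", "$"], ACTION, GOTO))
# (None, "unexpected token '+' at position 1")
```

LR(1) and LALR(1) differ only in how the tables are built, so the same driver loop serves both. Only the action and goto_t arguments change.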

`app.py`

A Streamlit-based browser interface for the project.
Lets you type code, load sample programs, and run analysis.
Shows results in tabs for:
- lexical analysis
- syntax checking
- grammar parsing
- parsing tables
Allows choosing between LR(1) and LALR(1) parser tables.

How the project works in simple language

The source code is read line by line.
Lexical analysis turns text into tokens like id, num, =, +, if, print, and range.
Syntax checks look for common mistakes before parsing.
Grammar parsing uses a small set of rules to verify whether transformed statements match the expected structure.
The final report shows which lines are valid and which fail, under both the syntax rules and the grammar rules.

How to run the project

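Install the dependencies listed in requirements.txt (for example, with pip install -r requirements.txt). Then run the following from the cdProject folder: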

python -u main.py

For the Streamlit UI, run:

streamlit run app.py

Sample programs

The project includes two sample programs in main.py:

SAMPLE_CORRECT — a valid program with assignments, if, print, and for.
SAMPLE_WITH_ERRORS — a faulty program with invalid variable names, bad operators, missing parentheses, and other syntax issues.

What this project teaches

basic lexical analysis and tokenization
how to build a symbol table
rule-based syntax checking
LR(1) and LALR(1) parser table construction
parse tree building
how to connect parsing logic to a simple UI with Streamlit

Important limitations

This is not a full Python parser.
Only a small set of statement patterns is supported.
The parser uses a simplified grammar and may reject valid Python code outside that grammar.
Syntax checking and grammar parsing are separate: passing one does not guarantee passing the other.
