pri-2711/ErrorDetector

Compiler Design Project

What this project does

This project is a simple pipeline for checking Python-like code. It has three main stages:

  1. Lexical Analysis — split code into tokens and build a symbol table.
  2. Syntax Checking — find common problems like invalid names, bad operators, indentation issues, and unmatched brackets.
  3. Grammar Parsing — try to parse simplified statements using LR(1) and LALR(1) parser tables.

It also includes a basic Streamlit user interface (app.py) so you can type code, load samples, and see the results in tabs.

Project files and what they do

main.py

  • The main driver for the project.
  • Uses the lexer to convert source code into tokens and transformed statements.
  • Runs the syntax checker to collect errors and warnings.
  • Runs both LR(1) and LALR(1) parser checks on the transformed statements.
  • Prints a final report showing tokens, symbol table, syntax results, grammar parse results, and line-by-line source output.
  • Includes two sample programs: one correct and one with errors.

Key functions:

  • tokenize_for_parser(transformed_stmt) — turns transformed statements like id = id + num into parser tokens such as ['id', '=', 'id', '+', 'num', '$'].
  • analyze(source_code, verbose=True) — runs the whole pipeline and prints the report.
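
As a minimal sketch of what tokenize_for_parser might look like, the function below assumes that tokens in a transformed statement are separated by whitespace and that "$" is the end-of-input marker. The body is an illustration based on the example above, not the project's actual code.

```python
def tokenize_for_parser(transformed_stmt: str) -> list:
    """Split a transformed statement into parser tokens and append '$'."""
    # Assumption: the lexer separates every token with whitespace.
    tokens = transformed_stmt.split()
    # '$' marks the end of input for the LR-style parser.
    tokens.append("$")
    return tokens


print(tokenize_for_parser("id = id + num"))
# ['id', '=', 'id', '+', 'num', '$']
```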

lexer.py

  • Converts source code into tokens.
  • Detects identifiers, numbers, strings, keywords, operators, punctuation, indentation, and other lexemes.
  • Stores identifiers in a simple symbol table with line, scope, and type information.
  • Produces "transformed statements" that the grammar parser can consume.

Main classes:

  • Token — holds token type, value, and source line.
  • SymbolTable — stores variable names with line, scope, and type.
  • Lexer — tokenizes the source and transforms lines into parser-friendly statements.
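
The sketch below shows one way these three pieces could fit together. The field names, the token categories, the small KEYWORDS set, and the helper functions lex_line and transform are all assumptions made for illustration; the real lexer.py may be organised differently.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword set; the project's list is probably longer.
KEYWORDS = {"if", "for", "in", "print", "range"}

# Order matters: earlier patterns win when several could match.
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[=+\-*/():]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


@dataclass
class Token:
    type: str
    value: str
    line: int


@dataclass
class SymbolTable:
    entries: dict = field(default_factory=dict)

    def add(self, name, line):
        # Keep the first line on which the name appears.
        self.entries.setdefault(
            name, {"line": line, "scope": "global", "type": "unknown"}
        )


def lex_line(text, line_no, table):
    """Tokenize one line; characters that match no pattern are skipped."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind, value = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "NAME":
            kind = "KEYWORD" if value in KEYWORDS else "ID"
            if kind == "ID":
                table.add(value, line_no)
        tokens.append(Token(kind, value, line_no))
    return tokens


def transform(tokens):
    """Replace identifiers and numbers with the placeholders id and num."""
    placeholders = {"ID": "id", "NUM": "num"}
    return " ".join(placeholders.get(t.type, t.value) for t in tokens)


table = SymbolTable()
tokens = lex_line("total = count + 5", 1, table)
print(transform(tokens))      # id = id + num
print(sorted(table.entries))  # ['count', 'total']
```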

syntax_checker.py

  • Runs rule-based checks on the source code.
  • Looks for common Python-style errors and warnings before parsing.

Main checks include:

  • invalid variable names (e.g. names that start with digits or use Python keywords)
  • bad operator sequences (e.g. +*, ** **)
  • inconsistent or wrong indentation
  • unmatched brackets, braces, or quotation marks
  • incorrect usage of print() or input()
  • simple loop and condition warnings

Main class:

  • SyntaxChecker — takes source code and lexer tokens, then returns lists of errors and warnings.
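
To make the rule-based approach concrete, here is a small sketch of three of the checks listed above: invalid assignment targets, bad operator sequences, and unmatched brackets. The function names and message formats are hypothetical, and the bracket check deliberately ignores brackets inside string literals, so treat it as an illustration and not as the SyntaxChecker implementation.

```python
import keyword
import re

CLOSERS = {")": "(", "]": "[", "}": "{"}
OPERATOR_RE = re.compile(r"\*\*|//|[+\-*/]")
ASSIGN_RE = re.compile(r"\s*(\w+)\s*=(?!=)")  # "name =", but not "name =="


def check_name(line, line_no):
    """Flag assignment targets that start with a digit or are keywords."""
    match = ASSIGN_RE.match(line)
    if not match:
        return []
    name = match.group(1)
    if name[0].isdigit():
        return [f"line {line_no}: name '{name}' starts with a digit"]
    if keyword.iskeyword(name):
        return [f"line {line_no}: '{name}' is a Python keyword"]
    return []


def check_operators(line, line_no):
    """Flag two operators in a row unless the second is a unary sign."""
    errors = []
    ops = list(OPERATOR_RE.finditer(line))
    for left, right in zip(ops, ops[1:]):
        between = line[left.end():right.start()]
        if between.strip() == "" and right.group() not in ("+", "-"):
            errors.append(
                f"line {line_no}: operator '{left.group()}' "
                f"followed by '{right.group()}'"
            )
    return errors


def check_source(source):
    """Run all checks and return a list of error messages."""
    errors = []
    stack = []  # (opening bracket, line number) pairs still waiting to close
    for line_no, line in enumerate(source.splitlines(), start=1):
        errors += check_name(line, line_no)
        errors += check_operators(line, line_no)
        for ch in line:
            if ch in "([{":
                stack.append((ch, line_no))
            elif ch in CLOSERS:
                if stack and stack[-1][0] == CLOSERS[ch]:
                    stack.pop()
                else:
                    errors.append(f"line {line_no}: unmatched '{ch}'")
    errors += [f"line {n}: unclosed '{c}'" for c, n in stack]
    return errors


for message in check_source("2x = 5\ny = (a +* b\n"):
    print(message)
```

Running the checks before parsing keeps the error messages specific: a stack-based bracket check can name the exact line of the unclosed bracket, which a table-driven parser would only report as a generic rejection.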

grammar_parser.py

  • Implements a small grammar parser using LR(1) and LALR(1).
  • Builds parser tables automatically from grammar rules.
  • Reports whether each transformed statement is accepted or rejected.
  • Builds a parse tree for accepted statements.

Grammar support includes:

  • assignment statements like id = E
  • arithmetic expressions with +, -, *, /
  • parentheses ( and )
  • simple condition headers like if COND :
  • print(id) calls
  • for id in range(num) : loop headers

Important functions:

  • build_parse_tree(tokens, action, goto_t) — parses token lists and returns a parse tree or an error.
  • print_parsing_table(action, goto_t, label) — prints the ACTION and GOTO tables for debugging.
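
The table-driven loop behind a function like build_parse_tree can be sketched as follows. Everything here is an assumption for illustration: the toy grammar (E -> E + id | id) is much smaller than the project's, the ACTION and GOTO tables are written by hand instead of generated, and the name parse_tokens and its (tree, error) return value are hypothetical.

```python
# Toy grammar (not the project's):
#   1: E -> E + id
#   2: E -> id
PRODUCTIONS = {1: ("E", ["E", "+", "id"]), 2: ("E", ["id"])}

# Hand-written tables for the toy grammar, keyed by (state, symbol).
ACTION = {
    (0, "id"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2),
    (2, "$"): ("reduce", 2),
    (3, "id"): ("shift", 4),
    (4, "+"): ("reduce", 1),
    (4, "$"): ("reduce", 1),
}
GOTO = {(0, "E"): 1}


def parse_tokens(tokens, action, goto_t):
    """Shift-reduce driver; tokens must end with '$'.

    Returns (tree, None) on success or (None, message) on failure.
    """
    states = [0]  # state stack
    nodes = []    # parse-tree nodes, one per symbol on the stack
    pos = 0
    while True:
        entry = action.get((states[-1], tokens[pos]))
        if entry is None:
            return None, f"unexpected {tokens[pos]!r} at position {pos}"
        kind, arg = entry
        if kind == "shift":
            states.append(arg)
            nodes.append(tokens[pos])
            pos += 1
        elif kind == "reduce":
            head, body = PRODUCTIONS[arg]
            # Pop one state and one node per symbol in the production body.
            children = nodes[-len(body):]
            del nodes[-len(body):]
            del states[-len(body):]
            nodes.append((head, children))
            states.append(goto_t[(states[-1], head)])
        else:  # accept
            return nodes[0], None


print(parse_tokens(["id", "+", "id", "$"], ACTION, GOTO))
# (('E', [('E', ['id']), '+', 'id']), None)
print(parse_tokens(["id", "id", "$"], ACTION, GOTO))
# (None, "unexpected 'id' at position 1")
```

The same driver works for LR(1) and LALR(1) tables; the two methods differ only in how the tables are built, which is why the parser can switch between them by swapping the action and goto_t arguments.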

app.py

  • A Streamlit-based browser interface for the project.
  • Lets you type code, load sample programs, and run analysis.
  • Shows results in tabs for:
    • lexical analysis,
    • syntax checking,
    • grammar parsing,
    • parsing tables.
  • Allows choosing between LR(1) and LALR(1) parser tables.

How the project works in simple language

  1. Read the source code line by line.
  2. Lexical analysis turns text into tokens like id, num, =, +, if, print, and range.
  3. Syntax checks look for common mistakes before parsing.
  4. Grammar parsing uses a small set of rules to verify whether transformed statements match the expected structure.
  5. The final report shows which lines are valid and which lines fail, for both the syntax rules and the grammar rules.

How to run the project

From the cdProject folder, run:

python -u main.py

For the Streamlit UI, run:

streamlit run app.py

Sample programs

The project includes two sample programs in main.py:

  • SAMPLE_CORRECT — a valid program with assignments, if, print, and for.
  • SAMPLE_WITH_ERRORS — a faulty program with invalid variable names, bad operators, missing parentheses, and other syntax issues.

What this project teaches

  • basic lexical analysis and tokenization
  • how to build a symbol table
  • rule-based syntax checking
  • LR(1) and LALR(1) parser table construction
  • parse tree building
  • how to connect parsing logic to a simple UI with Streamlit

Important limitations

  • This is not a full Python parser.
  • Only a small set of statement patterns is supported.
  • The parser uses a simplified grammar and may reject valid Python code outside that grammar.
  • Syntax checking and grammar parsing are separate: passing one does not guarantee passing the other.

About

A syntax analyzer that detects errors in Python scripts using compiler design concepts and LR(1) and LALR(1) parsers.
