
🧠 CodeSense

AI-Based Code Quality Analyzer

My B.Tech final year project, built from scratch over 6 months 😅


Python Streamlit scikit-learn SQLite


"I was tired of submitting code and not knowing WHY it was bad. So I built something that actually explains it."


🚀 Quick Start • ✨ Features • 📸 Screenshots • 🏗️ How it Works • 🤝 Contributing


👋 Hey, what's this?

So basically in my 3rd year I kept getting feedback like "your code quality is bad" from professors, but nobody ever told me what exactly was wrong or how to fix it. That's when I decided to make CodeSense for my final year project.

It analyzes your Python, Java, or C++ code and gives you:

  • A proper score (not just vibes)
  • Exactly what's wrong and on which line
  • How to fix it
  • Which algorithms you used and their complexity
  • Security vulnerabilities you didn't even know you had

I spent way too many nights on this. Hope it helps someone πŸ™


✨ What it does

🤖 ML-Based Scoring

Trained an ensemble model (Random Forest + Gradient Boosting) on 10,000 code samples. The score actually comes from the model, not some random formula I made up. R² ≥ 0.90 on the test set.

🧠 Algorithm Detection

Detects 40+ algorithms (bubble sort to Dijkstra's) and tells you the Big-O complexity. Honestly this part was the most fun to build.
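
The detector itself pattern-matches 40+ algorithms, but as a toy illustration of the general idea (my sketch here, not the project's actual code), loop-nesting depth alone already gives a rough complexity hint:

```python
import ast

def max_loop_depth(source: str) -> int:
    """Deepest level of nested for/while loops in the given code."""
    tree = ast.parse(source)

    def depth(node: ast.AST, current: int = 0) -> int:
        best = current
        for child in ast.iter_child_nodes(node):
            step = 1 if isinstance(child, (ast.For, ast.While)) else 0
            best = max(best, depth(child, current + step))
        return best

    return depth(tree)

code = """
def bubble_sort(items):
    for i in range(len(items)):
        for j in range(len(items) - i - 1):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
"""
print(max_loop_depth(code))  # 2 -> two nested loops, hinting at O(n^2)
```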

🔒 Security Scanner

50+ vulnerability patterns. Finds SQL injection, hardcoded passwords, weak hashing and more. Saved me from some embarrassing code lol.

📚 Learning Path

Doesn't just point out problems; it gives you resources, LeetCode problems, and step-by-step improvement tips based on YOUR specific issues.

🔧 Auto-Fix Suggestions

Shows you the before/after for safe fixes. Doesn't just say "this is bad"; it actually shows what good looks like.

📈 Progress Tracking

Tracks your scores over time so you can actually see yourself improving. Made this because I wanted to see if I was getting better.


📸 Screenshots

Login & Register Page

Dashboard

Analysis Results

Achievement

Progress


🚀 Quick Start

Step 1: Clone the repo

git clone https://github.com/asarthak2003/CodeSense-AI-Based-Code-Quality-Analyzer
cd CodeSense-AI-Based-Code-Quality-Analyzer

Step 2: Create a virtual environment

# Windows
python -m venv venv
venv\Scripts\activate

# Mac/Linux
python3 -m venv venv
source venv/bin/activate

Step 3: Install dependencies

pip install -r requirements.txt

⚠️ This installs scikit-learn, streamlit, plotly, bcrypt, and more. It might take 2-3 minutes on a slow connection.

Step 4: Set up config

cp .env.example .env

Open .env and change the SECRET_KEY to anything random (min 32 chars).
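
One easy way to generate a suitable key (assuming Python is on your PATH; any other random-string generator works too):

```shell
# Prints a 64-character hex string, comfortably above the 32-char minimum
python -c "import secrets; print(secrets.token_hex(32))"
```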

Step 5: Train the ML model

python train_model.py --samples 10000

This runs once and saves the model to models/. Takes about 30-60 seconds. You'll see output like:

Generating 10000 training samples...
Running 10-fold cross-validation...
CV R²: 0.923 ± 0.008  ✅ PASSED
Model saved to models/codesense_model.pkl

Step 6: Run it! 🎉

streamlit run app.py

Go to http://localhost:8501, create an account, and start analyzing!


πŸ—οΈ How it Works

I know most project READMEs skip this part but I think it's actually interesting so here's a quick breakdown:

Your Code
    │
    ├──► Syntax Analyzer    (AST + tokenizer, finds parse errors)
    ├──► Semantic Analyzer  (unused vars, dead code, mutable defaults)
    ├──► Static Analyzer    (security patterns, complexity, style)
    ├──► DSA Detector       (pattern matching for 40+ algorithms)
    └──► Context Engine     (is this a test file? web app? algorithm?)
                │
                ▼
         Feature Extractor
         (converts everything into 33 numbers the ML model understands)
                │
                ▼
         ML Model (RandomForest + GradientBoosting ensemble)
                │
                ▼
         Quality Score (0-100) + Grade + Confidence
                │
                ▼
         Feedback Engine (generates human-readable explanation)

The score comes ENTIRELY from the ML model. I spent a long time making sure there are no hardcoded rules that override it.

The 33 Features

The ML model uses 33 features I engineered from the static analysis:

Category        Features   Examples
Basic Metrics   8          Lines of code, comment ratio, blank lines
Complexity      6          Cyclomatic complexity, nesting depth, function length
Style           5          Naming consistency, magic numbers, long lines
Security        4          Issue count, critical count, severity score
DSA             3          Algorithm complexity score, algorithm count, data structure count
Contextual      7          Code duplication, exception coverage, technical debt
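
To make the table concrete, here's a toy version of the extraction step. The three features below are illustrative stand-ins, not the project's actual implementation:

```python
def extract_features(source: str) -> list[float]:
    """Toy feature extractor: a tiny subset of basic/style-type metrics."""
    lines = source.splitlines()
    code_lines = [ln for ln in lines if ln.strip()]
    comment_lines = [ln for ln in lines if ln.strip().startswith("#")]
    return [
        float(len(code_lines)),                            # lines of code
        len(comment_lines) / max(len(code_lines), 1),      # comment ratio
        float(max((len(ln) for ln in lines), default=0)),  # longest line
    ]

sample = "# add two numbers\ndef add(a, b):\n    return a + b\n"
print(extract_features(sample))  # [3.0, 0.3333333333333333, 17.0]
```

In the real pipeline, a vector like this (but with all 33 features) is what the ensemble scores.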

πŸ“ Project Structure

CodeSense/
│
├── 📱 app.py                 ← Main Streamlit app (all the pages)
├── 🔍 analyzer.py            ← Static analysis engine
├── 🧠 dsa_detector.py        ← Algorithm + data structure detection
├── 📊 features.py            ← Feature engineering (33 features)
├── 🤖 train_model.py         ← ML model training + inference
├── 💬 student_feedback.py    ← Feedback generation
├── ⚠️  syntax_analyzer.py    ← Syntax error detection
├── 🔬 semantic_analyzer.py   ← Semantic issue detection
├── 🌍 context_engine.py      ← Code context understanding
├── 🔧 code_fixer.py          ← Auto-fix suggestions
├── 🎨 ui_components.py       ← Reusable UI components
│
├── 🗄️  db.py                 ← SQLite database (WAL mode)
├── 🔐 auth.py                ← Auth (bcrypt + OTP + sessions)
├── ⚡ cache.py               ← Two-tier LRU + disk cache
├── ✅ validators.py          ← Input validation
├── 🛠️  utils.py              ← GitHub fetch, exports, timing
├── 📝 logger.py              ← Structured logging
├── ⚙️  config.py             ← Config management
├── 📌 constants.py           ← All constants in one place
│
├── 📁 models/                ← Saved ML model files
├── 📁 tests/                 ← Test suite (50+ tests)
├── 📁 logs/                  ← App logs
├── 📁 cache/                 ← Disk cache files
│
├── 📋 requirements.txt
└── 🔒 .env.example

🤖 ML Model Details

This was my main challenge: making a model that actually gives meaningful scores.

  • Algorithm: Voting ensemble (RandomForest 55% + GradientBoosting 45%)
  • Training data: 10,000 synthetic code samples with realistic score distributions
  • Validation: 10-fold cross-validation (not just a train/test split)
  • Target R²: ≥ 0.90 (retrains if it doesn't hit this)
  • Contextual adjustment: max ±5 points from the context engine (test files, algorithm-heavy code, etc.)
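
In scikit-learn terms, the setup looks roughly like this. It's a minimal sketch on random stand-in data; the real training script generates the 10,000 synthetic samples and runs the cross-validation gate:

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor,
                              RandomForestRegressor, VotingRegressor)

rng = np.random.default_rng(0)
X = rng.random((200, 33))            # 200 stand-in samples, 33 features each
y = 100 * X[:, :5].mean(axis=1)      # synthetic 0-100 "quality" target

model = VotingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingRegressor(random_state=0)),
    ],
    weights=[0.55, 0.45],            # RandomForest 55%, GradientBoosting 45%
)
model.fit(X, y)
score = float(model.predict(X[:1])[0])
print(f"predicted quality score: {score:.1f}")
```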

I initially tried just using cyclomatic complexity as the score (lol) but that was terrible. The ML approach captures way more nuance.


🔒 Security Features

Both in the app AND in what it detects:

App security:

  • bcrypt password hashing (12 rounds)
  • OTP email verification
  • Brute-force lockout (5 attempts → 30-minute lock)
  • Parameterized SQL queries (no injection)
  • Session tokens (48 cryptographically secure random bytes)

Code security detection:

  • SQL injection, command injection, XSS
  • Hardcoded passwords, API keys, tokens
  • Weak crypto (MD5, SHA1, insecure random)
  • Unsafe deserialization (pickle, yaml.load)
  • Buffer overflows in C++ (gets, strcpy, sprintf)
  • And 40+ more patterns
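
Many of these checks boil down to line-level pattern matching. A scaled-down sketch with three simplified example patterns (nothing like the full 50+ list):

```python
import re

PATTERNS = {
    "hardcoded password": re.compile(r"password\s*=\s*['\"].+['\"]", re.IGNORECASE),
    "weak hash (MD5)": re.compile(r"\bhashlib\.md5\b"),
    "unsafe deserialization": re.compile(r"\bpickle\.loads?\b"),
}

def scan(source: str) -> list[tuple[int, str]]:
    """Report (line number, issue name) for every matching pattern."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings

snippet = 'password = "hunter2"\ndigest = hashlib.md5(data)\n'
print(scan(snippet))  # [(1, 'hardcoded password'), (2, 'weak hash (MD5)')]
```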

🧪 Running Tests

# Run all tests
python tests/test_suite_enhanced.py

# Or with pytest for better output
pip install pytest pytest-cov
pytest tests/ -v --tb=short --cov=. --cov-report=term-missing

Tests cover all major modules: syntax analysis, the ML model, database, auth, cache, DSA detection, and more.




📚 Tech Stack

What           Why I chose it
Streamlit      Fastest way to build a Python web UI. Perfect for data science projects.
scikit-learn   Industry-standard ML library. The ensemble models are solid.
SQLite         No external database server needed. WAL mode handles concurrent reads fine.
bcrypt         Proper password hashing. None of that MD5 nonsense.
Plotly         Interactive charts that actually look good in dark mode.
Python AST     Built-in module for parsing Python code, no external parsers needed.

🤝 Contributing

This is my final year project so I'm not taking major contributions right now, but if you find bugs or have suggestions feel free to open an issue!

If you fork this for your own project, a star ⭐ would be really appreciated; it helps with the placement portfolio 😄


πŸ“ Known Issues / TODO

  • Add support for JavaScript / TypeScript
  • Fix occasional false positives in C++ template code detection
  • Add side-by-side code diff view in the Fixes tab
  • Mobile responsive layout (Streamlit has some limitations here)
  • Export analysis as PDF
  • Add more LeetCode problem suggestions

πŸ™ Acknowledgements

  • My project guide for not giving up on me when I showed up with "I'll just use cyclomatic complexity as the score" in month 2
  • Big-O Cheat Sheet - referenced constantly
  • OWASP Top 10 - for the security patterns
  • Streamlit docs - surprisingly good
  • Stack Overflow - obviously

Made with way too much coffee ☕ and late nights 🌙

Shivam Jaiswal - B.Tech CSE, Final Year
Sarthak Agrawal - B.Tech CSE, Final Year
Yash Sharma - B.Tech CSE, Final Year
Tanishq Kumar - B.Tech CSE, Final Year

⭐ Star this repo if it helped you! ⭐

About

CodeSense is an AI-powered code quality analyzer that evaluates source code for bugs, performance issues, security risks, and best practices. It combines static analysis with machine learning models to provide intelligent suggestions, quality scores, and actionable insights to help developers write clean, efficient, and maintainable code.
