Skip to content

soaapp/mle-interview-trainer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🤖 ML Engineer Interview Trainer

An interactive AI-powered application that helps Machine Learning Engineers train and prep for interviews using intelligent question generation, ML-powered recommendations, and comprehensive progress tracking.

Visual Walkthrough

Screenshot 2025-12-01 at 4 11 59 PM (2) Screenshot 2025-12-01 at 4 13 58 PM (2) Screenshot 2025-12-01 at 4 14 18 PM (2)

🚀 Quick Start

pip install -r requirements.txt
streamlit run app.py

Set your OpenAI API key in Settings or create a .env file for AI-generated questions.

🔍 Topics Covered

Master the essential knowledge areas every ML engineer should know:

  • 📖 Definitions - LLM, Encoder, Fine-tuning, Tokens, Embeddings, Transformers
  • 🎯 ML Fundamentals - Supervised/Unsupervised learning, Overfitting, Bias-variance tradeoff
  • 🧠 Deep Learning Basics - Neural networks, Backpropagation, Activation functions
  • 🔍 RAG - Retrieval Augmented Generation systems and techniques
  • 🎯 Fine-tuning - Transfer learning, Parameter-efficient methods (LoRA, PEFT)
  • 🏗️ Model Architecture - Transformers, CNNs, RNNs, Attention mechanisms
  • 📊 Data Processing - Feature engineering, Tokenization, Preprocessing pipelines
  • 🔄 MLOps - Model deployment, Monitoring, CI/CD for ML systems
  • 📈 Evaluation - Metrics, A/B testing, Model validation techniques
  • 💾 Vector Databases - Similarity search, Indexing methods (FAISS, HNSW)
  • 🧠 Transformers - Architecture details, Positional encoding, Variants

67+ expert-crafted questions across multiple choice, open-ended theory, and coding challenges.

🤖 Machine Learning Components I wanted to build into this

This project showcases several production-ready ML components beyond simple API calls, demonstrating real machine learning engineering skills:

1. Question Similarity Engine

  • Technology: Sentence Transformers (all-MiniLM-L6-v2) + FAISS vector database
  • Purpose: Semantic similarity search for intelligent question recommendations
  • Implementation:
    • Embeds all 67+ questions into 384-dimensional vectors
    • Uses FAISS for sub-millisecond similarity search with cosine similarity
    • Enables "find questions like this one" functionality
  • Resume Value: Vector embeddings, semantic search, high-performance indexing

2. Personalized Recommendation System

  • Technology: Collaborative filtering + content-based filtering
  • Purpose: Recommend questions based on user's weak areas and learning patterns
  • Implementation:
    • Analyzes user performance across categories and difficulties
    • Combines similarity scores with performance gaps
    • Multi-factor recommendation algorithm considering accuracy, time, and topic coverage
  • Resume Value: Recommendation engines, user behavior analysis, personalization algorithms

3. Adaptive Difficulty Prediction

  • Technology: Random Forest Classifier (scikit-learn)
  • Purpose: Predict optimal question difficulty for individual users
  • Implementation:
    • Feature engineering: response time, answer length, question type, user history
    • Trains on user interaction data (supervised learning)
    • Real-time difficulty adjustment based on performance
  • Resume Value: Supervised learning, feature engineering, model training & deployment

4. Response Quality Classification

  • Technology: NLP analysis + Statistical modeling
  • Purpose: Automated assessment of open-ended response quality
  • Implementation:
    • Text analysis: vocabulary richness, technical term usage, response complexity
    • Pattern recognition for learning progression tracking
    • Statistical modeling of user improvement over time
  • Resume Value: NLP, text classification, behavioral modeling

5. Knowledge Gap Analysis

  • Technology: Statistical analysis + Pattern recognition
  • Purpose: Identify specific learning gaps and weak knowledge clusters
  • Implementation:
    • Performance clustering across topics and difficulties
    • Statistical significance testing for weakness identification
    • Automated learning path generation based on gap analysis
  • Resume Value: Data analysis, statistical modeling, educational data mining

6. ML Model Pipeline & MLOps

  • Technology: Model persistence, caching, batch processing
  • Purpose: Production-ready ML deployment with efficient inference
  • Implementation:
    • Model serialization and versioning (pickle, faiss.write_index)
    • Efficient caching strategies for embeddings and predictions
    • Error handling and fallback mechanisms
    • Batch processing for embedding generation
  • Resume Value: MLOps, model deployment, production ML systems

Technical Implementation Details:

# Example: Semantic similarity search
embeddings = sentence_transformer.encode(questions)
faiss_index = faiss.IndexFlatIP(384)  # Inner product for cosine similarity
faiss.normalize_L2(embeddings)       # Normalize for cosine similarity
faiss_index.add(embeddings)          # Build searchable index

# Example: Difficulty prediction
features = [response_time, answer_length, question_type_encoded, user_accuracy]
difficulty_model = RandomForestClassifier(n_estimators=100)
difficulty_model.fit(X_train, y_train)
predicted_difficulty = difficulty_model.predict(new_features)

Why These ML Components Matter:

  1. Real ML Engineering: Beyond API calls - actual model training, inference, and deployment
  2. Production-Ready: Error handling, caching, model persistence
  3. Scalable Architecture: Efficient algorithms suitable for real-world usage
  4. Multiple ML Domains: NLP, recommendation systems, supervised learning, vector databases
  5. End-to-End Pipeline: Data processing → model training → inference → user experience

This demonstrates the ability to build complete ML systems rather than just integrating external APIs.

✨ Core Features

🎲 Smart Question Generation

  • AI-Generated Questions using OpenAI with smart API usage limiting (every 3rd question)
  • Curated Question Bank with 67+ expert-crafted questions
  • Multiple Formats: Multiple choice, open-ended theory, coding challenges
  • Adaptive Difficulty: Automatically adjusts based on your performance

🤖 ML-Powered Recommendations

  • Semantic Similarity Search finds questions similar to your weak areas
  • Personalized Learning Path based on performance analysis
  • Weak Area Detection using statistical modeling
  • Smart Question Suggestions combining multiple ML algorithms

📊 Intelligent Progress Tracking

  • Real-time Analytics with interactive charts and performance trends
  • Achievement System with streaks, accuracy milestones, and topic coverage
  • Session Management to organize focused practice sessions
  • Data Export for external analysis

💡 AI-Powered Learning Experience

  • Contextual Hints for challenging questions using GPT
  • Detailed Feedback with AI evaluation of open-ended responses
  • Response Pattern Analysis to identify learning gaps
  • Dynamic Content that evolves with your skill level

Ready to level up your ML engineering interview skills? 🚀

Clone, run, and start practicing with AI-powered questions tailored to your learning needs!

About

This is a fun side experiment on using generative AI to create a tool that can train + prep ML Engineers for interviews

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages