# SignSense

SignSense is a machine learning-based hand gesture recognition system that detects and classifies hand gestures (`wave`, `stop`, `thumbs_up`) using MediaPipe for hand landmark detection and a Random Forest classifier for robust prediction. It supports both dataset-based training (from Kaggle) and real-time inference via webcam, making it well suited to human-computer interaction, accessibility tools, and interactive AI applications.
## Table of Contents

- Project Overview
- Features
- Technologies Used
- Installation
- Dataset
- Usage
- Code Structure
- Improving Model Accuracy
- Troubleshooting
- Contributing
- License
- Acknowledgments
## Project Overview

SignSense uses computer vision and machine learning to identify hand gestures from static images and live video. It:
- Extracts 3D hand landmarks with MediaPipe
- Converts them into feature vectors
- Trains a Random Forest classifier
- Performs real-time gesture detection with live visual feedback
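The landmark-to-feature-vector step boils down to flattening MediaPipe's 21 `(x, y, z)` hand landmarks into a 63-dimensional vector. A minimal sketch (the helper name `landmarks_to_features` is illustrative, not necessarily what the script calls it):

```python
import numpy as np

def landmarks_to_features(landmarks):
    """Flatten 21 MediaPipe hand landmarks (each with .x, .y, .z)
    into a single 63-dimensional feature vector."""
    return np.array(
        [[lm.x, lm.y, lm.z] for lm in landmarks], dtype=np.float32
    ).flatten()
```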
The system resolves common issues like mislabelled gesture classes and low model accuracy by applying:
- Class filtering
- Data augmentation
- Feature scaling
It runs seamlessly on:
- Local machines (real-time webcam inference)
- Cloud environments like Kaggle (dataset-based training)
## Features

- Gesture Recognition: detects `wave`, `stop`, `thumbs_up`
- Real-Time Inference: predicts gestures live from the webcam
- Dynamic Dataset Loader: auto-maps folder names like `0`, `1`, `2` or `wave`, `stop`, `thumbs_up`
- Data Augmentation: random rotations, flips, and brightness adjustments
- Feature Scaling: uses `StandardScaler` to normalize landmarks
- Live Confidence Plot: real-time matplotlib graph during webcam predictions
- Error Logging: handles image load errors and missing landmarks
- Kaggle Compatible: fully runnable in Kaggle notebooks (no webcam needed)
## Technologies Used

- Python 3.6+
- OpenCV – webcam & image handling
- MediaPipe – hand landmark extraction
- Scikit-learn – Random Forest classifier & feature scaling
- NumPy – feature manipulation
- Matplotlib – live prediction plotting
- KaggleHub – dataset download
- tqdm – progress bars
## Installation

### Prerequisites

- Python 3.6+
- Webcam (optional, for real-time inference)
- Internet access (to download the Kaggle dataset)
- `kaggle.json` API token if using KaggleHub

### Setup

```bash
git clone https://github.com/your-username/SignSense.git
cd SignSense
pip install opencv-python mediapipe numpy scikit-learn matplotlib kagglehub tqdm
```
### Kaggle API Credentials

1. Get your `kaggle.json` from Kaggle Account Settings.
2. Place it in:
   - Linux/Mac: `~/.kaggle/kaggle.json`
   - Windows: `C:\Users\<Username>\.kaggle\kaggle.json`
3. Set permissions (Linux/Mac):

   ```bash
   chmod 600 ~/.kaggle/kaggle.json
   ```

## Dataset

- Source: Kaggle: `abhishek14398/gesture-recognition-dataset`
- Gestures: `wave`, `stop`, `thumbs_up`
- Format: `.jpg`, `.jpeg`, `.png`
- Structure:
```text
gesture-recognition-dataset/
├── train/
│   ├── 0/ or wave/
│   ├── 1/ or stop/
│   └── 2/ or thumbs_up/
└── val/
    └── ...
```
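The auto-mapping between numeric and named class folders can be pictured like this (a hypothetical sketch; the actual `GESTURE_MAPPING` values live in the script's config):

```python
# Hypothetical mapping from numeric folder names to gesture labels
GESTURE_MAPPING = {"0": "wave", "1": "stop", "2": "thumbs_up"}
GESTURES = {"wave", "stop", "thumbs_up"}

def folder_to_label(folder_name):
    """Map a class folder name ('0' or 'wave') to a gesture label,
    or None for non-gesture folders, which get filtered out."""
    name = folder_name.strip().lower()
    if name in GESTURES:
        return name
    return GESTURE_MAPPING.get(name)
```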
The dataset loader:

- Filters out non-gesture classes
- Extracts 63 features per image (21 landmarks × 3 coordinates)
- Caps each class at 500 images
- Augments each image with flips, rotations, and brightness tweaks
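In rough numpy-only form, one augmentation pass might look like this (an illustration only; the real script uses OpenCV, and rotation is omitted here for brevity):

```python
import numpy as np

rng = np.random.default_rng()

def augment(image):
    """Return a randomly flipped and brightness-shifted copy of an
    (H, W, 3) uint8 image."""
    out = image.copy()
    if rng.random() < 0.5:
        out = np.fliplr(out)                  # horizontal flip
    shift = int(rng.integers(-40, 41))        # brightness offset
    return np.clip(out.astype(np.int16) + shift, 0, 255).astype(np.uint8)
```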
## Usage

### 1. Install Dependencies

```bash
pip install opencv-python mediapipe numpy scikit-learn matplotlib kagglehub tqdm
```

### 2. Download the Dataset

```python
import glob
import os

import kagglehub

KAGGLE_DATASET = "abhishek14398/gesture-recognition-dataset"
dataset_path = kagglehub.dataset_download(KAGGLE_DATASET)

# Inspect the downloaded directory tree
for root, dirs, files in os.walk(dataset_path):
    print(f"{root} -> {dirs}, Files: {len(files)}")

# Collect all image paths
image_paths = glob.glob(os.path.join(dataset_path, "**/*.*"), recursive=True)
print(f"Found {len(image_paths)} images.")
```

### 3. Train and Run

The main script:

- Trains on the dataset
- Opens webcam
- Live prediction + matplotlib plot
```bash
python gesture_recognition.py
```

## Code Structure

```text
gesture_recognition.py
├── Config
├── Feature Extraction
├── Data Augmentation
├── Dataset Loader
├── Training + Inference
└── Live Visualization
```

## Improving Model Accuracy

The current setup uses:

- Feature scaling (StandardScaler)
- A 100-tree Random Forest
- 3× augmentation (flip, rotate, brightness)
- A minimum detection confidence of 0.3
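Sketched with scikit-learn (where `X` is the `(n_samples, 63)` feature matrix and `y` the labels; a minimal illustration rather than the script verbatim):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Scaling + classification in one object, so inference applies
# the same normalization that training saw.
model = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=100, random_state=42),
)
# model.fit(X, y); model.predict(X_new)
```

At inference time, `model.predict_proba` can supply the per-class confidences that the live plot displays.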
To push accuracy further, try:
- More augmentations
- MLPClassifier
- K-Fold validation
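For instance, K-Fold validation could be wired up like this (a sketch assuming a feature matrix `X` and labels `y`; `evaluate` is a hypothetical helper, not part of the script):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def evaluate(X, y, folds=5):
    """Mean/std of cross-validated accuracy for the scaler + forest pipeline."""
    pipe = make_pipeline(
        StandardScaler(),
        RandomForestClassifier(n_estimators=100, random_state=42),
    )
    scores = cross_val_score(pipe, X, y, cv=folds)
    return scores.mean(), scores.std()
```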
## Troubleshooting

If predictions are wrong or missing, check `dataset_path` and the folder mappings (`GESTURE_MAPPING`).

Then try:

- Verifying landmark detection on a sample image:

  ```python
  img = cv2.imread("image.jpg")
  result = hands.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
  print("Landmarks:", bool(result.multi_hand_landmarks))
  ```

- Increasing `AUGMENTATION_FACTOR`
- Using a different classifier
If the webcam will not open, test it directly:

```python
import cv2

cap = cv2.VideoCapture(0)
print("Webcam opened:", cap.isOpened())
```

## Contributing

- Fork the repo
- Create a feature branch
- Commit & push
- Open a pull request
Please follow PEP8 and document your changes!
## License

MIT License. See the LICENSE file.
## Acknowledgments

- MediaPipe for real-time hand tracking
- Kaggle for the dataset
- Scikit-learn for ML tools
- OpenCV for video & image I/O