IoT Intrusion Detection System (IDS)

Apache Spark MLlib | BoT-IoT Dataset | Real-time Detection

A production-grade, scalable IoT Intrusion Detection System built with Apache Spark for processing massive volumes of network traffic. It trains ML models on the BoT-IoT dataset from UNSW Canberra Cyber and performs near real-time anomaly detection using reproducible ML pipelines.


🎯 Project Overview

This system implements a complete ML pipeline for detecting IoT-based network attacks using:

  • Apache Spark MLlib for distributed ML training
  • BoT-IoT Dataset with realistic attack patterns (DDoS, DoS, Reconnaissance, Theft)
  • Real-time streaming detection with live alert monitoring
  • Full-stack GUI for complete pipeline control

Key Features

✅ Data Ingestion

  • Generate synthetic BoT-IoT dataset with realistic traffic patterns
  • Support for 72M+ records (configurable)
  • Automatic preprocessing and validation
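
As a rough illustration of what synthetic generation can look like, here is a minimal sketch. It is not the project's `data_ingestion.py`: the function name, the value ranges, and the choice of fields are assumptions made for the example, with field names loosely following BoT-IoT column names.

```python
import random

# Traffic categories named in this README, plus benign traffic.
CATEGORIES = ["Normal", "DDoS", "DoS", "Reconnaissance", "Theft"]


def generate_records(n, seed=42):
    """Return n synthetic flow records (hypothetical schema, arbitrary ranges)."""
    rng = random.Random(seed)  # seeded so the same call reproduces the same data
    records = []
    for _ in range(n):
        category = rng.choice(CATEGORIES)
        flooding = category in ("DDoS", "DoS")
        # Assumption: flooding attacks produce far more packets per flow.
        pkts = rng.randint(500, 5000) if flooding else rng.randint(1, 50)
        records.append({
            "proto": rng.choice(["tcp", "udp", "icmp"]),
            "pkts": pkts,
            "bytes": pkts * rng.randint(60, 1500),
            "dur": round(rng.uniform(0.01, 30.0), 3),
            "attack": 0 if category == "Normal" else 1,
            "category": category,
        })
    return records


sample = generate_records(1000)
```

Seeding the generator keeps runs reproducible, which matters when comparing models trained on different sample sizes.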

✅ ML Model Training

  • Random Forest (F1 Score: >99%)
  • Decision Tree (F1 Score: ~99%)
  • Naive Bayes (Baseline)
  • Chi-square feature selection (Top 5, Top 10, All features)
  • Hyperparameter tuning (Max Depth, Number of Trees)
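
To make the chi-square feature selection step concrete, the sketch below scores categorical features against the label in plain Python and keeps the top k. It is a toy illustration of the statistic, assuming small in-memory lists; the project itself relies on Spark MLlib for this, and the function and column names here are made up for the example.

```python
from collections import Counter


def chi_square_score(feature_values, labels):
    """Pearson chi-square statistic for one categorical feature vs. the label."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    feature_totals = Counter(feature_values)
    label_totals = Counter(labels)
    score = 0.0
    for f, f_count in feature_totals.items():
        for l, l_count in label_totals.items():
            # Count expected in this cell if feature and label were independent.
            expected = f_count * l_count / n
            score += (observed[(f, l)] - expected) ** 2 / expected
    return score


def top_k_features(columns, labels, k):
    """Rank feature names by chi-square score (highest first) and keep k."""
    scores = {name: chi_square_score(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data: 'proto' tracks the label perfectly, 'flag' is independent of it.
labels = ["attack", "attack", "normal", "normal"]
columns = {
    "proto": ["udp", "udp", "tcp", "tcp"],
    "flag": ["a", "b", "a", "b"],
}
best = top_k_features(columns, labels, k=1)
```

A higher score means the feature's distribution differs more across classes, so the "Top 5" and "Top 10" options correspond to keeping the 5 or 10 highest-scoring features.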

✅ Model Evaluation

  • Comprehensive metrics (Accuracy, F1, Precision, Recall)
  • Confusion matrix visualization
  • Model comparison dashboard
  • Training time analysis
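
The four metrics listed above all derive from confusion-matrix counts. The sketch below shows the arithmetic for a simplified binary case (attack vs. normal); the function name and the example counts are illustrative assumptions, and a multi-class evaluation would typically average these per class.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up counts: 90 attacks caught, 10 missed, 10 false alarms, 890 true negatives.
metrics = binary_metrics(tp=90, fp=10, fn=10, tn=890)
```

With these counts accuracy is 0.98 while precision, recall and F1 are each 0.9, which shows why accuracy alone can look flattering when most traffic belongs to one class.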

✅ Real-time Detection

  • Simulated streaming detection service
  • Live threat alerts with severity levels
  • Attack categorization (DDoS, DoS, Reconnaissance, Theft)
  • Real-time statistics and monitoring

✅ Full GUI Dashboard

  • System overview with key metrics
  • Data ingestion and preprocessing interface
  • Model training configuration
  • Evaluation and comparison tools
  • Live monitoring with real-time alerts

🚀 Quick Start

Access the Application

Frontend: http://localhost:3000
Backend API: http://localhost:8001/api

Complete Workflow

  1. Generate Dataset

    • Navigate to Data Ingestion
    • Set sample size (e.g., 10,000)
    • Click "Generate Dataset"
    • View statistics
  2. Train Model

    • Go to Model Training
    • Select algorithm (Random Forest recommended)
    • Choose feature selection
    • Adjust hyperparameters
    • Click "Start Training"
    • Wait for results (~15-20s)
  3. Evaluate Models

    • Visit Evaluation page
    • Compare all trained models
    • View best performing model
    • Analyze metrics
  4. Start Monitoring

    • Go to Live Monitoring
    • Click "Start Detection"
    • View real-time alerts
    • Monitor attack patterns

📊 Dataset: BoT-IoT

Source

UNSW Canberra Cyber - https://research.unsw.edu.au/projects/bot-iot-dataset

Attack Types

  • DDoS - Distributed Denial of Service
  • DoS - Denial of Service
  • Reconnaissance - Network scanning
  • Theft - Data exfiltration

Performance Benchmarks

| Algorithm     | F1 Score | Accuracy | Training Time |
|---------------|----------|----------|---------------|
| Random Forest | 99.7%    | 99.8%    | ~15-20s       |
| Decision Tree | 99.3%    | 99.5%    | ~8-12s        |
| Naive Bayes   | 50.9%    | 51.2%    | ~5-8s         |

🔧 API Endpoints

Data Ingestion

POST /api/data/generate
GET /api/data/stats

Model Training

POST /api/train
GET /api/models

Streaming Detection

POST /api/streaming/start
POST /api/streaming/stop
GET /api/alerts/recent
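
The endpoints above can be scripted without the GUI. The following is a minimal client sketch using only the Python standard library. The endpoint paths and base URL come from this README; the JSON body fields (`sample_size`, `algorithm`) are assumptions, since the request schemas are not documented here, so check `server.py` for the real field names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8001/api"


def build_request(method, path, payload=None):
    """Build (but do not send) a JSON request for one of the API endpoints."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def call(method, path, payload=None, timeout=60):
    """Send the request and decode the JSON response (needs a running backend)."""
    request = build_request(method, path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def run_workflow():
    """Mirror the Complete Workflow steps; body fields are hypothetical."""
    call("POST", "/data/generate", {"sample_size": 10000})
    print(call("GET", "/data/stats"))
    call("POST", "/train", {"algorithm": "random_forest"})
    print(call("GET", "/models"))
    call("POST", "/streaming/start")
    print(call("GET", "/alerts/recent"))
    call("POST", "/streaming/stop")
```

`run_workflow()` is deliberately not called at import time; invoke it once the backend is listening on port 8001.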

📁 Project Structure

/app/
├── backend/               # FastAPI + Spark MLlib
│   ├── server.py
│   ├── spark_engine.py
│   ├── data_ingestion.py
│   ├── model_training.py
│   └── streaming_detection.py
│
├── frontend/              # React Dashboard
│   └── src/
│       ├── pages/
│       └── components/
│
├── data/                  # Datasets & models
└── alerts/                # Detection alerts