
0khacha/FossilNet


FossilNet — Deep Learning Fossil Classification

Python 3.10+ PyTorch FastAPI License: MIT

A computer-vision system that classifies fossil images into 6 taxonomic categories and returns rich paleontological metadata — built end-to-end following the CRISP-DM methodology.


Overview

Identifying fossils requires specialized knowledge that takes years to develop. FossilNet automates the process: upload a photo of a fossil and instantly receive its name, geological period, taxonomic classification, and a list of visually similar taxa.

The model uses Transfer Learning with an EfficientNet-B0 backbone, fine-tuned on real-world images from the Geo Fossils-I dataset.

Supported Categories

#  Class               Period
1  Ammonite            Devonian – Cretaceous
2  Trilobite           Cambrian – Permian
3  Crinoid (Sea Lily)  Ordovician – Present
4  Coral               Ordovician – Present
5  Belemnite           Late Triassic – Late Cretaceous
6  Leaf Fossil         Silurian – Present

Project Structure

FossilNet/
│
├── data/                          # Not tracked by Git
│   ├── raw/                       # Original downloaded images
│   ├── train/                     # 70 % training split
│   ├── val/                       # 15 % validation split
│   └── test/                      # 15 % test split
│
├── notebooks/                     # Jupyter notebooks
│   ├── 1_eda.ipynb                # Exploratory Data Analysis
│   └── 2_evaluation.ipynb         # Metrics & Confusion Matrix
│
├── src/                           # Core ML pipelines
│   ├── download_data.py           # Wikimedia Commons image scraper
│   ├── clean_data.py              # Corrupt-image removal utility
│   ├── data_prep.py               # Distribution plotting (EDA)
│   ├── dataset.py                 # PyTorch DataLoader + augmentations
│   ├── model.py                   # EfficientNet-B0 architecture
│   ├── train.py                   # Training loop with LR scheduling
│   └── predict.py                 # CLI single-image inference
│
├── api/                           # Deployment
│   ├── app.py                     # FastAPI backend
│   └── fossil_metadata.json       # Paleontological metadata DB
│
├── models/                        # Weights not tracked by Git
│   ├── fossil_efficientnet.pth    # Best model checkpoint
│   ├── class_names.json           # Ordered class list
│   └── training_history.json      # Per-epoch metrics
│
├── .gitignore
├── requirements.txt
└── README.md
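The tree lists src/clean_data.py as a corrupt-image removal utility, but its internals are not shown here. As a minimal, dependency-free sketch (an assumption, not the project's code), a cheap first pass is to delete files whose leading bytes match no known image signature. This catches error pages saved with a .jpg name, but not images that are damaged mid-stream; a thorough check would actually decode each file, for example with Pillow's Image.verify(). The names sniff_format and remove_non_images are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Magic-byte prefixes for common image formats (assumed to cover the scraped files).
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_format(path: Path) -> str | None:
    """Return the detected format name, or None if no known signature matches."""
    with path.open("rb") as fh:
        head = fh.read(8)  # the longest signature above is 8 bytes
    for magic, fmt in SIGNATURES.items():
        if head.startswith(magic):
            return fmt
    return None


def remove_non_images(folder: Path) -> list[Path]:
    """Delete files under `folder` whose bytes do not look like an image; return them."""
    removed = []
    for path in sorted(folder.rglob("*")):
        if path.is_file() and sniff_format(path) is None:
            path.unlink()
            removed.append(path)
    return removed
```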

Quick Start

1 · Clone & Install

git clone https://github.com/0khacha/FossilNet.git
cd FossilNet
pip install -r requirements.txt

2 · Download the Dataset

The automated scraper pulls real fossil images from Wikimedia Commons, validates every download, and splits the result into train / val / test sets (70 / 15 / 15):

python src/download_data.py
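A per-class (stratified) split keeps every class at roughly the same 70 / 15 / 15 ratio across train, val, and test. The sketch below is a hypothetical illustration, not the code in src/download_data.py: split_files and stratified_split are made-up names, the seed is arbitrary, and the ratios are given in whole percent so the cut points are exact integers.

```python
import random


def split_files(files, percents=(70, 15, 15), seed=42):
    """Shuffle one class's files reproducibly and cut them into train/val/test lists."""
    if sum(percents) != 100:
        raise ValueError("percents must sum to 100")
    shuffled = sorted(files)              # fixed starting order -> reproducible shuffle
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * percents[0] // 100      # integer arithmetic, no float rounding surprises
    n_val = n * percents[1] // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]     # rounding remainder lands in test
    return train, val, test


def stratified_split(files_by_class, **kwargs):
    """Split each class separately so every split keeps the class proportions."""
    splits = {"train": {}, "val": {}, "test": {}}
    for cls, files in files_by_class.items():
        train, val, test = split_files(files, **kwargs)
        splits["train"][cls] = train
        splits["val"][cls] = val
        splits["test"][cls] = test
    return splits
```

Sorting before the seeded shuffle makes the split independent of the order in which files were listed on disk, and sending the rounding remainder to test means no image is silently dropped.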

3 · Explore the Data

jupyter notebook notebooks/1_eda.ipynb
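Class balance is one of the first things the EDA notebook examines. A stdlib-only sketch of the counting step, assuming (hypothetically) one sub-folder per class under each split, as in torchvision's ImageFolder convention; the function names are illustrative:

```python
from collections import Counter
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def class_counts(split_dir: Path) -> Counter:
    """Count image files per class, assuming one sub-folder per class."""
    counts = Counter()
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def imbalance_ratio(counts: Counter) -> float:
    """Largest class count divided by smallest; 1.0 means perfectly balanced."""
    smallest = min(counts.values())
    if smallest == 0:
        return float("inf")  # at least one class has no images at all
    return max(counts.values()) / smallest
```

For example, class_counts(Path("data/train")) would give the per-class totals to feed into a bar plot.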

4 · Train the Model

python src/train.py

You will see per-epoch output like:

 Epoch |  Train Loss |  Val Loss |  Val Acc |  Time
───────────────────────────────────────────────────────
     1 |      1.8432 |    1.2105 |  42.86% |  12.3s [saved]
     2 |      1.1204 |    0.8912 |  61.43% |  11.8s [saved]
    ...
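The [saved] marker suggests a keep-the-best rule: a checkpoint is written only when validation performance improves, which fits models/ holding a single best checkpoint. The README does not say which metric decides this; the sketch assumes validation accuracy by default, and the record keys (epoch, val_acc, val_loss) are hypothetical rather than the actual schema of training_history.json.

```python
def saved_epochs(history, metric="val_acc", higher_is_better=True):
    """Return the epochs at which a keep-the-best rule would save a checkpoint.

    `history` is a list of per-epoch dicts with hypothetical key names.
    """
    best = None
    saved = []
    for record in history:
        value = record[metric]
        # Strict improvement only: ties with the current best are not saved.
        improved = best is None or (value > best if higher_is_better else value < best)
        if improved:
            best = value
            saved.append(record["epoch"])
    return saved
```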

5 · Evaluate

jupyter notebook notebooks/2_evaluation.ipynb
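According to the Tech Stack, the evaluation relies on scikit-learn for the confusion matrix and the P / R / F1 report. To make those numbers concrete, here is a dependency-free sketch of the same definitions (illustrative only, not the notebook's code): precision divides a class's diagonal entry by its column sum, recall divides it by its row sum, and F1 is their harmonic mean.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i][j] counts samples whose true class is i and predicted class is j."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_prf(cm):
    """Precision, recall and F1 for each class, read off the confusion matrix."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum: predicted as k
        actual_k = sum(cm[k])                           # row sum: truly k
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append((precision, recall, f1))
    return scores
```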

6 · Quick CLI Prediction

python src/predict.py path/to/fossil.jpg
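The API response reports a confidence_score between 0 and 1. Assuming (the README does not say so) that it is the softmax probability of the top class, converting the model's raw logits looks like this. The class order in CLASS_NAMES is hypothetical; the real order comes from models/class_names.json.

```python
import math

# Hypothetical class order; the project stores the real one in models/class_names.json.
CLASS_NAMES = ["Ammonite", "Trilobite", "Crinoid", "Coral", "Belemnite", "Leaf Fossil"]


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits, class_names):
    """Return (class_name, probability rounded to 4 places) for the best class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], round(probs[best], 4)
```

Subtracting the maximum logit does not change the result but keeps math.exp from overflowing on large logits.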

7 · Deploy as API

uvicorn api.app:app --host 0.0.0.0 --port 8000

Then visit http://localhost:8000/docs to try the Swagger UI.

Example response:

{
  "fossil_name": "Ammonite",
  "geological_period": "Devonian – Cretaceous (419 – 66 Ma)",
  "classification": "Mollusca → Cephalopoda → Ammonoidea",
  "description": "Extinct marine cephalopods with tightly coiled, chambered shells.",
  "confidence_score": 0.9412,
  "similar_fossils": ["Nautiloid", "Goniatite"]
}
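One way to assemble this response is to merge the predicted class and its confidence with the matching entry in api/fossil_metadata.json. The sketch assumes a hypothetical metadata schema keyed by class name; build_response and the field layout are illustrative, not taken from api/app.py.

```python
import json

# Hypothetical shape of api/fossil_metadata.json (one entry shown); the real schema may differ.
METADATA_JSON = """
{
  "Ammonite": {
    "geological_period": "Devonian – Cretaceous (419 – 66 Ma)",
    "classification": "Mollusca → Cephalopoda → Ammonoidea",
    "description": "Extinct marine cephalopods with tightly coiled, chambered shells.",
    "similar_fossils": ["Nautiloid", "Goniatite"]
  }
}
"""


def build_response(class_name, confidence, metadata):
    """Merge a prediction with its metadata entry, falling back to placeholders."""
    entry = metadata.get(class_name, {})
    return {
        "fossil_name": class_name,
        "geological_period": entry.get("geological_period", "Unknown"),
        "classification": entry.get("classification", "Unknown"),
        "description": entry.get("description", ""),
        "confidence_score": round(float(confidence), 4),
        "similar_fossils": list(entry.get("similar_fossils", [])),
    }


metadata = json.loads(METADATA_JSON)
```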

Methodology (CRISP-DM)

Phase                    Implementation
Business Understanding   Automate fossil identification for students, researchers, and museums
Data Understanding       EDA notebook with class distributions and sample visualizations
Data Preparation         Automated scraping, validation, augmentation (rotation, crop, jitter)
Modeling                 EfficientNet-B0 transfer learning, AdamW optimizer, LR scheduling
Evaluation               Confusion matrix, classification report (P / R / F1), training curves
Deployment               FastAPI REST API + CLI tool
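The README does not name the learning-rate scheduler paired with AdamW. As one common choice (an assumption here), cosine annealing decays the rate smoothly from base_lr down to min_lr; the closed form below matches the one documented for PyTorch's CosineAnnealingLR without restarts. The base_lr and min_lr defaults are hypothetical.

```python
import math


def cosine_lr(epoch, total_epochs, base_lr=1e-3, min_lr=1e-6):
    """Cosine-annealed learning rate for a 0-indexed epoch.

    lr = min_lr + (base_lr - min_lr) * (1 + cos(pi * epoch / total_epochs)) / 2
    """
    progress = epoch / max(1, total_epochs)
    return min_lr + (base_lr - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The rate starts at base_lr, passes the midpoint (base_lr + min_lr) / 2 halfway through, and reaches min_lr at the final epoch.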

Tech Stack

  • Python 3.10+
  • PyTorch + torchvision
  • EfficientNet-B0 (transfer learning)
  • FastAPI (REST API)
  • scikit-learn (metrics)
  • Pillow (image processing)
  • seaborn / matplotlib (visualization)

Potential Improvements

  • More classes — Add dinosaur teeth, plant fossils, microfossils
  • Few-shot learning — Siamese / Prototypical networks for rare specimens
  • Multimodal — Combine image + GPS coordinates + rock type
  • Similarity search — Embedding-based nearest-neighbor via FAISS
  • Mobile — Export to ONNX / TFLite for on-device inference
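The similarity-search idea amounts to nearest-neighbour lookup over image embeddings. Below is a brute-force, pure-Python stand-in for what a FAISS index would do at scale: cosine similarity computed as dot products of unit-normalized vectors. The gallery contents, the embedding source (for example pooled backbone features), and the function names are all hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def nearest_neighbors(query, gallery, k=3):
    """Brute-force top-k by cosine similarity.

    `gallery` maps a specimen id to its embedding vector.
    """
    q = normalize(query)
    scored = []
    for name, emb in gallery.items():
        e = normalize(emb)
        scored.append((sum(a * b for a, b in zip(q, e)), name))
    scored.sort(reverse=True)  # highest similarity first
    return [(name, round(score, 4)) for score, name in scored[:k]]
```

Brute force scans every gallery item per query, which is simple and exact; an approximate FAISS index trades a little accuracy for speed once the gallery grows large.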

License

This project is provided for educational purposes. Images are sourced from Wikimedia Commons under their respective licenses.


Built by @0khacha

About

A deep learning computer vision system for fossil image classification built with PyTorch, EfficientNet, and FastAPI
