A computer-vision system that classifies fossil images into six taxonomic categories and returns paleontological metadata for each prediction. Built end to end following the CRISP-DM methodology.
Identifying fossils requires specialized knowledge that takes years to develop. FossilNet automates the process: upload a photo of a fossil and instantly receive its name, geological period, taxonomic classification, and a list of visually similar taxa.
The model uses Transfer Learning with an EfficientNet-B0 backbone, fine-tuned on real-world images from the Geo Fossils-I dataset.
| # | Class | Period |
|---|---|---|
| 1 | Ammonite | Devonian – Cretaceous |
| 2 | Trilobite | Cambrian – Permian |
| 3 | Crinoid (Sea Lily) | Ordovician – Present |
| 4 | Coral | Ordovician – Present |
| 5 | Belemnite | Late Triassic – Late Cretaceous |
| 6 | Leaf Fossil | Silurian – Present |
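As a rough illustration of how a classifier's raw scores could be turned into a name, period, and confidence, here is a minimal sketch using only the standard library. The class order is assumed to follow the table above (the real order is whatever `models/class_names.json` stores), and the logits are made-up values, not model output.

```python
import math

# Class order and periods copied from the table above.
# Assumption: the actual order is defined by models/class_names.json and may differ.
CLASSES = [
    ("Ammonite", "Devonian – Cretaceous"),
    ("Trilobite", "Cambrian – Permian"),
    ("Crinoid (Sea Lily)", "Ordovician – Present"),
    ("Coral", "Ordovician – Present"),
    ("Belemnite", "Late Triassic – Late Cretaceous"),
    ("Leaf Fossil", "Silurian – Present"),
]


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe(logits):
    """Return the most likely class, its period, and the softmax confidence."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    name, period = CLASSES[best]
    return {
        "fossil_name": name,
        "geological_period": period,
        "confidence_score": round(probs[best], 4),
    }


# Hypothetical logits for a single image (one score per class).
result = describe([0.2, 3.1, 0.4, -1.0, 0.0, 0.5])
print(result["fossil_name"], result["confidence_score"])
```

The field names mirror those in the API's example response; the taxonomy, description, and similar-taxa fields would come from `api/fossil_metadata.json`, whose schema is not shown here.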
FossilNet/
│
├── data/                          # Not tracked by Git
│   ├── raw/                       # Original downloaded images
│   ├── train/                     # 70% training split
│   ├── val/                       # 15% validation split
│   └── test/                      # 15% test split
│
├── notebooks/                     # Jupyter notebooks
│   ├── 1_eda.ipynb                # Exploratory Data Analysis
│   └── 2_evaluation.ipynb         # Metrics & Confusion Matrix
│
├── src/                           # Core ML pipelines
│   ├── download_data.py           # Wikimedia Commons image scraper
│   ├── clean_data.py              # Corrupt-image removal utility
│   ├── data_prep.py               # Distribution plotting (EDA)
│   ├── dataset.py                 # PyTorch DataLoader + augmentations
│   ├── model.py                   # EfficientNet-B0 architecture
│   ├── train.py                   # Training loop with LR scheduling
│   └── predict.py                 # CLI single-image inference
│
├── api/                           # Deployment
│   ├── app.py                     # FastAPI backend
│   └── fossil_metadata.json       # Paleontological metadata DB
│
├── models/                        # Weights not tracked by Git
│   ├── fossil_efficientnet.pth    # Best model checkpoint
│   ├── class_names.json           # Ordered class list
│   └── training_history.json      # Per-epoch metrics
│
├── .gitignore
├── requirements.txt
└── README.md
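The 70 / 15 / 15 split noted in the `data/` tree can be sketched as follows. This is an illustrative, standard-library version and not necessarily how `src/download_data.py` does it; the function name, the seed, and the file names are all hypothetical.

```python
import random


def split_dataset(items, train=0.70, val=0.15, seed=42):
    """Shuffle items and cut them into train / val / test; test gets the remainder."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


# Hypothetical file names standing in for one class's downloaded images.
files = [f"ammonite_{i:03d}.jpg" for i in range(100)]
train_set, val_set, test_set = split_dataset(files)
print(len(train_set), len(val_set), len(test_set))
```

Running the split once per class, rather than over the pooled images, would keep the class proportions similar across the three folders.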
git clone https://github.com/0khacha/FossilNet.git
cd FossilNet
pip install -r requirements.txt

The automated scraper pulls real fossil images from Wikimedia Commons, validates every download, and splits them into train / val / test:

python src/download_data.py

Explore the dataset in the EDA notebook:

jupyter notebook notebooks/1_eda.ipynb

Train the model:

python src/train.py

You will see per-epoch output like:
Epoch | Train Loss | Val Loss | Val Acc | Time
───────────────────────────────────────────────────────
1 | 1.8432 | 1.2105 | 42.86% | 12.3s [saved]
2 | 1.1204 | 0.8912 | 61.43% | 11.8s [saved]
...
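The `[saved]` marker in the output above suggests that a checkpoint is written whenever validation improves. A minimal sketch of that bookkeeping follows, assuming the criterion is a lower validation loss (it could equally be validation accuracy). The function name is hypothetical, and the third epoch is invented to show a row that would not be saved.

```python
def format_epochs(history):
    """Format one log line per epoch, flagging epochs that would be checkpointed."""
    best_val_loss = float("inf")
    lines = []
    for epoch, (train_loss, val_loss, val_acc, seconds) in enumerate(history, start=1):
        improved = val_loss < best_val_loss
        if improved:
            # A real training loop would write the checkpoint here, e.g. with torch.save.
            best_val_loss = val_loss
        flag = " [saved]" if improved else ""
        lines.append(
            f"{epoch:>5} | {train_loss:>10.4f} | {val_loss:>8.4f} | "
            f"{val_acc:>7.2%} | {seconds:.1f}s{flag}"
        )
    return lines


history = [
    (1.8432, 1.2105, 0.4286, 12.3),  # epoch 1 from the sample output
    (1.1204, 0.8912, 0.6143, 11.8),  # epoch 2 from the sample output
    (0.9000, 0.9500, 0.6000, 11.9),  # invented epoch with a worse validation loss
]
print("\n".join(format_epochs(history)))
```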
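Evaluate the trained model in the evaluation notebook:

jupyter notebook notebooks/2_evaluation.ipynb

Classify a single image from the command line:

python src/predict.py path/to/fossil.jpg

Start the API server:

uvicorn api.app:app --host 0.0.0.0 --port 8000

Then visit http://localhost:8000/docs to try the Swagger UI.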
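The API can also be called from a script. The sketch below uses only the standard library and rests on two assumptions that this README does not confirm: that the upload route is `POST /predict`, and that the image is sent as a multipart form field named `file`. Check the Swagger UI for the actual route and field name.

```python
import json
import urllib.request
import uuid


def build_multipart(field, filename, data, content_type="image/jpeg"):
    """Encode one file as multipart/form-data; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def classify(image_path, url="http://localhost:8000/predict"):
    """Upload an image and return the decoded JSON reply.

    Assumption: the '/predict' route and the 'file' field name are hypothetical.
    """
    with open(image_path, "rb") as fh:
        body, content_type = build_multipart("file", "fossil.jpg", fh.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, `classify("path/to/fossil.jpg")` would return the parsed JSON as a Python dictionary.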
Example response:
{
  "fossil_name": "Ammonite",
  "geological_period": "Devonian – Cretaceous (419 – 66 Ma)",
  "classification": "Mollusca → Cephalopoda → Ammonoidea",
  "description": "Extinct marine cephalopods with tightly coiled, chambered shells.",
  "confidence_score": 0.9412,
  "similar_fossils": ["Nautiloid", "Goniatite"]
}

| Phase | Implementation |
|---|---|
| Business Understanding | Solve fossil identification for students, researchers, and museums |
| Data Understanding | EDA notebook with class distributions and sample visualization |
| Data Preparation | Automated scraping, validation, augmentation (rotation, crop, jitter) |
| Modeling | EfficientNet-B0 transfer learning, AdamW, LR scheduling |
| Evaluation | Confusion matrix, classification report (P / R / F1), training curves |
| Deployment | FastAPI REST API + CLI tool |
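To make the Evaluation row concrete, the sketch below derives per-class precision, recall, and F1 from a confusion matrix using plain Python. The evaluation notebook most likely relies on scikit-learn for this, so treat the code as an explanation of the arithmetic only. The three-class matrix holds made-up counts, not project results.

```python
def per_class_metrics(matrix, labels):
    """Compute (precision, recall, F1) per class.

    matrix[i][j] is the number of images of true class i predicted as class j.
    """
    report = {}
    for k, label in enumerate(labels):
        true_pos = matrix[k][k]
        predicted = sum(row[k] for row in matrix)  # column sum: all predicted as k
        actual = sum(matrix[k])                    # row sum: all truly of class k
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = (precision, recall, f1)
    return report


labels = ["Ammonite", "Trilobite", "Coral"]
matrix = [  # illustrative counts only
    [9, 1, 0],
    [2, 7, 1],
    [0, 2, 8],
]
for label, (p, r, f1) in per_class_metrics(matrix, labels).items():
    print(f"{label:<10} P={p:.2f} R={r:.2f} F1={f1:.2f}")
```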
- Python 3.10+
- PyTorch + torchvision
- EfficientNet-B0 (transfer learning)
- FastAPI (REST API)
- scikit-learn (metrics)
- Pillow (image processing)
- seaborn / matplotlib (visualization)
- More classes — Add dinosaur teeth, plant fossils, microfossils
- Few-shot learning — Siamese / Prototypical networks for rare specimens
- Multimodal — Combine image + GPS coordinates + rock type
- Similarity search — Embedding-based nearest-neighbor via FAISS
- Mobile — Export to ONNX / TFLite for on-device inference
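The similarity-search idea in the list above boils down to comparing embedding vectors. Below is a brute-force cosine-similarity sketch in plain Python; it is a stand-in to show the principle, not FAISS, and the three-dimensional vectors are toy values rather than real model embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query, index, k=2):
    """Return the k names whose stored vectors are closest to the query."""
    ranked = sorted(index, key=lambda name: cosine(query, index[name]), reverse=True)
    return ranked[:k]


# Toy embeddings; a real index would store the backbone's feature vectors.
index = {
    "Ammonite": [0.9, 0.1, 0.0],
    "Nautiloid": [0.8, 0.2, 0.1],
    "Trilobite": [0.0, 0.9, 0.4],
}
print(most_similar([1.0, 0.0, 0.0], index))
```

A linear scan like this is fine for a few thousand vectors; an approximate index becomes worthwhile only once the collection is much larger.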
This project is provided for educational purposes. Images are sourced from Wikimedia Commons under their respective licenses.
Built by @0khacha