iwstech3/smart_campus_guide-Backend

# Smart Campus Guide - Embedding Generation System

> Converting campus location data into semantic embeddings for intelligent navigation

## 📖 Overview

This system transforms raw campus location data (buildings, classrooms, landmarks) into vector embeddings that enable semantic search. Users can ask "Where's the computer lab near the library?" and get accurate results based on meaning, not just keywords.

**Current Phase:** Embedding Generation  
**Next Phase:** Vector Database Integration (Qdrant) + Navigation API

---

## 🎯 What This Does

1. **Reads** campus location data from JSON
2. **Generates** AI-powered embeddings using sentence transformers
3. **Saves** enriched data with embeddings for semantic search
4. **Enables** natural language queries like "class near cafeteria"

---

## 📁 Project Structure

```
project-folder/
│
├── venv/                          # Virtual environment (not in git)
│
├── data/                          # Location data storage
│   ├── campus_locations.json                  # Your raw data (INPUT)
│   └── campus_locations_with_embeddings.json  # Generated output
│
├── services/                      # Core processing scripts
│   └── embedding_service.py       # Embedding generation engine
│
├── tests/                         # Validation scripts
│   └── test_embedding.py          # Tests embedding quality
│
├── requirements.txt               # Python dependencies
├── .gitignore                     # Git ignore rules
└── README.md                      # This file
```

---

## 🚀 Quick Start

### 1. Setup Environment

```bash
# Activate virtual environment
# Windows:
venv\Scripts\activate
# Mac/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

**Already installed?** Skip this step!

### 2. Prepare Your Data

Create or update `data/campus_locations.json`:

```json
[
  {
    "id": "loc_001",
    "name": "Computer Science Building",
    "latitude": 5.9631,
    "longitude": 10.1591,
    "type": "building",
    "description": "Main CS building near the library and cafeteria"
  },
  {
    "id": "loc_002",
    "name": "CS-201",
    "latitude": 5.9632,
    "longitude": 10.1592,
    "type": "class",
    "description": "Computer networks lecture hall, second floor CS building, next to library"
  }
]
```

**Required Fields:**
- `id` - Unique identifier (e.g., "loc_001")
- `name` - Location name
- `latitude` - GPS latitude coordinate
- `longitude` - GPS longitude coordinate  
- `type` - Category (class, building, landmark, office, etc.)
- `description` - Detailed description with nearby landmarks
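
A quick check of the required fields before generating embeddings can catch incomplete entries early. The following is a minimal sketch, not part of the repository: the `find_problems` helper and the coordinate range checks are assumptions added for illustration.

```python
# Field names come from the "Required Fields" list above.
REQUIRED_FIELDS = ("id", "name", "latitude", "longitude", "type", "description")


def find_problems(location):
    """Return a list of human-readable problems for one location entry.

    Hypothetical helper: the real service may validate differently.
    """
    problems = [f"missing field: {field}" for field in REQUIRED_FIELDS
                if field not in location]
    lat = location.get("latitude")
    lon = location.get("longitude")
    # Valid GPS latitudes lie in [-90, 90], longitudes in [-180, 180].
    if isinstance(lat, (int, float)) and not -90 <= lat <= 90:
        problems.append("latitude out of range")
    if isinstance(lon, (int, float)) and not -180 <= lon <= 180:
        problems.append("longitude out of range")
    return problems
```

Running `find_problems` over every entry in `data/campus_locations.json` and printing any non-empty result is one way to use it.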

**Pro Tip:** Mention nearby landmarks in descriptions for better semantic search!

### 3. Generate Embeddings

```bash
python services/embedding_service.py
```

**Output:**
```
Loading model: all-MiniLM-L6-v2...
Model loaded successfully!
Loaded 15 locations from data/campus_locations.json
Generating embeddings for 15 locations...
Batches: 100%|████████| 1/1 [00:00<00:00]
Embeddings generated successfully!
Saved 15 locations with embeddings to data/campus_locations_with_embeddings.json

✅ Processing complete!
```

**First run:** Downloads the ~80MB model, which is then cached for later runs.

### 4. Validate Results

```bash
python tests/test_embedding.py
```

Tests check:
- ✅ Embeddings generated correctly
- ✅ All required fields present
- ✅ Semantic search functionality works

---

## 📊 Data Collection Guide

### Best Practices

1. **Be Descriptive**
   ```json
   "description": "Large lecture hall, third floor Science building, near main staircase and water fountain"
   ```

2. **Include Nearby Landmarks**
   ```json
   "description": "CS lab next to library, across from cafeteria, ground floor"
   ```

3. **Mention Floor Levels**
   ```json
   "description": "Professor's office, second floor Admin building, room 205"
   ```

4. **Add Navigation Hints**
   ```json
   "description": "Classroom at end of hallway, past the vending machines, left side"
   ```

### Location Types

- `class` - Classrooms, lecture halls
- `building` - Main buildings
- `landmark` - Cafeteria, library, statues, fountains
- `office` - Faculty offices, admin offices
- `facility` - Restrooms, parking, gates
- `outdoor` - Courtyards, sports fields, gardens

---

## 🔧 How It Works

### Embedding Generation

The system creates semantic embeddings by combining:
```python
embedding_text = f"{name}. {description}. Type: {type}"
```

**Example:**
```
Input: "CS-201. Computer networks class, second floor, near library. Type: class"
Output: [0.123, -0.456, 0.789, ...] (384-dimensional vector)
```
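The read, embed, and save steps could be sketched roughly as follows. This is an illustration under assumptions, not the actual `embedding_service.py`: the function names, the `"embedding"` output key, and the injectable `encode` callable are all hypothetical. Passing the encoder in as a parameter keeps the sketch runnable without downloading the model.

```python
import json


def embedding_text(loc):
    # Same template as shown above: name, description, then type.
    return f"{loc['name']}. {loc['description']}. Type: {loc['type']}"


def enrich_locations(locations, encode):
    """Attach an embedding to each location.

    `encode` is any callable mapping a list of strings to a sequence of
    vectors. The "embedding" key name is an assumption.
    """
    texts = [embedding_text(loc) for loc in locations]
    vectors = encode(texts)
    return [{**loc, "embedding": [float(x) for x in vec]}
            for loc, vec in zip(locations, vectors)]


def run(input_path, output_path, encode):
    """Read locations from JSON, embed them, and write the enriched JSON."""
    with open(input_path, encoding="utf-8") as f:
        locations = json.load(f)
    enriched = enrich_locations(locations, encode)
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(enriched, f, indent=2)
    return len(enriched)


# With sentence-transformers installed, the real model could be plugged in:
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   run("data/campus_locations.json",
#       "data/campus_locations_with_embeddings.json", model.encode)
```

Converting each value with `float` makes the vectors JSON-serializable even when the encoder returns NumPy arrays.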

### Why This Enables Smart Search

Traditional keyword search:
- Query: "computer lab" → Only finds exact matches

Semantic embedding search:
- Query: "computer lab" → Finds: CS lab, programming room, coding space, etc.
- Query: "near library" → Finds locations that mention library in descriptions
- Query: "second floor classroom" → Understands floor + type relationships
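
Under the hood, semantic search typically compares the query's embedding with each stored embedding and ranks by similarity. A minimal sketch using cosine similarity is shown below; the `rank_locations` helper and the `"embedding"` key are assumptions for illustration, and the project may use a different metric or library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat zero vectors as having no similarity.
    return dot / (norm_a * norm_b)


def rank_locations(query_vector, locations, top_k=3):
    """Return (score, name) pairs for the top_k most similar locations."""
    scored = [(cosine_similarity(query_vector, loc["embedding"]), loc["name"])
              for loc in locations]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

In practice the query vector would come from encoding the user's question with the same model used for the locations.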

---

## 📈 What's Next?

### Phase 2: Vector Database (Qdrant)
- Store embeddings in Qdrant for fast similarity search
- Enable real-time queries with sub-second response

### Phase 3: Navigation Integration
- **Google Maps API:** Get routes between user location and destination
- **Gemini API:** Generate human-like walking directions
- **Final Output:** "Walk past the library, turn right at the cafeteria. CS-201 is on the second floor, near the main staircase."

---

## 🛠️ Technical Details

**Embedding Model:** `all-MiniLM-L6-v2`
- Dimension: 384
- Speed: Fast; a small model that runs comfortably on CPU
- Quality: Good general-purpose accuracy for semantic search over short texts

**Dependencies:**
- `sentence-transformers` - Embedding generation
- `numpy` - Numerical operations
- `torch` - Deep learning backend

**Performance (approximate; varies with hardware and network):**
- 100 locations: ~5 seconds
- 1000 locations: ~30 seconds
- Model download (first time): ~30 seconds

---

## 🐛 Troubleshooting

**Error: File not found**
```
Solution: Create data/campus_locations.json with your location data
```

**Error: JSON format invalid**
```
Solution: Validate JSON at https://jsonlint.com
Check: Proper commas, brackets, quotes
```

**Model download fails**
```
Solution: Check internet connection
The model downloads from Hugging Face (~80MB)
```

**Memory issues**
```
Solution: Process in smaller batches or use a lighter model
```
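Processing in smaller batches might look like the sketch below. The helper names are hypothetical, and `encode` stands for any callable that turns a list of strings into vectors. The `encode` method in `sentence-transformers` also accepts a `batch_size` argument, which may be enough on its own.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def encode_in_batches(texts, encode, batch_size=32):
    """Encode texts chunk by chunk so only one chunk is processed at a time."""
    vectors = []
    for chunk in batched(texts, batch_size):
        vectors.extend(encode(chunk))
    return vectors
```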

---

## 📝 Example Use Cases

### Search Query Examples

After generating embeddings, these queries will work:

1. **"computer science classroom"** → Finds CS-201, CS-Lab, etc.
2. **"near library"** → Finds all locations mentioning library
3. **"second floor"** → Finds locations on second floor
4. **"professor office"** → Finds faculty offices
5. **"place to eat"** → Finds cafeteria, food court, etc.

---

## 👥 Contributing

### Adding New Locations

1. Add entry to `data/campus_locations.json`
2. Run `python services/embedding_service.py`
3. Test with `python tests/test_embedding.py`

### Data Format

```json
{
  "id": "unique_id_here",
  "name": "Location Name",
  "latitude": 0.000000,
  "longitude": 0.000000,
  "type": "class|building|landmark|office|facility|outdoor",
  "description": "Detailed description with nearby landmarks"
}
```

---

## 📄 License

Smart Campus Guide - Educational Project

---

## 🎓 Project Info

**Goal:** Provide human-like navigation directions to help students and visitors find exact locations on campus

**Technology Stack:**
- Python 3.x
- Sentence Transformers (AI embeddings)
- Future: Qdrant (vector DB), Google Maps API, Gemini API

**Status:** ✅ Phase 1 Complete (Embedding Generation)

---

**Questions?** Run tests or validate your JSON format if something isn't working.






# 🎓 Smart Campus Guide - Backend API

An intelligent campus navigation and topic suggestion system for the University of Bamenda.

---

## ✨ Features

### 🗺️ Navigation System
- **Natural Language Search**: "Where is the library?"
- **Human-Like Directions**: Directions read like a friendly student guiding you, not robotic GPS instructions
- **On-Campus & Off-Campus**: Finds locations both on and off campus
- **Offline Cache**: Works with poor connectivity
- **Multiple Travel Modes**: Walking, driving, transit

### 💡 Topic Suggestions (Phase 4B)
- **AI-Powered Recommendations**: Gemini suggests unique topics
- **Duplicate Detection**: Avoid already-taken topics
- **Topic Reservation**: Reserve topics for students
- **Statistics**: Track topics by department/option/year

### 📊 Analytics
- **Search Tracking**: Identify popular locations
- **Usage Statistics**: Monitor system performance
- **Offline Cache Generation**: Auto-generate cache daily

---

## 🏗️ Architecture

```
Smart Campus Guide
│
├── Navigation System (Page 1)
│   ├── Qdrant (Vector Search)
│   ├── Google Maps (Routing)
│   ├── Gemini AI (Humanization)
│   └── MongoDB (Analytics)
│
└── Topic System (Page 2)
    ├── MongoDB (Topics Storage)
    ├── Gemini AI (Suggestions)
    └── Embeddings (Similarity)
```

---

## 📁 Project Structure

```
Smart-Campus-Guide/
│
├── venv/                          # Virtual environment
│
├── data/                          # Data files
│   ├── locations/
│   │   ├── campus_locations.json
│   │   └── campus_locations_embeddings.json
│   ├── topics/
│   │   └── computer_engineering.json
│   └── cache/
│       └── popular_places_latest.json
│
├── config/                        # Configuration
│   ├── qdrant_config.json
│   └── api_keys.json
│
├── services/                      # Backend services
│   ├── location_utils.py          # ✅ Geographic calculations
│   ├── maps_service.py            # ✅ Google Maps integration
│   ├── gemini_service.py          # ✅ AI humanization
│   ├── mongodb_service.py         # ✅ Database operations
│   ├── analytics_service.py       # ✅ Search tracking
│   ├── cache_service.py           # ✅ Offline cache
│   ├── qdrant_service.py          # ✅ Vector search
│   ├── embedding_service.py       # ✅ Generate embeddings
│   └── topic_service.py           # ✅ Topic management
│
├── api/                           # FastAPI application
│   ├── main.py                    # ✅ Main app
│   ├── routes/
│   │   ├── navigation.py          # Navigation endpoints
│   │   ├── topics.py              # Topic endpoints
│   │   └── cache.py               # Cache endpoints
│   └── models/
│       ├── navigation_models.py   # ✅ Request/response models
│       ├── topic_models.py        # ✅ Topic models
│       └── cache_models.py        # ✅ Cache models
│
├── scripts/                       # Utility scripts
│   ├── migrate_to_mongodb.py      # ✅ Migrate topics
│   └── generate_offline_cache.py  # ✅ Generate cache
│
├── tests/                         # Tests
│   ├── test_embedding.py          # ✅ Done
│   └── test_qdrant.py             # ✅ Done
│
├── requirements.txt               # ✅ Dependencies
├── .env.example                   # ✅ Config template
├── .gitignore                     # ✅ Git ignore
└── README.md                      # ✅ This file
```
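The structure above lists `services/location_utils.py` for geographic calculations. Its contents are not shown here, so the following is only an illustrative sketch of one common calculation of that kind: the great-circle (haversine) distance between two GPS points. The function name and the Earth radius constant are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000  # Mean Earth radius in metres (approximation).


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
```

For example, `haversine_m(5.9631, 10.1591, 5.9632, 10.1592)` gives the distance between two nearby campus points expressed as latitude and longitude.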

---

## 🚀 Quick Start

### 1. Prerequisites

- Python 3.10+
- MongoDB Atlas account (free tier)
- Google Maps API key
- Gemini API key
- Qdrant Cloud account (free tier)

### 2. Installation

```bash
# Clone repository
git clone <your-repo-url>
cd Smart-Campus-Guide

# Create virtual environment
python -m venv venv

# Activate virtual environment
# Windows:
venv\Scripts\activate
# Mac/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

### 3. Configuration

```bash
# Copy environment template
cp .env.example .env

# Edit .env with your API keys
# - GOOGLE_MAPS_API_KEY
# - GEMINI_API_KEY
# - MONGODB_URI

# Update config/api_keys.json with the same values
```
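A startup check for the three settings can fail fast when one is missing. This is a minimal sketch with a hypothetical helper name. It assumes the values are already present in the process environment; a `.env` file is not loaded automatically by Python, so a loader such as `python-dotenv` or the hosting platform would need to provide them.

```python
import os

# Setting names are taken from the configuration step above.
REQUIRED_KEYS = ("GOOGLE_MAPS_API_KEY", "GEMINI_API_KEY", "MONGODB_URI")


def missing_settings(environ=None):
    """Return the names of required settings that are unset or empty."""
    env = os.environ if environ is None else environ
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

Calling `missing_settings()` at startup and raising an error when the list is non-empty gives a clearer message than a failed API call later on.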

### 4. Setup Data

```bash
# Generate embeddings for locations
python services/embedding_service.py

# Upload to Qdrant
python services/qdrant_service.py upload

# Test Qdrant
python tests/test_qdrant.py
```

### 5. Run API

```bash
# Start FastAPI server
uvicorn api.main:app --reload --host 0.0.0.0 --port 8000

# Access Swagger UI
# Open: http://localhost:8000/docs
```

---

## 📡 API Endpoints

### Navigation

- `POST /api/navigate` - Get directions to a location
- `POST /api/search` - Search for locations
- `GET /api/location/{id}` - Get location details

### Topics (Phase 4B)

- `POST /api/topics/add` - Add new topic
- `GET /api/topics/list` - Get all topics
- `POST /api/topics/check` - Check duplicate
- `GET /api/topics/statistics` - Get statistics

### Cache

- `GET /api/cache/download` - Download offline cache
- `GET /api/cache/info` - Get cache information
- `GET /api/cache/analytics` - Get search analytics
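
As an illustration, a client call to `POST /api/search` could be built with the standard library as sketched below. The request body shape is an assumption: the `query` field name is hypothetical, and the actual schema is the one shown in the Swagger UI at `/docs`.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # Local development server from the Run API step.


def build_search_request(query, base_url=BASE_URL):
    """Build (but do not send) a POST request for the search endpoint.

    The {"query": ...} payload is a guess at the schema, for illustration only.
    """
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To actually send it (requires the API to be running):
#   with urllib.request.urlopen(build_search_request("library")) as resp:
#       print(json.load(resp))
```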

---

## 🧪 Testing

```bash
# Test all services
python services/maps_service.py
python services/gemini_service.py
python services/mongodb_service.py

# Test Qdrant integration
python tests/test_qdrant.py

# Test embeddings
python tests/test_embedding.py

# Run all tests
pytest tests/
```

---

## 🔧 Maintenance Scripts

### Generate Offline Cache

```bash
# Generate cache for top 20 locations
python scripts/generate_offline_cache.py --top 20

# To run this daily, add a cron entry like the following (via `crontab -e`):
# 0 0 * * * cd /path/to/project && python scripts/generate_offline_cache.py
```
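Selecting the most-searched locations for the cache could work along these lines. This is a sketch under assumptions: the `top_locations` helper and the idea of a flat list of searched location IDs are hypothetical, and the real script may read its counts from MongoDB analytics instead.

```python
from collections import Counter


def top_locations(search_log, top=20):
    """Return the `top` most frequently searched location IDs, most popular first."""
    counts = Counter(search_log)
    return [loc_id for loc_id, _ in counts.most_common(top)]
```

The `top` parameter mirrors the `--top 20` flag shown above.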

### Migrate Topics

```bash
# Migrate all topic files
python scripts/migrate_to_mongodb.py

# Migrate specific file
python scripts/migrate_to_mongodb.py --file data/topics/computer_engineering.json

# Dry run (test without importing)
python scripts/migrate_to_mongodb.py --dry-run
```

---

## 🌐 Deployment

### Development

```bash
uvicorn api.main:app --reload --host 0.0.0.0 --port 8000
```

### Production (Render/Heroku)

```bash
# Contents of the Procfile (already configured):
web: uvicorn api.main:app --host 0.0.0.0 --port $PORT --workers 4
```

---

## 🔐 Security

- API keys in `.env` (NOT in git)
- Rate limiting: 60 requests/minute
- CORS configured for frontend
- Input validation via Pydantic
- MongoDB connection encrypted
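
To illustrate the 60 requests/minute limit, here is a minimal in-memory sliding-window limiter. It is a sketch only: the class name and the per-client keying are assumptions, and the project may rely on middleware or a third-party library instead.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within `window_seconds`."""

    def __init__(self, max_requests=60, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # Injectable so the limiter can be tested without sleeping.
        self._hits = {}      # client id -> deque of request timestamps

    def allow(self, client_id):
        now = self._clock()
        hits = self._hits.setdefault(client_id, deque())
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

An in-memory limiter like this only works within a single process; with several workers, a shared store would be needed.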

---

## 📊 Technology Stack

- **Backend**: FastAPI, Python 3.10+
- **AI**: Google Gemini, Sentence Transformers
- **Database**: MongoDB Atlas, Qdrant Cloud
- **Maps**: Google Maps API
- **Embeddings**: all-MiniLM-L6-v2

---

## 👥 Team

- **Backend Developer**: Tracy
- **Frontend Team**: React/Next.js
- **University**: University of Bamenda
- **Department**: Computer Engineering

---

## 📝 License

Educational project for the University of Bamenda.

---

## 🆘 Support

For issues or questions:
1. Check `/docs` for API documentation
2. Review error logs
3. Contact: tracy@campusguide.com

---

## ✅ Project Status

- **Phase 1**: ✅ Embeddings System - Complete
- **Phase 2**: ✅ Qdrant Integration - Complete
- **Phase 3**: ✅ Navigation System - Complete
- **Phase 4A**: ✅ Offline Cache - Complete
- **Phase 4B**: 🔄 Topic Intelligence - In Progress

---

**Built with ❤️ by Tracy for the University of Bamenda**
