Local Deployment Guide

This guide covers setting up and running the Multi-Modal Academic Research System on your local development machine.

Prerequisites
Initial Setup
Environment Configuration
Running the Application
Running Multiple Instances
Port Configuration
Local Development Workflow
Troubleshooting

Prerequisites

System Requirements

Operating System: macOS, Linux, or Windows 10/11
Python: Version 3.9 or higher
RAM: Minimum 4GB, recommended 8GB+
Disk Space: Minimum 5GB for application and data
Docker: For running OpenSearch (or install OpenSearch natively)

Required Software

Python 3.9+

# Check Python version
python --version
# or
python3 --version

pip (Python package manager)
```
# Check pip version
pip --version
```
Docker Desktop (recommended for OpenSearch)
- Download from: https://www.docker.com/products/docker-desktop
- Verify installation:
```
docker --version
docker-compose --version
```
Git (for cloning repository)
```
git --version
```

Initial Setup

1. Clone the Repository

# Clone the repository
git clone <repository-url>
cd multi-modal-academic-research-system

# Verify repository structure
ls -la

2. Create Virtual Environment

Using a virtual environment isolates project dependencies from your system Python.

macOS/Linux:

# Create virtual environment
python3 -m venv venv

# Activate virtual environment
source venv/bin/activate

# Verify activation (should show venv in path)
which python

Windows:

# Create virtual environment
python -m venv venv

# Activate virtual environment
venv\Scripts\activate

# Verify activation
where python

3. Install Dependencies

# Upgrade pip to latest version
pip install --upgrade pip

# Install all required packages
pip install -r requirements.txt

# Verify installation
pip list

Expected Key Packages:

opensearch-py==2.4.0
langchain==0.2.16
google-generativeai==0.7.2
gradio==4.8.0
sentence-transformers==2.7.0

4. Create Data Directories

The application stores data in various directories:

# Create all necessary directories
mkdir -p data/papers
mkdir -p data/videos
mkdir -p data/podcasts
mkdir -p data/processed
mkdir -p configs
mkdir -p logs

# Verify directory structure
tree -d -L 2

Environment Configuration

1. Create Environment File

# Copy example environment file
cp .env.example .env

# Edit the file with your preferred editor
nano .env
# or
vim .env
# or
code .env  # VS Code

2. Configure Environment Variables

Edit .env file with the following configuration:

# ============================================
# API Keys (Required)
# ============================================
# Get your Gemini API key from: https://makersuite.google.com/app/apikey
GEMINI_API_KEY=your_actual_api_key_here

# ============================================
# OpenSearch Configuration
# ============================================
OPENSEARCH_HOST=localhost
OPENSEARCH_PORT=9200
OPENSEARCH_USER=admin
OPENSEARCH_PASSWORD=admin
OPENSEARCH_USE_SSL=false
OPENSEARCH_VERIFY_CERTS=false

# ============================================
# Application Configuration
# ============================================
# Gradio server settings
GRADIO_SERVER_NAME=0.0.0.0
GRADIO_SERVER_PORT=7860
GRADIO_SHARE=true

# Index name for OpenSearch
OPENSEARCH_INDEX_NAME=research_assistant

# Embedding model
EMBEDDING_MODEL=all-MiniLM-L6-v2
EMBEDDING_DIMENSION=384

# ============================================
# Data Collection Settings
# ============================================
# Maximum papers to collect per query
MAX_PAPERS_PER_QUERY=10

# Maximum videos to collect per query
MAX_VIDEOS_PER_QUERY=10

# Rate limiting (requests per minute)
API_RATE_LIMIT=60

# ============================================
# Logging Configuration
# ============================================
LOG_LEVEL=INFO
LOG_FILE=logs/research_assistant.log

3. Obtain Gemini API Key

The Gemini API key is required for the application to function:

Visit: https://makersuite.google.com/app/apikey
Sign in with your Google account
Click "Create API Key"
Copy the generated key
Paste it into your .env file

Security Note: Never commit your .env file with real API keys to version control.

Running the Application

1. Start OpenSearch

Using Docker (Recommended):

# Start OpenSearch in background
docker run -d \
  --name opensearch-node1 \
  -p 9200:9200 \
  -p 9600:9600 \
  -e "discovery.type=single-node" \
  -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=MyStrongPassword123!" \
  opensearchproject/opensearch:latest

# Verify OpenSearch is running
curl -X GET "http://localhost:9200" -u admin:MyStrongPassword123!

# Check container logs
docker logs opensearch-node1

Expected Response:

{
  "name" : "opensearch-node1",
  "cluster_name" : "docker-cluster",
  "version" : {
    "number" : "2.11.0",
    ...
  },
  "tagline" : "The OpenSearch Project: https://opensearch.org/"
}

Stop OpenSearch:

docker stop opensearch-node1
docker rm opensearch-node1

2. Start the Application

# Ensure virtual environment is activated
source venv/bin/activate  # macOS/Linux
# or
venv\Scripts\activate  # Windows

# Run the application
python main.py

Expected Output:

🚀 Initializing Multi-Modal Research Assistant...
✅ Research Assistant ready!
🌐 Opening web interface...
📝 Logs are being written to: logs/research_assistant.log

Running on local URL:  http://0.0.0.0:7860
Running on public URL: https://xxxxx.gradio.live

This share link expires in 72 hours.

3. Access the Application

Local URL: http://localhost:7860
Public URL: Provided in console output (valid for 72 hours)
Local Network: http://:7860

Running Multiple Instances

You may want to run multiple instances for different use cases or testing.

Method 1: Different Ports

# Terminal 1 - Instance 1 on port 7860
export GRADIO_SERVER_PORT=7860
python main.py

# Terminal 2 - Instance 2 on port 7861
export GRADIO_SERVER_PORT=7861
python main.py

# Terminal 3 - Instance 3 on port 7862
export GRADIO_SERVER_PORT=7862
python main.py

Method 2: Different Virtual Environments

# Create separate environments for each instance
python3 -m venv venv1
python3 -m venv venv2

# Setup instance 1
source venv1/bin/activate
pip install -r requirements.txt
cp .env .env1
# Edit .env1 with different GRADIO_SERVER_PORT
python main.py

# Setup instance 2 (new terminal)
source venv2/bin/activate
pip install -r requirements.txt
cp .env .env2
# Edit .env2 with different GRADIO_SERVER_PORT and INDEX_NAME
python main.py

Method 3: Different OpenSearch Indices

To avoid data conflicts, use separate indices:

# Instance 1 - .env file
OPENSEARCH_INDEX_NAME=research_assistant_dev
GRADIO_SERVER_PORT=7860

# Instance 2 - .env.test file
OPENSEARCH_INDEX_NAME=research_assistant_test
GRADIO_SERVER_PORT=7861

# Run instances
python main.py  # Uses .env
python main.py --env-file .env.test  # Custom env file (if supported)

Port Configuration

Understanding Port Usage

The application uses the following ports:

7860 - Gradio web interface (default)
9200 - OpenSearch HTTP API
9600 - OpenSearch Performance Analyzer (optional)

Changing the Gradio Port

Method 1: Environment Variable

# In .env file
GRADIO_SERVER_PORT=8080

# Or set temporarily
export GRADIO_SERVER_PORT=8080
python main.py

Method 2: Modify main.py

# In main.py, find:
app.launch(
    server_name="0.0.0.0",
    server_port=7860,  # Change this
    share=True
)

Checking Port Availability

# macOS/Linux
lsof -i :7860
netstat -an | grep 7860

# Windows
netstat -ano | findstr :7860

Freeing Up Ports

# Find process using port
lsof -ti:7860

# Kill process
kill -9 $(lsof -ti:7860)

# Or on Windows
# Find PID
netstat -ano | findstr :7860
# Kill using Task Manager or:
taskkill /PID <pid> /F

Local Development Workflow

Daily Development Routine

Start Development Session

# Navigate to project
cd /path/to/multi-modal-academic-research-system

# Activate virtual environment
source venv/bin/activate

# Start OpenSearch (if not running)
docker start opensearch-node1

# Verify OpenSearch
curl -X GET "http://localhost:9200/_cluster/health"

# Start application
python main.py

Make Changes
- Edit code in your preferred IDE
- Save changes
- Restart application to see changes

View Logs

# Tail application logs
tail -f logs/research_assistant.log

# View OpenSearch logs
docker logs -f opensearch-node1

End Development Session

# Stop application (Ctrl+C)

# Optionally stop OpenSearch
docker stop opensearch-node1

# Deactivate virtual environment
deactivate

Development Tips

Auto-Reload During Development

Use a file watcher to restart on changes:

# Install watchdog
pip install watchdog

# Create restart script
echo '#!/bin/bash
while true; do
  python main.py
  sleep 2
done' > run_dev.sh

chmod +x run_dev.sh
./run_dev.sh

Debugging

Enable debug logging in .env:
```
LOG_LEVEL=DEBUG
```
Or use Python debugger:
```
# Add to code
import pdb; pdb.set_trace()
```

Testing Changes

# Run with test index
export OPENSEARCH_INDEX_NAME=test_index
python main.py

Troubleshooting

Common Issues and Solutions

1. Port Already in Use

Error:

OSError: [Errno 48] Address already in use

Solution:

# Find and kill process on port 7860
lsof -ti:7860 | xargs kill -9

# Or change port
export GRADIO_SERVER_PORT=7861
python main.py

2. OpenSearch Connection Failed

Error:

ConnectionRefusedError: [Errno 61] Connection refused

Solutions:

a) Check if OpenSearch is running:

docker ps | grep opensearch
curl http://localhost:9200

b) Start OpenSearch:

docker start opensearch-node1
# or
docker run -d --name opensearch-node1 -p 9200:9200 -e "discovery.type=single-node" opensearchproject/opensearch:latest

c) Check firewall settings:

# macOS
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --add python3
sudo /usr/libexec/ApplicationFirewall/socketfilterfw --unblock python3

# Linux (UFW)
sudo ufw allow 9200

d) Verify .env configuration:

# Check OPENSEARCH_HOST and OPENSEARCH_PORT
cat .env | grep OPENSEARCH

3. Missing GEMINI_API_KEY

Error:

⚠️  Please set GEMINI_API_KEY in your .env file

Solution:

Get API key from: https://makersuite.google.com/app/apikey
Add to .env:
```
GEMINI_API_KEY=your_actual_key_here
```
Restart application

4. Module Not Found Errors

Error:

ModuleNotFoundError: No module named 'opensearch'

Solution:

# Verify virtual environment is activated
which python
# Should show path inside venv/

# Reinstall dependencies
pip install -r requirements.txt

# Check installed packages
pip list | grep opensearch

5. Permission Denied Errors

Error:

PermissionError: [Errno 13] Permission denied: 'data/papers'

Solution:

# Fix directory permissions
chmod -R 755 data/
chmod -R 755 logs/

# Or recreate directories
rm -rf data/ logs/
mkdir -p data/{papers,videos,podcasts,processed} logs/

6. Out of Memory Errors

Error:

MemoryError: Unable to allocate array

Solutions:

a) Reduce batch size in configuration b) Use smaller embedding model c) Increase system swap space d) Process fewer documents at once

7. Gradio Not Accessible

Error: Cannot access http://localhost:7860

Solutions:

a) Check if application is running:

ps aux | grep python

b) Check if port is listening:

netstat -an | grep 7860
lsof -i :7860

c) Try different server name:

# In main.py
app.launch(
    server_name="127.0.0.1",  # Instead of 0.0.0.0
    server_port=7860,
    share=False
)

d) Check firewall:

# macOS
sudo pfctl -d  # Disable firewall temporarily

# Linux
sudo ufw status
sudo ufw allow 7860

8. Slow Performance

Symptoms:

Slow search queries
UI freezing
Long processing times

Solutions:

a) Check OpenSearch health:

curl http://localhost:9200/_cluster/health?pretty

b) Monitor resource usage:

# CPU and memory
top
htop

# Docker resources
docker stats opensearch-node1

c) Optimize OpenSearch:

# Increase heap size
docker run -d \
  --name opensearch-node1 \
  -p 9200:9200 \
  -e "discovery.type=single-node" \
  -e "OPENSEARCH_JAVA_OPTS=-Xms2g -Xmx2g" \
  opensearchproject/opensearch:latest

d) Clear old data:

# Delete old indices
curl -X DELETE "http://localhost:9200/old_index_name"

# Clear cache
curl -X POST "http://localhost:9200/_cache/clear"

9. SSL Certificate Errors

Error:

SSLError: certificate verify failed

Solution:

# In .env file
OPENSEARCH_USE_SSL=false
OPENSEARCH_VERIFY_CERTS=false

10. Application Crashes on Startup

Solutions:

a) Check logs:

tail -100 logs/research_assistant.log

b) Verify Python version:

python --version  # Should be 3.9+

c) Clean reinstall:

# Remove virtual environment
rm -rf venv/

# Recreate
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

d) Check for conflicting processes:

# Kill all python processes
pkill -f python

# Restart clean
python main.py

Getting Help

If issues persist:

Check logs: logs/research_assistant.log
Enable debug logging: Set LOG_LEVEL=DEBUG in .env
Check GitHub issues: Search for similar problems
Stack traces: Include full error messages when reporting issues

Useful Commands Reference

# Check application status
ps aux | grep "python main.py"

# Monitor logs in real-time
tail -f logs/research_assistant.log

# Check OpenSearch status
curl http://localhost:9200/_cat/health?v

# List all indices
curl http://localhost:9200/_cat/indices?v

# Check index document count
curl http://localhost:9200/research_assistant/_count

# Monitor system resources
top -o cpu  # Sort by CPU
top -o mem  # Sort by memory

# Test Gemini API connection
curl -H "Content-Type: application/json" \
  -d '{"contents":[{"parts":[{"text":"Hello"}]}]}' \
  "https://generativelanguage.googleapis.com/v1/models/gemini-pro:generateContent?key=YOUR_API_KEY"

Next Steps

Review Docker Deployment for containerized setup
See OpenSearch Setup for advanced configuration
Check Production Considerations before deploying to production

FilesExpand file tree

local.md

Latest commit

History

local.md

File metadata and controls

Local Deployment Guide

Table of Contents

Prerequisites

System Requirements

Required Software

Initial Setup

1. Clone the Repository

2. Create Virtual Environment

3. Install Dependencies

4. Create Data Directories

Environment Configuration

1. Create Environment File

2. Configure Environment Variables

3. Obtain Gemini API Key

Running the Application

1. Start OpenSearch

2. Start the Application

3. Access the Application

Running Multiple Instances

Method 1: Different Ports

Method 2: Different Virtual Environments

Method 3: Different OpenSearch Indices

Port Configuration

Understanding Port Usage

Changing the Gradio Port

Checking Port Availability

Freeing Up Ports

Local Development Workflow

Daily Development Routine

Development Tips

Troubleshooting

Common Issues and Solutions

1. Port Already in Use

2. OpenSearch Connection Failed

3. Missing GEMINI_API_KEY

4. Module Not Found Errors

5. Permission Denied Errors

6. Out of Memory Errors

7. Gradio Not Accessible

8. Slow Performance

9. SSL Certificate Errors

10. Application Crashes on Startup

Getting Help

Useful Commands Reference

Next Steps