
Solidigm NVMe Optimized RAG

Solidigm SSDs enhance scalable retrieval-augmented generation (RAG) databases and AI inference workloads by offering speed, reliability, and cost efficiency. They enable faster, more accurate AI outputs and allow enterprises to handle larger datasets while reducing the need for expensive GPU and system memory upgrades, optimizing the total cost of ownership (TCO).

By offloading RAG workloads (i.e., vector database indexing and large language model generation) to SSDs, businesses achieve faster results and improved energy efficiency compared to memory-bound solutions. These high-performance SSDs are engineered for AI inference, delivering exceptional performance and minimizing power consumption for optimal speed and cost.

Prerequisites

Software Requirements

  • Ubuntu version 22.04.5 LTS or higher
  • Docker version 25.0.3 or higher
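
You can confirm the installed versions before proceeding:

lsb_release -a      # Ubuntu release information
docker --version    # Docker version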

Hardware Requirements

Note

This tool has been validated with the following configuration:

  • AMD EPYC 9554 64-Core with 4x NVIDIA L40S GPUs
  • Ubuntu version 22.04.5 LTS
  • Docker version 25.0.3
  • SOLIDIGM D7-PS1010 (SB5PH27X153T) NVMe SSD (15.3TB)
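
To confirm that the GPUs and the NVMe SSD are visible to the system, you can run the following checks (the nvme command assumes the nvme-cli package is installed):

nvidia-smi        # lists NVIDIA GPUs, driver, and CUDA versions
sudo nvme list    # lists NVMe devices and their capacities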

Building the Solution Image

The solution uses vLLM for model serving.

  1. Download the required models from HuggingFace.

    i. Install the HuggingFace CLI

    pip3 install -U "huggingface_hub[cli]"

    ii. Download the Phi-3-vision vision language model

    huggingface-cli download microsoft/Phi-3-vision-128k-instruct

    iii. Download the GTE-Qwen2 embedding model

    huggingface-cli download Alibaba-NLP/gte-Qwen2-1.5B-instruct

    iv. Download the Llama 3.3 70B large language model for generation

    huggingface-cli download meta-llama/Llama-3.3-70b
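
The Llama model is gated, so you may need to authenticate first with huggingface-cli login. By default the CLI caches models under ~/.cache/huggingface; if you want downloads to land directly on the SSD used for HF_CACHE_DIR (configured below), you can pass an explicit cache directory. The path here assumes the mount point used later in this guide:

huggingface-cli download meta-llama/Llama-3.3-70b --cache-dir /mnt/ssd/cache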

Setting up the Environment Variables

Set appropriate values for the environment variables in ./solidigm/.env, referring to the table below:

| Variable Name | Value | Description |
|---------------|-------|-------------|
| HF_CACHE_DIR | /mnt/ssd/cache | Storage location for model cache files on the SSD under test |
| VECTOR_DB_OFFLOAD | /mnt/ssd/offload/vector_db | Swap space for vector database operations on the SSD under test |
| OFFLOAD_ROOT | /mnt/ssd/offload/inference | Directory for inference offload operations on the SSD under test |
| STACK_ROOT | /mnt/ssd/stack | Directory for storing video and processing data on the SSD under test |
| HF_TOKEN | See HuggingFace User Access Tokens | Token for accessing Hugging Face models |
| IMAGE_SERVER_URL | http://localhost:8100/backend/output/images | Report server image endpoint URL; change localhost to the hosted endpoint |
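
A populated ./solidigm/.env might look like the following sketch (the token value is a placeholder):

HF_CACHE_DIR=/mnt/ssd/cache
VECTOR_DB_OFFLOAD=/mnt/ssd/offload/vector_db
OFFLOAD_ROOT=/mnt/ssd/offload/inference
STACK_ROOT=/mnt/ssd/stack
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
IMAGE_SERVER_URL=http://localhost:8100/backend/output/images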

Creating Required Directories

Before starting the solution, you need to create the required directories for model cache, vector database, inference offload, and video processing:

# Create directories for model cache and offload
sudo mkdir -p /mnt/ssd/cache
sudo mkdir -p /mnt/ssd/offload/vector_db
sudo mkdir -p /mnt/ssd/offload/inference
sudo mkdir -p /mnt/ssd/stack

# Set appropriate permissions
sudo chmod -R 777 /mnt/ssd/cache
sudo chmod -R 777 /mnt/ssd/offload
sudo chmod -R 777 /mnt/ssd/stack

Note

Make sure the SSD is properly mounted at /mnt/ssd before creating these directories. If your SSD is mounted at a different location, update the paths in the .env file accordingly.
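
If the drive is not yet mounted, a minimal sketch follows; the device name is an assumption, so identify your drive with lsblk first, and note that mkfs.ext4 erases all existing data:

lsblk                          # identify the NVMe device
sudo mkfs.ext4 /dev/nvme0n1    # WARNING: destroys existing data on the drive
sudo mkdir -p /mnt/ssd
sudo mount /dev/nvme0n1 /mnt/ssd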

Advanced Configuration

The following environment variables can be configured for fine-tuning the system's performance and behavior:

| Variable Name | Value | Description |
|---------------|-------|-------------|
| DEFAULT_FPS | 1.0 | Default frames per second for video processing |
| DEFAULT_WIDTH | 640 | Default video width (pixels) |
| DEFAULT_HEIGHT | 480 | Default video height (pixels) |
| WORKER_PROCESSES | 4 | Number of worker processes |
| VISION_SERVER_URL | http://vllm-serving:8000 | Vision server endpoint |
| VISION_BATCH_SIZE | 128 | Batch size for vision processing |
| VISION_MAX_CONCURRENT | 8 | Maximum concurrent vision processes |
| EMBEDDING_ENDPOINT | http://embedding-service:80 | Endpoint for embedding service |
| MILVUS_ENDPOINT | http://standalone:19530 | Milvus vector database endpoint |
| SECRET_KEY | 1234567890 | Authentication secret key |
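
After editing the .env file, you can confirm that Docker Compose resolves the variables as expected; the grep filter is just illustrative:

cd solidigm
sudo docker compose config | grep -i endpoint    # inspect resolved endpoint values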

Running the Solution

Note

Make sure all Setup and Configuration steps are completed.

  1. First, build all the services under the solidigm directory by running the command below.

    cd solidigm
    sudo docker compose build
  2. Once the build is complete, start all the services by running the command below.

    sudo docker compose up -d
  3. Make sure that all the services are running by checking their status.

    sudo docker compose ps

Note

Refer back to the environment variable configuration if you face issues running the solution.
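
Service logs usually point to a misconfigured variable. For example, to tail the logs of the vision serving container (the service name is taken from the VISION_SERVER_URL default above):

sudo docker compose logs --tail=100 -f vllm-serving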

Solution Interface

Important

Please be aware that after deployment, the service may take several minutes to fully start. During this initialization period, you might encounter:

  • Partially Functional Webpage: The webpage may not fully load or display all components correctly.
  • Temporary Errors: Error messages or "Service Unavailable" notices might appear.

These issues are expected and should resolve automatically as the service completes its startup process. If problems persist beyond the expected startup time, please check the service logs for any errors or misconfigurations.

Note

Access to port 8100 is necessary for the interface to work properly. If needed, port forwarding can be used.
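
For example, an SSH tunnel can forward the port from a remote host (user and hostname are placeholders):

ssh -L 8100:localhost:8100 user@remote-host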

Navigate to localhost:8100/login in a browser to open the Solution Interface.

Dashboard Overview

The Solution Interface provides an intuitive dashboard for video analysis and processing. Here's how to use it:

1. Video Upload and Analysis

  • Upload Area: Drop your videos in the designated area on the left panel
    • Supported formats: MP4, AVI, MOV, MKV
  • Upload Process:
    • Click "UPLOAD VIDEOS" button to initiate upload
    • Once uploaded, click "BEGIN ANALYSIS" to start processing
    • A unique job ID is created for tracking your analysis

Note

For testing purposes, you can use sample video datasets from the Urban Tracker Dataset. The dataset includes various urban traffic videos with different resolutions and frame rates. Please refer to the website for detailed information about video formats, annotations, and licensing terms.

2. Processing Pipeline

The dashboard shows a 5-step pipeline with real-time status:

  1. Processing Videos (Step 1)

    • Initial video processing and frame extraction
    • Green color indicates completion
  2. Summarizing Content (Step 2)

    • Content analysis and summarization
    • Purple color indicates active processing
  3. Generating Embeddings (Step 3)

    • Vector embedding generation
    • Gray indicates pending status
  4. Querying Insights (Step 4)

    • Information retrieval and analysis
    • Gray indicates pending status
  5. Generating Report (Step 5)

    • Final report compilation
    • Last step in the pipeline

3. Real-time Performance Monitoring

  • Bandwidth Metrics

    • Data Read Bandwidth: Real-time monitoring of read operations for both SSDs
    • Data Write Bandwidth: Real-time monitoring of write operations for both SSDs
  • Storage Metrics

    • SSD Usage: Percentage utilization monitoring for both SSDs
    • SSD Temperature: Operational temperature monitoring

4. System Resource Metrics

  • System Usage

    • CPU Usage: Real-time CPU utilization monitoring
    • Memory Usage: System memory utilization tracking
  • Power Consumption

    • System Power: Total system power consumption
    • CPU Power: Processor power usage monitoring
    • GPU Power: Graphics processor power usage
  • Temperature Monitoring

    • CPU Temperature: Processor temperature monitoring
    • GPU Temperature: Graphics processor temperature monitoring

5. VectorDB Performance

  • Query Performance
    • Queries per second
    • Search latency

6. Configuration Controls

  • Toggle Options
    • VectorDB with SSD Offload: Enable/disable SSD-based vector storage
    • LLM with GPU Offload: Toggle GPU acceleration for language models
    • Additional processing parameters available for customization

Stopping the Solution

To stop and remove all the containers, networks, and volumes created by docker compose up, run:

sudo docker compose down -v
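
The -v flag also removes named volumes. If you want to preserve ingested data and the vector database between runs (assuming the services store their state in named volumes), omit it:

sudo docker compose down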

Note

The generation server may take several minutes to shut down completely due to its large model size and cleanup processes. This is normal behavior and you should wait for the process to complete before proceeding.
