Solidigm SSDs enhance scalable retrieval-augmented generation (RAG) databases and AI inference workloads by offering speed, reliability, and cost efficiency. They enable faster, more accurate AI outputs and allow enterprises to handle larger datasets while reducing the need for expensive GPU and system memory upgrades, optimizing total cost of ownership (TCO).
By offloading RAG workloads, i.e., vector database indexing and large language model generation, to SSDs, businesses achieve faster results and improved energy efficiency compared to memory-bound solutions. These high-performance SSDs are engineered for AI inference, delivering high throughput at low power consumption for optimal speed and cost.
- Ubuntu version 22.04.5 LTS or higher
- Docker version 25.0.3 or higher
- Storage: SOLIDIGM D7-PS1010 (SB5PH27X153T) NVMe SSD (15.36TB)
- Compute:
  - CPU: AMD EPYC 9554 64-Core Processor
  - GPU: 4x NVIDIA L40S GPUs

Note
This tool has been validated with the following configuration:
- AMD EPYC 9554 64-Core CPU with 4x NVIDIA L40S GPUs
- Ubuntu version 22.04.5 LTS
- Docker version 25.0.3
- SOLIDIGM D7-PS1010 (SB5PH27X153T) NVMe SSD (15.36TB)
The solution utilizes vLLM serving.
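The compose stack launches vLLM for you, so no manual invocation is required. For reference, a standalone launch of the generation model might look like the sketch below; the tensor-parallel size and port are illustrative assumptions, not values taken from the stack's configuration.

# Illustrative only -- the compose stack starts vLLM automatically.
# Shard the 70B generation model across the four L40S GPUs;
# --tensor-parallel-size and --port are assumed values.
vllm serve meta-llama/Llama-3.3-70b --tensor-parallel-size 4 --port 8000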
- Download the required models from HuggingFace.

  i. Install the HuggingFace CLI

     pip3 install -U "huggingface_hub[cli]"

  ii. Download the Phi-3-vision vision language model

     huggingface-cli download microsoft/Phi-3-vision-128k-instruct

  iii. Download the GTE-Qwen2 embedding model

     huggingface-cli download Alibaba-NLP/gte-Qwen2-1.5B-instruct

  iv. Download the Llama 3.3 70B large language model for generation

     huggingface-cli download meta-llama/Llama-3.3-70b
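If you want these downloads to land on the SSD cache directory the stack uses (HF_CACHE_DIR in the table below), you can point the Hugging Face cache there first. This is a minimal sketch assuming the SSD is already mounted at /mnt/ssd; HF_HOME is the standard huggingface_hub cache override, though whether the stack expects this exact layout is an assumption.

# Direct Hugging Face downloads to the SSD cache before running the commands above
export HF_HOME=/mnt/ssd/cache
huggingface-cli download microsoft/Phi-3-vision-128k-instruct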
Update the appropriate values for the environment variables in the file ./solidigm/.env by referring to the table below:
| Variable Name | Value | Description |
|---|---|---|
| HF_CACHE_DIR | /mnt/ssd/cache | Storage location for model cache files on the SSD under test |
| VECTOR_DB_OFFLOAD | /mnt/ssd/offload/vector_db | Swap space for vector database operations on the SSD under test |
| OFFLOAD_ROOT | /mnt/ssd/offload/inference | Directory for inference offload operations on the SSD under test |
| STACK_ROOT | /mnt/ssd/stack | Directory for storing video and processing data on the SSD under test |
| HF_TOKEN | Refer to HuggingFace User Access Tokens | Token for accessing Hugging Face models |
| IMAGE_SERVER_URL | http://localhost:8100/backend/output/images | Report server image endpoint URL. Change localhost to the hosted endpoint. |
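Putting the table together, a filled-in ./solidigm/.env might look like the following sketch; the HF_TOKEN value is a placeholder for your own token.

# ./solidigm/.env -- values taken from the table above; HF_TOKEN is a placeholder
HF_CACHE_DIR=/mnt/ssd/cache
VECTOR_DB_OFFLOAD=/mnt/ssd/offload/vector_db
OFFLOAD_ROOT=/mnt/ssd/offload/inference
STACK_ROOT=/mnt/ssd/stack
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
IMAGE_SERVER_URL=http://localhost:8100/backend/output/images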
Before starting the solution, you need to create the required directories for model cache, vector database, inference offload, and video processing:
# Create directories for model cache and offload
sudo mkdir -p /mnt/ssd/cache
sudo mkdir -p /mnt/ssd/offload/vector_db
sudo mkdir -p /mnt/ssd/offload/inference
sudo mkdir -p /mnt/ssd/stack
# Set appropriate permissions
sudo chmod -R 777 /mnt/ssd/cache
sudo chmod -R 777 /mnt/ssd/offload
sudo chmod -R 777 /mnt/ssd/stack

Note
Make sure the SSD is properly mounted at /mnt/ssd before creating these directories. If your SSD is mounted at a different location, update the paths in the .env file accordingly.
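To confirm the mount before creating the directories, you can check with findmnt; the format-and-mount commands below are a sketch for a fresh drive, and the device name /dev/nvme0n1 is an assumption (check lsblk for yours).

# Verify the SSD is mounted at /mnt/ssd
findmnt /mnt/ssd
# If the drive is new and unmounted (example only -- mkfs destroys existing data)
sudo mkfs.ext4 /dev/nvme0n1
sudo mkdir -p /mnt/ssd
sudo mount /dev/nvme0n1 /mnt/ssd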
The following environment variables can be configured for fine-tuning the system's performance and behavior:
| Variable Name | Value | Description |
|---|---|---|
| DEFAULT_FPS | 1.0 | Default frames per second for video processing |
| DEFAULT_WIDTH | 640 | Default video width |
| DEFAULT_HEIGHT | 480 | Default video height |
| WORKER_PROCESSES | 4 | Number of worker processes |
| VISION_SERVER_URL | http://vllm-serving:8000 | Vision server endpoint |
| VISION_BATCH_SIZE | 128 | Batch size for vision processing |
| VISION_MAX_CONCURRENT | 8 | Maximum concurrent vision processes |
| EMBEDDING_ENDPOINT | http://embedding-service:80 | Endpoint for embedding service |
| MILVUS_ENDPOINT | http://standalone:19530 | Milvus vector database endpoint |
| SECRET_KEY | 1234567890 | Authentication secret key |
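Assuming these are read from the same ./solidigm/.env file as the variables above (an assumption; they may instead be set in the compose file), tuning is a matter of appending overrides. The values below are illustrative:

# Illustrative tuning overrides, appended to ./solidigm/.env
# Sample two frames per second instead of one
DEFAULT_FPS=2.0
# Halve the number of concurrent vision requests
VISION_MAX_CONCURRENT=4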
Note
Make sure all Setup and Configuration steps are completed.

- First, build all the services under the solidigm directory by running the command below.

  cd solidigm
  sudo docker compose build

- Once the build is complete, start all the services by running the command below.

  sudo docker compose up -d

- Make sure that all the services are running by checking their status.

  sudo docker compose ps
Note
Refer back to the environment variable configuration if you face issues running the solution.
Important
Please be aware that after deployment, the service may take several minutes to fully start. During this initialization period, you might encounter:
- Partially Functional Webpage: The webpage may not fully load or display all components correctly.
- Temporary Errors: Error messages or "Service Unavailable" notices might appear.
These issues are expected and should resolve automatically as the service completes its startup process. If problems persist beyond the expected startup time, please check the service logs for any errors or misconfigurations.
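If you would rather wait for readiness from a script than refresh the page, a simple polling loop works; this sketch assumes the login page on port 8100 returns HTTP 200 once the interface is up.

# Poll until the interface answers; then optionally inspect startup logs
until curl -sf -o /dev/null http://localhost:8100/login; do
    echo "Waiting for the solution interface..."
    sleep 10
done
echo "Interface is up."
sudo docker compose logs -f   # optional: follow service logs if anything looks off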
Note
Access to port 8100 is necessary for the interface to work properly. If needed, port forwarding can be used.
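For example, if the stack runs on a remote host, standard SSH local port forwarding makes the interface reachable on your machine; user and remote-host below are placeholders.

# Forward remote port 8100 to localhost:8100 (user and remote-host are placeholders)
ssh -L 8100:localhost:8100 user@remote-host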
Navigate to localhost:8100/login in a browser to open the Solution Interface.
The Solution Interface provides an intuitive dashboard for video analysis and processing. Here's how to use it:
- Upload Area: Drop your videos in the designated area on the left panel
  - Supported formats: MP4, AVI, MOV, MKV
- Upload Process:
  - Click the "UPLOAD VIDEOS" button to initiate the upload
  - Once uploaded, click "BEGIN ANALYSIS" to start processing
  - A unique job ID is created for tracking your analysis
Note
For testing purposes, you can use sample video datasets from the Urban Tracker Dataset. The dataset includes various urban traffic videos with different resolutions and frame rates. Please refer to the website for detailed information about video formats, annotations, and licensing terms.
The dashboard shows a 5-step pipeline with real-time status:

- Processing Videos (Step 1)
  - Initial video processing and frame extraction
  - Green indicates completion
- Summarizing Content (Step 2)
  - Content analysis and summarization
  - Purple indicates active processing
- Generating Embeddings (Step 3)
  - Vector embedding generation
  - Gray indicates pending status
- Querying Insights (Step 4)
  - Information retrieval and analysis
  - Shows gray while pending
- Generating Report (Step 5)
  - Final report compilation
  - Last step in the pipeline
- Bandwidth Metrics
  - Data Read Bandwidth: Real-time monitoring of read operations for both SSDs
  - Data Write Bandwidth: Real-time monitoring of write operations for both SSDs
- Storage Metrics
  - SSD Usage: Percentage utilization monitoring for both SSDs
  - SSD Temperature: Operational temperature monitoring (a host-side cross-check is sketched after this list)
- System Usage
  - CPU Usage: Real-time CPU utilization monitoring
  - Memory Usage: System memory utilization tracking
- Power Consumption
  - System Power: Total system power consumption
  - CPU Power: Processor power usage monitoring
  - GPU Power: Graphics processor power usage
- Temperature Monitoring
  - CPU Temperature: Processor temperature monitoring
  - GPU Temperature: Graphics processor temperature monitoring
- Query Performance
  - Queries per second
  - Search latency
- Toggle Options
  - VectorDB with SSD Offload: Enable/disable SSD-based vector storage
  - LLM with GPU Offload: Toggle GPU acceleration for language models
  - Additional processing parameters available for customization
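If you want to cross-check the dashboard's SSD figures from the host, nvme-cli and sysstat expose the same telemetry; the device names below are assumptions (check lsblk for yours).

# Temperature, percentage used, and lifetime data read/written
sudo nvme smart-log /dev/nvme0
# Live read/write bandwidth, refreshed every second (requires sysstat)
iostat -x nvme0n1 1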
To stop and remove all the containers, networks, and volumes created by docker compose up, run:
sudo docker compose down -v

Note
The generation server may take several minutes to shut down completely due to its large model size and cleanup processes. This is normal behavior and you should wait for the process to complete before proceeding.


