This guide explains how to use the enhanced data loading capabilities in the Quantum Geometric Learning library.
The data loading system provides:
- Streaming support for large datasets
- GPU acceleration and caching
- Performance monitoring
- Memory optimization
- Parallel processing
Here's a simple example of loading a dataset:
```c
// Configure memory management
memory_config_t memory_config = {
    .streaming = false,
    .max_memory = 1024 * 1024 * 1024,  // 1 GB limit
    .gpu_cache = false,
    .compress = false
};

// Configure dataset loading
dataset_config_t config = {
    .format = DATA_FORMAT_CSV,
    .csv_config = {
        .delimiter = ",",
        .has_header = true
    },
    .memory = memory_config,
    .normalize = true,
    .normalization_method = NORMALIZATION_ZSCORE
};

// Load the dataset
dataset_t* dataset = quantum_load_dataset("data/features.csv", config);
```

For large datasets that don't fit in memory, enable streaming:
```c
memory_config_t config = {
    .streaming = true,
    .chunk_size = 1024 * 1024,               // 1 MB chunks
    .max_memory = 4ULL * 1024 * 1024 * 1024  // 4 GB limit
};
```

To use the GPU for data loading and preprocessing:
```c
memory_config_t config = {
    .gpu_cache = true,
    .max_memory = 4ULL * 1024 * 1024 * 1024  // 4 GB limit
};

performance_config_t perf_config = {
    .num_workers = 4,
    .prefetch_size = 2,
    .pin_memory = true
};

quantum_configure_memory(config);
quantum_configure_performance(perf_config);
```

To track data loading performance:
```c
performance_metrics_t metrics;
quantum_get_performance_metrics(&metrics);

printf("Load time: %.2f seconds\n", metrics.load_time);
printf("Memory usage: %.2f MB\n", metrics.memory_usage / (1024.0 * 1024.0));
printf("Throughput: %.2f MB/s\n", metrics.throughput / (1024.0 * 1024.0));
```

Memory configuration options:

- streaming: Enable streaming mode for large datasets
- chunk_size: Size of data chunks when streaming
- max_memory: Maximum memory usage limit
- gpu_cache: Enable GPU caching
- compress: Enable data compression
Performance configuration options:

- num_workers: Number of worker threads
- prefetch_size: Number of chunks to prefetch
- cache_size: Size of the memory cache
- pin_memory: Pin memory for GPU transfers
- profile: Enable performance profiling
Dataset configuration options:

- format: Data format (CSV, NUMPY, HDF5, IMAGE)
- csv_config: CSV-specific settings
- normalize: Enable data normalization
- normalization_method: Normalization method to use
Memory Management:
- Use streaming mode for datasets larger than available RAM
- Enable compression for large datasets with repetitive patterns
- Configure max_memory based on system resources
Performance Optimization:
- Enable GPU cache for GPU-accelerated workloads
- Adjust num_workers based on CPU cores
- Use prefetching to reduce I/O latency
Error Handling:
- Always check return values for error conditions
- Monitor performance metrics for bottlenecks
- Clean up resources properly
A typical workflow:
- Configure memory and performance settings
- Load dataset with appropriate configuration
- Split dataset into training/validation/test sets
- Monitor performance metrics
- Clean up resources when done
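The steps above can be sketched end-to-end with the calls shown in this guide. Note that quantum_split_dataset() below is hypothetical: the guide mentions splitting but does not name the function, so check the library headers for the real API.

```c
// Configure memory and performance settings
memory_config_t memory_config = {
    .streaming = true,
    .chunk_size = 1024 * 1024,
    .max_memory = 4ULL * 1024 * 1024 * 1024
};
performance_config_t perf_config = {
    .num_workers = 4,
    .prefetch_size = 2
};
if (!quantum_configure_memory(memory_config)) return 1;
quantum_configure_performance(perf_config);

// Load the dataset with the chosen configuration
dataset_config_t config = { .format = DATA_FORMAT_CSV, .memory = memory_config };
dataset_t* dataset = quantum_load_dataset("data/features.csv", config);
if (!dataset) return 1;

// Split into training/validation/test sets -- hypothetical call,
// see examples/advanced/ai/ for the actual API
// quantum_split_dataset(dataset, 0.8, 0.1, 0.1);

// Monitor performance metrics
performance_metrics_t metrics;
quantum_get_performance_metrics(&metrics);
printf("Load time: %.2f seconds\n", metrics.load_time);

// Clean up resources when done
quantum_dataset_destroy(dataset);
```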
See examples/advanced/ai/quantum_data_pipeline_example.c for a complete example.
Streaming Mode:
- Ideal for datasets > 75% of available RAM
- Adjust chunk_size based on memory constraints
- Enable compression for network storage
GPU Acceleration:
- Enable for datasets < GPU memory size
- Use pin_memory for faster transfers
- Configure prefetch_size based on GPU memory
Memory Usage:
- Monitor memory_usage metric
- Adjust max_memory based on system
- Enable compression for large datasets
Always check return values:

```c
if (!quantum_configure_memory(memory_config)) {
    fprintf(stderr, "Failed to configure memory\n");
    return 1;
}

dataset_t* dataset = quantum_load_dataset(path, config);
if (!dataset) {
    fprintf(stderr, "Failed to load dataset\n");
    return 1;
}
```

Always clean up resources:
```c
// Clean up the dataset
quantum_dataset_destroy(dataset);

// Reset performance metrics if needed
quantum_reset_performance_metrics();
```