A ready-to-use library for training scalable, two-tower recommendation systems. Built for accessibility and extensibility, it can be adapted to any use case with minimal configuration.
- Universal: Works with any recommendation use case - e-commerce, content, social media, etc.
- Two-tower Architecture: Follows the industry-standard two-tower model architecture for users and items
- Highly Scalable: Designed to handle datasets from thousands to millions of interactions
- Accessible: Simple JSON configuration - no complex setup required
- Extensible: Modular architecture allows easy customization and feature expansion
- Production Ready: Includes training, inference, and model persistence out of the box
recommendkit/
├── configs/                                      # Configuration files
│   ├── correlated_dataset_config.json            # SimpleFusion config
│   └── correlated_dataset_attention_config.json  # Attention-based config
├── datasets/
│   └── synthetic/                                # Synthetic dataset generation
│       ├── generate_correlated_dataset.py        # Dataset generator script
│       └── correlated_dataset.json               # Generated dataset
├── encoders/                                     # Feature encoders (modular structure)
│   ├── text/                                     # Text encoders (transformer, word2vec)
│   ├── image/                                    # Image encoders (CNN, ResNet, ViT)
│   ├── categorical/                              # Categorical encoders (hash-based)
│   ├── continuous/                               # Continuous encoders (MLP-based)
│   ├── base_encoder.py                           # Base encoder class
│   └── temporal_encoder.py                       # Temporal/sequence encoder (user interaction history)
├── interaction/                                  # Feature fusion and interaction modeling
├── classifier/                                   # Classification heads and loss functions
├── trainer/                                      # Training pipeline and data loading
├── train.py                                      # Main training script
├── inference.py                                  # Inference script
└── quickstart.ipynb                              # Interactive quickstart notebook
The system uses an intuitive JSON format that supports multiple feature types:
{
"user_data": [
{
"user_id": 1,
"image": {"profile_pic": "/path/to/image.jpg"},
"text": {"bio": "User description", "summary": "Short summary"},
"categorical": {"country": "USA", "gender": "male"},
"continuous": {"age": 25.0, "income": 50000.0},
"temporal": {"prev_interactions": [1, 2, 3], "session_times": [5, 10, 15]}
}
],
"item_data": [
{
"item_id": 101,
"image": {"main_image": "/path/to/item.jpg"},
"text": {"title": "Product Name", "description": "Product description"},
"categorical": {"category": "electronics", "brand": "BrandName"},
"continuous": {"price": 99.99, "rating": 4.5},
"temporal": {"price_history": [99.99, 89.99], "view_counts": [10, 20, 30]}
}
],
"interactions": [
{"user_id": 1, "item_id": 101, "interaction_type": "purchase", "timestamp": "2024-01-15T10:30:00"}
]
}
The system automatically handles positive and negative sample generation:
- Positive Samples: Extracted from your interaction data (purchases, clicks, views, etc.)
- Negative Samples: Generated by randomly sampling items each user has not interacted with
- Balanced Training: Configurable positive/negative ratios for optimal model performance
- No Data Leakage: Items a user has already interacted with are never drawn as negative samples for that user
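For intuition, the negative-sampling step can be sketched in a few lines of plain Python (an illustrative sketch, not the library's internal code; it assumes interaction records follow the JSON format shown above):

import random

def generate_negative_samples(interactions, all_item_ids, ratio=1):
    # Map each user to the set of items they have interacted with (positives).
    seen = {}
    for record in interactions:
        seen.setdefault(record["user_id"], set()).add(record["item_id"])

    negatives = []
    for user_id, positives in seen.items():
        # Candidates are items this user has never interacted with.
        candidates = [item_id for item_id in all_item_ids if item_id not in positives]
        num_samples = min(len(candidates), ratio * len(positives))
        for item_id in random.sample(candidates, num_samples):
            negatives.append({"user_id": user_id, "item_id": item_id, "label": 0})
    return negatives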
The easiest way to get started is with our interactive Jupyter notebook that walks you through the entire process:
jupyter notebook quickstart.ipynb
The notebook covers:
- Data Loading: Load and explore user and item features with sample data points
- Configuration: Understand SimpleFusion config parameters
- Model Training: Train a recommendation model step-by-step
- Inference: Generate personalized recommendations directly in the notebook
Perfect for learning how RecommendKit works!
If you want to test the system with synthetic data, generate a correlated dataset:
cd datasets/synthetic
python3 generate_correlated_dataset.py --num_users 1000 --num_items 100 --output correlated_dataset.json
This creates a realistic dataset with:
- 1000 users with diverse occupations, locations, ages, and salaries
- 100+ items across multiple categories (tech, medical, kitchen, etc.)
- Perfect correlations between user attributes and item preferences (e.g., software engineers prefer tech items, chefs prefer kitchen items)
- Temporal interaction history for each user
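To sanity-check the generated file, you can inspect it with the standard library; the top-level keys follow the data format described earlier:

import json

with open("correlated_dataset.json") as f:
    data = json.load(f)

print(len(data["user_data"]), "users")
print(len(data["item_data"]), "items")
print(len(data["interactions"]), "interactions")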
python3 train.py --config configs/correlated_dataset_config.json --data datasets/synthetic/correlated_dataset.json
This command will:
- Load your data and configuration
- Automatically generate positive/negative samples
- Train the recommendation model
- Save the trained model for inference
python3 inference.py --model_path models/your_trained_model.pth --config configs/correlated_dataset_config.json --data test_input.json
The inference script provides:
- User Embeddings: Generate vector representations for users
- Item Embeddings: Generate vector representations for items
- Similarity Scores: Calculate user-item compatibility scores
- Top-K Recommendations: Get ranked item recommendations for users
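Conceptually, once the two towers have produced embeddings, top-K recommendation reduces to a similarity search over the item vectors. A minimal PyTorch sketch of the idea (illustrative only; inference.py handles this for you, and the function and variable names here are placeholders):

import torch

def top_k_items(user_emb, item_embs, item_ids, k=10):
    # user_emb: (dim,), item_embs: (num_items, dim), item_ids: list of ids.
    scores = item_embs @ user_emb                      # (num_items,) dot-product scores
    values, indices = torch.topk(scores, k=min(k, len(item_ids)))
    return [(item_ids[i], values[rank].item()) for rank, i in enumerate(indices.tolist())]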
Built with a flexible two-tower fusion architecture that adapts to your needs:
The system uses a two-tower architecture where user and item features are processed independently through their respective towers, then combined for interaction modeling and final classification.
SimpleFusion:
- Optimized for Speed: Concatenation + MLP approach for fast training and inference
- Stable Training: No transformer collapse issues, clean gradients
- Small Feature Sets: Perfect for 2-4 features per entity
- Production Ready: Minimal computational overhead
Attention-Based Fusion:
- Complex Interactions: Transformer-based feature fusion for rich feature sets
- Scalable: Handles dozens of features with learned attention weights
- Flexible: Adaptive feature importance based on context
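To make the distinction concrete, SimpleFusion boils down to concatenating the per-feature embeddings and projecting them with a small MLP; attention-based fusion instead runs transformer layers over the feature tokens. A minimal PyTorch sketch of the SimpleFusion idea (illustrative only, not the library's actual module):

import torch
import torch.nn as nn

class SimpleFusionSketch(nn.Module):
    """Concatenate per-feature embeddings and project them with a small MLP."""
    def __init__(self, num_features, embedding_dim, output_dim):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_features * embedding_dim, output_dim),
            nn.ReLU(),
            nn.Linear(output_dim, output_dim),
        )

    def forward(self, feature_embeddings):
        # feature_embeddings: list of (batch, embedding_dim) tensors, one per feature.
        return self.mlp(torch.cat(feature_embeddings, dim=-1))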
Switch between fusion methods directly in your config file - no code changes needed!
SimpleFusion Configuration:
{
"user_use_simple_fusion": true,
"item_use_simple_fusion": true,
"interaction_use_simple_fusion": true
}
Attention-Based Fusion Configuration:
{
"user_use_simple_fusion": false,
"user_num_attention_layers": 2,
"user_num_heads": 8,
"user_dropout": 0.1,
"user_use_cls_token": true,
"item_use_simple_fusion": false,
"item_num_attention_layers": 2,
"item_num_heads": 8,
"item_dropout": 0.1,
"interaction_use_simple_fusion": false,
"interaction_num_attention_layers": 2,
"interaction_num_heads": 8,
"interaction_dropout": 0.1
}
See configs/correlated_dataset_config.json (SimpleFusion) and configs/correlated_dataset_attention_config.json (Attention) for complete examples!
The system is designed for maximum extensibility:
The system uses a modular encoder architecture organized by feature type:
encoders/
├── text/                              # Text feature encoders
│   ├── transformer_encoder.py         # HuggingFace transformer models (BERT, RoBERTa, etc.)
│   ├── word2vec_encoder.py            # Word2Vec/FastText/GloVe models
│   ├── factory.py                     # Auto-detects transformer vs word2vec from model_name
│   └── base_text_encoder.py           # Base class for text encoders
├── image/                             # Image feature encoders
│   ├── cnn_encoder.py                 # CNN-based image encoder
│   ├── vit_encoder.py                 # Vision Transformer encoder
│   ├── factory.py                     # Creates encoder based on model_type config
│   └── base_image_encoder.py          # Base class for image encoders
├── categorical/                       # Categorical feature encoders
│   ├── hash_encoder.py                # Hash-based embedding encoder
│   ├── factory.py                     # Factory for categorical encoders
│   └── base_categorical_encoder.py
├── continuous/                        # Continuous feature encoders
│   ├── mlp_encoder.py                 # MLP-based continuous encoder
│   ├── factory.py                     # Factory for continuous encoders
│   └── base_continuous_encoder.py
├── temporal/                          # Temporal/sequence encoders
│   ├── lstm_temporal_encoder.py       # LSTM-based temporal encoder (user interaction history)
│   ├── factory.py                     # Factory for temporal encoders
│   └── base_temporal_encoder.py       # Base class for temporal encoders
└── base_encoder.py                    # Base class for all encoders
Key Features:
- Factory Pattern: Each encoder type has a factory function that creates encoders from config
- Auto-Detection: Text encoders automatically detect transformer vs word2vec from model_name (see the sketch below)
- HuggingFace Support: Text encoders support any HuggingFace model via AutoModel
- Modular Design: Easy to add new encoder types or implementations
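A runnable sketch of the detection heuristic (illustrative only; the real factory lives in encoders/text/factory.py and returns encoder instances rather than strings):

def detect_text_encoder_type(model_name: str) -> str:
    """Illustrative heuristic: decide which encoder family a model_name refers to."""
    word2vec_hints = ("glove", "word2vec", "fasttext")
    if any(hint in model_name.lower() for hint in word2vec_hints):
        return "word2vec"    # gensim-style pretrained vectors
    return "transformer"     # HuggingFace model name or local path

print(detect_text_encoder_type("glove-wiki-gigaword-50"))  # word2vec
print(detect_text_encoder_type("bert-base-uncased"))       # transformer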
The text encoder factory automatically detects the encoder type based on the model_name:
Transformer Encoders (any HuggingFace model):
{
"text_encoder_config": {
"model_name": "bert-base-uncased", // Any HF model name or path
"aggregation_strategy": "separate_concat",
"embedding_dim": 256,
"max_length": 512,
"pooling_strategy": "cls",
"freeze_bert": false
}
}
Supported models:
- Standard HF models: "bert-base-uncased", "distilbert-base-uncased", "roberta-base"
- Sentence transformers: "sentence-transformers/all-MiniLM-L6-v2"
- Custom paths: "/path/to/local/model" or "username/model-name"
Word2Vec Encoders (detected automatically):
{
"text_encoder_config": {
"model_name": "glove-wiki-gigaword-50", // Auto-detected as word2vec
"aggregation_strategy": "mean",
"embedding_dim": 64
}
}
Choose between CNN, ResNet (lightweight), or Vision Transformer:
{
"image_encoder_config": {
"model_type": "resnet", // "cnn", "resnet", or "vit"
"model_name": "resnet18", // For ResNet: "resnet18", "resnet34", "resnet50", etc.
"aggregation_strategy": "concat",
"embedding_dim": 256,
"pretrained": true, // For ResNet/ViT
"num_cnn_layers": 3 // For CNN only
}
}
ResNet (Recommended for lightweight use):
- Fast inference with pretrained ImageNet weights
- Supports ResNet18/34/50/101/152 variants
- ResNet18 is ~11M parameters (much lighter than ViT)
- Default choice for production deployments
CNN:
- Custom lightweight architecture
- No pretrained weights
- Good for small datasets or custom architectures
ViT:
- Vision Transformer for high accuracy
- Larger model size (~86M parameters for ViT-B)
- Best for complex visual understanding tasks
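For intuition, a ResNet-backed image encoder of this kind can be assembled in a few lines with torchvision (an illustrative sketch assuming torchvision >= 0.13, not the library's actual image encoder):

import torch.nn as nn
from torchvision import models

class ResNetImageSketch(nn.Module):
    """Pretrained ResNet18 backbone with the classification head swapped for a projection."""
    def __init__(self, embedding_dim=256):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        backbone.fc = nn.Linear(backbone.fc.in_features, embedding_dim)
        self.backbone = backbone

    def forward(self, images):
        # images: (batch, 3, H, W), normalized with ImageNet statistics.
        return self.backbone(images)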
To add a custom encoder:
- Create encoder class inheriting from the appropriate base class:
import torch.nn as nn

from encoders.base_encoder import BaseEncoder

class MyCustomEncoder(BaseEncoder):
    def __init__(self, embedding_dim: int):
        super().__init__(embedding_dim)
        # Your encoder architecture, e.g. a single projection layer
        # (LazyLinear infers the input size on the first forward pass)
        self.projection = nn.LazyLinear(embedding_dim)

    def forward(self, input_data):
        # Process input and return {"features": tensor}
        return {"features": self.projection(input_data)}
- Add factory function in the appropriate subdirectory:
# encoders/mytype/factory.py
from typing import Any, Dict
from .my_custom_encoder import MyCustomEncoder  # wherever you defined the class

def create_mytype_encoder(config: Dict[str, Any]):
    return MyCustomEncoder(
        embedding_dim=config.get('embedding_dim', 256)
    )
- Integrate into pipeline - the system will automatically detect and use it!
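A hypothetical end-to-end check of the two pieces above (the dummy batch and shapes are placeholders, and calling the encoder directly assumes BaseEncoder subclasses nn.Module):

import torch

config = {"embedding_dim": 128}
encoder = create_mytype_encoder(config)
dummy_batch = torch.randn(4, 32)       # 4 samples with 32 raw features each
output = encoder(dummy_batch)          # assumes BaseEncoder subclasses nn.Module
print(output["features"].shape)        # torch.Size([4, 128])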
- Feature Encoders: encoders/ - Modular structure for text, image, categorical, continuous, temporal
- Fusion Layers: interaction/feature_fusion.py - Custom feature combination strategies
- Interaction Models: interaction/interaction_modeling.py - User-item interaction architectures
- Classification Heads: classifier/ - Custom loss functions and output layers
- Standard Interfaces: All components follow consistent input/output contracts
- Auto-Discovery: New encoders are automatically detected and integrated
- Config-Driven: Add new components without touching core training code
- Backward Compatible: Extensions don't break existing functionality
- Custom Loss Functions: Implement ranking losses, contrastive learning, etc. (see the sketch after this list)
- Multi-Task Learning: Add auxiliary prediction tasks
- Domain-Specific Features: Industry-specific encoders (NLP, computer vision, time series)
- Distributed Training: Scale across multiple GPUs and nodes
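For example, a pairwise ranking loss such as BPR could replace the default classification loss (an illustrative sketch; how it plugs into classifier/ depends on your setup):

import torch.nn.functional as F

def bpr_loss(user_emb, pos_item_emb, neg_item_emb):
    # Bayesian Personalized Ranking: push positive scores above negative scores.
    pos_scores = (user_emb * pos_item_emb).sum(dim=-1)
    neg_scores = (user_emb * neg_item_emb).sum(dim=-1)
    return -F.logsigmoid(pos_scores - neg_scores).mean()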
Designed to grow with your needs:
- Memory Efficient: Optimized data loading and batch processing
- GPU Accelerated: Full CUDA support for faster training
- Distributed Ready: Architecture supports multi-GPU and distributed training
- Production Deployment: Easy integration with serving frameworks
Ready to build your recommendation system? Start with SimpleFusion for quick results, then scale to attention-based fusion as your feature complexity grows!
