| title | Contributing Guide |
|---|---|
| description | How to contribute to SuperML Java - guidelines, setup, and best practices |
| layout | default |
| toc | true |
| search | true |
Thank you for your interest in contributing to SuperML Java! This guide will help you get started with contributing to the project.
- Fork the repository
- Create a new branch
- Open a pull request (PR)
- Describe the issue being fixed or feature added
- Wait for review and approval
We encourage clean, modular, well-documented code.
- New Algorithms: Implement additional ML algorithms
- Performance Improvements: Optimize existing algorithms
- Bug Fixes: Fix issues and improve stability
- Features: Add new functionality and capabilities
- API Documentation: Improve method and class documentation
- Tutorials: Create guides and examples
- Wiki Pages: Add comprehensive documentation
- Code Comments: Improve code readability
- Unit Tests: Add tests for new features
- Integration Tests: Test component interactions
- Performance Tests: Benchmark and profiling
- Edge Case Testing: Test boundary conditions
- Code Review: Review pull requests
- Issue Reporting: Report bugs and improvements
- Feature Requests: Suggest new functionality
- Performance Analysis: Identify bottlenecks
# Clone the repository
git clone https://github.com/supermlorg/superml-java.git
cd superml-java
# Build the project
mvn clean compile
# Run tests (excluding WIP modules)
mvn test -pl '!superml-kaggle,!superml-onnx,!superml-neural'
# Generate documentation
mvn javadoc:javadocsuperml-java/
├── superml-core/ # Base interfaces (BaseEstimator, Estimator, etc.)
├── superml-linear-models/ # LogisticRegression, Ridge, Lasso, etc.
├── superml-tree-models/ # DecisionTree, RandomForest, XGBoost, etc.
├── superml-clustering/ # KMeans
├── superml-preprocessing/ # StandardScaler, MinMaxScaler, etc.
├── superml-datasets/ # Built-in datasets and data loading
├── superml-metrics/ # Evaluation metrics
├── superml-pipeline/ # Pipeline chaining
├── superml-model-selection/ # CrossValidation, GridSearch, HyperparameterTuning
├── superml-autotrainer/ # AutoML framework
├── superml-persistence/ # Model save/load
├── superml-inference/ # Production inference engine
├── superml-visualization/ # XChart GUI + ASCII charts
├── superml-transformers/ # Transformer models
├── superml-pmml/ # PMML 4.4 export
├── superml-drift/ # Drift detection
├── superml-utils/ # Shared utilities
├── superml-bundle-all/ # All-in-one bundle
├── superml-examples/ # Example programs
├── superml-testcases/ # Integration test suite (172+ tests)
├── docs/ # Documentation
└── pom.xml # Parent Maven POM
- Use camelCase for variables and methods
- Use PascalCase for classes and interfaces
- Use ALL_CAPS for constants
- Maximum line length: 120 characters
- Indentation: 4 spaces (no tabs)
// Good: Descriptive and follows conventions
public double calculateMeanSquaredError(double[] actual, double[] predicted)
// Bad: Unclear abbreviations
public double calcMSE(double[] a, double[] p)/**
* Trains a logistic regression model using gradient descent.
*
* @param X Feature matrix with shape (n_samples, n_features)
* @param y Target values with shape (n_samples,)
* @return This estimator for method chaining
* @throws IllegalArgumentException if X and y have incompatible shapes
*/
public LogisticRegression fit(double[][] X, double[] y) {
// Implementation
}Create tests in the corresponding test package:
src/test/java/com/superml/
├── core/ # Core interface tests
├── linear_model/ # Algorithm tests
│ ├── LogisticRegressionTest.java
│ └── LinearRegressionTest.java
└── utils/ # Test utilities
└── TestDatasets.java
Test individual methods and components:
@Test
void testFitWithValidData() {
// Arrange
double[][] X = { {1, 2}, {3, 4}, {5, 6} };
double[] y = {0, 1, 0};
var model = new LogisticRegression();
// Act
model.fit(X, y);
// Assert
assertTrue(model.isFitted());
assertNotNull(model.getCoefficients());
}
@Test
void testPredictThrowsWhenNotFitted() {
// Arrange
double[][] X = { {1, 2}, {3, 4} };
var model = new LogisticRegression();
// Act & Assert
assertThrows(ModelNotFittedException.class, () -> model.predict(X));
}Test component interactions:
@Test
void testPipelineWithScalerAndClassifier() {
// Arrange
var dataset = TestDatasets.makeClassification(100, 5, 2);
var pipeline = new Pipeline()
.addStep("scaler", new StandardScaler())
.addStep("classifier", new LogisticRegression());
// Act
pipeline.fit(dataset.X, dataset.y);
double[] predictions = pipeline.predict(dataset.X);
// Assert
assertEquals(dataset.X.length, predictions.length);
double accuracy = Metrics.accuracy(dataset.y, predictions);
assertTrue(accuracy > 0.8, "Pipeline should achieve reasonable accuracy");
}Test algorithm efficiency:
@Test
void testTrainingPerformance() {
// Large dataset
var dataset = TestDatasets.makeClassification(10000, 20, 2);
var model = new LogisticRegression();
long startTime = System.currentTimeMillis();
model.fit(dataset.X, dataset.y);
long trainingTime = System.currentTimeMillis() - startTime;
// Should complete within reasonable time
assertTrue(trainingTime < 5000, "Training should complete within 5 seconds");
}Create reusable test utilities:
public class TestDatasets {
public static Dataset makeClassification(int samples, int features, int classes) {
return makeClassification(samples, features, classes, 42);
}
public static Dataset makeClassification(int samples, int features, int classes, int seed) {
Random random = new Random(seed);
double[][] X = new double[samples][features];
double[] y = new double[samples];
// Generate synthetic data
for (int i = 0; i < samples; i++) {
for (int j = 0; j < features; j++) {
X[i][j] = random.nextGaussian();
}
y[i] = random.nextInt(classes);
}
return new Dataset(X, y);
}
}package org.superml.linear_model;
import org.superml.core.BaseEstimator;
import org.superml.core.Classifier;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
/**
* My New Algorithm implementation.
*
* This algorithm does X by using technique Y.
*
* Parameters:
* - parameter1: Description of parameter1 (default: defaultValue)
* - parameter2: Description of parameter2 (default: defaultValue)
*
* Example:
* <pre>
* var model = new MyNewAlgorithm()
* .setParameter1(value1)
* .setParameter2(value2);
* model.fit(X, y);
* double[] predictions = model.predict(X_test);
* </pre>
*/
public class MyNewAlgorithm extends BaseEstimator implements Classifier {
private static final Logger logger = LoggerFactory.getLogger(MyNewAlgorithm.class);
// Model parameters (learned during training)
private double[] weights;
private double bias;
private double[] classes;
// Hyperparameters
private double parameter1 = 1.0;
private int parameter2 = 100;
private double tolerance = 1e-6;
private int maxIterations = 1000;
private int randomState = -1;
// Construction and configuration
public MyNewAlgorithm() {
// Initialize parameters map for base class
parameters.put("parameter1", parameter1);
parameters.put("parameter2", parameter2);
parameters.put("tolerance", tolerance);
parameters.put("maxIterations", maxIterations);
parameters.put("randomState", randomState);
}
// Fluent interface methods
public MyNewAlgorithm setParameter1(double parameter1) {
this.parameter1 = parameter1;
parameters.put("parameter1", parameter1);
return this;
}
public MyNewAlgorithm setParameter2(int parameter2) {
this.parameter2 = parameter2;
parameters.put("parameter2", parameter2);
return this;
}
// Additional fluent methods...
@Override
protected void updateInternalParameters() {
Object p1 = parameters.get("parameter1");
if (p1 != null) this.parameter1 = ((Number) p1).doubleValue();
Object p2 = parameters.get("parameter2");
if (p2 != null) this.parameter2 = ((Number) p2).intValue();
// Update other parameters...
validateParameters();
}
private void validateParameters() {
if (parameter1 <= 0) {
throw new IllegalArgumentException("parameter1 must be positive");
}
if (parameter2 <= 0) {
throw new IllegalArgumentException("parameter2 must be positive");
}
}
@Override
public MyNewAlgorithm fit(double[][] X, double[] y) {
validateInput(X, y);
validateParameters();
logger.info("Training {} with {} samples and {} features",
getClass().getSimpleName(), X.length, X[0].length);
// Initialize model state
initializeModel(X, y);
// Main training algorithm
trainModel(X, y);
this.fitted = true;
logger.info("Training completed in {} iterations", /* actual iterations */);
return this;
}
@Override
public double[] predict(double[][] X) {
checkFitted();
validateInput(X);
double[] predictions = new double[X.length];
for (int i = 0; i < X.length; i++) {
predictions[i] = predictSample(X[i]);
}
return predictions;
}
@Override
public double[][] predictProba(double[][] X) {
checkFitted();
validateInput(X);
double[][] probabilities = new double[X.length][classes.length];
for (int i = 0; i < X.length; i++) {
probabilities[i] = predictProbaSample(X[i]);
}
return probabilities;
}
@Override
public double[] getClasses() {
checkFitted();
return Arrays.copyOf(classes, classes.length);
}
// Algorithm-specific methods
private void initializeModel(double[][] X, double[] y) {
int features = X[0].length;
this.weights = new double[features];
this.bias = 0.0;
this.classes = findUniqueClasses(y);
// Initialize weights (e.g., random or zeros)
if (randomState != -1) {
Random random = new Random(randomState);
for (int i = 0; i < weights.length; i++) {
weights[i] = random.nextGaussian() * 0.01;
}
}
}
private void trainModel(double[][] X, double[] y) {
// Main training loop
for (int iteration = 0; iteration < maxIterations; iteration++) {
double previousLoss = computeLoss(X, y);
// Update parameters (gradient descent, etc.)
updateParameters(X, y);
// Check convergence
double currentLoss = computeLoss(X, y);
if (Math.abs(previousLoss - currentLoss) < tolerance) {
logger.debug("Converged after {} iterations", iteration + 1);
break;
}
}
}
private double predictSample(double[] sample) {
double logit = bias;
for (int i = 0; i < weights.length; i++) {
logit += weights[i] * sample[i];
}
return logit > 0 ? 1.0 : 0.0; // Simple threshold
}
private double[] predictProbaSample(double[] sample) {
double logit = bias;
for (int i = 0; i < weights.length; i++) {
logit += weights[i] * sample[i];
}
double prob1 = 1.0 / (1.0 + Math.exp(-logit));
return new double[]{1.0 - prob1, prob1};
}
// Utility methods
private double[] findUniqueClasses(double[] y) {
return Arrays.stream(y).distinct().sorted().toArray();
}
private double computeLoss(double[][] X, double[] y) {
// Implement loss function
return 0.0;
}
private void updateParameters(double[][] X, double[] y) {
// Implement parameter update (gradient computation, etc.)
}
// Getters for inspecting trained model
public double[] getWeights() {
checkFitted();
return Arrays.copyOf(weights, weights.length);
}
public double getBias() {
checkFitted();
return bias;
}
@Override
public String toString() {
return String.format("%s(parameter1=%.3f, parameter2=%d, maxIterations=%d)",
getClass().getSimpleName(), parameter1, parameter2, maxIterations);
}
}package org.superml.linear_model;
import org.superml.datasets.TestDatasets;
import org.superml.metrics.Metrics;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.BeforeEach;
import static org.junit.jupiter.api.Assertions.*;
class MyNewAlgorithmTest {
private MyNewAlgorithm algorithm;
private double[][] X;
private double[] y;
@BeforeEach
void setUp() {
algorithm = new MyNewAlgorithm();
var dataset = TestDatasets.makeClassification(100, 5, 2, 42);
X = dataset.X;
y = dataset.y;
}
@Test
void testDefaultConstruction() {
assertNotNull(algorithm);
assertFalse(algorithm.isFitted());
}
@Test
void testParameterManagement() {
algorithm.setParameter1(2.0).setParameter2(200);
Map<String, Object> params = algorithm.getParams();
assertEquals(2.0, (Double) params.get("parameter1"), 1e-6);
assertEquals(200, (Integer) params.get("parameter2"));
}
@Test
void testFitAndPredict() {
algorithm.fit(X, y);
assertTrue(algorithm.isFitted());
double[] predictions = algorithm.predict(X);
assertEquals(X.length, predictions.length);
// Check predictions are valid class labels
double[] classes = algorithm.getClasses();
for (double pred : predictions) {
assertTrue(Arrays.stream(classes).anyMatch(c -> c == pred));
}
}
@Test
void testPredictProba() {
algorithm.fit(X, y);
double[][] probabilities = algorithm.predictProba(X);
assertEquals(X.length, probabilities.length);
assertEquals(2, probabilities[0].length); // Binary classification
// Check probabilities sum to 1
for (double[] probs : probabilities) {
double sum = Arrays.stream(probs).sum();
assertEquals(1.0, sum, 1e-6);
// Check all probabilities are valid
for (double prob : probs) {
assertTrue(prob >= 0.0 && prob <= 1.0);
}
}
}
@Test
void testPerformanceOnSyntheticData() {
algorithm.fit(X, y);
double[] predictions = algorithm.predict(X);
double accuracy = Metrics.accuracy(y, predictions);
// Should achieve reasonable accuracy on synthetic data
assertTrue(accuracy > 0.7, "Algorithm should achieve > 70% accuracy");
}
@Test
void testParameterValidation() {
assertThrows(IllegalArgumentException.class,
() -> algorithm.setParameter1(-1.0));
assertThrows(IllegalArgumentException.class,
() -> algorithm.setParameter2(0));
}
@Test
void testInputValidation() {
// Test null inputs
assertThrows(IllegalArgumentException.class,
() -> algorithm.fit(null, y));
assertThrows(IllegalArgumentException.class,
() -> algorithm.fit(X, null));
// Test mismatched dimensions
double[] wrongY = new double[X.length + 1];
assertThrows(IllegalArgumentException.class,
() -> algorithm.fit(X, wrongY));
}
@Test
void testUnfittedModelThrows() {
assertThrows(ModelNotFittedException.class,
() -> algorithm.predict(X));
assertThrows(ModelNotFittedException.class,
() -> algorithm.predictProba(X));
assertThrows(ModelNotFittedException.class,
() -> algorithm.getClasses());
}
@Test
void testConvergence() {
algorithm.setMaxIterations(10).setTolerance(1e-3);
algorithm.fit(X, y);
// Should still work with limited iterations
double[] predictions = algorithm.predict(X);
assertNotNull(predictions);
}
}- Code compiles without warnings
- All tests pass (
mvn test) - New features have tests with good coverage
- Documentation is updated for new features
- Code follows style guidelines
- Commit messages are clear and descriptive
## Description
Brief description of the changes and their purpose.
## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Documentation update
## Testing
- [ ] Unit tests added/updated
- [ ] Integration tests added/updated
- [ ] All existing tests pass
- [ ] Manual testing completed
## Performance Impact
- [ ] No performance impact
- [ ] Performance improved
- [ ] Performance impact acceptable (explain below)
## Documentation
- [ ] JavaDoc updated
- [ ] README updated
- [ ] Wiki/docs updated
- [ ] Examples added/updated
## Checklist
- [ ] Code follows project style guidelines
- [ ] Self-review completed
- [ ] Code compiles without warnings
- [ ] Meaningful commit messages- Automated checks must pass (build, tests, style)
- Code review by at least one maintainer
- Documentation review for user-facing changes
- Performance review for algorithm changes
- Final approval and merge
- Start with tests: Write failing tests first (TDD)
- Use synthetic data: Create reproducible test cases
- Benchmark performance: Compare against reference implementations
- Document complexity: Include time/space complexity in JavaDoc
- Handle edge cases: Empty data, single samples, etc.
- Single responsibility: Each class should have one purpose
- Immutable when possible: Prefer immutable data structures
- Fail fast: Validate inputs early and clearly
- Defensive copying: Protect internal state from modification
- Resource management: Use try-with-resources for I/O
- Test behavior, not implementation: Focus on public API
- Use meaningful test names: Test names should describe the scenario
- Arrange-Act-Assert: Structure tests clearly
- Test edge cases: Null inputs, empty data, boundary conditions
- Performance tests: Ensure algorithms scale appropriately
Contributors will be recognized in:
- README: List of contributors
- Release notes: Acknowledgment of contributions
- Documentation: Author attribution for major features
- GitHub: Contributor graphs and statistics
- Be respectful: Treat all contributors with respect
- Be constructive: Provide helpful feedback and suggestions
- Be patient: Reviews take time, especially for complex changes
- Ask questions: Don't hesitate to ask for clarification
- Help others: Review pull requests and answer questions
Thank you for contributing to SuperML Java! Your contributions help make machine learning more accessible to the Java community. 🚀