Thank you for your interest in contributing to the NTSB Aviation Accident Database Analysis project! This document provides guidelines for contributing to this repository.
- Code of Conduct
- How Can I Contribute?
- Getting Started
- Development Workflow
- Coding Standards
- Testing Guidelines
- Documentation
- Submitting Changes
- Community
This project adheres to a Code of Conduct that all contributors are expected to follow. Please be respectful, inclusive, and considerate in all interactions.
- Be welcoming and inclusive
- Be respectful of differing viewpoints and experiences
- Gracefully accept constructive criticism
- Focus on what is best for the community
- Show empathy towards other community members
Before creating bug reports, please check the existing issues to avoid duplicates. When creating a bug report, include:
- Clear descriptive title for the issue
- Detailed description of the problem
- Steps to reproduce the behavior
- Expected behavior vs. actual behavior
- Environment details: OS, Python version, Fish shell version
- Error messages or logs (if applicable)
- Screenshots (if relevant)
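The environment details requested above can be collected with a short script. The following is a minimal sketch rather than a tool shipped with this repository: the helper names `tool_version` and `environment_report` are hypothetical, and it assumes the Fish version is reported by `fish --version`.

```python
import platform
import shutil
import subprocess


def tool_version(command: str, flag: str = "--version") -> str:
    """Return the first line printed by `command flag`, or a short note."""
    if shutil.which(command) is None:
        return "not installed"
    try:
        result = subprocess.run(
            [command, flag], capture_output=True, text=True, timeout=10, check=False
        )
    except (OSError, subprocess.SubprocessError):
        return "could not be determined"
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else "could not be determined"


def environment_report() -> str:
    """Build a plain-text summary suitable for pasting into a bug report."""
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {platform.python_version()}",
        f"Fish: {tool_version('fish')}",  # assumes the executable is named `fish`
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Paste the printed lines into the issue alongside the steps to reproduce.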
Enhancement suggestions are tracked as GitHub issues. When suggesting an enhancement:
- Use a clear and descriptive title
- Provide a detailed description of the proposed functionality
- Explain why this enhancement would be useful
- Include examples of how it would work
- Consider implementation details (if applicable)
We welcome contributions in several areas:
- Analysis Scripts
  - Add new Python analysis examples
  - Create Jupyter notebooks demonstrating specific techniques
  - Develop visualization dashboards (Streamlit, Dash, Plotly)
- Data Processing Tools
  - Improve extraction scripts (Fish shell)
  - Add data cleaning/preprocessing utilities
  - Optimize query performance
  - Add support for new data formats
- Documentation
  - Improve existing documentation
  - Add tutorials and how-to guides
  - Create example use cases
  - Translate documentation (if multilingual support desired)
- Testing
  - Add unit tests for Python scripts
  - Create integration tests for Fish shell scripts
  - Improve test coverage
- Infrastructure
  - CI/CD improvements
  - Docker containerization
  - Package management enhancements
Before contributing, ensure you have:
- CachyOS/Arch Linux (or compatible distribution)
- Fish shell installed
- Python 3.11+ with venv
- Git for version control
- A GitHub account
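The local prerequisites listed above can be checked before running the setup script. This is a hedged sketch, not part of the project: `missing_prerequisites` is a hypothetical helper, and it assumes the tools are on `PATH` under the executable names `fish` and `git`. It does not check the Linux distribution or the GitHub account.

```python
import shutil
import sys

REQUIRED_TOOLS = ("fish", "git")  # assumed executable names
MIN_PYTHON = (3, 11)


def missing_prerequisites(tools=REQUIRED_TOOLS, min_python=MIN_PYTHON, version=None) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # `version` can be passed explicitly for testing; defaults to the running interpreter.
    current = tuple(version or sys.version_info[:2])
    if current < tuple(min_python):
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {current[0]}.{current[1]}"
        )
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```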
- Fork the repository on GitHub
- Clone your fork:
  git clone https://github.com/YOUR_USERNAME/NTSB-Dataset_Analysis.git
  cd NTSB-Dataset_Analysis
- Add upstream remote:
  git remote add upstream https://github.com/ORIGINAL_OWNER/NTSB-Dataset_Analysis.git
# Run setup script
./setup.fish
# Activate Python virtual environment
source .venv/bin/activate.fish
# Extract sample data for testing
./scripts/extract_all_tables.fish datasets/avall.mdb

- main - Stable production-ready code
- develop - Integration branch for features (if using Git Flow)
- feature/* - New features or enhancements
- bugfix/* - Bug fixes
- docs/* - Documentation updates
- test/* - Test additions or improvements
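The branch naming scheme above can be expressed as a single pattern. The sketch below is illustrative only: `is_valid_branch_name` is a hypothetical helper, and the restriction to lowercase letters, digits, dots, underscores, and hyphens after the prefix is an assumption, not a rule stated by this project.

```python
import re

# Prefixes come from the branch list; the allowed characters after "/" are an assumption.
BRANCH_PATTERN = re.compile(
    r"(?:main|develop|(?:feature|bugfix|docs|test)/[a-z0-9][a-z0-9._-]*)"
)


def is_valid_branch_name(name: str) -> bool:
    """Return True if `name` is main, develop, or a prefixed topic branch."""
    return BRANCH_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_branch_name("feature/parallel-extraction")` is accepted, while `hotfix/typo` is rejected because `hotfix/` is not one of the listed prefixes.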
# Update your fork
git checkout main
git pull upstream main
# Create feature branch
git checkout -b feature/your-feature-name

- Make your changes in the feature branch
- Test your changes thoroughly
- Follow the coding standards (see below)
- Update documentation as needed
- Add or update tests if applicable
Use clear, descriptive commit messages that follow the Conventional Commits format:
type(scope): brief description
Detailed explanation (if needed)
- Additional details
- Breaking changes (if any)
Types:
- feat: New feature
- fix: Bug fix
- docs: Documentation changes
- style: Code style changes (formatting, no logic change)
- refactor: Code refactoring
- test: Adding or updating tests
- chore: Maintenance tasks, dependencies, etc.
Examples:
feat(scripts): add parallel extraction for large databases
Implement parallel table extraction to improve performance when
processing multiple large MDB files simultaneously.
- Uses Fish shell background jobs
- Reduces extraction time by ~60% for avall.mdb
- Maintains compatibility with existing scripts

fix(analysis): correct fatal accident count calculation
The previous calculation included non-fatal injuries.
Now properly filters for inj_tot_f > 0 only.
Fixes #123

docs(readme): update installation instructions for macOS
Add specific instructions for Homebrew installation on macOS,
including mdbtools formula and Python setup.
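The subject-line format described above (`type(scope): brief description`) can be checked mechanically. This is a minimal sketch under stated assumptions: `parse_subject` is a hypothetical helper, the accepted types are exactly the seven listed in this guide, and the scope is treated as optional.

```python
import re
from typing import Optional

COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

SUBJECT_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"(\((?P<scope>[^()\s]+)\))?"  # optional scope in parentheses
    r": (?P<description>\S.*)$"     # colon, one space, non-empty description
)


def parse_subject(subject: str) -> Optional[dict]:
    """Return the type, scope, and description of a subject line, or None if malformed."""
    match = SUBJECT_PATTERN.match(subject)
    if match is None:
        return None
    return match.groupdict()
```

A check like this could run from a local Git hook or a CI step; wiring it up is left to the contributor.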
- Follow PEP 8 style guide
- Use type hints for function parameters and return values
- Write docstrings for all functions and classes (Google or NumPy style)
- Keep functions focused and small (single responsibility)
- Use meaningful variable names
- Add comments for complex logic
Example:
import pandas as pd

def calculate_accident_rate(events: pd.DataFrame, year: int) -> float:
    """
    Calculate the accident rate for a specific year.

    Args:
        events: DataFrame containing event records with 'ev_year' column
        year: The year to calculate the rate for

    Returns:
        Accident rate as a float (accidents per 1000 flight hours)

    Raises:
        ValueError: If year is not present in the dataset
    """
    year_events = events[events['ev_year'] == year]
    if len(year_events) == 0:
        raise ValueError(f"No events found for year {year}")
    # Simplified placeholder: divides the event count by 1000.
    # A real rate would need flight-hour data for the year.
    return len(year_events) / 1000.0

- Use clear variable names
- Add comments explaining complex logic
- Include usage examples in script headers
- Handle errors gracefully with informative messages
- Test scripts on fresh environments
Example:
#!/usr/bin/env fish
# Extract a single table from an MDB database
# Usage: ./extract_table.fish <database.mdb> <table_name>
# Example: ./extract_table.fish datasets/avall.mdb events

# Check arguments
if test (count $argv) -ne 2
    echo "Error: Requires exactly 2 arguments"
    # Fish lists are 1-indexed, so the script path comes from `status filename`, not $argv[0]
    echo "Usage: "(status filename)" <database.mdb> <table_name>"
    exit 1
end

set db_file $argv[1]
set table_name $argv[2]

# Validate database file exists
if not test -f $db_file
    echo "Error: Database file not found: $db_file"
    exit 1
end

# Extract table (create the output directory first so the redirect cannot fail)
mkdir -p data
mdb-export $db_file $table_name > "data/$table_name.csv"

- Use Markdown for all documentation
- Include code examples where appropriate
- Keep documentation up-to-date with code changes
- Use clear headings and table of contents for long documents
- Add links to related documentation
Use pytest for Python testing:
# Install pytest
pip install pytest pytest-cov
# Run tests
pytest tests/
# Run with coverage
pytest --cov=. tests/

Test Fish scripts manually:
# Test extraction script
./scripts/extract_table.fish datasets/avall.mdb events
# Verify output
ls -lh data/events.csv
head -n 5 data/events.csv

- Use small sample datasets for testing when possible
- Do not commit large test data to the repository
- Document test data requirements clearly
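The advice on small sample datasets can be illustrated with a pytest test that builds its own data. This is a sketch under assumptions: the helpers `write_sample_events` and `count_fatal_events` are hypothetical, the column layout (`ev_id`, `ev_year`, `inj_tot_f`) is a simplified stand-in for the real events table, and only the standard library is used so the example runs without the full dataset.

```python
import csv
from pathlib import Path

# Four hand-written rows are enough to exercise the fatal-injury filter.
SAMPLE_ROWS = [
    {"ev_id": "A001", "ev_year": "2020", "inj_tot_f": "0"},
    {"ev_id": "A002", "ev_year": "2020", "inj_tot_f": "2"},
    {"ev_id": "A003", "ev_year": "2021", "inj_tot_f": ""},  # missing value
    {"ev_id": "A004", "ev_year": "2021", "inj_tot_f": "1"},
]


def write_sample_events(path: Path) -> Path:
    """Write the sample rows to `path` as CSV and return the path."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(SAMPLE_ROWS[0]))
        writer.writeheader()
        writer.writerows(SAMPLE_ROWS)
    return path


def count_fatal_events(path: Path) -> int:
    """Count rows with inj_tot_f > 0, treating empty values as zero."""
    with path.open(newline="", encoding="utf-8") as handle:
        return sum(
            1 for row in csv.DictReader(handle) if int(row["inj_tot_f"] or 0) > 0
        )


def test_count_fatal_events(tmp_path: Path) -> None:
    # `tmp_path` is pytest's built-in temporary directory fixture.
    sample = write_sample_events(tmp_path / "events_sample.csv")
    assert count_fatal_events(sample) == 2
```

Because the sample file is created inside the temporary directory, nothing large needs to be committed to the repository.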
- Code Comments
  - Explain why, not what (code shows what)
  - Document non-obvious logic
  - Add TODO or FIXME comments for future work
- Docstrings
  - All public functions, classes, and modules
  - Parameters, return values, exceptions
  - Usage examples
- README Updates
  - New features or capabilities
  - Changed installation requirements
  - Updated usage examples
- CHANGELOG
  - Add entry to CHANGELOG.md under [Unreleased]
  - Follow Keep a Changelog format
  - Include breaking changes prominently
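Adding a CHANGELOG entry under `[Unreleased]` can be sketched as a small text transformation. The code below is illustrative and not part of this repository: `add_unreleased_entry` is a hypothetical helper, and it assumes the file uses the Keep a Changelog layout of `## [Unreleased]` followed by `### Added`, `### Fixed`, and similar subsections.

```python
CATEGORIES = ("Added", "Changed", "Deprecated", "Removed", "Fixed", "Security")


def add_unreleased_entry(changelog: str, category: str, entry: str) -> str:
    """Return `changelog` with `entry` added under `category` in [Unreleased]."""
    if category not in CATEGORIES:
        raise ValueError(f"Unknown category: {category}")
    lines = changelog.splitlines()
    start = next(
        (i for i, line in enumerate(lines)
         if line.strip().lower().startswith("## [unreleased]")),
        None,
    )
    if start is None:
        raise ValueError("No '## [Unreleased]' section found")
    # The section ends at the next version heading, or at the end of the file.
    end = next(
        (i for i in range(start + 1, len(lines)) if lines[i].startswith("## ")),
        len(lines),
    )
    heading = f"### {category}"
    bullet = f"- {entry}"
    for i in range(start + 1, end):
        if lines[i].strip() == heading:
            # Category exists: put the new bullet first, after an optional blank line.
            position = i + 1
            if position < end and not lines[position].strip():
                position += 1
            lines.insert(position, bullet)
            break
    else:
        # Category missing: create it directly below the [Unreleased] heading.
        block = ["", heading, "", bullet]
        if start + 1 < len(lines) and lines[start + 1].strip():
            block.append("")
        lines[start + 1:start + 1] = block
    return "\n".join(lines) + "\n"
```

Editing the file by hand works just as well; the point is that new entries belong in the `[Unreleased]` section, not under an already released version.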
- Update Documentation
  - Update README.md if needed
  - Add entry to CHANGELOG.md
  - Update relevant documentation files
- Test Your Changes
  - Run all tests
  - Test manually in a clean environment
  - Verify no regressions
- Create Pull Request
  - Push to your fork
  - Open PR against main branch (or develop if using Git Flow)
  - Use descriptive PR title and description
  - Link related issues with "Fixes #123" or "Closes #456"
- PR Description Template

      ## Description
      Brief description of changes

      ## Type of Change
      - [ ] Bug fix (non-breaking change fixing an issue)
      - [ ] New feature (non-breaking change adding functionality)
      - [ ] Breaking change (fix or feature causing existing functionality to change)
      - [ ] Documentation update

      ## Testing
      - [ ] Tested locally
      - [ ] Added/updated tests
      - [ ] All tests pass

      ## Checklist
      - [ ] Code follows project style guidelines
      - [ ] Self-review completed
      - [ ] Documentation updated
      - [ ] CHANGELOG.md updated
      - [ ] No new warnings generated

      ## Related Issues
      Fixes #123

- Code Review
  - Respond to review comments
  - Make requested changes
  - Re-request review after updates
- Merge
  - Once approved, a maintainer will merge your PR
  - Delete your feature branch after merge
Maintainers will review your PR for:
- Code quality and adherence to standards
- Test coverage and passing tests
- Documentation completeness
- Compatibility with existing features
- Performance considerations
- GitHub Issues: For bug reports and feature requests
- GitHub Discussions: For questions and general discussion
- Pull Requests: For code contributions
Contributors will be:
- Listed in repository contributors
- Mentioned in CHANGELOG.md
- Credited in release notes
By contributing to this project, you agree that your contributions will be licensed under the MIT License.
If you have questions about contributing:
- Check existing documentation
- Search closed issues and PRs
- Open a new issue with the "question" label
- Reach out to maintainers
Thank you for contributing to NTSB Dataset Analysis!