Welcome to the soda-data contributor's guide.
This guide is inspired by the 🤗 contribution guide.
This document focuses on getting any potential contributor familiarized with the development processes, but [other kinds of contributions] are also appreciated.
If you are new to using [git] or have never collaborated on a project before, please have a look at [contribution-guide.org]. Other resources are also listed in the excellent [guide created by FreeCodeCamp] [^contrib1].
Please note that all users and contributors are expected to be open, considerate, reasonable, and respectful. When in doubt, [Python Software Foundation's Code of Conduct] is a good reference in terms of behavior guidelines.
The main ways to contribute to soda-data are listed below:
- Fix outstanding issues with the existing code.
- Submit issues related to bugs or desired new features.
- Implement new AI ready-to-go datasets.
If you experience bugs or general issues with soda-data, please have a look
at the issue tracker.
If you don't see anything useful there, please feel free to file an issue report.
:::{tip}
Please don't forget to include the closed issues in your search.
Sometimes a solution was already reported, and the problem is considered solved.
:::
New issue reports should include information about your programming environment (e.g., operating system, Python version) and steps to reproduce the problem. Please also try to reduce the reproduction steps to a minimal example that still illustrates the problem you are facing. By removing other factors, you help us identify the root cause of the issue.
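As a minimal sketch of how the environment details for an issue report could be collected, the snippet below prints the operating system and Python version using only the standard library. The function name `environment_report` is illustrative and not part of soda-data.

```python
import platform
import sys


def environment_report() -> str:
    """Return a short, copy-pasteable summary of the local environment."""
    lines = [
        # Operating system name and release, e.g. "Linux 5.15.0"
        f"Operating system: {platform.system()} {platform.release()}",
        # Version of the interpreter running this script
        f"Python version: {platform.python_version()}",
        # Path of the interpreter, useful to spot the wrong environment
        f"Python executable: {sys.executable}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

The output can be pasted into the issue report next to the reproduction steps.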
Open an issue and get in contact with us. We will support you.
If you have generated a new ready-to-go dataset using SourceData, please contact us through the issue tracker to add your implementation to the supported data.
For that, fork the repository, add your changes and open a pull request.
You can help improve soda-data docs by making them more readable and coherent, or
by adding missing information and correcting mistakes.
soda-data documentation uses [Sphinx] as its main documentation compiler.
This means that the docs are kept in the same repository as the project code, and
that any documentation update is done in the same way as a code contribution.
When working on documentation changes on your local machine, you can compile them using [tox]:
tox -e docs
and use Python's built-in web server for a preview in your web browser
(http://localhost:8000):
python3 -m http.server --directory 'docs/_build/html'
We welcome any code contributions.
- If you want to add an API connection, add a file `your_api.py` to `soda_data.apis`.
- If you want to add a new fine-tuning task, add a file `your_task.py` to `soda_data.dataproc`.
Before you work on any non-trivial code contribution, it's best to first create a report in the issue tracker to start a discussion on the subject. This often provides additional considerations and avoids unnecessary work.
Before you start coding, we recommend creating an isolated [virtual environment] to avoid any problems with your installed Python packages. This can easily be done via either [virtualenv]:
virtualenv <PATH TO VENV>
source <PATH TO VENV>/bin/activate
or [Miniconda]:
conda create -n soda-data python=3 six virtualenv pytest pytest-cov
conda activate soda-data
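Before installing anything, it can help to confirm that the isolated environment is really active. The following is a minimal sketch using only the standard library; the function name `active_environment` is illustrative, and the check assumes a `venv` or recent `virtualenv` environment (where `sys.prefix` differs from `sys.base_prefix`) or an activated conda environment (where `conda activate` sets the `CONDA_DEFAULT_ENV` variable).

```python
import os
import sys
from typing import Optional


def active_environment() -> Optional[str]:
    """Describe the active isolated environment, or return None if there is none."""
    # venv and recent virtualenv releases point sys.prefix at the environment,
    # while sys.base_prefix keeps pointing at the base Python installation.
    if sys.prefix != getattr(sys, "base_prefix", sys.prefix):
        return f"virtual environment at {sys.prefix}"
    # conda exports the name of the activated environment in this variable.
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        return f"conda environment '{conda_env}'"
    return None


if __name__ == "__main__":
    print(active_environment() or "No isolated environment detected")
```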
- Create a user account on GitHub if you do not already have one.
- Fork the project repository: click on the Fork button near the top of the page. This creates a copy of the code under your account on GitHub.
- Clone this copy to your local disk:

      git clone git@github.com:YourLogin/soda-data.git
      cd soda-data

- You should run:

      pip install -U pip setuptools -e .

  to be able to import the package under development in the Python REPL.
- Install [pre-commit]:

      pip install pre-commit
      pre-commit install

  soda-data comes with a lot of hooks configured to automatically help the developer to check the code being written.
- Create a branch to hold your changes:

      git checkout -b my-feature

  and start making changes. Never work on the main branch!
- Start your work on this branch. Don't forget to add [docstrings] to new functions, modules and classes, especially if they are part of public APIs.
- Add yourself to the list of contributors in `AUTHORS.md`.

- When you’re done editing, do:

      git add <MODIFIED FILES>
      git commit

  to record your changes in [git].
  Please make sure to read the validation messages from [pre-commit] and fix any issues they report. This should automatically use [flake8]/[black] to check/fix the code style in a way that is compatible with the project.
  :::{important}
  Don't forget to add unit tests and documentation in case your contribution adds an additional feature and is not just a bugfix.

  Moreover, writing a [descriptive commit message] is highly recommended. In case of doubt, you can check the commit history with:

      git log --graph --decorate --pretty=oneline --abbrev-commit --all

  to look for recurring communication patterns.
  :::
- Please check that your changes don't break any unit tests with:

      tox

  (after having installed [tox] with `pip install tox` or `pipx`).

  You can also use [tox] to run several other pre-configured tasks in the repository. Try `tox -av` to see a list of the available checks.
- If everything works fine, push your local branch to the remote server with:

      git push -u origin my-feature

- Go to the web page of your fork and click "Create pull request" to send your changes for review.
  Find more detailed information in [creating a PR]. You might also want to open the PR as a draft first and mark it as ready for review after the feedback from the continuous integration (CI) system or any required fixes.
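The steps above ask for docstrings on new code and for unit tests alongside new features. The sketch below shows one way this could look. The function `normalize_label` and its test are purely hypothetical and not part of soda-data, and the Google-style docstring layout is an assumption; follow whatever style the surrounding code uses.

```python
def normalize_label(label: str) -> str:
    """Normalize a raw label string (hypothetical helper for illustration).

    Args:
        label: Raw label, possibly with surrounding whitespace or mixed case.

    Returns:
        The label stripped of surrounding whitespace and upper-cased.

    Raises:
        ValueError: If the label is empty after stripping.
    """
    cleaned = label.strip()
    if not cleaned:
        raise ValueError("label must not be empty")
    return cleaned.upper()


def test_normalize_label():
    """pytest collects functions named test_* and runs their assertions."""
    assert normalize_label("  gene ") == "GENE"
```

In a real contribution the test function would normally live in the project's test suite rather than next to the implementation, so that running `tox` picks it up.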
The following tips can help when you run into problems building or testing the package:
- Make sure to fetch all the tags from the upstream repository. The command `git describe --abbrev=0 --tags` should return the version you are expecting. If you are trying to run CI scripts in a fork repository, make sure to push all the tags. You can also try to remove all the egg files or the complete egg folder, i.e., `.eggs`, as well as the `*.egg-info` folders in the `src` folder or potentially in the root of your project.

- Sometimes [tox] misses out when new dependencies are added, especially to `setup.cfg` and `docs/requirements.txt`. If you find any problems with missing dependencies when running a command with [tox], try to recreate the `tox` environment using the `-r` flag. For example, instead of:

      tox -e docs

  Try running:

      tox -r -e docs

- Make sure to have a reliable [tox] installation that uses the correct Python version (e.g., 3.7+). When in doubt you can run:

      tox --version
      # OR
      which tox

  If you have trouble and are seeing weird errors upon running [tox], you can also try to create a dedicated [virtual environment] with a [tox] binary freshly installed. For example:

      virtualenv .venv
      source .venv/bin/activate
      .venv/bin/pip install tox
      .venv/bin/tox -e all

- [Pytest can drop you] in an interactive session in the case an error occurs. In order to do that you need to pass a `--pdb` option (for example by running `tox -- -k <NAME OF THE FAILING TEST> --pdb`). You can also set up breakpoints manually instead of using the `--pdb` option.
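As a small illustration of a manual breakpoint, the sketch below uses Python's built-in `breakpoint()` function (available since Python 3.7), which opens the `pdb` debugger at the line where it is called. The function `running_mean` and its `debug` flag are hypothetical and only serve to keep the example runnable without stopping in the debugger by default.

```python
def running_mean(values, debug=False):
    """Return the cumulative mean after each element of ``values``."""
    total = 0.0
    means = []
    for count, value in enumerate(values, start=1):
        total += value
        if debug:
            # Pauses execution here and opens pdb, where `total`, `count`
            # and `means` can be inspected interactively.
            breakpoint()
        means.append(total / count)
    return means


if __name__ == "__main__":
    print(running_mean([2, 4, 6]))
```

Calling `running_mean([2, 4, 6], debug=True)` would stop inside the loop on every iteration; typing `c` at the prompt continues to the next stop. Remember to remove manual breakpoints before committing.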