Merged
Pull Request Overview
This PR integrates TCGA BRCA dataset updates with enhanced feature selection methods, adds a NetworkLoader utility, and updates documentation and versioning to reflect the changes.
- Added RandomForest-based feature selection support and updated module imports.
- Implemented phenotype preprocessing in SmCCNet and introduced a new NetworkLoader class for network file management.
- Updated README, notebook examples, and CHANGELOG for consistency.
Reviewed Changes
Copilot reviewed 33 out of 41 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `bioneuralnet/utils/__init__.py` | Updated preprocess import to include RandomForest selection |
| bioneuralnet/external_tools/smccnet.py | Added phenotype_df validation and logger initialization |
| bioneuralnet/datasets/tcga_brca/README.md | Expanded and clarified TCGA BRCA data preprocessing and feature selection details |
| bioneuralnet/datasets/network_loader.py | Introduced new utility class for loading bundled network files |
| bioneuralnet/datasets/dataset_loader.py | Modified data loading to support different feature selection methods and file naming conventions |
| `bioneuralnet/datasets/__init__.py` | Updated `__all__` to export NetworkLoader |
| `bioneuralnet/__init__.py` | Updated version and module exports |
| README.md | Minor update to the displayed version |
| Cancer_example.ipynb | Expanded example with full pipeline demo for TCGA BRCA |
| CHANGELOG.md | Revised changelog to reflect version update and release notes |
Files not reviewed (8)
- MANIFEST.in: Language not supported
- bioneuralnet/datasets/networks/brca_smccnet_ae/size_13_net_2.csv: Language not supported
- bioneuralnet/datasets/networks/brca_smccnet_rf/size_14_net_2.csv: Language not supported
- bioneuralnet/datasets/networks/brca_smccnet_rf/size_14_net_4.csv: Language not supported
- bioneuralnet/datasets/networks/brca_smccnet_rf/size_21_net_3.csv: Language not supported
- bioneuralnet/datasets/networks/brca_smccnet_var/size_14_net_2.csv: Language not supported
- bioneuralnet/datasets/networks/brca_smccnet_var/size_14_net_3.csv: Language not supported
- bioneuralnet/datasets/tcga_brca/brca_pam50.csv: Language not supported
Comments suppressed due to low confidence (1)
bioneuralnet/datasets/dataset_loader.py:52
- [nitpick] The file naming convention in dataset_loader (e.g., 'brca_mirna.csv') differs from the uppercase style referenced in the README. Consider standardizing the naming convention for clarity.
self.data["brca_mirna"] = pd.read_csv(folder / "brca_mirna.csv", index_col=0)
- **BUG**: A bug related to rdata files missing
- **New realease**: A new release will include documentation for the other updates. (1.0.3 or 1.0.2)
- **New realease**: A new release will include documentation for the other updates. (1.1.0)
Typo 'realease' should be corrected to 'release'.
Suggested change
- **New realease**: A new release will include documentation for the other updates. (1.1.0)
- **New release**: A new release will include documentation for the other updates. (1.1.0)
SundousHussein approved these changes on Apr 24, 2025
Added a dataset from TCGA BRCA:
Initial data dimensions:
Feature Selection Methods
Performed separately on Methylation and RNA datasets (top 1,000 features each):
Unsupervised:
Supervised:
Also set up a network loader to load the networks that SmCCNet generated from each of these feature selections.
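For reviewers who want to see what loading the bundled networks might involve, here is a minimal sketch using only the standard library. It is not the `NetworkLoader` API added in this PR: `list_networks` and `read_adjacency` are hypothetical helpers, and the CSV layout (node names in the first row and first column) is an assumption. Only the `size_<n>_net_<id>.csv` naming comes from the files listed in this PR.

```python
import csv
import re
from pathlib import Path

# Matches bundled file names such as size_14_net_2.csv
NETWORK_NAME = re.compile(r"size_(\d+)_net_(\d+)\.csv")


def list_networks(folder):
    """Hypothetical helper: map each network file name to (size, net_id)."""
    found = {}
    for path in sorted(Path(folder).glob("size_*_net_*.csv")):
        match = NETWORK_NAME.fullmatch(path.name)
        if match:
            found[path.name] = (int(match.group(1)), int(match.group(2)))
    return found


def read_adjacency(path):
    """Hypothetical helper: read a labeled adjacency CSV into nested dicts.

    Assumes the first row and the first column hold node names.
    """
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle))
    nodes = rows[0][1:]
    return {
        row[0]: {node: float(value) for node, value in zip(nodes, row[1:])}
        for row in rows[1:]
    }
```

Calling `list_networks` on a folder such as `bioneuralnet/datasets/networks/brca_smccnet_rf` would return one entry per network file, and `read_adjacency` would turn any of them into a `{source: {target: weight}}` mapping, provided the assumed layout holds.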
More details on the data preprocessing are in the README inside tcga_brca.
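As a rough illustration of the "top 1,000 features" step, the sketch below ranks columns by variance and keeps the highest ones. This assumes variance ranking is one of the unsupervised options, which the `brca_smccnet_var` directory name hints at but this description does not state. `top_k_by_variance` is a hypothetical name, not a function from this PR, and it assumes samples are rows and features are columns.

```python
import pandas as pd


def top_k_by_variance(df: pd.DataFrame, k: int = 1000) -> pd.DataFrame:
    """Hypothetical helper: keep the k numeric columns with the highest variance."""
    variances = df.var(axis=0, numeric_only=True)
    # nlargest returns the columns ordered from highest to lowest variance
    keep = variances.nlargest(min(k, len(variances))).index
    return df.loc[:, keep]
```

Applied separately to the methylation and RNA tables, this would give two reduced tables of at most 1,000 columns each. A supervised variant would rank columns by a model-derived score instead of variance.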