
Fix: Prevent redundant OUT column creation and add null check #154

Open
fazhang-master wants to merge 1 commit into BioPandas:main from fazhang-master:logic_repair


@fazhang-master

  • Moved OUT column creation out of the inner loop to avoid redundant assignments
  • Added a check on pdb_records[r] so that record types with no column definitions (None or an empty list) are skipped
  • Combined performance optimization with robustness improvement


Original

```python
for r in dfs:
    for col in pdb_records[r]:
        dfs[r][col["id"]] = dfs[r][col["id"]].apply(col["strf"])
        # Redundant: OUT is reset on every iteration of the inner loop.
        dfs[r]["OUT"] = pd.Series("", index=dfs[r].index)
```

New

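```python
for r in dfs:
    # Skip record types that have no column definitions (None or empty).
    if pdb_records[r]:
        # Create the OUT column once per record type, not once per column.
        dfs[r]["OUT"] = pd.Series("", index=dfs[r].index)
        for col in pdb_records[r]:
            dfs[r][col["id"]] = dfs[r][col["id"]].apply(col["strf"])
```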
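A minimal sketch of what the `if pdb_records[r]:` check changes, assuming (as the check implies) that a record type's entry could be `None` or an empty list. The record contents, the `x_coord` column and the `f"{v:8.3f}"` formatter below are hypothetical stand-ins, not biopandas' actual record definitions. Because the check tests truthiness, it skips `None` and `[]` alike; a record type missing from `pdb_records` would still raise `KeyError`.

```python
import pandas as pd

# Hypothetical record definitions: "ATOM" has one column formatter,
# "HETATM" has an empty list and "OTHERS" has None.
pdb_records = {
    "ATOM": [{"id": "x_coord", "strf": lambda v: f"{v:8.3f}"}],
    "HETATM": [],
    "OTHERS": None,
}
dfs = {name: pd.DataFrame({"x_coord": [1.0, 2.5]}) for name in pdb_records}

for r in dfs:
    # Guarded loop as proposed: None and [] are both falsy, so they are skipped.
    if pdb_records[r]:
        dfs[r]["OUT"] = pd.Series("", index=dfs[r].index)
        for col in pdb_records[r]:
            dfs[r][col["id"]] = dfs[r][col["id"]].apply(col["strf"])

# Without the guard, iterating over a None entry raises TypeError.
try:
    for _ in pdb_records["OTHERS"]:
        pass
    unguarded_error = None
except TypeError as exc:
    unguarded_error = exc
```

For an empty list the guard does not change behavior (the original inner loop simply never ran, so OUT was never created); it only matters for entries that cannot be iterated, such as `None`.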

Description

biopandas/biopandas/pdb/pandas_pdb.py
In lines 717-720 and 948-951, the statement `dfs[r]["OUT"] = pd.Series("", index=dfs[r].index)` does not depend on the inner loop variable `col`, yet it is executed on every iteration of the inner `for` loop, which is redundant. It is recommended to move it out of the inner loop and into the body of the outer loop.
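The claim above can be checked with a small sketch that runs both loop shapes on the same data and counts how often `OUT` is (re)created. The `run` helper, its `hoisted` flag and the three coordinate formatters are hypothetical; they only mimic the structure of the loop quoted from `pandas_pdb.py`.

```python
import pandas as pd

# Hypothetical stand-ins for per-record column formatters.
pdb_records = {
    "ATOM": [
        {"id": "x_coord", "strf": lambda v: f"{v:8.3f}"},
        {"id": "y_coord", "strf": lambda v: f"{v:8.3f}"},
        {"id": "z_coord", "strf": lambda v: f"{v:8.3f}"},
    ],
}


def make_dfs():
    return {"ATOM": pd.DataFrame({"x_coord": [1.0], "y_coord": [2.0], "z_coord": [3.0]})}


def run(dfs, hoisted):
    """Format columns; return the frames and how many times OUT was assigned."""
    out_assignments = 0
    for r in dfs:
        if hoisted:
            # Proposed shape: create OUT once per record type.
            dfs[r]["OUT"] = pd.Series("", index=dfs[r].index)
            out_assignments += 1
        for col in pdb_records[r]:
            dfs[r][col["id"]] = dfs[r][col["id"]].apply(col["strf"])
            if not hoisted:
                # Original shape: OUT is reset on every inner iteration.
                dfs[r]["OUT"] = pd.Series("", index=dfs[r].index)
                out_assignments += 1
    return dfs, out_assignments


old_dfs, old_count = run(make_dfs(), hoisted=False)
new_dfs, new_count = run(make_dfs(), hoisted=True)
```

With three formatted columns, the original shape assigns `OUT` three times and the hoisted shape once, and the resulting frames are identical. Note that this sketch hoists the assignment without any guard: for a record type with an empty formatter list it would create an `OUT` column that the original loop never created, a difference the PR's `if pdb_records[r]:` check avoids.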

Pull Request Checklist

  • Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file (if applicable)
  • Added appropriate unit test functions in the ./biopandas/*/tests directories (if applicable)
  • Modified documentation in the corresponding Jupyter Notebook under biopandas/docs/sources/ (if applicable)
  • Ran PYTHONPATH='.' pytest ./biopandas -sv and made sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., PYTHONPATH='.' pytest ./biopandas/classifier/tests/test_stacking_cv_classifier.py -sv)
  • Checked for style issues by running flake8 ./biopandas

@fazhang-master
@wojdyr @ecederstrand @rasbt @dominiquesydow The build failed because AppVeyor cannot find the Python 3.11 package in the win-32 environment, which is a CI environment configuration issue. My code has passed tests on all Python versions from 3.7 to 3.10. I suggest updating the CI configuration to exclude unsupported platform combinations.
log3.7.txt
log3.8.txt
log3.9.txt
log3.10.txt
log3.11.txt
