refactor: Refactor trajectory data loading and improve the docstrings #289
base: main
📝 Walkthrough
Adds typed flags to
Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20–30 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
🧹 Nitpick comments (1)
dptb/data/dataset/_default_dataset.py (1)
417-417: Remove unused variable assignment.
The pbc variable is assigned but never used after line 417.
Apply this diff:
assert all(attr in info for attr in ["r_max", "pbc"])
- pbc = info["pbc"] # not used?
self.raw_data.append(
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
dptb/data/dataset/_default_dataset.py(4 hunks)
🧰 Additional context used
🪛 Ruff (0.14.7)
dptb/data/dataset/_default_dataset.py
174-174: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
208-208: print() call in assert statement is likely unintentional
Remove print
(RUF030)
417-417: Local variable pbc is assigned to but never used
Remove assignment to unused variable pbc
(F841)
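The first two Ruff findings (RUF013 at line 174 and RUF030 at line 208) have conventional fixes. A minimal sketch, assuming a hypothetical function load_frames whose name and parameters do not come from the PR: the None default gets an explicit Optional annotation, and the assertion carries a plain message string instead of a print() call.

```python
from typing import Dict, Optional


def load_frames(nframes: int, info: Optional[Dict] = None) -> Dict:
    """Hypothetical loader used only to illustrate the two Ruff fixes."""
    # RUF013: a parameter defaulting to None is annotated Optional[...] explicitly.
    if info is None:
        info = {}
    # RUF030: the assertion message is a plain string, not a print() call.
    assert nframes > 0, f"Expected at least one frame, got {nframes}."
    return {**info, "nframes": nframes}
```

With a string message, the text is attached to the AssertionError itself rather than being written to stdout as a side effect.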
🔇 Additional comments (9)
dptb/data/dataset/_default_dataset.py (9)
195-197: LGTM! Good defensive programming with a clear, actionable error message for mutually exclusive options.
205-212: LGTM! Good use of context manager and frame validation. The context manager ensures proper resource cleanup, and the frame count validation with info update is a robust approach to handle inconsistencies.
214-222: LGTM! Robust atom count validation. The use of np.unique() to ensure consistent atom counts across frames is good defensive programming, and the info update when mismatches are detected is appropriate.
224-240: LGTM! Comprehensive PBC handling. The validation logic properly handles various PBC configurations and edge cases, with appropriate warnings when info and trajectory settings differ.
242-269: Good data assembly with proper validation. The frame-by-frame data collection and final validation ensure data consistency. The cell handling logic (lines 264-268) correctly enforces that either all frames have cells or none do, preventing hybrid scenarios.
360-395: LGTM! Excellent documentation. The comprehensive docstring with detailed parameter descriptions significantly improves code maintainability and usability.
396-399: LGTM! Good abstraction. The local helper function cleanly centralizes the logic for selecting the appropriate data constructor based on pos_type, improving maintainability.
418-425: LGTM! Consistent data loading. The use of the build_data helper ensures uniform data construction across different pos_type variants.
444-444: LGTM! Helpful type annotation. The type annotation improves code clarity and IDE support.
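The atom-count check described for lines 214-222 can be sketched in plain Python. This is an illustration under stated assumptions, not the PR's code: it uses a set rather than np.unique(), frames are plain lists of atomic numbers, and the helper name check_natoms and the "natoms" key are hypothetical.

```python
from typing import Dict, List, Sequence


def check_natoms(frames: Sequence[List[int]], info: Dict) -> int:
    """Return the common atom count and sync it into info (hypothetical helper)."""
    counts = {len(frame) for frame in frames}
    assert len(counts) == 1, "Number of atoms in trajectory is not consistent."
    natoms = counts.pop()
    # Update info only when it disagrees with what the frames actually contain.
    if info.get("natoms") != natoms:
        info["natoms"] = natoms
    return natoms
```

Treating the frames as the source of truth and correcting info afterwards mirrors the "info update when mismatches are detected" behaviour the review describes.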
Actionable comments posted: 2
🧹 Nitpick comments (2)
dptb/data/dataset/_default_dataset.py (2)
216-244: Consider consolidating trajectory iterations.
The trajectory is iterated three times (lines 216, 229, 244) to validate atom counts, PBC settings, and collect data. While the current approach provides clear validation, combining into a single pass would improve performance for large trajectories.
Consider refactoring to:
# Single-pass validation and collection
natoms_list, pbc_list = [], []
atomic_numbers, positions, cell = [], [], []
for atoms in traj:
    natoms_list.append(len(atoms))
    pbc_list.append(atoms.pbc.tolist())
    atomic_numbers.append(atoms.get_atomic_numbers())
    positions.append(atoms.get_positions())
    cell_read = atoms.get_cell()
    cell.append(None if np.allclose(cell_read, np.zeros((3, 3)), atol=1e-6) else cell_read)
# Validate consistency
natoms = np.unique(natoms_list)
assert len(natoms) == 1, "Number of atoms in trajectory file is not consistent."
natoms = natoms[0]
pbc_unique = np.unique(pbc_list, axis=0)
assert len(pbc_unique) == 1, "PBC setting in trajectory file is not consistent."
pbc = pbc_unique[0].tolist()
203-203: Remove unnecessary initialization.
Line 203 initializes these variables to None, but they're immediately reassigned to empty lists at line 243. The initial assignment is redundant.
Apply this diff:
- atomic_numbers, positions, cell = None, None, None
-
# use the context manager to avoid memory leak
with Trajectory(traj_file[0], 'r') as traj:
📜 Review details
📒 Files selected for processing (1)
dptb/data/dataset/_default_dataset.py(4 hunks)
🧰 Additional context used
🪛 Ruff (0.14.7)
dptb/data/dataset/_default_dataset.py
417-417: Local variable pbc is assigned to but never used
Remove assignment to unused variable pbc
(F841)
🔇 Additional comments (5)
dptb/data/dataset/_default_dataset.py (5)
170-194: LGTM! Well-documented method signature. The type hints and comprehensive docstring improve code clarity. The past review comments regarding type hints and the typo have been properly addressed.
195-201: LGTM! Clear validation logic. The assertion prevents conflicting data loads, and the single-trajectory enforcement is appropriate.
256-268: LGTM! Robust data validation and assembly. The validation ensures trajectory data is complete and consistent, and the cell handling correctly supports both periodic and non-periodic systems while rejecting hybrid cases.
360-395: LGTM! Excellent documentation. The comprehensive docstring and type hints significantly improve code clarity and maintainability.
396-399: LGTM! Clean abstraction. The build_data helper centralizes the loading logic and makes the code more maintainable by routing to the appropriate constructor based on pos_type.
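The all-or-none rule for cells noted for lines 256-268 can be sketched as follows. The helper name collect_cells is hypothetical, cells are nested lists, and representing a missing cell as None is an assumption made for illustration rather than something the PR text confirms in detail.

```python
from typing import List, Optional

Cell = List[List[float]]


def collect_cells(cells: List[Optional[Cell]]) -> Optional[List[Cell]]:
    """Return all cells, or None if no frame has one; reject mixed input."""
    missing = [c is None for c in cells]
    if all(missing):
        return None  # non-periodic system: no frame carries a cell
    if any(missing):
        raise ValueError("Some frames have a cell and others do not.")
    return [c for c in cells if c is not None]
```

Rejecting the mixed case up front keeps downstream code from having to handle a per-frame optional cell.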
info=info)
self.raw_data.append(subdata)
assert all(attr in info for attr in ["r_max", "pbc"])
pbc = info["pbc"] # not used?
Remove unused variable assignment.
Static analysis correctly identifies that pbc is assigned but never used. The comment acknowledges this uncertainty.
Apply this diff:
assert all(attr in info for attr in ["r_max", "pbc"])
- pbc = info["pbc"] # not used?
self.raw_data.append(
🧰 Tools
🪛 Ruff (0.14.7)
417-417: Local variable pbc is assigned to but never used
Remove assignment to unused variable pbc
(F841)
🤖 Prompt for AI Agents
In dptb/data/dataset/_default_dataset.py around line 417 the code assigns pbc =
info["pbc"] but never uses it; remove the unused variable assignment (and any
accompanying comment that questions its usage) so no unused local remains, or if
pbc is intended to be used, replace the removal by using it appropriately—most
likely simply delete the pbc = info["pbc"] line.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Actionable comments posted: 0
♻️ Duplicate comments (1)
dptb/data/dataset/_default_dataset.py (1)
416-417: Remove unused variable assignment.
The pbc variable is assigned but never used. The comment acknowledges this uncertainty. This was previously flagged but remains unaddressed.
Apply this diff:
assert all(attr in info for attr in ["r_max", "pbc"])
- pbc = info["pbc"] # not used?
self.raw_data.append(
As per past review comments, this issue should be resolved.
🧹 Nitpick comments (1)
dptb/data/dataset/_default_dataset.py (1)
195-197: Consider using and instead of * for boolean logic.
The multiplication operator works for the boolean check, but using and would be more idiomatic and clearer.
Apply this diff:
- assert not get_Hamiltonian * get_DM, \
+ assert not (get_Hamiltonian and get_DM), \
"Hamiltonian and Density Matrix can only loaded one at a time, " + \
"for which will occupy the same attribute in the AtomicData."
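For plain bool flags the two forms give the same result, which a short check can confirm. This sketch assumes both flags are ordinary Python bools; the helper names exclusive_mul and exclusive_and are hypothetical.

```python
from itertools import product


def exclusive_mul(get_Hamiltonian: bool, get_DM: bool) -> bool:
    # Original form: bool * bool is an int (0 or 1), which is then negated.
    return not get_Hamiltonian * get_DM


def exclusive_and(get_Hamiltonian: bool, get_DM: bool) -> bool:
    # Suggested form: states the intent directly and short-circuits.
    return not (get_Hamiltonian and get_DM)


# Both forms agree on every combination of two bools.
results = [
    exclusive_mul(h, d) == exclusive_and(h, d)
    for h, d in product([False, True], repeat=2)
]
```

The and form also behaves more predictably if a flag is ever something other than a bool: for example, None * True raises a TypeError, while None and True simply evaluates to a falsy value.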
📜 Review details
📒 Files selected for processing (1)
dptb/data/dataset/_default_dataset.py(4 hunks)
🧰 Additional context used
🪛 Ruff (0.14.7)
dptb/data/dataset/_default_dataset.py
417-417: Local variable pbc is assigned to but never used
Remove assignment to unused variable pbc
(F841)
🔇 Additional comments (5)
dptb/data/dataset/_default_dataset.py (5)
170-194: LGTM! Excellent documentation and type safety improvements. The type hints and detailed docstring significantly improve code clarity and maintainability. The Optional[Dict] type hint is correctly applied, and the docstring thoroughly documents the method's behavior including the info update semantics.
203-269: Excellent memory-safe refactoring with thorough validation! The context manager prevents memory leaks from persistent Trajectory file handles. The refactored logic provides:
- Clear extraction of nframes and natoms with consistency validation
- Automatic info updates when trajectory data differs
- PBC handling with appropriate warnings for mismatches
- Consolidated data construction with proper shape validation
- Clean separation between cell present/absent cases
This is a significant improvement over ad-hoc per-frame accumulation.
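To illustrate why the with block matters here, a minimal sketch using a stand-in class follows. FakeTrajectory and count_frames are hypothetical and only mimic a reader that owns a resource; they are not ASE's Trajectory API or the PR's code.

```python
class FakeTrajectory:
    """Hypothetical stand-in for a trajectory reader that owns a resource."""

    def __init__(self, frames):
        self.frames = list(frames)
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit and when the body raises, so the handle is released.
        self.closed = True
        return False  # do not swallow exceptions

    def __len__(self):
        return len(self.frames)


def count_frames(traj: FakeTrajectory) -> int:
    with traj as t:
        nframes = len(t)
        assert nframes > 0, "Trajectory has no frames."
    return nframes
```

Even when the validation inside the block fails, the reader is closed before the exception propagates, which is the property the review credits for avoiding leaked file handles.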
360-395: LGTM! Clear documentation of the refactored constructor. The type hints and comprehensive docstring improve code maintainability. The documentation clearly explains each parameter's purpose, including transparent notes about unused parameters retained for compatibility.
396-399: LGTM! Clean abstraction centralizes data loading. The build_data helper provides a flexible dispatch mechanism that routes to the appropriate builder based on pos_type, with a sensible fallback to from_text_data. This improves maintainability and makes future extensions easier.
418-425: LGTM! Clean integration of the build_data helper. The call to build_data correctly passes all necessary parameters including the pos_type dispatch key and flags for loading optional data. The code is clear and maintainable.
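The dispatch these comments describe can be sketched as below. Almost everything here is hypothetical: the class FakeDataset, the constructor from_ase_traj, the "ase" key, and the function signature are placeholders, since the review only states that build_data routes on pos_type and falls back to from_text_data.

```python
class FakeDataset:
    """Hypothetical stand-in with two constructors, for illustration only."""

    def __init__(self, source: str, root: str):
        self.source = source
        self.root = root

    @classmethod
    def from_text_data(cls, root: str) -> "FakeDataset":
        return cls("text", root)

    @classmethod
    def from_ase_traj(cls, root: str) -> "FakeDataset":  # hypothetical name
        return cls("ase", root)


def build_data(pos_type: str, root: str) -> FakeDataset:
    # Map known pos_type values to builders; anything else falls back to text.
    builders = {"ase": FakeDataset.from_ase_traj}
    builder = builders.get(pos_type, FakeDataset.from_text_data)
    return builder(root)
```

Keeping the mapping in one place means a new pos_type only needs one new entry, which is the maintainability benefit the review points to.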
Summary by CodeRabbit
Bug Fixes
New Features
Documentation
Refactor