Add BHCToAVS model for patient-friendly summaries #730
base: master
Introduces the BHCToAVS model, which converts clinical Brief Hospital Course (BHC) notes into After-Visit Summaries (AVS) using a fine-tuned Mistral 7B model with LoRA adapters. Adds model implementation, documentation, an example usage script, and unit tests.
Pull request overview
This PR adds a new clinical summarization model, BHCToAVS, that converts Brief Hospital Course (BHC) notes into patient-friendly After-Visit Summaries (AVS) using a fine-tuned Mistral-7B LoRA adapter. The implementation integrates with PyHealth's model API and includes comprehensive documentation and examples.
- Implements a new text generation model wrapping a Hugging Face LoRA adapter
- Adds unit tests with graceful handling of model download failures
- Provides example usage demonstrating the model's clinical text summarization capabilities
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| `pyhealth/models/bhc_to_avs.py` | Core model implementation with a `predict()` method for generating patient-friendly summaries |
| `pyhealth/models/__init__.py` | Added BHCToAVS to module exports |
| `tests/core/test_bhc_to_avs.py` | Unit test validating the predict method, with error handling for gated models |
| `docs/api/models/pyhealth.models.BHCToAVS.rst` | Sphinx autodoc configuration for the new model |
| `docs/api/models.rst` | Updated model index to include BHCToAVS |
| `examples/bhc_to_avs_example.py` | Example script demonstrating model usage with synthetic clinical text |
```python
"text-generation",
model=model,
tokenizer=tokenizer,
device_map="auto",
```
Copilot (AI), Dec 27, 2025
The device_map="auto" argument is passed twice: once to AutoModelForCausalLM.from_pretrained (line 48) and again to the pipeline constructor (line 65). The second one is redundant because the model has already been placed on devices when it was loaded, and passing it again may cause conflicts or unexpected behavior.
```diff
-device_map="auto",
```
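For comparison, here is a minimal sketch of loading the base model and the LoRA adapter so that device placement happens exactly once. It assumes the adapter is attached with PEFT's `PeftModel.from_pretrained` and merged via `merge_and_unload()`, and that half precision is acceptable; the function name `load_bhc_to_avs_pipeline` is hypothetical and not taken from the PR. Imports are deferred into the function so that importing the module does not require `transformers` or `peft`.

```python
def load_bhc_to_avs_pipeline(
    base_model_id: str = "mistralai/Mistral-7B-Instruct",
    adapter_model_id: str = "williach31/mistral-7b-bhc-to-avs-lora",
):
    """Hypothetical loader: place the model once, then wrap it in a pipeline."""
    # Heavy imports are deferred so that importing this module stays cheap.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    # device_map="auto" is applied here, when the weights are actually loaded.
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        device_map="auto",
        torch_dtype=torch.float16,  # assumption: half precision is acceptable
    )
    # Attach the LoRA adapter weights, then fold them into the base model.
    model = PeftModel.from_pretrained(base_model, adapter_model_id)
    model = model.merge_and_unload()
    # No device_map here: the model instance is already placed on devices.
    return pipeline("text-generation", model=model, tokenizer=tokenizer)
```

Merging is a design choice that hands the pipeline a plain causal-LM model rather than a `PeftModel` wrapper; skipping the merge and passing the `PeftModel` directly is a common alternative when the adapter should stay separable.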
```python
# Author: Charan Williams
# NetID: charanw2
# Description: Converts clinical brief hospital course (BHC) data to after visit summaries using a fine-tuned Mistral 7B model.
```
Copilot (AI), Dec 27, 2025
The module header uses a "# Description:" comment, which is not standard Python docstring style. The description should either be a proper module-level docstring (a triple-quoted string) or follow a consistent comment format without the "Description:" label.
```diff
-# Author: Charan Williams
-# NetID: charanw2
-# Description: Converts clinical brief hospital course (BHC) data to after visit summaries using a fine-tuned Mistral 7B model.
+"""Convert clinical brief hospital course (BHC) data to after-visit
+summaries using a fine-tuned Mistral 7B model."""
+# Author: Charan Williams
+# NetID: charanw2
```
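The distinction matters because only a string literal that is the first statement of a module becomes its `__doc__` (which is what `help()` and Sphinx autodoc read); leading `#` comments are discarded by the parser. A small check with the standard-library `ast` module illustrates this on both header variants; the helper name `module_docstring` is just for illustration.

```python
import ast

# Header as submitted: comments only, so the module gets no docstring.
COMMENT_HEADER = '''\
# Author: Charan Williams
# NetID: charanw2
# Description: Converts clinical brief hospital course (BHC) data to after visit summaries.
'''

# Header as suggested: a docstring first, author comments after it.
DOCSTRING_HEADER = '''\
"""Convert clinical brief hospital course (BHC) data to after-visit
summaries using a fine-tuned Mistral 7B model."""
# Author: Charan Williams
# NetID: charanw2
'''


def module_docstring(source: str):
    """Return the docstring Python would assign to a module with this source."""
    return ast.get_docstring(ast.parse(source))


print(module_docstring(COMMENT_HEADER))  # None
print(module_docstring(DOCSTRING_HEADER))
```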
```python
@dataclass
class BHCToAVS(BaseModel):
    base_model_id: str = field(default="mistralai/Mistral-7B-Instruct")
    """HuggingFace repo containing the base Mistral 7B model."""

    adapter_model_id: str = field(default="williach31/mistral-7b-bhc-to-avs-lora")
    """HuggingFace repo containing only LoRA adapter weights."""
```
Copilot (AI), Dec 27, 2025
Missing documentation for the BHCToAVS class itself. The class lacks a docstring explaining its purpose, parameters, and usage. Only the individual fields and methods have documentation.
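As one possibility, a NumPy-style class docstring (matching the `Returns`-style section visible in the `predict` docstring) could cover purpose, fields, and usage. The sketch below drops the `BaseModel` base class so it runs standalone; the wording is illustrative and drawn from the PR description, not from the PR's code.

```python
from dataclasses import dataclass, field


@dataclass
class BHCToAVS:  # the real class subclasses pyhealth.models.BaseModel
    """Generate patient-friendly After-Visit Summaries from clinical notes.

    Converts a Brief Hospital Course (BHC) note into an After-Visit
    Summary (AVS) using a Mistral 7B base model with a LoRA adapter
    fine-tuned for this task. Weights are pulled from the Hugging Face Hub.

    Attributes
    ----------
    base_model_id : str
        Hugging Face repo containing the base Mistral 7B model.
    adapter_model_id : str
        Hugging Face repo containing only the LoRA adapter weights.

    Examples
    --------
    >>> model = BHCToAVS()
    >>> summary = model.predict(bhc_text)  # doctest: +SKIP
    """

    base_model_id: str = field(default="mistralai/Mistral-7B-Instruct")
    adapter_model_id: str = field(default="williach31/mistral-7b-bhc-to-avs-lora")
```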
```python
str
    Patient-friendly summary.
"""
```
Copilot (AI), Dec 27, 2025
Missing input validation for the bhc_text parameter. The method should validate that bhc_text is not None and is a non-empty string before processing to provide clearer error messages to users.
```python
# Validate input to provide clear error messages and avoid unexpected failures.
if bhc_text is None:
    raise ValueError("bhc_text must not be None.")
if not isinstance(bhc_text, str):
    raise TypeError(f"bhc_text must be a string, got {type(bhc_text).__name__}.")
if not bhc_text.strip():
    raise ValueError("bhc_text must be a non-empty string.")
```
```python
@dataclass
class BHCToAVS(BaseModel):
```
Copilot (AI), Dec 27, 2025
Applying `@dataclass` to a class that inherits from BaseModel (which inherits from nn.Module) may leave the parent class uninitialized, because the dataclass-generated `__init__` does not call `super().__init__()`. The class should define a `__post_init__` method that calls `super().__init__()` so that nn.Module is properly initialized. Without this, features such as the `_dummy_param` used for device detection may not work correctly.
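The mechanism can be reproduced without PyTorch. In the sketch below, `ModuleLike` is a hypothetical stand-in for nn.Module/BaseModel whose `__init__` creates state that later code depends on (playing the role of `_dummy_param`). The dataclass-generated `__init__` never calls it, while a `__post_init__` that calls `super().__init__()` does. Whether the real `BaseModel.__init__` can be called without arguments depends on the PyHealth version, so treat that part as an assumption.

```python
from dataclasses import dataclass


class ModuleLike:
    """Hypothetical stand-in for nn.Module / BaseModel."""

    def __init__(self) -> None:
        # Stands in for state such as BaseModel's _dummy_param.
        self._dummy_param = 0.0

    def is_initialized(self) -> bool:
        return "_dummy_param" in self.__dict__


@dataclass
class WithoutPostInit(ModuleLike):
    # The generated __init__ only assigns this field; ModuleLike.__init__ never runs.
    base_model_id: str = "mistralai/Mistral-7B-Instruct"


@dataclass
class WithPostInit(ModuleLike):
    base_model_id: str = "mistralai/Mistral-7B-Instruct"

    def __post_init__(self) -> None:
        # Called by the generated __init__ after the fields are assigned.
        super().__init__()


print(WithoutPostInit().is_initialized())  # False
print(WithPostInit().is_initialized())  # True
```

On an actual nn.Module subclass it may also be worth passing `eq=False` to `@dataclass`: with the defaults (`eq=True`, `frozen=False`) the decorator sets `__hash__` to `None`, and nn.Module traversal methods such as `named_modules()` track visited modules in a set, which requires them to be hashable.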
```python
_PROMPT = """Summarize for the patient what happened during the hospital stay:

### Brief Hospital Course:
{bhc}

### Patient Summary:
"""

# System prompt used during inference
_SYSTEM_PROMPT = (
    "You are a clinical summarization model. Produce accurate, patient-friendly summaries "
    "using only information from the doctor's note. Do not add new details.\n\n"
)

# Prompt used during fine-tuning
_PROMPT = (
    "Summarize for the patient what happened during the hospital stay based on this doctor's note:\n"
    "{bhc}\n\n"
    "Summary for the patient:\n"
)
```
Copilot (AI), Dec 27, 2025
The _PROMPT variable is defined twice (lines 12-18 and lines 27-31), with the second definition overwriting the first. This creates dead code and potential confusion. Only one prompt definition should be kept, or they should be renamed to reflect their different purposes (e.g., _TRAINING_PROMPT and _INFERENCE_PROMPT).
```python
max_new_tokens=512,
temperature=0.0,
eos_token_id=[pipe.tokenizer.eos_token_id],
pad_token_id=pipe.tokenizer.eos_token_id,
```
Copilot (AI), Dec 27, 2025
The pipeline call does not pass return_full_text=False. By default, Hugging Face text-generation pipelines return the full text, including the input prompt. To return only the newly generated text, either set return_full_text=False in the pipeline call or manually strip the prompt from the output.
```diff
 pad_token_id=pipe.tokenizer.eos_token_id,
+return_full_text=False,
```
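The text-generation pipeline returns a list of dicts with a `generated_text` key. As a sketch of the manual alternative mentioned above, the hypothetical helper below strips the prompt only when it is echoed back, so it gives the same result whether or not `return_full_text=False` was passed. The sample outputs are hand-written to mimic the pipeline's output structure, not real model output.

```python
from typing import Dict, List


def extract_summary(outputs: List[Dict[str, str]], prompt: str) -> str:
    """Return only the newly generated text from text-generation pipeline output."""
    text = outputs[0]["generated_text"]
    # With the default return_full_text=True, the prompt is echoed first.
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


prompt = "Summary for the patient:\n"
full = [{"generated_text": prompt + "You were treated for pneumonia."}]
new_only = [{"generated_text": "You were treated for pneumonia."}]
print(extract_summary(full, prompt))  # You were treated for pneumonia.
print(extract_summary(new_only, prompt))  # You were treated for pneumonia.
```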
```python
# NetID: charanw2
# Description: Converts clinical brief hospital course (BHC) data to after visit summaries using a fine-tuned Mistral 7B model.

from typing import Dict, Any
```
Copilot (AI), Dec 27, 2025
Import of 'Dict' is not used.
Import of 'Any' is not used.
```diff
-from typing import Dict, Any
```
The CI has failed.
Contributor: Charan Williams (charanw2@illinois.edu)
Contribution Type: New Model
Description:
Added a new clinical summarization model, BHCToAVS, which converts Brief Hospital Course (BHC) notes into patient-friendly After-Visit Summaries (AVS). The model wraps a fine-tuned Mistral-7B LoRA adapter hosted on Hugging Face and integrates with the PyHealth model API. This contribution includes the full model implementation, unit tests, documentation, and an example usage script.
Files to Review:
- `pyhealth/models/bhc_to_avs.py`: Main model implementation
- `pyhealth/models/__init__.py`: Added import for the new model
- `tests/core/test_bhc_to_avs.py`: Unit test for the BHCToAVS model
- `docs/api/models/pyhealth.models.bhc_to_avs.rst`: Sphinx documentation file
- `docs/api/models.rst`: Updated model index to include BHCToAVS
- `examples/bhc_to_avs_example.py`: Example usage demonstrating model prediction