@charanw charanw commented Dec 10, 2025

Contributor: Charan Williams (charanw2@illinois.edu)

Contribution Type: New Model

Description:
Added a new clinical summarization model, BHCToAVS, which converts Brief Hospital Course (BHC) notes into patient-friendly After-Visit Summaries (AVS). The model wraps a fine-tuned Mistral-7B LoRA adapter hosted on Hugging Face and integrates with the PyHealth model API. This contribution includes the full model implementation, unit tests, documentation, and an example usage script.

Files to Review:

  • pyhealth/models/bhc_to_avs.py — Main model implementation
  • pyhealth/models/__init__.py — Added import for the new model
  • tests/core/test_bhc_to_avs.py — Unit test for the BHCToAVS model
  • docs/api/models/pyhealth.models.bhc_to_avs.rst — Sphinx documentation file
  • docs/api/models.rst — Updated model index to include BHCToAVS
  • examples/bhc_to_avs_example.py — Example usage demonstrating model prediction

Introduces the BHCToAVS model, which converts clinical Brief Hospital Course (BHC) notes into After-Visit Summaries (AVS) using a fine-tuned Mistral 7B model with LoRA adapters. Adds model implementation, documentation, an example usage script, and unit tests.
Logiquo added the "component: model" label (Contribute a new model to PyHealth) Dec 18, 2025
Logiquo requested a review from Copilot December 27, 2025 10:24
Logiquo self-requested a review December 27, 2025 10:25

Copilot AI left a comment

Pull request overview

This PR adds a new clinical summarization model, BHCToAVS, that converts Brief Hospital Course (BHC) notes into patient-friendly After-Visit Summaries (AVS) using a fine-tuned Mistral-7B LoRA adapter. The implementation integrates with PyHealth's model API and includes comprehensive documentation and examples.

  • Implements a new text generation model wrapping a Hugging Face LoRA adapter
  • Adds unit tests with graceful handling of model download failures
  • Provides example usage demonstrating the model's clinical text summarization capabilities

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| pyhealth/models/bhc_to_avs.py | Core model implementation with predict() method for generating patient-friendly summaries |
| pyhealth/models/__init__.py | Added BHCToAVS to module exports |
| tests/core/test_bhc_to_avs.py | Unit test validating the predict method with error handling for gated models |
| docs/api/models/pyhealth.models.BHCToAVS.rst | Sphinx autodoc configuration for the new model |
| docs/api/models.rst | Updated model index to include BHCToAVS |
| examples/bhc_to_avs_example.py | Example script demonstrating model usage with synthetic clinical text |

"text-generation",
model=model,
tokenizer=tokenizer,
device_map="auto",

Copilot AI Dec 27, 2025

device_map="auto" is passed twice: once to the AutoModelForCausalLM.from_pretrained call (line 48) and again to the pipeline constructor (line 65). The second one is redundant because the model has already been placed on devices, and it may cause conflicts or unexpected behavior.

Suggested change
- device_map="auto",

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +3
# Author: Charan Williams
# NetID: charanw2
# Description: Converts clinical brief hospital course (BHC) data to after visit summaries using a fine-tuned Mistral 7B model.

Copilot AI Dec 27, 2025

The module header uses a "# Description:" comment, which is not standard Python docstring style. The description should either be a proper module-level docstring (a triple-quoted string) or follow a consistent comment format without the "Description:" label.

Suggested change
- # Author: Charan Williams
- # NetID: charanw2
- # Description: Converts clinical brief hospital course (BHC) data to after visit summaries using a fine-tuned Mistral 7B model.
+ """Convert clinical brief hospital course (BHC) data to after-visit
+ summaries using a fine-tuned Mistral 7B model."""
+ # Author: Charan Williams
+ # NetID: charanw2

Comment on lines +33 to +40
@dataclass
class BHCToAVS(BaseModel):
    base_model_id: str = field(default="mistralai/Mistral-7B-Instruct")
    """HuggingFace repo containing the base Mistral 7B model."""

    adapter_model_id: str = field(default="williach31/mistral-7b-bhc-to-avs-lora")
    """HuggingFace repo containing only LoRA adapter weights."""

Copilot AI Dec 27, 2025

Missing documentation for the BHCToAVS class itself. The class lacks a docstring explaining its purpose, parameters, and usage. Only the individual fields and methods have documentation.

str
Patient-friendly summary.
"""

Copilot AI Dec 27, 2025

Missing input validation for the bhc_text parameter. The method should validate that bhc_text is not None and is a non-empty string before processing to provide clearer error messages to users.

Suggested change
# Validate input to provide clear error messages and avoid unexpected failures.
if bhc_text is None:
    raise ValueError("bhc_text must not be None.")
if not isinstance(bhc_text, str):
    raise TypeError(f"bhc_text must be a string, got {type(bhc_text).__name__}.")
if not bhc_text.strip():
    raise ValueError("bhc_text must be a non-empty string.")

Comment on lines +33 to +34
@dataclass
class BHCToAVS(BaseModel):

Copilot AI Dec 27, 2025

The dataclass decorator on a class inheriting from BaseModel (which inherits from nn.Module) may not properly initialize the parent class. The dataclass-generated __init__ should be paired with a __post_init__ method that calls super().__init__() to ensure nn.Module is properly initialized. Without this, features like the _dummy_param used for device detection may not work correctly.
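
A minimal sketch of this concern, using a stand-in base class rather than the real BaseModel or nn.Module. The names StandInModule, WithoutPostInit, and WithPostInit are hypothetical and exist only for illustration. It shows that a dataclass-generated __init__ assigns the fields but does not run the parent constructor unless __post_init__ calls it:

```python
from dataclasses import dataclass, field


class StandInModule:
    """Hypothetical stand-in for a base class that needs its own __init__."""

    def __init__(self) -> None:
        # Internal state that subclasses rely on (loosely analogous to the
        # bookkeeping a framework base class sets up in its constructor).
        self._registry = {}


@dataclass
class WithoutPostInit(StandInModule):
    # The generated __init__ only assigns this field; StandInModule.__init__
    # never runs, so _registry is never created.
    base_model_id: str = field(default="example/base")


@dataclass
class WithPostInit(StandInModule):
    base_model_id: str = field(default="example/base")

    def __post_init__(self) -> None:
        # Called at the end of the generated __init__; initialize the parent here.
        super().__init__()


broken = WithoutPostInit()
fixed = WithPostInit()
print(hasattr(broken, "_registry"))  # False
print(hasattr(fixed, "_registry"))   # True
```

With a real nn.Module parent, it may also be worth checking whether @dataclass(eq=False) is needed, since the default eq=True sets __hash__ to None. That point is an assumption to verify against PyHealth's BaseModel, not something this PR confirms.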

Comment on lines +12 to +31
_PROMPT = """Summarize for the patient what happened during the hospital stay:

### Brief Hospital Course:
{bhc}

### Patient Summary:
"""

# System prompt used during inference
_SYSTEM_PROMPT = (
    "You are a clinical summarization model. Produce accurate, patient-friendly summaries "
    "using only information from the doctor's note. Do not add new details.\n\n"
)

# Prompt used during fine-tuning
_PROMPT = (
    "Summarize for the patient what happened during the hospital stay based on this doctor's note:\n"
    "{bhc}\n\n"
    "Summary for the patient:\n"
)

Copilot AI Dec 27, 2025

The _PROMPT variable is defined twice (lines 12-18 and lines 27-31), with the second definition overwriting the first. This creates dead code and potential confusion. Only one prompt definition should be kept, or they should be renamed to reflect their different purposes (e.g., _TRAINING_PROMPT and _INFERENCE_PROMPT).

max_new_tokens=512,
temperature=0.0,
eos_token_id=[pipe.tokenizer.eos_token_id],
pad_token_id=pipe.tokenizer.eos_token_id,

Copilot AI Dec 27, 2025

The pipeline call does not pass return_full_text=False. By default, Hugging Face text-generation pipelines return the full text, including the input prompt. To return only the newly generated text, either pass return_full_text=False in the pipeline call or strip the prompt from the output manually.

Suggested change
- pad_token_id=pipe.tokenizer.eos_token_id,
+ pad_token_id=pipe.tokenizer.eos_token_id,
+ return_full_text=False,
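
For the manual alternative mentioned above, a minimal sketch of stripping the prompt from a full-text generation. The helper name strip_prompt is hypothetical and the sample strings are made up. It assumes the output begins with the exact prompt, as described above for the default behavior.

```python
def strip_prompt(generated_text: str, prompt: str) -> str:
    """Return only the continuation when the output echoes the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    # Fall back to the raw text if the prompt is not a literal prefix.
    return generated_text.strip()


prompt = "Summary for the patient:\n"
full_output = prompt + "You were treated for pneumonia and went home."
summary = strip_prompt(full_output, prompt)
print(summary)  # You were treated for pneumonia and went home.
```

Passing return_full_text=False is usually the simpler route. The helper only matters if the full text is needed elsewhere.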

# NetID: charanw2
# Description: Converts clinical brief hospital course (BHC) data to after visit summaries using a fine-tuned Mistral 7B model.

from typing import Dict, Any

Copilot AI Dec 27, 2025

Import of 'Dict' is not used.
Import of 'Any' is not used.

Suggested change
- from typing import Dict, Any

Logiquo commented Dec 27, 2025

The CI has failed.

Logiquo added the "status: wait response" label (Pending PR author's response) Dec 27, 2025