[WIP] Refactor dataset to calculate label normalization dynamically #6

Closed
Copilot wants to merge 1 commit into feat/minor_upgrades from copilot/refactor-dataset-normalization

Copilot AI commented Jan 21, 2026

Dynamic Label Normalization and Dataset Refactoring

  • Explore repository structure and understand current code
  • Modify PairedImageDataset.__init__ to:
    • Remove feat_transform and label_transform parameters
    • Add sigma parameter (default value 3)
    • Calculate normalization constant dynamically using calibration logic
  • Update internal transforms to be non-customizable:
    • Make feat_transform an internal method
    • Make label_transform use dynamic normalization constant and sigma parameter
  • Update create_rebalanced_subset function to remove transform parameters
  • Update create_subsets method to remove transform parameters
  • Test the changes to ensure they work correctly
  • Run code review and security checks
Original prompt

Make it so that the label transform is applied first to a single 3x3 data point of intensity 255 in the center of an otherwise black image (value 0) of size 100x100. The datatype should be np.float32. The maximum intensity of that image should then be stored as the normalization constant, which is currently hard coded to 120. This should be flexibly evaluated based on the sigma blurring value, which should be passed as an argument to the dataset initialization. Also remove custom feat_transform and label_transform arguments from this code.

The user has attached the following files from their workspace:

  • scrape_fa/paired_image_dataset.py

TITLE: Dynamic Label Normalization and Dataset Refactoring

USER INTENT: Refactor PairedImageDataset to calculate the label normalization constant dynamically from a Gaussian blur of a point source, rather than using a hardcoded value, and remove support for custom transform arguments.

TASK DESCRIPTION:

  • Remove feat_transform and label_transform arguments from PairedImageDataset.__init__.
  • Implement a dynamic normalization calculation:
    • Create a 100x100 np.float32 array of zeros.
    • Set the center 3x3 area to 255 (a point source).
    • Apply a Gaussian blur using the provided sigma value.
    • Use the maximum intensity of this resulting image as the normalization constant for labels.
  • Pass sigma as an argument to PairedImageDataset initialization to control the blurring and normalization.
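The calibration steps above can be sketched as a small standalone function. This is a minimal sketch, not the code from the PR: the name `compute_norm_constant` and its `size` parameter are hypothetical, and it assumes `gaussian_filter` is the one from `scipy.ndimage`, as the existing snippet suggests.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def compute_norm_constant(sigma: float = 3.0, size: int = 100) -> float:
    """Peak intensity of a blurred 3x3 point source (hypothetical helper)."""
    # Black float32 image with a 3x3 block of 255 in the center.
    calib = np.zeros((size, size), dtype=np.float32)
    c = size // 2
    calib[c - 1:c + 2, c - 1:c + 2] = 255.0  # rows/cols 49..51 when size == 100
    # Blur with the same sigma that the label transform will use.
    blurred = gaussian_filter(calib, sigma=sigma)
    return float(blurred.max())


if __name__ == "__main__":
    # The constant shrinks as sigma grows, because the same energy is
    # spread over a wider area.
    for sigma in (1.0, 3.0, 5.0):
        print(sigma, compute_norm_constant(sigma))
```

Because the peak depends on sigma, a single hardcoded divisor can only match one blur setting, which is the motivation for computing it at initialization.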

EXISTING:

  • scrape_fa/paired_image_dataset.py: Contains the PairedImageDataset class with hardcoded normalization (division by 120) and custom transform arguments.
  • scrape_fa/create_rebalanced_dataset.py: A script that was recently refactored to handle rebalancing logic locally.

PENDING:

  • Modify PairedImageDataset in scrape_fa/paired_image_dataset.py to:
    • Accept sigma (defaulting to 3 presumably, based on previous code) in __init__.
    • Remove feat_transform and label_transform from __init__.
    • Implement the calibration logic described above to find the division constant.
    • Update label_transform logic to use sigma for blurring and the calculated constant for normalization.
    • Update feat_transform to be a fixed internal method (numpy to tensor).
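One possible shape for the pending changes is sketched below. It is an illustration under stated assumptions rather than the actual refactor: `PairedArrayDataset` is a hypothetical in-memory stand-in (the real class reads from an HDF5 file and also takes `indices`), and the private method names and the cast to `np.float32` are assumptions.

```python
import numpy as np
import torch
from scipy.ndimage import gaussian_filter
from torch.utils.data import Dataset


class PairedArrayDataset(Dataset):
    """Hypothetical in-memory stand-in for PairedImageDataset."""

    def __init__(self, feats, labels, sigma: float = 3.0):
        self.feats = feats
        self.labels = labels
        self.sigma = sigma
        # Computed once per dataset instead of hardcoding the divisor.
        self.norm_constant = self._calibrate()

    def _calibrate(self) -> float:
        # 100x100 black image with a 3x3 point source of 255 in the center.
        calib = np.zeros((100, 100), dtype=np.float32)
        calib[49:52, 49:52] = 255.0
        return float(gaussian_filter(calib, sigma=self.sigma).max())

    def _feat_transform(self, arr):
        # Fixed internal transform: numpy array to tensor.
        return torch.from_numpy(np.asarray(arr))

    def _label_transform(self, arr):
        # Assumption: cast to float32 first, since gaussian_filter returns
        # its input dtype by default and integer labels would be rounded.
        blurred = gaussian_filter(np.asarray(arr, dtype=np.float32), sigma=self.sigma)
        return torch.from_numpy(blurred).unsqueeze(0) / self.norm_constant

    def __len__(self):
        return len(self.feats)

    def __getitem__(self, idx):
        return self._feat_transform(self.feats[idx]), self._label_transform(self.labels[idx])
```

With this layout, a label containing a single 3x3 point source of intensity 255 away from the image border normalizes to a peak of about 1.0 for any sigma passed at construction.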

CODE STATE:

  • Modified scrape_fa/create_rebalanced_dataset.py: Logic for rebalancing moved here.
  • Modified scrape_fa/paired_image_dataset.py: create_rebalanced_subset helper function removed.
  • Upcoming changes will center on scrape_fa/paired_image_dataset.py.

RELEVANT CODE/DOCUMENTATION SNIPPETS:
Current init in paired_image_dataset.py (to be changed):

def __init__(self, hdf5_path, feat_transform: Callable | None = None, label_transform: Callable | None = None, indices: list[int] | None = None):
    # ...
    if label_transform is None:
        def label_transform(arr):
            blurred = gaussian_filter(arr, sigma=3)
            return torch.from_numpy(blurred).unsqueeze(0) / 120  # recalibrated blurring and scaling
    # ...

Requested Logic:

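# Calibration sketch (inside the dataset, sigma would be self.sigma)
import numpy as np
from scipy.ndimage import gaussian_filter

sigma = 3
dummy = np.zeros((100, 100), dtype=np.float32)  # black 100x100 image
dummy[49:52, 49:52] = 255  # 3x3 point source in the center
blurred_dummy = gaussian_filter(dummy, sigma=sigma)
norm_constant = blurred_dummy.max()  # replaces the hardcoded 120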

Copilot AI requested a review from KarimAED January 21, 2026 14:07
Copilot stopped work on behalf of KarimAED due to an error January 21, 2026 14:07
@KarimAED KarimAED closed this Jan 21, 2026
@KarimAED KarimAED deleted the copilot/refactor-dataset-normalization branch January 21, 2026 14:08