[WIP] Refactor dataset to calculate label normalization dynamically by Copilot · Pull Request #6 · algorithmicgovernance/TentNetFA

Copilot · 2026-01-21T14:05:23Z

Dynamic Label Normalization and Dataset Refactoring

Original prompt

Make it so that the label transform is applied first to a single 3x3 data point of intensity 255 in the center of an otherwise black image (value 0) of size 100x100. The datatype should be np.float32. The maximum intensity of that image should then be stored as the normalization constant, which is currently hard coded to 120. This should be flexibly evaluated based on the sigma blurring value, which should be passed as an argument to the dataset initialization. Also remove custom feat_transform and label_transform arguments from this code.

The user has attached the following files from their workspace:

scrape_fa/paired_image_dataset.py

TITLE: Dynamic Label Normalization and Dataset Refactoring

USER INTENT: Refactor the PairedImageDataset to calculate the label normalization constant dynamically based on a gaussian blur of a point source, rather than using a hardcoded value, and remove support for custom transform arguments.

TASK DESCRIPTION:

Remove feat_transform and label_transform arguments from PairedImageDataset.__init__.

Implement a dynamic normalization calculation:

Create a 100x100 np.float32 array of zeros.

Set the center 3x3 area to 255 (a point source).

Apply gaussian blur using a provided sigma value.

Use the maximum intensity of this resulting image as the normalization constant for labels.

Pass sigma as an argument to PairedImageDataset initialization to control the blurring and normalization.
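
The steps above could be sketched as a standalone helper. This is a minimal illustration, not the code from this PR: the function name `compute_label_norm_constant` is hypothetical, and it assumes `gaussian_filter` refers to `scipy.ndimage.gaussian_filter`, as the existing snippet suggests.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def compute_label_norm_constant(sigma: float) -> float:
    """Peak intensity of a blurred 3x3 point source (hypothetical helper)."""
    # Black 100x100 float32 image with a 3x3 block of intensity 255 at the center.
    dummy = np.zeros((100, 100), dtype=np.float32)
    dummy[49:52, 49:52] = 255
    # Blur with the same sigma that the label transform will use.
    blurred = gaussian_filter(dummy, sigma=sigma)
    return float(blurred.max())


if __name__ == "__main__":
    # Print the constant for a few illustrative sigma values.
    for sigma in (1.0, 3.0, 5.0):
        print(sigma, compute_label_norm_constant(sigma))
```

A wider blur spreads the same total intensity over more pixels, so the constant shrinks as sigma grows. A fixed divisor can therefore only match one sigma.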

EXISTING:

scrape_fa/paired_image_dataset.py: Contains the PairedImageDataset class with hardcoded normalization (division by 120) and custom transform arguments.

scrape_fa/create_rebalanced_dataset.py: A script that was recently refactored to handle rebalancing logic locally.

PENDING:

Modify PairedImageDataset in scrape_fa/paired_image_dataset.py to:

Accept sigma in __init__ (presumably defaulting to 3, the value hardcoded in the previous code).

Remove feat_transform and label_transform from __init__.

Implement the calibration logic described above to find the division constant.

Update label_transform logic to use sigma for blurring and the calculated constant for normalization.

Update feat_transform to be a fixed internal method (numpy to tensor).
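
One way the pending changes might fit together is sketched below. It is an assumption-laden outline, not the implementation from this PR: the class name `PairedImageDatasetSketch` is hypothetical, and in-memory `features` and `labels` arrays stand in for the HDF5 file, whose layout is not shown here. The `sigma` default of 3.0 mirrors the previously hardcoded value.

```python
import numpy as np
import torch
from scipy.ndimage import gaussian_filter
from torch.utils.data import Dataset


class PairedImageDatasetSketch(Dataset):
    """Hypothetical stand-in for PairedImageDataset; HDF5 loading is omitted."""

    def __init__(self, features, labels, sigma=3.0, indices=None):
        # `features` and `labels` are assumed in-memory arrays replacing
        # the HDF5 contents, which the source does not describe.
        self.features = features
        self.labels = labels
        self.sigma = sigma
        self.indices = list(range(len(features))) if indices is None else list(indices)
        self.norm_constant = self._calibrate()

    def _calibrate(self):
        # Blur a 3x3 point source of intensity 255 and record its peak.
        dummy = np.zeros((100, 100), dtype=np.float32)
        dummy[49:52, 49:52] = 255
        return float(gaussian_filter(dummy, sigma=self.sigma).max())

    def _feat_transform(self, arr):
        # Fixed feature transform: NumPy array to float32 tensor, no rescaling.
        return torch.from_numpy(np.asarray(arr, dtype=np.float32))

    def _label_transform(self, arr):
        # Blur with the configured sigma, then divide by the calibrated peak.
        blurred = gaussian_filter(np.asarray(arr, dtype=np.float32), sigma=self.sigma)
        return torch.from_numpy(blurred).unsqueeze(0) / self.norm_constant

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        j = self.indices[i]
        return self._feat_transform(self.features[j]), self._label_transform(self.labels[j])
```

Because the transforms are now private methods, callers can no longer inject their own. The only knob that affects the labels is `sigma`, and the normalization constant always follows it.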

CODE STATE:

Modified scrape_fa/create_rebalanced_dataset.py: Logic for rebalancing moved here.

Modified scrape_fa/paired_image_dataset.py: create_rebalanced_subset helper function removed.

Upcoming changes will center on scrape_fa/paired_image_dataset.py.

RELEVANT CODE/DOCUMENTATION SNIPPETS:
Current init in paired_image_dataset.py (to be changed):
def __init__(self, hdf5_path, feat_transform: Callable | None = None, label_transform: Callable | None = None, indices: list[int] | None = None):
    # ...
    if label_transform is None:
        def label_transform(arr):
            blurred = gaussian_filter(arr, sigma=3)
            return torch.from_numpy(blurred).unsqueeze(0) / 120  # recalibrated blurring and scaling
    # ...
Requested Logic:
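# Calibration logic (runs inside __init__, after self.sigma is set)
# Requires: import numpy as np; from scipy.ndimage import gaussian_filter
dummy = np.zeros((100, 100), dtype=np.float32)  # black 100x100 image
dummy[49:52, 49:52] = 255  # 3x3 point source at the center
blurred_dummy = gaussian_filter(dummy, sigma=self.sigma)
norm_constant = blurred_dummy.max()  # replaces the hardcoded 120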
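
A short check of what this calibration implies for the labels may help. The sketch below uses hypothetical test images and an assumed `SIGMA` of 3.0. It shows that an isolated 3x3 source normalizes to a peak of about 1.0, while two adjacent sources exceed 1.0 because Gaussian blurring is linear and their contributions add.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

SIGMA = 3.0  # assumed value, matching the previously hardcoded sigma


def blurred_peak(image: np.ndarray, sigma: float) -> float:
    """Maximum of the Gaussian-blurred image."""
    return float(gaussian_filter(image.astype(np.float32), sigma=sigma).max())


# Calibration image: 3x3 point source at the center of a black image.
calib = np.zeros((100, 100), dtype=np.float32)
calib[49:52, 49:52] = 255
norm_constant = blurred_peak(calib, SIGMA)

# Isolated source placed elsewhere, far enough from the edges to avoid
# boundary effects.
isolated = np.zeros((100, 100), dtype=np.float32)
isolated[20:23, 70:73] = 255
isolated_peak = blurred_peak(isolated, SIGMA) / norm_constant

# Two adjacent sources: their blurred contributions overlap and sum.
adjacent = np.zeros((100, 100), dtype=np.float32)
adjacent[49:52, 49:52] = 255
adjacent[49:52, 52:55] = 255
adjacent_peak = blurred_peak(adjacent, SIGMA) / norm_constant

print(isolated_peak, adjacent_peak)
```

If the downstream loss or activation assumes labels in [0, 1], dense regions would need separate handling such as clipping. Whether that matters for this dataset is not stated in the PR.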


Initial plan

4dedb1e

Copilot AI assigned Copilot and KarimAED Jan 21, 2026

Copilot started work on behalf of KarimAED January 21, 2026 14:05

Copilot AI requested a review from KarimAED January 21, 2026 14:07

Copilot stopped work on behalf of KarimAED due to an error January 21, 2026 14:07
The session was cancelled by the user.

KarimAED closed this Jan 21, 2026

KarimAED deleted the copilot/refactor-dataset-normalization branch January 21, 2026 14:08

Copilot wants to merge 1 commit into feat/minor_upgrades from copilot/refactor-dataset-normalization
