42 changes: 24 additions & 18 deletions README.md
@@ -1,15 +1,24 @@
# Exercise: Interpretable machine learning

# Task 1: Input optimization.
Open `src/input_opt.py`. In this exercise we will turn the network optimization problem around. Instead of updating weights to minimize loss, we will keep the weights fixed and update the input image to maximize the activation of a specific output neuron.

The network `./data/weights.pth` contains network weights pre-trained on MNIST. We want to generate an image $\mathbf{x}$ that the network strongly believes shows a specific digit.

Mathematically, we want to maximize:
```math
\max_{\mathbf{x}} y_i \quad \text{with} \quad \mathbf{y} = f(\mathbf{x}, \theta),
```
where $f$ is the neural network function, $\mathbf{x}$ is the input image, $\theta$ are the fixed weights of the network, and $y_i$ is the output of the target neuron corresponding to the digit we want to visualize.


1. Complete `forward_pass`: Implement the function to return the scalar output of the target neuron.

The gradients are computed using `torch.func.grad`. Start with a network input image $\mathbf{x}$ of shape `[1, 1, 28, 28]`.

2. Write an optimization loop to iteratively update the input image $\mathbf{x}$ based on the computed gradients. Execute your script with `python src/input_opt.py`.

3. Compare and visualize the results of starting from a random-noise image versus an image filled with ones.
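The steps above can be sketched as a small gradient-ascent loop with `torch.func.grad`. The network below is a hypothetical stand-in; the exercise instead loads the pre-trained CNN from `./data/weights.pth`:

```python
import torch
from torch.func import grad

# Hypothetical stand-in network; the exercise loads a pre-trained CNN
# from ./data/weights.pth instead.
net = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
neuron = 3  # index of the target output neuron


def forward_pass(x):
    """Return the scalar activation of the target output neuron."""
    return net(x)[0, neuron]


get_grads = grad(forward_pass)

# Gradient ascent: step in the direction that increases the activation.
x0 = torch.rand(1, 1, 28, 28)  # compare with torch.ones(1, 1, 28, 28)
x = x0
step_size = 0.1
for _ in range(100):
    x = x + step_size * get_grads(x)
```

After the loop, `x` activates the target neuron more strongly than the starting image did; plotting `x[0, 0]` shows what the network "wants to see" for that class.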

# Task 2: Integrated Gradients (Optional)

@@ -20,9 +29,9 @@ Reuse your MNIST digit recognition code. Implement IG as discussed in the lecture:

```math
\text{IntegratedGrads}_i(x) = (x_i - x_i') \cdot \frac{1}{m} \sum_{k=1}^m \frac{\partial F (x' + \frac{k}{m} \cdot (x - x'))}{\partial x_i}.
```

$\frac{\partial F}{\partial x_i}$ denotes the gradients with respect to the input color-channels $i$.
$x'$ denotes a baseline black image. And $x$ symbolizes an input we are interested in.
Finally, $m$ denotes the number of summation steps from the black baseline image to the interesting input.
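The sum above can be written down almost verbatim with `torch.func.grad`. The scalar function `F` below is a hypothetical stand-in for the trained network's target-neuron output:

```python
import torch
from torch.func import grad

# Hypothetical stand-in for F; the exercise uses the trained MNIST network.
w = torch.linspace(-1.0, 1.0, 28 * 28)


def F(x):
    return torch.tanh(x.flatten() @ w)


def integrated_gradients(x, m=300):
    """Approximate IG with a Riemann sum from a black baseline x' to x."""
    x_prime = torch.zeros_like(x)  # black baseline image
    get_grads = grad(F)
    grad_sum = torch.zeros_like(x)
    for k in range(1, m + 1):
        grad_sum += get_grads(x_prime + (k / m) * (x - x_prime))
    return (x - x_prime) * grad_sum / m


x = torch.rand(28, 28)
attributions = integrated_gradients(x)
```

A useful sanity check is the completeness property: the attributions sum approximately to $F(x) - F(x')$.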

Follow the todos in `./src/mnist_integrated.py` and then run `scripts/integrated_gradients.slurm`.

@@ -57,19 +66,16 @@ The desired outcome is to have a folder called `ffhq_style_gan` in the project directory.
The `load_folder` function from the `util` module loads both real and fake data.
Code to load the data is already present in the `deepfake_interpretation.py` file.

1. Implement the `transform` function to compute log-scaled frequency domain representations of samples from both sources via

``` math
\mathbf{F}_I = \log_e (| \mathcal{F}_{2d}(\mathbf{I}) | + \epsilon ), \text{ with } \mathbf{I} \in \mathbb{R}^{h,w,c}, \epsilon \approx 0 .
```

Above, `h`, `w` and `c` denote image height, width and color channels. `log` denotes the natural logarithm, and the bars denote the absolute value. A small epsilon is added for numerical stability.

Use the numpy functions `np.log`, `np.abs`, `np.fft.fft2`. By default, `fft2` transforms the last two axes. The last axis contains the color channels in this case. We are looking to transform the rows and columns.
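A minimal sketch of `transform`, assuming a single `[h, w, c]` image (for a batched `[n, h, w, c]` array the spatial axes would be 1 and 2 instead):

```python
import numpy as np


def transform(image_data):
    """Log-scaled 2D frequency-domain representation of an [h, w, c] image."""
    eps = 1e-12  # small constant for numerical stability of the log
    # fft2 transforms the last two axes by default; here the spatial
    # axes are 0 and 1, so they are passed explicitly.
    return np.log(np.abs(np.fft.fft2(image_data, axes=(0, 1))) + eps)


spectrum = transform(np.ones((8, 8, 3)))
```

For a constant image all energy lands in the DC bin `spectrum[0, 0, :]`, which is an easy way to check the axes are right.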

2. Plot mean spectra for real and fake images as well as their difference over the entire validation or test sets. For that run the script `scripts/train.slurm`.

3. `scripts/train.slurm` also trains a linear classifier (consisting of a single `nn.Linear`-layer) to distinguish real from fake images on the log-scaled Fourier coefficients. We want to visualize the weights of the trained classifier. For that go to `src/deepfake_interpretation.py` and implement the TODO at the end of the file. What do you see?
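As a sketch of the final TODO, assuming a hypothetical two-class weight matrix in place of the real `net.dense.weight`, each flat weight row can be reshaped back into image form and plotted:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Hypothetical stand-in weights; the real array is `net.dense.weight`,
# with one row per class, flattened over the [h, w, c] input.
h, w, c = 8, 8, 3
weight = np.random.randn(2, h * w * c)

# Reshape each flat row back into image form and average over the channels.
weight_images = weight.reshape(-1, h, w, c).mean(axis=-1)

fig, axes = plt.subplots(1, weight_images.shape[0])
for ax, img in zip(axes, weight_images):
    ax.imshow(img)
fig.savefig("classifier_weights.jpg")
```

Large-magnitude pixels in these weight images mark the frequency bins the classifier relies on to separate real from fake spectra.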
4 changes: 2 additions & 2 deletions src/deepfake_interpretation.py
@@ -87,7 +87,7 @@ def eval_step(net, loss, img, labels):

def transform(image_data):
"""Transform image data."""
# 3.2.1 TODO: Implement the function given in the readme
return np.zeros_like(image_data)


@@ -249,7 +249,7 @@ def transform(image_data):
plt.colorbar()
plt.savefig("mean_freq_difference.jpg")

# 3.2.3 TODO: Visualize the weight array `net.dense.weight`.
# By reshaping and plotting the weight matrix.

if type(net) is CNN:
11 changes: 7 additions & 4 deletions src/input_opt.py
Original file line number Diff line number Diff line change
@@ -38,13 +38,16 @@ def forward(self, x):

net = CNN()
net.load_state_dict(weights)
neuron = 3 # Target neuron index

def forward_pass(x):
"""Make single forward pass."""
# 1.1 TODO: Compute and return the activation value of a single neuron.
return 0.0

get_grads = grad(forward_pass)

# 1.2 TODO: Write an optimization loop to optimize an input image to maximize the output of the target neuron.

# 1.3 TODO: Plot the optimized input image.
# Compare the results from a random initialization and an initialization with ones.
12 changes: 6 additions & 6 deletions src/mnist_integrated.py
@@ -115,20 +115,20 @@ def integrate_gradients(net, test_images, output_digit, steps_m=300):
g_list = []
for test_image_x in tqdm(test_images, desc="Integrating Gradients"):

    # list for the gradients
    step_g_list = []

    # TODO: create a black reference image via `zeros_like`.

    # TODO: Loop over the integration steps.
    for current_step_k in range(steps_m):
        pass
        # TODO: compute the input to F from equation 5 in the slides.

        # TODO: define a forward pass for torch.func.grad

        # TODO: use torch.func.grad to find the gradient with respect to the input image.

        # TODO: append the gradient to your list

    # TODO: Return the sum of the list elements.