
Confusion about the gradient matrix used #13

@MohammadHossein-Bahari

Description


Hello,

Thanks for the great work.
As asked before (here), I do not see why, in several methods such as GraNd and the submodular functions, you use the concatenation of the loss gradient with its product with the last feature embedding, as shown here:

    # Gradient of the loss w.r.t. the outputs (logits); shape: [batch_num, num_classes]
    bias_parameters_grads = torch.autograd.grad(loss, outputs)[0]
    # Per-sample product of the recorded last-layer embedding and the logit gradient;
    # shape: [batch_num, num_classes, embedding_dim]
    weight_parameters_grads = self.model.embedding_recorder.embedding.view(
        batch_num, 1, self.embedding_dim).repeat(1, self.args.num_classes, 1) * \
        bias_parameters_grads.view(batch_num, self.args.num_classes, 1).repeat(
            1, 1, self.embedding_dim)
    # Concatenate the logit gradient with the flattened product into one
    # per-sample gradient vector
    gradients.append(torch.cat([bias_parameters_grads, weight_parameters_grads.flatten(1)],
                               dim=1).cpu().numpy())

You are basically using the last-layer features scaled by the gradient. Is there a reason you chose this instead of what is common in the literature, such as the gradient with respect to the last-layer parameters?
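
For concreteness, here is a minimal standalone sketch (not from the repository; `embedding_dim`, `num_classes`, and the `nn.Linear` layer are illustrative assumptions) that computes, for a single sample, both quantities contrasted above, so they can be compared directly:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    embedding_dim, num_classes = 8, 4               # illustrative sizes, not from the repo
    linear = nn.Linear(embedding_dim, num_classes)  # hypothetical final layer

    h = torch.randn(1, embedding_dim)               # stands in for the recorded embedding
    outputs = linear(h)
    loss = nn.functional.cross_entropy(outputs, torch.tensor([2]))

    # The construction from the snippet above: gradient w.r.t. the logits,
    # plus its product with the last-layer embedding
    bias_grads = torch.autograd.grad(loss, outputs, retain_graph=True)[0]      # [1, num_classes]
    weight_grads = bias_grads.view(num_classes, 1) * h.view(1, embedding_dim)  # [num_classes, embedding_dim]

    # The quantity common in the literature: gradient w.r.t. the last-layer parameters
    dW, db = torch.autograd.grad(loss, (linear.weight, linear.bias))

    print(torch.allclose(weight_grads, dW))           # compares the scaled features with dL/dW
    print(torch.allclose(bias_grads.squeeze(0), db))  # compares the logit gradient with dL/db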

Thanks!
