Hello,
Thanks for the great work.
As asked before (here), I do not see why, in several methods such as GraNd and the submodular functions, you use the concatenation of the loss gradient and its product with the last feature embedding, as shown here:
bias_parameters_grads = torch.autograd.grad(loss, outputs)[0]  # shape: (batch_num, num_classes)
weight_parameters_grads = self.model.embedding_recorder.embedding.view(
    batch_num, 1, self.embedding_dim).repeat(1, self.args.num_classes, 1) * \
    bias_parameters_grads.view(
        batch_num, self.args.num_classes, 1).repeat(1, 1, self.embedding_dim)
# shape: (batch_num, num_classes, embedding_dim)
gradients.append(torch.cat(
    [bias_parameters_grads, weight_parameters_grads.flatten(1)], dim=1).cpu().numpy())
You are basically using the last-layer features scaled by the gradient. Is there a reason you chose this instead of what is common in the literature, such as the gradient with respect to the last-layer parameters?
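For concreteness, here is a small self-contained toy sketch (my own code, not from this repository; names like `last_layer` and `embedding` are just placeholders) that puts the construction from the snippet next to the plain last-layer parameter gradient I have in mind:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, embedding_dim, num_classes = 4, 8, 3

# Toy stand-ins for the recorded embedding and the final linear layer.
embedding = torch.randn(batch_size, embedding_dim)
last_layer = nn.Linear(embedding_dim, num_classes)
outputs = last_layer(embedding)
targets = torch.randint(0, num_classes, (batch_size,))
loss = nn.CrossEntropyLoss()(outputs, targets)

# (a) What I mean by "gradient with respect to the last-layer parameters":
#     one mini-batch gradient for the weight matrix and the bias.
weight_grad, bias_grad = torch.autograd.grad(
    loss, (last_layer.weight, last_layer.bias), retain_graph=True)

# (b) The construction from the snippet above: per-sample output gradient,
#     and its outer product with the embedding, concatenated per sample.
out_grad = torch.autograd.grad(loss, outputs)[0]                         # (batch, num_classes)
per_sample_weight_grad = out_grad.unsqueeze(2) * embedding.unsqueeze(1)  # (batch, num_classes, embedding_dim)
per_sample_vectors = torch.cat([out_grad, per_sample_weight_grad.flatten(1)], dim=1)
print(per_sample_vectors.shape)  # (batch, num_classes * (1 + embedding_dim))

# For a linear last layer, summing the per-sample outer products over the batch
# recovers the mini-batch weight/bias gradients from (a).
print(torch.allclose(per_sample_weight_grad.sum(0), weight_grad, atol=1e-6))  # True
print(torch.allclose(out_grad.sum(0), bias_grad, atol=1e-6))                  # True
```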
Thanks!