The current formula is something like
$h(t)=W_{ih}X(t)+b_{ih}+W_{hh}X(t-1)+b_{hh}$
but it should be something like
$h(t)=W_{ih}X(t)+b_{ih}+W_{hh}h(t-1)+b_{hh}$
The difference is that the previous hidden state $h(t-1)$, not the previous input $X(t-1)$, is what gets fed back into the network.
See the official PyTorch docs for confirmation: https://docs.pytorch.org/docs/stable/generated/torch.nn.RNN.html
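To illustrate why the distinction matters, here is a minimal pure-Python sketch of the corrected recurrence. It is not PyTorch's implementation. The helper names `rnn_step` and `run_rnn` and the toy 1x1 weights are assumptions made up for this example. The `tanh` wrapper follows the default nonlinearity in the linked docs, which the simplified formulas above leave out.

```python
import math

def rnn_step(x_t, h_prev, W_ih, b_ih, W_hh, b_hh):
    """One step: h_t = tanh(W_ih x_t + b_ih + W_hh h_prev + b_hh)."""
    hidden_size = len(b_ih)
    h_t = []
    for i in range(hidden_size):
        pre = b_ih[i] + b_hh[i]
        # Input contribution: W_ih applied to the current input x_t.
        pre += sum(W_ih[i][j] * x_t[j] for j in range(len(x_t)))
        # Recurrent contribution: W_hh applied to the previous hidden
        # state h_prev, not to the previous input.
        pre += sum(W_hh[i][k] * h_prev[k] for k in range(hidden_size))
        h_t.append(math.tanh(pre))
    return h_t

def run_rnn(xs, W_ih, b_ih, W_hh, b_hh):
    """Unroll the recurrence over a sequence, starting from h(0) = 0."""
    h = [0.0] * len(b_ih)
    states = []
    for x_t in xs:
        h = rnn_step(x_t, h, W_ih, b_ih, W_hh, b_hh)
        states.append(h)
    return states

# Toy setup (hypothetical values): one input feature, one hidden unit.
W_ih, b_ih = [[0.5]], [0.0]
W_hh, b_hh = [[1.0]], [0.0]
xs = [[1.0], [0.0], [0.0]]  # a single pulse followed by zeros

states = run_rnn(xs, W_ih, b_ih, W_hh, b_hh)
for t, h in enumerate(states, start=1):
    print(t, h[0])
```

With this input, the hidden state at the third step is still nonzero, because the first input keeps propagating through $h(t-1)$. Under the incorrect formula, the third step would depend only on $X(3)$ and $X(2)$, both zero here, so the network would have no memory beyond one step.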