- Try various hidden layer sizes
- Try a different dataset
- Try randomizing `w2_random` for each batch or epoch.
  - Tried it for each batch and the network was not learning.
  - Also tried it for each epoch. This is slightly less bad than doing it for each batch.
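
The per-batch and per-epoch variants differ only in where the matrix is redrawn inside the training loop. Below is a minimal sketch of that difference, assuming `w2_random` is a random matrix kept alongside the trained weights. The `train` function, the `resample` option names, the 4x3 shape, and the uniform initialization are all hypothetical placeholders, and the actual forward and backward passes are left out.

```python
import random


def make_w2_random(rows, cols, rng, scale=0.1):
    """Draw a fresh matrix (list of rows) with entries uniform in [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def train(num_epochs, batches_per_epoch, resample="never", seed=0):
    """Skeleton loop showing where w2_random would be redrawn.

    resample is one of "never", "epoch", or "batch" (hypothetical names).
    Returns the final matrix and the number of times it was redrawn.
    """
    if resample not in ("never", "epoch", "batch"):
        raise ValueError(f"unknown resample mode: {resample!r}")

    rng = random.Random(seed)
    rows, cols = 4, 3  # placeholder shape, not taken from the project
    w2_random = make_w2_random(rows, cols, rng)
    redraws = 0

    for _epoch in range(num_epochs):
        if resample == "epoch":
            # Redraw once at the start of every epoch.
            w2_random = make_w2_random(rows, cols, rng)
            redraws += 1
        for _batch in range(batches_per_epoch):
            if resample == "batch":
                # Redraw before every batch.
                w2_random = make_w2_random(rows, cols, rng)
                redraws += 1
            # Forward pass, backward pass using w2_random, and the
            # weight update would go here.

    return w2_random, redraws
```

Keeping the redraw behind a single option makes it easy to run the fixed, per-epoch, and per-batch settings with the same seed and compare their learning curves directly.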