Back into the neural foundry. Found this repo, which is a fine, simple companion to Mital’s tougher tutorials (I’m still on session 3 unfortunately, the one on autoencoders).

Going through a few of those I discovered a technique called ‘dropout’, which helps reduce overfitting (the situation where the network builds a model that fits the training data too closely and so fails to produce a general enough abstraction). In dropout, some units (neurons) are randomly deactivated during training, forcing the network to keep working with only a reduced capacity. This makes the network ‘rely’ less on each and every one of its neurons, leaving it both more robust and less prone to overfitting. A small sketch of the idea is below.
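To fix the idea in my head, here is a minimal NumPy sketch of ‘inverted’ dropout (my own toy example, not code from Mital’s tutorials): each unit is kept with probability `keep_prob`, and the survivors are rescaled so the expected activation stays the same at test time.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, keep_prob=0.5, training=True):
    """Inverted dropout: randomly zero out units during training and
    rescale the survivors so the expected activation is unchanged."""
    if not training:
        return activations  # at test time, use the full network
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

# Example: roughly half the units of this layer's output are silenced each pass.
layer_output = rng.standard_normal((4, 8))
print(dropout(layer_output, keep_prob=0.5))
```

Because a different random subset of units is dropped on every forward pass, no single neuron can be depended on too heavily.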


Also, I went back to something I have yet to grasp fully: the concept of momentum in gradient descent. Roughly, instead of stepping directly along the current gradient, the update accumulates a ‘velocity’ that remembers past gradients, which damps oscillations and speeds things up along consistently downhill directions.
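Here is a toy NumPy sketch of that (again my own, on a made-up quadratic, just to see the velocity update written out):

```python
import numpy as np

def gd_momentum(grad_fn, w, lr=0.05, mu=0.9, steps=200):
    """Gradient descent with momentum: the velocity v accumulates past
    gradients, so persistent directions build up speed while
    back-and-forth oscillations cancel out."""
    v = np.zeros_like(w)
    for _ in range(steps):
        v = mu * v - lr * grad_fn(w)  # decay old velocity, add the new gradient step
        w = w + v                     # move along the accumulated velocity
    return w

def grad(w):
    # gradient of the toy loss f(w) = w0**2 + 10 * w1**2 (an elongated bowl)
    return np.array([2.0 * w[0], 20.0 * w[1]])

print(gd_momentum(grad, np.array([5.0, 5.0])))  # ends up near the minimum at (0, 0)
```

The `mu` parameter (often around 0.9) controls how much of the previous velocity is kept each step; setting it to 0 recovers plain gradient descent.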