Advice for building good models
- Zero-mean data is generally better than nonzero-mean data. If the inputs to a neuron are always positive, the gradients on that neuron's weights are either all positive or all negative. In 2D, this constrains gradient descent to a zig-zag path toward the optimum rather than a direct one. See CS231n Lecture 5 by Andrej Karpathy for more details.
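  The sign constraint above can be sketched with a toy single neuron in numpy (a hypothetical example, not from the lecture): for y = w·x + b, the chain rule gives dL/dw_i = (dL/dy)·x_i, so if every x_i is positive, all weight gradients share the sign of dL/dy.

  ```python
  import numpy as np

  # Toy neuron y = w . x + b with squared loss against a target.
  rng = np.random.default_rng(0)
  x = rng.uniform(0.1, 1.0, size=5)   # all-positive inputs
  w = rng.normal(size=5)
  b = 0.0
  target = -2.0

  y = w @ x + b
  dL_dy = 2 * (y - target)            # d/dy of (y - target)^2
  dL_dw = dL_dy * x                   # chain rule: dL/dw_i = dL/dy * x_i

  # Every component of dL_dw has the same sign as dL_dy, so a gradient
  # step can only move all weights up together or all down together.
  assert np.all(np.sign(dL_dw) == np.sign(dL_dy))
  ```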
- Deep Learning Book by Ian Goodfellow, Yoshua Bengio, Aaron Courville
- Neural Networks and Deep Learning by Michael Nielsen
- “It is the loss function that will direct what the intermediate hidden variables should be, so as to do a good job at predicting the targets for the next layer.” Link
- Intuitive YouTube explanation of the entropy equation
- Intuitive explanation of cross entropy
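  The two linked explanations boil down to the equations H(p) = −Σ p log p and H(p, q) = −Σ p log q; a minimal numpy sketch of both (function names are my own, not from the links):

  ```python
  import numpy as np

  def entropy(p):
      """H(p) = -sum p_i log2 p_i, in bits; skips zero-probability terms."""
      p = np.asarray(p, dtype=float)
      nz = p[p > 0]
      return -np.sum(nz * np.log2(nz))

  def cross_entropy(p, q):
      """H(p, q) = -sum p_i log2 q_i: expected code length when events
      follow p but the code is built for q."""
      p = np.asarray(p, dtype=float)
      q = np.asarray(q, dtype=float)
      mask = p > 0
      return -np.sum(p[mask] * np.log2(q[mask]))

  p = [0.5, 0.5]   # true distribution: a fair coin
  q = [0.9, 0.1]   # a model's (wrong) predicted distribution
  print(entropy(p))           # 1.0 bit for a fair coin
  print(cross_entropy(p, q))  # larger; equals entropy(p) only when q == p
  ```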
- References for new deep learning students at the Montreal Institute for Learning Algorithms
- Neural networks with at least one hidden layer (and a nonlinear activation) are universal approximators.
- Prefer L2 regularization, dropout, and input noise over smaller neural networks to prevent overfitting. The takeaway is that you should not be using smaller networks because you are afraid of overfitting. Instead, you should use as big of a neural network as your computational budget allows, and use other regularization techniques to control overfitting.
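  The three regularizers above can be sketched in plain numpy for one training step (a hand-rolled illustration under my own parameter names, not any library's API):

  ```python
  import numpy as np

  rng = np.random.default_rng(0)
  x = rng.normal(size=(4, 8))          # a small batch of inputs
  w = rng.normal(size=(8, 3)) * 0.1
  lam = 1e-2                           # L2 regularization strength
  keep_prob = 0.8                      # dropout keep probability
  noise_std = 0.1                      # input-noise scale

  # 1) Input noise: perturb the inputs each training step.
  x_noisy = x + rng.normal(scale=noise_std, size=x.shape)

  # 2) Inverted dropout: randomly zero activations during training and
  #    rescale by 1/keep_prob so the expected activation is unchanged.
  h = np.maximum(0, x_noisy @ w)       # ReLU layer
  mask = (rng.random(h.shape) < keep_prob) / keep_prob
  h_dropped = h * mask

  # 3) L2 regularization: add lam * w to the weight gradient, which is
  #    equivalent to adding 0.5 * lam * ||w||^2 to the loss.
  grad_data = np.zeros_like(w)         # placeholder for the data-loss gradient
  w_before = w.copy()
  w = w - 0.1 * (grad_data + lam * w)  # descent step with weight decay
  ```

  Even with a zero data-loss gradient, the L2 term shrinks the weights each step, which is why it is also called weight decay.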
- Read this CS231n note about representational power of neural networks for tips on best practices.
- Reading roadmap of important deep learning papers
- Deep learning links from Ujjwal Karn, an NLP and deep learning engineer
- Introduction to Automatic Differentiation
- A Neural Parametric Singing Synthesizer with samples
- Speech synthesis technology by Lyrebird
- zi2zi: Master Chinese Calligraphy with Conditional Adversarial Networks
- DeepMind's Differentiable Neural Computer