Advice for building good models
- Zero-mean data is generally better than nonzero-mean data. If the inputs to a neuron are always positive, the gradients on that neuron's weights are either all positive or all negative. In 2D, this constrains gradient descent to a zig-zag path toward the optimum rather than a direct one. See CS231n Lecture 5 by Andrej Karpathy for more details.
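  The sign constraint above can be sketched with a toy single neuron in numpy (a hypothetical example, not from the lecture): for y = w·x + b, the chain rule gives dL/dw_i = (dL/dy)·x_i, so if every x_i is positive, all weight gradients share the sign of dL/dy.

  ```python
  import numpy as np

  # Toy neuron y = w . x + b with squared loss against a target.
  rng = np.random.default_rng(0)
  x = rng.uniform(0.1, 1.0, size=5)   # all-positive inputs
  w = rng.normal(size=5)
  b = 0.0
  target = -2.0

  y = w @ x + b
  dL_dy = 2 * (y - target)            # d/dy of (y - target)^2
  dL_dw = dL_dy * x                   # chain rule: dL/dw_i = dL/dy * x_i

  # Every component of dL_dw has the same sign as dL_dy, so a gradient
  # step can only move all weights up together or all down together.
  assert np.all(np.sign(dL_dw) == np.sign(dL_dy))
  ```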
- Deep Learning Book by Ian Goodfellow, Yoshua Bengio, Aaron Courville
- Neural Networks and Deep Learning by Michael Nielsen
- “It is the loss function that will direct what the intermediate hidden variables should be, so as to do a good job at predicting the targets for the next layer.” Link
- Intuitive YouTube explanation of the entropy equation
- Intuitive explanation of cross entropy
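  The two linked explanations boil down to the equations H(p) = −Σ p log p and H(p, q) = −Σ p log q; a minimal numpy sketch of both (function names are my own, not from the links):

  ```python
  import numpy as np

  def entropy(p):
      """H(p) = -sum p_i log2 p_i, in bits; skips zero-probability terms."""
      p = np.asarray(p, dtype=float)
      nz = p[p > 0]
      return -np.sum(nz * np.log2(nz))

  def cross_entropy(p, q):
      """H(p, q) = -sum p_i log2 q_i: expected code length when events
      follow p but the code is built for q."""
      p = np.asarray(p, dtype=float)
      q = np.asarray(q, dtype=float)
      mask = p > 0
      return -np.sum(p[mask] * np.log2(q[mask]))

  p = [0.5, 0.5]   # true distribution: a fair coin
  q = [0.9, 0.1]   # a model's (wrong) predicted distribution
  print(entropy(p))           # 1.0 bit for a fair coin
  print(cross_entropy(p, q))  # larger; equals entropy(p) only when q == p
  ```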
- References for new deep learning students at the Montreal Institute for Learning Algorithms
- Neural networks with at least one hidden layer (and a nonlinear activation) are universal approximators.
- Prefer L2 regularization, dropout, and input noise over smaller neural networks to prevent overfitting. The takeaway is that you should not be using smaller networks because you are afraid of overfitting. Instead, you should use as big of a neural network as your computational budget allows, and use other regularization techniques to control overfitting.
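  The three regularizers above can be sketched in plain numpy for one training step (a hand-rolled illustration under my own parameter names, not any library's API):

  ```python
  import numpy as np

  rng = np.random.default_rng(0)
  x = rng.normal(size=(4, 8))          # a small batch of inputs
  w = rng.normal(size=(8, 3)) * 0.1
  lam = 1e-2                           # L2 regularization strength
  keep_prob = 0.8                      # dropout keep probability
  noise_std = 0.1                      # input-noise scale

  # 1) Input noise: perturb the inputs each training step.
  x_noisy = x + rng.normal(scale=noise_std, size=x.shape)

  # 2) Inverted dropout: randomly zero activations during training and
  #    rescale by 1/keep_prob so the expected activation is unchanged.
  h = np.maximum(0, x_noisy @ w)       # ReLU layer
  mask = (rng.random(h.shape) < keep_prob) / keep_prob
  h_dropped = h * mask

  # 3) L2 regularization: add lam * w to the weight gradient, which is
  #    equivalent to adding 0.5 * lam * ||w||^2 to the loss.
  grad_data = np.zeros_like(w)         # placeholder for the data-loss gradient
  w_before = w.copy()
  w = w - 0.1 * (grad_data + lam * w)  # descent step with weight decay
  ```

  Even with a zero data-loss gradient, the L2 term shrinks the weights each step, which is why it is also called weight decay.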
- Read this CS231n note about representational power of neural networks for tips on best practices.
- Reading roadmap of important deep learning papers
- Deep learning links from Ujjwal Karn, an NLP and deep learning engineer
- Introduction to Automatic Differentiation
- A Neural Parametric Singing Synthesizer with samples
- Speech synthesis technology by Lyrebird
- zi2zi: Master Chinese Calligraphy with Conditional Adversarial Networks
- DeepMind's Differentiable Neural Computer