# Machine Learning and Deep Learning Resources

# Advice for building good models

- Zero-mean data is generally better than nonzero-mean data. If the inputs to a neuron are always positive, then during backpropagation the gradients on that neuron's weights are either all positive or all negative (they all share the sign of the upstream gradient). In 2D this confines each update to two of the four quadrants, so gradient descent has to zig-zag toward the optimum instead of moving directly. See CS231n Lecture 5 by Andrej Karpathy for more details.
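As a small illustration of the point above, here is a minimal zero-centering sketch in NumPy; the array names and shapes are made up for the example, and in practice the mean is computed on the training set and reused for every other split.

```python
import numpy as np

# Hypothetical all-positive data (e.g. raw pixel intensities in [0, 255]).
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 255.0, size=(1000, 3072))
X_test = rng.uniform(0.0, 255.0, size=(200, 3072))

# Per-feature mean from the training set only.
mean = X_train.mean(axis=0)

# Subtract it everywhere so inputs are zero-mean before they reach the network.
X_train_centered = X_train - mean
X_test_centered = X_test - mean  # reuse the training mean; never refit on test data
```
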
# Textbooks

- *Deep Learning* by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
- *Neural Networks and Deep Learning* by Michael Nielsen
- “It is the **loss function** that will direct what the intermediate hidden variables should be, so as to do a good job at predicting the targets for the next layer.” (Link)

# References

- Intuitive YouTube explanation of the entropy equation
- Intuitive explanation of cross entropy (a worked example of both equations appears after this list)
- References for new deep learning students at the Montreal Institute for Learning Algorithms (MILA)
- Neural networks with at least one hidden layer are universal approximators. **Prefer L2 regularization, dropout, and input noise over smaller neural networks to prevent overfitting.** The takeaway is that you should not use a smaller network because you are afraid of overfitting; instead, use as big a network as your computational budget allows and rely on regularization techniques to control overfitting (see the sketch after this list).
- Read this CS231n note about the representational power of neural networks for tips on best practices.
- Reading roadmap of important deep learning papers
- Deep learning links from Ujjwal Karn, an NLP and deep learning engineer
- Introduction to Automatic Differentiation
- A Neural Parametric Singing Synthesizer with samples
- Speech synthesis technology by Lyrebird
- zi2zi: Master Chinese Calligraphy with Conditional Adversarial Networks
- DeepMind Differentiable Neural Computer
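
For the entropy and cross-entropy links above, a small worked example may help. This is a sketch with made-up distributions; `entropy` and `cross_entropy` are hypothetical helper functions, and logs are base 2 so the results are in bits.

```python
import numpy as np

# Entropy H(p) = -sum_i p_i * log2(p_i): the average surprise of drawing from p.
def entropy(p):
    return -np.sum(p * np.log2(p))

# Cross entropy H(p, q) = -sum_i p_i * log2(q_i): the average surprise when
# events follow p but we score them with q. It is minimized (and equals H(p))
# exactly when q == p, which is why it works as a loss for classifiers.
def cross_entropy(p, q):
    return -np.sum(p * np.log2(q))

p = np.array([0.5, 0.25, 0.25])  # true distribution (illustrative)
q = np.array([0.8, 0.1, 0.1])    # model's predicted distribution (illustrative)

print(entropy(p))           # 1.5 bits
print(cross_entropy(p, q))  # ~1.82 bits; the gap from 1.5 is the KL divergence
```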
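
And for the regularization advice above, here is a minimal sketch (assuming PyTorch) of a deliberately large network whose capacity is reined in by L2 regularization (via `weight_decay`), dropout, and input noise; every size and hyperparameter here is illustrative, not a recommendation.

```python
import torch
import torch.nn as nn

# A big MLP: overfitting is controlled by regularization, not by shrinking it.
model = nn.Sequential(
    nn.Linear(784, 1024),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # dropout: randomly zeroes activations during training
    nn.Linear(1024, 1024),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(1024, 10),
)

# L2 regularization applied through the optimizer's weight decay.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=5e-4)
loss_fn = nn.CrossEntropyLoss()

def train_step(x, y, noise_std=0.1):
    # Input noise: perturb the inputs with small Gaussian noise while training.
    x_noisy = x + noise_std * torch.randn_like(x)
    optimizer.zero_grad()
    loss = loss_fn(model(x_noisy), y)
    loss.backward()
    optimizer.step()
    return loss.item()

# One illustrative step on random data.
x = torch.randn(32, 784)
y = torch.randint(0, 10, (32,))
print(train_step(x, y))
```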