Posted on: 2021-07-28, in Category: research, tags: books references deep_learning machine_learning neural_networks

So you want to Sheep

I’ve been doing this Sheep Learning 1 for a while, and I often have to prescribe references. I’ll write this in the hope that it may be useful to others. Preferably the topics should be followed in order, but one can choose to explore them as one wishes.

Prerequisites

Probability and Statistics

A lot of people tend to jump into the whole area without the minimal mathematical prerequisites. These are, however, primarily statistical models with a strong reliance on Linear Algebra. For undergraduate students I recommend (Hogg et al. 2019) as the bare minimum for that area.

There are too many online references these days (read: noise), and it becomes almost impossible to discern the good ones from the bad. When I was looking for supplemental resources, I found the entire website https://online.stat.psu.edu/ to be excellent.

Linear Algebra

People like Strang (https://math.mit.edu/linearalgebra/), but I never really followed it. I had covered introductory Linear Algebra from other books which are no longer available on the market, but Strang is a good reference.

There are various books and resources available but essentially one should be very clear about the following concepts:

  1. What a matrix is, what a Linear Transformation is, and the difference between the two.
  2. Rank, Nullity, Similarity and related properties of a matrix.
  3. Vector Spaces, and the Basis and Dimension of a Vector Space.
  4. Matrix Transformations, Unitary Matrices.
  5. Eigenvalues of a matrix.
  6. Diagonalization of a matrix, SVD.
  7. Basic properties of matrix calculus.
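
If these names feel abstract, a few of them are easy to check numerically. Below is a small numpy sketch (my own illustration, not taken from any of the references) touching on rank, eigenvalues, diagonalization and the SVD:

```python
import numpy as np

# A small symmetric matrix to illustrate a few of the concepts above.
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 0.0],
              [1.0, 0.0, 2.0]])

print(np.linalg.matrix_rank(A))       # rank of the matrix

eigvals, eigvecs = np.linalg.eig(A)   # eigenvalues and eigenvectors
print(eigvals)

# Diagonalization: A = V diag(lambda) V^{-1} when A is diagonalizable
V, D = eigvecs, np.diag(eigvals)
print(np.allclose(V @ D @ np.linalg.inv(V), A))

# Singular value decomposition: A = U S V^T, defined for any matrix
U, S, Vt = np.linalg.svd(A)
print(np.allclose(U @ np.diag(S) @ Vt, A))
```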

Calculus

A solid grasp of calculus is also required. Most of these Neural Network methods are based on finding the minima of a function; as such, they are extensions of Newton’s method. However, the high dimensional nature of the problem means that the student should be familiar with the Jacobian, the Hessian and their properties.
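
As a concrete illustration (my own sketch, not from the references above), here is Newton’s method in two dimensions, where the gradient and the Hessian take the place of the first and second derivatives; for the quadratic below it lands on the minimizer in a single step:

```python
import numpy as np

# Minimize f(x) = 0.5 * x^T A x - b^T x, whose gradient is A x - b
# and whose Hessian is the constant matrix A.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

def grad(x):
    return A @ x - b

def hess(x):
    return A

x = np.zeros(2)
for _ in range(5):
    # Newton step: solve H(x) d = -grad(x) rather than inverting H explicitly
    d = np.linalg.solve(hess(x), -grad(x))
    x = x + d

print(x)                      # Newton iterate
print(np.linalg.solve(A, b))  # exact minimizer A^{-1} b, for comparison
```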

More advanced topics delve into Functional Analysis. A good intro to Functional Analysis is (Hirsch et al. 1999).

Machine Learning

I won’t cover books on advanced topics here. The Machine Learning mentioned here is mostly Statistical Learning, and a good introduction to that is (Hastie et al. 2009). Hastie and Tibshirani are of course noted statisticians, and if one is so inclined, one can refer to other books by them. The book is available to download from its homepage.

A basic understanding of Bayesian Methods can also prove useful. I’ve found (Hoff 2009) a really good introduction to that.

Interested students can refer to (Hart et al. 1973) as a supplemental tract. Larry Wasserman’s (Wasserman 2004) is another book covering statistics from an inferential perspective. (Bishop 2007) is from a more Bayesian perspective. While some people like (nil?), I’ve found it too terse and riddled with errors.

Some topics from (Haykin 2010) can also be studied; I’ve found its chapter on Regularization to be especially good. It approaches the Machine Learning and Neural Networks paradigms from a more Signal Processing perspective. (Hart et al. 1973) has some chapters towards the end on topics like the No Free Lunch theorem and the difficulty of learning in high dimensional spaces.

An excellent book on Learning Theory is (Shalev-Shwartz and Ben-David 2014). However, a new student may find it daunting, so I advise referring to it only after gaining some familiarity with the whole Machine Learning paradigm and once the need to understand some deeper questions arises.

Neural Networks

Neural Networks: one of the most misleading phrases in modern science. “Deep” Neural Networks is even more confounding. Neural Networks are essentially nonparametric models: compositions of linear models with nonlinear activation units.
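
To see what that means concretely, here is a minimal numpy sketch (my own illustration, with arbitrary layer sizes and random weights) of a small network written explicitly as a composition of linear models and elementwise nonlinearities:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b, activation=np.tanh):
    # One layer: a linear model W x + b followed by an elementwise nonlinearity.
    return activation(W @ x + b)

# A tiny network: 4 inputs -> 8 hidden units -> 3 outputs.
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)

x = rng.normal(size=4)
h = layer(x, W1, b1)                  # hidden representation
y = layer(h, W2, b2, lambda z: z)     # final layer kept linear (identity activation)
print(y)
```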

Linear Models

A really good introduction to Linear Models and Generalized Additive Models, which gives great insight into how such nonparametric methods may work, is (Wood 2006) (homepage of the book: https://www.maths.ed.ac.uk/~swood34/igam/index.html). A good handle on Linear Models is essential to understanding Neural Networks.

Introductory Books

My favourite introductory tract is (Rojas 1996). It starts from the beginning of the history of Neural Networks and goes through boolean models, backpropagation (another contentious term), advanced optimization methods and some variants of Neural Networks.

The earlier edition, (Haykin 1998), is suitable as supplemental reading, while there are some good chapters in (Haykin 2010) which can also be used for advanced Machine Learning concepts (as I’ve mentioned in the Machine Learning section).

Intermediate Books

(Goodfellow et al. 2015) is a good introduction to modern Neural Networks. Its first five chapters cover topics that are useful later as well, and the chapters on CNNs and RNNs can also be useful.

References

[1] R. Hogg, E. A. Tanis, and D. Zimmerman, Probability and Statistical Inference, 9th ed. 2019.
[2] F. Hirsch, G. Lacombe, and S. Levy, Elements of Functional Analysis. 1999.
[3] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer Series in Statistics, 2009.
[4] P. D. Hoff, A First Course in Bayesian Statistical Methods. 2009.
[5] P. Hart, R. Duda, and D. Stork, Pattern Classification. 1973.
[6] L. Wasserman, All of Statistics: A Concise Course in Statistical Inference. 2004.
[7] C. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, New York, 2007.
[8] S. Haykin, Neural Networks and Learning Machines. 2010.
[9] S. Shalev-Shwartz and S. Ben-David, Understanding Machine Learning: From Theory to Algorithms. 2014.
[10] S. Wood, Generalized Additive Models: An Introduction with R. CRC Press, 2006.
[11] R. Rojas, Neural Networks: A Systematic Introduction. 1996.
[12] S. Haykin, Neural Networks: A Comprehensive Foundation. 1998.
[13] I. Goodfellow, Y. Bengio, and A. C. Courville, Deep Learning. MIT Press, 2016.

  1. I call “Deep Learning” Sheep Learning because I feel a lot of people are doing it like sheep including me perhaps.↩︎
