ML, Statistics and Mathematics

I have just started getting my hands dirty in ML, and every time I try delving deeper into the concepts/code, I face the challenge of the mathematics and its cryptic notation. Coming from a Computer Science background, I do understand some of it, but most of it goes over my head.

Take, for example, the formulae below from this page -

I try and really want to understand them, but somehow I get confused and give up every time.

Can you please suggest how to start? Any starting pointers or advice, please.

Topic data-science-model mathematics statistics machine-learning

Category Data Science


If you're trying too hard, ask yourself if you're enjoying it. If you're asking yourself if you're enjoying it, then maybe you should ask yourself what it is that you really enjoy.

Otherwise, try the Schaum's Outline series in Calculus, Linear Algebra, and Statistics. Those are excellent books for beginners: https://www.amazon.com/Schaums-Outline-Linear-Algebra-Outlines/dp/1260011445


I would recommend a TOP-DOWN learning path:

  1. get a first grasp of what algorithm types there are, based on possible use cases (classification, regression, clustering, etc); this way, you know the WHAT CAN I SOLVE WITH THIS
  2. for the algorithms you are interested in (a basic one could be linear regression trained via a gradient descent optimizer), you can get a first feeling using libraries like scikit-learn, which wrap all the math in between but give you results you can quickly check and play with --> HOW CAN I SOLVE IT
  3. after you have played around with it, you can have a deeper look at how the algorithms work, with the linear algebra, statistics and calculus concepts you need to really understand them (basically, the math formulae you mentioned) --> HOW IT WORKS
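Step 2 above can be sketched in a few lines of scikit-learn. This is a minimal example on synthetic data (the dataset and coefficients here are made up for illustration); `SGDRegressor` is a linear regression trained with stochastic gradient descent, so all the underlying math stays hidden:

```python
# A minimal sketch of step 2: linear regression trained with a
# gradient-descent optimizer, letting scikit-learn hide the math.
# The data below is synthetic, generated as y ≈ 3x + 2 plus noise.
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 1))                    # one input feature
y = 3.0 * X.ravel() + 2.0 + rng.normal(0, 0.1, 100)    # noisy linear target

model = SGDRegressor(max_iter=5000, tol=1e-6)  # stochastic gradient descent
model.fit(X, y)

# The learned slope and intercept should land near 3 and 2.
print(model.coef_, model.intercept_)
```

Playing with the noise level or the number of samples and watching `coef_` and `intercept_` move is exactly the kind of quick feedback loop step 2 is about.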

Good sources:

  • Python Machine Learning book, by Sebastian Raschka (good balance between theory and practice)
  • Jason Brownlee blog and books (very applied use cases)
  • scikit-learn documentation, which includes the math used in their code

It is quite true that papers or books use notation that seems obvious to people who are used to dealing with the mathematical aspects, but is meaningless to others. Ways of understanding the math include:

  • Following theoretical courses or trainings
  • Reading dedicated books
  • Asking people around you
  • Asking people on forums such as this one, or Cross Validated for stats formulae
  • Getting it by yourself upon re-reading the parts of the paper/book you didn't get the first time

There are some notations/conventions that are implicitly accepted in data science / machine learning papers, such as:

  • Using $X$ as input, $y$ as output, $\theta$ as model parameters
  • Using $\hat{y}$ for the estimator of the true $y$
  • Assuming that vectors are column vectors

The list would be too long to include here.
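As one illustration of these conventions (the equations below are a standard textbook formulation, not taken from the question's page), ordinary least-squares linear regression is usually written as:

```latex
% X: input (design) matrix, y: output vector (a column vector),
% \theta: model parameters, \hat{y}: the estimator of the true y.
\hat{y} = X\theta,
\qquad
\hat{\theta} = \arg\min_{\theta} \, \lVert y - X\theta \rVert^2
```

Once you have seen $X$, $y$, $\theta$ and $\hat{y}$ used this way a few times, many formulae become readable even before you work through the details.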


Regarding the example above, what we face is a constrained optimization problem.

The $\max$ statement means that we are looking for the maximum value of the expression that follows. What is written below the $\max$ (namely, the $\Delta_{ij}$ values) is the list of "free" parameters that change the value of the expression.

The $\max$ statement is prefixed by $\arg$, which means that we are not interested in the expression's maximum value itself, but rather in the set of $\Delta_{ij}$ values that leads to it.

Then we face a $s.t.$ ("subject to") statement: this is no ordinary optimization, because we also have to respect the constraints expressed after $s.t.$. Those can be inequalities, equations, membership constraints, etc., either explicit ($\Delta_{ij} > 0$) or more implicit.
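To make the $\arg\max$ / $\max$ / $s.t.$ distinction concrete, here is a toy brute-force version in Python (the objective and constraint are made up; real solvers are much smarter, but the vocabulary maps one-to-one):

```python
# Toy constrained optimization, solved by brute force over a grid:
#   arg max  -(x - 3)^2    subject to    x <= 2
# "max"     -> the best objective value attained,
# "arg max" -> the x that attains it,
# "s.t."    -> filter out candidates violating the constraint.
candidates = [i * 0.01 for i in range(0, 1001)]   # grid over [0, 10]
feasible = [x for x in candidates if x <= 2]      # apply the s.t. constraint

def objective(x):
    return -(x - 3) ** 2

best_x = max(feasible, key=objective)   # the arg max: the maximizing x
best_value = objective(best_x)          # the max: the value it attains

print(best_x, best_value)  # → 2.0 -1.0
```

Unconstrained, the maximum would be at $x = 3$; the constraint $x \le 2$ pushes the $\arg\max$ to the boundary, which is typical of constrained problems.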
