Bda
Mango Inference
Statistical models create a self-contained logical world of their own which we then try to deploy into the real world. Navigating between these two worlds is the central challenge of statistical modeling. - Richard McElreath
Imagine that we are mango devotees and have arrived on a new island looking for tasty mangoes. This island happens to have a lot of different wild mangoes. Let us denote the set of all possible mangoes as $\Omega$, i.e.
$\Omega$ is the sample space: the set of all possible datasets.
Now we start eating these mangoes and, to our sorrow, only some of them turn out to be tasty, so we are unsure whether to stay on this island and keep eating mangoes or to leave. We decide to stay only if there is a relatively high number of tasty mangoes. However, determining whether a mango is tasty might be a complicated task, since it would require assessing the mango over its different characteristics.
Formally, we presume that this population is characterized by a joint probability distribution over these characteristics.
Since we are mango enthusiasts, our years of experience with mangoes tell us that the softness of a mango predicts its taste pretty well, so we have built a special device that gives a softness value in $[0, \infty)$ for each mango. Formally, this device can be thought of simply as a function, $f_d: \text{Mango} \rightarrow \mathbb{R}^+$. What did we do here? By using this function to quantify each mango's taste through its softness, we have created a mechanism to describe $\Omega$ numerically. Since $\Omega$ is now a set in $\mathbb{R}^+$, we can presume the existence of a probability space, $(\Omega, \mathcal{F}, P)$, on it.
This probability space on $\Omega$ is called the data generating process.
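The softness device $f_d$ can be pictured as an ordinary function from mangoes to non-negative numbers. The snippet below is only a minimal sketch under stated assumptions: the `Mango` class, its `firmness` attribute, the formula inside `softness`, and the uniform process in `draw_mango` are all hypothetical and chosen purely so that the output lands in $[0, \infty)$.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Mango:
    """Hypothetical mango with an invented physical attribute, firmness in (0, 1]."""
    firmness: float


def softness(mango: Mango) -> float:
    """Hypothetical stand-in for the device f_d: Mango -> [0, inf)."""
    # A fully firm mango (firmness 1.0) maps to 0; softer mangoes map to larger values.
    return 1.0 / mango.firmness - 1.0


def draw_mango(rng: random.Random) -> Mango:
    """Assumed data generating process: firmness uniform on (0, 1]."""
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1].
    return Mango(firmness=1.0 - rng.random())


rng = random.Random(42)
readings = [softness(draw_mango(rng)) for _ in range(5)]
print(readings)
```

Each call to `draw_mango` followed by `softness` yields one number in $[0, \infty)$, which is the sense in which the device turns mangoes into points of a numerical sample space.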
Since there exist uncountably many mangoes on this island, it is infeasible to gather all of them to learn about the data generating process. This is the large world: too large and complex to quantify in practice. But we still need to decide whether to stay on this island or not, so how do we do it?
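To make the gap between the large world and what we can actually observe concrete, here is a minimal sketch. It assumes, purely for illustration, that softness readings follow a Gamma distribution with shape 2 and scale 1, and that a reading above 2.0 counts as "soft enough"; neither the distribution nor the threshold comes from the island.

```python
import random


def sample_softness(n: int, rng: random.Random) -> list[float]:
    """Draw n softness readings from an assumed process (Gamma, shape 2, scale 1)."""
    return [rng.gammavariate(2.0, 1.0) for _ in range(n)]


def fraction_above(values: list[float], threshold: float) -> float:
    """Share of readings strictly above a (hypothetical) softness threshold."""
    return sum(v > threshold for v in values) / len(values)


rng = random.Random(0)
small = sample_softness(20, rng)      # what a few days of eating might give us
large = sample_softness(20_000, rng)  # far more mangoes than we could ever eat

print(fraction_above(small, 2.0))
print(fraction_above(large, 2.0))
```

In practice we only ever hold something like the small sample, and its summary can differ noticeably from the value the full process would give, which is why the decision to stay or leave cannot be read directly off the large world.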
References
[1] - Hoff, Peter D. A first course in Bayesian statistical methods. Vol. 580. New York: Springer, 2009.
[2] - McElreath, Richard. Statistical rethinking: A Bayesian course with examples in R and Stan. CRC Press, 2020.
[3] - Wasserman, Larry. All of statistics: a concise course in statistical inference. Springer Science & Business Media, 2013.
[4] - Shalev-Shwartz, Shai, and Shai Ben-David. Understanding machine learning: From theory to algorithms. Cambridge University Press, 2014.