In Machine Learning, given some features $X$, we aim to predict a target variable $Y$. Under the squared loss, the optimal predictor is the Bayes predictor $\mathbb{E}[Y \mid X]$. If we know the joint distribution $P(X, Y)$, then we can directly compute $\mathbb{E}[Y \mid X]$.
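To spell out the step from the joint distribution to the predictor: assuming, purely for illustration, that $(X, Y)$ has a joint density $p(x, y)$, and writing $f^*$ (notation introduced here) for the Bayes predictor, we have

$$
f^*(x) = \mathbb{E}[Y \mid X = x] = \int y \, p(y \mid x) \, dy = \frac{\int y \, p(x, y) \, dy}{\int p(x, y) \, dy},
$$

wherever the marginal density $\int p(x, y) \, dy$ is positive.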
Thus, under squared error loss, Machine Learning can be viewed as approximating $P(X, Y)$ as accurately as possible from a given set of observations $(X_i, Y_i)$.
Now, in the hypothetical case where the observed samples are i.i.d., we can apply results from doi.org/10.1016/j.spl.2021.109088 ("On the tight constant in the multivariate Dvoretzky–Kiefer–Wolfowitz inequality"). This paper suggests that, given enough samples, the empirical distribution function converges rapidly and uniformly to the true distribution function, even in high dimensions.
If this is the case, doesn't the DKW inequality imply that, in the hypothetical i.i.d. setting, Machine Learning is trivially solved? That is, since the empirical distribution quickly approximates the true one, we should be able to compute the Bayes predictor directly, which is optimal in this scenario.
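For intuition about the rate, here is a small simulation of the classical univariate DKW inequality with Massart's constant, $P(\sup_x |F_n(x) - F(x)| > \varepsilon) \le 2 e^{-2 n \varepsilon^2}$. This is only the one-dimensional case, not the multivariate bound from the paper linked above. The choice of a Uniform(0, 1) distribution, the value of $\varepsilon$, and all function names are assumptions made for this sketch.

```python
import math
import random


def sup_distance_uniform(sample):
    """Kolmogorov distance sup_x |F_n(x) - F(x)| when F is Uniform(0, 1)."""
    xs = sorted(sample)
    n = len(xs)
    # The supremum is attained at a jump of F_n, so check both sides of each jump.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def dkw_bound(n, eps):
    """Univariate DKW bound with Massart's constant, capped at 1."""
    return min(1.0, 2.0 * math.exp(-2.0 * n * eps ** 2))


def exceedance_rate(n, eps, trials=500, seed=0):
    """Fraction of simulated samples whose sup-distance exceeds eps."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        if sup_distance_uniform(sample) > eps:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (100, 1000):
        print(n, exceedance_rate(n, eps=0.1), dkw_bound(n, eps=0.1))
```

The simulated exceedance rate should stay at or below the bound, up to Monte Carlo noise. Note that the quantity being controlled is the distance between distribution functions, not between densities or conditional expectations.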
Given this, why is Machine Learning so focused on developing new models (e.g., parametric models, decision trees, XGBoost, neural networks) to estimate $\mathbb{E}[Y \mid X]$? If the above argument holds, can't we simply approximate the true distribution empirically?
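To make "approximate the true distribution empirically" concrete, here is a minimal sketch of what I have in mind: the conditional mean computed under the empirical distribution, i.e. the average of the $Y_i$ whose $X_i$ equals the query point. The function name and the toy data are hypothetical, and the sketch assumes $X$ takes discrete values so that exact matches occur.

```python
from collections import defaultdict


def plug_in_conditional_mean(xs, ys):
    """E_n[Y | X = x] under the empirical distribution of the pairs (x_i, y_i)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y in zip(xs, ys):
        sums[x] += y
        counts[x] += 1
    # Defined only at values of x that actually appear in the sample.
    return {x: sums[x] / counts[x] for x in sums}


# Toy data (made up for illustration).
predictor = plug_in_conditional_mean([0, 0, 1, 1, 1], [1.0, 3.0, 2.0, 2.0, 5.0])
print(predictor)         # {0: 2.0, 1: 3.0}
print(predictor.get(2))  # None: the empirical distribution puts no mass on X = 2
```

Under the empirical distribution this conditional mean is only defined at observed values of $X$; what to do at an unseen $x$ is left open in this sketch.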
I suspect this reasoning does not directly apply in practice because real-world data changes over time:
- The distribution $P(X, Y)$ may change.
- The samples are not truly i.i.d.
- The process $Y$ may depend on its own past values (e.g., stock returns).
However, these issues are not necessarily solved by inventing new models such as neural networks. These models are simply function classes designed to approximate $\mathbb{E}[Y \mid X]$ more effectively.
Thus, my main question is: Am I correct in my analysis? And if that's the case, why is Machine Learning focused on developing new models rather than solving these fundamental issues?