I'm reading a paper, *Learning in High Dimension Always Amounts to Extrapolation*, that provides a result I don't understand.
It provides this theorem, which I do understand:
> **Theorem 1** (Bárány and Füredi (1988)). Given a $d$-dimensional dataset $\mathbf{X} \triangleq \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ with i.i.d. samples uniformly drawn from a hyperball, the probability that a new sample $\mathbf{x}$ is in interpolation regime (i.e. within the convex hull of the data) has the following asymptotic behavior $$\lim_{d \to \infty} p(\mathbf{x} \in \text{Hull}(\mathbf{X})) = \begin{cases} 1 & \iff N > d^{-1}2^{d/2} \\ 0 & \iff N < d^{-1}2^{d/2} \end{cases}$$
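To convince myself of the theorem's content, I wrote a small Monte Carlo sketch (mine, not from the paper) that estimates $p(\mathbf{x} \in \text{Hull}(\mathbf{X}))$ for a fixed $N$ as $d$ grows; hull membership is checked as a linear-programming feasibility problem via `scipy.optimize.linprog`, and the function names are my own:

```python
import numpy as np
from scipy.optimize import linprog

def sample_ball(n, d, rng):
    # Uniform samples in the unit d-ball: random directions scaled by radii ~ U^(1/d).
    g = rng.standard_normal((n, d))
    g /= np.linalg.norm(g, axis=1, keepdims=True)
    return g * (rng.random(n) ** (1.0 / d))[:, None]

def in_hull(X, x):
    # x is in Hull(X) iff there exist lambda_i >= 0 with sum_i lambda_i = 1 and X^T lambda = x.
    N = X.shape[0]
    A_eq = np.vstack([X.T, np.ones(N)])
    b_eq = np.concatenate([x, [1.0]])
    return linprog(np.zeros(N), A_eq=A_eq, b_eq=b_eq, bounds=(0, None)).success

rng = np.random.default_rng(0)
N, trials = 200, 200
for d in (2, 4, 8, 12, 16):
    hits = sum(in_hull(sample_ball(N, d, rng), sample_ball(1, d, rng)[0])
               for _ in range(trials))
    print(f"d={d:2d}  N={N}  P(x in Hull(X)) ~ {hits / trials:.2f}")
```

With $N$ held fixed, the estimated probability collapses toward 0 as $d$ increases, which matches how I read the theorem.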
It then claims the following, and I don't understand where the "As such" follows from:
> What matters however, is the dimension $d^∗$ of the smallest affine subspace that includes all the data manifold, or equivalently, the dimension of the convex hull of the data. As such, in the presence of a nonlinear manifold, we can see that the exponential requirement from Thm. 1 in the number of samples required to preserve a constant probability to be in interpolation grows exponentially with $d^∗$.
Thm. 1 concerns uniform sampling from a hyperball; if the actual data points are heavily clustered within the manifold, I don't understand how the theorem applies. Am I missing some crucial information here about high-dimensional data? In particular, what if large amounts, but not all, of the data can be described in even lower-dimensional spaces?
(e.g. positional data for people would fall almost perfectly along a 2-d manifold given by lat/lon, except for the small number of people up on the space station, making the manifold largely 2-d but 3-d in practice)
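To make that concern concrete, here is a toy sketch (again mine, not from the paper, with made-up numbers): essentially all samples lie on a 2-d affine subspace, but a single off-plane point makes the affine-hull dimension $d^∗$ equal to 3 even though the off-plane direction carries a negligible fraction of the variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "positions of people": lat/lon in degrees, altitude in km
# (units mixed purely for illustration). Almost everyone is at altitude 0;
# one sample is on the space station.
ground = np.column_stack([rng.uniform(-90, 90, 10_000),    # latitude
                          rng.uniform(-180, 180, 10_000),  # longitude
                          np.zeros(10_000)])               # altitude
station = np.array([[51.6, 0.0, 400.0]])
X = np.vstack([ground, station])

# d* = dimension of the affine hull = rank of the mean-centered data matrix.
centered = X - X.mean(axis=0)
print("singular values:", np.round(np.linalg.svd(centered, compute_uv=False), 1))
print("d* =", np.linalg.matrix_rank(centered))
```

The exact rank, and hence $d^∗$ in the paper's sense, is 3 because of the single off-plane sample, yet the third direction explains a negligible fraction of the total variance, so for all practical purposes the data looks 2-d. My question is essentially whether Thm. 1 speaks to $d^∗ = 3$ here or to the effective 2-d structure.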