
I'm reading a paper, Learning in High Dimension Always Amounts to Extrapolation, that provides a result I don't understand.

It provides this theorem which I do understand:

Theorem 1: (Bárány and Füredi (1988)). Given a $d$-dimensional dataset $\mathbf{X} \triangleq \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ with i.i.d. samples uniformly drawn from a hyperball, the probability that a new sample $\mathbf{x}$ is in interpolation regime (i.e. within the convex hull of the data) has the following asymptotic behavior \begin{eqnarray} \lim_{d \to \infty} p(\mathbf{x} \in \text{Hull}(\mathbf{X})) = \begin{cases} 1 & \iff N > d^{-1}2^{d/2} \\ 0 & \iff N < d^{-1}2^{d/2} \end{cases} \end{eqnarray}
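For intuition, here is a small Monte Carlo sketch of the theorem's behavior (my own illustration, not code from the paper): with $N$ held fixed, the estimated probability that a fresh uniform sample lies in the convex hull of the data collapses as $d$ grows. Membership in the hull is tested as a linear-programming feasibility problem; the sample sizes and trial counts below are arbitrary choices for illustration.

```python
# Sketch: estimate p(x in Hull(X)) for fixed N as d grows.
# x is in Hull(X) iff there exist lambda_i >= 0 with
# sum(lambda_i) = 1 and sum(lambda_i * x_i) = x  (an LP feasibility check).
import numpy as np
from scipy.optimize import linprog

def sample_ball(n, d, rng):
    """Draw n points i.i.d. uniformly from the unit ball in R^d."""
    g = rng.standard_normal((n, d))
    g /= np.linalg.norm(g, axis=1, keepdims=True)   # uniform directions
    r = rng.random(n) ** (1.0 / d)                  # radius law for uniform ball
    return g * r[:, None]

def in_hull(point, points):
    """Is `point` in the convex hull of the rows of `points`?"""
    n = len(points)
    A_eq = np.vstack([points.T, np.ones(n)])        # sum(l_i x_i) = x, sum(l_i) = 1
    b_eq = np.append(point, 1.0)
    res = linprog(np.zeros(n), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.status == 0                          # feasible => inside hull

def interpolation_prob(n_data, d, trials, rng):
    hits = 0
    for _ in range(trials):
        X = sample_ball(n_data, d, rng)
        hits += in_hull(sample_ball(1, d, rng)[0], X)
    return hits / trials

rng = np.random.default_rng(0)
for d in (2, 4, 8):
    print(d, interpolation_prob(100, d, 50, rng))   # probability drops with d
```

With $N = 100$ fixed, the estimate is near 1 in $d = 2$ and already close to 0 by $d = 8$, which matches the $N \gtrless d^{-1}2^{d/2}$ threshold above.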

It then claims the following, and I don't understand where the "As such" follows from:

What matters however, is the dimension $d^∗$ of the smallest affine subspace that includes all the data manifold, or equivalently, the dimension of the convex hull of the data. As such, in the presence of a nonlinear manifold, we can see that the exponential requirement from Thm. 1 in the number of samples required to preserve a constant probability to be in interpolation grows exponentially with $d^∗$.

Theorem 1 concerns uniform sampling from a hyperball; if the actual data points cluster heavily in certain regions of the manifold, I don't understand how the theorem applies. Am I missing some crucial information here about high-dimensional data? In particular, what if large amounts of the data, but not all of it, can be described in even lower-dimensional spaces?

(e.g. positional data for people would fall almost perfectly along a 2-d manifold given by lat/lon, except for the small number of people up on the space station, making the manifold largely 2-d but 3-d in practice)
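To make that example concrete, here is a small sketch (my own illustration, not from the paper) of how a handful of off-manifold points raise the affine dimension $d^*$ of the data. For simplicity it flattens the Earth's surface to a plane $z = 0$ in $\mathbb{R}^3$, so the bulk of the data has affine dimension 2, and a single "space station" point pushes it to 3; all coordinates and counts are made up.

```python
# Sketch: a few off-manifold points raise the dimension d* of the smallest
# affine subspace containing the data, even when almost all of it is 2-d.
import numpy as np

def affine_dim(X, tol=1e-8):
    """Dimension of the smallest affine subspace containing the rows of X,
    computed as the numerical rank of the mean-centered data."""
    centered = X - X.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    return int((s > tol * s[0]).sum())

rng = np.random.default_rng(0)
# 1000 people on the (flattened) ground: the plane z = 0.
ground = np.column_stack([rng.random((1000, 2)), np.zeros(1000)])
print(affine_dim(ground))                               # prints 2

# One person ~400 km above the plane.
station = np.array([[0.5, 0.5, 400.0]])
print(affine_dim(np.vstack([ground, station])))         # prints 3
```

This is exactly the tension in the question: $d^*$ as defined in the paper is driven by the affine hull of *all* the data, so a single outlier changes it, even though almost all of the probability mass lives in a lower-dimensional subspace.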

