1
$\begingroup$

Let $P$ be a probability distribution on $\mathbb R^d \times \mathbb R$, and let $(x_1,y_1), \ldots, (x_n,y_n)$ be an iid sample of size $n$ from $P$. Fix $\epsilon,t\gt 0$. For any unit-vector $w \in \mathbb R^n$ and $i \in \{1,2,\ldots,n\}$, define $$ z_i(w) := 1[|y_ix_i^\top w - \epsilon| \ge t] = \begin{cases} 1,&\mbox{ if }|y_i x_i^\top w - \epsilon| \ge t,\\ 0,&\mbox{ else.} \end{cases} $$ Finally, define $\Delta_{n,\epsilon,t}(P) := \sup_{\|w\|=1} |\frac{1}{n}\sum_{i=1}^n z_i(w) - z(w)|$, where $z(w) = \mathbb P(|y_1x_1^\top w - \epsilon| \ge t)$ is the expected value of $z_i(w)$.

Using VC theory, one can show that $\Delta_{n,\epsilon,t}(P) \le C\sqrt{d/n}$ (ignoring log-factors) w.p $1-o(1)$ in the limit $n \to \infty$. This is because the function class $\{(x,y) \mapsto 1[|yx^\top w - 1| \ge t] \mid w \in \mathbb R^d\text{ with }\|w\|=1\}$ has VC-dimension $O(d)$.

Question. What kinds of assumptions on the distribution $P$ can allow the $\sqrt{d/n}$ rate to be improved to something like $Cd/n$ or even $C/n$ where the constant $C$ is now allowed to depend on $\epsilon$ and $t$ ? For example, what if $P$ is given by $x \mid y \sim N(\mu_y,\Sigma_y)$ (i.e the marginal distribution of $x$ is a Gaussian mixture), and $\mathbb P(y=1) = \pi \in (0,1)$, where $\mu_{\pm 1} \in \mathbb R^d$ and $\Sigma_{\pm 1}$ are positive-definite $d \times d$ matrices ?

$\endgroup$
2

2 Answers 2

4
$\begingroup$

It is not possible to get $|\frac1n\sum_i z_i(w) - E[z_i(w)]|=O_P(r_n)$ with $r_n \lll n^{-1/2}$ even for a single $w$, since it would contradict the CLT whenever Var$[z_i(w)]>0$.

$\endgroup$
1
  • $\begingroup$ ok indeed, that makes sense. thanks ! $\endgroup$ Commented Oct 23, 2023 at 15:12
1
$\begingroup$

If you pick $P$ to be a Dirac distribution on any pair $(x,y) \in \mathbb{R}^{d+1}$, then you trivially have $\Delta_{n, \epsilon, t}(P)=0$. This is consistent with the previous answer because $\text{var}(z_i(w)) =0$ in this case. However, I assume that you are asking this question for a non-trivial distribution $P$. In this case, the bound of $\sqrt{d/n}$ probably cannot be improved.

On the other hand, if you are okay with the upperbound on excess risk of a particular estimator instead of the uniform convergence bound, then $d/n$ rate is possible for all realizable distributions.

$\endgroup$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.