In the setting of this problem, $\eta(\vec{x})=P(Y=1|\vec{X}=\vec{x})$, where $Y \in \{0,1\}$ and $\vec{X} \in \mathbb{R}^d$. The true probability being known, the classification rule is simply $\eta(\vec{x})>0.5 \Rightarrow \hat{Y}=1$, and the risk of misclassification is, as usual, $E[\min(\eta(\vec{X}),1-\eta(\vec{X}))]$. However, I am asked to prove that if the $X_i$'s are iid conditional on $Y$, with $X_i|Y \sim p(X|Y) \;\forall i$, then $$R^*_d \leqslant e^{-cd}.$$ I have tried everything I could think of but achieved very little. I see the intuition (if the features are all conditionally independent, adding more of them brings in new information and allows for a lower minimal error), but I have no idea how to prove it formally. Big thanks in advance to anybody who is able to crack it!
1 Answer
$\newcommand\x{{\vec{x}}}\newcommand\X{\vec{X}}\newcommand\R{\mathbb R}$Looking at $P(Y=1|\X=\x)$, I will assume that $\X=(X_1,\dots,X_d)$ is a discrete random point in $\R^d$. Then, by the Bayes formula, \begin{equation*} \eta(\x)=P(Y=1|\X=\x) =\frac{P(\X=\x|Y=1)P(Y=1)}{P(\X=\x|Y=1)P(Y=1)+P(\X=\x|Y=0)P(Y=0)}. \end{equation*} For $j=0,1$ and $\x=(x_1,\dots,x_d)\in\R^d$, let $q_j:=P(Y=j)$ and $p_j(\x):=P(\X=\x|Y=j)$. Then \begin{equation*} \eta(\x) =\frac{q_1p_1(\x)}{q_1p_1(\x)+q_0p_0(\x)}. \end{equation*} We are using the conventions that $\frac b0:=\infty$ for any $b\in[0,\infty]$ and $0\cdot\text{anything}:=0$.
Hence, \begin{align*} R^*_d&=E\min(\eta(\X),1-\eta(\X)) \\ &=P(\eta(\X)>\tfrac12,Y=0)+P(\eta(\X)\le\tfrac12,Y=1) \\ &=q_0 P(\eta(\X)>\tfrac12|Y=0)+q_1 P(\eta(\X)\le\tfrac12|Y=1) \\ &=q_0 \sum_\x p_0(\x)\,1(\eta(\x)>\tfrac12)+q_1 \sum_\x p_1(\x)\,1(\eta(\x)\le\tfrac12) \\ &=q_0 \sum_\x p_0(\x)\,1\Big(\frac{q_1p_1(\x)}{q_0p_0(\x)}>1\Big) +q_1 \sum_\x p_1(\x)\,1\Big(\frac{q_1p_1(\x)}{q_0p_0(\x)}\le1\Big) \\ &\le q_0 \sum_\x p_0(\x)\,\sqrt{\frac{q_1p_1(\x)}{q_0p_0(\x)}} +q_1 \sum_\x p_1(\x)\,\sqrt{\frac{q_0p_0(\x)}{q_1p_1(\x)}} \\ &=2\sqrt{q_0q_1}\, \sum_\x \sqrt{p_1(\x)p_0(\x)}. \tag{1} \end{align*} Next, \begin{equation*} 2\sqrt{q_0q_1}\le1, \tag{2} \end{equation*} since $q_0=1-q_1\in[0,1]$. Recall now that the $X_i$'s are iid given $Y$, so that
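For completeness, the inequality step above rests on the elementary pointwise bounds: for any $u\in[0,\infty]$,
\begin{equation*}
1(u>1)\le\sqrt u \quad\text{and}\quad 1(u\le1)\le\sqrt{1/u},
\end{equation*}
since $\sqrt u\ge1$ when $u>1$ and $\sqrt{1/u}\ge1$ when $u\le1$; these are applied with $u=\frac{q_1p_1(\x)}{q_0p_0(\x)}$.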
\begin{equation*} p_j(\x):=P(\X=\x|Y=j)=\prod_{i=1}^d p_j(x_i) \end{equation*} for $j=0,1$ and $\x=(x_1,\dots,x_d)\in\R^d$, where $p_j(x):=P(X_1=x|Y=j)$. So, \begin{equation*} \sum_\x \sqrt{p_1(\x)p_0(\x)}=a^d, \tag{3} \end{equation*} where \begin{equation*} a:=\sum_x \sqrt{p_1(x)p_0(x)}\in[0,1], \end{equation*} by the Cauchy--Schwarz inequality. Moreover, $a<1$ unless $p_1=p_0$.
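To spell out the Cauchy--Schwarz step:
\begin{equation*}
a=\sum_x \sqrt{p_1(x)}\,\sqrt{p_0(x)}
\le\Big(\sum_x p_1(x)\Big)^{1/2}\Big(\sum_x p_0(x)\Big)^{1/2}=1,
\end{equation*}
with equality iff $\sqrt{p_1}$ and $\sqrt{p_0}$ are proportional, which (both being probability mass functions) forces $p_1=p_0$.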
Thus, collecting (1)--(3), we see that, whenever $p_1\ne p_0$, we have
\begin{equation} R^*_d\le e^{-cd} \end{equation} for some real $c>0$ (namely, $c:=-\ln a$) and all natural $d$, as desired.
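As a numerical sanity check (my own illustration, not part of the proof), here is a small sketch for conditionally iid Bernoulli features with hypothetical parameters $P(X_i=1|Y=1)=0.7$, $P(X_i=1|Y=0)=0.4$ and equal priors, where both the single-feature coefficient $a$ and the exact Bayes risk are computable in closed form:

```python
from math import comb, sqrt, log

# Hypothetical setup: d conditionally iid Bernoulli features,
# P(X_i=1 | Y=1) = t1, P(X_i=1 | Y=0) = t0, equal priors q0 = q1 = 1/2.
t1, t0 = 0.7, 0.4
q0 = q1 = 0.5

# Single-feature coefficient a = sum_x sqrt(p1(x) p0(x)); here a < 1.
a = sqrt(t1 * t0) + sqrt((1 - t1) * (1 - t0))
c = -log(a)  # the exponential rate: a**d == e^{-c d}

def bayes_risk(d):
    # Exact Bayes risk: sum over the number k of ones (a sufficient
    # statistic) of comb(d, k) * min(q1 * P(x|Y=1), q0 * P(x|Y=0)).
    r = 0.0
    for k in range(d + 1):
        p1 = t1**k * (1 - t1)**(d - k)
        p0 = t0**k * (1 - t0)**(d - k)
        r += comb(d, k) * min(q1 * p1, q0 * p0)
    return r

# With q0 = q1 = 1/2 the bound (1)-(3) reads R*_d <= a**d = e^{-c d}.
for d in (1, 5, 10, 20):
    print(d, bayes_risk(d), a**d)
```

With these parameters the risk decays visibly with $d$ while staying under the $a^d$ envelope, matching the exponential bound.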