
In the setting of this problem, $\eta(\vec{x})$ is $P(Y=1|\vec{X}=\vec{x})$, with $Y \in \{0,1\}$ and $\vec{X} \in \mathbb{R}^d$. Since the true probability is known, the classification rule is simply $\eta(\vec{x})>0.5 \Rightarrow \hat{Y}=1$. The risk of misclassification is, as usual, $R^*_d = E[\min(\eta(\vec{X}),1-\eta(\vec{X}))]$. However, I am asked to prove that if the features $X_1,\dots,X_d$ are i.i.d. conditional on $Y$, each with the same conditional distribution $p(x|Y)$, then $$R^*_d \leqslant e^{-cd}$$ for some constant $c>0$. I have tried everything I could think of, but I have achieved very little. I see the intuition (if the features are all conditionally independent, adding more of them brings new information and allows for a lower minimal error), but I have no idea how to prove it formally. Big thanks in advance to anybody who is able to crack it!
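For concreteness, here is a minimal numerical sketch of these definitions with a single made-up binary feature ($d=1$; all probabilities below are hypothetical, chosen only to illustrate the risk formula):

```python
# Toy illustration of R* = E[min(eta(X), 1 - eta(X))] for a single binary
# feature; the prior q1 and the conditional pmfs p1, p0 are made up.
import numpy as np

q1 = 0.6                    # P(Y = 1)
p1 = np.array([0.2, 0.8])   # P(X = x | Y = 1) for x = 0, 1
p0 = np.array([0.7, 0.3])   # P(X = x | Y = 0)

px = q1 * p1 + (1 - q1) * p0    # marginal P(X = x)
eta = q1 * p1 / px              # posterior P(Y = 1 | X = x)

bayes_risk = np.sum(px * np.minimum(eta, 1 - eta))
print(bayes_risk)               # risk of the eta > 0.5 rule (0.24 here)
```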


1 Answer


$\newcommand\x{{\vec{x}}}\newcommand\X{\vec{X}}\newcommand\R{\mathbb R}$Looking at $P(Y=1|\X=\x)$, I will assume that $\X=(X_1,\dots,X_d)$ is a discrete random point in $\R^d$. Then, by the Bayes formula, \begin{equation*} \eta(\x)=P(Y=1|\X=\x) =\frac{P(\X=\x|Y=1)P(Y=1)}{P(\X=\x|Y=1)P(Y=1)+P(\X=\x|Y=0)P(Y=0)}. \end{equation*} For $j=0,1$ and $\x=(x_1,\dots,x_d)\in\R^d$, let $q_j:=P(Y=j)$ and $p_j(\x):=P(\X=\x|Y=j)$. Then \begin{equation*} \eta(\x) =\frac{q_1p_1(\x)}{q_1p_1(\x)+q_0p_0(\x)}. \end{equation*} We are using the conventions that $\frac b0:=\infty$ for any $b\in[0,\infty]$ and $0\cdot\text{anything}:=0$.
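As a sanity check, this posterior is straightforward to compute numerically; the following is a minimal sketch, where the pmfs `p1`, `p0` and the prior `q1` are hypothetical placeholders:

```python
# Sketch of the posterior eta(x) = q1 p1(x) / (q1 p1(x) + q0 p0(x)) for discrete
# pmfs given as dicts {point: probability}; all inputs here are hypothetical.
def eta(x, q1, p1, p0):
    num = q1 * p1.get(x, 0.0)
    den = num + (1.0 - q1) * p0.get(x, 0.0)
    # den == 0 means the point x has zero marginal probability and never occurs;
    # returning 0.0 there is consistent with the 0 * anything := 0 convention.
    return num / den if den > 0 else 0.0

# Example: eta(1, 0.6, {0: 0.2, 1: 0.8}, {0: 0.7, 1: 0.3}) returns 0.8
```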

Hence, \begin{align*} R^*_d&=E\min(\eta(\X),1-\eta(\X)) \\ &=P(\eta(\X)>\tfrac12,Y=0)+P(\eta(\X)\le\tfrac12,Y=1) \\ &=q_0 P(\eta(\X)>\tfrac12|Y=0)+q_1 P(\eta(\X)\le\tfrac12|Y=1) \\ &=q_0 \sum_\x p_0(\x)\,1(\eta(\x)>\tfrac12)+q_1 \sum_\x p_1(\x)\,1(\eta(\x)\le\tfrac12) \\ &=q_0 \sum_\x p_0(\x)\,1\Big(\frac{q_1p_1(\x)}{q_0p_0(\x)}>1\Big) +q_1 \sum_\x p_1(\x)\,1\Big(\frac{q_1p_1(\x)}{q_0p_0(\x)}\le1\Big) \\ &\le q_0 \sum_\x p_0(\x)\,\sqrt{\frac{q_1p_1(\x)}{q_0p_0(\x)}} +q_1 \sum_\x p_1(\x)\,\sqrt{\frac{q_0p_0(\x)}{q_1p_1(\x)}} \\ &=2\sqrt{q_0q_1}\, \sum_\x \sqrt{p_1(\x)p_0(\x)}. \tag{1} \end{align*} The inequality here uses the elementary bounds $1(u>1)\le\sqrt u$ and $1(u\le1)\le1/\sqrt u$, valid for all $u\in[0,\infty]$ under the stated conventions. Next, \begin{equation*} 2\sqrt{q_0q_1}\le1, \tag{2} \end{equation*} since $q_0=1-q_1\in[0,1]$. Recall now that the $X_i$'s are iid given $Y$, so that
\begin{equation*} p_j(\x)=P(\X=\x|Y=j)=\prod_{i=1}^d p_j(x_i) \end{equation*} for $j=0,1$ and $\x=(x_1,\dots,x_d)\in\R^d$, where $p_j(x):=P(X_1=x|Y=j)$. So, \begin{equation*} \sum_\x \sqrt{p_1(\x)p_0(\x)}=a^d, \tag{3} \end{equation*} where \begin{equation*} a:=\sum_x \sqrt{p_1(x)p_0(x)}\in[0,1], \end{equation*} by the Cauchy--Schwarz inequality. Moreover, $a<1$ unless $p_1=p_0$.
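For a quick numeric illustration (with hypothetical Bernoulli conditionals, made up purely for illustration), $a$ is just the Bhattacharyya coefficient of the two single-feature pmfs, and it is indeed strictly below $1$ when $p_1\ne p_0$:

```python
# Bhattacharyya coefficient a = sum_x sqrt(p1(x) p0(x)) for two hypothetical
# single-feature pmfs on {0, 1}; a < 1 here because p1 != p0.
import numpy as np

p1 = np.array([0.2, 0.8])   # P(X_1 = x | Y = 1), made up
p0 = np.array([0.7, 0.3])   # P(X_1 = x | Y = 0), made up

a = np.sum(np.sqrt(p1 * p0))
print(a)                    # ~0.864, so c = -log(a) ~ 0.146 > 0
```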

Thus, collecting (1)--(3), we see that, whenever $p_1\ne p_0$, we have
\begin{equation} R^*_d\le a^d=e^{-cd} \end{equation} with $c:=\ln(1/a)>0$, for all natural $d$, as desired.
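To see the bound in action, here is a sketch (under the same hypothetical Bernoulli model as in the snippets above, so the numbers are illustrative only) that enumerates all $\x\in\{0,1\}^d$ for small $d$ and compares the exact risk $R^*_d=\sum_\x\min\big(q_1p_1(\x),q_0p_0(\x)\big)$, which follows from the middle lines of display (1), with the bound $2\sqrt{q_0q_1}\,a^d$:

```python
# Exact check of R*_d <= 2 sqrt(q0 q1) a^d for small d, enumerating all
# x in {0,1}^d under the hypothetical Bernoulli conditionals used above.
import itertools
import numpy as np

q1, q0 = 0.6, 0.4
p1 = np.array([0.2, 0.8])
p0 = np.array([0.7, 0.3])
a = np.sum(np.sqrt(p1 * p0))

for d in range(1, 11):
    risk = 0.0
    for x in itertools.product([0, 1], repeat=d):
        P1 = q1 * np.prod(p1[list(x)])   # q1 * p1(x), product over iid coordinates
        P0 = q0 * np.prod(p0[list(x)])
        risk += min(P1, P0)              # contribution of x to E[min(eta, 1 - eta)]
    assert risk <= 2 * np.sqrt(q0 * q1) * a**d + 1e-12
    print(d, risk, 2 * np.sqrt(q0 * q1) * a**d)
```

The exact risk stays below the bound for every $d$, and the bound decays geometrically at rate $a$, matching the claimed $e^{-cd}$ with $c=\ln(1/a)$.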
