$\begingroup$

I have a question related to this one. Let $X_1, \dots, X_n$ be $n$ iid random variables with CDF $\mathbf{1}_{[0,+\infty)}(x)\,\Phi(x)$, i.e. each $X_i$ is a mixture, with weights $1/2$ each, of a folded standard Gaussian and a point mass at $0$.

I have a vector $a$ with $\|a\|_2 = 1$, and I want to find a function $F : \mathbb{R} \rightarrow [0,1]$ such that for all $a$ with $\|a\|_2 = 1$ and all $t \in \mathbb{R}$, $p\left(\sum_{i=1}^n a_i X_i \leq t\right) \geq F(t)$; that is, $F$ is the CDF of an upper bound on $\sum_{i=1}^n a_i X_i$ in the stochastic order.

For example, let $Y_i$ be iid random variables distributed as a folded standard Gaussian. Using the fact that $p(X_i \leq t) \geq p(Y_i \leq t)$ and Theorem 2 of Yu (2011), conditioning on the number of zeros among the $(X_i)$ gives

$$p\left(\sum_{i=1}^n a_i X_i \leq t\right) \geq \left( \frac{1}{2} \right)^n + \sum_{k=1}^n \left( \frac{1}{2} \right)^n \binom{n}{k}\, p\left(\sum_{i=1}^k \frac{1}{\sqrt{k}} Y_i \leq t\right)$$

Can we find a tighter bound, especially in the tail of the distribution of $\sum_{i=1}^n a_i X_i$?
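For reference, the bound above can be sanity-checked numerically. Below is a quick Monte Carlo sketch (the choices of $n$, $a$, $t$, and the sample size are arbitrary, and both sides are only estimated up to simulation error):

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)
n, m, t = 8, 200_000, 1.5
a = rng.random(n)
a /= np.linalg.norm(a)  # an arbitrary unit vector

# Left-hand side: P(sum_i a_i X_i <= t), with X_i = Bernoulli(1/2) * |N(0,1)|
X = rng.integers(0, 2, size=(m, n)) * np.abs(rng.standard_normal((m, n)))
lhs = np.mean(X @ a <= t)

# Right-hand side: the conditional bound; P(sum_{i<=k} Y_i / sqrt(k) <= t)
# is itself estimated by Monte Carlo for each k
rhs = 0.5**n  # k = 0 term (all X_i are zero)
for k in range(1, n + 1):
    Y = np.abs(rng.standard_normal((m, k)))
    rhs += 0.5**n * comb(n, k) * np.mean(Y.sum(axis=1) / np.sqrt(k) <= t)

print(f"lhs = {lhs:.4f}, bound = {rhs:.4f}")
```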

Yu, Y. (2011). Some stochastic inequalities for weighted sums. Bernoulli, 17(3). https://arxiv.org/abs/0910.0544

$\endgroup$
  • $\begingroup$ When conditioning, you lose the condition $\|a\|_2=1$. So, I think you should re-examine your lower bound. Also, you don't seem to be actually using the stochastic domination of $X_i$ by $Y_i$. $\endgroup$ Commented Feb 22, 2024 at 21:42
  • $\begingroup$ Thanks @IosifPinelis for your comment. You are right, I do not use the stochastic domination, only the fact that $X_i$ has the same distribution as $Y_i$ when conditioned on being $>0$. It was for another inequality that I did not write here. However, I do not understand your first comment. I see that when conditioning I lose the condition $\|a\|_2 = 1$, but I still have $\|a\|_2 \le 1$ (understood as the $a_i$ restricted to the indices where $X_i$ is non-zero), and so the inequality holds. Do you think that the inequality is not correct, or do you mean that I can use a tighter bound than $\|a\|_2 \le 1$ for the inequality? $\endgroup$ Commented Feb 23, 2024 at 14:11
  • $\begingroup$ $\|a\|_2\le1$ is not good enough: Think e.g. of the case $a=0$. $\endgroup$ Commented Feb 23, 2024 at 15:00
  • $\begingroup$ I am not sure what you mean by "in the tail of the distribution of $\sum a_i X_i$". Do you mean $t$ small? Your inequality seems to me correct (due to monotonicity of the event in $a$), but there is room for improvement in some asymptotics, precisely by considering the norm of $a$ after the conditioning. For that, one would need to know the asymptotics you care about. $\endgroup$ Commented Feb 26, 2024 at 6:24
  • $\begingroup$ @oferzeitouni , what I mean is that I am interested in $p(\sum_{i=1}^n a_i X_i \le t)$ close to $1$ (in practice I consider $t$ around 5 to 7). I don't understand how to write a better bound for $a$ after conditioning, since I am looking for a bound valid for all $a$. I thought about ordering the $a_i$'s, and then $\|a\| \le \sqrt{(n-l)/n}$ (where $l$ is the number of consecutive $X_i = 0$ among the largest $a_i$'s), but I don't get a significant numerical improvement. Do you see something better? $\endgroup$ Commented Feb 26, 2024 at 9:06

1 Answer

$\begingroup$

This is a partial answer, in the regime where the probability in question is close to $1$. I normalize so that $EY_i=1$. The example $a_i=1/\sqrt{n}$ shows that you need to take $t\geq \sqrt{n}/2+O(1)$ in order to have the probability near $1$. I hope that I got the constants right.

Note that the threshold obtained by the OP's solution (normalized so that $EY_i=1$) is $\sqrt{n/2}\,(1+o(1))$, which is a loss of a factor of $\sqrt{2}$ compared to the above example.
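To illustrate the $a_i = 1/\sqrt{n}$ example numerically, here is a minimal simulation (with the normalization $EY_i = 1$, i.e. $Y_i = |Z_i|\sqrt{\pi/2}$ for $Z_i$ standard Gaussian; $n$ and the sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 2_500, 2_000
# Normalize the folded Gaussian so that E Y_i = 1 (since E|Z| = sqrt(2/pi))
scale = np.sqrt(np.pi / 2)
X = rng.integers(0, 2, size=(m, n)) * np.abs(rng.standard_normal((m, n))) * scale
S = X.sum(axis=1) / np.sqrt(n)  # sum_i a_i X_i with a_i = 1/sqrt(n)

# S concentrates around sqrt(n) * E[X_i] = sqrt(n)/2, so P(S <= t) is
# near 1 only once t exceeds sqrt(n)/2 + O(1)
print(S.mean() / np.sqrt(n))  # ~ 0.5
```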

Let us first consider the case where all $a_i<\epsilon$. Without loss of generality, we can assume the $a_i$'s are nonnegative. Let $B=\{i:X_i>0\}$. Then the expectation of $W:=\sum_{i\in B} a_i^2$ is $1/2$, and its variance is $\sum_{i=1}^n a_i^4/4 \leq \epsilon^2/4$. In particular, the norm of $a$ restricted to the surviving indices is, with high probability, about $(1+o_\epsilon(1))/\sqrt{2}$. So the threshold for probability near $1$ improves by a factor of $\sqrt{2}$, i.e. we obtain the right constant.
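This concentration of $W$ can be checked with a short simulation (a sketch; the choices $n = 400$, $a_i = 1/\sqrt{n}$, i.e. $\epsilon = 1/\sqrt{n}$, and the sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 400, 20_000
a = np.full(n, 1 / np.sqrt(n))  # all a_i = 1/sqrt(n) < epsilon

# W = sum_{i in B} a_i^2 with B = {i : X_i > 0}; each index survives w.p. 1/2
survive = rng.integers(0, 2, size=(m, n))
W = survive @ (a**2)

# E W = 1/2 and sd(W) = sqrt(sum_i a_i^4)/2 = 1/(2 sqrt(n)), so the
# surviving norm sqrt(W) concentrates near 1/sqrt(2)
print(W.mean(), W.std(), np.sqrt(W).mean())
```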

Now, for the case where some $a_i > \epsilon$, the situation is even better: just consider the residual norm and get an even better lower bound. I hope this is clear enough.

$\endgroup$
  • $\begingroup$ Thank you very much for your answer. From what I understand, I can make some statistics on $\lVert a \rVert_B$ (the norm of $a$ restricted to the surviving indices) in order to refine the bound $\lVert a \rVert_B < 1$ in my formula. If the cardinality of $B$ is $k$, then I have mean($\lVert a \rVert_B$) $= \sqrt{k/n}$ and var($\lVert a \rVert_B$) $\le k(n-k)/n^2$, if I am not mistaken. But I need some bound on the proportion of $\lVert a \rVert_B$ lower than, say, $\epsilon$. How do I get that? $\endgroup$ Commented Feb 29, 2024 at 12:53
  • $\begingroup$ You don't need that, since the large $a_i$ entries contribute order $1$, while the $t$ you care about is proportional to $\sqrt{n}$; also, since you care only about probabilities close to $1$, you can a priori consider only $k = n/2 + O(\sqrt{n})$. $\endgroup$ Commented Feb 29, 2024 at 19:13
