Suppose I have a continuous random variable whose distribution $f$ is some parametric form (normal, exponential, etc.) that is known to me. If I draw many independent samples $x_i$ from $f$, I can estimate the parameters of $f$ using these samples (using maximum likelihood estimation, for example), which gives me an "estimator distribution" $\hat{f}$ (which has the same parametric form as $f$). If I perturb my samples by small random amounts $\epsilon_i$, I would get another "estimator distribution" $\tilde f$. My question is: are there any sufficient conditions on $f$ and the estimator that I use, such that I can bound the difference between $\hat{f}$ and $\tilde{f}$ (measured in, say, the $L_1$ sense), as a function of the perturbations $\epsilon_i$? Are there any general theorems of this kind that talk about how sensitive an estimator distribution is to perturbations of the samples?
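To make the setup concrete, here is a minimal numerical sketch, assuming the normal family and small bounded perturbations (both are choices made for illustration, not part of the general question). The helper names `fit_normal_mle`, `normal_pdf`, and `l1_distance` are hypothetical, and the $L_1$ distance is approximated by a Riemann sum on a grid rather than computed exactly.

```python
import numpy as np

def fit_normal_mle(x):
    """MLE for a normal sample: sample mean and the ddof=0 standard deviation."""
    return float(np.mean(x)), float(np.std(x))

def normal_pdf(t, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at t."""
    return np.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def l1_distance(p1, p2, num=200001):
    """Approximate the L1 distance between two normal densities on a grid."""
    (m1, s1), (m2, s2) = p1, p2
    # Cover both densities far into their tails (12 standard deviations).
    lo = min(m1 - 12 * s1, m2 - 12 * s2)
    hi = max(m1 + 12 * s1, m2 + 12 * s2)
    t = np.linspace(lo, hi, num)
    dt = t[1] - t[0]
    diff = np.abs(normal_pdf(t, m1, s1) - normal_pdf(t, m2, s2))
    return float(np.sum(diff) * dt)

rng = np.random.default_rng(0)
x = rng.normal(1.0, 2.0, size=1000)            # samples x_i from f
eps = rng.uniform(-0.01, 0.01, size=x.size)    # assumed small perturbations eps_i

hat = fit_normal_mle(x)          # parameters of \hat f
tilde = fit_normal_mle(x + eps)  # parameters of \tilde f
dist = l1_distance(hat, tilde)
print("hat:", hat, "tilde:", tilde, "approx L1 distance:", dist)
```

In this particular case the fitted mean moves by exactly the average of the $\epsilon_i$, so by at most $\max_i |\epsilon_i|$; the question is whether comparable bounds hold for general families and estimators.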
- Are you familiar with robust statistics? It uses the same setup, i.e., a contaminated sample, but its focus is on statistics that behave well even under contamination, not on the performance of the MLE or other statistics under contamination. (user83457, Oct 6, 2016 at 6:50)
- You may have a better chance if you measure the difference between $\tilde{f}$ and $\hat{f}$ in the Kolmogorov-Smirnov sense. (Mark Fischler, Oct 6, 2016 at 14:49)
1 Answer
Robust statistics methodology by the likes of researchers like Rousseeuw, Tukey, and Huber uses such terminology to describe how much a sample can be perturbed by aberrant observations and outliers before an estimator is affected. They have developed estimators that are robust enough to withstand different amounts of bad or overly influential data. One use of such estimators is to fit them to a data sample in order to identify, isolate, and remove the corrupted data, and then to rerun the cleaned-up sample with traditional estimators. Such techniques are useful in multivariate regression analysis, where outliers can be masked and go unnoticed while still biasing the least squares estimates.
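As a small illustration of the kind of sensitivity described above (the amount of contamination an estimator tolerates is commonly called its breakdown point), the following sketch compares the sample mean and the sample median when a single observation is replaced by a gross outlier. The sample size, seed, and outlier value are arbitrary assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
clean = rng.normal(0.0, 1.0, size=101)   # assumed clean sample

contaminated = clean.copy()
contaminated[0] = 1e6                    # one aberrant observation

# How far each estimator moves when a single point is corrupted.
mean_shift = abs(contaminated.mean() - clean.mean())
median_shift = abs(np.median(contaminated) - np.median(clean))

print("shift in mean:  ", mean_shift)
print("shift in median:", median_shift)
```

The mean moves by roughly the outlier's size divided by the sample size, while the median moves at most to a neighboring order statistic. Note that this concerns gross contamination of a few points, not small perturbations of every sample.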
- Could you clarify how these rather generic statements apply to the particular question "are there any sufficient conditions on $f$ and the estimator that I use, such that ..."? (Yemon Choi, Mar 4, 2024 at 7:17)
- @YemonChoi: Guess what? I have flagged this post as NaN, but "a moderator reviewed your flag, but found no evidence to support it", so it got declined. With 3 undeserved upvotes (!), it will be very difficult now to delete this non-answer... And a message to future readers: when you review answers, always remember to check them against the questions that they claim to answer. In this particular case, there is almost no connection between the question and the answer. (Alex M., Mar 4, 2024 at 11:10)
- For future reviewers, the question is: "are there any sufficient conditions on $f$ and the estimator that I use, such that I can bound the difference between $\hat{f}$ and $\tilde{f}$ (measured in, say, the $L_1$ sense), as a function of the perturbations $\epsilon_i$? Are there any general theorems of this kind that talk about how sensitive an estimator distribution is to perturbations of the samples?" (Alex M., Mar 4, 2024 at 11:13)
- @AlexM. While I had similar impressions, I thought I would give the writer of the answer the chance to explain themselves. (Yemon Choi, Mar 4, 2024 at 21:27)
- @YemonChoi: If it were not for the clumsy formulation "by the likes of researchers like", this answer is so generic that I would believe it to be AI-generated. Sadly, with a score of 2, it is impossible to flag it as VLQ anymore, and I have exhausted my NaN flag. Maybe you could flag it as NaN yourself, assuming that you have not exhausted that option, like me? (Alex M., Mar 5, 2024 at 7:43)