I am trying to develop an algorithm that is very similar to one that would find the best rank-one approximation to a matrix $A\in\mathbb R^{m\times n}$, much like what the SVD computes, and I am running into instability when the condition number of the matrix is very high. The problem I am trying to solve is the following: \begin{align*} \min_{\substack{u,v\\\forall i,\,u_i\geq 0}} \| A-uv^T \|_F^2 \end{align*} This problem is quadratic in $u$ for fixed $v$ and quadratic in $v$ for fixed $u$, so I chose to alternate between minimizing over $u$ and over $v$. It is not too hard to see that minimization over $v$ for given $u$ yields $v=\frac{1}{\|u\|^2}A^T u$. To minimize over $u$ for given $v$, we can set the gradient of the Lagrangian to $0$ and write out the KKT conditions (which are both necessary and sufficient here, since the subproblem is convex), where $\mu$ is the vector of Lagrange multipliers and $0\preccurlyeq u$ means $\forall i,\ 0\leq u_i$: \begin{cases} 2 \| v\|^2 u - \mu = 2 A v\\ 0\preccurlyeq u\\ 0\preccurlyeq \mu\\ \mu^T u = 0 \end{cases}
The last three conditions together imply that if $u_i>0$ for some $i$, then $\mu_i=0$; so in the first equation, the positive entries of $2Av$ are matched by $2\| v\|^2 u$ and the negative entries by $-\mu$. This means our solution is $u=\left[ \frac{1}{\|v\|^2} Av \right]_+$, where $[\cdot]_+:\mathbb R^m\to\mathbb R^m$ is the componentwise projection onto the nonnegative orthant, i.e., positive entries are kept and negative entries are clipped to $0$. In practice we can merge the two updates by iterating $u_t = [AA^T u_{t-1}]_+$ with a normalization at each step, then computing $v$ once at the end.
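Here is a minimal NumPy sketch of that merged iteration (the function name, iteration budget, tolerance, and stopping rule are my own illustrative choices):

```python
import numpy as np

def nonneg_rank_one(A, n_iter=500, tol=1e-10, seed=0):
    """Alternating minimization for min ||A - u v^T||_F^2 with u >= 0,
    with the u- and v-updates merged into u <- [A A^T u]_+ plus normalization."""
    rng = np.random.default_rng(seed)
    u = np.abs(rng.standard_normal(A.shape[0]))  # nonnegative starting point
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        u_new = np.maximum(A @ (A.T @ u), 0.0)   # u <- [A A^T u]_+
        norm = np.linalg.norm(u_new)
        if norm == 0.0:                          # iterate collapsed to zero
            break
        u_new /= norm
        if np.linalg.norm(u_new - u) < tol:      # naive convergence check
            u = u_new
            break
        u = u_new
    v = A.T @ u                                  # v-update; u is unit-norm here
    return u, v
```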
During experimentation, this seems to behave badly when the condition number of $A$ is high, which is to be expected since similar algorithms for SVD suffer from the same problem. Indeed, the leading singular vectors can be computed by iteratively solving \begin{align*} \min_{u,v} \| A-uv^T \|_F^2 \end{align*} Here the minimization over $v$ for fixed $u$ is the same as before, and the minimization over $u$ for fixed $v$ gives $u=\frac{1}{\| v\|^2} A v$.
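Substituting one update into the other shows that this unconstrained alternation is, up to normalization, just power iteration on $AA^T$:
\begin{align*} u_t = \frac{1}{\|v\|^2}Av = \frac{1}{\|v\|^2\|u_{t-1}\|^2}AA^T u_{t-1} \propto AA^T u_{t-1}, \end{align*}
so the normalized iterates converge to the leading eigenvector of $AA^T$, i.e., the leading left singular vector of $A$.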
I could not find much information about how such methods handle matrices with a bad condition number, or how to check the convergence of the algorithm. I do remember something like this: instead of iterating with $AA^T$, we iterate with $B=AA^T+\varepsilon I$. Indeed, the eigenvectors are the same and the eigenvalues are all increased by $\varepsilon$, which reduces the condition number when it is high and therefore might alleviate the ill-conditioning problem. Is this a reasonable way to prevent such problems in SVD algorithms? Also, if this is a good fix, I am not sure how to apply it to my original problem; my best guess is sketched below.
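To make that last question concrete, here is the only way I currently see to insert the shift into the projected iteration above; $\varepsilon$ and all names here are placeholders for the idea, not something I know to be correct for the constrained problem:

```python
import numpy as np

def nonneg_rank_one_shifted(A, eps=1e-6, n_iter=500, tol=1e-10, seed=0):
    """Same projected iteration as before, but with A A^T replaced by
    B = A A^T + eps*I, i.e. u <- [(A A^T + eps I) u]_+ . Purely a sketch."""
    rng = np.random.default_rng(seed)
    u = np.abs(rng.standard_normal(A.shape[0]))
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        # the eps*u term shifts every eigenvalue of A A^T up by eps
        u_new = np.maximum(A @ (A.T @ u) + eps * u, 0.0)
        norm = np.linalg.norm(u_new)
        if norm == 0.0:
            break
        u_new /= norm
        if np.linalg.norm(u_new - u) < tol:
            u = u_new
            break
        u = u_new
    v = A.T @ u  # recover v from the final unit-norm u
    return u, v
```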