I am trying to develop an algorithm very similar to one that finds the best rank-one approximation of a matrix $A\in\mathbb R^{m\times n}$ (and hence closely related to how the SVD is computed), and I am running into instability when the condition number of the matrix is high. The problem I am trying to solve is the following: \begin{align*} \min_{\substack{u,v\\\forall i,\, u_i\geq 0}} \| A-uv^T \|_F^2 \end{align*} This problem is quadratic in $u$ for fixed $v$ and quadratic in $v$ for fixed $u$, so I chose to alternate between minimization over $u$ and minimization over $v$. It is not hard to see that minimization over $v$ for given $u$ yields $v=\frac{1}{\|u\|^2}A^T u$. For the minimization over $u$ given $v$, we can set the gradient of the Lagrangian to zero and write the KKT conditions (which are both necessary and sufficient here, since the subproblem is convex in $u$); $\mu$ is the vector of Lagrange multipliers and $0\preccurlyeq u$ means $\forall i,\, 0\leq u_i$: \begin{cases} 2 \| v\|^2 u - \mu = 2 A v\\ 0\preccurlyeq u\\ 0\preccurlyeq \mu\\ \mu^T u = 0 \end{cases}

The last three conditions together imply that if $u_i>0$ for some $i$, then $\mu_i=0$. So in the first equation, the positive entries of $2Av$ are matched by $2\| v\|^2 u$ and the negative entries by $-\mu$, which means the solution is $u=\left[ \frac{1}{\|v\|^2} Av \right]_+$, where $[\cdot]_+:\mathbb R^m\to\mathbb R^m$ denotes the entrywise projection onto the nonnegative orthant (positive entries are kept, negative entries are clipped to $0$). In practice we can merge the two updates by iterating $u_t = [AA^T u_{t-1}]_+$ followed by normalization, and compute $v$ at the end; a sketch of this iteration is given below.
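Here is a minimal NumPy sketch of that merged iteration (the function name, starting point, and stopping rule are my own illustrative choices, not part of the derivation):

```python
import numpy as np

def nonneg_rank1(A, iters=500, tol=1e-10):
    """Alternating minimization for min ||A - u v^T||_F^2 subject to u >= 0,
    via the merged fixed-point update u <- [A A^T u]_+ with normalization."""
    rng = np.random.default_rng(0)
    u = np.abs(rng.standard_normal(A.shape[0]))  # start in the nonnegative orthant
    u /= np.linalg.norm(u)
    G = A @ A.T                                  # form A A^T once (fine for moderate m)
    for _ in range(iters):
        w = np.maximum(G @ u, 0.0)               # projected power step [A A^T u]_+
        nw = np.linalg.norm(w)
        if nw == 0.0:                            # iterate collapsed to 0; give up
            break
        w /= nw                                  # normalize
        if np.linalg.norm(w - u) < tol:          # converged to a fixed point
            u = w
            break
        u = w
    v = A.T @ u / (u @ u)                        # closed-form v = A^T u / ||u||^2
    return u, v
```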


During experimentation it seems that this behaves badly when the condition number of $A$ is high, which is to be expected, since similar algorithms for the SVD suffer from the same problem. Namely, the leading singular pair can be computed by iteratively solving \begin{align*} \min_{u,v} \| A-uv^T \|_F^2 \end{align*} Here the minimization over $v$ for fixed $u$ is the same as above, and the minimization over $u$ for fixed $v$ gives $u=\frac{1}{\| v\|^2} A v$. Merging the two updates makes this exactly power iteration on $AA^T$ (sketched below).
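For comparison, a sketch of the unconstrained variant (names are again illustrative); without the projection, the merged update is plain power iteration on $AA^T$, whose convergence degrades as the gap between the top two singular values shrinks:

```python
import numpy as np

def rank1_power(A, iters=500):
    """Unconstrained alternating minimization: the merged update
    is plain power iteration on A A^T."""
    u = np.random.default_rng(1).standard_normal(A.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(iters):
        u = A @ (A.T @ u)        # power step on A A^T
        u /= np.linalg.norm(u)
    v = A.T @ u                  # v = A^T u / ||u||^2 with ||u|| = 1
    return u, v
```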

I could not find much information about how such algorithms handle matrices with a bad condition number, or how they check convergence. I do remember something like the following: instead of iterating with $AA^T$, iterate with $B=AA^T+\varepsilon I$. Indeed, the eigenvectors are the same and the eigenvalues are all increased by $\varepsilon$, which reduces the condition number when it is high and might therefore mitigate the ill-conditioning. Is this a standard way to prevent problems in SVD algorithms? And if it is a good fix, I am not sure how to apply it to my original problem.
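As a minimal numerical check of the shift idea (the test matrix and $\varepsilon$ below are illustrative choices): adding $\varepsilon I$ leaves the eigenvectors of $AA^T$ unchanged, increases every eigenvalue by $\varepsilon$, and hence lowers the condition number; whether the projected iteration keeps the same fixed points under this shift is exactly the part I am unsure about:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
A = rng.standard_normal((n, n)) @ np.diag(np.logspace(0, -4, n))  # ill-conditioned A
G = A @ A.T
lam = np.linalg.eigvalsh(G)      # eigenvalues of A A^T, ascending order
eps = 1e-4
print("kappa(G)         = %.2e" % (lam[-1] / lam[0]))
print("kappa(G + eps I) = %.2e" % ((lam[-1] + eps) / (lam[0] + eps)))
```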
