I think what you want is the alternating direction method of multipliers (ADMM), which is a type of proximal method. The formulas you've written down are really close to ADMM. Rather than try to minimize the difficult objective $f(x) = g(x) + h(x)$ directly, we instead introduce an auxiliary variable $y$ and minimize

$$f(x, y) = g(x) + h(y)$$

subject to the constraint that $x = y$. There are several ways we might enforce this constraint; here we'll use the augmented Lagrangian

$$\mathscr L_\rho(x, y, \lambda) = g(x) + h(y) - \langle \lambda, x - y\rangle + \frac{\rho}{2}\|x - y\|^2.$$

The vector $\lambda$ is a Lagrange multiplier and $\rho$ is a scalar penalty factor. We can complete the square to write this in a slightly more revealing form:

$$\ldots = g(x) + h(y) + \frac{\rho}{2}\left\|x - y - \rho^{-1}\lambda\right\|^2 - \frac{1}{2\rho}\|\lambda\|^2.$$

At this point we can see that finding a minimizer of the augmented Lagrangian just with respect to $x$ or $y$ separately amounts to evaluating a proximal operator of $g$ or $h$ with a shifted argument. You've already assumed that both of these proximal operators are computable.
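
Spelled out, the two alternating minimization steps are exactly proximal updates with shifted arguments (using the convention $\operatorname{prox}_{\varphi}(v) = \arg\min_u \varphi(u) + \tfrac{1}{2}\|u - v\|^2$):

$$x^{k+1} = \operatorname{prox}_{g/\rho}\left(y^k + \rho^{-1}\lambda^k\right), \qquad y^{k+1} = \operatorname{prox}_{h/\rho}\left(x^{k+1} - \rho^{-1}\lambda^k\right).$$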

Finally, there's the question of what to do about the Lagrange multipliers $\lambda$. For that you can use a gradient ascent step at every iteration. I found it deeply mysterious that this works without any line search, and very few references dig into this detail at all. The book Constrained Optimization and Lagrange Multiplier Methods by Dimitri Bertsekas does a good job of explaining why, with some nice theory behind it, and the author makes the book available for free on his website.
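
With the sign convention of the Lagrangian above and a fixed step size of $\rho$, that ascent step reads

$$\lambda^{k+1} = \lambda^k - \rho\left(x^{k+1} - y^{k+1}\right),$$

which is the usual ADMM multiplier update, up to the sign convention on $\lambda$.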

This is all described in further detail in section 4.4.1 of Proximal Algorithms by Parikh and Boyd. I also wrote a bit about using ADMM to solve inverse coefficient problems for PDEs with total variation regularization here, which, although distinct from the lasso problem you've described, still has some of the same $\ell^1$ character to it.
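
For concreteness, here's a minimal sketch (not production code) of what the whole iteration looks like for a lasso-type problem, assuming the splitting $g(x) = \tfrac{1}{2}\|Ax - b\|^2$ and $h(y) = \tau\|y\|_1$, so that the prox of $g$ is a linear solve and the prox of $h$ is soft-thresholding. The function names, the fixed iteration count, and the choice $\rho = 1$ are all illustrative.

```python
import numpy as np

def soft_threshold(v, kappa):
    # Proximal operator of kappa * ||.||_1, evaluated componentwise.
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_lasso(A, b, tau, rho=1.0, num_iters=200):
    # Minimize 0.5 * ||A x - b||^2 + tau * ||x||_1 via the splitting
    # g(x) = 0.5 * ||A x - b||^2, h(y) = tau * ||y||_1, subject to x = y,
    # with the sign convention for lambda used in the Lagrangian above.
    n = A.shape[1]
    x = np.zeros(n)
    y = np.zeros(n)
    lam = np.zeros(n)
    AtA = A.T @ A
    Atb = A.T @ b
    # (A^T A + rho I) is fixed, so factor it once and reuse it.
    L = np.linalg.cholesky(AtA + rho * np.eye(n))
    for _ in range(num_iters):
        # x-update: prox of g at the shifted point y + lam / rho,
        # i.e. solve (A^T A + rho I) x = A^T b + rho y + lam.
        rhs = Atb + rho * y + lam
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        # y-update: prox of h at the shifted point x - lam / rho.
        y = soft_threshold(x - lam / rho, tau / rho)
        # Gradient ascent step on the multiplier with step size rho.
        lam = lam - rho * (x - y)
    return x
```

In practice you'd stop based on the size of the residual $x - y$ (and how much $y$ changes between iterations) rather than a fixed iteration count, but the structure of the loop is the same.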
