Background
Say we have an optimization problem $$\min_x f(x) = g(x) + h(x)$$
where $g$ is differentiable and convex, and $h$ is convex but not necessarily differentiable. For problems of this type (for example, when $g$ is the mean squared error function), we can use proximal algorithms, which rely on the proximal operator of $h$, given by $$\text{prox}_h(x) = \text{argmin}_z \frac{1}{2}\left\| x-z\right\|^2_2+h(z)$$
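To make the definition concrete, here is a minimal sketch for one particular choice of penalty, $h(z) = \lambda\left\|z\right\|_1$, whose proximal operator is soft-thresholding. The choice of $h$, the names `prox_l1` and `prox_objective`, and the test vector are assumptions for illustration only. The check at the end compares the objective at the returned point against randomly perturbed points, which is a sanity check rather than a proof.

```python
import numpy as np

def prox_l1(x, lam):
    """Soft-thresholding: the proximal operator of h(z) = lam * ||z||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def prox_objective(z, x, lam):
    """The function minimised in the definition of prox_h(x), for h = lam * ||.||_1."""
    return 0.5 * np.sum((x - z) ** 2) + lam * np.sum(np.abs(z))

x = np.array([3.0, -0.5, 1.2, -2.0])  # illustrative input
lam = 1.0
z_star = prox_l1(x, lam)

# Sanity check: no small random perturbation of z_star gives a lower objective.
rng = np.random.default_rng(0)
f_star = prox_objective(z_star, x, lam)
is_minimal = all(
    prox_objective(z_star + 0.1 * rng.standard_normal(x.size), x, lam) >= f_star - 1e-12
    for _ in range(1000)
)
print(z_star, is_minimal)
```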
Problem
Suppose we know the proximal operator (that is, the solution to the optimization problem) for two problems of the same form, where $h_1$ and $h_2$ are convex but not necessarily differentiable:
- $\min_x \frac{1}{2}\left\| y-x\right\|^2 + h_1(x)$, whose solution is $\text{prox}_{h_1}(y)$.
- $\min_x \frac{1}{2}\left\| y-x\right\|^2 + h_2(x)$, whose solution is $\text{prox}_{h_2}(y)$.
Now suppose we combine the two to give a new optimization problem $$\min_x \frac{1}{2}\left\| y-x\right\|^2 +h_1(x)+ h_2(x)$$
My questions
- Can we work out a proximal operator $\text{prox}_{h_1, h_2}$ to solve this new problem using the two operators we already have?
- Do we require any assumptions on separability? Do we need the new penalty $h' = h_1 + h_2$ to be a separable function (i.e., are $h_1$ and $h_2$ separable with respect to each other)? What happens if this is not the case?
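As a small one-dimensional illustration of why the first question is not trivial, the sketch below takes $h_1(x) = |x|$ and $h_2(x) = \frac{1}{2}x^2$ (both chosen purely for illustration, as are the function names) and compares the two possible compositions of the individual operators with a brute-force grid minimisation of the combined problem. In this particular instance one order of composition agrees with the direct solution and the other does not; this is a single numerical example, not a general statement.

```python
import numpy as np

def prox_abs(v, t=1.0):
    """Proximal operator of h1(x) = t * |x| (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_half_square(v):
    """Proximal operator of h2(x) = 0.5 * x^2, i.e. argmin_x 0.5*(x - v)^2 + 0.5*x^2."""
    return v / 2.0

def prox_sum_bruteforce(y, grid):
    """Grid search for argmin_x 0.5*(y - x)^2 + |x| + 0.5*x^2."""
    obj = 0.5 * (grid - y) ** 2 + np.abs(grid) + 0.5 * grid ** 2
    return grid[np.argmin(obj)]

y = 3.0
grid = np.linspace(-5.0, 5.0, 100001)  # spacing of 1e-4

direct = prox_sum_bruteforce(y, grid)
h1_then_h2 = prox_half_square(prox_abs(y))  # apply prox_{h1} first, then prox_{h2}
h2_then_h1 = prox_abs(prox_half_square(y))  # apply prox_{h2} first, then prox_{h1}

print(direct, h1_then_h2, h2_then_h1)
```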
Example
To give a more concrete example, we can look at the sparse-group lasso (SGL), given by $$\min_{\beta}\left( \frac{1}{2n}\left\|y-X\beta \right\|_2^2 + (1-\alpha)\lambda\sum_{l=1}^m \sqrt{p_l}\left\|\beta^{(l)} \right\|_2 + \alpha \lambda \left\| \beta\right\|_1 \right),$$
where $h_1(\beta)=(1-\alpha)\lambda\sum_{l=1}^m \sqrt{p_l}\left\|\beta^{(l)} \right\|_2$ is the group lasso penalty and $h_2(\beta)=\alpha \lambda \left\| \beta\right\|_1$ is the lasso penalty. Therefore, could we use the proximal operators for the lasso and group lasso to derive a proximal operator for SGL? I understand that we can also derive the proximal operator directly for SGL, but I'm adding this as an example to make the question more concrete.
In this case, do we have separability between the lasso and group lasso penalties? Is separability needed for this to work?
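For the SGL example, a quick numerical experiment along these lines can be set up as below. It is a sketch under stated assumptions: the groups are non-overlapping, the group weights are $\sqrt{p_l}$ as in the penalty above, and the vector `v`, the grouping, and all function names are made up for illustration. It evaluates the proximal objective of the combined penalty at both orders of composition of the lasso and group lasso operators, and checks the lasso-then-group candidate against random perturbations. It tests one instance only and does not settle the general question.

```python
import numpy as np

def prox_l1(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_group(v, groups, t):
    """Proximal operator of t * sum_l sqrt(p_l) * ||v^(l)||_2 for non-overlapping groups."""
    out = np.zeros_like(v)
    for idx in groups:
        thresh = t * np.sqrt(len(idx))
        norm = np.linalg.norm(v[idx])
        if norm > thresh:
            out[idx] = (1.0 - thresh / norm) * v[idx]
    return out

def sgl_prox_objective(z, v, groups, lam, alpha):
    """0.5*||v - z||^2 plus the group lasso and lasso penalties evaluated at z."""
    group_pen = sum(np.sqrt(len(idx)) * np.linalg.norm(z[idx]) for idx in groups)
    return (0.5 * np.sum((v - z) ** 2)
            + (1 - alpha) * lam * group_pen
            + alpha * lam * np.sum(np.abs(z)))

v = np.array([3.0, 1.0, -0.4, 0.2, 2.5])          # illustrative input
groups = [np.array([0, 1]), np.array([2, 3, 4])]  # illustrative grouping
lam, alpha = 1.0, 0.5

lasso_then_group = prox_group(prox_l1(v, alpha * lam), groups, (1 - alpha) * lam)
group_then_lasso = prox_l1(prox_group(v, groups, (1 - alpha) * lam), alpha * lam)

f_a = sgl_prox_objective(lasso_then_group, v, groups, lam, alpha)
f_b = sgl_prox_objective(group_then_lasso, v, groups, lam, alpha)

# Sanity check: random perturbations of the lasso-then-group point do not lower the objective.
rng = np.random.default_rng(0)
no_improvement = all(
    sgl_prox_objective(lasso_then_group + 0.05 * rng.standard_normal(v.size),
                       v, groups, lam, alpha) >= f_a - 1e-12
    for _ in range(2000)
)
print(f_a, f_b, no_improvement)
```

If the two objective values differ, the order of composition matters for this instance, which is the kind of behaviour the separability question is trying to pin down.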