Functional derivative/Gateaux derivative of functions of probability densities

Question

Let $(\Omega, \mathcal{A})$ be a measurable space and $\mathcal{P}$ be a convex set of probability distributions on that space. Furthermore, let $\mu$ be a $\sigma$-finite measure that dominates every probability measure in $\mathcal{P}$. Call the set of the corresponding densities with respect to $\mu$ $\mathcal{Q}$. Let us now consider a real-valued or extended-real-valued function $f$ on the set $\mathcal{Q}$ (e.g., the differential entropy). I am interested in the Gateaux derivative of that function.

The general definition of Gateaux differentiability and derivative is: Let $X$ and $Y$ be Banach spaces, $U\subseteq X$ and $F:U\rightarrow Y$. Then $F$ is Gateaux differentiable at $x\in U$ if $\delta F(x; h)$ defined by $$\delta F(x; h):=\lim_{t\rightarrow 0}\frac{F(x+th)-F(x)}{t}=\frac{d}{dt}F(x+th)\biggr|_{t=0} =\langle F'(x), h\rangle$$ exists for all directions $h\in X$ and the map $h\mapsto \delta F(x; h)$ is linear and continuous. In that case we call $F'(x)$ the Gateaux derivative of $F$ at $x$.

Coming back to my function $f$ on $\mathcal{Q}$, I have the following problem when I want to talk about Gateaux differentiability of a specific such function $f$. First of all, note that in this case the Banach space is $X=L^1(\Omega, \mu)$ and $\mathcal{Q}\subset L^1$.

We can find some $h\in L^1$ such that $q+th\in \mathcal{Q}$, but certainly not for all $h\in L^1$ (e.g., if $h\in \mathcal{Q}\subset L^1$, then $q+th$ is not in $\mathcal{Q}$ for any $t\neq 0$, and thus $f(q+th)$ is not defined). The definition of the Gateaux derivative, though, requires the above formula to hold for EVERY $h$. So it seems the Gateaux derivative cannot exist in this setting?
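
To make the obstruction concrete, here is a short worked check (a sketch that uses only the fact that every element of $\mathcal{Q}$ integrates to one and that $\mathcal{Q}$ is convex). For $q, h\in\mathcal{Q}$, $$\int_\Omega (q+th)\,d\mu = \int_\Omega q\,d\mu + t\int_\Omega h\,d\mu = 1+t \neq 1 \quad\text{for } t\neq 0,$$ so $q+th\notin\mathcal{Q}$. In contrast, for any $p\in\mathcal{Q}$ the direction $h=p-q$ satisfies $$q+t(p-q) = (1-t)\,q + t\,p \in \mathcal{Q}\quad\text{for all } t\in[0,1]$$ by convexity, so along such directions at least the one-sided difference quotients ($t\downarrow 0$) are well defined.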

Is there any chance to even make this work somehow or is it impossible?

mlk · Accepted Answer · 2025-05-19 14:27:17Z

The general abstract answer is no, and you have correctly identified the main issue, namely that moving along the linear structure of $L^1$ can result in negative densities. However, depending on what you actually need this for, you might still be able to achieve some reasonable result. Here are three options that come to mind:

1. First of all, some directional derivatives might already be enough for your application. E.g. if your density stays away from zero, then $\delta f(q; h)$ might exist for all smooth enough $h$. Often you can prove that $h\mapsto\delta f(q; h)$ is a bounded operator in some space and then extend it to the closure by a density argument.

2. If you have an explicit $f$, it might have an extension to negative densities, e.g. when it is of some integral form $f(q) = \int_\Omega g(q) d\mu$ where $g$ can be reasonably extended to negative values. Then you are just working on the affine space given by $\int q d\mu = 1$ and can try to find the Gateaux-derivative there.

Of course you never want to allow variation in the negative densities, so this turns it into an obstacle problem where the condition $q\geq 0$ gives you a Lagrange-multiplier. Note that this is a one-sided constraint, so what you get will be 1-homogeneous, but not linear.
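
As an illustration of option 2, take the integrand $g(q)=q\log q$, so that $f(q)=\int_\Omega q\log q\,d\mu$ is the negative of the differential entropy. The following is only a formal computation under assumptions that are not part of the original setting: $q\ge c>0$ on the set where $h\neq 0$, $h\in L^\infty$ with $\int_\Omega h\,d\mu=0$, and differentiation under the integral sign is justified. Then $$\frac{d}{dt}\bigg|_{t=0}\int_\Omega (q+th)\log(q+th)\,d\mu = \int_\Omega \big(\log q + 1\big)\,h\,d\mu = \int_\Omega h\log q\,d\mu.$$ The constant $1$ drops out because $\int_\Omega h\,d\mu=0$, so on the affine space $\{\int q\,d\mu=1\}$ the derivative is represented by $\log q$ only up to an additive constant. The lower bound on $q$ is what keeps $q+th$ positive for small $|t|$; without it, the one-sided constraint $q\ge 0$ becomes active.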

3. Finally, the last question is whether you want to take the straightforward derivative at all. In many applications, what you are actually after is some sort of transport derivative, i.e. the map $$ T: v \mapsto \frac{d}{dt}\bigg|_{t=0} f( (id+tv)_* q),$$ where $(\cdot)_*$ denotes the measure pushforward and $v$ is nice enough (e.g. $C_c^\infty$, but the same closure trick as before might apply). This avoids the negativity issue and even works for non-absolutely continuous measures. This type of derivative is studied in the theory of optimal transport.

The thing is, in particular in the more applied direction, people see this form, and try to formally expand it using the chain rule. E.g. if also $\nabla \cdot v= 0$, you end up with something like $T(v) = \langle f'(q), v \cdot \nabla q\rangle$. Then three papers later in the mathematical game of telephone, the original form is completely forgotten and we ask ourselves how to define $f'$.
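
To see where the formal chain-rule expression comes from, here is a sketch of the computation on $\Omega=\mathbb{R}^d$ with Lebesgue measure, assuming $q$ and $v$ are smooth and $v$ is compactly supported. The symbols $q_t$ (the density of $(id+tv)_*q$) and $\varphi$ (a test function) are introduced only for this illustration. For $\varphi\in C_c^\infty$, $$\frac{d}{dt}\bigg|_{t=0}\int \varphi\,q_t\,dx = \frac{d}{dt}\bigg|_{t=0}\int \varphi(x+tv(x))\,q(x)\,dx = \int \nabla\varphi\cdot v\,q\,dx = -\int \varphi\,\nabla\cdot(q v)\,dx,$$ so formally $\partial_t q_t|_{t=0} = -\nabla\cdot(qv)$ and, if $f'(q)$ exists in a suitable sense, $$T(v) = \big\langle f'(q),\,-\nabla\cdot(qv)\big\rangle = -\big\langle f'(q),\,v\cdot\nabla q\big\rangle \quad\text{when } \nabla\cdot v=0.$$ With this convention the expansion carries a minus sign relative to the schematic formula above. Since $q_t$ is itself a probability density for every small $t$, the left-hand side of the definition of $T$ is always well defined, which is why the negativity issue does not arise.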

Thank you for your answer! There are some things that I would like to ask you for better understanding: 1.) What would be your definition of directional derivative here? Is it just the same as the Gateaux derivative but without requiring that the limit exists for all $h$? 2.) Regarding your second suggestion: If I look at, e.g., the entropy, then $g(q)=q \log (q)$, so this would not be "reasonably" extendable to negative values, right? 3.) By the affine space given by $\int q d\mu =1$, do you mean the set that I call $\mathcal{Q}$?
– guest1, Commented May 20 at 7:52
4.) What would happen if, instead of defining my function $f$ on $\mathcal{Q}$, I define it on $L^1$? Then I could compute the derivatives. For my purposes, I would still only pass arguments that are densities. Do you think I would run into problems that way, or that there would be other disadvantages?
– guest1, Commented May 20 at 7:52
@guest1 Ad 1) The directional derivative (in direction $h$) is just the expression you wrote for a fixed $h$. Even if it does not exist for all $h$, it might, e.g., exist for a dense subset. 2) I would reasonably extend the integrand by 0 and just see what happens. 3) Your space $\mathcal{Q}$, as I understand it, also has the condition $q \ge 0$ pointwise $\mu$-a.e. This is the main difficulty, as $\mathcal{Q}$ is thus just a (convex) subset of that space. 4) If you define it on all of $L^1$, then you can find the derivative, but see my point about Lagrange multipliers.
– mlk, Commented May 20 at 9:41
Thanks for your answers! So why is it a problem that my $\mathcal{Q}$ is only a convex subset of the space $L^1$? What is better about the space proposed by you? That it is affine? And regarding your answer to my suggestion of defining the function on the entire space $L^1$: Why would I then still need to impose the constraint $q\geq 0$ (via Lagrange multipliers)? Could I not just say: I define it for all functions in $L^1$, but then I just use it for densities anyway? Then I wouldn't need to use this constraint explicitly, no?
– guest1, Commented May 20 at 12:02
The issue is not only that it is just a subset, but also that it consists entirely of its boundary in an $L^1$ sense. As you more or less said in your question, for any point in $\mathcal{Q}$, you can find a direction $h\in L^1$ such that $\int (q + th)\, d\mu =1$ but $q+th \notin \mathcal{Q}$ for any $t>0$. As the notion of derivative requires you to have "some room" in every direction, this will thus always be relevant to whatever you do.
– mlk, Commented May 21 at 8:08
