Motivation
The goal of this work is to develop a unified geometric framework for finite probability distributions and finite random variables, using tools from differential and information geometry. Adopting a Riemannian perspective centered on the Fisher information metric, fundamental concepts such as expectation, variance, and entropy emerge naturally as geometric quantities. The motivations include:
- Statistical invariance: Classical statistical measures, like variance and entropy, are invariant under additive translations of random variables.
- Intrinsic geometry: The Fisher information metric, which arises naturally from the Kullback-Leibler divergence, provides a canonical geometric structure on the probability simplex.
The main questions addressed concern the originality of this geometric duality, potential relationships to established theories, and possibilities for generalization.
Part 1: Geometry of the Probability Simplex and its Tangent Space
We consider the open probability simplex: $$ \mathring\Delta_n = \{ p \in \mathbb{R}^n \mid p_i > 0, \sum_{i=1}^n p_i = 1 \},$$
endowed naturally with the Fisher information metric: $$ g_p(x,x) = \frac{1}{2}\sum_{i=1}^n \frac{x_i^2}{p_i}, \quad x \in T_p \mathring\Delta_n = \{x \in \mathbb{R}^n \mid \sum x_i = 0\}. $$
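For concreteness, here is a minimal NumPy sketch of this quadratic form (the function name `fisher_quadratic` is mine, not standard notation):

```python
import numpy as np

def fisher_quadratic(p, x):
    """Fisher quadratic form g_p(x, x) = (1/2) * sum_i x_i^2 / p_i.

    p: point in the open simplex (positive entries summing to 1);
    x: tangent vector (entries summing to 0).
    """
    return 0.5 * np.sum(x**2 / p)

# Example: at the uniform distribution on 3 points the metric
# reduces to (n/2) * ||x||^2, here 1.5 * (0.04 + 0.01 + 0.01).
p = np.full(3, 1/3)
x = np.array([0.2, -0.1, -0.1])
print(fisher_quadratic(p, x))
```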
Random variables appear naturally as elements of the tangent space, identified modulo additive constants to account for statistical invariance. Thus, we have the canonical isomorphism: $$ H^n = \mathbb{R}^n / \mathbb{R} \mathbf{1}_n \cong T_p \mathring\Delta_n. $$
From this perspective, important statistical concepts are naturally geometric:
Expectation as a tangent vector: The expectation with respect to a distribution $p$ corresponds uniquely to a tangent vector $e_p$ in $T_p \mathring\Delta_n$, explicitly given by: $$ e_{p,i} = 2p_i \left(p_i - \sum_j p_j^2\right). $$ This vector vanishes if and only if $p$ is uniform, reflecting equilibrium conditions in statistical mechanics.
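This Riesz-type correspondence can be checked numerically: with the inner product $g_p(x,y)$ taken as the polarization of the quadratic form above (an assumption on my part, consistent with the formula for $e_p$), one should find $g_p(e_p, x) = \sum_i p_i x_i$ for every centered $x$:

```python
import numpy as np

rng = np.random.default_rng(0)

def fisher_inner(p, x, y):
    # Polarization of g_p(x, x) = (1/2) sum_i x_i^2 / p_i.
    return 0.5 * np.sum(x * y / p)

def expectation_vector(p):
    # e_{p,i} = 2 p_i (p_i - sum_j p_j^2)
    return 2 * p * (p - np.sum(p**2))

# Check g_p(e_p, x) = E_p[x] = sum_i p_i x_i on a random centered vector.
p = rng.dirichlet(np.ones(5))
x = rng.normal(size=5)
x -= x.mean()                      # project onto the tangent space sum x_i = 0
lhs = fisher_inner(p, expectation_vector(p), x)
rhs = np.sum(p * x)
print(abs(lhs - rhs))              # ~ 0 up to floating point
```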
Orthogonal decomposition: Every tangent vector $x \in T_p \mathring\Delta_n$ can be decomposed into a component parallel to $e_p$ (thermodynamically observable) and an orthogonal component representing statistical fluctuations or noise.
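A sketch of this decomposition (note it is undefined at the uniform distribution, where $e_p = 0$; the helper names are mine):

```python
import numpy as np

def fisher_inner(p, x, y):
    # Polarization of the Fisher quadratic form.
    return 0.5 * np.sum(x * y / p)

def expectation_vector(p):
    return 2 * p * (p - np.sum(p**2))

def decompose(p, x):
    """Split a tangent vector into its component along e_p and the
    g_p-orthogonal remainder (the 'fluctuation' part).
    Requires p non-uniform, so that e_p is nonzero."""
    e = expectation_vector(p)
    coeff = fisher_inner(p, x, e) / fisher_inner(p, e, e)
    x_par = coeff * e
    return x_par, x - x_par

rng = np.random.default_rng(1)
p = rng.dirichlet(np.ones(4))
x = rng.normal(size=4)
x -= x.mean()
x_par, x_perp = decompose(p, x)
print(fisher_inner(p, x_par, x_perp))  # ~ 0: the parts are g_p-orthogonal
```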
Part 2: Spectral Duality between Distributions and Variables
A duality is established between probability distributions (points in $\mathring\Delta_n$) and random variables (elements in $H^n$) through natural spectral maps:
Spectral embedding of distributions into variables: Define the centered logarithmic embedding $\tilde{I}$, componentwise, by: $$ \tilde{I}(p)_i = -\ln p_i + \frac{1}{n}\sum_{j=1}^n \ln p_j, $$ which quantifies the informational deviation from uniformity.
Inverse embedding of variables into distributions: Define the softmax embedding $S$ by: $$ S(x) = \text{softmax}(-x). $$
These embeddings are mutually inverse maps: $$ \tilde{I} \circ S = \text{Id}_{H^n}, \quad S \circ \tilde{I} = \text{Id}_{\mathring\Delta_n}, $$ creating an explicit duality.
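Both identities are easy to verify numerically, working with the zero-mean representative of each class in $H^n$ (a sketch; the max-shift in the softmax is a standard numerical-stability device, not part of the definition):

```python
import numpy as np

def I_tilde(p):
    # Centered logarithmic embedding: -ln p, shifted to zero mean,
    # i.e. the canonical representative of its class in H^n = R^n / R*1.
    v = -np.log(p)
    return v - v.mean()

def S(x):
    # Softmax embedding of a variable into the open simplex.
    w = np.exp(-(x - x.min()))     # shift exponents below 0 for stability
    return w / w.sum()

rng = np.random.default_rng(2)
p = rng.dirichlet(np.ones(6))
x = rng.normal(size=6)
x -= x.mean()

print(np.max(np.abs(S(I_tilde(p)) - p)))   # ~ 0: S ∘ I~ = Id on the simplex
print(np.max(np.abs(I_tilde(S(x)) - x)))   # ~ 0: I~ ∘ S = Id on H^n
```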
Consequently, the Fisher information metric induces a corresponding geometry on $H^n$, where tangent vectors at a point $x \in H^n$ naturally represent distributions. This duality is symmetric:
- Tangent vectors at distributions are identified with variables.
- Tangent vectors at variables are identified with distributions.
Algebraic Structure Induced from Linear Structure
The vector space structure of $H^n$ induces a corresponding algebraic structure on the simplex $\mathring\Delta_n$. Specifically, translating the linear addition from $H^n$ via the spectral maps yields a convolution-like operation: $$ (p \star q)_i = \frac{p_i q_i}{\sum_j p_j q_j}, $$ which inherits a natural identity element (the uniform distribution) and an inversion operation: $$ p^{-1}_i = \frac{1/p_i}{\sum_j 1/p_j}. $$ This algebraic structure is naturally induced by the linearity in $H^n$ and highlights classical distributions such as Bernoulli, Boltzmann, and Binomial as particular cases of inversion.
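The group laws transported from $H^n$ can be checked directly, a small sketch (function names mine):

```python
import numpy as np

def star(p, q):
    # (p ⋆ q)_i = p_i q_i / sum_j p_j q_j
    w = p * q
    return w / w.sum()

def inv(p):
    # p^{-1}_i = (1/p_i) / sum_j (1/p_j)
    w = 1.0 / p
    return w / w.sum()

rng = np.random.default_rng(3)
n = 5
u = np.full(n, 1/n)                    # uniform distribution: the identity
p = rng.dirichlet(np.ones(n))
q = rng.dirichlet(np.ones(n))

print(np.max(np.abs(star(p, u) - p)))           # u is the identity element
print(np.max(np.abs(star(p, inv(p)) - u)))      # inv is the inverse
print(np.max(np.abs(star(p, q) - star(q, p))))  # ⋆ is commutative
```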
Dual Metric and its Geometric Implications
By considering random variables as elements of $H^n$ with tangent spaces identified as distributions, a dual metric $\tilde{g}$ emerges naturally: $$ \tilde{g}_x(p,q) = g_{S(x)}(\tilde{I}(p), \tilde{I}(q)). $$
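A direct transcription of this definition (a sketch under the same polarization convention as before; note $\tilde{I}$ of the uniform distribution is the zero vector, so the uniform distribution is $\tilde{g}$-null in every argument):

```python
import numpy as np

def I_tilde(p):
    v = -np.log(p)
    return v - v.mean()

def S(x):
    w = np.exp(-(x - x.min()))
    return w / w.sum()

def fisher_inner(p, x, y):
    return 0.5 * np.sum(x * y / p)

def dual_metric(x, p, q):
    """g~_x(p, q) = g_{S(x)}(I~(p), I~(q)): the Fisher inner product at the
    base distribution S(x), applied to the log-embedded 'tangent' distributions."""
    base = S(x)
    return fisher_inner(base, I_tilde(p), I_tilde(q))

rng = np.random.default_rng(4)
x = rng.normal(size=4)
x -= x.mean()
p = rng.dirichlet(np.ones(4))
print(dual_metric(x, p, p))   # nonnegative; zero only for p uniform
```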
This dual construction introduces an additional geometric structure on $H^n$ that mirrors the structure on $\mathring\Delta_n$. An open question is whether such dual metrics and their geometric properties have been previously studied.
Open Questions and Further Connections
- Originality: Has the specific geometric duality between $\mathring\Delta_n$ and $H^n$, and particularly the induced metrics, been previously explored?
- Algebraic structures: Is the algebraic convolution structure $(\mathring\Delta_n, \star)$ recognized or previously analyzed? My initial intuition is that the softmax function acts as a kind of analogue to a Fourier series transformation.
- Physical interpretations: Could the tangent vector $e_p$ be explicitly connected to linear-response theory or other established physical frameworks?
- Generalization: While I have already investigated aspects such as variance, covariance, and related geometric ideas, a broader question remains: if this framework is known, how might one naturally derive higher-order statistical structures from it?
- Applications: The geometric duality and the metrics introduced here could, I believe, enable the development of efficient algorithms in machine learning, particularly for dimensionality reduction, robust distribution estimation, and improved variational inference techniques. Numerical experiments could assess the practical effectiveness of these new geometric representations in statistical analysis and both supervised and unsupervised learning settings. Are there any existing approaches in these domains that resemble what I have described?
Feedback, references, and suggestions on the novelty, relevance, and potential impact of this approach are highly welcomed. Could numerical and experimental tests help assess the practical and computational relevance of this geometric approach?
Note: This post is a significantly revised and extended version of an earlier question I had posed, now structured more clearly and with refined mathematical statements.