Parameter estimation for Gibbs distributions

David G. Harris and Vladimir Kolmogorov.

arXiv technical report, July 2020.


Abstract

We consider Gibbs distributions, which are families of probability distributions over a discrete space $\Omega$ with probability mass function of the form $\mu^\Omega_\beta(x) \propto e^{\beta H(x)}$ for $\beta$ in an interval $[\beta_{\min}, \beta_{\max}]$ and $H(x) \in \{0\} \cup [1, n]$. The \emph{partition function} is the normalization factor $Z(\beta)=\sum_{x\in\Omega} e^{\beta H(x)}$.
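To make the definitions concrete, here is a brute-force toy example (not from the paper; the space and Hamiltonian values are hypothetical) that computes the partition function $Z(\beta)$ and the Gibbs probabilities directly:

```python
import math

# Toy discrete space: each element of Omega is labeled by its H-value.
# Here H takes integer values in {0, 1, 2} (a small integer-valued instance).
H_VALUES = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2]  # hypothetical Hamiltonian values

def partition_function(beta):
    """Z(beta) = sum over x in Omega of e^{beta * H(x)}."""
    return sum(math.exp(beta * h) for h in H_VALUES)

def gibbs_probability(h, beta):
    """mu_beta(x) = e^{beta * H(x)} / Z(beta) for an element x with H(x) = h."""
    return math.exp(beta * h) / partition_function(beta)

# At beta = 0 every element is equally likely, so Z(0) = |Omega| = 10.
print(partition_function(0.0))  # -> 10.0
```

At $\beta = 0$ the distribution is uniform over $\Omega$; increasing $\beta$ shifts mass toward elements with larger $H(x)$.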

Two important parameters of these distributions are the partition ratio $q = \log \tfrac{Z(\beta_{\max})}{Z(\beta_{\min})}$ and the counts $c_x = |H^{-1}(x)|$ for each value $x$. These are correlated with system parameters in a number of physical applications and sampling algorithms. Our first main result is an algorithm to estimate the values $c_x$ using roughly $\tilde O( \frac{q}{\varepsilon^2})$ samples for general Gibbs distributions and $\tilde O( \frac{n^2}{\varepsilon^2} )$ samples for integer-valued distributions (ignoring some second-order terms and parameters), and we show this is optimal up to logarithmic factors. We illustrate with improved algorithms for counting connected subgraphs and perfect matchings in a graph.
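One way to see why samples carry information about the counts: for integer-valued $H$, $\mu_\beta(H = x) = c_x e^{\beta x}/Z(\beta)$, so given $Z(\beta)$ and the empirical frequency of $H = x$ among samples, one can invert this identity. The sketch below is only a naive single-temperature illustration on a hypothetical instance with known $Z$, not the paper's algorithm (which works across a range of $\beta$ without knowing $Z$ in advance):

```python
import math
import random
from collections import Counter

random.seed(0)

# Hypothetical counts c_x = |H^{-1}(x)| for x in {0, 1, 2}.
COUNTS = {0: 5, 1: 3, 2: 2}

def Z(beta):
    """Exact partition function for this toy instance."""
    return sum(c * math.exp(beta * x) for x, c in COUNTS.items())

def sample_H(beta, m):
    """Draw m values of H(x) with x ~ mu_beta (exact sampling on this toy)."""
    xs = list(COUNTS)
    weights = [COUNTS[x] * math.exp(beta * x) for x in xs]
    return random.choices(xs, weights=weights, k=m)

beta, m = 0.5, 50_000
freq = Counter(sample_H(beta, m))
# Invert mu_beta(H = x) = c_x * e^{beta x} / Z(beta) to estimate each count.
est = {x: (freq[x] / m) * Z(beta) * math.exp(-beta * x) for x in COUNTS}
```

With $m = 50{,}000$ samples the empirical frequencies, and hence the estimates of each $c_x$, are accurate to within a few percent on this instance.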

As a key subroutine, we estimate the partition function $Z$ using $\tilde O(\frac{q}{\varepsilon^2})$ samples for general Gibbs distributions and $\tilde O(\frac{n^2}{\varepsilon^2})$ samples for integer-valued distributions. We construct a data structure capable of estimating $Z(\beta)$ for \emph{all} values of $\beta$, without further samples. This improves over a prior algorithm of Kolmogorov (2018) which computes the single point estimate $Z(\beta_{\max})$ using $\tilde O(\frac{q}{\varepsilon^2})$ samples. We show matching lower bounds, demonstrating that this complexity is optimal as a function of $n$ and $q$ up to logarithmic terms.
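A standard way sample-based partition-function estimation can work (a generic multistage ratio estimator, shown only for intuition and not the authors' algorithm) uses the identity $Z(\beta')/Z(\beta) = \mathbb{E}_{x \sim \mu_\beta}[e^{(\beta'-\beta)H(x)}]$, telescoped along a schedule $\beta_{\min} = \beta_0 < \dots < \beta_k = \beta_{\max}$. All constants in this sketch are hypothetical:

```python
import math
import random

random.seed(1)

# Hypothetical integer-valued instance: counts of each H-value.
COUNTS = {0: 5, 1: 3, 2: 2}

def Z(beta):
    """Exact partition function, used here only to check the estimate."""
    return sum(c * math.exp(beta * x) for x, c in COUNTS.items())

def sample_H(beta, m):
    """m draws of H(x) with x ~ mu_beta (exact sampling on this toy model)."""
    xs = list(COUNTS)
    weights = [COUNTS[x] * math.exp(beta * x) for x in xs]
    return random.choices(xs, weights=weights, k=m)

def estimate_ratio(beta_min, beta_max, stages, m):
    """Estimate Z(beta_max)/Z(beta_min) via the telescoping identity
    Z(b')/Z(b) = E_{x ~ mu_b}[e^{(b'-b) H(x)}]."""
    betas = [beta_min + i * (beta_max - beta_min) / stages
             for i in range(stages + 1)]
    ratio = 1.0
    for b, b_next in zip(betas, betas[1:]):
        hs = sample_H(b, m)
        ratio *= sum(math.exp((b_next - b) * h) for h in hs) / m
    return ratio

est = estimate_ratio(0.0, 1.0, stages=4, m=20_000)
true = Z(1.0) / Z(0.0)
```

The number of stages and samples per stage governs the trade-off between sample complexity and accuracy; the paper's contribution is achieving near-optimal total sample complexity in $q$ and $n$, and doing so for all $\beta$ simultaneously.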

