Methods: Fractal Strings, Zeta Functions, and the Riemann Zeta Distribution¶

This work is licensed under a Creative Commons Attribution 4.0 International License.
Fractal Strings and Geometric Zeta Functions¶
The Geometry of Fractal Strings¶
A fractal string \(\mathcal{L}\) is formally defined as a bounded open subset of the real line, \(\Omega \subset \mathbb{R}\). Since \(\Omega\) is open, it can be written as a disjoint union of open intervals. Let the lengths of these intervals be denoted by a sequence \(\ell_j\), sorted in non-increasing order:
Geometrically, one can view this as a one-dimensional "drum" with a fractal boundary. The boundary of \(\Omega\), denoted \(\partial \Omega\), is a compact subset of \(\mathbb{R}\). If \(\partial \Omega\) is a fractal set (like the Cantor set), the sequence of lengths \(\ell_j\) decays according to a power law.
The Cantor String Example¶
The standard Cantor set is formed by removing the middle third of the unit interval, then the middle thirds of the remaining segments, and so on. The lengths of the removed intervals (the "gaps") form the fractal string:
The multiplicities of the lengths \(3^{-k}\) are \(2^{k-1}\). This sequence encodes the geometry of the Cantor set.
The Geometric Zeta Function¶
The primary tool for analyzing a fractal string is its geometric zeta function, denoted \(\zeta_{\mathcal{L}}(s)\). This is defined as the Dirichlet series of the lengths:
For the Cantor string, this sum becomes a geometric series:
This series converges for \(\text{Re}(s) > D\), where \(D = \frac{\log 2}{\log 3}\) is the Minkowski dimension (or box-counting dimension) of the Cantor set.
Complex Dimensions as Poles¶
The critical insight of Lapidus's theory is that \(\zeta_{\mathcal{L}}(s)\) admits a meromorphic continuation to a larger region of the complex plane. The poles of this continuation are called the complex dimensions of the fractal string.
For the Cantor string, the poles are the solutions to \(1 - 2 \cdot 3^{-s} = 0\), which are:
Here we see the emergence of complex dimensions. The real part, \(\frac{\log 2}{\log 3}\), is the familiar fractal dimension \(D\). However, there is an infinite tower of poles extending vertically in the complex plane, spaced periodically by \(\frac{2\pi i}{\log 3}\).
Explicit Formulas and Oscillations¶
The physical and algorithmic significance of these complex dimensions is revealed through explicit formulas. These are fractal analogues of Riemann's explicit formula for the distribution of prime numbers. In number theory, the explicit formula relates the counting function of primes, \(\pi(x)\), to a sum over the zeros of the Riemann zeta function \(\zeta(s)\). Similarly, for a fractal string, the volume of the tubular neighborhood of the boundary, \(V(\varepsilon)\), is expressed as a sum over the residues of the geometric zeta function.
The tubular neighborhood \(V(\varepsilon)\) is the set of points within distance \(\varepsilon\) of the boundary \(\partial \Omega\). For a Minkowski measurable fractal of dimension \(D\), one expects \(V(\varepsilon) \sim C \varepsilon^{1-D}\). However, the explicit formula shows that the behavior is more complex:
Here, \(\mathcal{D}_{\mathcal{L}}\) is the set of complex dimensions (poles).
- Real Pole (\(D\)): The pole on the real axis with the largest real part gives the dominant term \(\varepsilon^{1-D}\).
- Complex Poles (\(D + i t_k\)): These poles contribute terms of the form \(\varepsilon^{1-(D+it_k)} = \varepsilon^{1-D} e^{-i t_k \log \varepsilon}\).
Since \(e^{-i t_k \log \varepsilon} = \cos(t_k \log \varepsilon) - i \sin(t_k \log \varepsilon)\), these complex dimensions manifest as geometric oscillations in the volume of the tube. The volume does not scale monotonically; it "wobbles" periodically as a function of \(\log \varepsilon\). The frequency of this wobble is determined by the imaginary part of the complex dimensions.
This result is profound: fractality is synonymous with oscillation. If a geometry is exactly self-similar (like the Cantor set), its complex dimensions form a lattice, and the oscillations are periodic. If the geometry is random or less structured, the poles may be scattered, leading to quasi-periodic or chaotic oscillations.
Fractal Sprays and Higher Dimensions¶
The theory extends to higher dimensions via the concept of fractal sprays. A fractal spray is a collection of scaled copies of a base shape \(B\) (e.g., a square, a circle, or a drum) distributed according to the lengths of a fractal string \(\mathcal{L}\).
For example, an Apollonian gasket can be viewed roughly as a fractal spray where the base shape is a circle, and the scaling factors follow the distribution of the radii of the Apollonian circles.
The spectral zeta function of a fractal spray in \(\mathbb{R}^d\) factorizes into the zeta function of the underlying string and the spectral zeta function of the base shape. If \(\zeta_{\text{string}}(s)\) is the zeta function of the scaling factors and \(\zeta_{\text{shape}}(s)\) is the spectral zeta function of the base shape (e.g., the Dirichlet Laplacian on a unit disk), then the spectral zeta function of the spray \(\zeta_{\nu}(s)\) is roughly the product.
This factorization implies that the complex dimensions of the string propagate into the spectral properties of the 2D packing. Consequently, the "wobble" in the gap distribution of a 2D packing is dictated by the same complex poles that govern the 1D string. The inverse spectral problem---"Can one hear the shape of a fractal drum?"---is answered affirmatively in terms of these dimensions: one hears the complex dimensions as oscillations in the eigenvalue counting function \(N(\lambda)\).
The Riemann Zeta Distribution: Zipf's Law¶
Before analyzing the behavior of algorithms, we must rigorously define the nature of the input data. The "Riemann Zeta Distribution" refers to the discrete Pareto distribution, widely known as Zipf's Law. This distribution is the statistical manifestation of self-affinity in data.
Mathematical Definition and Properties¶
A random variable \(X\) follows a Riemann Zeta distribution with parameter \(s > 1\) if its probability mass function (PMF) is given by:
where \(\zeta(s) = \sum_{n=1}^{\infty} n^{-s}\) is the Riemann zeta function, which serves as the normalization constant to ensure the probabilities sum to 1.
This distribution has several critical properties that distinguish it from the uniform or Gaussian distributions typically assumed in elementary algorithm analysis:
-
Heavy Tail: The probability decays polynomially (\(k^{-s}\)) rather than exponentially. This means that large values (or rare events) are much more probable than in a normal distribution.
-
Infinite Moments: If \(s \le 2\), the expectation \(E[X]\) is infinite. If \(s \le 3\), the variance is infinite. This breakdown of standard statistical moments poses severe challenges for algorithms that rely on variance minimization or central limit theorem behavior.
-
Scale Invariance: The distribution is self-similar. If we look at the distribution of values \(X > K\) for some large \(K\), the relative probabilities follow the same power law: \(P(X = cy \mid X > y) \approx c^{-s}\). This is the probabilistic equivalent of looking at a coastline under a microscope and seeing the same roughness.
Zipf's Law in Real-World Data¶
This distribution is fundamental because it empirically models a vast array of natural and human-generated phenomena. It is the signature of self-organized criticality and fractal processes:
-
Natural Language: George Zipf famously observed that the frequency of the \(k\)-th most common word in a corpus is proportional to \(1/k\) (which corresponds to a Zeta distribution with \(s \approx 1\), technically requiring a cutoff or modification to be normalizable, often modeled as \(s \approx 1+\epsilon\)).
-
Number Theory: The distribution of the number of distinct prime factors of a random integer chosen from a large interval follows a distribution related to the Zeta function. The "Riemann Zeta distribution" allows for the calculation of probabilities such as the likelihood that a random integer is square-free (\(1/\zeta(2) = 6/\pi^2\)).
-
Spatial Data: The sizes of cities, the areas of lakes, the diameters of craters, and the lengths of road segments all follow Zipfian/Zeta distributions. This is crucial for 2D spatial databases.
Information Theoretic Implications¶
The Zeta distribution represents a specific regime of information density. In a uniform distribution of size \(N\), the entropy is maximized (\(\log N\)). In a Zeta distribution, the entropy is significantly lower due to the high redundancy of the most frequent elements.
Algorithms can exploit this redundancy. For example, Huffman coding on Zeta-distributed symbols produces code lengths that are inversely related to the frequencies. The "average" code length is related to the entropy.
However, sorting algorithms like Quicksort, which rely on picking "good" pivots to divide the dataset evenly, can suffer when the data is Zeta-distributed. If a pivot is chosen from the "heavy tail" (a rare, large value) or the "head" (a frequent, small value), the partitions may be highly unbalanced. While randomized pivot selection mitigates this, the variance of the running time increases significantly due to the heavy tail of the input distribution.
The Connection to Self-Affinity¶
The Zeta distribution is the probabilistic dual of a self-affine set. A self-affine set in geometry is one that is invariant under an affine transformation (scaling by different amounts in different directions).
Consider the frequency-rank plot of a Zeta distribution. It is a straight line on a log-log plot with slope \(-s\). This linearity is the hallmark of fractal scaling.
When an algorithm processes a list of numbers drawn from a Zeta distribution, it is effectively traversing a probabilistic fractal. The structure of the data---its "clumpiness," the recurrence of small values, and the sparseness of large values---mirrors the geometry of a fractal string. The "gaps" between the occurrence of specific values in a sorted list are analogous to the "gaps" in a Cantor set. The algorithm "feels" this roughness. The smooth performance curves of uniform inputs are replaced by curves that exhibit lacunarity---gaps and oscillations derived from the input's dimension.