Category: Meta distributions

Taming the meta distribution

The derivation of meta distributions is mostly based on the calculation of the moments of the underlying conditional distribution. The reason is that except for highly simplistic scenarios, a direct calculation is elusive. Recall the definition of a meta distribution

\displaystyle\bar F(t,x)=\mathbb{P}(P_t>x),\qquad\text{\sf where }\;\;P_t=\mathbb{P}(X>t\mid\Phi).

Here X is the random variable we are interested in, and Φ is part of the random elements X depends on, usually a point process modeling the locations of wireless transceivers. The random variable Pt is a conditional distribution given Φ.

Using stochastic geometry, we can often derive moments of Pt. Since Pt has finite support, finding its distribution given the moments is a Hausdorff moment problem, which has a rich history in the mathematical literature but is not fully solved. The infinite sequence of integer moments uniquely defines the distribution, but in practice we are always restricted to a finite sequence, often less than 10 moments or so. This truncated version of the problem has infinitely many solutions, and the methods proposed to find or approximate the solution to the standard problem may or may not produce one of the solutions to the truncated one. In fact, they may not even provide an actual cumulative distribution function (cdf). On overview of the existing methods and some improvements can be found here.

In this blog, we focus on a method to find the infimum and supremum of the infinitely many possible cdfs that have a given finite moment sequence. The method is based on the Chebyshev-Markov inequalities. As the name suggests, it generalizes the well-known Markov and Chebyshev inequalities, which are based on moments sequences of length 1 or 2. The key step in this method is to find the maximum probability mass that can be concentrated at a point of interest, say z. This probability mass corresponds to the difference between the infimum and the supremum of the cdf at z. This technical report provides the details of the calculation.

Fig. 1 shows an example for the moment sequence mk =1/(k+1), for k ∈ [n], where the number n of given moments increases from 5 to 15. It can be observed how the infimum (red) and supremum (blue) curves approach each other as more moments are considered. For n → ∞, both would converge to the cdf of the uniform distribution on [0,1], i.e., F(x)=x. The supremum curve lower bounds the complementary cdf. For example, for n =15, ℙ(X>1/2)>0.4. This is the best bound that can be given since this value is achieved by a discrete distribution.

Fig. 1. Infima (red) and suprema (blue) of all the cdfs corresponding to the truncated moment problem where 5 to 15 moments are given.

The average of the infimum and supremum at each point naturally lends itself as an approximation of the meta distribution. It can be expected to be significantly more accurate than the usual beta approximation, which is based only on the first two moments.

An implementation is available in GitHub at https://github.com/ewa-haenggi/meta-distribution/blob/main/CMBounds.m.

Acknowledgment: I would like to thank my student Xinyun Wang for writing the Matlab code for the CM inequalities and providing the figure above.

Meta visualization

In this post we contrast the meta distribution of the SIR with the standard SIR distribution. The model is the standard downlink Poisson cellular network with Rayleigh fading and path loss exponent 4. The base station density is 1, and the users form a square lattice of density 5. Hence there are 5 users per cell on average.

Fig. 1: The setup. Base stations are red crosses, and users are blue circles. They are served by the nearest base station. Cell boundaries are dashed.

We assume the base stations transmit to the users in their cell at a rate such that the messages can be decoded if an SIR of θ =-3 dB is achieved. If the user succeeds in decoding, it is marked with a green square, otherwise red. We let this process run over many time slots, as shown next.

Fig. 2: Transmission success (green) and failure (red) at each user over 100 time slots.

The SIR meta distribution captures the per-user statistics, obtained by averaging over the fading, i.e., over time. In the next figure, the per-user reliabilities are illustrated using a color map from pure red (0% success) to pure green (100% success).

Fig. 3: Per-user reliabilities, obtained by averaging over fading. These are captured by the SIR meta distribution. Near the top left is a user that almost never succeeds since it is equidistant to three base stations.

The SIR meta distribution provides the fractions of users that are above (or below) a certain reliability threshold. For instance, the fraction of users that are at least dark green in the above figure.

In contrast, the standard SIR distribution, sometimes misleadingly called “coverage probability”, is just a single number, namely the average reliability, which is close to 70% in this scenario (see, e.g., Eqn. (1) in this post). Since it is obtained by averaging concurrently over fading and point process, the underlying network structure is lost, and the only information obtained is the average of the colors in Fig. 3:

Fig. 4: The standard SIR distribution only reveals the overall reliability of 70%.

The 70% reliability is that of the typical user (or typical location), which does not correspond to any user in our network realization. Instead, it is an abstract user whose statistics correspond to the average of all users.

Acknowledgment: The help of my Ph.D. student Xinyun Wang in writing the Matlab program for this post is greatly appreciated.

The transdimensional approach

In vehicular networks, transceivers are inherently confined to a subset of the two-dimensional Euclidean space. This subset is the street system where cars are allowed to move. Accordingly, stochastic geometry models for vehicular networks usually consist of two components: A set of streets and a set of point processes, one for each street, representing the vehicles. The most popular model is the Poisson line process (PLP) for the streets, combined with one-dimensional PPPs of vehicles on each line (street).

This PLP-PPP model does not have T-junctions, only intersections. Accordingly, all vehicles are of order 2 or 4, per the taxonomy introduced here. The order is determined by the number of directions in which a vehicle can move.

The PLP-PPP is a Cox process due to the independent one-dim. PPPs, and the underlying street system determining the intensity measure (the line process) is also based on the PPP. Consequently, the PLP-PPP inherits a certain level of tractability from the PPP, in the sense that exact expressions can be derived for some quantities of interest. In particular, the SIR distribution (complementary cumulative distribution function, ccdf) at the typical vehicle for a transmitter at a fixed distance can be derived without difficulty. However, the expression requires the evaluation of two nested improper integrals. While such a result certainly has its value, it does not give direct insight how the resulting SIR distribution depends on the network parameters. Also, other metrics that depend on the SIR often require further integration, most importantly the SIR meta distribution, which is calculated from the higher moments of the conditional SIR ccdf (given the point process).

This raises the question whether it is possible to find a closed-form result that is much more quickly evaluated and provides a tight approximation. Simply replacing the PLP-PPP by a two-dimensional PPP produces poor results, especially in the high-reliability regime (where the SIR ccdf is near 1). Similarly, considering only the one street that the typical vehicle lies on (i.e., using only a one-dimensional PPP) ignores all the interference from the vehicles on the other streets, which strongly affects the tail of the distribution.

How about a combination of the two – a superposition of a one-dimensional PPP for the typical vehicle’s street and a two-dimensional PPP for the other vehicles? In this approach, PPPs of two different dimensions are combined to a transdimensional PPP (TPPP). It accurately characterizes the interference from nearby vehicles, which are likely to lie on the same street as the typical vehicle, and captures the remaining interference without the complexity of the PLP. The three key advantages of this approach are:

  • The TPPP leads to closed-form results for the SIR ccdf that are asymptotically exact, both in the lower and upper tails (near 0 and near infinity).
  • The results are highly accurate over the entire range of the SIR ccdf, and they are obtained about 100,000 times faster than the exact results. Hence, if fast evaluation is key and a time limit of, say, one μs is specified, the transdimensional approach yields more accurate results than the exact expression. Put differently, the exact expression only leads to higher accuracy if ample computation time is available.
  • The simplicity of the TPPP extends to all moments of the conditional success probability, which greatly simplifies the calculation of the SIR meta distribution.

The TPPP approach is also applicable to other street systems, including the Poisson stick model (where streets are modeled as line segments of random length) and the Poisson lilypond model, which forms T-junctions (where vehicles are of order 3). For the stick model with independent lengths, the exact expression of the nearest-neighbor distance distribution involves six nested integrals, hence a transdimensional is certainly warranted. More details can be found here.

Averages, distributions, and meta distributions

In this post I would like to show how meta distributions naturally emerge as an important extension of the concepts of averages and distributions. For a random variable Z, we call 𝔼(Z) its average (or mean). If we add a parameter z to compare Z against and form the family of random variables 1(Z>z), we call their mean the distribution of Z (to be precise, the complementary cumulative distribution function, ccdf for short).
Now, if Z does not depend on any other randomness, then 𝔼1(Z>z) gives the complete information about all statistics of Z, i.e., the probability of any event can be expressed by adding or subtracting these elementary probabilities.
However, if Z is a function of other sources of randomness, then 𝔼1(Z>z) does not reveal how the statistics of Z depend on those of the individual random elements. In general Z may depend on many, possibly infinitely many, random variables and random elements (e.g., point processes), such as the SIR in a wireless network. Let us focus on the case Z=f(X,Y), where X and Y are independent random variables. Then, to discern how X and Y individually affect Z, we need to add a second parameter, say x, to extend the distribution to the meta distribution:

\displaystyle \bar F_{[\![Z\mid Y]\!]}(z,x)=\mathbb{E}\mathbf{1}(\mathbb{E}[\mathbf{1}(Z>z) \mid Y]>x).

Alternatively,

\displaystyle \bar F_{[\![Z\mid Y]\!]}(z,x)=\mathbb{E}\mathbf{1}(\mathbb{E}_X\mathbf{1}(Z>z)>x).

Hence the meta distribution (MD) is defined by first conditioning on part of the randomness. It has two parameters, the distribution has one parameter, and the average has zero parameters. There is a natural progression from averages to distributions to meta distributions (and back), as illustrated in this figure:

Figure 1: Relationship between mean (average), ccdf (distribution), and MD (meta distribution).

From the top going down, we obtain more information about Z by adding indicators and parameters. Conversely, we can eliminate parameters by integration (taking averages). Letting U be the conditional ccdf given Y, i.e., U=𝔼X1(Z>z)=𝔼[1(Z>z) | Y], it is apparent that the distribution of Z is the average of U, while the MD is the distribution of U.

Let us consider the example Z=X/Y , where X is exponential with mean 1 and Y is exponential with mean 1/μ, independent of X. The ccdf of Z is

\displaystyle \bar F_{Z}(z)=\frac{\mu}{\mu+z}.

In this case, the mean 𝔼(Z) does not exist. The conditional ccdf given Y is the random variable

\displaystyle U=\bar F_{Z\mid Y}(z)=\mathbb{E}\mathbf{1}(Z>z\mid Y)=e^{-Yz},

and its distribution is the meta distribution

\displaystyle \bar F_{[\![Z\mid Y]\!]}(z,x)\!=\!\mathbb{P}(U\!>\!x)\!=\!\mathbb{P}(Y\!\leq\!-\log(x)/z)\!=\!1\!-\!x^{\mu/z}.

As expected, the ccdf of Z is retrieved by integration over x∈[0,1]. This MD has relevance in Poisson uplink cellular networks, where base stations (BSs) form a PPP Φ of intensity λ and the users are connected to the nearest BS. If the fading is Rayleigh fading and the path loss exponent is 2, the received power from a user at an arbitrary location is S=X/Y, where X is exponential with mean 1 and Y is exponential with mean 1/(λπ), exactly as in the example above. Hence the MD of the signal power S is

\displaystyle \qquad\qquad\qquad\bar F_{[\![S\mid \Phi]\!]}(z,x)=1-x^{\lambda\pi/z}.\qquad\qquad\qquad (1)

So what additional information do we get from the MD, compared to just the ccdf of S? Let us consider a realization of Φ and a set of users forming a lattice (any stationary point process of users would work) and determine each user’s individual probability that its received power exceeds 1:

Figure 2: Realization of a Poisson cellular network of density 1 where users (red crosses x) connect to the nearest base station (blue circles o). The number next to each user u is ℙ(Su>1 | Φ).

If we draw a histogram of all the user’s probabilities (the numbers in the figure), how does it look? This cannot be answered by merely looking at the ccdf of S. In fact ℙ(S>1)=π/(π+1)≈0.76 is merely the average of all the numbers. To know their distribution, we need to consult the MD. From (1) the MD (for λ=1 and z=1) is 1-xπ. Hence the histogram of the numbers has the form of the probability density function πxπ-1. In contrast, without the MD, we have no information about the disparity between the users. Their personal probabilities could all be well concentrated around 0.76, or some could have probabilities near 0 and others near 1. Put differently, only the MD can reveal the performance of user percentiles, such as the “5% user” performance, which is the performance that 95% of the users achieve but 5% do not.
This interpretation of the MD as a distribution over space for a fixed realization of the point process is valid whenever the point process is ergodic.

Another application of the MD is discussed in an earlier post on the fraction of reliable links in a network.