Tag: meta distribution

# Averages, distributions, and meta distributions

In this post I would like to show how meta distributions naturally emerge as an important extension of the concepts of averages and distributions. For a random variable Z, we call 𝔼(Z) its average (or mean). If we add a parameter z to compare Z against and form the family of random variables 1(Z>z), we call their mean the distribution of Z (to be precise, the complementary cumulative distribution function, ccdf for short).
Now, if Z does not depend on any other randomness, then 𝔼1(Z>z) gives the complete information about all statistics of Z, i.e., the probability of any event can be expressed by adding or subtracting these elementary probabilities.
However, if Z is a function of other sources of randomness, then 𝔼1(Z>z) does not reveal how the statistics of Z depend on those of the individual random elements. In general Z may depend on many, possibly infinitely many, random variables and random elements (e.g., point processes), such as the SIR in a wireless network. Let us focus on the case Z=f(X,Y), where X and Y are independent random variables. Then, to discern how X and Y individually affect Z, we need to add a second parameter, say x, to extend the distribution to the meta distribution:

$\displaystyle \bar F_{[\![Z\mid Y]\!]}(z,x)=\mathbb{E}\mathbf{1}(\mathbb{E}[\mathbf{1}(Z>z) \mid Y]>x).$

Alternatively,

$\displaystyle \bar F_{[\![Z\mid Y]\!]}(z,x)=\mathbb{E}\mathbf{1}(\mathbb{E}_X\mathbf{1}(Z>z)>x).$

Hence the meta distribution (MD) is defined by first conditioning on part of the randomness. It has two parameters, the distribution has one parameter, and the average has zero parameters. There is a natural progression from averages to distributions to meta distributions (and back), as illustrated in this figure:

From the top going down, we obtain more information about Z by adding indicators and parameters. Conversely, we can eliminate parameters by integration (taking averages). Letting U be the conditional ccdf given Y, i.e., U=𝔼X1(Z>z)=𝔼[1(Z>z) | Y], it is apparent that the distribution of Z is the average of U, while the MD is the distribution of U.

Let us consider the example Z=X/Y , where X is exponential with mean 1 and Y is exponential with mean 1/μ, independent of X. The ccdf of Z is

$\displaystyle \bar F_{Z}(z)=\frac{\mu}{\mu+z}.$

In this case, the mean 𝔼(Z) does not exist. The conditional ccdf given Y is the random variable

$\displaystyle U=\bar F_{Z\mid Y}(z)=\mathbb{E}\mathbf{1}(Z>z\mid Y)=e^{-Yz},$

and its distribution is the meta distribution

$\displaystyle \bar F_{[\![Z\mid Y]\!]}(z,x)\!=\!\mathbb{P}(U\!>\!x)\!=\!\mathbb{P}(Y\!\leq\!-\log(x)/z)\!=\!1\!-\!x^{\mu/z}.$

As expected, the ccdf of Z is retrieved by integration over x∈[0,1]. This MD has relevance in Poisson uplink cellular networks, where base stations (BSs) form a PPP Φ of intensity λ and the users are connected to the nearest BS. If the fading is Rayleigh fading and the path loss exponent is 2, the received power from a user at an arbitrary location is S=X/Y, where X is exponential with mean 1 and Y is exponential with mean 1/(λπ), exactly as in the example above. Hence the MD of the signal power S is

$\displaystyle \qquad\qquad\qquad\bar F_{[\![S\mid \Phi]\!]}(z,x)=1-x^{\lambda\pi/z}.\qquad\qquad\qquad (1)$

So what additional information do we get from the MD, compared to just the ccdf of S? Let us consider a realization of Φ and a set of users forming a lattice (any stationary point process of users would work) and determine each user’s individual probability that its received power exceeds 1:

If we draw a histogram of all the user’s probabilities (the numbers in the figure), how does it look? This cannot be answered by merely looking at the ccdf of S. In fact ℙ(S>1)=π/(π+1)≈0.76 is merely the average of all the numbers. To know their distribution, we need to consult the MD. From (1) the MD (for λ=1 and z=1) is 1-xπ. Hence the histogram of the numbers has the form of the probability density function πxπ-1. In contrast, without the MD, we have no information about the disparity between the users. Their personal probabilities could all be well concentrated around 0.76, or some could have probabilities near 0 and others near 1. Put differently, only the MD can reveal the performance of user percentiles, such as the “5% user” performance, which is the performance that 95% of the users achieve but 5% do not.
This interpretation of the MD as a distribution over space for a fixed realization of the point process is valid whenever the point process is ergodic.

Another application of the MD is discussed in an earlier post on the fraction of reliable links in a network.

The fraction of reliable links is an important metric, in particular for applications with (ultra-)high reliability requirements. In the literature, we see that it is sometimes equated with the transmission success probability of the typical link, given by

$\displaystyle p_{\text{s}}=\mathbb{P}(\text{SIR}>\theta).$

This is the SIR distribution (in terms of the complementary cdf) at the typical link. In this post I would like to discuss whether it is accurate to call ps the fraction of reliable links.

Say someone claims “The fraction of reliable links in this network is ps=0.8″, and I ask “But how reliable are these links?”. The answer might be “They are (at least) 80% reliable of course, because ps=0.8.” Ok, so let us assume that the fraction links with reliability at least 0.8 is 0.8. Following the same logic, the fraction of links with reliability at least 0.7 would be 0.7. But clearly that fraction cannot be smaller than the fraction of links with reliability at least 0.8. There is an obvious contradiction. So how can we quantify the fraction of reliable links in a rigorous way?

First we note that in the expression for ps, there is no notion of reliability but ps itself. This leads to the wrong interpretation above that a fraction ps  of links has reliability at least ps. Instead, we want so specify a reliability threshold so that we can say, e.g., “the fraction of links that are at least 90% reliable is 0.8”. Naturally it then follows that the fraction of links that are at least 80% reliable must be larger than (or equal to) 0.8. So a meaningful expression for the fraction of reliable links must involve a reliability threshold parameter that can be tuned from 0 to 1, irrespective of how reliable the typical link happens to be.

Second, ps gives no indication about the reliability of individual links. In particular, it does not specify what fraction of links achieve a certain reliability, say 0.8. It could be all of them, or 2/3, or 1/2, or 1/5. ps=0.8 means that the probability of transmission success over the typical link is 0.8. Equivalently, in an ergodic setting, in every time slot, a fraction 0.8 of all links happens to succeed, in every realization of the point process. But some links will be highly reliable, while others will be less reliable.

Before getting to the definition of the fraction of reliable links, let us focus on Poisson bipolar networks for illustration, with the following concrete parameters: link distance 1/4, path loss exponent 4, target SIR threshold θ=1, and the fading is iid Rayleigh. The link density is λ, and we use slotted ALOHA with transmit probability is p. In this case, the well-known expression for ps is

$\displaystyle p_{\text{s}}=\exp(-c\lambda p),$

where c=0.3084 is a function of link distance, path loss exponent, and SIR threshold. We note that if we keep λp constant, ps remains unchanged. Now, instead of just considering the typical link, let us consider all the links in a realization of the network, i.e., for a given set of locations of all transceivers. The video below shows the histogram of the individual link reliabilities for constant λp=1 while varying the transmit probability p from 1.00 to 0.01 in steps of 0.01. The red line indicates ps, which is the average of all link reliabilities and remains constant at 0.735. Clearly, the distribution of link reliabilities changes significantly even with constant ps – as surmised above, ps does not reveal how disparate the reliabilities are. The symbol σ refers to the standard deviation of the reliability distribution, starting at 0.3 at p=1 and decreasing to less than 1/10 of that for p=1/100.

Equipped with the blue histogram (or pdf), we can easily determine what fraction of links achieves a certain reliability, say 0.6, 0.7, or 0.8. These are shown in the plot below. It is apparent that for small p, due to the concentration of the link reliabilities as p→0, the fraction of reliable links tends to 0 or 1, depending on whether the reliability threshold is above or below the average ps .

So how do we characterize the link reliability distribution theoretically? We start with the conditional SIR ccdf at the typical link, given the point process:

$\displaystyle P_{\text{s}}=\mathbb{P}(\text{SIR}>\theta\mid\Phi).$

Then ps=E(Ps), with the expectation taken over the point process. Hence ps is the mean of the conditional success probability, and if we consider its distribution, we arrive at the link reliability distribution, shown in blue in the video above. Mathematically,

$\displaystyle F(\nu)=\mathbb{P}(P_{\text{s}}>\nu).$

where ν is the target reliability. This distribution is a meta distribution, since it is the distribution of a conditional distribution. In ergodic settings, it specifies the fraction of links that achieve an SIR of θ with reliability at least ν, which is exactly what we set out to quantify.
In conclusion: The fraction of reliable links is not given by the standard (mean) success probability; it is given by the meta distribution of the SIR.

# What is “coverage”?

In the literature, the probability that the signal-to-interference ratio (SIR) at a given location and time exceeds a certain value is often referred to as the coverage probability. Is this sensible terminology, consistent with the way cellular operators define “coverage”? All publicly accessible cellular service coverage maps are static, i.e., their view of “coverage” is purely based on location and not on time. This seems natural since a rapidly changing coverage map, say at the level of seconds, would not be of much use to the user, apart from the fact that it would be very hard to collect the information at such time scales.

In contrast, the event SIR(x,t)>θ depends not only on the location x but also on the time t. A location may be “covered” at SIR level θ at one moment but “uncovered” just a little bit (one coherence time) later. Accordingly, a “coverage” map based on this criterion would have to be updated several times per second to accurately reflect this notion of coverage. Moreover, it would have to have a very high spatial resolution due to small-scale fading – one location may be “covered” at time t while another, half a meter away, may be “uncovered” at the same time t. Lastly, there is no notion of reliability. For each x and t, SIR(x,t)>θ either happens or not. It seems natural, though, to include reliability in a coverage definition; for example, by declaring that a location is covered if an SIR of θ is achieved with probability 95%, or 95 times out of 100 transmissions.

Hence there are three disadvantages of using SIR(x,t)>θ as the criterion for coverage:

• The event depends on time (at the level of the coherence time)
• The event depends on space at a very small scale (at the granularity of the coherence length of the small-scale fading)
• The event does not allow for a reliability threshold to define coverage.

How can we define “coverage” without these shortcomings? First, we interpret coverage as a purely spatial term, consistent with the coverage maps we find on the web; it should not include a temporal component, at least not in the short-term – hopefully cellular coverage keeps improving over the years, but it should not vary randomly many times per second. Put differently, coverage should only depend on the network geometry (locations of base stations relative to the position x) and shadowing, but not on the rapid signal strength fluctuations due to small-scale fading. The solution to eliminate the temporal component is fairly straightforward – we just need to average over the temporal randomness, i.e., the small-scale fading. Such averaging eliminates the other two shortcomings as well. For a base station point process Φ, we define the conditional SIR distribution at location x as

$\displaystyle P(x)=\mathbb{P}(\text{SIR}(x,t)>\theta\mid\Phi).$

Here, the probability is taken over the small-scale fading, which eliminates the dependence on time (assuming temporal ergodicity of the fading process, which means that the ensemble average here could be replaced by a time average over a suitable long period). If shadowing is present, it can be incorporated in Φ. Then, introducing a reliability threshold ν, we declare

$\displaystyle \{x \text{ covered}\}\quad\Leftrightarrow\quad \{P(x)>\nu\}$

The reliability threshold ν appears naturally in this definition. The probability that P(x)>ν is the meta distribution of the SIR, since it is the distribution of the conditional SIR distribution given Φ. For stationary and ergodic point processes Φ, it does not depend on x and gives the area fraction that is covered at SIR threshold θ and reliability threshold ν. The figure below shows a coverage map where the colors indicate the reliability threshold at which locations are covered, from dark blue (ν close to 0) to bright yellow (ν close to 1).
So if P(SIR>θ) is not the coverage probability, what is it? It is simply the complementary cumulative distribution (ccdf) of the SIR, often interpreted as the success probability of a transmission.