Tag: meta distribution

# Meta visualization

In this post we contrast the meta distribution of the SIR with the standard SIR distribution. The model is the standard downlink Poisson cellular network with Rayleigh fading and path loss exponent 4. The base station density is 1, and the users form a square lattice of density 5. Hence there are 5 users per cell on average.

We assume the base stations transmit to the users in their cell at a rate such that the messages can be decoded if an SIR of θ = -3 dB is achieved. If a user succeeds in decoding, it is marked with a green square, otherwise with a red one. We let this process run over many time slots, as shown next.

The SIR meta distribution captures the per-user statistics, obtained by averaging over the fading, i.e., over time. In the next figure, the per-user reliabilities are illustrated using a color map from pure red (0% success) to pure green (100% success).

The SIR meta distribution provides the fraction of users that are above (or below) a certain reliability threshold, for instance, the fraction of users that are at least dark green in the above figure.

In contrast, the standard SIR distribution, sometimes misleadingly called “coverage probability”, is just a single number, namely the average reliability, which is close to 70% in this scenario (see, e.g., Eqn. (1) in this post). Since it is obtained by averaging concurrently over fading and point process, the underlying network structure is lost, and the only information obtained is the average of the colors in Fig. 3:

The 70% reliability is that of the typical user (or typical location), which does not correspond to any user in our network realization. Instead, it is an abstract user whose statistics correspond to the average of all users.
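The 70% figure can be reproduced from the well-known closed form for the SIR ccdf of the typical user in this model (Poisson downlink, Rayleigh fading, nearest-base-station association, path loss exponent 4), namely 1/(1+√θ·arctan√θ). The sketch below is a quick numerical check under that assumption; the function name is illustrative and is not taken from the Matlab program used for the figures.

```python
import math

def success_probability_alpha4(theta_db):
    """SIR ccdf of the typical user in a Poisson cellular downlink
    (Rayleigh fading, nearest-BS association, path loss exponent 4)."""
    theta = 10 ** (theta_db / 10)      # convert dB to linear scale
    root = math.sqrt(theta)
    return 1.0 / (1.0 + root * math.atan(root))

ps = success_probability_alpha4(-3.0)
print(f"average reliability at theta = -3 dB: {ps:.3f}")
```

The result is a single number per threshold θ, which is exactly why it cannot say anything about how the individual users in the figures differ from each other.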

Acknowledgment: The help of my Ph.D. student Xinyun Wang in writing the Matlab program for this post is greatly appreciated.

# Realistic communication

Today’s blog is about realistic communication, i.e., what kind of performance can realistically be expected of a wireless network. To get started, let’s have a look at an excerpt from a recent workshop description:

“Future wireless networks will have to support many innovative vertical services, each with its own specific requirements, e.g.

• End-to-end latency of 1 ns and reliability higher than 99.999% for URLLCs.
• Terminal densities of 1 million of terminals per square kilometer for massive IoT applications.
• Per-user data-rate of the order of Terabit/s for broadband applications.”

Let’s break this down, bullet by bullet.

First bullet: In 1 ns, light travels 30 cm in free space. So “end-to-end” here would mean a distance of at most 10 cm, to leave some fraction of a nanosecond for encoding, transmission, and decoding. But what useful wireless service is there where transceivers are within at most 10 cm? Next, a packet loss rate of $10^{-5}$ means that the spectral efficiency must be very low. Together with a latency constraint of 1 ns, ultrahigh bandwidths must be used, which, in turn, makes the design of circuitry and antenna arrays extremely challenging. At least the channel can be expected to be benign (line-of-sight).

Where does stochastic geometry come in? Assuming that these ultrashort links live in a network and not in isolation, interference will play a role. Let us consider a Poisson bipolar network with normalized link distance 1, a path loss exponent α and Rayleigh fading. What is the maximum density of links that can be supported that have an outage of at most ε? This quantity is known as the spatial outage capacity (SOC). For small ε, which is our regime of interest here, we have

$\displaystyle\text{SOC}\sim\left(\frac{\varepsilon}{\rho}\right)^\delta c_\delta,\quad \varepsilon\to 0,$

where δ=2/α and $c_\delta$ is a constant that only depends on the path loss exponent α=2/δ. ρ is the spectral efficiency (in bits/s/Hz or bps/Hz). This shows the fundamental tradeoff between outage and spectral efficiency: Reducing the outage by a factor of 10 reduces the rate of transmission by the same factor if the same link density is to be maintained. Compared to a more standard outage constraint of 5%, this means that the rate must be reduced by a factor of 5,000 to accommodate the 99.999% reliability requirement. Now, if we have 0.5 ns for the transmission of a message of 50 bits, the required rate is 100 Gbps. Assuming a very generous spectral efficiency of 100 bps/Hz for a system operating at 5% outage, this means that 100 Gbps must be achieved at a spectral efficiency of a mere 0.02 bps/Hz. So we would need 5 THz of bandwidth to communicate a few dozen bits over 10 cm.
Even relaxing the latency constraint to 1 μs still requires 5 GHz of bandwidth.
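The arithmetic behind these numbers can be retraced in a few lines. The sketch below uses only the values quoted in the text and the linear outage-rate scaling implied by the SOC expression at fixed link density; the variable names are illustrative, and the relaxed case assumes that the transmission time grows by the same factor of 1,000 as the latency budget.

```python
# Values quoted in the text.
baseline_outage = 0.05       # "standard" 5% outage constraint
target_outage = 1e-5         # 99.999% reliability
baseline_se = 100.0          # bps/Hz, the generous assumption at 5% outage
message_bits = 50
transmit_time_s = 0.5e-9     # 0.5 ns out of the 1 ns latency budget

# SOC ~ (eps/rho)^delta * c_delta: at fixed link density, eps/rho is constant,
# so the spectral efficiency scales linearly with the outage constraint.
reduction = baseline_outage / target_outage
target_se = baseline_se / reduction              # bps/Hz
rate_bps = message_bits / transmit_time_s
bandwidth_hz = rate_bps / target_se

# Relaxing the latency from 1 ns to 1 us (assumed to scale the transmit time alike).
relaxed_bandwidth_hz = bandwidth_hz * (1e-9 / 1e-6)

print(f"rate reduction factor : {reduction:.0f}")
print(f"spectral efficiency   : {target_se:.3f} bps/Hz")
print(f"required rate         : {rate_bps / 1e9:.0f} Gbps")
print(f"bandwidth at 1 ns     : {bandwidth_hz / 1e12:.1f} THz")
print(f"bandwidth at 1 us     : {relaxed_bandwidth_hz / 1e9:.1f} GHz")
```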

In cellular networks, the outage-rate relationship is governed by a very similar tradeoff.
For any stationary point process of base stations and Rayleigh fading, the SIR meta distribution asymptotically has the form

$\displaystyle \bar F(\rho,\varepsilon)\sim\left(\frac{\varepsilon}{\rho}\right)^\delta C_\delta,\quad \varepsilon\to 0,$

where $C_\delta$ again depends only on the path loss exponent. This is the fraction of users who achieve a spectral efficiency of ρ with an outage less than ε, remarkably similar to the bipolar result. To keep this fraction fixed at, say, 95%, the spectral efficiency again needs to be reduced in proportion to a reduction of the outage constraint ε.

Second bullet: Per the classification and nomenclature in a dense debate, this density falls squarely in the tremendously dense class, above super-high density and extremely high density. So what do the anticipated 100 devices in an average home or 10,000 devices in an average parking lot do? What kind of messages are they exchanging or reporting to a hub? How often? What limits the performance? These devices are often said to be “connected”, without any specification of what that means. Only once this is clarified can a discussion ensue about whether such tremendous densities are realistic.

Third bullet: Terabit-per-second (Tbps) rates require at least 10 GHz of spectrum, optimistically. 5G in its most ambitious configuration, ignoring interference, has a spectral efficiency of about 50 bps/Hz, and, barring any revolutionary breakthrough, more than 100 bps/Hz does not appear feasible in the next decade. Similarly, handling a signal 10 GHz wide would be an order of magnitude beyond what is currently possible. Plus, such large chunks of spectrum are not even available at 60 GHz (the current mm-wave bands). At 100 GHz and above, link distances are even more limited and more strongly subject to blockages, and analog beamforming circuitry becomes much more challenging and power-hungry. Most importantly, though, peak rates are hardly achieved in reality. In the 5G standard, the user-experienced data rate (the rate of the 5th-percentile user) is a mere 1% of the peak rate, and this fraction has steadily decreased over the cellular generations:

So even if 1 Tbps peak rates became a reality, users would likely experience between 1 Gbps and at most 10 Gbps – assuming their location is covered, which may vary over short spatial scales. Such user percentile performance can be analyzed using meta distributions.

In conclusion, while setting ambitious goals may trigger technological advances, it is important to be realistic about what is achievable and what performance the user actually experiences. For example, instead of focusing on 1 Tbps peak rates, we could focus on delivering 1 Gbps to 95% of the users, which may still be very challenging but probably achievable and more rewarding to the user. And speaking of billions of “connected devices” is just marketing unless it is clearly defined what being connected means.

For more information on the two analytical results above, please see this paper (Corollary 1) and this paper (Theorem 3).

# What to expect (over)

In performance analyses of wireless networks, we frequently encounter expectations of the form

$\displaystyle\mathbb{E}\log(1+{\rm SIR}),\qquad\qquad\qquad\qquad\qquad\qquad (*)$

called average (ergodic) spectral efficiency (SE) or mean normalized rate or similar, in units of nats/s/Hz. For network models with uncertainty, its evaluation requires the use of stochastic geometry. Sometimes the metric is also normalized per area and called area spectral efficiency. The SIR is expressed in the form

$\displaystyle {\rm SIR}=\frac{h_y \|y\|^{-\alpha}}{\sum_{x\in\Phi} h_x \|x\|^{-\alpha}},$

with Φ being the point process of interferers.
There are several underlying assumptions made when claiming that (*) is a relevant metric:

• It is assumed that codewords are long enough and arranged in a way (interspersed in time, frequency, or across antennas) such that fading is effectively averaged out. This is reasonable for several current networks.
• It is assumed that desired signal and interference amplitudes are Gaussian. This is sensible since if a decoder is intended for Gaussian interference, then the SE is as if the interference amplitude were indeed Gaussian, regardless of its actual distribution.
• Most importantly and questionably, taking the expectation over all fading random variables $h_x$ implies that the receiver has knowledge of all of them. Gifting the receiver with all the information of the channels from all interferers is unrealistic and thus, not surprisingly, leads to (*) being a loose upper bound on what is actually achievable.

So what is a more realistic and accurate approach? It turns out that if the fading in the interferers’ channels is ignored, i.e., by considering

$\displaystyle {\rm SIR}^\prime=\frac{h_y \|y\|^{-\alpha}}{\sum_{x\in\Phi} \|x\|^{-\alpha}},$

we can obtain a tight lower bound on the SE instead of a loose upper bound. A second key advantage is that this formulation permits a separation of temporal and spatial scales, in the sense of the meta distribution. We can write

$\displaystyle {\rm SIR}^\prime=h_y\rho,\qquad\text{where }\; \rho=\frac{\|y\|^{-\alpha}}{\sum_{x\in\Phi} \|x\|^{-\alpha}}$

is a purely geometric quantity that is fixed over time and cleanly separated from the time-varying fading term $h_y$. Averaging locally (over the fading), the SE follows as

$\displaystyle C(\rho)=\mathbb{E}_h \log(1+h\rho),$

which is a function of (conditioned on) the point process. For instance, with Rayleigh fading,

$\displaystyle C(\rho)=e^{1/\rho} {\rm Ei}_1(1/\rho),$
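The Rayleigh expression is easy to sanity-check numerically. The sketch below compares the closed form with a Monte Carlo average of log(1+hρ) over exponential fading with unit mean. It is only a check under stated assumptions: the exponential integral is evaluated with its power series (adequate for moderate arguments, here 1/ρ up to about 5), and the function names, sample size, and seed are illustrative choices.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def exp_integral_e1(x, terms=80):
    """E_1(x) = -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!), for moderate x > 0."""
    total = -EULER_GAMMA - math.log(x)
    term = 1.0
    for k in range(1, terms + 1):
        term *= -x / k          # term now equals (-x)^k / k!
        total -= term / k
    return total

def se_closed_form(rho):
    """C(rho) = exp(1/rho) * E_1(1/rho), in nats/s/Hz."""
    return math.exp(1.0 / rho) * exp_integral_e1(1.0 / rho)

def se_monte_carlo(rho, samples=200_000, seed=1):
    """Average of log(1 + h*rho) over unit-mean exponential h (Rayleigh fading)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += math.log1p(rng.expovariate(1.0) * rho)
    return total / samples

for rho in (0.5, 1.0, 4.0):
    print(f"rho={rho}: closed form {se_closed_form(rho):.4f}, "
          f"simulation {se_monte_carlo(rho):.4f} nats/s/Hz")
```

Note that only the fading is averaged here; ρ stays a fixed geometric parameter, which is what makes C(ρ) a function of the point process.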

where ${\rm Ei}_1$ is an exponential integral. The next step is to find the distribution of ρ to calculate the spatial distribution of the SE – which would not be possible from (*) since it is an “overall average” that lumps all randomness together. In the case of Poisson cellular networks with nearest-base station association and path loss exponent 2/δ, a good approximation is

Here s* is given by

$\displaystyle s^{*\delta}\gamma(-\delta,s^*)=0,$

and γ is the lower incomplete gamma function. This approach lends itself to extensions to MIMO. It turns out that the resulting distribution of the SE is approximately lognormal, as illustrated in Fig. 1.

For SISO and δ=1/2 (a path loss exponent of 4), this (approximate) analysis shows that the SE achieved in 99% of the network is 0.22 bits/s/Hz, while a (tedious) simulation gives 0.24 bits/s/Hz. Generally, for small ξ, 1/ln(1/ξ) is achieved by a fraction 1-ξ of the network. As expected from the discussion above, this is a good lower bound.

In contrast, using the SIR distribution directly (and disregarding the separation of temporal and spatial scales), from

$\displaystyle \bar F_{\rm SIR}(\theta)=0.99 \quad\Longrightarrow\quad \theta=-20\text{ dB},$

we would obtain an SE of only log2(1.01)=0.014 bits/s/Hz for 99% “coverage”, which is off by a factor of 16! So it is important that coverage be gleaned from the ergodic SE rather than a quantity subject to the small-scale variations of fading. See also this post.
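The numbers in this comparison can be retraced with a short script. The sketch below assumes the standard closed form of the SIR ccdf for the Poisson cellular downlink with Rayleigh fading and path loss exponent 4, 1/(1+√θ·arctan√θ), inverts it by bisection for 99% reliability, and evaluates the rule 1/ln(1/ξ) from above for ξ=0.01. Function names are illustrative.

```python
import math

def sir_ccdf_alpha4(theta):
    """SIR ccdf, Poisson cellular downlink, Rayleigh fading, path loss exponent 4."""
    root = math.sqrt(theta)
    return 1.0 / (1.0 + root * math.atan(root))

def threshold_for_reliability(target, lo=1e-8, hi=1e3, iters=200):
    """Bisection on a log scale for theta with sir_ccdf_alpha4(theta) = target."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if sir_ccdf_alpha4(mid) > target:
            lo = mid            # ccdf is decreasing, so theta can be raised
        else:
            hi = mid
    return math.sqrt(lo * hi)

xi = 0.01
theta = threshold_for_reliability(1 - xi)
theta_db = 10 * math.log10(theta)
se_from_sir = math.log2(1 + theta)       # SE read off the SIR distribution
se_rule = 1 / math.log(1 / xi)           # rule of thumb for the ergodic SE

print(f"theta for 99% reliability: {theta_db:.1f} dB")
print(f"SE from SIR distribution : {se_from_sir:.4f} bits/s/Hz")
print(f"1/ln(1/xi)               : {se_rule:.3f}")
print(f"ratio                    : {se_rule / se_from_sir:.1f}")
```

The exact ratio depends on whether the analytical value (0.22) or the simulated value (0.24) is used as the reference, but in either case the gap is more than an order of magnitude.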

The take-aways for the ergodic spectral efficiency are:

• Avoid mixing time and spatial scales by expecting first over the fading and separately over the point process; this way, the spatial distribution of the SE can be obtained, instead of merely its average.
• Avoid gifting the receiver with information it cannot possibly have; this way, tight lower bounds can be obtained instead of loose upper bounds.

The details can be found here.

# Averages, distributions, and meta distributions

In this post I would like to show how meta distributions naturally emerge as an important extension of the concepts of averages and distributions. For a random variable Z, we call 𝔼(Z) its average (or mean). If we add a parameter z to compare Z against and form the family of random variables 1(Z>z), we call their mean the distribution of Z (to be precise, the complementary cumulative distribution function, ccdf for short).
Now, if Z does not depend on any other randomness, then 𝔼1(Z>z) gives the complete information about all statistics of Z, i.e., the probability of any event can be expressed by adding or subtracting these elementary probabilities.
However, if Z is a function of other sources of randomness, then 𝔼1(Z>z) does not reveal how the statistics of Z depend on those of the individual random elements. In general Z may depend on many, possibly infinitely many, random variables and random elements (e.g., point processes), such as the SIR in a wireless network. Let us focus on the case Z=f(X,Y), where X and Y are independent random variables. Then, to discern how X and Y individually affect Z, we need to add a second parameter, say x, to extend the distribution to the meta distribution:

$\displaystyle \bar F_{[\![Z\mid Y]\!]}(z,x)=\mathbb{E}\mathbf{1}(\mathbb{E}[\mathbf{1}(Z>z) \mid Y]>x).$

Alternatively,

$\displaystyle \bar F_{[\![Z\mid Y]\!]}(z,x)=\mathbb{E}\mathbf{1}(\mathbb{E}_X\mathbf{1}(Z>z)>x).$

Hence the meta distribution (MD) is defined by first conditioning on part of the randomness. It has two parameters, the distribution has one parameter, and the average has zero parameters. There is a natural progression from averages to distributions to meta distributions (and back), as illustrated in this figure:

From the top going down, we obtain more information about Z by adding indicators and parameters. Conversely, we can eliminate parameters by integration (taking averages). Letting U be the conditional ccdf given Y, i.e., $U=\mathbb{E}_X\mathbf{1}(Z>z)=\mathbb{E}[\mathbf{1}(Z>z)\mid Y]$, it is apparent that the distribution of Z is the average of U, while the MD is the distribution of U.

Let us consider the example Z=X/Y , where X is exponential with mean 1 and Y is exponential with mean 1/μ, independent of X. The ccdf of Z is

$\displaystyle \bar F_{Z}(z)=\frac{\mu}{\mu+z}.$

In this case, the mean 𝔼(Z) does not exist. The conditional ccdf given Y is the random variable

$\displaystyle U=\bar F_{Z\mid Y}(z)=\mathbb{E}\mathbf{1}(Z>z\mid Y)=e^{-Yz},$

and its distribution is the meta distribution

$\displaystyle \bar F_{[\![Z\mid Y]\!]}(z,x)\!=\!\mathbb{P}(U\!>\!x)\!=\!\mathbb{P}(Y\!\leq\!-\log(x)/z)\!=\!1\!-\!x^{\mu/z}.$

As expected, the ccdf of Z is retrieved by integration over x∈[0,1]. This MD has relevance in Poisson uplink cellular networks, where base stations (BSs) form a PPP Φ of intensity λ and the users are connected to the nearest BS. If the fading is Rayleigh fading and the path loss exponent is 2, the received power from a user at an arbitrary location is S=X/Y, where X is exponential with mean 1 and Y is exponential with mean 1/(λπ), exactly as in the example above. Hence the MD of the signal power S is

$\displaystyle \qquad\qquad\qquad\bar F_{[\![S\mid \Phi]\!]}(z,x)=1-x^{\lambda\pi/z}.\qquad\qquad\qquad (1)$
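Both the ccdf of Z and the MD can be checked by simulating only the conditioning variable Y, since U is known in closed form given Y. The sketch below does this for μ=π and z=1 (the uplink case with λ=1); the sample size, seed, and function names are illustrative choices.

```python
import math
import random

def md_closed_form(z, x, mu):
    """Meta distribution of Z = X/Y given Y: 1 - x^(mu/z)."""
    return 1.0 - x ** (mu / z)

def simulate_conditional_ccdf(z, mu, samples=100_000, seed=7):
    """Draw Y ~ Exp(rate mu), i.e., mean 1/mu, and return U = exp(-Y z)."""
    rng = random.Random(seed)
    return [math.exp(-rng.expovariate(mu) * z) for _ in range(samples)]

mu = math.pi      # corresponds to lambda = 1 in the uplink example
z = 1.0
u = simulate_conditional_ccdf(z, mu)

# The ccdf of Z is the average of U ...
ccdf_sim = sum(u) / len(u)
ccdf_exact = mu / (mu + z)
print(f"ccdf of Z at z=1: simulated {ccdf_sim:.4f}, exact {ccdf_exact:.4f}")

# ... while the MD is the distribution (ccdf) of U.
md_sim = {x: sum(1 for v in u if v > x) / len(u) for x in (0.2, 0.5, 0.9)}
for x, value in md_sim.items():
    print(f"MD at x={x}: simulated {value:.4f}, exact {md_closed_form(z, x, mu):.4f}")
```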

So what additional information do we get from the MD, compared to just the ccdf of S? Let us consider a realization of Φ and a set of users forming a lattice (any stationary point process of users would work) and determine each user’s individual probability that its received power exceeds 1:

If we draw a histogram of all the users’ probabilities (the numbers in the figure), how does it look? This cannot be answered by merely looking at the ccdf of S. In fact ℙ(S>1)=π/(π+1)≈0.76 is merely the average of all the numbers. To know their distribution, we need to consult the MD. From (1) the MD (for λ=1 and z=1) is $1-x^{\pi}$. Hence the histogram of the numbers has the form of the probability density function $\pi x^{\pi-1}$. In contrast, without the MD, we have no information about the disparity between the users. Their personal probabilities could all be well concentrated around 0.76, or some could have probabilities near 0 and others near 1. Put differently, only the MD can reveal the performance of user percentiles, such as the “5% user” performance, which is the performance that 95% of the users achieve but 5% do not.
This interpretation of the MD as a distribution over space for a fixed realization of the point process is valid whenever the point process is ergodic.
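The spatial interpretation can be illustrated with a single network realization. The sketch below places a PPP of base stations (λ=1) on a square, puts users on a unit lattice, and computes each user's probability exp(-R²) that its received power exceeds 1, where R is the distance to the nearest base station. The torus (wrap-around) distance, the window size, and the seed are assumptions made here to keep the example short and free of edge effects.

```python
import math
import random

def poisson_count(mean, rng):
    """Number of arrivals of a unit-rate Poisson process in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def per_user_probabilities(side=30, lam=1.0, z=1.0, seed=3):
    """P(S > z | Phi) for lattice users; with Rayleigh fading and path loss
    exponent 2, this equals exp(-z R^2) for nearest-BS distance R."""
    rng = random.Random(seed)
    n_bs = poisson_count(lam * side * side, rng)
    bs = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n_bs)]
    probs = []
    for i in range(side):
        for j in range(side):
            ux, uy = i + 0.5, j + 0.5
            best = float("inf")
            for bx, by in bs:
                dx = abs(bx - ux)
                dx = min(dx, side - dx)      # wrap around (torus)
                dy = abs(by - uy)
                dy = min(dy, side - dy)
                best = min(best, dx * dx + dy * dy)
            probs.append(math.exp(-z * best))
    return probs

probs = sorted(per_user_probabilities())
mean_prob = sum(probs) / len(probs)
frac_half = sum(1 for p in probs if p > 0.5) / len(probs)
five_percent_sim = probs[int(0.05 * len(probs))]
five_percent_theory = 0.05 ** (1 / math.pi)   # solves 1 - x^pi = 0.95

print(f"average of all users        : {mean_prob:.3f} (ccdf: {math.pi / (math.pi + 1):.3f})")
print(f"fraction with prob. > 0.5   : {frac_half:.3f} (MD: {1 - 0.5 ** math.pi:.3f})")
print(f"5% user (simulated / theory): {five_percent_sim:.3f} / {five_percent_theory:.3f}")
```

For a finite window the empirical values fluctuate around the theoretical ones; by ergodicity, the agreement improves as the window grows.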

Another application of the MD is discussed in an earlier post on the fraction of reliable links in a network.

The fraction of reliable links is an important metric, in particular for applications with (ultra-)high reliability requirements. In the literature, we see that it is sometimes equated with the transmission success probability of the typical link, given by

$\displaystyle p_{\text{s}}=\mathbb{P}(\text{SIR}>\theta).$

This is the SIR distribution (in terms of the complementary cdf) at the typical link. In this post I would like to discuss whether it is accurate to call $p_{\text{s}}$ the fraction of reliable links.

Say someone claims “The fraction of reliable links in this network is $p_{\text{s}}=0.8$”, and I ask “But how reliable are these links?”. The answer might be “They are (at least) 80% reliable of course, because $p_{\text{s}}=0.8$.” OK, so let us assume that the fraction of links with reliability at least 0.8 is 0.8. Following the same logic, the fraction of links with reliability at least 0.7 would be 0.7. But clearly that fraction cannot be smaller than the fraction of links with reliability at least 0.8. There is an obvious contradiction. So how can we quantify the fraction of reliable links in a rigorous way?

First we note that in the expression for $p_{\text{s}}$, there is no notion of reliability but $p_{\text{s}}$ itself. This leads to the wrong interpretation above that a fraction $p_{\text{s}}$ of links has reliability at least $p_{\text{s}}$. Instead, we want to specify a reliability threshold so that we can say, e.g., “the fraction of links that are at least 90% reliable is 0.8”. Naturally it then follows that the fraction of links that are at least 80% reliable must be larger than (or equal to) 0.8. So a meaningful expression for the fraction of reliable links must involve a reliability threshold parameter that can be tuned from 0 to 1, irrespective of how reliable the typical link happens to be.

Second, $p_{\text{s}}$ gives no indication about the reliability of individual links. In particular, it does not specify what fraction of links achieve a certain reliability, say 0.8. It could be all of them, or 2/3, or 1/2, or 1/5. $p_{\text{s}}=0.8$ means that the probability of transmission success over the typical link is 0.8. Equivalently, in an ergodic setting, in every time slot, a fraction 0.8 of all links happens to succeed, in every realization of the point process. But some links will be highly reliable, while others will be less reliable.

Before getting to the definition of the fraction of reliable links, let us focus on Poisson bipolar networks for illustration, with the following concrete parameters: link distance 1/4, path loss exponent 4, target SIR threshold θ=1, and iid Rayleigh fading. The link density is λ, and we use slotted ALOHA with transmit probability p. In this case, the well-known expression for $p_{\text{s}}$ is

$\displaystyle p_{\text{s}}=\exp(-c\lambda p),$

where c=0.3084 is a function of link distance, path loss exponent, and SIR threshold. We note that if we keep λp constant, $p_{\text{s}}$ remains unchanged. Now, instead of just considering the typical link, let us consider all the links in a realization of the network, i.e., for a given set of locations of all transceivers. The video below shows the histogram of the individual link reliabilities for constant λp=1 while varying the transmit probability p from 1.00 to 0.01 in steps of 0.01. The red line indicates $p_{\text{s}}$, which is the average of all link reliabilities and remains constant at 0.735. Clearly, the distribution of link reliabilities changes significantly even with constant $p_{\text{s}}$ – as surmised above, $p_{\text{s}}$ does not reveal how disparate the reliabilities are. The symbol σ refers to the standard deviation of the reliability distribution, starting at 0.3 at p=1 and decreasing to less than 1/10 of that for p=1/100.
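The quoted values of c, the mean 0.735, and the standard deviation σ can be reproduced numerically. The sketch below is not derived in this post; it assumes the standard Poisson bipolar results with ALOHA and Rayleigh fading, namely c = πr²θ^δ Γ(1+δ)Γ(1-δ) for link distance r, first moment exp(-cλp), and second moment exp(-cλ(2p-(1-δ)p²)) of the link reliability. Function names are illustrative.

```python
import math

def bipolar_constant(link_distance, alpha, theta):
    """c = pi r^2 theta^delta Gamma(1+delta) Gamma(1-delta), with delta = 2/alpha."""
    delta = 2.0 / alpha
    return (math.pi * link_distance ** 2 * theta ** delta
            * math.gamma(1 + delta) * math.gamma(1 - delta))

def reliability_moments(lam, p, link_distance=0.25, alpha=4.0, theta=1.0):
    """Mean and standard deviation of the link reliability (assumed moment formulas)."""
    delta = 2.0 / alpha
    c = bipolar_constant(link_distance, alpha, theta)
    m1 = math.exp(-c * lam * p)
    m2 = math.exp(-c * lam * (2 * p - (1 - delta) * p ** 2))
    return m1, math.sqrt(m2 - m1 ** 2)

c = bipolar_constant(0.25, 4.0, 1.0)
print(f"c = {c:.4f}")

# Keep lambda * p = 1 while lowering the transmit probability p.
results = {p: reliability_moments(1.0 / p, p) for p in (1.0, 0.5, 0.1, 0.01)}
for p, (mean, sigma) in results.items():
    print(f"p={p:<5} mean={mean:.3f} sigma={sigma:.3f}")
```

The mean stays put while σ shrinks by more than a factor of 10, which is the concentration effect visible in the histogram.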

Equipped with the blue histogram (or pdf), we can easily determine what fraction of links achieves a certain reliability, say 0.6, 0.7, or 0.8. These are shown in the plot below. It is apparent that for small p, due to the concentration of the link reliabilities as p→0, the fraction of reliable links tends to 0 or 1, depending on whether the reliability threshold is above or below the average $p_{\text{s}}$.

So how do we characterize the link reliability distribution theoretically? We start with the conditional SIR ccdf at the typical link, given the point process:

$\displaystyle P_{\text{s}}=\mathbb{P}(\text{SIR}>\theta\mid\Phi).$

Then $p_{\text{s}}=\mathbb{E}(P_{\text{s}})$, with the expectation taken over the point process. Hence $p_{\text{s}}$ is the mean of the conditional success probability, and if we consider its distribution, we arrive at the link reliability distribution, shown in blue in the video above. Mathematically,

$\displaystyle F(\nu)=\mathbb{P}(P_{\text{s}}>\nu),$

where ν is the target reliability. This distribution is a meta distribution, since it is the distribution of a conditional distribution. In ergodic settings, it specifies the fraction of links that achieve an SIR of θ with reliability at least ν, which is exactly what we set out to quantify.
In conclusion: The fraction of reliable links is not given by the standard (mean) success probability; it is given by the meta distribution of the SIR.
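A toy example makes the distinction concrete. The three hypothetical networks below all have a mean success probability of 0.8, yet the fraction of links whose reliability exceeds ν=0.75 is very different. The reliability values are made up purely for illustration and do not come from any network model.

```python
def mean_success_probability(reliabilities):
    """Average of the per-link reliabilities (the standard success probability)."""
    return sum(reliabilities) / len(reliabilities)

def fraction_reliable(reliabilities, nu):
    """Fraction of links whose reliability exceeds the threshold nu."""
    return sum(1 for r in reliabilities if r > nu) / len(reliabilities)

# Hypothetical per-link reliabilities, each list averaging to 0.8.
networks = {
    "identical links": [0.8] * 100,
    "two classes": [0.95] * 50 + [0.65] * 50,
    "all or nothing": [1.0] * 80 + [0.0] * 20,
}

nu = 0.75
summary = {
    name: (mean_success_probability(r), fraction_reliable(r, nu))
    for name, r in networks.items()
}
for name, (mean, fraction) in summary.items():
    print(f"{name:16s} mean = {mean:.2f}, fraction above {nu} = {fraction:.2f}")
```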

# What is “coverage”?

In the literature, the probability that the signal-to-interference ratio (SIR) at a given location and time exceeds a certain value is often referred to as the coverage probability. Is this sensible terminology, consistent with the way cellular operators define “coverage”? All publicly accessible cellular service coverage maps are static, i.e., their view of “coverage” is purely based on location and not on time. This seems natural since a rapidly changing coverage map, say at the level of seconds, would not be of much use to the user, apart from the fact that it would be very hard to collect the information at such time scales.

In contrast, the event SIR(x,t)>θ depends not only on the location x but also on the time t. A location may be “covered” at SIR level θ at one moment but “uncovered” just a little bit (one coherence time) later. Accordingly, a “coverage” map based on this criterion would have to be updated several times per second to accurately reflect this notion of coverage. Moreover, it would have to have a very high spatial resolution due to small-scale fading – one location may be “covered” at time t while another, half a meter away, may be “uncovered” at the same time t. Lastly, there is no notion of reliability. For each x and t, SIR(x,t)>θ either happens or not. It seems natural, though, to include reliability in a coverage definition; for example, by declaring that a location is covered if an SIR of θ is achieved with probability 95%, or 95 times out of 100 transmissions.

Hence there are three disadvantages of using SIR(x,t)>θ as the criterion for coverage:

• The event depends on time (at the level of the coherence time)
• The event depends on space at a very small scale (at the granularity of the coherence length of the small-scale fading)
• The event does not allow for a reliability threshold to define coverage.

How can we define “coverage” without these shortcomings? First, we interpret coverage as a purely spatial term, consistent with the coverage maps we find on the web; it should not include a temporal component, at least not in the short-term – hopefully cellular coverage keeps improving over the years, but it should not vary randomly many times per second. Put differently, coverage should only depend on the network geometry (locations of base stations relative to the position x) and shadowing, but not on the rapid signal strength fluctuations due to small-scale fading. The solution to eliminate the temporal component is fairly straightforward – we just need to average over the temporal randomness, i.e., the small-scale fading. Such averaging eliminates the other two shortcomings as well. For a base station point process Φ, we define the conditional SIR distribution at location x as

$\displaystyle P(x)=\mathbb{P}(\text{SIR}(x,t)>\theta\mid\Phi).$

Here, the probability is taken over the small-scale fading, which eliminates the dependence on time (assuming temporal ergodicity of the fading process, which means that the ensemble average here could be replaced by a time average over a suitably long period). If shadowing is present, it can be incorporated in Φ. Then, introducing a reliability threshold ν, we declare

$\displaystyle \{x \text{ covered}\}\quad\Leftrightarrow\quad \{P(x)>\nu\}$

The reliability threshold ν appears naturally in this definition. The probability that P(x)>ν is the meta distribution of the SIR, since it is the distribution of the conditional SIR distribution given Φ. For stationary and ergodic point processes Φ, it does not depend on x and gives the area fraction that is covered at SIR threshold θ and reliability threshold ν. The figure below shows a coverage map where the colors indicate the reliability threshold at which locations are covered, from dark blue (ν close to 0) to bright yellow (ν close to 1).
So if P(SIR>θ) is not the coverage probability, what is it? It is simply the complementary cumulative distribution function (ccdf) of the SIR, often interpreted as the success probability of a transmission.
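The coverage definition above can be turned into a small numerical experiment. The sketch below draws one realization of a Poisson base station process (density 1), evaluates P(x) on a grid of locations, and reports the area fraction covered at several reliability thresholds ν. It rests on assumptions not spelled out in this post: nearest-base-station association, iid Rayleigh fading, no noise and no shadowing, path loss exponent 4, θ = -3 dB, and a torus to avoid edge effects. Under these assumptions the conditional probability has the product form used in the code.

```python
import math
import random

def poisson_count(mean, rng):
    """Number of arrivals of a unit-rate Poisson process in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def conditional_success_map(side=25, theta_db=-3.0, alpha=4.0, seed=11):
    """P(x) = P(SIR(x,t) > theta | Phi) on a unit grid. With Rayleigh fading,
    P(x) = prod_{i>=2} 1 / (1 + theta (r_1/r_i)^alpha), r_1 = serving distance."""
    rng = random.Random(seed)
    theta = 10 ** (theta_db / 10)
    n_bs = poisson_count(float(side * side), rng)       # BS density 1
    bs = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n_bs)]
    values = []
    for i in range(side):
        for j in range(side):
            x, y = i + 0.5, j + 0.5
            d2 = []
            for bx, by in bs:
                dx = abs(bx - x)
                dx = min(dx, side - dx)                 # wrap around (torus)
                dy = abs(by - y)
                dy = min(dy, side - dy)
                d2.append(dx * dx + dy * dy)
            d2.sort()
            serving = d2[0]
            prob = 1.0
            for other in d2[1:]:
                prob /= 1.0 + theta * (serving / other) ** (alpha / 2)
            values.append(prob)
    return values

def covered_fraction(values, nu):
    """Area fraction where P(x) > nu, i.e., an estimate of the SIR meta distribution."""
    return sum(1 for v in values if v > nu) / len(values)

values = conditional_success_map()
mean_reliability = sum(values) / len(values)
fractions = {nu: covered_fraction(values, nu) for nu in (0.5, 0.8, 0.95)}

print(f"average of P(x) (the SIR ccdf): {mean_reliability:.3f}")
for nu, fraction in fractions.items():
    print(f"area fraction covered at reliability {nu}: {fraction:.3f}")
```

The average of P(x) is the single number usually reported as "coverage probability", whereas the covered area fraction changes with ν, as a coverage notion with a reliability threshold should.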