In this post we contrast the meta distribution of the SIR with the standard SIR distribution. The model is the standard downlink Poisson cellular network with Rayleigh fading and path loss exponent 4. The base station density is 1, and the users form a square lattice of density 5. Hence there are 5 users per cell on average.
We assume the base stations transmit to the users in their cell at a rate such that the messages can be decoded if an SIR of θ =-3 dB is achieved. If the user succeeds in decoding, it is marked with a green square, otherwise red. We let this process run over many time slots, as shown next.
The SIR meta distribution captures the per-user statistics, obtained by averaging over the fading, i.e., over time. In the next figure, the per-user reliabilities are illustrated using a color map from pure red (0% success) to pure green (100% success).
The SIR meta distribution provides the fractions of users that are above (or below) a certain reliability threshold. For instance, the fraction of users that are at least dark green in the above figure.
In contrast, the standard SIR distribution, sometimes misleadingly called “coverage probability”, is just a single number, namely the average reliability, which is close to 70% in this scenario (see, e.g., Eqn. (1) in this post). Since it is obtained by averaging concurrently over fading and point process, the underlying network structure is lost, and the only information obtained is the average of the colors in Fig. 3:
The 70% reliability is that of the typical user (or typical location), which does not correspond to any user in our network realization. Instead, it is an abstract user whose statistics correspond to the average of all users.
Acknowledgment: The help of my Ph.D. student Xinyun Wang in writing the Matlab program for this post is greatly appreciated.
Today’s blog is about realistic communication, i.e., what kind of performance can realistically be expected of a wireless network. To get started, let’s have a look at an excerpt from a recent workshop description:
“Future wireless networks will have to support many innovative vertical services, each with its own specific requirements, e.g.
End-to-end latency of 1 ns and reliability higher than 99.999% for URLLCs.
Terminal densities of 1 million of terminals per square kilometer for massive IoT applications.
Per-user data-rate of the order of Terabit/s for broadband applications.”
Let’s break this down, bullet by bullet.
First bullet: In 1 ns, light travels 30 cm in free space. So “end-to-end” here would mean a distance of at most 10 cm, to leave some fraction of a nanosecond for encoding, transmission, and decoding. But what useful wireless service is there where transceivers are within at most 10 cm? Next, a packet loss rate of 10-5 means that the spectral efficiency must be very low. Together with a latency constraint of 1 ns, ultrahigh bandwidths must be used, which, in turn, makes the design of circuitry and antenna arrays extremely challenging. At least the channel can be expected to be benign (line-of-sight).
Where does stochastic geometry come in? Assuming that these ultrashort links live in a network and not in isolation, interference will play a role. Let us consider a Poisson bipolar network with normalized link distance 1, a path loss exponent α and Rayleigh fading. What is the maximum density of links that can be supported that have an outage of at most ε? This quantity is known as the spatial outage capacity (SOC). For small ε, which is our regime of interest here, we have
where δ=2/α and cδ is a constant that only depends on the path loss exponent 2/δ. ρ is the spectral efficiency (in bits/s/Hz or bps/Hz). This shows the fundamental tradeoff between outage and spectral efficiency: Reducing the outage by a factor of 10 reduces the rate of transmission by the same factor if the same link density is to be maintained. Compared to a more standard outage constraint of 5%, this means that the rate must be reduced by a factor 5,000 to accommodate the 99.999% reliability requirement. Now, say we have 0.5 ns for the transmission of a message of 50 bits, the rate is 100 Gbps. Assuming a very generous spectral efficiency of 100 bps/Hz for a system operating at 5% outage, this means that 100 Gbps must be achieved at a spectral efficiency of a mere 0.02 bps/Hz. So we would need 5 THz of bandwidth to communicate a few dozen bits over 10 cm. Even relaxing the latency constraint to 1 μs still requires 5 GHz of bandwidth.
In cellular networks, the outage-rate relationship is governed by a very similar tradeoff. For any stationary point process of base stations and Rayleigh fading, the SIR meta distribution asymptotically has the form
where Cδ again depends only on the path loss exponent. This is the fraction of users who achieve a spectral efficiency of ρ with an outage less than ε, remarkably similar to the bipolar result. To keep this fraction fixed at, say, 95%, again the spectral efficiency needs to be reduced in proportion to a reduction of the outage constraint ε.
Second bullet: Per the classification and nomenclature in a dense debate, this density falls squarely in the tremendously dense class, above super-high density and extremely high density. So what do the anticipated 100 devices in an average home or 10,000 devices in an average parking lot do? What kind of messages are they exchanging or reporting to a hub? How often? What limits the performance? These devices are often said to be “connected“, without any specification what that means. Only once this is clarified, a discussion can ensue whether such tremendous densities are realistic.
Third bullet: Terabit-per-second (Tbps) rates require at least 10 GHz of spectrum, optimistically. 5G in its most ambitious configuration, ignoring interference, has a spectral efficiency of about 50 bps/Hz, and, barring any revolutionary breakthrough, more than 100 bps/Hz does not appear feasible in the next decade. Similarly, handling a signal 10 GHz wide would be an order of magnitude beyond what is currently possible. Plus such large junks of spectrum are not even available at 60 GHz (the current mm-wave bands). At 100 GHz and above, link distances are even more limited and more strongly subject to blockages, and analog beamforming circuitry becomes much more challenging and power-hungry. Most importantly, though, peak rates are hardly achieved in reality. In the 5G standard, the user experienced data rate (the rate of the 5-th percentile user) is a mere 1% of the peak rate, and this fraction has steadily decreased over the cellular generations:
So even if 1 Tbps peak rates became a reality, users would likely experience between 1 Gbps to at most 10 Gbps – assuming their location is covered, which may vary over short spatial scales. Such user percentile performance can be analyzed using meta distributions.
In conclusion, while setting ambitious goals may trigger technological advances, it is important to be realistic of what is achievable and what performance the user actually experiences. For example, instead of focusing on 1 Tbps peak rates, we could focus on delivering 1 Gbps to 95% of the users, which may still be very challenging but probably achievable and more rewarding to the user. And speaking of billions of “connected devices” is just marketing unless it is clearly defined what being connected means.
For more information on the two analytical results above, please see this paper (Corollary 1) and this paper (Theorem 3).
In performance analyses of wireless networks, we frequently encounter expectations of the form
called average (ergodic) spectral efficiency (SE) or mean normalized rate or similar, in units of nats/s/Hz. For networks models with uncertainty, its evaluation requires the use stochastic geometry. Sometimes the metric is also normalized per area and called area spectral efficiency. The SIR is expressed in the form
with Φ being the point process of interferers. There are several underlying assumption made when claiming that (*) is a relevant metric:
It is assumed that codewords are long enough and arranged in a way (interspersed in time, frequency, or across antennas) such that fading is effectively averaged out. This is reasonable for several current networks.
It is assumed that desired signal and interference amplitudes are Gaussian. This is sensible since if a decoder is intended for Gaussian interference, then the SE is as if the interference amplitude were indeed Gaussian, regardless of its actual distribution.
Most importantly and questionably, taking the expectation over all fading random variables hx implies that the receiver has knowledge of all of them. Gifting the receiver with all the information of the channels from all interferers is unrealistic and thus, not surprisingly, leads to (*) being a loose upper bound on what is actually achievable.
So what is a more realistic and accurate approach? It turns out that if the fading in the interferers’ channels is ignored, i.e., by considering
we can obtain a tight lower bound on the SE instead of a loose upper bound. A second key advantage is that this formulation permits a separation of temporal and spatial scales, in the sense of the meta distribution. We can write
is a purely geometric quantity that is fixed over time and cleanly separated from the time-varying fading term hy. Averaging locally (over the fading), the SE follows as
which is a function of (conditioned on) the point process. For instance, with Rayleigh fading,
where Ei1 is an exponential integral. The next step is to find the distribution of ρ to calculate the spatial distribution of the SE – which would not be possible from (*) since it is an “overall average” that lumps all randomness together. In the case of Poisson cellular networks with nearest-base station association and path loss exponent 2/δ, a good approximation is
Here s* is given by
and γ is the lower incomplete gamma function. This approach lends itself to extensions to MIMO. It turns out that the resulting distribution of the SE is approximately lognormal, as illustrated in Fig. 1.
For SISO and δ=1/2 (a path loss exponent of 4), this (approximative) analysis shows that the SE achieved in 99% of the network is 0.22 bits/s/Hz, while a (tedious) simulation gives 0.24 bits/s/Hz. Generally, for small ξ, 1/ln(1/ξ) is achieved by a fraction 1-ξ of the network. As expected from the discussion above, this is a good lower bound.
In contrast, using the SIR distribution directly (and disregarding the separation of temporal and spatial scales), from
we would obtain an SE of only log2(1.01)=0.014 bits/s/Hz for 99% “coverage”, which is off by a factor of 16! So it is important that coverage be gleaned from the ergodic SE rather than a quantity subject to the small-scale variations of fading. See also this post.
The take-aways for the ergodic spectral efficiency are:
Avoid mixing time and spatial scales by expecting first over the fading and separately over the point process; this way, the spatial distribution of the SE can be obtained, instead of merely its average.
Avoid gifting the receiver with information it cannot possibly have; this way, tight lower bounds can be obtained instead of loose upper bounds.
Interference is the key performance-limiting factor in wireless networks. Due to the many unknown parts in a large network (transceiver locations, activity patterns, transmit power levels, fading), it is naturally modeled as a random variable, and the (only) theoretical tool to characterize its distribution is stochastic geometry. Accordingly, many stochastic geometry-based works focus on interference characterization, and some closed-form expressions have been obtained in the Poisson case.
If the path loss law exhibits a singularity at 0, such as the popular power-law r-α, the interference (power) may not have a finite mean if an interferer can be arbitrarily close to the receiver. For instance, if the interferers form an arbitrary stationary point process, the mean interference (at an arbitrary fixed location) is infinite irrespective of the path loss exponent. If α≤2, the interference is infinite in an almost sure sense.
This triggered questions about the validity of the singular path loss law and prompted some to argue that a bounded (capped) path loss law should be used, with α>2, to avoid such divergence of the mean. Of course the singular path loss law becomes unrealistic at some small distance, but is it really necessary to use a more complicated model that introduces a characteristic distance and destroys the elegant scale-free property of the singular (homogeneous) law?
The relevant question is to which extent the performance characterization of the wireless network suffers when using the singular model.
First, for practical densities, there are very few realizations where an interferer is within the near-field, and if it is, the link will be in outage irrespective of whether a bounded or singular model is used. This is because the performance is determined by the SIR, where the interference is in the denominator. Hence whether the interference is merely large or almost infinite makes no difference – for any reasonable threshold, the SIR will be too small for communication. Second, there is nothing wrong with a distribution with infinite mean. While standard undergraduate and graduate-level courses rarely discuss such distributions, they are quite natural to handle and pose no significant extra difficulty.
That said, there is a quantity that is very useful when it has a finite mean: the interference-to-(average)-signal ratio ISR, defined as
where x0 is the desired transmitter and the other points of Φ are interferers. The hx are the fading random variables (assumed to have mean 1), only present in the numerator (interference), since the signal power here is averaged over the fading. Taking the expectation of the ISR eliminates the fading, and we arrive at the mean ISR
which only depends on the network geometry. It follows that the SIR distribution is
where h is a generic fading random variable. If h is exponential (Rayleigh fading) and the MISR is finite,
Hence for small θ, the outage probability is proportional to θ with proportionality factor MISR. This simple fact becomes powerful in conjunction with the observation that in cellular networks, the SIR distributions (in dB) are essentially just shifted versions of the basic SIR distribution of the PPP (and of each other).
In Fig. 1, the blue curve is the standard SIR ccdf of the Poisson cellular network, the red one is that of the triangular lattice, which has the same shape but shifted by about 3 dB, with very little dependence on the path loss exponent. The other two curves may be obtained using base station silencing and cooperation, for instance. Since the shift is almost constant, it can be determined by calculating the ratios of the MISRs of the different deployments or schemes. The asymptotic gain relative to the standard Poisson network (as θ→0) is
The MISR in this expression is the MISR for an alternative deployment or architecture. The MISR for the PPP is not hard to calculate. Extrapolating to the entire distribution by applying the gain everywhere, we have
This approach of shifting a baseline SIR distribution was proposed here and here. It is surprisingly accurate (as long as the diversity order of the transmission scheme is unchanged), and it can be extended to other types of fading. Details can be found here.
Hence there are good reasons to focus on the reversed SIR, i.e., the ISR.
Naturally the locations of wireless transceivers are modeled as a point process on the plane or perhaps in the three-dimensional space. However, key quantities that determine the performance of a network do not directly nor exclusively depend on the locations but on the received powers. For instance, a typical SIR expression (at the origin) looks like
where y is the location of the intended transmitter and Φ is the point process of interferers. Px and hx are the transmit powers and fading coefficients of x, respectively. It is apparent that what matters are the distances raised to some power, not the locations themselves. So instead of working with Φ⊂ℝ2, we can focus on the one-dimensional process
called the path loss point process (PLPP) (with fading). The reason why the positive exponent α is preferred over -α is that otherwise the resulting point process is no longer locally finite (assuming Φ is stationary) since infinitely many points would fall in the interval [0,ε] for any ε>0. Transmit power levels could be included as displacements, either deterministically or randomly.
Path loss processes are particularly useful when Φ is a PPP. By the mapping and displacement theorems, the PLPPs are also PPPs whose intensity function is easy to calculate. For a stationary PPP Φ of intensity λ and iid fading, the intensity function of Ψ is
where h is a generic fading random variable. If h has mean 1, then for δ<1, which is necessary to keep the interference finite, 𝔼(hδ)<1 from Jensen’s inequality, hence the effect of fading is a reduction of the intensity function by a fixed factor.
As an immediate application we observe that fading reduces the expected number of connected nodes, defined as those whose received power is above a certain threshold, by the δ-th moment of the fading coefficients.
More importantly, PLPPs lead to two key insights for Poisson cellular networks. Let us assume the elements of Ψ are ordered and denoted as ξ1<ξ2<… . Then the SIR with instantaneously-strongest base station association (ISBA) is
First, it is not hard to show that for ISBA with Rayleigh fading, the SIR distribution does not depend on the density of the underlying PPP. But since the effect of fading is but a scaling of the density, it follows that the SIR distribution does not depend on the fading statistics, either. In particular, the result for Rayleigh fading also applies to the non-fading case (where ISBA corresponds to nearest-base station association, NBA), which is often hard to analyze in stochastic geometry models.
Second, the intensity function of the PLPP also shows that the SIR performance of the heterogeneous independent Poisson (HIP) model is the same as that of the simple PPP model. The HIP model consists of an arbitrary number n of tiers of base stations, each modeled as an independent PPP of arbitrary densities λk and transmitting at arbitrary (deterministic) power levels Pk. The point process of inverse received powers (i.e., the PLPP with transmit powers included) from tier k has intensity
Since the superposition of n PPPs is again a PPP, the overall intensity is just the sum of the μk, which is still proportional to rδ-1. This shows that the SIR performance (with ISBA or NBA) of any HIP model is the same as that of just a single PPP.
Let us consider a cellular network with Poisson distributed base stations (BSs). We assume that in each Voronoi cell, one user is located uniformly at random (and independently across cells), and, naturally, the user is connected to the nucleus of that cell. This is the user point process of type I defined in this article. In this case, the typical user, named Alice, does reside in the typical cell since there is no size-biased sampling involved in defining the typical user. The downlink SIR performance of this network has been analyzed here (SIR distribution) and here (SIR meta distribution).
Suddenly and unfortunately, Alice’s serving BS is malfunctioning. Her downlink is, well, down, and she gets reconnected to the next-nearest one. How does that affect her SIR performance? In another network, also with Poisson BSs, lives another type of typical user, namely that of a stationary point process of users that is independent of the base station process. This typical user’s name is Bob. Bob’s SIR performance is the same as the one measured at an arbitrary deterministic location on the plane, as discussed in this post. He resides in the 0-cell, not the typical cell. So in his case, the typical user does not reside in the typical cell.
Noticing that Alice’s original BS ceased to operate, Bob says: “I think now your SIR performance is the same as mine. After all, your cell was formed by adding a BS at the origin, while my network has no such BS. With that added BS removed, we are in the same situation.”
Alice responds: “You may have a point, but I am not sure that my location is uniform in the 0-cell, as yours is.”
Bob: “Good point, but wouldn’t that be the natural conjecture?”
Alice: “I am not sure. How about we verify? Let’s look at the distances.”
Bob: “Ok. For me, if Bn is the distance to my n-th nearest BS, we have
Alice: “For me, the distance A1 to the malfunctioning BS satisfies π𝔼(A12)≈10/13, by the properties of the typical cell. If you are correct, then the distribution of A2 should be the same as that of B1. But that’s not the case, see this figure.”
Bob: “I see – your new serving BS at distance A2 is quite a bit further away than mine at B1. So my conjecture was wrong.“
Alice: “Yes, but the question of how resilient a cellular network is to BS outages is an interesting one. How about we compare the SIR performance with and without BS outage in different networks, say Poisson networks and lattice networks? I bet Poisson networks are more robust, in the sense that the downlink SIR statistics change less when there is an outage and users need to be handed off to the next-nearest BS.“
Bob: “Hmm… that would make sense. But would that mean we should build clustered networks, to achieve even higher robustness?“
Alice: “Possibly – if all we worry about is a small loss when a user is offloaded. But we should take into account the absolute performance also, and clustered networks are worse in this regard. If the starting performance is much higher, it is acceptable to have a somewhat bigger loss due to outage and handover.“
Bob: “Makes sense. Sorry, I have to go. My SIR is so high that I just got a phone call.“
Let us consider a hypothetical situation where the authors of a paper promise the following:
“In the next figure, we compare our Quantum Ultra-Enhanced Superior-Throughput Intelligent Objectively-Novel Adaptive Beamformed Low-Latency Emission (QUESTIONABLE) scheme with the Arbitrarily-Massive Antenna Zero-Innovation Neural Grandiloquence (AMAZING) scheme, which is the best previously known scheme. As the figure shows, the QUESTIONABLE scheme outperforms the AMAZING scheme by 6122% at a threshold of 2 dB.”
They refer to this figure:
A gain of more than 6,000% sounds, well, amazing. But it also questionable: Is it good practice to state this figure of 6122%? Is it good marketing? Is it acceptable? Should it be discouraged? Is it even unethical? If claiming 6122% at 2 dB is acceptable, how about a 25119% gain at 10 dB? Here the absolute reliability of AMAZING is 6.3e-9, and that of QUESTIONABLE is 1.60e-6. Not a practical regime, but a more than 25000% gain nonetheless.
Conversely, at -1.5 dB, the improvement is from 0.996 (which is AMAZING) to 0.9999999 (which is QUESTIONABLE). This is a modest 0.4% improvement both in absolute and relative terms. However, focusing on outage instead of reliability, the factor between the two outage probabilities is 62763, or 6,276,267%. So the 0.4% gap becomes a 6M% gap (that’s mega-percent, admittedly an usual combination of a metric prefix with a unit).
Independently of such amazing and questionable transmission schemes, is an improvement from 10% to 13% a 3% improvement or a 30% improvement? Frequently we see such a gain reported as a 30% improvement. Is it fair to use the ratio to measure improvement, or should it be the gap (difference)? Equivalently, should the percentage improvement be measure in percent of the underlying quantity (in which case it is 3%) or in percent of the percentages (in which case it is 30%)? The problem is that terms like “gain”, “increase”, “improvement”, “boost”, “raise” are imprecise as they may refer to differences or ratios.
How about an improvement from 10 dB to 13 dB? Here, most people would agree that the gain is 3 dB. The reason why there is no ambiguity here is that 3 dB is both the difference 13 dB-10 dB and also the ratio of the underlying quantities expressed in dB, 20/10. Thanks to the logarithm in the dB scale, both lead to the same result. With percentages, this is not the case.
The ambiguity also manifests itself when we compare percentages with physical units such as meters or seconds. The difference between 10 m and 13 m is 3 m, and 13 m is 30% longer than 10 m. Percentage gains always measure ratios in this case. But if “meter” is replaced by “percent”, the situation becomes murky. By analogy then the difference between 10% and 13% is 3%, so 13% is 3% more than 10%, in the same way that 13 m is 3 m more than 10 m. But at the same time, 13% is 30% more than 10%, in the same way that 13 m is 30% more than 10 m.
So how do we resolve this? Here are some suggestions:
Always state whether we refer to ratios or difference. For instance, instead of saying “The increase is 30%” say the “increase is a factor of 1.3 or 30%”.
Avoid ambiguity by using fractions instead of percentages. For instance, the difference between 0.1 and 0.13 is 0.03, and the ratio is 1.3.
Avoid aggressive marketing when quantities are in very impractical regimes. For instance, in the example above, advertising huge gains at reliabilities below 1e-5 is not good practice. Conversely, in regimes of interest, such as the high-reliablity regime, it is perfectly acceptable to report a gain of 100 in the outage performance if the outage probability is reduced from 0.01 to 0.0001. (However, talking about 10,000% is not recommended.)
Consider using horizontal distances (gaps) instead of vertical ones. Especially with steep (almost step-like) curves like the one in Fig. 1, the vertical gap strongly depends on where it is measured along the x axis. Conversely, the horizontal gap may be fairly constant. In the case of Fig. 1, the horizontal gap is, in fact, constant. It is exactly 3 dB at all reliabilities. A quasi-constant gap is often observed in SIR distributions (cdfs) of cellular networks when different transmission schemes are compared. The theory behind is explained in this letter (here on IEEE Xplore) and in this paper (here on IEEE Xplore). In this case, the gains are either factors (in linear scale) or gaps (in the dB) scale. This method of reporting gains is the one adopted by the coding theory community – coding gains are reported in dB (horizontally), not as gains in the bit error performance (vertically).
Network slicing is a warm topic these days. Here we discuss cell slicing, where a polygon is cut in three pieces (sub-polygons) by two lines through its nucleus and a random point, respectively. First, as a sequel to this post, we focus on the 0-cell in the Poisson-Voronoi tessellation, which is the Voronoi cell of a PPP that contains the origin. As in that earlier post, we rotate and shift the 0-cell so that its nucleus is at the origin and the point located uniformly randomly in the cell is to the right (on the positive x axis). In a cellular network application, the nucleus (now at the origin) is the location of the base station, while the random point is the typical user. Then we draw vertical lines through the origin and the random point and calculate the areas of the polygons to the left of the origin and to the left of the random point. This movie shows a number of realizations of this setup:
For intensity λ=1, the 0-cell in the Poisson-Voronoi tessellation has a mean area of 1.280176, obtained by the numerical evaluation of an integral. The polygon to the left of the nucleus o, shown in green in the movie, has a mean area of 0.517649, also obtained by numerical integration. The red polygon, to the right of o and to the left of x, has a mean area of 0.529111, obtained by simulation. Relative to the total mean area, we thus have:
Mean area to the left of o: 40.4%
Mean area to the left of x: 81.8%
Mean area to the right of x: 18.2%
Hence, on average, almost 60% of the (other) users in the cell are on the same side of the base station as the typical user, and the mean area to the left of the typical user is larger than the entire mean area of the typical cell (which is 1 for λ=1). Also, more than 18% of the users are “behind” the typical user.
Not surprisingly, the area fractions are quite similar in the typical Poisson-Voronoi cell. (Note that the uniformly random point in the typical cell does not correspond to the typical user of a user point process that is independent of the base station process – see this post). These percentages (area fractions) for the typical cell are all simulated:
Mean area to the left of o: 39.8%
Mean area to the left of x: 81.3%
Mean area to the right of x: 18.7%
How unusual are these area fractions? Let us compare them with those in the disk, which is, in some sense, the ideal cell shape. In the disk, with the nucleus o at the center, the mean area fraction to the left of o is trivially 1/2, while the area to the left of a uniformly random point is easily determined to be 7/8. This shows that the (roughly) 2/5 – 2/5 – 1/5 split in the typical Poisson-Voronoi cells is relatively far from the 1/2 – 3/8 – 1/8 split of the disk. How about regular polygons with a finite number of sides? The second movie shows some realizations for 3,4, … ,10,12,15,20,32 sides.
As expected, for a larger number of sides the area fractions approach those of the disk. Here is a plot of the relative deviations to the disk of regular polygons, where o is the centroid:
For instance, for the triangle, the area fraction to the left of o is 0.516, which is 3.2% larger than for the disk. Hence the first blue point in Fig. 1 (top left), corresponding to 3 sides, is at 3.2. To better see how the deviations behave for 5 or more sides, here is another version of the figure that shows only pentagons and higher.
From the blue curve it is apparent that the area to the left of the center is always exactly one half if the number of sides is even. The reason is that with an even number of sides, the polygons to the left of o and to the right of o are always congruent irrespective of the location of the random point. We also see that the hexagon is quite close to the disk already, with a mere 0.15% deviation from the 7/8 area fraction (to the left of x) of the disk.
In this post I would like to show how meta distributions naturally emerge as an important extension of the concepts of averages and distributions. For a random variable Z, we call 𝔼(Z) its average (or mean). If we add a parameter z to compare Z against and form the family of random variables 1(Z>z), we call their mean the distribution of Z (to be precise, the complementary cumulative distribution function, ccdf for short). Now, if Z does not depend on any other randomness, then 𝔼1(Z>z) gives the complete information about all statistics of Z, i.e., the probability of any event can be expressed by adding or subtracting these elementary probabilities. However, if Z is a function of other sources of randomness, then 𝔼1(Z>z) does not reveal how the statistics of Z depend on those of the individual random elements. In general Z may depend on many, possibly infinitely many, random variables and random elements (e.g., point processes), such as the SIR in a wireless network. Let us focus on the case Z=f(X,Y), where X and Y are independent random variables. Then, to discern how X and Y individually affect Z, we need to add a second parameter, say x, to extend the distribution to the meta distribution:
Hence the meta distribution (MD) is defined by first conditioning on part of the randomness. It has two parameters, the distribution has one parameter, and the average has zero parameters. There is a natural progression from averages to distributions to meta distributions (and back), as illustrated in this figure:
From the top going down, we obtain more information about Z by adding indicators and parameters. Conversely, we can eliminate parameters by integration (taking averages). Letting U be the conditional ccdf given Y, i.e., U=𝔼X1(Z>z)=𝔼[1(Z>z) | Y], it is apparent that the distribution of Z is the average of U, while the MD is the distribution of U.
Let us consider the example Z=X/Y , where X is exponential with mean 1 and Y is exponential with mean 1/μ, independent of X. The ccdf of Z is
In this case, the mean 𝔼(Z) does not exist. The conditional ccdf given Y is the random variable
and its distribution is the meta distribution
As expected, the ccdf of Z is retrieved by integration over x∈[0,1]. This MD has relevance in Poisson uplink cellular networks, where base stations (BSs) form a PPP Φ of intensity λ and the users are connected to the nearest BS. If the fading is Rayleigh fading and the path loss exponent is 2, the received power from a user at an arbitrary location is S=X/Y, where X is exponential with mean 1 and Y is exponential with mean 1/(λπ), exactly as in the example above. Hence the MD of the signal power S is
So what additional information do we get from the MD, compared to just the ccdf of S? Let us consider a realization of Φ and a set of users forming a lattice (any stationary point process of users would work) and determine each user’s individual probability that its received power exceeds 1:
If we draw a histogram of all the user’s probabilities (the numbers in the figure), how does it look? This cannot be answered by merely looking at the ccdf of S. In fact ℙ(S>1)=π/(π+1)≈0.76 is merely the average of all the numbers. To know their distribution, we need to consult the MD. From (1) the MD (for λ=1 and z=1) is 1-xπ. Hence the histogram of the numbers has the form of the probability density function πxπ-1. In contrast, without the MD, we have no information about the disparity between the users. Their personal probabilities could all be well concentrated around 0.76, or some could have probabilities near 0 and others near 1. Put differently, only the MD can reveal the performance of user percentiles, such as the “5% user” performance, which is the performance that 95% of the users achieve but 5% do not. This interpretation of the MD as a distribution over space for a fixed realization of the point process is valid whenever the point process is ergodic.
When simulating a point process to characterize the performance of the typical point (typical user or receiver), a conditioned version of the point process given a point at the origin o may not be available. It is then tempting to choose the “next best” point as a substitute, which may be the point closest to the origin. (Whether the coordinates are then shifted so that this point is at o is irrelevant.) The goal of this post is to show that this point is not typical, i.e., producing many realizations of the point process and evaluating the performance at this point does not yield the performance of the typical point. I call the point closest to o after averaging over the point process the 0-point. Put differently, the 0-point is the typical point among all points closest to o across the realizations of the point process. In a cellular network, the 0-point is the nucleus of the 0-cell (see this post), hence the term.
For simplicity, let us consider the homogeneous PPP of intensity 1 and focus on the probability that a disk of radius r centered at a point contains no other points, which we refer to as the NOPID (no other point in disk) probability. Equivalently, it is the probability that the nearest neighbor is at distance at least r. For the typical point, the NOPID probability is exp(-πr2). For the 0-point, let D be its distance from o. Given D, the disk of radius D centered at o, denoted as b(o,D), is empty, so the points excluding the 0-point form a PPP on ℝ2\b(o,D), and the NOPID probability is the probability that b((D,0),r)\b(o,D) is empty. This region is shown in blue in the movie below for different r given that the 0-point is at (1,0), i.e., D=1. For r<2D, it is moon- or crescent-shaped, while for r>2D, it is a disk with a hole.
Letting A(r,d)=|b((d,0),r)\b(o,d)|, the (unconditioned) NOPID probability is 𝔼(exp(-A(r,D)), where D is Rayleigh distributed with mean 1/2. It can be expressed as
is the area A(r,d) for r<2d. For r>2d, A(r,d)=π(r2–d2), which results in the first term in (1).
The NOPID probabilities of the 0-point and the typical point are compared below. It is apparent that the 0-point is more isolated than the typical point.
By integrating the NOPID probability of the 0-point, we obtain the mean nearest-neighbor distance as 0.5953. This is almost 20% larger than that of the typical point, which is 1/2. The difference between the two NOPID probabilities is not just in the mean, though. They differ qualitatively in the tail. For large r, it follows from (1) that the ratio of the two NOPID probabilities approaches πr2/4. This implies that a Rayleigh distribution with adjusted mean will not provide a good fit to the NOPID probability at the 0-point.
The difference is even more pronounced if we consider directional nearest neighbors. If we consider a sector of angle π/2, then the nearest neighbor of the typical point is at distance 1 on average, irrespective of the orientation of the sector. For the 0-point, in the direction opposite from o, the mean distance is also 1, since on that side, the PPP is unaffected by the empty disk b(o,D). In the direction towards o, however, the distance is significantly larger, with a mean of 1.4205. The plot below shows the pdf of the directional nearest-neighbor distance of the 0-point oriented towards o (red) and the pdf of the directional nearest-neighbor distance of the typical point (blue), given by (π/2)r exp(-(π/4)r^2). The pdfs are the negative derivatives of the NOPIS (no other point in sector π/2) probabilities.
When applied to cellular networks (with nearest-base station association), the 0-point is the base station serving the typical user (at o). The discussion here reveals that the 0-base station behaves differently from the typical base station. In particular, the point process of the other base stations viewed from the 0-base station is highly non-isotropic. In the direction of the typical user, the nearest other base station is much further away than in the opposite direction. This fact is consistent with the conclusions from this post on the shape of the 0-cell in the Poisson-Voronoi tessellation.