Author: Martin Haenggi

The typical user does not reside in the typical cell

The analysis of cellular networks usually focuses on the typical user in the downlink and the typical base station (or, equivalently, the typical cell) in the uplink. It is important to note that if the base station and user point processes are independent, the two notions of “typical” are not compatible – the typical user’s cell is statistically different from the typical cell. The difference is caused by size-biased sampling: the typical user’s performance corresponds to the average over all users, and there are more users in larger cells. Since a user model is not needed in the downlink, as explained in this post, we can equivalently say that an arbitrary location is more likely to fall in a larger cell than in a smaller one.
The typical user’s cell, the so-called 0-cell, is the cell containing the origin, i.e., it is obtained by cell area-biased sampling, which gives larger cells more weight. As a result, the 0-cell is larger on average than the typical cell, which is the cell of the base station conditioned to be at the origin. The statistical properties of the typical cell correspond to the averages of all cells.

Such size-biased sampling is not restricted to cellular networks or stochastic geometry. If we throw a dart blindly at a world map until we hit land, the country we hit is quite likely to be a big one. In fact, there is a 50% chance that the dart lands on one of the 10 largest countries. Similarly, the typical country has 40 M inhabitants on average, but the typical person is likely to live in a country with more than 100 M people. The typical dollar is quite likely owned by a wealthy person, while the typical person is probably not rich. The typical human hair is likely to grow on a person with full hair, while the typical person has a 5-10% chance of being bald. The typical animal leg has a decent chance of belonging to a millipede or centipede, while the typical animal is very unlikely to have more than six legs.

Coming back to cellular networks, let us focus on a concrete example that is fully tractable in terms of the cell area distributions. Consider the lattice with holes shown in Fig. 1 below, obtained from a square lattice of density 1 by removing the four nearest neighbors of every 16th point. It is periodic with period 4 in both directions, its density is λ=3/4, and it has four different types of cells, with three different areas, 1, 3/2, and 2.

Fig. 1: Lattice hole process with density 3/4. Base stations (cell nuclei) are marked by red triangles, and those whose nearest neighbors were removed are marked by stars, which makes their cells large. The grey box indicates one period of the lattice.

The typical cell has area 1 with probability 5/12, area 3/2 with probability 1/2, and area 2 with probability 1/12. The mean area follows as E(A)=5/12·1+1/2·3/2+1/12·2=4/3, which corresponds to 1/λ.

Now assume a stationary square lattice of density 1 as the user point process. Then the cells of area 1 always contain 1 user and those of area 2 always contain 2 users. Those of area 3/2 have 1 user or 2 users, each with probability 1/2. Deconditioning on the cell areas, we obtain the distribution of the number of users U in the typical cell as P(U=1)=2/3 and P(U=2)=1/3, for a mean number of users E(U)=4/3, which equals the mean area times the user density (chosen to be 1 here).

How about the typical user’s cell? This is where the size bias plays a role. The distribution of the area A0 of the 0-cell is P(A0=1)=5/16, P(A0=3/2)=9/16, and P(A0=2)=1/8. These are the fractions of the plane covered by cells of areas 1, 3/2, and 2. The mean area is E(A0)=45/32, which is about 5.5% bigger than the mean area of the typical cell. The number of users U0 in the 0-cell is distributed as P(U0=1)=5/16+1/2·9/16=19/32, P(U0=2)=1/2·9/16+1/8=13/32, resulting in a mean of E(U0)=45/32, which is the user density times the mean area. The mean also follows from the general formula

\displaystyle \mathbb{E}(f(V_0))=\frac{\mathbb{E}(A f(V))}{\mathbb{E}(A)}=\lambda\mathbb{E}(A f(V)),

where V is the typical cell, V0 the 0-cell, and f is a non-negative function on compact sets. Applied to our setting, where f(V) is the number of users in V, we obtain

\displaystyle \mathbb{E}(U_0)=\frac{\mathbb{E}(A^2)}{\mathbb{E}(A)}=\lambda\mathbb{E}(A^2).

Since the user density is 1, this is also E(A0), the mean area of the 0-cell. For the number of sides, we have E(S)=19/4=152/32 for the typical cell, but E(S0)=155/32 for the 0-cell, which is bigger by 3/32.
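The numbers above are easy to double-check. Here is a minimal Python sketch (pure rational arithmetic, no simulation) that reproduces the typical-cell and 0-cell quantities for this lattice:

```python
# Cross-check of the lattice-with-holes example: typical cell vs. 0-cell.
from fractions import Fraction as F

areas = [F(1), F(3, 2), F(2)]
p_typ = [F(5, 12), F(1, 2), F(1, 12)]               # area distribution of the typical cell
EA = sum(a * p for a, p in zip(areas, p_typ))        # 4/3 = 1/lambda
p_zero = [a * p / EA for a, p in zip(areas, p_typ)]  # area-biased distribution of the 0-cell
EA0 = sum(a * p for a, p in zip(areas, p_zero))      # 45/32

# users on a unit-density lattice: area-1 cells hold 1 user, area-2 cells hold 2,
# area-3/2 cells hold 1 or 2 with probability 1/2 each (mean 3/2)
mean_users = [F(1), F(3, 2), F(2)]
EU = sum(u * p for u, p in zip(mean_users, p_typ))    # 4/3
EU0 = sum(u * p for u, p in zip(mean_users, p_zero))  # 45/32

print(EA, EA0, EU, EU0)  # 4/3 45/32 4/3 45/32
```
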
At the end of this post are three more examples of similarly constructed lattices. In each case, the points within a certain distance of a sub-lattice are removed.

So the typical user is not served by the typical base station, and the typical base station does not serve the typical user. One way to reconcile the two is to define a user point process where a fixed number of users, say one, is placed uniformly at random in each cell. Such a user process is of course no longer independent of the base station process.

For Poisson distributed base stations, the 0-cell is 28% larger than the typical cell. Its mean number of sides is 6.41, whereas the typical cell has 6 sides on average. Hence the 0-cell is not just an enlarged version of the typical cell but also has a different shape. Accordingly, the distance from the nucleus of the typical cell to a random point in the cell is not Rayleigh distributed as it is in the 0-cell. Also, if users form a PPP of density 1, the typical user’s cell has 1+1.28/λ users on average (there is one extra user due to the conditioning of a user to be at the origin), while the typical cell only has a mean of 1/λ users.
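For readers who want to see the 28% figure emerge numerically, here is a small Monte Carlo sketch (the window size, seed, and number of trials are arbitrary choices): it generates a Poisson point process in a box, extracts the Voronoi cell containing the origin, and averages its area, which comes out near 1.28/λ rather than 1/λ.

```python
# Monte Carlo estimate of the mean 0-cell area of a Poisson-Voronoi tessellation.
import numpy as np
from scipy.spatial import Voronoi

rng = np.random.default_rng(1)
lam, half_width, n_trials = 1.0, 15.0, 500
areas = []
for _ in range(n_trials):
    n = rng.poisson(lam * (2 * half_width) ** 2)
    pts = rng.uniform(-half_width, half_width, size=(n, 2))
    vor = Voronoi(pts)
    # the 0-cell is the cell of the point closest to the origin
    nucleus = np.argmin(np.linalg.norm(pts, axis=1))
    region = vor.regions[vor.point_region[nucleus]]
    if -1 in region or len(region) == 0:
        continue  # unbounded cell (does not happen for a point near the window center)
    v = vor.vertices[region]
    # order the vertices of the (convex) cell by angle, then apply the shoelace formula
    ang = np.arctan2(v[:, 1] - v[:, 1].mean(), v[:, 0] - v[:, 0].mean())
    v = v[np.argsort(ang)]
    areas.append(0.5 * abs(np.dot(v[:, 0], np.roll(v[:, 1], 1))
                           - np.dot(v[:, 1], np.roll(v[:, 0], 1))))

print(np.mean(areas))  # close to 1.28/lambda; the typical cell has mean area 1/lambda
```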

Size-biased sampling is important in other wireless networks as well. If a vehicular network is modeled by placing one-dimensional Poisson point processes (cars) on line segments (streets) of independent random length (which results in a Cox process supported on line segments), then the typical vehicle’s street length distribution f_{L_0} is different from the length distribution f_L of the streets. By length-biased sampling, the two are related as

\displaystyle f_{L_0}(x)=\frac{xf_L(x)}{\mathbb{E}(L)}.

For example, if L is exponential with mean 1, then L0 is gamma distributed with mean 2. The same situation arises in the interarrival intervals of a one-dimensional PPP (of density 1). The typical such interval is exponential with mean 1, but the interval containing the origin (or any other deterministic time instant) has a mean length of 2. This is sometimes referred to as the waiting time paradox, although there is nothing paradoxical about it – it is just size-biased sampling.
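The waiting time paradox is easy to observe numerically. The following sketch (with an arbitrary window and sample size) simulates a rate-1 PPP on an interval and measures the interarrival interval containing the origin; its empirical mean is close to 2, twice the mean of the typical interval.

```python
# Length-biased sampling for a one-dimensional PPP of density 1.
import numpy as np

rng = np.random.default_rng(0)

def interval_containing_origin(T=50.0):
    # simulate a rate-1 PPP on [-T, T] and return the length of the
    # interarrival interval that contains the origin
    n = rng.poisson(2 * T)
    pts = np.sort(rng.uniform(-T, T, n))
    i = np.searchsorted(pts, 0.0)
    if i == 0 or i == len(pts):
        return None  # origin not bracketed by two points (extremely unlikely here)
    return pts[i] - pts[i - 1]

samples = [x for x in (interval_containing_origin() for _ in range(20_000)) if x is not None]
print(np.mean(samples))  # about 2, while the typical interval has mean 1
```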

Lastly, as promised, here are three more examples of lattices with increasingly large holes.

Fig. 2: Lattice with period 6. λ=1/6; P(A=1)=1/6; P(A=89/15)=2/3; P(A=169/15)=1/6; E(A)=6.
P(A0=1)=1/36; P(A0=89/15)=89/135; P(A0=169/15)=169/540; E(A0)=6047/810 ≈ 7.46.
Fig. 3: Lattice with period 8. λ=7/32; P(A=1)=5/14; P(A=7/2)=2/7; P(A=29/4)=2/7; P(A=16)=1/14; E(A)=32/7. P(A0=1)=5/64; P(A0=7/2)=7/32; P(A0=29/4)=29/64; P(A0=16)=1/4; E(A0)=2081/256. Here the 0-cell is almost twice as big on average as the typical cell. It corresponds to the largest cell with probability 1/4, but only 1/14 of the cells are of the largest type.
Fig. 4: Lattice with period 16. λ=7/128; P(A=1)=5/14; P(A=15/2)=2/7; P(A=33)=2/7; P(A=89)=1/14; E(A)=128/7. P(A0=1)=5/256; P(A0=15/2)=15/128; P(A0=33)=33/64; P(A0=89)=89/256; E(A0)=12507/256 ≈ 48.9. The 0-cell is more than 2.5 times as large as the typical one on average.

A dense debate

In recent years, many research efforts have been dedicated to modeling and analyzing denser and denser wireless networks, in terms of the number of devices per km² or m². The terminology used ranges from “ultradense” to “hyperdense”, “massively dense”, and “extremely dense”.

IEEE Xplore lists more than 500 journal papers on “ultradense” networks, 25 on “hyperdense” networks (generally published more recently than the ultradense ones), and about 90 on “extremely dense” networks. There are even 15 on “massively dense” networks. The natural question is how these terms are ordered. Is “ultradense” denser than “hyperdense” or vice versa? How does “extremely dense” fit in? Are there clear definitions of what the different density levels mean? And what term do we use when networks get even denser?

Perhaps we can learn something from the terminology used for frequency bands. There is “high frequency” (HF), “very high frequency” (VHF), “ultrahigh frequency” (UHF), followed by “super high frequency” (SHF), “extremely high frequency” (EHF), and “tremendously high frequency” (THF). The first five each span an order of magnitude in frequency (or wavelength), while the last one spans two orders of magnitude, from 300 GHz to 30 THz.
So how about we follow that approach and classify network density levels as follows:

HD: 1-10 km⁻²
VHD: 10-100 km⁻²
UHD: 100-1'000 km⁻²
SHD: 1'000-10'000 km⁻²
EHD: 10'000-100'000 km⁻²
THD: 0.1-10 m⁻²

So, who will be the first to write a paper on tremendously dense networks?

What comes after THD? Not unexpectedly, there is a mathematical answer to that question. A dense set has a well-defined meaning. So in the super tremendously extreme case, we can just say that the devices are dense on the plane, without further qualification. This is achieved, for instance, by placing a device at each location with rational x and y coordinates. This is a dense network model, and almost surely there is no denser one.

Tractable, closed-form, exact

The attributes “tractable”, “closed-form”, and “exact” are frequently used to describe analytical results and, in the case of “tractable”, also models. At the time of writing, IEEE Xplore lists 4540 journal articles with “closed-form” and “wireless” in their metadata, 650 with “tractable” and “wireless”, and 220 with “closed-form”, “wireless” and “stochastic geometry”.

Among the three adjectives, only for “exact” is there general consensus about what it means exactly. For “closed-form”, mathematicians have a clear definition: the expression may only consist of finite sums and products, division, roots, exponentials, logarithms, trigonometric and hyperbolic functions and their inverses. Many authors are less strict, using the term also for expressions involving general transcendental functions or infinite sums and products. Lastly, the use of “tractable” varies widely. There are “tractable results”, “tractable models”, “tractable analyses”, and “tractable frameworks”.

“Tractable” is defined by Merriam-Webster as “easily handled, managed, or wrought”, by the Google Dictionary as “easy to deal with”, and by the Cambridge Dictionary as “easily dealt with, controlled, or persuaded”. Wikipedia refers to the mathematical use of the term: “ease of obtaining a mathematical solution such as a closed-form expression”. These definitions are too vague to clearly distinguish a “tractable model” from a “non-tractable” one, since “easy” can mean very different things to different people.

We also find combinations of the terms; in the literature, there are “tractable closed-form expressions” and even “highly accurate simple closed-form approximations”. But shouldn’t all “closed-form” expressions qualify as “tractable”? And aren’t they also “simple”, or are there complicated “closed-form” expressions?

It would be helpful to reach an agreement in our community on what qualifies as “closed-form”. Here is a proposal:

  • Use “closed-form” in its strict mathematical understanding, allowing only elementary functions.
  • Use “weakly closed-form” for expressions involving hypergeometric functions, (incomplete) gamma functions, and the error and Lambert W functions.
  • Any result involving integrals, limits, infinite sums, or general transcendental functions such as generalized hypergeometric and Meijer G functions is not “closed-form” or “weakly closed-form” (but it may be exact, of course).

Thus equipped, we could try to define what a “tractable model” is. For instance, we could declare a model “tractable” if it allows the derivation of at least one non-trivial exact closed-form result for the metric of interest. This way, the SIR distribution in the Poisson bipolar network with ALOHA, Rayleigh fading, and power-law path loss is tractable because the expression only involves an exponential and a trigonometric function. The SIR in the downlink Poisson cellular network with Rayleigh fading and path loss exponent 4 is also tractable; its expression includes only square roots and an arctangent. In contrast, the SIR in the uplink Poisson cellular network is not tractable, irrespective of the user point process model.
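As a concrete illustration of these two “tractable” examples, here is a short sketch with assumed parameter values. The gamma-function constant in the bipolar result reduces to πδ/sin(πδ) by Euler’s reflection formula, so only an exponential and a sine remain; the cellular expression for path loss exponent 4 is the δ=1/2 special case of the hypergeometric formula discussed in a later post, and it indeed contains only a square root and an arctangent.

```python
# Two SIR ccdfs that qualify as closed-form (parameter values are illustrative).
import numpy as np

def sir_ccdf_bipolar(theta, lam=1.0, p=1.0, r=0.25, delta=0.5):
    # exp(-lam*p*pi*r^2*theta^delta*Gamma(1-delta)*Gamma(1+delta)); the gamma
    # product equals pi*delta/sin(pi*delta), i.e., only elementary functions appear
    c = np.pi * delta / np.sin(np.pi * delta)
    return np.exp(-lam * p * np.pi * r**2 * theta**delta * c)

def sir_ccdf_cellular_alpha4(theta):
    # downlink Poisson cellular network, Rayleigh fading, path loss exponent 4
    return 1.0 / (1.0 + np.sqrt(theta) * np.arctan(np.sqrt(theta)))

print(sir_ccdf_bipolar(1.0), sir_ccdf_cellular_alpha4(1.0))
```
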
A result could be termed “tractable” if the typical educated reader can tell how the expression behaves as a function of its parameters.

Going a step further, it may make sense to be more formal and introduce categories for the sharpness of a result, such as these:

A1: closed-form exact
A2: weakly closed-form exact
A3: general exact
B1: closed-form bound
B2: weakly closed-form bound
B3: general bound
C1: closed-form approximation
C2: weakly closed-form approximation
C3: general approximation

Alternatively, we could use A+, A, A-, B+, etc., inspired by the letter grading system used in the USA. We could even calculate a grade point average (GPA) of a set of results, based on the standard letter grade-to-numerical grade conversion.

Such a classification allows a non-binary quantification of the “tractability” of a model. If the model permits the derivation of an A1 result, it is fully “tractable”. If it only allows C3 results, it is not “tractable”. If we can obtain, say, an A3, a B2, and a C1 result, it is 50% “tractable” or “semi-tractable”. Such a sliding scale instead of a black-and-white categorization would reflect the vagueness of the general definition of the term but put it on a more solid quantitative basis. Subcategories for asymptotic results or “order-of” results could be added.

This way, we can pave the way towards the development of a tractable framework for tractability.

Are users needed in cellular networks?

Let us consider the downlink of a cellular network where base stations form a stationary and ergodic point process Φ and define the SIR at each location x ∈ ℝ² as

\displaystyle \text{SIR}(x)=\frac{h_{N(x),x}\ell(x-N(x))}{\sum_{y\in\Phi\setminus\{N(x)\}} h_{x,y}\ell(y-x)}.

Here N(x) is the nucleus of the Voronoi cell that x belongs to, hx,y is the fading coefficient between x and y, and ℓ is the path loss function. Due to the stationarity of Φ, the SIR statistics do not depend on the location x. In other words, any arbitrary location can be taken to be the typical location that the analysis focuses on.
Example result: If Φ is Poisson, the fading is Rayleigh, and ℓ is a power-law function with exponent 2/δ, it is known that for all x ∈ ℝ²,

\displaystyle \qquad\qquad\mathbb{P}(\text{SIR}(x)>\theta)=\frac{1}{\,_2F_1(1,-\delta;1-\delta,-\theta)},\qquad\qquad\qquad (1)

where 2F1 is the Gauss hypergeometric function.
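For readers who want to evaluate (1), here is a minimal sketch using SciPy’s Gauss hypergeometric function (the parameter values are arbitrary). For δ=1/2, i.e., path loss exponent 4, the denominator reduces to 1+√θ·arctan√θ, which is the arctangent form mentioned in an earlier post.

```python
# Numerical evaluation of the SIR ccdf (1) for the downlink Poisson cellular network.
import numpy as np
from scipy.special import hyp2f1

def sir_ccdf(theta, delta):
    return 1.0 / hyp2f1(1.0, -delta, 1.0 - delta, -theta)

theta = 1.0
print(sir_ccdf(theta, 0.5))                                      # about 0.56
print(1.0 / (1.0 + np.sqrt(theta) * np.arctan(np.sqrt(theta))))  # same value (delta = 1/2)
```
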
Since Φ is ergodic, the probability that the SIR exceeds θ equals the fraction of the plane that achieves an SIR of at least θ, in each realization of Φ. This means that the probability (ensemble average) can be replaced by a spatial average over an increasingly large region. Sometimes this probability (or spatial average) is questionably called “coverage probability” (see this post), and the area fraction is termed “covered area fraction”.
It is important to note that results such as (1) do not require any specification of a point process of users. This answers the question in the title: No, users are not necessary in the downlink SIR analysis.

That said, in the literature we observe that in many cases, a point process of users is defined before such downlink SIR results are derived. The reason could be that it may seem overly abstract to consider a cellular network model devoid of any users and to view the SIR as a random field on the plane. Specializing the location x to the points of a user point process (assumed independent of Φ), we observe that (1) is the SIR distribution at the typical user for any stationary point process of users. So there is nothing wrong with introducing a point process of users, focusing on the typical user, and stating a result such as (1). It would, however, be potentially misleading to specify the user point process to be a Poisson process, since the reader may then believe that the result only holds for Poisson distributed users.

There is one caveat when introducing a point process of users to formulate downlink results: The interpretation of the SIR distribution as the fraction of users who achieve SIR>θ in each realization of the user and base station point processes may no longer be correct, even if the two point processes are independent and stationary and ergodic. For instance, consider the case where both are stationary (i.e., randomly translated) lattices of the same intensity. Then, given the point processes, the SIR distribution at each user is the same and depends on the relative shift of the lattices. For example, if a user is very close to its serving base station, then all users are close to their serving base station, and the SIR at all users is likely to exceed θ even when θ is, say, 20 dB. In contrast, if one user is equidistant to two base stations, then all users are, and it is unlikely that the SIR (at any or all of them) exceeds 1. So averaging over the users in one realization cannot yield the same result as averaging over the point processes (ensemble averaging). But doesn’t ergodicity imply that the two results are the same? The answer is yes, it does, but individual ergodicity of the two point processes is not sufficient. Since the SIR depends on both of them jointly, they need to be jointly ergodic. This is the condition that is not met in this example scenario of two lattices.

Physical quantities and scale-free results

Sometimes we read in a paper “the link distance is d meters”. Then, a bit later, “In Fig. 3, we set d=10 m”. Putting the two together, this means that “the link distance is 10 m meters”, which of course isn’t what the authors mean. It is not hard to infer that they mean the distance is 10 m, rather than 10 m2. But then why not just say “the link distance is d” from the outset?

Physical quantities are a combination of a numerical value and a unit. If symbols refer to physical quantities, equations are valid irrespective of any metric prefixes used. For example, if the density of a two-dimensional stationary point process is λ, then the mean number of points in a disk of radius r is always λπr². We can express the radius as r=100 m or as r=0.1 km or r=10000 cm, and the density could be λ=0.1 km⁻² or λ=10⁻⁵ m⁻². In contrast, if a symbol only refers to the numerical value of the physical quantity, the underlying unit (and prefix) needs to be specified separately, and the reader needs to keep it in mind. For example, writing λ=10 and r=2 does not tell us anything about the mean number of points in the disk without the units. If the density is said to be λ m⁻² and the radius is said to be r km, then we can infer that there are 40,000,000π points on average in the disk, but if the radius is r m, then there are a mere 40π points.

Separating the numerical values from the units also comes with a loss of flexibility. The advantage of the SI unit system with its metric prefixes is that we can express small and large values conveniently, such as d=5 nm or f=3 THz. Hard-wiring the units by saying “the frequency is f Hz” means that we have to write f=3,000,000,000,000 instead – and expect the reader to memorize that f is expressed in Hz. One may argue that in this case it would be natural to declare that “the frequency is f THz”, but what if much lower frequencies also appear in the paper? A Doppler shift of 10 kHz would become a Doppler shift of f=0.00000001.

This leads to the next problem, which is that some quantities are naturally expressed at a different scale (i.e., with a different metric prefix) than others. Transistor gate lengths are usually expressed in nm, while lengths of fiber optic cables may be expressed in km, and many other lengths and distances fall in between, say wavelengths, antenna spacing, base station height, intervehicular distances etc. Can we expect the readers to keep track of all the base units when we write “the gate length is u nm”, “the wavelength is v cm”, and “the base station height is h m”?

In conclusion, there is really no advantage to this kind of separation. In other words, having mathematical symbols refer to physical quantities rather than just numerical values (while the unit is defined elsewhere) is always preferable.

Now, there are cases where it makes perfect sense to omit a unit when defining a quantity, say a density or a distance. The normalized notation r=10 or λ=3 is acceptable (usually even preferred) when the normalization reference can be arbitrary. For example, the classical result for the SIR distribution in Poisson bipolar networks (link distance r, path loss exponent 2/δ, Rayleigh fading) is

\displaystyle F(\theta)=\exp(-c(\delta)\lambda r^2\theta^\delta),

irrespective of whether r is normalized by 1 m or 1 km – as long as λ is normalized accordingly, i.e., by 1 m⁻² or 1 km⁻². Such scale-free results are particularly elegant, because they show that shrinking or expanding the network does not change the result. They may also indicate that one parameter can be fixed without loss of generality. In this example, what matters is the product λr², so setting λ=1 or r=1 is sensible to reduce the number of parameters.
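As a tiny sanity check (the numbers and the rescaling factor s are arbitrary, and c(δ) is taken in the gamma-function form of the theorem quoted in a later post), the following sketch confirms that rescaling r by s and λ by 1/s² leaves F(θ) unchanged:

```python
# Scale invariance of the bipolar result F(theta) = exp(-c(delta)*lam*r^2*theta^delta).
import numpy as np
from math import gamma, pi

def F(theta, lam, r, delta):
    c = pi * gamma(1 - delta) * gamma(1 + delta)   # c(delta) for Rayleigh fading
    return np.exp(-c * lam * r**2 * theta**delta)

lam, r, s = 2.0, 0.3, 1000.0     # e.g., s = 1000 converts km to m
print(F(1.0, lam, r, 0.5), F(1.0, lam / s**2, s * r, 0.5))  # identical
```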

One more thing. The conflation of a unit and a noun describing a physical quantity or device has become somewhat popular, unfortunately. It violates the rules of proper English composition and also the guidelines for scientific writing put forth by, for example, the IEEE. Let us hope such constructs do not spread further; otherwise we will need to get used to THzFrequencies, cmDistances, MsLifetimes, pJConsumptions, km⁻²Densities, etc. Further, improper capitalization can drastically change the quantities: the gap between a mmLength and a MmLength is 9 orders of magnitude, that between a pASource and a PASource is 27 orders of magnitude, and that between ymGaps and YmGaps is 48 orders of magnitude!

Single-point simulation?

When applying simalysis as illustrated in the previous post, the question arises where to put the boundary between the part of the point process that is simulated and the part that is analyzed. Specifically, we may wonder whether we can reduce the simulated part to only a single point (on average), i.e., to choose the number of simulated points to be Poisson with mean 1 in each realization.
Let’s find out, using the same Poisson bipolar model as in the previous post (Rayleigh fading, transmitter density 1, link distance 1/4).

Fig. 1. Simalysis of SIR ccdf for Poisson bipolar network averaged over 500 realizations with 1 point on average. The dashed curves show the exact results.

Fig. 1 shows the simulated (or, more precisely, simalyzed) result where the 500 realizations only contain one single point on average. This means that about 500/e ≈ 184 realizations have no point at all. We observe good accuracy, especially at small path loss exponents α. Also, the simulated curves are lower bounds to the exact ones. This is due to Jensen’s inequality:

\displaystyle \mathbb{E}\exp(-sI_c) \geq \exp(-s\mathbb{E}(I_c)),\quad s>0.

The term on the left side is the exact factor in the SIR ccdf due to the interference Ic from points outside distance c. It is larger than the right side, which is the factor used in the simalysis (see the Matlab code). This property holds for all stationary point processes.
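As a small numerical illustration of this bound (the values of s, c, and the window radius below are arbitrary and only serve to make the gap visible), the left side can be estimated by averaging the conditional Laplace transform over PPP realizations and compared with the right side, which uses the analytic mean interference from outside distance c:

```python
# Jensen's inequality for the exterior interference factor, unit-density PPP.
import numpy as np

rng = np.random.default_rng(3)
lam, c, R, alpha, s, n_real = 1.0, 1.0, 100.0, 3.0, 1.0, 500
vals = np.empty(n_real)
for k in range(n_real):
    n = rng.poisson(lam * np.pi * (R**2 - c**2))
    radii = np.sqrt(rng.uniform(c**2, R**2, n))         # PPP in the annulus [c, R]
    # E[exp(-s*I_c) | Phi] for Rayleigh fading is a product over the interferers
    vals[k] = np.prod(1.0 / (1.0 + s * radii**(-alpha)))
mean_I = 2 * np.pi * lam / (alpha - 2) * c**(2 - alpha)  # analytic mean of I_c
# truncation at R = 100 has a negligible effect for alpha = 3
print(vals.mean(), np.exp(-s * mean_I))                  # left side >= right side
```
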

But why stop there? One would think that using, say, only 1/4 point on average would be (essentially) pointless. But let’s try, just to be sure.

Fig. 2. Same plot as in Fig. 1 but the curves are simalyzed with only 1/4 point per realization on average.

Remarkably, even with only 1/4 point per realization on average, the curves for α<2.5 are quite accurate, and the one for α=2.1 is an excellent lower bound! It is certainly much more accurate than a classical simulation with 500,000 points per realization (see the previous post). Such a good match is quite surprising, especially considering that 1/4 point on average means that about 78% of the realizations have 0 points, which means that in about 390 out of the 500 realizations, the simulated factor in the SIR ccdf simply yields 1. Also, in the entire simalysis, only about 125 points are ever produced. It takes no more than about 1/2 s on a standard computer.
We conclude that accurate simulation (simalysis, actually) can be almost point-less.

Simalysis: Symbiosis of simulation and analysis

Simulations can be quite time-consuming. Are there any techniques that can help make them more efficient and/or accurate? Let us focus on a concrete problem. Say we would like to plot the SIR distribution in a Poisson bipolar network for different path loss exponents α, including some values close to 2. Since we would like to compare the result with the exact one, we focus on the Rayleigh fading case where the analytical expression is known and simple. The goal is to get accurate curves for α=3, 2.5, 2.25, and 2.1, and we would like to wait no longer than 1 s for the results on a standard desktop or laptop computer.
Let us first discuss why this is a non-trivial problem. It involves averaging w.r.t. the fading and the point process, and we need to make sure that the number of interferers is large enough for good accuracy. But what is “large enough”? A quick calculation using Campbell’s theorem (for sums) reveals that if we want to capture 99% of the mean interference power (outside radius 1, to avoid complications due to a potential singularity in the path loss law), the simulation region for α=3 needs to be 100 times larger than for α=4. This seems manageable, but for α=2.5, 2.25, 2.1, it is 10⁶, 10¹⁴, 10³⁸ times larger, respectively!
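Spelled out, capturing 99% of the mean interference beyond distance 1 requires a radius c with c^(2−α)=0.01, i.e., c=100^(1/(α−2)), and the simulation area grows as c². A few lines reproduce the factors quoted above:

```python
# Window sizes needed to capture 99% of the mean interference outside radius 1.
import numpy as np

alphas = np.array([4.0, 3.0, 2.5, 2.25, 2.1])
c = 100.0 ** (1.0 / (alphas - 2.0))       # radius with c^(2-alpha) = 0.01
area_factor = c**2 / c[0] ** 2            # simulation area relative to alpha = 4
print(dict(zip(alphas, area_factor)))     # 1, 1e2, 1e6, 1e14, 1e38
```
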
Clearly the straightforward approach of producing many realizations of the PPP in a large region does not work in the regime of small α. So how can we achieve our goal above – high accuracy and high efficiency?

The solution is to use an analysis-enhanced simulation technique, which I call simalysis. While we often tend to think of analysis vs. simulation as a dichotomy, in this approach they are used symbiotically. The idea is to exploit analytical results whenever possible to make simulations faster and more accurate. Let me illustrate how simalysis works when applied to the problem above.

For small α, it is impossible to “capture” most of the interference solely by simulation. In fact, most of it stems from the infinitely many distant nodes, each one contributing little, with independent fading. We can thus expect that the variance of the interference from the nodes farther away than a certain distance (chosen relatively large compared with the mean nearest-neighbor distance) is relatively small. Accordingly, replacing it by its mean is a sensible simplification. Here is where the analysis comes in. For any stationary point process of density λ, the mean interference from the nodes outside distance c is

\displaystyle I(c)=2\pi\lambda\int_c^\infty r^{1-\alpha}{\rm d} r=\frac{2\pi\lambda}{\alpha-2} c^{2-\alpha}.

This interference term can then be added to the simulated interference, which stems from points within distance c. Simulating as few as 50 points is enough for very high accuracy. The result is shown in the figure below, using the MH scale so that the entire distribution is revealed (see this post for details on the MH scale). For α near 2, the curves are indistinguishable!

Fig. 1. SIR distribution for Poisson bipolar network in MH scale. Density 1, link distance 1/4.
The solid lines are obtained by simalysis, the dashed lines show the exact analytical results.

This simulation averages over 500 realizations of the PPP and runs in less than 1 s on a laptop. The Matlab code is available here. It uses a second simalytic technique, namely the analytical averaging over the fading. Irrespective of the type of point process we want to simulate, as long as the fading is Rayleigh, we can perform the averaging over the fading analytically.
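For concreteness, here is a minimal Python sketch of the two simalytic steps under the parameters of Fig. 1 (density 1, link distance 1/4, Rayleigh fading). It is an independent re-implementation, not the Matlab code linked above, and the simulation radius c=4 is an assumption chosen so that the disk contains about 50 points on average: the interference from outside c is replaced by its analytic mean, and the fading average is carried out analytically.

```python
# Simalysis sketch for the Poisson bipolar SIR ccdf.
import numpy as np
from math import gamma, pi

rng = np.random.default_rng(0)
lam, r, c, n_real = 1.0, 0.25, 4.0, 500

def simalysis_ccdf(theta, alpha):
    s = theta * r**alpha                                   # P(SIR > theta) = E[exp(-s*I)]
    outer = np.exp(-s * 2 * pi * lam / (alpha - 2) * c**(2 - alpha))  # analytic mean exterior interference
    inner = np.empty(n_real)
    for k in range(n_real):
        n = rng.poisson(lam * pi * c**2)                   # about 50 simulated interferers
        radii = np.sqrt(rng.uniform(0.0, c**2, n))         # PPP in the disk of radius c
        inner[k] = np.prod(1.0 / (1.0 + s * radii**(-alpha)))  # Rayleigh fading averaged analytically
    return outer * inner.mean()

def exact_ccdf(theta, alpha):
    d = 2.0 / alpha
    return np.exp(-lam * pi * r**2 * theta**d * gamma(1 - d) * gamma(1 + d))

for alpha in (3.0, 2.5, 2.1):
    print(alpha, simalysis_ccdf(1.0, alpha), exact_ccdf(1.0, alpha))
```
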

For comparison, the figure below shows the simulation results if 500,000 points (interferers) are simulated, without adding the analytical mean interference term, i.e., using classical simulation. Despite taking 600 times longer, this classical approach does not produce acceptable distributions for α<2.5.

Fig. 2. Classically simulated SIR distribution (solid lines) in Poisson bipolar network, compared with the exact analytical ones (dashed). Same parameters as in Fig. 1.

Double-proving by simulation?

Let us consider a hypothetical scenario that illustrates an issue I frequently observe.


Author: Here is an important result for a canonical Poisson bipolar network:
Theorem: The complementary cumulative distribution of the SIR in a Poisson bipolar network with Rayleigh fading, transmitter density λ, link distance r, and path loss exponent 2/δ is

\displaystyle \bar F(\theta)=\exp(-\lambda\pi r^2 \theta^\delta\Gamma(1-\delta)\Gamma(1+\delta)).

Proof: [Gives proof based on the probability generating functional.]

Reviewer: This is a nice result, but it is not validated by simulation. Please provide simulation results.


We have a proven exact analytical (PEA) result. So why would we need a simulation for “validation”? Where does the lack of trust in proofs come from? I am puzzled by these requests by reviewers. Similar issues arise when authors themselves feel the need to add simulations to the visualization of PEA results.
Perhaps some reviewers are not familiar with the analytical tools used or they think it is easier to have a quick look at a simulated curve rather than checking a proof. Perhaps some authors are not entirely sure their proofs are valid, or they think reviewers are more likely to trust the proofs if simulations are also shown.
The key issue is that such requests by reviewers or simulations by authors take simulation results as the “ground truth”, while portraying PEA results as weaker statements that need validation. This of course is not the case. A PEA result expresses a mathematical fact and thus does not need any further “corroboration”.
Now, if the simulation results are accurate, the analytical and simulated curves lie exactly on top of each other, and the accompanying text states the obvious: “Look, the curves match!”. But what if there isn’t an exact match between the analytical and the simulated curve? That would simply mean that the simulation is not accurate. Certainly it does not qualify as “validation”. The worst conclusion would be to distrust the PEA result and take the simulation as the true result.
By its nature, a simulation is always restricted to a small cross-section of the parameter space. Even the simple result above has four parameters, which would make it hard to simulate the network comprehensively. Relatedly, I invite the reader to simulate the result for a path loss exponent α=2.1 or δ=0.95. Almost surely the simulated curve will look quite different from the analytical one.
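To illustrate the point, here is a sketch of such a naive simulation for δ=0.95 (the window radius and sample count are arbitrary choices): interference from outside the finite window is simply lost, so the simulated value comes out far too optimistic compared with the exact expression.

```python
# Naive simulation of the Poisson bipolar SIR ccdf vs. the exact result, delta = 0.95.
import numpy as np
from math import gamma, pi

rng = np.random.default_rng(0)
lam, r, delta = 1.0, 0.25, 0.95
alpha = 2.0 / delta
R, n_real = 20.0, 2000                      # simulation window radius, number of realizations

def exact(theta):
    return np.exp(-lam * pi * r**2 * theta**delta * gamma(1 - delta) * gamma(1 + delta))

def simulated(theta):
    succ = 0
    for _ in range(n_real):
        n = rng.poisson(lam * pi * R**2)
        radii = np.sqrt(rng.uniform(0.0, R**2, n))          # interferer distances to the receiver
        I = np.sum(rng.exponential(1.0, n) * radii**(-alpha))
        succ += rng.exponential(1.0) * r**(-alpha) > theta * I
    return succ / n_real

theta = 1.0
print(exact(theta), simulated(theta))   # the simulated value is far larger than the exact one
```
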
In conclusion, there is absolutely no need for “two-step verification” of PEA results. On the contrary.

Fraction of reliable links

The fraction of reliable links is an important metric, in particular for applications with (ultra-)high reliability requirements. In the literature, we see that it is sometimes equated with the transmission success probability of the typical link, given by

\displaystyle p_{\text{s}}=\mathbb{P}(\text{SIR}>\theta).

This is the SIR distribution (in terms of the complementary cdf) at the typical link. In this post I would like to discuss whether it is accurate to call ps the fraction of reliable links.

Say someone claims “The fraction of reliable links in this network is ps=0.8”, and I ask “But how reliable are these links?”. The answer might be “They are (at least) 80% reliable of course, because ps=0.8.” Ok, so let us assume that the fraction of links with reliability at least 0.8 is 0.8. Following the same logic, the fraction of links with reliability at least 0.7 would be 0.7. But clearly that fraction cannot be smaller than the fraction of links with reliability at least 0.8. There is an obvious contradiction. So how can we quantify the fraction of reliable links in a rigorous way?

First we note that in the expression for ps, there is no notion of reliability other than ps itself. This leads to the wrong interpretation above that a fraction ps of links has reliability at least ps. Instead, we want to specify a reliability threshold so that we can say, e.g., “the fraction of links that are at least 90% reliable is 0.8”. Naturally it then follows that the fraction of links that are at least 80% reliable must be larger than (or equal to) 0.8. So a meaningful expression for the fraction of reliable links must involve a reliability threshold parameter that can be tuned from 0 to 1, irrespective of how reliable the typical link happens to be.

Second, ps gives no indication about the reliability of individual links. In particular, it does not specify what fraction of links achieve a certain reliability, say 0.8. It could be all of them, or 2/3, or 1/2, or 1/5. ps=0.8 means that the probability of transmission success over the typical link is 0.8. Equivalently, in an ergodic setting, in every time slot, a fraction 0.8 of all links happens to succeed, in every realization of the point process. But some links will be highly reliable, while others will be less reliable.

Before getting to the definition of the fraction of reliable links, let us focus on Poisson bipolar networks for illustration, with the following concrete parameters: link distance 1/4, path loss exponent 4, target SIR threshold θ=1, and iid Rayleigh fading. The link density is λ, and we use slotted ALOHA with transmit probability p. In this case, the well-known expression for ps is

\displaystyle p_{\text{s}}=\exp(-c\lambda p),

where c=0.3084 is a function of link distance, path loss exponent, and SIR threshold. We note that if we keep λp constant, ps remains unchanged. Now, instead of just considering the typical link, let us consider all the links in a realization of the network, i.e., for a given set of locations of all transceivers. The video below shows the histogram of the individual link reliabilities for constant λp=1 while varying the transmit probability p from 1.00 to 0.01 in steps of 0.01. The red line indicates ps, which is the average of all link reliabilities and remains constant at 0.735. Clearly, the distribution of link reliabilities changes significantly even with constant ps – as surmised above, ps does not reveal how disparate the reliabilities are. The symbol σ refers to the standard deviation of the reliability distribution, starting at 0.3 at p=1 and decreasing to less than 1/10 of that for p=1/100.

Histogram of link reliabilities in bipolar network.

Equipped with the blue histogram (or pdf), we can easily determine what fraction of links achieves a certain reliability, say 0.6, 0.7, or 0.8. These are shown in the plot below. It is apparent that for small p, due to the concentration of the link reliabilities as p→0, the fraction of reliable links tends to 0 or 1, depending on whether the reliability threshold is above or below the average ps.

Fraction of links achieving reliability at least 0.6, 0.7, 0.8.

So how do we characterize the link reliability distribution theoretically? We start with the conditional SIR ccdf at the typical link, given the point process:

\displaystyle P_{\text{s}}=\mathbb{P}(\text{SIR}>\theta\mid\Phi).

Then ps=E(Ps), with the expectation taken over the point process. Hence ps is the mean of the conditional success probability, and if we consider its distribution, we arrive at the link reliability distribution, shown in blue in the video above. Mathematically,

\displaystyle F(\nu)=\mathbb{P}(P_{\text{s}}>\nu),

where ν is the target reliability. This distribution is a meta distribution, since it is the distribution of a conditional distribution. In ergodic settings, it specifies the fraction of links that achieve an SIR of θ with reliability at least ν, which is exactly what we set out to quantify.
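To tie the pieces together, here is a minimal sketch for the bipolar example above (r=1/4, α=4, θ=1, λp=1); the window radius and sample counts are assumptions. For each realization it computes the conditional success probability of the typical link, with ALOHA and Rayleigh fading averaged out analytically, which yields samples of Ps; their mean stays near 0.735 for every p, while the fraction of links with reliability at least ν changes with p, as in the video.

```python
# Per-link reliabilities and the fraction of links with reliability >= nu (bipolar network).
import numpy as np

rng = np.random.default_rng(0)
r, alpha, theta, R = 0.25, 4.0, 1.0, 20.0    # link distance, path loss exponent, SIR threshold, window radius

def reliability_samples(p, n_links=2000):
    lam = 1.0 / p                             # keep lambda*p = 1
    out = np.empty(n_links)
    for k in range(n_links):
        n = rng.poisson(lam * np.pi * R**2)
        d = np.sqrt(rng.uniform(0.0, R**2, n))   # interferer distances to the receiver
        # P_s = P(SIR > theta | Phi), averaging over ALOHA and Rayleigh fading
        out[k] = np.prod(1.0 - p + p / (1.0 + theta * (r / d)**alpha))
    return out

nu = 0.8
for p in (1.0, 0.1):
    P_s = reliability_samples(p)
    print(p, P_s.mean(), (P_s >= nu).mean())  # mean near 0.735 in both cases; the fraction >= nu differs
```
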
In conclusion: The fraction of reliable links is not given by the standard (mean) success probability; it is given by the meta distribution of the SIR.