
# Taming the meta distribution

The derivation of meta distributions is mostly based on the calculation of the moments of the underlying conditional distribution. The reason is that, except for highly simplistic scenarios, a direct calculation is elusive. Recall the definition of a meta distribution:

$\displaystyle\bar F(t,x)=\mathbb{P}(P_t>x),\qquad\text{\sf where }\;\;P_t=\mathbb{P}(X>t\mid\Phi).$

Here X is the random variable we are interested in, and Φ is part of the random elements X depends on, usually a point process modeling the locations of wireless transceivers. The random variable $P_t$ is a conditional distribution given Φ.

Using stochastic geometry, we can often derive moments of $P_t$. Since $P_t$ has finite support, finding its distribution given the moments is a Hausdorff moment problem, which has a rich history in the mathematical literature but is not fully solved. The infinite sequence of integer moments uniquely defines the distribution, but in practice we are always restricted to a finite sequence, often fewer than 10 moments or so. This truncated version of the problem has infinitely many solutions, and the methods proposed to find or approximate the solution to the standard problem may or may not produce one of the solutions to the truncated one. In fact, they may not even provide an actual cumulative distribution function (cdf). An overview of the existing methods and some improvements can be found here.

In this blog, we focus on a method to find the infimum and supremum of the infinitely many possible cdfs that have a given finite moment sequence. The method is based on the Chebyshev-Markov inequalities. As the name suggests, they generalize the well-known Markov and Chebyshev inequalities, which are based on moment sequences of length 1 or 2. The key step in this method is to find the maximum probability mass that can be concentrated at a point of interest, say z. This probability mass corresponds to the difference between the infimum and the supremum of the cdf at z. This technical report provides the details of the calculation.

Fig. 1 shows an example for the moment sequence $m_k=1/(k+1)$, for k ∈ [n], where the number n of given moments increases from 5 to 15. It can be observed how the infimum (red) and supremum (blue) curves approach each other as more moments are considered. For n → ∞, both would converge to the cdf of the uniform distribution on [0,1], i.e., F(x)=x. The supremum curve lower bounds the complementary cdf. For example, for n = 15, ℙ(X>1/2)>0.4. This is the best bound that can be given since this value is achieved by a discrete distribution.

The average of the infimum and supremum at each point naturally lends itself as an approximation of the meta distribution. It can be expected to be significantly more accurate than the usual beta approximation, which is based only on the first two moments.
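For comparison, the beta approximation mentioned above can be obtained by matching the first two moments. The following is a minimal Python sketch of that moment matching; the function name `beta_from_moments` is illustrative and not taken from the linked Matlab code.

```python
def beta_from_moments(m1, m2):
    """Return (a, b) of the beta distribution whose first two moments are m1 and m2."""
    var = m2 - m1 * m1
    if not (0.0 < m1 < 1.0) or var <= 0.0 or var >= m1 * (1.0 - m1):
        raise ValueError("moments are not those of a non-degenerate beta distribution")
    # For a beta(a, b) distribution, Var = m1 (1 - m1) / (a + b + 1); solve for s = a + b
    s = m1 * (1.0 - m1) / var - 1.0
    return m1 * s, (1.0 - m1) * s

# First two moments of the sequence m_k = 1/(k+1) used in Fig. 1
a, b = beta_from_moments(1 / 2, 1 / 3)
print(a, b)
```

For this particular sequence the match gives a ≈ b ≈ 1, which is the uniform distribution itself, so the example serves as a sanity check rather than a hard case.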

An implementation is available on GitHub at https://github.com/ewa-haenggi/meta-distribution/blob/main/CMBounds.m.

Acknowledgment: I would like to thank my student Xinyun Wang for writing the Matlab code for the CM inequalities and providing the figure above.

# How well do distributions match? A case for the MH distance

Papers on wireless networks frequently present analytical approximations of distributions. The reference (exact) distributions are obtained either by simulation or by the numerical evaluation of a much more complicated analytical expression. The approximation and the reference distributions are then plotted, and a “highly accurate” or “extremely precise” match is usually declared. There are several issues with this approach.
First, people disagree on what “highly accurate” means. If there is a yawning gap between the distributions, can (should) the match be declared “very precise”? Without any quantification of what “accurate” or “precise” means, it is hard to argue one way or another.
Second, the visual impression can be distorted due to the use of a logarithmic (dB) scale and since, if the distribution has infinite support, only part of it can ever be plotted.

In this post I suggest an approach that addresses both these issues, assuming that at least one of the distributions in question is only available in numerical form (discrete data points). For the second one, we use the Möbius homeomorphic transform to map the infinite support to the [0,1] unit interval. Focusing on complementary cumulative distributions (ccdfs) and assuming the original distribution is supported on the positive real line, the mapped ccdf is obtained by

${ \bar{F}_{\rm MH}(t)=\bar F\left(\frac{t}{1-t}\right).}$

The MH transform and its advantages are discussed in this blog. For instance, it is very useful when applied to SIR distributions. In this case, the mapped ccdf is that of the signal fraction (ratio of desired signal power to total received power). For our purposes here, the [0,1] support is key as it allows not only a complete visualization but also lends itself as a natural distance metric that is itself normalized to [0,1]. Here is the definition of the MH distance:

${\rm d}_{\rm MH}(\bar F,\bar G)\triangleq \|\bar F_{\rm MH}-\bar G_{\rm MH}\|_{L_1}=\int_0^1 \big|\bar F\left(\frac{t}{1-t}\right)-\bar G\left(\frac{t}{1-t}\right)\big|{\rm d}t.$

Trivially it is bounded by 1, so the distance value directly and unambiguously measures the match between the ccdfs. Accordingly, we can use terms such as “mediocre match” or “good match” depending on this distance. The terminology should be consistent with the visual impression. For instance, if the MH ccdfs are indistinguishable, the match should be called “perfect”. Therefore, to address the first issue raised above, I propose the following intervals and terms.

Another advantage of the MH distance is that it emphasizes the regime where the ccdf takes high values (arguments near 0) over the low-value regime, since it maps arguments near 0 without distortion while compressing large arguments. In the case of SIR ccdfs, whose values indicate reliabilities, high values mean high reliabilities, which is the relevant regime in practice.
A simple Matlab implementation of the MH distance is available here. It accepts arbitrary values of the ccdf’s arguments and uses interpolation to achieve uniform sampling of the [0,1] interval.

As an example, here is an animation showing a standard exponential ccdf (MH mapped of course) in blue and another exponential ccdf with a parameter varying from 1.5 to 0.64. It is apparent that the terminology corresponds to the visual appraisal of the gap between the two ccdfs.
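To make the definition concrete, here is a minimal Python sketch of the MH distance. It is not the Matlab implementation mentioned above: it assumes both ccdfs are available as functions (so no interpolation is needed) and uses a simple midpoint rule. The names `mh_distance` and `exp_ccdf` are illustrative, and the exponential parameter is assumed to be the rate.

```python
import math

def mh_distance(ccdf_f, ccdf_g, num=10000):
    """Approximate the MH distance between two ccdfs on [0, inf) by the midpoint rule."""
    total = 0.0
    for i in range(num):
        t = (i + 0.5) / num   # midpoints avoid t = 1, where t/(1-t) is infinite
        x = t / (1.0 - t)     # argument of the original ccdf corresponding to t
        total += abs(ccdf_f(x) - ccdf_g(x))
    return total / num

def exp_ccdf(rate):
    """ccdf of an exponential distribution with the given rate (assumed parametrization)."""
    return lambda x: math.exp(-rate * x)

# Standard exponential versus exponentials with other parameters
for rate in (1.5, 1.2, 1.0, 0.8, 0.64):
    print(rate, mh_distance(exp_ccdf(1.0), exp_ccdf(rate)))
```

The distance is zero when the parameters coincide and grows as they move apart, which is the behavior the terminology is meant to capture.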

# A case for T junctions

It has been established (for example, here) that the standard two-dimensional homogeneous PPP is not an adequate model for vehicular networks, since vehicles are mostly confined to streets. The Poisson line Cox process (PLCP) has naturally emerged as the model of choice. In this process, one-dimensional PPPs are placed on a street system formed by a Poisson line process. This model is somewhat tractable and thus has gained some traction in the community. With probability 1, each line (or street) intersects every other line, so intersections are formed, and the communication performance of the typical intersection vehicle can be studied. This is important since vehicles at intersections are more accident-prone than other vehicles.
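The construction of the PLCP can be sketched in a few lines of Python. This is only an illustration restricted to a disk; the function names, the parameter names, and the parametrization of the lines by a signed distance and a normal direction are choices made here, with `line_density` understood as the intensity of the line process on that (distance, angle) representation space.

```python
import math
import random

def poisson(mean, rng):
    """Poisson variate: number of unit-rate arrivals in [0, mean]."""
    n = 0
    s = rng.expovariate(1.0)
    while s < mean:
        n += 1
        s += rng.expovariate(1.0)
    return n

def sample_plcp(line_density, vehicle_density, radius, rng):
    """Return a list of streets; each street is a list of (x, y) vehicle positions in the disk."""
    streets = []
    # Lines hitting the disk are points (rho, phi) of a PPP on [-radius, radius] x [0, pi)
    for _ in range(poisson(line_density * 2.0 * radius * math.pi, rng)):
        rho = rng.uniform(-radius, radius)  # signed distance of the line from the origin
        phi = rng.uniform(0.0, math.pi)     # direction of the line's normal
        half = math.sqrt(max(0.0, radius * radius - rho * rho))  # half-length of the chord
        vehicles = []
        u = -half + rng.expovariate(vehicle_density)  # 1-D PPP via exponential gaps
        while u < half:
            vehicles.append((rho * math.cos(phi) - u * math.sin(phi),
                             rho * math.sin(phi) + u * math.cos(phi)))
            u += rng.expovariate(vehicle_density)
        streets.append(vehicles)
    return streets

streets = sample_plcp(line_density=0.5, vehicle_density=2.0, radius=5.0, rng=random.Random(1))
print(len(streets), sum(len(s) for s in streets))
```

Each street is sampled independently of the others, which is what makes the PLCP fast to simulate.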

How about T junctions? Clearly, the PLCP has no T junctions a.s. But while not quite as frequent as (four-way) intersections, they are an important building block of the street systems in every city, and it is reasonable to assume that they inherit some of the dangers of intersections. However, the performance of vehicles at T junctions has barely been modeled and analyzed. The reason is perhaps not that it is not worthy of study but the lack of a natural model. Let’s say we wanted to construct a Cox model of vehicles that is supported on a street system that has no intersections but only T junctions, with the T junctions themselves forming a stationary point process (in the same way the intersections in the PLCP form a stationary point process). What is the simplest (most natural, most tractable) model?

One model we came up with is inspired by the so-called lilypond model. From each point of a PPP, a line segment grows in a random orientation in both directions. All segments grow at the same speed until one of their endpoints hits another segment. Once all growth has stopped, the lilypond street model is obtained. Here is a realization:

Then PPPs of vehicles can be placed on each line segment to form a Lilypond line segment Cox process. Some results for vehicular networks based on this model are available here. The model has the advantage that it has only a single parameter – the density of the underlying PPP of the center points of each line segment. On the other hand, the distribution of the length of the line segments can only be bounded, and the construction naturally creates a dependence between the lengths of nearby segments, which limits the tractability. For instance, in a region with many initial Poisson points, segments will be short on average, while in a region with sparse Poisson points, segments will be long. Also, the construction implies that simulating this process takes significantly more time than simulating a PLCP.

Given the shortcomings of the model, it seems quite probable that there are other, simpler and (even) more natural models for street systems with T junctions. Let’s try and find them!

# On cell slicing

Network slicing is a warm topic these days. Here we discuss cell slicing, where a polygon is cut in three pieces (sub-polygons) by two lines through its nucleus and a random point, respectively. First, as a sequel to this post, we focus on the 0-cell in the Poisson-Voronoi tessellation, which is the Voronoi cell of a PPP that contains the origin.
As in that earlier post, we rotate and shift the 0-cell so that its nucleus is at the origin and the point located uniformly randomly in the cell is to the right (on the positive x axis). In a cellular network application, the nucleus (now at the origin) is the location of the base station, while the random point is the typical user. Then we draw vertical lines through the origin and the random point and calculate the areas of the polygons to the left of the origin and to the left of the random point. This movie shows a number of realizations of this setup:

For intensity λ=1, the 0-cell in the Poisson-Voronoi tessellation has a mean area of 1.280176, obtained by the numerical evaluation of an integral. The polygon to the left of the nucleus o, shown in green in the movie, has a mean area of 0.517649, also obtained by numerical integration. The red polygon, to the right of o and to the left of x, has a mean area of 0.529111, obtained by simulation. Relative to the total mean area, we thus have:

• Mean area to the left of o: 40.4%
• Mean area to the left of x: 81.8%
• Mean area to the right of x: 18.2%

Hence, on average, almost 60% of the (other) users in the cell are on the same side of the base station as the typical user, and the mean area to the left of the typical user is larger than the entire mean area of the typical cell (which is 1 for λ=1). Also, more than 18% of the users are “behind” the typical user.
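The percentages follow directly from the three mean areas quoted above. A short Python check (variable names are illustrative):

```python
mean_area_cell = 1.280176    # mean area of the 0-cell for lambda = 1
mean_area_left_o = 0.517649  # polygon to the left of the nucleus o
mean_area_o_to_x = 0.529111  # polygon between o and x (simulated value)

left_of_o = mean_area_left_o / mean_area_cell
left_of_x = (mean_area_left_o + mean_area_o_to_x) / mean_area_cell
right_of_x = 1.0 - left_of_x

for label, frac in (("left of o", left_of_o),
                    ("left of x", left_of_x),
                    ("right of x", right_of_x)):
    print(f"Mean area {label}: {100 * frac:.1f}%")

# Mean area to the left of x in absolute terms; the typical cell has mean area 1
print(mean_area_left_o + mean_area_o_to_x)
```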

Not surprisingly, the area fractions are quite similar in the typical Poisson-Voronoi cell. (Note that the uniformly random point in the typical cell does not correspond to the typical user of a user point process that is independent of the base station process – see this post). These percentages (area fractions) for the typical cell are all simulated:

• Mean area to the left of o: 39.8%
• Mean area to the left of x: 81.3%
• Mean area to the right of x: 18.7%

How unusual are these area fractions? Let us compare them with those in the disk, which is, in some sense, the ideal cell shape. In the disk, with the nucleus o at the center, the mean area fraction to the left of o is trivially 1/2, while the area to the left of a uniformly random point is easily determined to be 7/8. This shows that the (roughly) 2/5 – 2/5 – 1/5 split in the typical Poisson-Voronoi cells is relatively far from the 1/2 – 3/8 – 1/8 split of the disk. How about regular polygons with a finite number of sides? The second movie shows some realizations for 3,4, … ,10,12,15,20,32 sides.
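The 7/8 for the disk can be verified numerically. In the unit disk, a uniformly random point has distance r from the center with density 2r, and the area fraction to the left of the vertical line through it is 1/2 + (r√(1−r²) + arcsin r)/π. The sketch below averages this fraction with a midpoint rule; the function name is illustrative.

```python
import math

def disk_fraction_left_of_x(num=100000):
    """Mean area fraction of the unit disk to the left of a uniformly random point."""
    total = 0.0
    for i in range(num):
        r = (i + 0.5) / num  # distance of the random point from the center
        # Area fraction to the left of the vertical line at distance r from the center
        frac = 0.5 + (r * math.sqrt(1.0 - r * r) + math.asin(r)) / math.pi
        total += 2.0 * r * frac  # weight by the density 2r of the distance
    return total / num

print(disk_fraction_left_of_x(), 7 / 8)
```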

As expected, for a larger number of sides the area fractions approach those of the disk. Here is a plot of the relative deviations to the disk of regular polygons, where o is the centroid:

For instance, for the triangle, the area fraction to the left of o is 0.516, which is 3.2% larger than for the disk. Hence the first blue point in Fig. 1 (top left), corresponding to 3 sides, is at 3.2. To better see how the deviations behave for 5 or more sides, here is another version of the figure that shows only pentagons and higher.

From the blue curve it is apparent that the area to the left of the center is always exactly one half if the number of sides is even. The reason is that with an even number of sides, the polygons to the left of o and to the right of o are always congruent irrespective of the location of the random point. We also see that the hexagon is quite close to the disk already, with a mere 0.15% deviation from the 7/8 area fraction (to the left of x) of the disk.

# The curious shape of Poisson-Voronoi cells

In this blog we are exploring the shape of two kinds of cells in the Poisson-Voronoi tessellation on the plane, namely the 0-cell and the typical cell. The 0-cell is the cell containing the origin, while the typical cell is the cell obtained by conditioning on a Poisson point to be at the origin (which is the same as adding the origin to the PPP).

The cell shape has an important effect on the signal and interference powers at the typical user (in the 0-cell) and at the user in the typical cell. For instance, in the 0-cell, which contains the typical user at a uniformly random location, about 1/4 of the cell edge is at essentially the same distance to the base station as the typical user (on average). Hence it is not the case that edge users necessarily suffer from larger signal attenuation than the typical user (who resides inside the cell).

The cell shape is determined by the directional radii of the cells when their nucleus is at the origin. To have a well-defined orientation, we select a location uniformly in the cell and rotate the cell so that this location falls on the positive x-axis. In the 0-cell, this involves first a translation of the cell’s nucleus to the origin, followed by a rotation until the original origin (which is uniformly distributed in the cell) lies on the positive x-axis. This is illustrated in Movie 1 below. In the typical cell, it involves adding a Poisson point, selecting a uniform location, and a rotation so that this uniform location lies on the positive x-axis. This is illustrated in Movie 2.

As indicated in the movies, the distances from the nucleus to the uniformly random location are denoted by D0 and D, respectively, and the directional radii by R0(ϕ) and R(ϕ), respectively. This way, the boundary of the cells is described in polar coordinates as (R0(ϕ),ϕ) and (R(ϕ),ϕ), ϕ ∈ [0,2π). In a cellular network model, the uniform random location could be that of a user, while the PPP models the base stations. In this case D0 is the link distance from the typical user to its serving base station, while D is the link distance from the typical base station to a randomly located user it serves. The distinction between the typical user’s and the typical base station’s point of view is explained in this blog.

Let λ denote the density of the PPP. Three results are well known:

• The distribution of D0 follows from the void probability of the PPP. It is Rayleigh with mean 1/(2√λ).
• Since the mean area of the typical cell is 1/λ, we have $\int_0^{\pi} \mathbb{E}(R(\phi)^2)\,{\rm d}\phi = 1/\lambda$.
• The minimum of R(ϕ) is distributed with pdf $f(r)=8\lambda\pi r \exp(-4\lambda\pi r^2)$. This is half the distance to the nearest neighboring Poisson point (base station).

In contrast, there is no closed-form expression for the distribution of D. Due to size-biased sampling, the area of the 0-cell stochastically dominates that of the typical cell and, in turn, D0 dominates D.

Analyzing the directional radii, we obtain these new insights on the cell shapes:

• If Ψ is uniform in [0,π], R(Ψ) is again Rayleigh with mean 1/(2√λ).
• R0(π) is also Rayleigh with the same mean. In fact, R0(π) and D0 are iid.
• R0(0) has mean 3/(4√λ) and is distributed as

$\displaystyle f_{R_0(0)}(y)=2(\lambda\pi)^2 y^3 \exp(-\lambda\pi y^2).$

• Hence R0(0) is on average exactly 50% larger than R0(π). For the typical cell, simulation results indicate that R(0) is about 55% larger on average than R(π).
• The difference R0(0)-D0 is distributed as f(r)=π√λ erfc(r √(πλ)). Its mean is 1/(4√λ). Hence the typical user is no further from the cell edge than the base station on average.
• The joint distribution of D0 and R0(ϕ) can be given in exact analytical form.
• 3/4 of the typical cell is further away from the nucleus than the nearest point on the cell edge (i.e., the minimum directional radius). Expressed differently, a uniformly random user in the typical cell has a 75% chance of being further away from the base station than the nearest edge user. By simulation, D on average is 2.7 times larger than the minimum of the directional radii.
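The means stated in the list can be checked numerically from the given densities. The sketch below does this for λ = 1 with a plain midpoint rule. The helper `integrate`, the variable names, and the truncation of the integrals at 10 are choices made here, and the Rayleigh density of D0 is written out from the void probability exp(−λπr²).

```python
import math

def integrate(f, upper, num=200000):
    """Midpoint rule on [0, upper]."""
    h = upper / num
    return h * sum(f((i + 0.5) * h) for i in range(num))

lam = 1.0
a = lam * math.pi

pdf_d0 = lambda r: 2.0 * a * r * math.exp(-a * r * r)                         # Rayleigh pdf of D0
pdf_r0_0 = lambda y: 2.0 * a * a * y ** 3 * math.exp(-a * y * y)              # pdf of R0(0)
pdf_diff = lambda r: math.pi * math.sqrt(lam) * math.erfc(r * math.sqrt(a))   # pdf of R0(0) - D0

mean_d0 = integrate(lambda r: r * pdf_d0(r), 10.0)      # expected: 1/(2 sqrt(lam))
mean_r0_0 = integrate(lambda y: y * pdf_r0_0(y), 10.0)  # expected: 3/(4 sqrt(lam))
mass_diff = integrate(pdf_diff, 10.0)                   # expected: 1
mean_diff = integrate(lambda r: r * pdf_diff(r), 10.0)  # expected: 1/(4 sqrt(lam))

print(mean_d0, mean_r0_0, mass_diff, mean_diff, mean_r0_0 / mean_d0)
```

The last printed value is the ratio of the two means, which should be close to 1.5, in line with the 50% figure above (R0(π) has the same mean as D0).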

In conclusion, the 0-cell and the typical cell are quite asymmetric around the nucleus (base station) and the uniformly random point (user). In the direction away from the base station, the user is about 4 times closer to the cell edge than in the direction towards the base station, and many locations on the cell edge are closer to the base station than the user inside the cell. These results have implications on the design of efficient cellular network transmission schemes, such as beamforming, NOMA, and base station cooperation, in both down- and uplink.

More details are available in Section II of this paper.

# Tractable, closed-form, exact

The attributes “tractable”, “closed-form”, and “exact” are frequently used to describe analytical results and, in the case of “tractable”, also models. At the time of writing, IEEE Xplore lists 4540 journal articles with “closed-form” and “wireless” in their metadata, 650 with “tractable” and “wireless”, and 220 with “closed-form”, “wireless” and “stochastic geometry”.

Among the three adjectives, only for “exact” is there general consensus on what it means exactly. For “closed-form”, mathematicians have a clear definition: The expression can only consist of finite sums and products, division, roots, exponentials, logarithms, trigonometric and hyperbolic functions and their inverses. Many authors are less strict, using the term also for expressions involving general transcendental functions or infinite sums and products. Lastly, the use of “tractable” varies widely. There are “tractable results”, “tractable models”, “tractable analyses”, and “tractable frameworks”.

“Tractable” is defined by Merriam-Webster as “easily handled, managed, or wrought”, by the Google Dictionary as “easy to deal with”, and by the Cambridge Dictionary as “easily dealt with, controlled, or persuaded”. Wikipedia refers to the mathematical use of the term: “ease of obtaining a mathematical solution such as a closed-form expression”. These definitions are too vague to clearly distinguish a “tractable model” from a “non-tractable” one, since “easy” can mean very different things to different people.

We also find combinations of the terms; in the literature, there are “tractable closed-form expressions” and even “highly accurate simple closed-form approximations”. But shouldn’t all “closed-form” expressions qualify as “tractable”? And aren’t they also “simple”, or are there complicated “closed-form” expressions?

It would be helpful to agree in our community on what qualifies as “closed-form”. Here is a proposal:

• Use “closed-form” in its strict mathematical understanding, allowing only elementary functions.
• Use “weakly closed-form” for expressions involving hypergeometric, (incomplete) gamma functions, and the error and the Lambert W functions.
• Any result involving integrals, limits, infinite sums, or general transcendental functions such as generalized hypergeometric and Meijer G functions is not “closed-form” or “weakly closed-form” (but may be exact, of course).

Thus equipped, we could try to define what a “tractable model” is. For instance, we could declare a model “tractable” if it allows the derivation of at least one non-trivial exact closed-form result for the metric of interest. This way, the SIR distribution in the Poisson bipolar network with ALOHA, Rayleigh fading, and power-law path loss is tractable because the expression only involves an exponential and a trigonometric function. The SIR in the downlink Poisson cellular network with Rayleigh fading and path loss exponent 4 is also tractable; its expression includes only square roots and an arctangent. In contrast, the SIR in the uplink Poisson cellular network is not tractable, irrespective of the user point process model.
A result could be termed “tractable” if the typical educated reader can tell how the expression behaves as a function of its parameters.
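To illustrate what the two closed-form examples look like, here is a Python sketch of the SIR ccdfs (success probabilities) in their standard forms as I recall them from the literature. The function and parameter names are illustrative: `lam` is the density of the PPP, `p` the ALOHA transmit probability, `r` the link distance, and `alpha` > 2 the path loss exponent.

```python
import math

def ps_bipolar(theta, lam, p, r, alpha):
    """SIR ccdf in the Poisson bipolar network with ALOHA and Rayleigh fading (alpha > 2)."""
    delta = 2.0 / alpha
    # Gamma(1+delta) * Gamma(1-delta) = pi*delta / sin(pi*delta): the trigonometric function
    c = math.pi * delta / math.sin(math.pi * delta)
    return math.exp(-lam * p * math.pi * r * r * theta ** delta * c)

def ps_cellular_alpha4(theta):
    """SIR ccdf in the downlink Poisson cellular network, Rayleigh fading, alpha = 4."""
    s = math.sqrt(theta)
    return 1.0 / (1.0 + s * math.atan(s))

print(ps_bipolar(1.0, lam=0.1, p=1.0, r=1.0, alpha=4.0))
print(ps_cellular_alpha4(1.0))
```

Both expressions pass the test suggested above: it is easy to tell how they behave as functions of θ and of the network parameters.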

Going a step further, it may make sense to be more formal and introduce categories for the sharpness of a result, such as these:

A1: closed-form exact
A2: weakly closed-form exact
A3: general exact
B1: closed-form bound
B2: weakly closed-form bound
B3: general bound
C1: closed-form approximation
C2: weakly closed-form approximation
C3: general approximation

Alternatively, we could use A+, A, A-, B+, etc., inspired by the letter grading system used in the USA. We could even calculate a grade point average (GPA) of a set of results, based on the standard letter grade-to-numerical grade conversion.

Such classification allows a non-binary quantification of “tractability” of a model. If the model permits the derivation of an A1 result, it is fully “tractable”. If it only allows C3 results, it is not “tractable”. If we can obtain, say, an A3, a B2, and a C1 result, it is 50% “tractable” or “semi-tractable”. Such a sliding scale instead of a black-and-white categorization would reflect the vagueness of the general definition of the term but put it on a more solid quantitative basis. Subcategories for asymptotic results or “order-of” results could be added.

This way, we can pave the way towards the development of a tractable framework for tractability.

# Alice and Bob go viral, wirelessly

Alice has bits of viral information – so-called vitbits – to share. Despite her best efforts, she coughs up a strong signal and emits it using directional breathforming. Luckily, the line-of-sight is masked. Nearby Bob is unconcerned, he relies on an outage. After all, he is not in the near field, and he wouldn’t touch any of the non-intelligently reflecting surfaces in the room.

However, masking the main lobe of transmission leads to multi-path propagation and diversity. Waves of vitbits sitting on droplets (scientifically known as votons) are traveling along different paths, in an attempt to reach a destination. Their power decays quickly over distance, according to a power-law, with an empirical free-space limit of 2 m. But in this case, votons from different directions meet coherently, joining forces and managing to maintain strength and collectively carry sufficiently many vitbits.

Unfortunately, the vitbits find their target. There is no outage, just Bob’s outrage. He became a victim of his vir-ility.

# Epidemics are spatial, stochastic, and wireless

Inspired by current events, let us focus on the “successful” transmission of a virus from one host to another. In the case of a coronavirus, such transmission happens when the infected person (the source) emits a “signal” by breathing, coughing, sneezing, or speaking, and another person (the destination) gets infected when the respiratory droplets land in the mouth or nose or are inhaled. Fundamentally, the process looks very similar to that of a conventional RF wireless transmission. There are notions of signal strength, directional transmission, path loss, and shadowing in both, resulting in a probability of “successful” reception. In both cases, distances play a critical role – there is the well-known 2 m separation in the case of coronaviral transmission that (presumably) causes an outage in the transmission with high probability; similarly, in the wireless case, there is sharp decay of reception probabilities as a function of distance due to path loss.

Given the critical role of the geometry of host (transceiver) locations, it appears that known models and analytical tools from stochastic geometry could be beneficially applied to learn more about the spread of an epidemic. In contrast to the careful modeling of channels and transmitter and receiver characteristics in wireless communications, the models for the spread of infectious diseases are usually based on the basic reproduction number R0, which is a population-wide parameter that reflects distances between individual agents very indirectly only. Since it is important to understand the effect of physical (Euclidean) distancing (often misleadingly referred to as “social distancing”) and masking (i.e., shadowing), it seems that incorporating distances explicitly in the models would increase their predictive power. Such models would account for the strong dependence of infection probabilities on distance, akin to the path loss function in wireless communications, as well as directionality, akin to beamforming. Time of proximity or exposure can be incorporated if mobility is included.

This line of thought raises some vital questions: How can expertise in wireless transmission be applied to viral transmission? What can stochastic geometry contribute to the understanding of how infectious diseases are spreading (spatial epidemics)? Is there a wireless analogy to “herd immunity”, perhaps related to percolation-theoretic analyses of how broadcasting over multiple hops in a wireless network leads to a giant component of nodes receiving an information packet?

I believe it is quite rewarding to think about these questions, both from a theoretical point of view and in terms of their real-world impact.

Papers on these topics are solicited in the special issue on Spatial Transmission Dynamics of MDPI Information (deadline Feb. 28, 2021).

# Unmasking distributions with infinite support

When visualizing distributions with infinite support, we face the challenge that they can only be shown partially. Usually we try to judiciously choose the interval of our plot so that the interesting part is revealed, and it is understood that outside that interval the function is essentially zero (for a density) or essentially zero or one (for a cumulative distribution). However, there are two disadvantages to that approach: First, if two distributions are not shown on the same interval, it is hard to compare them. Second, interesting asymptotic behavior in the tails is masked. In many cases, using a linear scale suffers from a third shortcoming: Interesting features may occur on significantly different scales, which would require choosing a large interval, but that, in turn, may mask the behavior in some parts.

When plotting distributions of signal-to-interference ratios (SIRs), the standard approach is to use a dB scale. The complementary cumulative distribution (ccdf) is usually interpreted as the success probability of a transmission (in an interference-limited setting). While the dB scale allows visualizing the ccdf over a larger range, it has its own shortcomings: First, it turns the one-sided infinite support [0,∞) into the two-sided infinite support (-∞,∞), which can make selecting a suitable interval harder. Second, it distorts the ccdf, which prevents the viewer from obtaining insight into asymptotic behaviors.

So how can we resolve these issues? It turns out that there is a straightforward solution. It is based on a homeomorphic mapping of [0,∞) to the unit interval [0,1]. This mapping is given by the function T(x)=x/(1+x), and the resulting scale is called MH (Möbius homeomorphic) scale. For comparison, the dB scale has the mapping 10 log10(x). In the figures below, the dummy variable is θ, and we have $\theta_{\rm dB}=10^{\theta/10}$ and $\theta_{\rm MH}=\theta/(1-\theta)$. In the important high-reliability regime θ→0, $\theta_{\rm MH}\sim\theta$, i.e., there is no distortion.
The three figures show 6 ccdfs on a linear, dB, and the MH scale. Clearly, the linear scale plots mask the information about the tail. Some curves go to zero (too) quickly, while the behavior of the yellow one past 100 is completely hidden. The dB scale displays the transitional regime more prominently, but all curves have an inverted S shape, which reduces the discriminative power of these plots. In particular, for small θ, they all become flat. For example, the blue (Pareto) and the cyan (Lévy) curve look similar in the dB scale, but with some shift. The MH scale, however, reveals that the asymptotic behavior on both ends is, in fact, quite different. Also, the red (another Pareto distribution) and green (gamma) curve look fairly similar in the dB scale, while the MH scale emphasizes the difference between the two. Generally, the MH plots enhance the differences because the slopes at 0 and at 1 can assume any value, while the slopes in the dB plots always approach 0 – assuming the range is chosen wide enough.

In summary, the MH scale has the following advantages:

• There is a single finite interval that reveals the complete distribution. There is never a question of what interval to choose, and nothing remains hidden.
• The asymptotic behaviors are clearly visible. In comparison, on the dB scale, the behavior near 0 and towards infinity is always obscured.
• In the case of SIRs, the MH scale has the additional interpretation as visualizing the distribution of the signal fraction S/(S+I) (SF) on a linear scale: If F(θ) is the ccdf of the SIR S/I, then F(T⁻¹(θ))=F(θ/(1-θ)) is the ccdf of the SF.
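The relation between the SIR and the signal fraction can be made concrete in a few lines of Python. Since the SF equals T(SIR) and T is increasing, the event SF > t is the same as SIR > t/(1−t). The function names below are illustrative, and the example ccdf 1/(1+θ) is a toy case (ratio of two iid exponential powers, i.e., a single interferer with Rayleigh fading), not one of the six curves in the figures.

```python
import math

def mh(x):
    """MH map T(x) = x/(1+x) from [0, inf) to [0, 1)."""
    return x / (1.0 + x)

def mh_inv(t):
    """Inverse MH map t/(1-t), defined for 0 <= t < 1."""
    return t / (1.0 - t)

def to_db(x):
    """dB map 10 log10(x), for comparison."""
    return 10.0 * math.log10(x)

def sf_ccdf(sir_ccdf):
    """ccdf of the signal fraction S/(S+I), given the ccdf of the SIR S/I."""
    return lambda t: sir_ccdf(mh_inv(t))

# Toy example: SIR ccdf 1/(1+theta) becomes the straight line 1 - t on the MH scale
toy_sir_ccdf = lambda theta: 1.0 / (1.0 + theta)
toy_sf_ccdf = sf_ccdf(toy_sir_ccdf)
for t in (0.01, 0.3, 0.5, 0.9):
    print(t, toy_sf_ccdf(t), mh_inv(t))
```

For small t, `mh_inv(t)` is close to t, which is the absence of distortion in the high-reliability regime noted earlier.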

The MH mapping and its application to SIRs and signal fractions were first introduced in the invited paper M. Haenggi, “SIR Analysis via Signal Fractions”, IEEE Communications Letters, vol. 24, pp. 1358-1362, July 2020.