Statistical Odds & Ends

Brainteaser about the probability of two particular lines intersecting

kjytay Apr 19, 2026

Pick a point uniformly at random in a square and connect it to the bottom-left corner of the square. Pick a second point uniformly at random in the square and connect it to the top-left corner of the square. What … Continue reading →

Show full content

I won’t lay out all the details of the solution here, but I’ll show two different solutions, one slightly easier than the other.

Some common notation across the two solutions: Let $O=(0,0)$ and $A = (0,1)$ denote the bottom-left and top-left corners respectively, and let $P=(a,b)$ and $Q=(u,v)$ denote the points connected to $O$ and $A$ respectively. We are asked for the probability that the line segments $OP$ and $AQ$ intersect.

Solution 1

In this first approach, we fix the position of $P = (a,b)$ , then determine the region of the square where $Q$ must be in order for the two line segments to intersect. We then integrate this area over $a$ and $b$ to get the required probability.

We have to split into two cases, whether $P$ lies above or below the line $x + y = 1$ . If $P$ lies below the line $x + y = 1$ , then $Q$ must lie in the red triangle in order for the two line segments to intersect. With some basic cartesian geometry, we can determine that the red triangle has base $\dfrac{a}{1-b}$ and height $b$ , and thus has area $\frac{ab}{2(1-b)}$ .

If $P$ lies above the line $x+y = 1$ , then $Q$ must lie in the red quadrilateral:

Splitting the quadrilateral into the right triangle on the left and the trapezium on the right, we can determine that its area is

$\begin{aligned} \text{Area} &= \frac{1}{2}ab + \frac{1}{2}\left(b + 1 + \frac{b-1}{a} \right)(1-a) \\ &= \dots \\ &= 1 - \frac{a}{2} - \frac{1-b}{2a}. \end{aligned}$

Hence, the probability that the two lines intersect is

$\begin{aligned} \text{Probability} &= \int_0^1 \int_0^{1-a} \frac{ab}{2(1-b)} db \; da + \int_0^1 \int_{1-a}^1 \left( 1 - \frac{a}{2} - \frac{1-b}{2a} \right) db \; da. \end{aligned}$

Both of these integrals can be evaluated using standard techniques (most advanced technique needed here is integration by parts for $x \log x$ ) by evaluating the integral over $b$ first, then over $a$ .

Solution 2

Recall we defined $P = (a,b)$ and $Q = (u,v)$ . Fix $a$ and $u$ .

First consider the case where $a < u$ . Draw the vertical line through $P$ , and let that line intersect $AQ$ at $R$ . The line segments $OP$ and $AQ$ DON’T intersect if and only if $R$ lies above $P$ (the red segment).

$R$ ‘s y-coordinate is $1 + \dfrac{(v-1)a}{u}$ , so the two lines intersect if and only if $b > 1 + \dfrac{(v-1)a}{u}$ . Integrating over all possible values of $b$ and $v$ , we have

$\begin{aligned} \mathbb{P} \{ \text{intersection} | a, c, a < c \} &= \int_0^1 1 - \left[ 1 + \dfrac{(v-1)a}{u} \right] dv \\ &= \int_0^1 \dfrac{(1-v)a}{u} dv \\ &= \frac{a}{u} \left[ v - \frac{v^2}{2} \right]_0^1 \\ &= \frac{a}{2u}. \end{aligned}$

Integrating out $a$ and $c$ ,

$\begin{aligned} \mathbb{P} \{ \text{intersection when } a < c \} &= \int_0^1 \int_a^1 \frac{a}{2u} du \; da \\ &= \int_0^1 \left[ \frac{a \log u}{2} \right]_a^1 da \\ &= -\frac{1}{2} \int_0^1 a \log a \; da. \end{aligned}$

Using integration by parts and L’Hôpital’s rule, we can determine that $\displaystyle\int_0^1 a \log a \; da = -\dfrac{1}{4}$ , so the probability above is $1/8$ .

Now, assume that $a > c$ . The probability of intersection in this case is the same as the case above by symmetry! (Imagine flipping the square along the x-axis and switching the roles of $(O, P$ and $(A, Q)$ .)

Hence, in sum the probability of intersection is $1/8 + 1/8 = 1/4$ .

http://statisticaloddsandends.wordpress.com/?p=8254

Extensions

Brainteaser about max probability of a random variable being 0 given some constraints

kjytay Apr 1, 2026

is a non-negative random variable such that its expectation is 5 and its variance is 25. What is the maximum probability that ?(Source: quantprof.org) Decompose as , where and are independent, , and with probability , and otherwise. The problem … Continue reading →

Show full content

$X$ is a non-negative random variable such that its expectation is 5 and its variance is 25. What is the maximum probability that $X=0$ ?
(Source: quantprof.org)

Decompose $X$ as $X = YZ$ , where $Y$ and $Z$ are independent, $Z > 0$ , and $Y = 1$ with probability $1-p$ , and $Y = 0$ otherwise. The problem becomes trying to find the maximum value of $p$ .

The idea is to use conditioning to derive some constraints on $p$ . Using the fact that $Y^2 = Y$ , we have

$\begin{aligned} \mathbb{E}[YZ \mid Y] &= Y \mathbb{E}[Z], \\ \mathbb{E}[(YZ)^2 \mid Y] &= \mathbb{E}[YZ^2 \mid Y] = Y \mathbb{E}[Z^2], \\ \text{Var}(YZ \mid Y) &= \mathbb{E}[(YZ)^2 \mid Y] - (\mathbb{E}[YZ \mid Y])^2 \\ &= Y \mathbb{E}[Z^2] - Y (\mathbb{E}[Z])^2 \\ &= Y \text{Var}(Z). \end{aligned}$

By the law of iterated expectations,

$\begin{aligned} 5 &= \mathbb{E}[X] = \mathbb{E}[\mathbb{E}[YZ \mid Y]] \\ &= \mathbb{E}[Z] \mathbb{E}[Y] \\ &= (1-p) \mathbb{E}[Z], \\ \mathbb{E}[Z] &= \frac{5}{1-p}. \end{aligned}$

By the law of total variance,

$\begin{aligned} 25 &= \text{Var}(X) = \mathbb{E}[\text{Var}(YZ \mid Y)] + \text{Var}(\mathbb{E}[YZ \mid Y]) \\ &= \mathbb{E}[Y \text{Var}(Z)] + \text{Var}(Y \mathbb{E}[Z]) \\ &= \text{Var}(Z) \mathbb{E}[Y] + (\mathbb{E}[Z])^2 \text{Var}(Y) \\ &= (1-p) \text{Var}(Z) + p(1-p) \left(\frac{5}{1-p} \right)^2. \end{aligned}$

Since $\text{Var}(Z) \geq 0$ , we have

$\begin{aligned} 25 &\geq p(1-p) \left(\frac{5}{1-p} \right)^2 = \frac{25p}{1-p}, \\ 1-p &\geq p, \\ p &\leq \frac{1}{2}. \end{aligned}$

Equality can be achieved when $\text{Var}(Z) = 0$ , i.e. $Z = 10$ .

http://statisticaloddsandends.wordpress.com/?p=8241

Extensions

What is a probability-generating function (PGF)?

kjytay Nov 10, 2025

If is a discrete random variable taking values in , then the probability-generating function (PGF) of is defined as where is the probability mass function of . PGFs obey all rules of power series with non-negative coefficients. In particular, PGFs … Continue reading →

Show full content

If $X$ is a discrete random variable taking values in $\{0, 1, \dots \}$ , then the probability-generating function (PGF) of $X$ is defined as

$\begin{aligned} G(z) = \mathbb{E} [z^X] = \sum_{x=0}^\infty p(x) z^x, \end{aligned}$

where $p$ is the probability mass function of $X$ .

PGFs obey all rules of power series with non-negative coefficients. In particular,

$G(1^-) = \lim_{x \uparrow 1} G(x) = 1$ , and
The radius of convergence of a PGF is at least 1.

PGFs can be useful for computing probabilities and expectations:

$\begin{aligned} p(k) = \mathbb{P}\{ X = k\} = \dfrac{G^{(k)}(0)}{k!}. \end{aligned}$

For $k = 0, 1, 2,\dots$ ,

$\begin{aligned} \mathbb{E}\left[ \dfrac{X!}{(X-k)!} \right] = G^{(k)}(1^-) \end{aligned}$

In particular,

$\begin{aligned} \mathbb{E}[X] &= G'(1-), \\ \text{Var}(X) &= G''(1^-) + G'(1^-) - [G'(1^-)]^2. \end{aligned}$

References:

Wikipedia. Probability-generating function.

http://statisticaloddsandends.wordpress.com/?p=8230

Extensions

Brainteaser about points belonging to the same semicircular arc

kjytay Aug 31, 2025

points are placed uniformly at random on a circle. What’s the probability that all of them lie within a semicircular arc? Label the points . For each , define the event as the event where all points lie on the … Continue reading →

Show full content

$n$ points are placed uniformly at random on a circle. What’s the probability that all $n$ of them lie within a semicircular arc?

Label the points $1, 2, \dots, n$ . For each $i =1, \dots, n$ , define the event $A_i$ as the event where all $n$ points lie on the semicircular arc starting from point $i$ and moving clockwise.

The first thing to note is that

$\begin{aligned} \{ \text{all } n \text{ points lie within a semicircular arc} \} = \bigcup_{i=1}^n A_i. \end{aligned}$

The next thing to note is that the $A_i$ ‘s are actually disjoint (except possibly for configurations with probability zero)! We can prove this by contradiction. Assume that $A_i$ and $A_j$ are not disjoint. That means that there is some configuration of points where the clockwise semicircle starting from $i$ covers all points, and the clockwise semicircle starting from $j$ covers all points. Consider moving clockwise from $i$ : the arc reaching $j$ from this direction is $\leq$ half the circle. But by the same logic, the arc reaching $i$ from $j$ in this direction is also $\leq$ half the circle! The only way this is possible is if the two points are two ends of the same diameter, but that happens with probability zero.

Hence,

$\begin{aligned} \mathbb{P}\{ \text{all } n \text{ points lie within a semicircular arc} \} &= \mathbb{P} \left( \bigcup_{i=1}^n A_i \right) \\ &= \sum_{i=1}^n \mathbb{P}(A_i) \\ &= n \mathbb{P} (A_1) \\ &= \frac{n}{2^{n-1}}. \end{aligned}$

http://statisticaloddsandends.wordpress.com/?p=8222

Extensions

Brainteaser about expected number of rolls

kjytay Aug 30, 2025

An -sided fair die is rolled repeatedly. What is the expected number of rolls until the total sum becomes at least ? This question can be solved by conditioning on the outcome of the first roll. For , let be … Continue reading →

Show full content

An $n$ -sided fair die is rolled repeatedly. What is the expected number of rolls until the total sum becomes at least $n$ ?

This question can be solved by conditioning on the outcome of the first roll. For $k = 1, \dots, n$ , let $E_k$ be the expected number of rolls until the total sum becomes at least $k$ . What we want is $E_n$ .

Let’s evaluate $E_k$ for small $k$ . It’s clear that $E_1 = 1$ , and

$\begin{aligned} E_2 &= \mathbb{P}(\text{roll more than 1}) + \mathbb{P}(\text{roll a 1}) \cdot (1 + E_1) \\ &= \frac{n-1}{n} + \frac{1}{n}(2) \\ &= \frac{n+1}{n}. \end{aligned}$

We can show by induction that $E_k = \left(\dfrac{n+1}{n} \right)^{k-1}$ . Here’s the induction step going from $k-1$ to $k$ :

$\begin{aligned} E_k &= \mathbb{P}(\text{roll more than } k-1) + \sum_{i = 1}^{k-1} \mathbb{P}(\text{roll a } k - i) \cdot (1 + E_i) \\ &= \frac{n-k-1}{n} + \sum_{i=1}^{k-1} \frac{1}{n}\cdot \left[ 1 + \left(\frac{n+1}{n} \right)^{i-1} \right] \\ &= 1 + \frac{1}{n}\sum_{i=1}^{k-1} \left(\frac{n+1}{n} \right)^{i-1} \\ &= 1 + \frac{1}{n} \cdot \frac{(\frac{n+1}{n})^{k-1} - 1}{\frac{n+1}{n} - 1} \\ &= 1+ \left( \frac{n+1}{n} \right)^{k-1} - 1 \\ &= \left( \frac{n+1}{n} \right)^{k-1}. \end{aligned}$

Substituting $k = n$ , we get

$\begin{aligned} E_n = \left( \frac{n+1}{n} \right)^{n-1}. \end{aligned}$

As $n \rightarrow \infty$ , $E_n \rightarrow e$ .

Here’s an R script to check our expression for $E_k$ via Monte Carlo:

run_simulation <- function(n, k) {
  num_rolls <- 0
  current_sum <- 0
  while (current_sum < k) {
    num_rolls <- num_rolls + 1
    current_sum <- current_sum + sample(n, 1)
  }
  
  num_rolls
}

n <- 10
k <- 3
n_sim <- 10000

set.seed(1)
num_rolls <- replicate(n_sim, run_simulation(n, k))
print(paste("MC estimate:", mean(num_rolls)))
print(paste("Exact value:", (1 + 1/n)^(k-1)))

http://statisticaloddsandends.wordpress.com/?p=8215

Extensions

Downloading datasets from Our World in Data in R

kjytay Jan 30, 2025

I recently learned from Allen Downey’s blog that Our World in Data is providing API access to their data. Our World in Data hosts datasets across several important topics, from population and demographic change, poverty and economic development, to human … Continue reading →

Show full content

I recently learned from Allen Downey’s blog that Our World in Data is providing API access to their data. Our World in Data hosts datasets across several important topics, from population and demographic change, poverty and economic development, to human rights and democracy. From Nov 2024, Our World in Data “[offers] direct URLs to access data in CSV format and comprehensive metadata in JSON format” (this is what they call the Public Chart API).

See this link for full documentation on the chart data API. Allen Downey’s blog post shows how to use this API in Python; in this post I’ll show the corresponding code in R.

Downloading metadata

The chart data API page tells us what APIs are available to us. The starting point is a base URL which we denote by <base_url>. This is what’s available to us:

<base_url> – The page on the website where you can see the chart.
<base_url>.csv – The data for this chart, usually what we want to download.
<base_url>.metadata.json – The metadata for this chart, e.g. chart title, the units, how to cite the data sources.
<base_url>.zip – The dataset, metadata, and a readme as zip file.

Let’s start by looking at the metadata for average monthly surface temperature:

library(tidyverse)
library(httr)
library(jsonlite)

# define URL and query parameters
url <- "https://ourworldindata.org/grapher/average-monthly-surface-temperature.metadata.json"
query_params <- list(
  v = "1",
  csvType = "full",
  useColumnShortNames = "true"
)

# get metadata
headers <- add_headers(`User-Agent` = "Our World In Data data fetch/1.0")
response <- GET(url, query = query_params, headers)
metadata <- fromJSON(content(response, as = "text", encoding = "UTF-8"))

The returned metadata object is a list with a few keys:

# view the metadata keys
names(metadata)

#> [1] "chart"          "columns"        "dateDownloaded" "activeFilters"

The chart element gives us high-level information of the chart:

glimpse(metadata$chart)

#> List of 5
#>  $ title           : chr "Average monthly surface temperature"
#>  $ subtitle        : chr "The temperature of the air measured 2 meters above the ground, encompassing land, sea, and in-land water surfaces."
#>  $ citation        : chr "Contains modified Copernicus Climate Change Service information (2025)"
#>  $ originalChartUrl: chr "https://ourworldindata.org/grapher/average-monthly-surface-temperature?v=1&csvType=full&useColumnShortNames=true"
#>  $ selection       : chr "World"

There’s more information on the dataset in metadata$columns$temperature_2m

names(metadata$columns$temperature_2m)
#>  [1] "titleShort"            "titleLong"             "descriptionShort"      "descriptionProcessing" "shortUnit"             "unit"                 
#>  [7] "timespan"              "type"                  "owidVariableId"        "shortName"             "lastUpdated"           "citationShort"        
#> [13] "citationLong"          "fullMetadata"

(Why temperature_2m? Looking at metadata$columns$temperature_2m$shortName, it seems like temperature_2m is the short name for the dataset.)

Downloading the dataset

Here is code downloading the average monthly surface temperature for just the USA:

# define the URL and query parameters
url <- "https://ourworldindata.org/grapher/average-monthly-surface-temperature.csv"
query_params <- list(
  v = "1",
  csvType = "filtered",
  useColumnShortNames = "true",
  tab = "chart",
  country = "USA"
)

# get data
headers <- add_headers(`User-Agent` = "Our World In Data data fetch/1.0")
response <- GET(url, query = query_params, headers)
df <- read_csv(content(response, as = "text", encoding = "UTF-8"))

head(df)
#> # A tibble: 6 × 6
#>   Entity        Code   year Day        temperature_2m...5 temperature_2m...6
#>   <chr>         <chr> <dbl> <date>                  <dbl>              <dbl>
#> 1 United States USA    1941 1941-12-15            -1.88                 8.02
#> 2 United States USA    1942 1942-01-15            -4.78                 7.85
#> 3 United States USA    1942 1942-02-15            -3.87                 7.85
#> 4 United States USA    1942 1942-03-15             0.0978               7.85
#> 5 United States USA    1942 1942-04-15             7.54                 7.85
#> 6 United States USA    1942 1942-05-15            12.1                  7.85

Allen Downey notes that the last column is undocumented, and based on context is probably the average annual temperature.

Here’s a simple plot of the data:

plot(df$Day, df$temperature_2m...5, type = "l",
     main = "USA ave monthly temperature",
     xlab = "Date", ylab = "Temperature in C")

If you wanted the entire dataset, just use csvType = "full" in the request parameters, like so:

# define the URL and query parameters
url <- "https://ourworldindata.org/grapher/average-monthly-surface-temperature.csv"
query_params <- list(
  v = "1",
  csvType = "full",
  useColumnShortNames = "true",
  tab = "chart"
)

# same code for the http request works, see above

How do you know which filters you can apply? There is no extensive documentation of what filters you can apply, but as you click on the base URL to change the chart, the URL changes. You can then use this to infer the query parameters to add.

For example, I clicked around on the chart to get this URL: https://ourworldindata.org/grapher/average-monthly-surface-temperature?tab=chart&time=2001-05-15..2024-05-15&country=OWID_WRL~THA. The code below shows how you can download the data and reproduce the chart:

# define the URL and query parameters
url <- "https://ourworldindata.org/grapher/average-monthly-surface-temperature.csv"
query_params <- list(
  v = "1",
  csvType = "filtered",
  useColumnShortNames = "true",
  tab = "chart",
  time = "2001-05-15..2024-05-15",
  country = "OWID_WRL~THA"
)

# get data
headers <- add_headers(`User-Agent` = "Our World In Data data fetch/1.0")
response <- GET(url, query = query_params, headers)
df <- read_csv(content(response, as = "text", encoding = "UTF-8"))
ggplot(df) +
  geom_line(aes(x = Day, y = temperature_2m...5, col = Entity)) +
  labs(title = "Ave monthly temperature for World & Thailand",
       x = "Date", y = "Temperature in C") +
  theme_bw() +
  theme(legend.position = "bottom")

http://statisticaloddsandends.wordpress.com/?p=8190

Extensions

What is the fused lasso?

kjytay Jan 17, 2025

The fused lasso, introduced in Tibshirani et al. (2005) (Reference 1), is an extension of the lasso. While the lasso produces sparse models, the fused lasso produces models that are not only sparse, but favor a locally constant coefficient profile. … Continue reading →

Show full content

The fused lasso, introduced in Tibshirani et al. (2005) (Reference 1), is an extension of the lasso. While the lasso produces sparse models, the fused lasso produces models that are not only sparse, but favor a locally constant coefficient profile.

Assume that we have observations $i = 1, \dots, n$ . For each observation $i$ , assume that we have feature data $\boldsymbol{x}_i = ( x_{i1}, x_{i2}, \dots, x_{ip})$ , and an observation $y_i$ which we want to predict. Both the lasso and fused lasso model the predictions as

$\begin{aligned} f(\boldsymbol{x}_i) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}. \end{aligned}$

Where the methods differ is how they determine the coefficients $\beta_0, \beta_1, \dots, \beta_p$ . The lasso estimates the coefficients as the solution to the minimization problem

$\begin{aligned} \hat\beta = \underset{\beta}{\text{argmin}} \; \left[ \frac{1}{2n} \sum_{i=1}^n \left( y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij} \right)^2 + \lambda \sum_{j=1}^p |\beta_j | \right], \end{aligned}$

where $\lambda \geq 0$ is a hyperparameter. (Note that the intercept $\beta_0$ is not included in the penalty term: this is usually the case for regularized models.) The penalty term encourages the coefficients $\beta_j$ to shrink to zero, giving sparsity.

The fused lasso estimates the coefficients as the solution to the minimization problem

$\begin{aligned} \hat\beta = \underset{\beta}{\text{argmin}} \; \left[ \frac{1}{2n} \sum_{i=1}^n \left( y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij} \right)^2 + \lambda_1 \sum_{j=1}^p |\beta_j | + \lambda_2 \sum_{j=2}^p |\beta_j - \beta_{j-1} | \right], \end{aligned}$

where $\lambda_1 \geq 0$ and $\lambda_2 \geq 0$ are hyperparameters. On top of penalizing the size of the coefficients, the fused lasso also penalizes the differences between successive coefficients. This encourages the coefficient profile (the plot of $\hat\beta_j$ vs. $j$ ) to look piecewise-constant.

The minimization problem for the fused lasso is a quadratic program and can be solved with general QP solvers. There are of course techniques to make the solver efficient: the original paper (Reference 1) has some ideas. You can fit the fused lasso in R with the fusedlasso function in the genlasso package.

Why might you want to use the fused lasso? You might want to do so when the features are ordered in some meaningful way. Note that the lasso solution is invariant to the ordering of the features whereas the fused lasso solution is not. In Reference 1, they gave a motivating example from protein mass spectroscopy. For a particular sample $i$ , they have intensity values $x_{ij}$ for several mass over charge ratios ( $m/z$ ). The goal is to find $m/z$ -sites that discriminate between healthy and sick patients. The feature number $j$ is related to the $m/z$ value, so it might make sense to have a coefficient profile that is locally constant.

References:

Tibshirani, R., et al. (2005). Sparsity and smoothness via the fused lasso.

http://statisticaloddsandends.wordpress.com/?p=8171

Extensions

What is the Cauchy combination test?

kjytay Sep 7, 2024

Set-up Assume that we have p-values . Assume that they are computed from z-scores (test statistics following normal distributions). Let and let . Without loss of generality, assume that each test statistic has variance 1. With this, we can express … Continue reading →

Show full content

Set-up

Assume that we have p-values $p_1, \dots, p_d$ . Assume that they are computed from z-scores $\mathbf{X} = (X_1, \dots, X_d)$ (test statistics following normal distributions). Let $\mathbb{E}[\mathbf{X}] = \boldsymbol{\mu}$ and let $\text{Cov}(\mathbf{X}) = \mathbf{\Sigma}$ . Without loss of generality, assume that each test statistic $X_i$ has variance 1. With this, we can express the p-values as

$\begin{aligned} p_i = 2 \left[ 1 - \Phi (|X_i|) \right], \end{aligned}$

where $\Phi$ is the CDF function of the standard normal distribution.

We are interested in testing the global null hypothesis $H_0 :\boldsymbol{\mu} = \mathbf{0}$ .

The Cauchy combination test

Assume that we have some random vector $w = (w_1, \dots, w_d)$ such that $w_i \geq 0$ for all $i$ , $w_1 + \dots + w_d = 0$ and $w$ is independent of $\mathbf{X}$ . The test statistic for the Cauchy combination test, proposed by Liu & Xie 2020 (Reference 1), is

$\begin{aligned} T(\mathbf{X}) = \sum_{i=1}^d w_i \tan \left[ \pi \left( 2 \Phi (|X_i|) - \frac{3}{2} \right) \right] = \sum_{i=1}^d w_i \tan \left[ \pi \left( \frac{1}{2} - p_i \right) \right]. \end{aligned}$

Under the global null, $p_i \sim \text{Unif}(0,1)$ for each $i$ , implying that $\tan \left[ \pi \left( \frac{1}{2} - p_i \right) \right]$ has the standard Cauchy distribution (see this previous post). If the p-values are independent, then Proposition 1 in this other previous post implies that $T(\mathbf{X})$ has the standard Cauchy distribution.

In turns out that even if the p-values are dependent, $T(\mathbf{X})$ is still approximately Cauchy! This approximation is formalized in Theorem 1 of Reference 1:

Theorem. Suppose that for any $1 \leq i < j \leq d$ , $(X_i, X_j)$ follows a bivariate normal distribution. Suppose also that $\mathbb{E}[\mathbf{X}] = \mathbf{0}$ . Let $W_0$ be a standard Cauchy random variable. Then for any fixed $d$ and any correlation matrix $\mathbf{\Sigma} \geq \mathbf{0}$ , we have

$\begin{aligned} \lim_{t \rightarrow +\infty} \dfrac{\mathbb{P}\{ T(\mathbf{X}) > t \}}{\mathbb{P} \{ W_0 > t \} } = 1. \end{aligned}$

The theorem says that the test statistic $T(\mathbf{X})$ has approximately a Cauchy tail even under dependency structures in $\mathbf{X}$ . Knowing the (approximate) distribution of $T(\mathbf{X})$ under the global null allows us to use it as a test statistic.

Other notes

The “bivariate normal distribution” condition is a bit of a technical assumption, the authors claim it is a mild assumption.
This result bears a lot of similarity with a result by Pillai & Meng 2016 (see this previous post for a description). Section 2.2 of Reference 1 discusses the similarities and the differences.
Theorem 1 above holds for fixed $d$ (number of p-values). Section 2.3 of Reference 1 has a high-dimensional asymptotic result where $d = o(t^c)$ with $0 < c < 1/2$ .
The Cauchy combination test is especially powerful when only a small number of $\mu_i$ ‘s are large, or equivalently when a smaller number of $p_i$ ‘s are very small. We can see this intuitively: small $p_i$ ‘s become very large $\tan [\pi(1/2 - p_i)]$ ‘s, so the test statistic will be dominated by a few very large p-values. See Section 4.2 of Reference 1 for a power comparison study.

References:

Liu, Y., and Xie, J. (2020). Cauchy Combination Test: A Powerful Test With Analytic p-Value Calculation Under Arbitrary Dependency Structures.

http://statisticaloddsandends.wordpress.com/?p=8147

Extensions

An interesting result involving the Cauchy distribution

kjytay Sep 7, 2024

If are independent variables and is a random vector independent of the ‘s with for all and , it is well-known that also has a distribution. It is also well-known that if and are independent standard normal variables, then has … Continue reading →

Show full content

If $Z_1, \dots, Z_n$ are independent $\text{Cauchy}(0, 1)$ variables and $w= (w_1, \dots, w_n)$ is a random vector independent of the $Z_i$ ‘s with $w_i \geq 0$ for all $i$ and $w_1 + \dots w_n = 0$ , it is well-known that $\displaystyle\sum_{i=1}^n w_i Z_i$ also has a $\text{Cauchy}(0, 1)$ distribution.

It is also well-known that if $X$ and $Y$ are independent standard normal variables, then $X / Y$ has a $\text{Cauchy}(0, 1)$ distribution.

Putting these two facts together, we have the following proposition:

Proposition 1. Let $\mathbf{X} = (X_1, \dots, X_n)$ and $\mathbf{Y} = (Y_1, \dots, Y_n)$ be i.i.d. $\mathcal{N}(\mathbf{0}, \mathbf{\Sigma})$ , where $\mathbf{\Sigma}$ is a diagonal matrix with strictly positive entries on the diagonal. Let $w = (w_1, \dots, w_n)$ be a random vector independent of $(\mathbf{X},\mathbf{Y})$ such that $w_i \geq 0$ for all $i$ and $w_1 + \dots w_n = 0$ . Then

$\begin{aligned} \sum_{i=1}^n w_i \dfrac{X_j}{Y_j} \sim \text{Cauchy}(0,1). \end{aligned}$

Interestingly, the result above holds for any covariance matrix $\mathbf{\Sigma}$ with positive entries on the diagonal! This was proved in Pillai & Meng 2016 (Reference 1). In other words, we can have any dependence structure between the elements of $\mathbf{X}$ (and likewise for $\mathbf{Y}$ , but as long as the dependence structure for the two random vectors is the same, the linear combination of their ratios remains Cauchy-distributed! Here is the formal statement:

Proposition 2. Let $\mathbf{X} = (X_1, \dots, X_n)$ and $\mathbf{Y} = (Y_1, \dots, Y_n)$ be i.i.d. $\mathcal{N}(\mathbf{0}, \mathbf{\Sigma})$ , where $\mathbf{\Sigma}$ is any covariance matrix with strictly positive entries on the diagonal. Let $w = (w_1, \dots, w_n)$ be a random vector independent of $(\mathbf{X},\mathbf{Y})$ such that $w_i \geq 0$ for all $i$ and $w_1 + \dots w_n = 0$ . Then

$\begin{aligned} \sum_{i=1}^n w_i \dfrac{X_j}{Y_j} \sim \text{Cauchy}(0,1). \end{aligned}$

References:

Pillai, N. S., and Meng, X.-L. (2016). An Unexpected Encounter with Cauchy and Lévy.

http://statisticaloddsandends.wordpress.com/?p=8136

Extensions

Converting a uniform distribution into a Cauchy distribution

kjytay Aug 29, 2024

The following proposition demonstrates how one can transform a uniform distribution on into a standard Cauchy distribution: Proposition. If and , then has standard Cauchy distribution. Proof: For any , where the second-last equality is due to the fact that … Continue reading →

Show full content

The following proposition demonstrates how one can transform a uniform distribution on $(0, 1)$ into a standard Cauchy distribution:

Proposition. If $U \sim \text{Unif}(0,1)$ and $X = \tan \left[ \pi \left( \frac{1}{2} - U \right)\right]$ , then $X$ has standard Cauchy distribution.

Proof: For any $x \in \mathbb{R}$ ,

$\begin{aligned} \mathbb{P} \left\{ X \leq x \right\} &= \mathbb{P} \left\{ \tan \left[ \pi \left( \frac{1}{2} - U \right)\right] \leq x \right\} \\ &= \mathbb{P} \left\{ \frac{1}{2} - U \leq \frac{1}{\pi}\tan^{-1} x \right\} \\ &= \mathbb{P} \left\{ \frac{1}{2} - \frac{1}{\pi}\tan^{-1} x \leq U \right\} \\ &= 1 - \left[ \frac{1}{2} - \frac{1}{\pi}\tan^{-1} x \right] \\ &= \frac{1}{2} + \frac{1}{\pi}\tan^{-1} x, \end{aligned}$

where the second-last equality is due to the fact that $\tan^{-1} x \in (-\frac{1}{2}, \frac{1}{2})$ , so the LHS inside the probability is always in $(0, 1)$ . In other words, $X$ has the CDF of the standard Cauchy distribution.

We could also differentiate the CDF above w.r.t. $x$ to get the PDF of $X$ , which is more easily recognizable as the PDF of the standard Cauchy:

$\begin{aligned} f_X(x) &= \dfrac{d}{dx} \left[ \frac{1}{2} + \frac{1}{\pi}\tan^{-1} x \right] \\ &= \frac{1}{\pi (1 + x^2)}. \end{aligned}$

(Alternatively, one could use the formulas in this previous post with minor modifications since the transformation here is monotonically decreasing, not increasing.)

http://statisticaloddsandends.wordpress.com/?p=8120

Extensions