Box-Muller Transformation

12 Sep, 2024

The Box-Muller transformation is a method to construct a random variable that follows a Gaussian distribution by using a random variable that follows a uniform distribution.

Content

The Box-Muller transform is a method used to generate normally distributed random variables from uniformly distributed random variables.

Specifically, it involves selecting two random variables, $U_1$ and $U_2$ , which are uniformly distributed over the interval ([0,1]). The random variables $X$ and $Y$ are constructed as follows:

X = \cos(2\pi U_1) \sqrt{-2\ln U_2}

Y = \sin(2\pi U_1) \sqrt{-2\ln U_2}

Then, $X$ and $Y$ follow a normal distribution with a mean of 0 and a variance of 1.

Derivation

Assume that $X$ and $Y$ are independent, normally distributed random variables with a mean of 0 and a variance of 1. Let $p(X)$ and $p(Y)$ be their probability density functions. Thus, we have:

p(X) = \frac{1}{\sqrt{2\pi}} e^{-\frac{X^2}{2}}, \\ p(Y) = \frac{1}{\sqrt{2\pi}} e^{-\frac{Y^2}{2}}

Since $X$ and $Y$ are independent, their joint probability density function is:

p(X,Y) = \frac{1}{2\pi} e^{-\frac{X^2 + Y^2}{2}}

We now perform a coordinate transformation by defining:

X = R \cos(\theta), \\ Y = R \sin(\theta)

Thus, the joint distribution in polar coordinates becomes:

\int_{0}^{\infty} \int_{0}^{2\pi} \frac{1}{2\pi} e^{-\frac{R^2}{2}} R \, d\theta \, dR = 1

From this, we can derive the distribution functions for $R$ and $\theta$ , denoted as $P_R$ and $P_\theta$ :

P_R(R \leq r) = \\ \int_{0}^{r} \int_{0}^{2\pi} \frac{1}{2\pi} e^{-\frac{R^2}{2}} R \, d\theta \, dR = 1 - e^{-\frac{r^2}{2}}

P_\theta(\theta \leq \phi) = \\ \int_{0}^{\phi} \int_{0}^{\infty} \frac{1}{2\pi} e^{-\frac{R^2}{2}} R \, dR \, d\theta = \frac{\phi}{2\pi}

It is evident that $\theta$ follows a uniform distribution over $[0, 2\pi]$ . Let

F_R = 1 - e^{-\frac{r^2}{2}}

The inverse function is given by:

R = F_R^{-1}(z) \\ = \sqrt{-2 \ln(1 - z)}

When $z$ is uniformly distributed over $[0,1]$ , the distribution function of $R$ is $F_R(r)$ . Therefore, we can select two random variables $U_1$ and $U_2$ , uniformly distributed over $[0,1]$ , such that:

\theta = 2\pi U_1, \\ 1 - z = U_2, \\ R = \sqrt{-2 \ln U_2}

Substituting these into the earlier expressions, we obtain:

X = R \cos(\theta), \\ Y = R \sin(\theta)

Thus, the original expressions for $X$ and $Y$ are recovered, where they follow a normal distribution with mean 0 and variance 1.

Verification

To generate a random variable $X \sim N(0,1)$ following a standard normal distribution, we note that the inverse of the error function $\mathrm{erf}^{-1}$ is not elementary. Using a series expansion to approximate this inverse can introduce significant computational error.

Elementary functions, on the other hand, have well-established high-precision algorithms. Thus, we aim to construct a normally distributed random variable from a uniformly distributed one using elementary functions.

Let $U, V \in (0,1)$ be uniformly distributed random variables. Consider the following transformations:

X = \sqrt{-2\ln U} \cos(2\pi V), \\ Y = \sqrt{-2\ln U} \sin(2\pi V)

Next, we verify that $X$ and $Y$ are independent and follow a standard normal distribution.

Define:

R = \sqrt{X^2 + Y^2}, \\ \Theta = \arctan\left(\frac{Y}{X}\right)

Note that $\Theta = 2\pi V$ and $R = \sqrt{-2\ln U}$ . We now show that $R$ and $\Theta$ , under the polar coordinate transformation, are independent and that they yield standard normal random variables.

Consider the probability density function of one variable when the other

is fixed. The probability density function of $\Theta$ is:

P_\Theta(\Theta) = \frac{1}{2\pi}

which is straightforward. For $R$ , the cumulative distribution function is:

P(R \leq R_0) = \\ P\left(U \leq \exp\left(\frac{-R_0^2}{2}\right)\right)

Thus, the probability density function of $R$ is:

P_R(R) = R e^{-\frac{R^2}{2}}

Hence, the joint probability density of $R$ and $\Theta$ is:

P(R, \Theta) = \frac{1}{2\pi} R e^{-\frac{R^2}{2}}

Since the polar coordinates yield:

R \, dR \, d\Theta = dX \, dY

we obtain the joint probability density of $X$ and $Y$ :

P(X,Y) = \frac{1}{2\pi} e^{-\frac{X^2 + Y^2}{2}} \\ = \left(\frac{1}{\sqrt{2\pi}} e^{-\frac{X^2}{2}}\right) \left(\frac{1}{\sqrt{2\pi}} e^{-\frac{Y^2}{2}}\right)

This is precisely the probability density function of two independent, normally distributed random variables $X$ and $Y$ , each following a standard normal distribution.

Code Implementation

function randInterval(): number {
  return Math.random();
}

function boxMuller(mu: number, sigma: number): [number, number] {
  const u = randInterval();
  const v = randInterval();
  const x = Math.cos(2 * Math.PI * u) * Math.sqrt(-2 * Math.log(v));
  const y = Math.sin(2 * Math.PI * u) * Math.sqrt(-2 * Math.log(v));
  return [x * sigma + mu, y * sigma + mu];
}

// Usage example
const [value1, value2] = boxMuller(0, 1);
console.log(value1, value2);
// Outputs two normally distributed random numbers

To verify that the boxMuller function generates numbers that follow a normal distribution, one can generate a large set of random numbers and plot a histogram to compare with a standard normal distribution. You can use libraries such as plotly to visualize the histogram and evaluate the output.

Here is a test program that generates a dataset using the boxMuller function and performs basic statistical tests, along with plotting the histogram. Assuming you are running in a Node.js environment, you can use plotly-nodejs or directly use plotly.js in the browser to generate the graph.

First, install the required dependencies:

npm i plotly.js-dist

Then, execute the following code:

import * as fs from "fs";
import * as plotly from "plotly.js-dist";

// Generate random numbers following a normal distribution using Box-Muller
function randInterval(): number {
  return Math.random();
}

function boxMuller(mu: number, sigma: number): [number, number] {
  const u = randInterval();
  const v = randInterval();
  const x = Math.cos(2 * Math.PI * u) * Math.sqrt(-2 * Math.log(v));
  const y = Math.sin(2 * Math.PI * u) * Math.sqrt(-2 * Math.log(v));
  return [x * sigma + mu, y * sigma + mu];
}

// Generate n normally distributed random numbers
function generateNormalDistribution(
  mu: number,
  sigma: number,
  n: number
): number[] {
  const values: number[] = [];
  for (let i = 0; i < n / 2; i++) {
    const [value1, value2] = boxMuller(mu, sigma);
    values.push(value1, value2);
  }
  return values;
}

// Generate the data
const mu = 0;
const sigma = 1;
const sampleSize = 10000; // Sample size
const values = generateNormalDistribution(mu, sigma, sampleSize);

// Calculate the mean and variance of the generated random numbers
const mean = values.reduce((acc, val) => acc + val, 0) / values.length;
const variance =
  values.reduce((acc, val) => acc + (val - mean) ** 2, 0) / values.length;

console.log(`Mean: ${mean}`);
console.log(`Variance: ${variance}`);

// Plot the histogram
const trace = {
  x: values,
  type: "histogram",
  xbins: {
    size: 0.1, // Bin size for the histogram
  },
  marker: {
    color: "blue",
  },
  opacity: 0.7,
  name: "Box-Muller Distribution",
};

const data = [trace];

const layout = {
  title: "Generated Normal Distribution",
  xaxis: { title: "Value" },
  yaxis: { title: "Frequency" },
  bargap: 0.05,
};

const graphOptions = {
  filename: "box-muller-distribution",
  fileopt: "overwrite",
};
fs.writeFileSync("box-muller.html", plotly.plot(data, layout, graphOptions));

console.log("Box-Muller distribution data saved as box-muller.html.");

This program uses the boxMuller function to generate 10,000 random numbers that follow a normal distribution. It calculates the mean and variance of the generated dataset. The random numbers are visualized using a histogram, allowing for a direct comparison with a standard normal distribution. The histogram is saved as a file named box-muller.html, which can be opened in a browser for viewing.

MY BLOG

Content

Derivation

Verification

Code Implementation