April 25, 2026
The Fréchet Mean Prototype

The Fréchet Mean Prototype is the point in a metric space that minimizes the sum of squared distances to all data points.

When the data being processed are not simple vectors (e.g., complex matrices, probability distributions, points on a curved surface) and cannot be directly summed and divided by $N$, the Fréchet mean becomes necessary.

The core value of the Fréchet mean prototype is that it generalizes the "averaging" operation from flat vector spaces to arbitrary curved spaces or complex objects, enabling many classical statistical methods and machine learning algorithms to handle non-vector data.
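As a concrete illustration (a sketch under assumptions of my own, not an example from the text), consider the Fréchet mean of angles on the unit circle, where "distance" is geodesic arc length rather than Euclidean distance. A brute-force grid search over the minimization objective suffices for this toy case; the sample angles are made up:

```python
import numpy as np

def geodesic_dist(a, b):
    """Shorter arc length between two angles on the unit circle."""
    d = np.abs(a - b) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def frechet_mean_circle(angles, grid_size=100000):
    """Grid search for argmin_x sum_i d(x, angle_i)^2 over [0, 2*pi)."""
    grid = np.linspace(0, 2 * np.pi, grid_size, endpoint=False)
    dists = geodesic_dist(grid[:, None], np.asarray(angles)[None, :])
    return grid[np.argmin(np.sum(dists ** 2, axis=1))]

# Three angles clustered near 0 (6.2 is just below 2*pi, i.e. about -0.08)
angles = [0.1, 0.2, 6.2]
print(frechet_mean_circle(angles))  # close to 0.07, not the naive average ~2.17
```

The naive arithmetic mean of these angles is about 2.17, far from all three points; the Fréchet mean respects the circle's geometry and lands inside the cluster.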

Mean Prototype

The Mean Prototype of a set of vectors refers to a new vector obtained by arithmetically averaging each corresponding component of the vectors.

More specifically: Suppose we have $m$ $n$-dimensional vectors:

$$\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_m$$

where each vector $\mathbf{v}_i = (v_{i1}, v_{i2}, \dots, v_{in})$.

The mean prototype $\boldsymbol{\mu}$ is simply defined as:

$$\boldsymbol{\mu} = \frac{1}{m} \sum_{i=1}^{m} \mathbf{v}_i$$

That is:

$$\boldsymbol{\mu} = \left( \frac{1}{m}\sum_{i=1}^{m} v_{i1},\ \frac{1}{m}\sum_{i=1}^{m} v_{i2},\ \dots,\ \frac{1}{m}\sum_{i=1}^{m} v_{in} \right)$$

Intuitively, the mean vector is the "center point" or "average position" of the set of vectors. In data analysis and machine learning (e.g., clustering, PCA), it is commonly used to represent the overall trend of a dataset. The Mean Prototype can be seen as the optimal representative point under Euclidean geometry and Gaussian assumptions. If your samples are embedded on a unit sphere or follow a von Mises–Fisher (vMF) distribution, the Euclidean mean is no longer optimal.
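As a quick concrete check, the component-wise mean takes one line of NumPy; the $3 \times 2$ data matrix here is a made-up example, one vector per row:

```python
import numpy as np

# Made-up data: m = 3 vectors (rows) in n = 2 dimensions
V = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

mu = V.mean(axis=0)  # average each coordinate over the m rows
print(mu)  # [3. 4.]
```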

Geometric Median Prototype

The Geometric Median Prototype refers to: For a set of vectors (data points), find a point that minimizes the sum of Euclidean distances to all data points. This point is the geometric median of the set of vectors and serves as the "prototype" representing the entire dataset.

Formal definition: Given $m$ $n$-dimensional vectors $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_m$, the geometric median $\mathbf{p}$ satisfies:

$$\mathbf{p} = \arg\min_{\mathbf{x} \in \mathbb{R}^n} \sum_{i=1}^m \|\mathbf{x} - \mathbf{v}_i\|_2$$

where $\|\cdot\|_2$ is the Euclidean ($L_2$) norm.

Intuitively, in 1D: the geometric median is the ordinary median (minimizing the sum of absolute deviations). Extended to multiple dimensions: it is a point in space that minimizes the "total straight-line distance" to all data points. Imagine finding a location on a plane that minimizes the total travel distance to all given points — similar to finding a gathering point that minimizes total travel distance among several cities.

The Geometric Median Prototype is highly robust, which is its most important characteristic — even if the data contains a few outliers, the geometric median still reflects the center of the "majority" of points. It is equivariant under rotation and translation: if the data is rotated or translated, the geometric median changes accordingly. It can lie anywhere between data points (unlike K-medoids, which can only select actual data points).

Since there is no closed-form solution, the Weiszfeld algorithm (iteratively reweighted average) is commonly used:

$$\mathbf{p}_{t+1} = \frac{\sum_{i=1}^m \frac{\mathbf{v}_i}{\|\mathbf{p}_t - \mathbf{v}_i\|}}{\sum_{i=1}^m \frac{1}{\|\mathbf{p}_t - \mathbf{v}_i\|}}$$

The initial value can be the mean vector, iterating until convergence. Note: special handling is required when $\mathbf{p}_t$ coincides exactly with a data point (skip that point).
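A minimal NumPy sketch of the Weiszfeld iteration above (one possible implementation, not the author's), starting from the mean and skipping points that coincide with the current iterate; the sample points, including one extreme outlier, are made up:

```python
import numpy as np

def geometric_median(V, tol=1e-9, max_iter=1000):
    """Weiszfeld iteration: iteratively reweighted average of the rows of V."""
    p = V.mean(axis=0)                        # initialize at the mean vector
    for _ in range(max_iter):
        d = np.linalg.norm(V - p, axis=1)     # distances to all data points
        mask = d > 1e-12                      # skip points coinciding with p
        w = 1.0 / d[mask]
        p_new = (w[:, None] * V[mask]).sum(axis=0) / w.sum()
        if np.linalg.norm(p_new - p) < tol:
            return p_new
        p = p_new
    return p

# Made-up demo: three clustered points plus one extreme outlier
V = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [100.0, 100.0]])
print(geometric_median(V))  # close to [0.5, 0.5]; the mean would be [25.25, 25.25]
```

The outlier drags the mean far from the cluster, while the geometric median stays at the cluster's center, illustrating the robustness discussed above.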

It is used to represent "typical samples" in place of the mean when data contains outliers or noise. For example, it serves as the cluster center in some clustering algorithms, yielding a more robust alternative to K-means.

In video background modeling, the geometric median of multiple frames can serve as the background (unaffected by foreground moving objects). It is also used to find the optimal service point among multiple demand points (e.g., hospital, warehouse location).

The Geometric Median Prototype is the "spatial center" of data points, minimizing the total Euclidean distance to all points, and is more robust to outliers than the mean vector.

The $L_p$ Norm

The Euclidean norm is the most familiar vector "distance". Different norms essentially define what distance or magnitude means, and that definition directly changes the geometric structure, optimization behavior, and even model preferences. In general, we work with the $L_p$ norm, where $p$ is a parameter.

For an $n$-dimensional vector $\mathbf{x} = (x_1, x_2, \dots, x_n)$, the $L_p$ norm is defined as:

$$\|\mathbf{x}\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}, \quad p \ge 1$$

Different values of $p$ yield different norms. When $p = 1$, we obtain the $L_1$ norm:

$$\|\mathbf{x}\|_1 = \sum_{i=1}^{n} |x_i| = |x_1| + |x_2| + \dots + |x_n|$$

Intuitively, it is the sum of the absolute values of the components. In a 2D plane, the $L_1$ distance is the Manhattan distance (moving only along coordinate axes). For example, $\mathbf{x} = (3, -4)$ has $L_1$ norm $\|\mathbf{x}\|_1 = |3| + |-4| = 3 + 4 = 7$.

When $p = 2$, we obtain the $L_2$ norm:

$$\|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2} = \sqrt{x_1^2 + x_2^2 + \dots + x_n^2}$$

Intuitively, it is the straight-line distance from the origin to the point. For example, $\mathbf{x} = (3, -4)$ has $L_2$ norm $\|\mathbf{x}\|_2 = \sqrt{3^2 + (-4)^2} = \sqrt{9 + 16} = 5$. Geometrically, the vectors with $\|\mathbf{x}\|_2 = 1$ lie on the unit circle.

In summary, the $L_1$ norm is the sum of absolute values per coordinate (Manhattan distance) and yields sparse solutions; the $L_2$ norm is the square root of the sum of squares (Euclidean distance), smooth and easy to optimize. The general $L_p$ norm is the $1/p$-th power of the sum of the $p$-th powers of the absolute components, approaching the maximum absolute component ($L_\infty$) as $p \to \infty$.
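The worked example $\mathbf{x} = (3, -4)$ can be checked with NumPy's norm routine, whose `ord` argument selects $p$:

```python
import numpy as np

x = np.array([3.0, -4.0])

l1 = np.linalg.norm(x, 1)         # |3| + |-4| = 7  (Manhattan)
l2 = np.linalg.norm(x, 2)         # sqrt(9 + 16) = 5  (Euclidean)
l3 = np.linalg.norm(x, 3)         # (27 + 64)**(1/3), about 4.498
linf = np.linalg.norm(x, np.inf)  # max(|3|, |-4|) = 4, the p -> infinity limit

print(l1, l2, l3, linf)  # 7.0 5.0 4.497... 4.0
```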

Coordinate-wise Median Prototype

The Coordinate-wise Median Prototype refers to: For a multi-dimensional dataset, compute the median on each dimension (coordinate) separately, and then combine these medians into a new vector that serves as the "prototype" or central representative of the entire dataset.

Suppose we have $m$ $n$-dimensional vectors $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_m$, where

$$\mathbf{v}_i = (v_{i1}, v_{i2}, \dots, v_{in})$$

The Coordinate-wise Median Prototype $\mathbf{p} = (p_1, p_2, \dots, p_n)$ is defined as:

$$p_j = \text{median}(v_{1j}, v_{2j}, \dots, v_{mj}), \quad j = 1, 2, \dots, n$$

That is, each component independently takes the median of all data points on that dimension.

If the data is rotated by 45 degrees, the coordinate-wise median of the rotated data generally differs from the rotated original coordinate-wise median (it is not rotation equivariant), whereas the geometric median rotates with the data.
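A quick numerical check of this non-equivariance, using a made-up 2D point set and a 45-degree rotation matrix:

```python
import numpy as np

# Made-up 2D points, one per row
V = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 4.0], [5.0, 1.0]])

theta = np.pi / 4  # rotate the data by 45 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

cwm_then_rotate = R @ np.median(V, axis=0)    # rotate the original prototype
rotate_then_cwm = np.median(V @ R.T, axis=0)  # prototype of the rotated data

# The two disagree: the coordinate-wise median is not rotation equivariant.
print(cwm_then_rotate, rotate_then_cwm)
```

For this data the two results differ (roughly $(0, 1.41)$ versus $(0, 0.71)$), while the geometric median would satisfy `median(R x) = R median(x)` exactly.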

Computationally, it only requires sorting per dimension, $O(m \log m)$, or a selection algorithm, $O(m)$, and it can resist up to 50% of arbitrary extreme outliers in each coordinate.

However, the choice of coordinate axes affects the result. If the dimensions are correlated (e.g., the data forms a tilted ellipse), the coordinate-wise median can deviate substantially from the true center. It necessarily lies within the value range of each dimension, so it remains within the axis-aligned bounding box of the data, but not necessarily within the convex hull of the data points (because each dimension is chosen independently). Its optimization objective, the sum of $L_1$ (Manhattan) distances, is separable per coordinate and tied to the chosen axes, unlike the rotation-invariant sum of Euclidean distances minimized by the geometric median.

It is a good choice when the data contains outliers and the dimensions are relatively independent.

For example, in 3D space, suppose we have four points stored as the columns of a $3 \times 4$ matrix:

$$\begin{pmatrix} 5 & 3 & 7 & 4 \\ 8 & 5 & 6 & 7 \\ 12 & 15 & 10 & 13 \end{pmatrix}$$

The median of the 1st dimension is $4.5$, the 2nd dimension $6.5$, and the 3rd dimension $12.5$.

Its Coordinate-wise Median Prototype is $(4.5, 6.5, 12.5)^T$.
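The worked example can be reproduced in a couple of lines of NumPy; each column of the matrix is one 3-dimensional point, so the per-dimension medians are taken across columns (`axis=1`):

```python
import numpy as np

# The 3x4 example matrix: rows are dimensions, columns are data points
V = np.array([[ 5,  3,  7,  4],
              [ 8,  5,  6,  7],
              [12, 15, 10, 13]])

p = np.median(V, axis=1)  # median of each row (dimension)
print(p)  # [ 4.5  6.5 12.5]
```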

Extending to General $L_p$ Norms

Having defined the $L_p$ norm, let's revisit the definitions of the mean prototype, geometric median prototype, and coordinate-wise median prototype. Each can be expressed as the minimizer of a norm-based objective.

The mean prototype is the point that minimizes the sum of squared Euclidean ($L_2$) distances to all points:

$$\boldsymbol{\mu} = \arg\min_{\mathbf{x} \in \mathbb{R}^n} \sum_{i=1}^m \|\mathbf{x} - \mathbf{v}_i\|_2^2$$

The geometric median prototype is the point that minimizes the sum of Euclidean ($L_2$) distances to all points:

$$\mathbf{p} = \arg\min_{\mathbf{x} \in \mathbb{R}^n} \sum_{i=1}^m \|\mathbf{x} - \mathbf{v}_i\|_2$$

The coordinate-wise median prototype is the point that minimizes the sum of Manhattan ($L_1$) distances to all points:

$$\mathbf{p} = \arg\min_{\mathbf{x} \in \mathbb{R}^n} \sum_{i=1}^m \|\mathbf{x} - \mathbf{v}_i\|_1$$

At this point, we can unify all the above concepts:

$$\mathbf{p} = \arg\min_{\mathbf{x} \in \mathbb{R}^n} \sum_{i=1}^m \|\mathbf{x} - \mathbf{v}_i\|_p^q$$

where $p$ selects the distance norm and $q$ is the power to which that distance is raised. $p$ determines how distance is measured, while $q$ determines the penalty on outliers: the larger $q$, the more sensitive the prototype is to outliers. Setting $(p, q) = (2, 2)$ recovers the mean, $(2, 1)$ the geometric median, and $(1, 1)$ the coordinate-wise median.

This formula also describes the core objective function of the Generalized Weber Location Problem.
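As an illustrative sketch (one of several possible solvers, not one prescribed by the text), the unified objective can be minimized by plain subgradient descent; the step size, iteration count, and sample data below are arbitrary choices:

```python
import numpy as np

def lpq_prototype(V, p=2.0, q=1.0, lr=0.05, steps=2000):
    """Minimize sum_i ||x - v_i||_p^q over x by (sub)gradient descent."""
    x = V.mean(axis=0)
    for _ in range(steps):
        diff = x - V                                     # shape (m, n)
        norms = np.sum(np.abs(diff) ** p, axis=1) ** (1.0 / p)
        norms = np.maximum(norms, 1e-12)                 # guard divisions
        # d/dx_j ||x - v||_p^q = q * ||x - v||_p^(q-p) * |x_j - v_j|^(p-1) * sign(x_j - v_j)
        grad = (q * norms ** (q - p))[:, None] * np.abs(diff) ** (p - 1) * np.sign(diff)
        x = x - lr * grad.sum(axis=0)
    return x

# Made-up demo: three clustered points plus one outlier
V = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
print(lpq_prototype(V, p=2, q=2))  # the mean, [2.75, 2.75]
print(lpq_prototype(V, p=2, q=1))  # the geometric median, close to [0.5, 0.5]
```

Varying `q` makes the sensitivity claim concrete: with $q = 2$ the outlier drags the prototype to $(2.75, 2.75)$, while $q = 1$ keeps it inside the cluster.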