April 25, 2026
The Fréchet Mean Prototype

The Fréchet Mean Prototype is the point in a metric space that minimizes the sum of squared distances to all data points.

When the data being processed are not simple vectors (e.g., complex matrices, probability distributions, points on a curved surface) and cannot be directly summed and divided by $N$, the Fréchet mean becomes necessary.

The core value of the Fréchet mean prototype is that it generalizes the "averaging" operation from flat vector spaces to arbitrary curved spaces or complex objects, enabling many classical statistical methods and machine learning algorithms to handle non-vector data.
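As a concrete illustration (a sketch under assumptions of my own, not an example from the text), consider the Fréchet mean of angles on the unit circle, where "distance" is geodesic arc length rather than Euclidean distance. A brute-force grid search over the minimization objective suffices for this toy case; the sample angles are made up:

```python
import numpy as np

def geodesic_dist(a, b):
    """Shorter arc length between two angles on the unit circle."""
    d = np.abs(a - b) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def frechet_mean_circle(angles, grid_size=100000):
    """Grid search for argmin_x sum_i d(x, angle_i)^2 over [0, 2*pi)."""
    grid = np.linspace(0, 2 * np.pi, grid_size, endpoint=False)
    dists = geodesic_dist(grid[:, None], np.asarray(angles)[None, :])
    return grid[np.argmin(np.sum(dists ** 2, axis=1))]

# Three angles clustered near 0 (6.2 is just below 2*pi, i.e. about -0.08)
angles = [0.1, 0.2, 6.2]
print(frechet_mean_circle(angles))  # close to 0.07, not the naive average ~2.17
```

The naive arithmetic mean of these angles is about 2.17, far from all three points; the Fréchet mean respects the circle's geometry and lands inside the cluster.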

Mean Prototype

The Mean Prototype of a set of vectors refers to a new vector obtained by arithmetically averaging each corresponding component of the vectors.

More specifically: Suppose we have $m$ $n$-dimensional vectors:

$$\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_m$$

where each vector $\mathbf{v}_i = (v_{i1}, v_{i2}, \dots, v_{in})$.

The mean prototype $\boldsymbol{\mu}$ is simply defined as:

$$\boldsymbol{\mu} = \frac{1}{m} \sum_{i=1}^{m} \mathbf{v}_i$$

That is:

$$\boldsymbol{\mu} = \left( \frac{1}{m}\sum_{i=1}^{m} v_{i1},\ \frac{1}{m}\sum_{i=1}^{m} v_{i2},\ \dots,\ \frac{1}{m}\sum_{i=1}^{m} v_{in} \right)$$

Intuitively, the mean vector is the "center point" or "average position" of the set of vectors. In data analysis and machine learning (e.g., clustering, PCA), it is commonly used to represent the overall trend of a dataset. The Mean Prototype can be seen as the optimal representative point under Euclidean geometry and Gaussian assumptions. If your samples are embedded on a unit sphere or follow a von Mises–Fisher (vMF) distribution, the Euclidean mean is no longer optimal.
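As a quick concrete check, the component-wise mean takes one line of NumPy; the $3 \times 2$ data matrix here is a made-up example, one vector per row:

```python
import numpy as np

# Made-up data: m = 3 vectors (rows) in n = 2 dimensions
V = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

mu = V.mean(axis=0)  # average each coordinate over the m rows
print(mu)  # [3. 4.]
```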

Geometric Median Prototype

The Geometric Median Prototype refers to: For a set of vectors (data points), find a point that minimizes the sum of Euclidean distances to all data points. This point is the geometric median of the set of vectors and serves as the "prototype" representing the entire dataset.

Formal definition: Given $m$ $n$-dimensional vectors $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_m$, the geometric median $\mathbf{p}$ satisfies:

$$\mathbf{p} = \arg\min_{\mathbf{x} \in \mathbb{R}^n} \sum_{i=1}^m \|\mathbf{x} - \mathbf{v}_i\|_2$$

where $\|\cdot\|_2$ is the Euclidean ($L_2$) norm.

Intuitively, in 1D: the geometric median is the ordinary median (minimizing the sum of absolute deviations). Extended to multiple dimensions: it is a point in space that minimizes the "total straight-line distance" to all data points. Imagine finding a location on a plane that minimizes the total travel distance to all given points — similar to finding a gathering point that minimizes total travel distance among several cities.

The Geometric Median Prototype is highly robust, which is its most important characteristic — even if the data contains a few outliers, the geometric median still reflects the center of the "majority" of points. It is equivariant under rotation and translation: if the data is rotated or translated, the geometric median changes accordingly. It can lie anywhere between data points (unlike K-medoids, which can only select actual data points).

Since there is no closed-form solution, the Weiszfeld algorithm (iteratively reweighted average) is commonly used:

$$\mathbf{p}_{t+1} = \frac{\sum_{i=1}^m \frac{\mathbf{v}_i}{\|\mathbf{p}_t - \mathbf{v}_i\|}}{\sum_{i=1}^m \frac{1}{\|\mathbf{p}_t - \mathbf{v}_i\|}}$$

The initial value can be the mean vector, iterating until convergence. Note: special handling is required when $\mathbf{p}_t$ coincides exactly with a data point (skip that point).
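A minimal NumPy sketch of the Weiszfeld iteration above (one possible implementation, not the author's), starting from the mean and skipping points that coincide with the current iterate; the sample points, including one extreme outlier, are made up:

```python
import numpy as np

def geometric_median(V, tol=1e-9, max_iter=1000):
    """Weiszfeld iteration: iteratively reweighted average of the rows of V."""
    p = V.mean(axis=0)                        # initialize at the mean vector
    for _ in range(max_iter):
        d = np.linalg.norm(V - p, axis=1)     # distances to all data points
        mask = d > 1e-12                      # skip points coinciding with p
        w = 1.0 / d[mask]
        p_new = (w[:, None] * V[mask]).sum(axis=0) / w.sum()
        if np.linalg.norm(p_new - p) < tol:
            return p_new
        p = p_new
    return p

# Made-up demo: three clustered points plus one extreme outlier
V = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [100.0, 100.0]])
print(geometric_median(V))  # close to [0.5, 0.5]; the mean would be [25.25, 25.25]
```

The outlier drags the mean far from the cluster, while the geometric median stays at the cluster's center, illustrating the robustness discussed above.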

It is used to represent "typical samples" in place of the mean when data contains outliers or noise. For example, it serves as the cluster center in some clustering algorithms, yielding a more robust alternative to K-means.

In video background modeling, the geometric median of multiple frames can serve as the background (unaffected by foreground moving objects). It is also used to find the optimal service point among multiple demand points (e.g., hospital, warehouse location).

The Geometric Median Prototype is the "spatial center" of data points, minimizing the total Euclidean distance to all points, and is more robust to outliers than the mean vector.

The $L_p$ Norm

The Euclidean norm is the most familiar vector "distance". Different norms essentially define what distance or magnitude means, and that definition directly changes the geometric structure, optimization behavior, and even model preferences. In general, we work with the $L_p$ norm, where $p$ is a parameter.

For an $n$-dimensional vector $\mathbf{x} = (x_1, x_2, \dots, x_n)$, the $L_p$ norm is defined as:

$$\|\mathbf{x}\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}, \quad p \ge 1$$

Different values of $p$ yield different norms. When $p = 1$, we obtain the $L_1$ norm:

$$\|\mathbf{x}\|_1 = \sum_{i=1}^{n} |x_i| = |x_1| + |x_2| + \dots + |x_n|$$

Intuitively, it is the sum of the absolute values of the components. In a 2D plane, the $L_1$ distance is the Manhattan distance (moving only along coordinate axes). For example, $\mathbf{x} = (3, -4)$ has $L_1$ norm $\|\mathbf{x}\|_1 = |3| + |-4| = 3 + 4 = 7$.

When $p = 2$, we obtain the $L_2$ norm:

$$\|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2} = \sqrt{x_1^2 + x_2^2 + \dots + x_n^2}$$

Intuitively, it is the straight-line distance from the origin to the point. For example, $\mathbf{x} = (3, -4)$ has $L_2$ norm $\|\mathbf{x}\|_2 = \sqrt{3^2 + (-4)^2} = \sqrt{9 + 16} = 5$. Geometrically, the vectors with $\|\mathbf{x}\|_2 = 1$ lie on the unit circle.

In summary, the $L_1$ norm is the sum of absolute values per coordinate (Manhattan distance) and yields sparse solutions; the $L_2$ norm is the square root of the sum of squares (Euclidean distance), smooth and easy to optimize. The general $L_p$ norm is the $1/p$-th power of the sum of the $p$-th powers of the absolute components, approaching the maximum absolute component ($L_\infty$) as $p \to \infty$.
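The worked example $\mathbf{x} = (3, -4)$ can be checked with NumPy's norm routine, whose `ord` argument selects $p$:

```python
import numpy as np

x = np.array([3.0, -4.0])

l1 = np.linalg.norm(x, 1)         # |3| + |-4| = 7  (Manhattan)
l2 = np.linalg.norm(x, 2)         # sqrt(9 + 16) = 5  (Euclidean)
l3 = np.linalg.norm(x, 3)         # (27 + 64)**(1/3), about 4.498
linf = np.linalg.norm(x, np.inf)  # max(|3|, |-4|) = 4, the p -> infinity limit

print(l1, l2, l3, linf)  # 7.0 5.0 4.497... 4.0
```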

Coordinate-wise Median Prototype

The Coordinate-wise Median Prototype refers to: For a multi-dimensional dataset, compute the median on each dimension (coordinate) separately, and then combine these medians into a new vector that serves as the "prototype" or central representative of the entire dataset.

Suppose we have $m$ $n$-dimensional vectors $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_m$, where

$$\mathbf{v}_i = (v_{i1}, v_{i2}, \dots, v_{in})$$

The Coordinate-wise Median Prototype $\mathbf{p} = (p_1, p_2, \dots, p_n)$ is defined as:

$$p_j = \text{median}(v_{1j}, v_{2j}, \dots, v_{mj}), \quad j = 1, 2, \dots, n$$

That is, each component independently takes the median of all data points on that dimension.

If the data is rotated by 45 degrees, the coordinate-wise median of the rotated data generally differs from the rotated original coordinate-wise median (it is not rotation equivariant), whereas the geometric median rotates with the data.
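A quick numerical check of this non-equivariance, using a made-up 2D point set and a 45-degree rotation matrix:

```python
import numpy as np

# Made-up 2D points, one per row
V = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 4.0], [5.0, 1.0]])

theta = np.pi / 4  # rotate the data by 45 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

cwm_then_rotate = R @ np.median(V, axis=0)    # rotate the original prototype
rotate_then_cwm = np.median(V @ R.T, axis=0)  # prototype of the rotated data

# The two disagree: the coordinate-wise median is not rotation equivariant.
print(cwm_then_rotate, rotate_then_cwm)
```

For this data the two results differ (roughly $(0, 1.41)$ versus $(0, 0.71)$), while the geometric median would satisfy `median(R x) = R median(x)` exactly.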

Computationally, it only requires sorting per dimension, $O(m \log m)$, or a selection algorithm, $O(m)$, and it can resist up to 50% of arbitrary extreme outliers in each coordinate.

However, the choice of coordinate axes affects the result. If the dimensions are correlated (e.g., the data forms a tilted ellipse), the coordinate-wise median can deviate substantially from the true center. It necessarily lies within the value range of each dimension, so it remains within the axis-aligned bounding box of the data, but not necessarily within the convex hull of the data points (because each dimension is chosen independently). Its optimization objective, the sum of $L_1$ (Manhattan) distances, is separable per coordinate and tied to the chosen axes, unlike the rotation-invariant sum of Euclidean distances minimized by the geometric median.

It is a good choice when the data contains outliers and the dimensions are relatively independent.

For example, in 3D space, suppose we have four points stored as the columns of a $3 \times 4$ matrix:

$$\begin{pmatrix} 5 & 3 & 7 & 4 \\ 8 & 5 & 6 & 7 \\ 12 & 15 & 10 & 13 \end{pmatrix}$$

The median of the 1st dimension is $4.5$, the 2nd dimension $6.5$, and the 3rd dimension $12.5$.

Its Coordinate-wise Median Prototype is $(4.5, 6.5, 12.5)^T$.
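The worked example can be reproduced in a couple of lines of NumPy; each column of the matrix is one 3-dimensional point, so the per-dimension medians are taken across columns (`axis=1`):

```python
import numpy as np

# The 3x4 example matrix: rows are dimensions, columns are data points
V = np.array([[ 5,  3,  7,  4],
              [ 8,  5,  6,  7],
              [12, 15, 10, 13]])

p = np.median(V, axis=1)  # median of each row (dimension)
print(p)  # [ 4.5  6.5 12.5]
```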

Extending to General $L_p$ Norms

Having defined the $L_p$ norm, let's revisit the definitions of the mean prototype, geometric median prototype, and coordinate-wise median prototype. Each can be expressed as the minimizer of a norm-based objective.

The mean prototype is the point that minimizes the sum of squared Euclidean ($L_2$) distances to all points:

$$\boldsymbol{\mu} = \arg\min_{\mathbf{x} \in \mathbb{R}^n} \sum_{i=1}^m \|\mathbf{x} - \mathbf{v}_i\|_2^2$$

The geometric median prototype is the point that minimizes the sum of Euclidean ($L_2$) distances to all points:

$$\mathbf{p} = \arg\min_{\mathbf{x} \in \mathbb{R}^n} \sum_{i=1}^m \|\mathbf{x} - \mathbf{v}_i\|_2$$

The coordinate-wise median prototype is the point that minimizes the sum of Manhattan ($L_1$) distances to all points:

$$\mathbf{p} = \arg\min_{\mathbf{x} \in \mathbb{R}^n} \sum_{i=1}^m \|\mathbf{x} - \mathbf{v}_i\|_1$$

At this point, we can unify all the above concepts:

$$\mathbf{p} = \arg\min_{\mathbf{x} \in \mathbb{R}^n} \sum_{i=1}^m \|\mathbf{x} - \mathbf{v}_i\|_p^q$$

where $p$ selects the distance norm and $q$ is the power to which that distance is raised. $p$ determines how distance is measured, while $q$ determines the penalty on outliers: the larger $q$, the more sensitive the prototype is to outliers. Setting $(p, q) = (2, 2)$ recovers the mean, $(2, 1)$ the geometric median, and $(1, 1)$ the coordinate-wise median.

This formula also describes the core objective function of the Generalized Weber Location Problem.
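As an illustrative sketch (one of several possible solvers, not one prescribed by the text), the unified objective can be minimized by plain subgradient descent; the step size, iteration count, and sample data below are arbitrary choices:

```python
import numpy as np

def lpq_prototype(V, p=2.0, q=1.0, lr=0.05, steps=2000):
    """Minimize sum_i ||x - v_i||_p^q over x by (sub)gradient descent."""
    x = V.mean(axis=0)
    for _ in range(steps):
        diff = x - V                                     # shape (m, n)
        norms = np.sum(np.abs(diff) ** p, axis=1) ** (1.0 / p)
        norms = np.maximum(norms, 1e-12)                 # guard divisions
        # d/dx_j ||x - v||_p^q = q * ||x - v||_p^(q-p) * |x_j - v_j|^(p-1) * sign(x_j - v_j)
        grad = (q * norms ** (q - p))[:, None] * np.abs(diff) ** (p - 1) * np.sign(diff)
        x = x - lr * grad.sum(axis=0)
    return x

# Made-up demo: three clustered points plus one outlier
V = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
print(lpq_prototype(V, p=2, q=2))  # the mean, [2.75, 2.75]
print(lpq_prototype(V, p=2, q=1))  # the geometric median, close to [0.5, 0.5]
```

Varying `q` makes the sensitivity claim concrete: with $q = 2$ the outlier drags the prototype to $(2.75, 2.75)$, while $q = 1$ keeps it inside the cluster.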