The Wilson Score - Ranking Algorithm

    The Secret for Zhihu Rank
    Mar 5, 2023

    Wilson Score sorting algorithm, Wilson Score, is used for quality sorting. The data contains positive and negative reviews. Taking into account the number of comments and praise rate, the higher the score, the higher the quality.

    n=u+vp=uvS=(p+zα22nzα22n4n(1p)p+zα2)(1+zα2n)n = u + v \\ p = \frac{u}{v} \\ S = \frac{(p + \frac{z_{α}^{2}}{2n} - \frac{z_{α}^{2}}{2n}\sqrt{4n(1-p)p+z_{α}^{2}} )}{(1+\frac{z_{α}^{2}}{n})}

    uu represents the number of positive examples (good reviews), vv represents the number of negative examples (bad reviews), nn represents the total number of instances (total number of reviews), pp represents the positive rate, and zz is the score of the normal distribution. Number of bits (parameter), SS represents the final Wilson score. zz generally takes a value of 2, which is 95% confidence.

    Here is the quantile table for the normal distribution:

    Algorithm traits:

    1. Properties: The range of score SS is [0,1)[0,1), effect: normalized, suitable for sorting
    2. Properties: When the number of positive examples uu is 00, pp is 00, and the score SS is 00; effect: no positive comments, the score is the lowest;
    3. Properties: When the number of negative examples vv is 00, pp is 11, degenerates to 1/(1+z2/n)1/(1 + z^2 / n), and the score SS is always less than 11; effect: the score has permanent comparability;
    4. Properties: When pp remains unchanged, the larger nn is, the reduction speed of the numerator is smaller than the reduction speed of the denominator, and the greater the score SS, and vice versa; Effect: The positive rate pp is the same, the total number of instances is nThemoren The more , the more SS you score;
    5. Properties: When nn tends to infinity, it degenerates to pp, and the score SS is determined by pp; Effect: When the total number of comments nn is more, the positive rate pp will bring the score STheimprovementofS The improvement of is more obvious;
    6. Properties: When the quantile zz is larger, the total number of reviews nn is more important, and the praise rate pp is less important, and vice versa; Effect: The larger zz is, the total number of reviews nn is more important. The degree of discrimination is low; the smaller zz is, the more important the praise rate pp is;

    Click here to get source code

    Python Implementation.

    def wilson_score(pos, total, p_z=2.):
        pos_rat = pos * 1. / total * 1.  # 正例比率
        score = (pos_rat + (np.square(p_z) / (2. * total))
                 - ((p_z / (2. * total)) * np.sqrt(4. * total * (1. - pos_rat) * pos_rat + np.square(p_z)))) / \
                (1. + np.square(p_z) / total)
        return score
    

    Distribution plot of Wilson score algorithm

    Let's try to put this theory into use with following example.

    Example: Suppose Doctor A has 100 reviews, 1 negative review and 99 positive reviews. Doctor B has 2 reviews, both of which are good reviews. Which one should be ranked first?

    When z=2z=2, that is, 95% confidence level, doctor A's score is 0.9440, doctor B's score is 0.3333, and doctor A ranks in front.


    PS: Rating Level Question: What should we do if we have a five-star evaluation system or a hundred percent evaluation system?

    Just change the Wilson score formula from Bernoulli distribution to normal distribution.

    S=(μ+zα22nzα22n4nσ2+zα2)(1+zα2n)μ(0,1)S = \frac{(μ + \frac{z_{α}^{2}}{2n} - \frac{z_{α}^{2}}{2n}\sqrt{4nσ^{2}+z_{α}^{2}} )}{(1+\frac{z_{α}^{2}}{n})} \\ μ ∈(0,1)

    Note: The mean and variance are both normalized values.

    Python Implementation.

    def wilson_score_norm(mean, var, total, p_z=2.):
        # The mean variance needs to be normalized to fit the quantiles of the normal distribution
        score = (mean + (np.square(p_z) / (2. * total))
                 - ((p_z / (2. * total)) * np.sqrt(4. * total * var + np.square(p_z)))) / \
                (1 + np.square(p_z) / total)
        return score
    

    Example of normalization:

    def test_of_values():
        max = 5.  # Maximum evaluation value.
        min = 1.  # Minimum evaluation value.
        values = np.array([1., 2., 3., 4., 5.])  # Example
    
        norm_values = (values - min) / (max - min)  # Normalize
        total = norm_values.size
        mean = np.mean(norm_values)
        var = np.var(norm_values)
        return total, mean, var
    

    PS: About the z parameter, that is, the normal quantile. The normal quantile affects the distribution of Wilson scores, and the value of the zz parameter is based on the magnitude of the number of samples. For example: the same 100 samples, 90 positive reviews, zz value is 2 or 6, the scores are very different, and the number of samples accommodated (or distinguished) by the system is also very different (the same is 0.82 points and 90% positive reviews, z=2z=2 requires 100 samples, z=6z=6 requires 1000 samples), generally speaking, the larger the magnitude of the number of samples, the larger the value of z.

    print 'score: %s' % wilson_score(90, 90 + 10, p_z=2.)
    print 'score: %s' % wilson_score(90, 90 + 10, p_z=6.)
    print 'score: %s' % wilson_score(900, 900 + 100, p_z=6.)
    
    # take 2-100:score: 0.823802352689
    # take 6-100:score: 0.606942322627
    # take 6-1000:score: 0.828475631056
    

    References: