Fowlkes-Mallows scores

  • scikit-learn’s implementation, textbook algorithms and back-of-the-envelope calculations

from sklearn import metrics
import numpy as np
import pandas as pd
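def mycalc(df):
    # Contingency table: rows are true labels (y), columns are cluster labels (c)
    ct = pd.crosstab(df.y, df.c).values
    # Start each sum at -n so tk, pk and qk end up as (sum of squares) - n
    tk, pk, qk = -len(df.y), -len(df.y), -len(df.y)
    for row in range(ct.shape[0]):
        t = 0  # running total of this row
        for col in range(ct.shape[1]):
            tk += ct[row, col]**2
            t += ct[row, col]
        pk += t**2
    for col in range(ct.shape[1]):
        t = 0  # running total of this column
        for row in range(ct.shape[0]):
            t += ct[row, col]
        qk += t**2

    # FMI = tk / sqrt(pk * qk); when tk is 0 the score is 0,
    # and the guard also avoids 0/0 if pk or qk is 0
    FMI = 0.0 if tk == 0 else tk / pk**.5 / qk**.5
    print('Fowlkes-Mallows score calculated semi-automatically')
    print(FMI)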
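
The three sums in mycalc only need the cell counts, the row totals and the column totals, so they can also be gathered without building a table. The block below is a minimal sketch under that assumption; the helper name fmi_from_counts is hypothetical (it is not part of scikit-learn or of this notebook), and it returns the score instead of printing it.

```python
from collections import Counter
from math import sqrt


def fmi_from_counts(y, c):
    """Hypothetical helper: Fowlkes-Mallows index from label counts."""
    n = len(y)
    cells = Counter(zip(y, c))  # counts per (true label, cluster) cell
    rows = Counter(y)           # totals per true label
    cols = Counter(c)           # totals per cluster
    tk = sum(v * v for v in cells.values()) - n
    pk = sum(v * v for v in rows.values()) - n
    qk = sum(v * v for v in cols.values()) - n
    if tk == 0:
        # No pair of samples shares both a label and a cluster
        return 0.0
    return tk / sqrt(pk) / sqrt(qk)


# Labels taken from Example #1
print(fmi_from_counts([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))  # about 0.4714
```

Returning the value rather than printing it makes the helper easier to compare against metrics.fowlkes_mallows_score in a test.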

Example #1

y = np.array([0, 0, 0, 1, 1, 1], dtype=int)
c = np.array([0, 0, 1, 1, 2, 2], dtype=int)
df = pd.DataFrame({'y':y, 'c':c})
pd.crosstab(df.y, df.c, margins=True)

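Manual calculation of the Fowlkes-Mallows score

Here tk sums the squared cell counts of the contingency table, pk the squared row totals and qk the squared column totals, each minus the number of samples (n = 6).

tk = 2^2 + 1^2 + 1^2 + 2^2 - 6 = 4

pk = 3^2 + 3^2 - 6 = 12

qk = 2^2 + 2^2 + 2^2 - 6 = 6

FMI = tk / sqrt(pk * qk) = 4 / sqrt(12*6) = 0.4714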
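
The same number can be cross-checked by counting pairs of samples directly. Since sum(n^2) - n equals twice the number of pairs, tk is twice the number of pairs that share both a label and a cluster, while pk and qk are twice the number of pairs that share a label or a cluster respectively. The sketch below assumes that pair-counting reading; the function name fmi_by_pairs is hypothetical, and the brute-force loop is only practical for small inputs.

```python
from itertools import combinations
from math import sqrt


def fmi_by_pairs(y, c):
    """Hypothetical helper: Fowlkes-Mallows index by enumerating sample pairs."""
    both = same_y = same_c = 0
    for i, j in combinations(range(len(y)), 2):
        in_y = y[i] == y[j]  # pair shares a true label
        in_c = c[i] == c[j]  # pair shares a cluster
        same_y += in_y
        same_c += in_c
        both += in_y and in_c
    if both == 0:
        return 0.0
    return both / sqrt(same_y * same_c)


# Example #1: 2 pairs share both, 6 share a label, 3 share a cluster
print(fmi_by_pairs([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))  # about 0.4714
```

For Example #1 the pair counts are 2, 6 and 3, which are exactly half of tk = 4, pk = 12 and qk = 6.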

print('Fowlkes-Mallows score calculated using scikit-learn')
print(metrics.fowlkes_mallows_score(y, c))
mycalc(df)
Fowlkes-Mallows score calculated using scikit-learn
0.4714045207910317
Fowlkes-Mallows score calculated semi-automatically
0.4714045207910318
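
The two printed values differ only in the last digit. This is consistent with floating-point rounding when the same quantity is computed through a different sequence of operations, although the notebook itself does not state the cause. A tolerance-based comparison is therefore a safer check than strict equality; the sketch below uses the two values printed above and the standard library's math.isclose with its default tolerance.

```python
import math

sk_value = 0.4714045207910317      # printed by metrics.fowlkes_mallows_score
manual_value = 0.4714045207910318  # printed by mycalc

print(sk_value == manual_value)              # False
print(math.isclose(sk_value, manual_value))  # True
```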

Example #2

y = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3], dtype=int)
c = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int)
df = pd.DataFrame({'y':y, 'c':c})
ct = pd.crosstab(df.y, df.c, margins=True).values
pd.crosstab(df.y, df.c, margins=True)
print('Fowlkes-Mallows score calculated using scikit-learn')
print(metrics.fowlkes_mallows_score(y, c))
mycalc(df)
Fowlkes-Mallows score calculated using scikit-learn
0.4445299063132172
Fowlkes-Mallows score calculated semi-automatically
0.44452990631321715

Example #3

y = np.array([1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3], dtype=int)
c = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int)
df = pd.DataFrame({'y':y, 'c':c})
ct = pd.crosstab(df.y, df.c, margins=True).values
pd.crosstab(df.y, df.c, margins=True)
print('Fowlkes-Mallows score calculated using scikit-learn')
print(metrics.fowlkes_mallows_score(y, c))
mycalc(df)
Fowlkes-Mallows score calculated using scikit-learn
0.4968275423500662
Fowlkes-Mallows score calculated semi-automatically
0.4968275423500662

Example #4

y = np.array([0, 1, 2, 0, 3, 4, 5, 1], dtype=int)
c = np.array([1, 1, 0, 0, 2, 2, 2, 2], dtype=int)
df = pd.DataFrame({'y':y, 'c':c})
print('Fowlkes-Mallows score calculated using scikit-learn')
print(metrics.fowlkes_mallows_score(y, c))
mycalc(df)
Fowlkes-Mallows score calculated using scikit-learn
0.0
Fowlkes-Mallows score calculated semi-automatically
0.0

Example #5

c = y
df = pd.DataFrame({'y':y, 'c':c})
print('Fowlkes-Mallows score calculated using scikit-learn')
print(metrics.fowlkes_mallows_score(y, c))
mycalc(df)
Fowlkes-Mallows score calculated using scikit-learn
1.0
Fowlkes-Mallows score calculated semi-automatically
1.0