Fowlkes-Mallows scores
scikit-learn’s implementation, textbook algorithms and back-of-the-envelope calculations
from sklearn import metrics
import numpy as np
import pandas as pd

def mycalc(df):
    # Contingency table: rows are true classes (y), columns are clusters (c)
    ct = pd.crosstab(df.y, df.c).values
    # Each statistic is a sum of squared counts minus the number of samples
    tk, pk, qk = -len(df.y), -len(df.y), -len(df.y)
    for row in range(ct.shape[0]):
        t = 0
        for col in range(ct.shape[1]):
            tk += ct[row, col]**2
            t += ct[row, col]
        pk += t**2  # t is the row total
    for col in range(ct.shape[1]):
        t = 0
        for row in range(ct.shape[0]):
            t += ct[row, col]
        qk += t**2  # t is the column total
    FMI = tk / pk**.5 / qk**.5
    print('Fowlkes-Mallows score calculated semi-automatically')
    print(FMI)

Example #1
y = np.array([0, 0, 0, 1, 1, 1], dtype=int)
c = np.array([0, 0, 1, 1, 2, 2], dtype=int)
df = pd.DataFrame({'y':y, 'c':c})
pd.crosstab(df.y, df.c, margins=True)

Manual calculation of the Fowlkes-Mallows score
tk = 2^2 + 1^2 + 1^2 + 2^2 - 6 = 4
pk = 3^2 + 3^2 - 6 = 12
qk = 2^2 + 2^2 + 2^2 - 6 = 6
FMI = 4 / sqrt(12*6) = 0.4714
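
The manual calculation above follows a general pattern that can be written out once. The symbols n_ij, a_i and b_j are notation assumed here for illustration, not names used in the notebook: n_ij is the number of samples with true class i and cluster j, a_i and b_j are the row and column totals of the contingency table, and n is the number of samples.

```latex
\begin{aligned}
t_k &= \sum_{i}\sum_{j} n_{ij}^{2} - n, \\
p_k &= \sum_{i} a_i^{2} - n, \qquad a_i = \sum_{j} n_{ij}, \\
q_k &= \sum_{j} b_j^{2} - n, \qquad b_j = \sum_{i} n_{ij}, \\
\mathrm{FMI} &= \frac{t_k}{\sqrt{p_k \, q_k}}.
\end{aligned}
```

For Example #1 the contingency counts are (2, 1, 0) for y = 0 and (0, 1, 2) for y = 1, with n = 6, which gives t_k = 4, p_k = 12, q_k = 6 and FMI = 4 / sqrt(72) ≈ 0.4714, as computed above.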
print('Fowlkes-Mallows score calculated using scikit-learn')
print(metrics.fowlkes_mallows_score(y, c))
mycalc(df)

Fowlkes-Mallows score calculated using scikit-learn
0.4714045207910317
Fowlkes-Mallows score calculated semi-automatically
0.4714045207910318
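
The three sums can also be taken without building a table or writing nested loops. The sketch below uses only the standard library; fmi_from_counts is a hypothetical helper name, and returning 0.0 when tk is zero is an assumption of this sketch rather than something the notebook specifies.

```python
from collections import Counter
from math import sqrt


def fmi_from_counts(y, c):
    """Fowlkes-Mallows score from label counts (hypothetical helper)."""
    n = len(y)
    cells = Counter(zip(y, c))  # non-empty contingency cells n_ij
    rows = Counter(y)           # row totals: size of each true class
    cols = Counter(c)           # column totals: size of each cluster
    tk = sum(v * v for v in cells.values()) - n
    pk = sum(v * v for v in rows.values()) - n
    qk = sum(v * v for v in cols.values()) - n
    if tk == 0:
        # Assumption of this sketch: report 0.0, which also avoids 0/0
        # when every class and every cluster is a singleton.
        return 0.0
    return tk / sqrt(pk * qk)


# Labels from Example #1
y = [0, 0, 0, 1, 1, 1]
c = [0, 0, 1, 1, 2, 2]
score = fmi_from_counts(y, c)
print(round(score, 4))  # 0.4714
```

Empty cells contribute nothing to the sum of squares, so leaving them out of the Counter does not change tk.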
Example #2
y = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3], dtype=int)
c = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int)
df = pd.DataFrame({'y':y, 'c':c})
ct = pd.crosstab(df.y, df.c, margins=True).values
pd.crosstab(df.y, df.c, margins=True)

print('Fowlkes-Mallows score calculated using scikit-learn')
print(metrics.fowlkes_mallows_score(y, c))
mycalc(df)

Fowlkes-Mallows score calculated using scikit-learn
0.4445299063132172
Fowlkes-Mallows score calculated semi-automatically
0.44452990631321715
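
A textbook cross-check is to count pairs of samples directly: the score is the number of pairs that share both a true class and a cluster, divided by the geometric mean of the number of same-class pairs and the number of same-cluster pairs. The quantities tk, pk and qk count ordered pairs, so they are twice these counts and the factor cancels. The sketch below is a brute-force O(n^2) check, not an efficient implementation; fmi_by_pairs is a hypothetical helper name.

```python
from itertools import combinations
from math import sqrt


def fmi_by_pairs(y, c):
    """Fowlkes-Mallows score by brute-force pair counting (hypothetical helper)."""
    both = in_class = in_cluster = 0
    for i, j in combinations(range(len(y)), 2):
        same_class = y[i] == y[j]
        same_cluster = c[i] == c[j]
        in_class += same_class                 # pair shares a true class
        in_cluster += same_cluster             # pair shares a cluster
        both += same_class and same_cluster    # pair shares both
    if both == 0:
        return 0.0  # assumption of this sketch: no co-clustered pair gives 0.0
    return both / sqrt(in_class * in_cluster)


# Labels from Example #2
y = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3]
c = [1] * 10 + [2] * 10
score = fmi_by_pairs(y, c)
print(score)
```

For these labels the pair counts are 34 (both), 65 (same class) and 90 (same cluster), which are half of tk = 68, pk = 130 and qk = 180.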
Example #3
y = np.array([1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3], dtype=int)
c = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int)
df = pd.DataFrame({'y':y, 'c':c})
ct = pd.crosstab(df.y, df.c, margins=True).values
pd.crosstab(df.y, df.c, margins=True)

print('Fowlkes-Mallows score calculated using scikit-learn')
print(metrics.fowlkes_mallows_score(y, c))
mycalc(df)

Fowlkes-Mallows score calculated using scikit-learn
0.4968275423500662
Fowlkes-Mallows score calculated semi-automatically
0.4968275423500662
Example #4
y = np.array([0, 1, 2, 0, 3, 4, 5, 1], dtype=int)
c = np.array([1, 1, 0, 0, 2, 2, 2, 2], dtype=int)
df = pd.DataFrame({'y':y, 'c':c})

print('Fowlkes-Mallows score calculated using scikit-learn')
print(metrics.fowlkes_mallows_score(y, c))
mycalc(df)

Fowlkes-Mallows score calculated using scikit-learn
0.0
Fowlkes-Mallows score calculated semi-automatically
0.0
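
The score is exactly 0.0 here because no two samples share both a true class and a cluster: every non-empty contingency cell holds a single sample, so the sum of squared cells equals the number of samples and tk vanishes. A quick check of that reasoning, using only the standard library (the variable names are illustrative):

```python
from collections import Counter

# Labels from Example #4
y = [0, 1, 2, 0, 3, 4, 5, 1]
c = [1, 1, 0, 0, 2, 2, 2, 2]

cells = Counter(zip(y, c))  # non-empty contingency cells
largest_cell = max(cells.values())
tk = sum(v * v for v in cells.values()) - len(y)
print(largest_cell, tk)  # 1 0
```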
Example #5
c = y
df = pd.DataFrame({'y':y, 'c':c})

print('Fowlkes-Mallows score calculated using scikit-learn')
print(metrics.fowlkes_mallows_score(y, c))
mycalc(df)

Fowlkes-Mallows score calculated using scikit-learn
1.0
Fowlkes-Mallows score calculated semi-automatically
1.0