Machine learning abc…, 123… with fast.ai

The Jupyter notebook for this exercise can be downloaded here:

A data bundle containing the font catalog used in this exercise is available for download here. The font catalog was generated using the code here.

Here we go:

%matplotlib inline
from fastai import *
from fastai.vision import *
np.random.seed(77)

We get:

try:
    datapath = Path('/home/kid/mlearn/fontcatalog')
    datapath.ls()
except FileNotFoundError:
    # Fall back to the cloud path when the local one doesn't exist.
    datapath = Path('/home/jupyter/tutorials/data/fontcatalog')
    datapath.ls()
sz, bs = 175, 128
tfms = get_transforms(do_flip=False)
data = ImageDataBunch.from_folder(datapath, train='.', valid_pct=.2,
                                  ds_tfms=tfms, size=sz, bs=bs).normalize(imagenet_stats)
data.show_batch(rows=3, figsize=(7,6))
# Collect the class label (the first token of each y string)
# for every training and validation item.
train, valid = [], []
for i in range(len(data.train_ds)):
    train.append(f'{data.train_ds.y[i]}'.split()[0])
for i in range(len(data.valid_ds)):
    valid.append(f'{data.valid_ds.y[i]}'.split()[0])
train = np.asarray(train)
valid = np.asarray(valid)

We get:
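The two label arrays make it easy to sanity-check the random split, e.g. that every class in the validation set also appears in the training set. A minimal sketch with made-up stand-in labels (in the notebook, `train` and `valid` are the arrays built above):

```python
import numpy as np
from collections import Counter

# Stand-in label arrays; in the notebook these come from
# data.train_ds.y and data.valid_ds.y.
train = np.asarray(['arial', 'arial', 'courier', 'courier', 'times'])
valid = np.asarray(['arial', 'times'])

# Every validation class should also occur in training, otherwise
# the model is asked to name a class it never saw.
missing = sorted(set(valid) - set(train))
print('classes only in valid:', missing)

# Rough class balance of the training set.
print(Counter(train).most_common(3))
```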

Start learning:

learn = create_cnn(data, models.resnet18, metrics=error_rate)
learn.fit_one_cycle(1)
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(9, max_lr=slice(2e-4,3e-4))
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(15,15))
wrongs = 0
for t in interp.most_confused():
    wrongs += t[2]
print(wrongs, 'wrong')

We get:

34 wrong
interp.plot_top_losses(int(wrongs), figsize=(15,11))

We get:

The machine made 34 wrong guesses out of 3515 validation data points, an error rate of less than 1%. The wrong guesses are in fact trivial or even quite reasonable mistakes, like mixing up the lower-case and upper-case forms of i, o, x, v, z, p, s and w. The only mistake I find unforgivable is mistaking h for k, which occurred twice.
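The error-rate arithmetic checks out:

```python
wrongs, total = 34, 3515
error_rate = wrongs / total
print(f'{error_rate:.2%}')  # 0.97%, in line with the error_rate metric above
```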
