Sample Base

This notebook builds a simple fastai pipeline that applies transfer learning with a ResNet architecture. The goals are:

  1. Get a working pipeline with the current dataset (using fastai)

  2. Do a submission on Kaggle

  3. Do some iterations using simple data augmentations

We split the training dataset 80-20 into training and validation subsets. The idea is to get an honest measurement of model performance on data held out from training. The trick for Kaggle, however, is to retrain the model on the entire training dataset before submitting results and reviewing the score on the test set.
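
The 80-20 split can be sketched in plain Python as below. This is only an illustration of the idea: `split_indices` and its `seed` parameter are hypothetical names, and in this notebook fastai performs the split itself through `valid_pct=0.2`.

```python
import random

def split_indices(n_items, valid_pct=0.2, seed=42):
    """Shuffle the indices 0..n_items-1 and hold out valid_pct of them for validation."""
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)
    cut = int(valid_pct * n_items)  # size of the validation set, rounded down
    return idxs[cut:], idxs[:cut]   # (train indices, validation indices)

train_idx, valid_idx = split_indices(448)
print(len(train_idx), len(valid_idx))  # 359 89
```

For the final Kaggle submission, the same data would be used without holding anything out, so validation metrics are no longer available for that last run.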

Note

  • Big images are not easy to fit on the GPU.

  • Resizing a big image to a smaller size tends to lose certain features

  • Next ideas

    • Apply Digit Cleaner concept

    • Create Clean Dataset / Visualize and review accuracy of digit cleaner

    • Figure out a way to compute bounding boxes

    • After that we can apply a few ideas

      • Apply ResNet after the digit cleaner (the current data had a 0.93 error_rate after 1 epoch)

      • Compute bounding boxes and break the dataset down into individual digits. Then cluster the digits into 10 categories and use the cluster labels to calculate digit_sum

      • Split into individual digit images -> Resize -> Merge (permutations) into single image then train model on new dataset
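
To get a feel for how much detail resizing can cost, the sketch below computes how large a digit remains after the whole image is shrunk. The numbers are assumptions for illustration: a 4000-pixel source side and a 28-pixel digit are not stated in this notebook, while 299 matches the `Resize(299)` passed to the data loaders. `downscaled_size` is a hypothetical helper.

```python
def downscaled_size(feature_px, src_side_px, dst_side_px):
    """Approximate side length of a feature after resizing a square image."""
    scale = dst_side_px / src_side_px
    return feature_px * scale

# Assumed values: a 4000 px source side and a 28 px digit are illustrative only.
small_digit = downscaled_size(28, 4000, 299)
print(f"{small_digit:.2f} px")  # 2.09 px
```

Under these assumptions a small digit shrinks to roughly two pixels per side, which is likely too little to tell digits apart and is one motivation for the bounding-box ideas listed above.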

Imports

%load_ext autoreload
%autoreload 2
import torch
torch.cuda.empty_cache()
!nvidia-smi
Wed Mar 16 10:16:42 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   52C    P8    28W / 149W |      3MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
import fastai
from fastai.vision.all import *
from fastai.vision.widgets import *
from aiking.data.external import * #We need to import this after fastai modules
import warnings
from matplotlib import cm
import shutil

warnings.filterwarnings("ignore")
path = untar_data("kaggle_competitions::ultra-mnist"); path

(path/"sample").ls()
(#560) [Path('/Landmark2/pdo/aiking/data/ultra-mnist/sample/zzoraczqoe.jpeg'),Path('/Landmark2/pdo/aiking/data/ultra-mnist/sample/rwrnaoifjc.jpeg'),Path('/Landmark2/pdo/aiking/data/ultra-mnist/sample/aadalkvtqc.jpeg'),Path('/Landmark2/pdo/aiking/data/ultra-mnist/sample/qtmqrprqyd.jpeg'),Path('/Landmark2/pdo/aiking/data/ultra-mnist/sample/wpkrnfycyr.jpeg'),Path('/Landmark2/pdo/aiking/data/ultra-mnist/sample/ohzlpnyrpp.jpeg'),Path('/Landmark2/pdo/aiking/data/ultra-mnist/sample/fnffkeomht.jpeg'),Path('/Landmark2/pdo/aiking/data/ultra-mnist/sample/hzkaomeimm.jpeg'),Path('/Landmark2/pdo/aiking/data/ultra-mnist/sample/owxpzanmht.jpeg'),Path('/Landmark2/pdo/aiking/data/ultra-mnist/sample/tapvaidyxs.jpeg')...]
# !mkdir {path}/'sample'
path/'sample'
path/'train'
Path('/Landmark2/pdo/aiking/data/ultra-mnist/train')
df_train = pd.read_csv(path/'train_sample.csv')
df_train['id'] = df_train['id'] + ".jpeg"
df_train
     id               digit_sum
0    haxcbbkrsu.jpeg  21
1    bkdcgutajl.jpeg  13
2    jpayvdwnbi.jpeg  0
3    qtuwojbevu.jpeg  11
4    gtnfnthaiw.jpeg  8
...  ...              ...
443  xtkzizjujh.jpeg  10
444  rzuqbkpskf.jpeg  16
445  ytotgfuoop.jpeg  6
446  mpfuvzedfn.jpeg  19
447  tuduwmtonr.jpeg  19

448 rows × 2 columns
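
Before building the data loaders it can help to confirm that every `id` in the table points at a real file. The helper below is a hypothetical sketch (`missing_files` is not part of fastai or aiking) that uses only the standard library.

```python
from pathlib import Path

def missing_files(filenames, folder):
    """Return the names in `filenames` that do not exist as files inside `folder`."""
    folder = Path(folder)
    return [name for name in filenames if not (folder / name).is_file()]
```

With the objects defined in this notebook, a call such as `missing_files(df_train['id'], path/'sample')` would be expected to return an empty list if the CSV and the folder agree.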

dls = ImageDataLoaders.from_df(df_train, path, folder='sample', valid_pct=0.2, fn_col=0, label_col=1, bs=16, item_tfms=Resize(299)); dls
<fastai.data.core.DataLoaders at 0x967c12c96a0>
dls.show_batch()
../../_images/04_sample_baseline_13_0.png
!nvidia-smi
Wed Mar 16 10:17:42 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   60C    P0    61W / 149W |    557MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    878937      C   ...da/envs/aiking/bin/python      554MiB |
+-----------------------------------------------------------------------------+
learn = cnn_learner(dls, resnet34, metrics=[error_rate, accuracy]); learn
<fastai.learner.Learner at 0x967a7dfb190>
learn.fine_tune(3)
epoch  train_loss  valid_loss  error_rate  accuracy  time
0      5.054008    4.505962    0.966292    0.033708  01:25

epoch  train_loss  valid_loss  error_rate  accuracy  time
0      3.570383    4.225568    0.966292    0.033708  01:20
1      2.741821    4.159256    0.977528    0.022472  01:10
2      2.019940    4.128638    0.977528    0.022472  01:11
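
The reported metrics can be sanity-checked with a little arithmetic. The sketch below assumes the validation set holds 89 images (20% of the 448 rows, rounded down); that count is inferred here rather than stated in the notebook.

```python
n_rows = 448
n_valid = int(0.2 * n_rows)  # assumed validation size: 89

# Accuracy if exactly 3, then 2, of the validation images are classified correctly
acc_first = round(3 / n_valid, 6)
acc_last = round(2 / n_valid, 6)
print(n_valid, acc_first, acc_last)  # 89 0.033708 0.022472
```

These values match the accuracy column, which suggests the model gets only two or three validation images right. The training loss falls while the validation metrics do not improve, so at this image size the model appears to memorize the training images rather than learn the task.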