[Pytorch] Supervised Contrastive Learning 🔥

Original Notebook : https://www.kaggle.com/code/debarshichanda/pytorch-supervised-contrastive-learning


SUPERVISED CONTRASTIVE LEARNING

Contrastive learning applied to self-supervised representation learning has seen a resurgence in recent years, leading to state-of-the-art performance in the unsupervised training of deep image models. Modern batch contrastive approaches subsume or significantly outperform traditional contrastive losses such as triplet, max-margin, and the N-pairs loss. In this work, we extend the self-supervised batch contrastive approach to the fully-supervised setting, allowing us to effectively leverage label information. Clusters of points belonging to the same class are pulled together in embedding space, while simultaneously pushing apart clusters of samples from different classes. We analyze two possible versions of the supervised contrastive (SupCon) loss, identifying the best-performing formulation of the loss. On ResNet-200, we achieve a top-1 accuracy of 81.4% on the ImageNet dataset, which is 0.8% above the best number reported for this architecture. We show consistent outperformance over cross-entropy on other datasets and two ResNet variants. The loss shows benefits for robustness to natural corruptions and is more stable to hyperparameter settings such as optimizers and data augmentations.

Supervised Contrastive Learning : https://arxiv.org/abs/2004.11362
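
For reference, the "out" variant of the SupCon loss analyzed in the paper, written for a batch of L2-normalized embeddings z_i with temperature τ, is

$$ \mathcal{L}^{sup}_{out} = \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)} $$

where P(i) is the set of indices of positives sharing the label of anchor i, and A(i) is the set of all other indices in the batch.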

Install Libraries

!pip install -q timm pytorch-metric-learning

Import Packages

import os 
import cv2 
import copy 
import time 
import random 

import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
import seaborn as sns 

import torch 
import torch.nn as nn 
import torch.nn.functional as F 
import torch.optim as optim 
from torch.optim import lr_scheduler 
from torch.utils.data import DataLoader, Dataset 
from torch.cuda import amp 

from sklearn.preprocessing import LabelEncoder 
from sklearn.model_selection import train_test_split, StratifiedKFold, GroupKFold 
from sklearn.metrics import roc_auc_score, f1_score 

from tqdm.notebook import tqdm 
from collections import defaultdict 
import albumentations as A 
from albumentations.pytorch import ToTensorV2 

import timm 
from pytorch_metric_learning import losses 

Training Configuration

class CFG: 
  seed = 42 
  model_name = "tf_efficientnet_b4_ns" 
  img_size = 512 
  scheduler = "CosineAnnealingLR" 
  T_max = 10 
  lr = 1e-5 
  min_lr = 1e-6 
  batch_size = 16 
  weight_decay = 1e-6 
  num_epochs = 10 
  num_classes = 11014 
  embedding_size = 512 
  n_fold = 5 
  n_accumulate = 4 
  temperature = 0.1 
  device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
TRAIN_DIR = "./data/train_images"
TEST_DIR  = "./data/test_images"
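
Given the pytorch-metric-learning import above, the temperature in CFG presumably parameterizes the SupCon criterion along these lines (a hedged sketch; the defining cell is not shown in this section):

criterion = losses.SupConLoss(temperature = CFG.temperature)  # takes (embeddings, integer labels)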

Set Seed for Reproducibility

def set_seed(seed = 42): 
  """
  Sets the seed of the entire notebook so results are the same every time we run. 
  This is for REPRODUCIBILITY.
  """ 
  np.random.seed(seed) 
  random.seed(seed) 
  torch.manual_seed(seed) 
  torch.cuda.manual_seed(seed) 
  
  # When running on the CuDNN backend, two further options must be set 
  torch.backends.cudnn.deterministic = True 
  torch.backends.cudnn.benchmark = False  # benchmark must be disabled for deterministic results 
  
  # Set a fixed value for the hash seed 
  os.environ['PYTHONHASHSEED'] = str(seed) 
  
set_seed(CFG.seed)
df_train = pd.read_csv("./data/train.csv")
df_train['file_path'] = df_train.image.apply(lambda x: os.path.join(TRAIN_DIR, x)) 
df_train.head(5)
#>          posting_id  ...                                          file_path
#> 0   train_129225211  ...  ./data/train_images/0000a68812bc7e98c42888dfb1...
#> 1  train_3386243561  ...  ./data/train_images/00039780dfc94d01db8676fe78...
#> 2  train_2288590299  ...  ./data/train_images/000a190fdd715a2a36faed16e2...
#> 3  train_2406599165  ...  ./data/train_images/00117e4fc239b1b641ff08340b...
#> 4  train_3369186413  ...  ./data/train_images/00136d1cf4edede0203f32f05f...
#> 
#> [5 rows x 6 columns]
le = LabelEncoder() 
df_train.label_group = le.fit_transform(df_train.label_group) 
df_train 
posting_id | image | image_phash | title | label_group | file_path
train_129225211 | 0000a68812bc7e98c42888dfb1c07da0.jpg | 94974f937d4c2433 | Paper Bag Victoria Secret | 666 | ./data/train_images/0000a68812bc7e98c42888dfb1c07da0.jpg
train_3386243561 | 00039780dfc94d01db8676fe789ecd05.jpg | af3f9460c2838f0f | Double Tape 3M VHB 12 mm x 4,5 m ORIGINAL / DOUBLE FOAM TAPE | 7572 | ./data/train_images/00039780dfc94d01db8676fe789ecd05.jpg
train_2288590299 | 000a190fdd715a2a36faed16e2c65df7.jpg | b94cb00ed3e50f78 | Maling TTS Canned Pork Luncheon Meat 397 gr | 6172 | ./data/train_images/000a190fdd715a2a36faed16e2c65df7.jpg
train_2406599165 | 00117e4fc239b1b641ff08340b429633.jpg | 8514fc58eafea283 | Daster Batik Lengan pendek - Motif Acak / Campur - Leher Kancing (DPT001-00) Batik karakter Alhadi | 10509 | ./data/train_images/00117e4fc239b1b641ff08340b429633.jpg
train_3369186413 | 00136d1cf4edede0203f32f05f660588.jpg | a6f319f924ad708c | Nescafe Éclair Latte 220ml | 9425 | ./data/train_images/00136d1cf4edede0203f32f05f660588.jpg
train_2464356923 | 0013e7355ffc5ff8fb1ccad3e42d92fe.jpg | bbd097a7870f4a50 | CELANA WANITA (BB 45-84 KG)Harem wanita (bisa cod) | 6836 | ./data/train_images/0013e7355ffc5ff8fb1ccad3e42d92fe.jpg
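
The cross-validation folds consumed by run_fold further down are created from the encoded label groups; that cell is not reproduced in this section, but given the GroupKFold import a minimal sketch could look like this (an assumption, not the notebook's verbatim code):

# Hedged sketch: assign a fold column so each label_group stays within a single fold.
# df_train has its default RangeIndex here, so .loc with positional indices is safe.
# (df_train.label_group.nunique() should equal CFG.num_classes = 11014.)
gkf = GroupKFold(n_splits = CFG.n_fold) 
df_train['fold'] = -1 
for fold, (_, valid_idx) in enumerate(gkf.split(df_train, groups = df_train.label_group)): 
  df_train.loc[valid_idx, 'fold'] = fold 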

Dataset Class

class ShopeeDataset(Dataset): 
  def __init__(self, root_dir, df, transforms = None): 
    self.root_dir = root_dir 
    self.df = df 
    self.transforms = transforms 
    
  def __len__(self): 
    return len(self.df) 
  
  def __getitem__(self, index): 
    row = self.df.iloc[index] 
    img_path = row.file_path              # use column names: positional offsets shift once 'fold' is added 
    img = cv2.imread(img_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) 
    label = row.label_group               # encoded label_group is the training label 
    
    if self.transforms: 
      img = self.transforms(image = img)["image"] 
      
    return img, label 

Augmentations & Transforms

data_transforms = {
  "train" : A.Compose([A.Resize(CFG.img_size, CFG.img_size), 
                       A.HorizontalFlip(p = 0.5), 
                       A.RandomBrightnessContrast(brightness_limit = (-0.1, 0.1), 
                                                  contrast_limit = (-0.1, 0.1), 
                                                  p = 0.5), 
                       A.Normalize(mean = [0.485, 0.456, 0.406], 
                                   std = [0.229, 0.224, 0.225], 
                                   max_pixel_value = 255.0, 
                                   p = 1.0), 
                       ToTensorV2()], p = 1.), 
                       
  "valid" : A.Compose([A.Resize(CFG.img_size, CFG.img_size), 
                       A.Normalize(mean = [0.485, 0.456, 0.406], 
                                   std = [0.229, 0.224, 0.225], 
                                   max_pixel_value = 255.0, 
                                   p = 1.0), 
                       ToTensorV2()], p = 1.)
}
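
A quick way to confirm the dataset and transforms fit together is to pull a single sample (an illustrative check, not a cell from the original notebook):

# Assumes df_train and data_transforms defined above.
sample_ds = ShopeeDataset(TRAIN_DIR, df_train, transforms = data_transforms['train']) 
img, label = sample_ds[0] 
print(img.shape, label)   # expected: torch.Size([3, 512, 512]) and an integer label_group 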

Training Function

We use Automatic Mixed Precision to speed up training and Gradient Accumulation to increase the effective batch size.
Refer to this discussion to learn more about mixed precision training.
Refer to this discussion to learn more about gradient accumulation.
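
As a standalone illustration of the pattern used in train_model below (a sketch with placeholder names loader, model, criterion, optimizer; not a cell from the notebook):

# Minimal AMP + gradient-accumulation pattern.
scaler = amp.GradScaler() 
optimizer.zero_grad() 
for step, (inputs, labels) in enumerate(loader):                  # `loader` is a placeholder DataLoader 
  with amp.autocast(enabled = True):                              # forward pass in mixed precision 
    loss = criterion(model(inputs), labels) / CFG.n_accumulate    # scale so accumulated grads match one big batch 
  scaler.scale(loss).backward()                                   # gradients accumulate across iterations 
  if (step + 1) % CFG.n_accumulate == 0: 
    scaler.step(optimizer)                                        # unscales gradients, then steps the optimizer 
    scaler.update() 
    optimizer.zero_grad() 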

def train_model(model, criterion, optimizer, scheduler, num_epochs, dataloaders, dataset_sizes, device, fold): 
  start = time.time() 
  best_model_wts = copy.deepcopy(model.state_dict()) 
  best_loss = np.inf 
  history = defaultdict(list) 
  scaler = amp.GradScaler() 
  
  for epoch in range(1, num_epochs + 1): 
    print(f"Epoch {epoch}/{num_epochs}")
    print("-" * 10) 
    
    # Each epoch has a training and validation phase 
    for phase in ['train', 'valid']: 
      if(phase == 'train'): 
        model.train() # Set model to training mode 
      else: 
        model.eval() # Set model to evaluation mode 
        
      running_loss = 0.0 
      
      # Iterate over data 
      for step, (inputs, labels) in enumerate(tqdm(dataloaders[phase])): 
        inputs = inputs.to(CFG.device) 
        labels = labels.to(CFG.device) 
        
        # forward 
        # track history if only in train 
        with torch.set_grad_enabled(phase == 'train'): 
          with amp.autocast(enabled = True): 
            outputs = model(inputs) 
            loss = criterion(outputs, labels) 
            loss = loss / CFG.n_accumulate 
            
          # backward only if in training phase 
          if phase == "train": 
            scaler.scale(loss).backward() 
            
          # optimize only if in training phase 
          if phase == "train" and (step + 1) % CFG.n_accumulate == 0: 
            scaler.step(optimizer)
            scaler.update() 
            scheduler.step() 
            
            # zero the parameter gradients 
            optimizer.zero_grad() 
            
        running_loss += loss.item() * inputs.size(0) 
        
      epoch_loss = running_loss/dataset_sizes[phase] 
      history[phase + ' loss'].append(epoch_loss) 
      
      print(f"{phase} Loss: {epoch_loss:.4f}") 
      
      # deep copy the model 
      if phase == 'valid' and epoch_loss <= best_loss: 
        best_loss = epoch_loss 
        best_model_wts = copy.deepcopy(model.state_dict()) 
        PATH = f"Fold{fold}_{est_loss}_epoch_{epoch}.bin"
        torch.save(model.state_dict(), PATH) 
        
    print()
    
  end = time.time() 
  time_elapsed = end-start 
  print("Training complete in {time_elapsed // 3600:.0f}h {(time_elabsed % 3600) // 60:.0f}m {(time_elapsed % 3600) % 60:.0f}s")
  print("Best Loss ", best_loss) 
  
  # load best model weights 
  model.load_state_dict(best_model_wts) 
  
  return model, history 
def run_fold(model, criterion, optimizer, scheduler, device, fold, num_epochs = 10): 
  
  valid_df = df_train[df_train.fold == fold] 
  train_df = df_train[df_train.fold != fold] 
  
  train_data = ShopeeDataset(TRAIN_DIR, train_df, transforms = data_transforms['train']) 
  valid_data = ShopeeDataset(TRAIN_DIR, valid_df, transforms = data_transforms['valid']) 
  
  dataset_sizes = {
    'train' : len(train_data),  
    'valid' : len(valid_data)
  }
  
  train_loader = DataLoader(dataset = train_data, 
                            batch_size = CFG.batch_size, 
                            num_workers = 4, 
                            pin_memory = True, shuffle = True)
                            
  valid_loader = DataLoader(dataset = valid_data, 
                            batch_size = CFG.batch_size, 
                            num_workers = 4, 
                            pin_memory = True, shuffle = False) 
  
  dataloaders = {
    "train" : train_loader, 
    "valid" : valid_loader
  }
  
  model, history = train_model(model, criterion, optimizer, scheduler, num_epochs, 
                               dataloaders, dataset_sizes, device, fold) 

  return model, history 

Load Model

model = timm.create_model(CFG.model_name, pretrained = True) 
#> /home/rstudio/.local/share/r-miniconda/envs/r-reticulate/lib/python3.10/site-packages/timm/models/_factory.py:117: UserWarning: Mapping deprecated model name tf_efficientnet_b4_ns to current tf_efficientnet_b4.ns_jft_in1k.
#>   model = create_fn(
in_features = model.classifier.in_features 
model.classifier = nn.Linear(in_features, CFG.embedding_size) 

out = model(torch.randn(1, 3, CFG.img_size, CFG.img_size)) 
print(f"Embedding shape : {out.shape}") 
#> Embedding shape : torch.Size([1, 512])

model.to(CFG.device) 
#> EfficientNet(
#>   (conv_stem): Conv2dSame(3, 48, kernel_size=(3, 3), stride=(2, 2), bias=False)
#>   (bn1): BatchNormAct2d(
#>     48, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>     (drop): Identity()
#>     (act): SiLU(inplace=True)
#>   )
#>   (blocks): Sequential(
#>     (0): Sequential(
#>       (0): DepthwiseSeparableConv(
#>         (conv_dw): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=48, bias=False)
#>         (bn1): BatchNormAct2d(
#>           48, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(48, 12, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(12, 48, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pw): Conv2d(48, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn2): BatchNormAct2d(
#>           24, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>       (1): DepthwiseSeparableConv(
#>         (conv_dw): Conv2d(24, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=24, bias=False)
#>         (bn1): BatchNormAct2d(
#>           24, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(24, 6, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(6, 24, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pw): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn2): BatchNormAct2d(
#>           24, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>     )
#>     (1): Sequential(
#>       (0): InvertedResidual(
#>         (conv_pw): Conv2d(24, 144, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           144, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2dSame(144, 144, kernel_size=(3, 3), stride=(2, 2), groups=144, bias=False)
#>         (bn2): BatchNormAct2d(
#>           144, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(144, 6, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(6, 144, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(144, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>       (1): InvertedResidual(
#>         (conv_pw): Conv2d(32, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=192, bias=False)
#>         (bn2): BatchNormAct2d(
#>           192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(192, 8, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(8, 192, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(192, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>       (2): InvertedResidual(
#>         (conv_pw): Conv2d(32, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=192, bias=False)
#>         (bn2): BatchNormAct2d(
#>           192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(192, 8, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(8, 192, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(192, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>       (3): InvertedResidual(
#>         (conv_pw): Conv2d(32, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=192, bias=False)
#>         (bn2): BatchNormAct2d(
#>           192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(192, 8, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(8, 192, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(192, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>     )
#>     (2): Sequential(
#>       (0): InvertedResidual(
#>         (conv_pw): Conv2d(32, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2dSame(192, 192, kernel_size=(5, 5), stride=(2, 2), groups=192, bias=False)
#>         (bn2): BatchNormAct2d(
#>           192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(192, 8, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(8, 192, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(192, 56, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           56, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>       (1): InvertedResidual(
#>         (conv_pw): Conv2d(56, 336, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           336, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2d(336, 336, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=336, bias=False)
#>         (bn2): BatchNormAct2d(
#>           336, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(336, 14, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(14, 336, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(336, 56, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           56, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>       (2): InvertedResidual(
#>         (conv_pw): Conv2d(56, 336, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           336, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2d(336, 336, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=336, bias=False)
#>         (bn2): BatchNormAct2d(
#>           336, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(336, 14, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(14, 336, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(336, 56, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           56, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>       (3): InvertedResidual(
#>         (conv_pw): Conv2d(56, 336, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           336, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2d(336, 336, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=336, bias=False)
#>         (bn2): BatchNormAct2d(
#>           336, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(336, 14, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(14, 336, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(336, 56, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           56, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>     )
#>     (3): Sequential(
#>       (0): InvertedResidual(
#>         (conv_pw): Conv2d(56, 336, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           336, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2dSame(336, 336, kernel_size=(3, 3), stride=(2, 2), groups=336, bias=False)
#>         (bn2): BatchNormAct2d(
#>           336, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(336, 14, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(14, 336, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(336, 112, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           112, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>       (1): InvertedResidual(
#>         (conv_pw): Conv2d(112, 672, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2d(672, 672, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=672, bias=False)
#>         (bn2): BatchNormAct2d(
#>           672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(672, 28, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(28, 672, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(672, 112, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           112, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>       (2): InvertedResidual(
#>         (conv_pw): Conv2d(112, 672, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2d(672, 672, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=672, bias=False)
#>         (bn2): BatchNormAct2d(
#>           672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(672, 28, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(28, 672, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(672, 112, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           112, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>       (3): InvertedResidual(
#>         (conv_pw): Conv2d(112, 672, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2d(672, 672, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=672, bias=False)
#>         (bn2): BatchNormAct2d(
#>           672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(672, 28, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(28, 672, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(672, 112, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           112, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>       (4): InvertedResidual(
#>         (conv_pw): Conv2d(112, 672, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2d(672, 672, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=672, bias=False)
#>         (bn2): BatchNormAct2d(
#>           672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(672, 28, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(28, 672, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(672, 112, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           112, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>       (5): InvertedResidual(
#>         (conv_pw): Conv2d(112, 672, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2d(672, 672, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=672, bias=False)
#>         (bn2): BatchNormAct2d(
#>           672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(672, 28, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(28, 672, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(672, 112, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           112, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>     )
#>     (4): Sequential(
#>       (0): InvertedResidual(
#>         (conv_pw): Conv2d(112, 672, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2d(672, 672, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=672, bias=False)
#>         (bn2): BatchNormAct2d(
#>           672, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(672, 28, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(28, 672, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(672, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>       (1): InvertedResidual(
#>         (conv_pw): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           960, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2d(960, 960, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=960, bias=False)
#>         (bn2): BatchNormAct2d(
#>           960, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(960, 40, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(40, 960, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>       (2): InvertedResidual(
#>         (conv_pw): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           960, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2d(960, 960, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=960, bias=False)
#>         (bn2): BatchNormAct2d(
#>           960, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(960, 40, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(40, 960, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>       (3): InvertedResidual(
#>         (conv_pw): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           960, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2d(960, 960, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=960, bias=False)
#>         (bn2): BatchNormAct2d(
#>           960, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(960, 40, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(40, 960, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>       (4): InvertedResidual(
#>         (conv_pw): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           960, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2d(960, 960, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=960, bias=False)
#>         (bn2): BatchNormAct2d(
#>           960, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(960, 40, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(40, 960, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>       (5): InvertedResidual(
#>         (conv_pw): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           960, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2d(960, 960, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=960, bias=False)
#>         (bn2): BatchNormAct2d(
#>           960, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(960, 40, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(40, 960, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>     )
#>     (5): Sequential(
#>       (0): InvertedResidual(
#>         (conv_pw): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           960, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2dSame(960, 960, kernel_size=(5, 5), stride=(2, 2), groups=960, bias=False)
#>         (bn2): BatchNormAct2d(
#>           960, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(960, 40, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(40, 960, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(960, 272, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           272, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>       (1): InvertedResidual(
#>         (conv_pw): Conv2d(272, 1632, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           1632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2d(1632, 1632, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=1632, bias=False)
#>         (bn2): BatchNormAct2d(
#>           1632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(1632, 68, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(68, 1632, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(1632, 272, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           272, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>       (2): InvertedResidual(
#>         (conv_pw): Conv2d(272, 1632, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           1632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2d(1632, 1632, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=1632, bias=False)
#>         (bn2): BatchNormAct2d(
#>           1632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(1632, 68, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(68, 1632, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(1632, 272, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           272, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>       (3): InvertedResidual(
#>         (conv_pw): Conv2d(272, 1632, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           1632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2d(1632, 1632, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=1632, bias=False)
#>         (bn2): BatchNormAct2d(
#>           1632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(1632, 68, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(68, 1632, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(1632, 272, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           272, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>       (4): InvertedResidual(
#>         (conv_pw): Conv2d(272, 1632, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           1632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2d(1632, 1632, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=1632, bias=False)
#>         (bn2): BatchNormAct2d(
#>           1632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(1632, 68, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(68, 1632, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(1632, 272, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           272, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>       (5): InvertedResidual(
#>         (conv_pw): Conv2d(272, 1632, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           1632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2d(1632, 1632, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=1632, bias=False)
#>         (bn2): BatchNormAct2d(
#>           1632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(1632, 68, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(68, 1632, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(1632, 272, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           272, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>       (6): InvertedResidual(
#>         (conv_pw): Conv2d(272, 1632, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           1632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2d(1632, 1632, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=1632, bias=False)
#>         (bn2): BatchNormAct2d(
#>           1632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(1632, 68, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(68, 1632, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(1632, 272, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           272, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>       (7): InvertedResidual(
#>         (conv_pw): Conv2d(272, 1632, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           1632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2d(1632, 1632, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=1632, bias=False)
#>         (bn2): BatchNormAct2d(
#>           1632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(1632, 68, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(68, 1632, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(1632, 272, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           272, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>     )
#>     (6): Sequential(
#>       (0): InvertedResidual(
#>         (conv_pw): Conv2d(272, 1632, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           1632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2d(1632, 1632, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1632, bias=False)
#>         (bn2): BatchNormAct2d(
#>           1632, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(1632, 68, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(68, 1632, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(1632, 448, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           448, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>       (1): InvertedResidual(
#>         (conv_pw): Conv2d(448, 2688, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn1): BatchNormAct2d(
#>           2688, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (conv_dw): Conv2d(2688, 2688, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2688, bias=False)
#>         (bn2): BatchNormAct2d(
#>           2688, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): SiLU(inplace=True)
#>         )
#>         (aa): Identity()
#>         (se): SqueezeExcite(
#>           (conv_reduce): Conv2d(2688, 112, kernel_size=(1, 1), stride=(1, 1))
#>           (act1): SiLU(inplace=True)
#>           (conv_expand): Conv2d(112, 2688, kernel_size=(1, 1), stride=(1, 1))
#>           (gate): Sigmoid()
#>         )
#>         (conv_pwl): Conv2d(2688, 448, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>         (bn3): BatchNormAct2d(
#>           448, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>           (drop): Identity()
#>           (act): Identity()
#>         )
#>         (drop_path): Identity()
#>       )
#>     )
#>   )
#>   (conv_head): Conv2d(448, 1792, kernel_size=(1, 1), stride=(1, 1), bias=False)
#>   (bn2): BatchNormAct2d(
#>     1792, eps=0.001, momentum=0.1, affine=True, track_running_stats=True
#>     (drop): Identity()
#>     (act): SiLU(inplace=True)
#>   )
#>   (global_pool): SelectAdaptivePool2d(pool_type=avg, flatten=Flatten(start_dim=1, end_dim=-1))
#>   (classifier): Linear(in_features=1792, out_features=512, bias=True)
#> )
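
With the backbone now projecting images to CFG.embedding_size-dimensional embeddings, what remains is the optimizer, the CosineAnnealingLR scheduler named in CFG, and a call to run_fold with the SupCon criterion from earlier. That training cell is not shown in this section; the sketch below is consistent with CFG, and the choice of Adam here is an assumption:

# Hedged wiring sketch (optimizer choice assumed, not confirmed by this section).
optimizer = optim.Adam(model.parameters(), lr = CFG.lr, weight_decay = CFG.weight_decay) 
scheduler = lr_scheduler.CosineAnnealingLR(optimizer, T_max = CFG.T_max, eta_min = CFG.min_lr) 

model, history = run_fold(model, criterion, optimizer, scheduler, CFG.device, 
                          fold = 0, num_epochs = CFG.num_epochs) 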

Custom Implementation

Implementation converted to PyTorch from [this](https://www.kaggle.com/dimitreoliveira/cassava-leaf-supervised-contrastive-learning) amazing notebook