GPTRec Implementation

Implementation of GPTRec
RecSys
Author

Andrew Boney

Published

February 11, 2026

Following up from my previous post on the GPTRec paper, I want to implement a GPTRec-style system.

The original author has existing implementations across a couple of repos: gptrec_rl and bert4rec_repro. However, these are broader in scope than I need, and are implemented in TensorFlow (boo!!!).

Instead, I’ll be implementing this using a framework I’m building: rec. This is a work in progress aiming to provide an end-to-end implementation of the kinds of recommendation systems used in industry—encompassing the whole lifecycle from data prep to deployment. While there’s still plenty to add, the baseline should now be robust enough to work at scale. I’ll likely use this framework in future posts too.

In this implementation, I’m going to focus on the GPTRec architecture - ignoring the sub-item tokenisation and Next-K prediction aspects of the paper.

Get repo

!git clone https://github.com/AndrewBoney/rec.git && cd rec && git checkout 7a6475f
fatal: destination path 'rec' already exists and is not an empty directory.
import sys
sys.path.insert(0, 'rec')

Data

First, we’ll use the rec framework to prepare the data.

Generate data

I’ll use the MovieLens 1M dataset, a classic benchmark in recommendation systems research containing about 1 million ratings from 6,040 users on roughly 3,900 movies. The rec framework includes a data preparation module that downloads and processes this dataset into a consistent format for training.

import pandas as pd

import os

from rec.data_prep.movielens import generate_movielens_1m
from rec.common.data import DataPaths

root_folder = "../../assets/movielens_rec_data"

generate_movielens_1m(output_dir = root_folder)
paths = DataPaths(
    users_path = os.path.join(root_folder, "prepared", "users.parquet"),
    items_path = os.path.join(root_folder, "prepared", "items.parquet"),
    interactions_train_path = os.path.join(root_folder, "prepared", "interactions_train.parquet"),
    interactions_val_path = os.path.join(root_folder, "prepared", "interactions_val.parquet")
)
users = pd.read_parquet(paths.users_path)
print(users.head())
  user_id   age gender age_group occupation    zip zip_prefix
0       1   1.0      F   age_1.0     occ_10  48067        480
1       2  56.0      M  age_56.0     occ_16  70072        700
2       3  25.0      M  age_25.0     occ_15  55117        551
3       4  45.0      M  age_45.0      occ_7  02460        024
4       5  25.0      M  age_25.0     occ_20  55455        554
items = pd.read_parquet(paths.items_path)
print(items.head())
  item_id                        genres                 genre_grouped  \
0       1   Animation|Children's|Comedy                         other   
1       2  Adventure|Children's|Fantasy  Adventure|Children's|Fantasy   
2       3                Comedy|Romance                Comedy|Romance   
3       4                  Comedy|Drama                  Comedy|Drama   
4       5                        Comedy                        Comedy   

                                title                    title_raw  year  \
0                    Toy Story (1995)                    Toy Story  1995   
1                      Jumanji (1995)                      Jumanji  1995   
2             Grumpier Old Men (1995)             Grumpier Old Men  1995   
3            Waiting to Exhale (1995)            Waiting to Exhale  1995   
4  Father of the Bride Part II (1995)  Father of the Bride Part II  1995   

  year_bucket  
0  year_1990s  
1  year_1990s  
2  year_1990s  
3  year_1990s  
4  year_1990s  
interactions_train = pd.read_parquet(paths.interactions_train_path)
print(interactions_train.head())
  user_id item_id  rating   timestamp
0       1    1193       5  2000-12-31
1       1     661       3  2000-12-31
2       1     914       3  2000-12-31
3       1    3408       4  2000-12-31
4       1    2355       5  2001-01-06
interactions_val = pd.read_parquet(paths.interactions_val_path)
print(interactions_val.head())
  user_id item_id  rating   timestamp
0      36    1266       5  2002-12-22
1      36    2713       1  2002-12-22
2      36     595       4  2002-12-22
3      36     247       4  2002-12-22
4      36    1295       4  2002-12-22

Define Feature Config

This defines the columns used in the dataset, and the types of features they should be converted into.

from rec.common.data import FeatureConfig

feature_config = FeatureConfig(
    user_id_col = "user_id", 
    item_id_col = "item_id",
    user_cat_cols = [],
    item_cat_cols = [], 
    interaction_user_col = "user_id",
    interaction_item_col = "item_id", 
    interaction_time_col = "timestamp"
)

Build Encoders

Before training, we need to convert raw IDs (such as user_id 1 or item_id 1193 in the tables above) into integer indices that can be used for embedding lookups, much like tokenisation in NLP. The build_encoders function creates a CategoryEncoder for each categorical column, mapping each unique value to an integer index while reserving 0 for unknown values.

from rec.common.data import build_encoders

user_encoders, item_encoders = build_encoders(
    users_path = paths.users_path,
    items_path = paths.items_path,
    interactions_path = paths.interactions_train_path,
    feature_cfg = feature_config
)
user_encoders, item_encoders
({'user_id': <rec.common.data.CategoryEncoder at 0x76e62cbc0f80>},
 {'item_id': <rec.common.data.CategoryEncoder at 0x76e62cbc11c0>})
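
To make the mapping concrete, here is a minimal sketch of the idea. ToyCategoryEncoder is a hypothetical stand-in written for this post, not the framework's CategoryEncoder, and the real class may assign indices differently. The only behaviour it is meant to illustrate is that known values get indices starting at 1 and unseen values fall back to 0.

```python
from typing import Dict, Iterable, List


class ToyCategoryEncoder:
    """Hypothetical stand-in for rec's CategoryEncoder (illustration only)."""

    UNKNOWN_INDEX = 0  # index 0 is reserved for unseen values

    def __init__(self) -> None:
        self.mapping: Dict[str, int] = {}

    def fit(self, values: Iterable[str]) -> "ToyCategoryEncoder":
        # Assign indices 1, 2, 3, ... in order of first appearance
        for value in values:
            if value not in self.mapping:
                self.mapping[value] = len(self.mapping) + 1
        return self

    def transform(self, values: Iterable[str]) -> List[int]:
        # Values never seen during fit fall back to the unknown index
        return [self.mapping.get(v, self.UNKNOWN_INDEX) for v in values]


encoder = ToyCategoryEncoder().fit(["1193", "661", "914", "661"])
encoded = encoder.transform(["661", "1193", "9999"])
print(encoded)  # [2, 1, 0]
```

With three distinct values seen during fit, the index space has four entries once the unknown slot is counted.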

Build Cardinalities

We also need to define the feature cardinalities, i.e. the number of unique values for each categorical feature. This is used to determine the size of the embeddings.

from rec.common.train import build_cardinalities

user_cardinalities = build_cardinalities(user_encoders, [feature_config.user_id_col])
item_cardinalities = build_cardinalities(item_encoders, [feature_config.item_id_col])
user_cardinalities, item_cardinalities
({'user_id': 6041}, {'item_id': 3884})

In this case we only need user_id and item_id. Note that the cardinality is the number of unique values, plus one for the unknown value.

Build User / Item Map

Sequential recommendation models like GPTRec learn from the order in which users interact with items—predicting the next item based on the sequence of previous ones. To train such models, we need to transform our flat interaction table into an ordered mapping: for each user, a chronologically sorted list of item IDs.

While the rec framework includes a build_user_item_map function, this was designed for non-sequential models where interaction order doesn’t matter—it simply collects the set of items each user has interacted with. For GPTRec, we need a modified version that preserves temporal ordering by sorting on the timestamp column. I’ll likely integrate this into the framework in a future update.

from typing import Dict, List

from rec.common.io import read_parquet_batches
from rec.common.data import FeatureConfig, CategoryEncoder

def build_user_item_map_ordered(
    interactions_path: str,
    feature_cfg: FeatureConfig,
    user_encoders: Dict[str, CategoryEncoder],
    item_encoders: Dict[str, CategoryEncoder],
    chunksize: int = 200_000,
) -> Dict[int, List[int]]:
    """Build user->items map ordered by timestamp (ascending)."""
    user_to_items: Dict[int, List[tuple]] = {}  # uid -> [(timestamp, item_id), ...]
    
    for chunk in read_parquet_batches(interactions_path, chunksize):
        user_ids = user_encoders[feature_cfg.user_id_col].transform(
            chunk[feature_cfg.interaction_user_col].astype(str).tolist()
        )
        item_ids = item_encoders[feature_cfg.item_id_col].transform(
            chunk[feature_cfg.interaction_item_col].astype(str).tolist()
        )
        timestamps = chunk[feature_cfg.interaction_time_col].tolist()
        
        for uid, iid, ts in zip(user_ids, item_ids, timestamps):
            uid, iid = int(uid), int(iid)
            if uid not in user_to_items:
                user_to_items[uid] = []
            user_to_items[uid].append((ts, iid))
    
    # Sort by timestamp and extract just the item ids.
    # Tuples compare element-wise, so ties on timestamp are broken by item id.
    return {
        uid: [iid for _, iid in sorted(items)]
        for uid, items in user_to_items.items()
    }

train_user_item_map = build_user_item_map_ordered(
    paths.interactions_train_path,
    feature_config,
    user_encoders,
    item_encoders,
)

val_user_item_map = build_user_item_map_ordered(
    paths.interactions_val_path,
    feature_config,
    user_encoders,
    item_encoders,
)
print(train_user_item_map[36])
[1, 11, 21, 30, 32, 34, 47, 143, 171, 177, 195, 221, 230, 231, 244, 254, 314, 326, 353, 377, 438, 446, 477, 497, 521, 548, 584, 585, 586, 594, 643, 700, 771, 776, 842, 1064, 1066, 1082, 1112, 1120, 1121, 1157, 1173, 1225, 1230, 1231, 1240, 1251, 1255, 1259, 1266, 1278, 1301, 1336, 1352, 1373, 1375, 1376, 1383, 1425, 1449, 1456, 1483, 1492, 1506, 1530, 1534, 1540, 1544, 1596, 1608, 1618, 1631, 1696, 1761, 1764, 1807, 1808, 1814, 1839, 1841, 1855, 1895, 1900, 1932, 1944, 1955, 1997, 2010, 2013, 2026, 2032, 2040, 2069, 2077, 2087, 2175, 2177, 2180, 2182, 2226, 2244, 2253, 2284, 2287, 2303, 2327, 2338, 2339, 2350, 2356, 2364, 2365, 2401, 2422, 2434, 2473, 2503, 2512, 2531, 2537, 2560, 2603, 2615, 2619, 2625, 2631, 2632, 2634, 2638, 2648, 2654, 2678, 2693, 2694, 2695, 2723, 2729, 2790, 2847, 2850, 2891, 2905, 2919, 2967, 2969, 2971, 2984, 2992, 3004, 3013, 3019, 3046, 3088, 3107, 3108, 3145, 3185, 3187, 3195, 3230, 3233, 3240, 3290, 3293, 3356, 3436, 3437, 3457, 3458, 3484, 3510, 3523, 3555, 3603, 3629, 3644, 3677, 3685, 3717, 3725, 3752, 3767, 3794, 3883, 6, 10, 16, 109, 110, 160, 163, 164, 227, 233, 346, 374, 451, 454, 463, 471, 524, 590, 605, 642, 725, 779, 856, 901, 913, 958, 1063, 1065, 1075, 1077, 1079, 1179, 1180, 1183, 1191, 1193, 1197, 1210, 1215, 1221, 1246, 1253, 1257, 1280, 1288, 1338, 1350, 1386, 1452, 1546, 1563, 1569, 1575, 1576, 1600, 1629, 1674, 1743, 1852, 1892, 1960, 1990, 1996, 2050, 2210, 2220, 2266, 2285, 2323, 2325, 2472, 2696, 2747, 2803, 2815, 2822, 2848, 2899, 2904, 2998, 3033, 3079, 3128, 3188, 3373, 3380, 3444, 3459, 3483, 3638, 3659, 3695, 1843, 3340, 258, 290, 365, 504, 538, 552, 589, 607, 848, 1024, 1086, 1181, 1204, 1245, 1354, 1356, 1557, 1567, 1886, 1933, 1945, 2037, 2126, 2205, 2461, 2549, 2572, 2734, 2880, 2921, 3036, 3350, 3412, 3508, 3567, 3634, 7, 1273, 197, 550, 553, 578, 1132, 1175, 1184, 1264, 1272, 1367, 1844, 2047, 2099, 2459, 2594, 2883, 2917, 3074, 3291, 3758, 2706]
print(val_user_item_map[36])
[245, 294, 592, 1247, 1276, 1654, 2201, 2301, 2626, 2645, 3106, 3571, 3718]
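
One thing worth noticing in the output above is the long ascending runs of item IDs. The timestamps shown earlier look day-level, so many interactions tie, and sorting (timestamp, item_id) tuples breaks those ties by item ID. The sketch below uses made-up events to contrast that with a sort keyed on the timestamp alone, which keeps tied rows in their original order because Python's sort is stable. Whether the file order within a day reflects the true viewing order is not something the data shown here can confirm.

```python
# Toy (timestamp, item_id) pairs in the order they appear in the file.
# Three interactions share the same day-level timestamp.
events = [
    ("2000-12-31", 1193),
    ("2000-12-31", 661),
    ("2001-01-06", 2355),
    ("2000-12-31", 914),
]

# Tuple sort: ties on timestamp are broken by item id
by_tuple = [iid for _, iid in sorted(events)]

# Key sort: Python's sort is stable, so ties keep their original file order
by_timestamp_only = [iid for _, iid in sorted(events, key=lambda e: e[0])]

print(by_tuple)           # [661, 914, 1193, 2355]
print(by_timestamp_only)  # [1193, 661, 914, 2355]
```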

In a future iteration, I may extend this to include timestamps in the output mapping. This would enable time-based positional embeddings—encoding when interactions occurred rather than just their relative order. For now, I’ll keep things simple with index-based positional embeddings.

Build Feature Store

The FeatureStore class provides efficient feature lookup for users and items during training and inference. Rather than repeatedly encoding features on-the-fly, it pre-encodes all user and item features into tensors at initialization—storing them in memory for fast indexed access.

Key functionality:

  • Pre-encoded tensors: All categorical and dense features are encoded once and stored as PyTorch tensors with zero-padding at index 0 (for unknown/missing values)
  • Index mappings: Maintains user_index and item_index dictionaries that map encoded IDs to their row positions in the feature tensors
  • Batch lookups: get_user_features() and get_item_features() retrieve all features for a batch of IDs in a single operation
  • Item catalog access: get_all_item_features() and get_all_item_ids() provide full item catalog access, useful for scoring all items during inference
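
The lookup pattern can be sketched in a few lines of plain Python. Everything prefixed with toy_ is hypothetical and uses lists in place of tensors. In particular, sending unknown IDs to the zero row is an assumption about how the real FeatureStore behaves, not something taken from its source.

```python
from typing import Dict, List

# Hypothetical pre-encoded item features: row 0 is the zero-padding row
# used for unknown/missing ids; the other rows hold encoded features.
toy_item_features: List[List[int]] = [
    [0, 0],  # padding / unknown
    [3, 1],  # features for encoded item id 17
    [5, 2],  # features for encoded item id 42
]

# Maps an encoded item id to its row in toy_item_features
toy_item_index: Dict[int, int] = {17: 1, 42: 2}


def toy_get_item_features(item_ids: List[int]) -> List[List[int]]:
    """Look up feature rows for a batch of ids; unknown ids map to row 0."""
    return [toy_item_features[toy_item_index.get(i, 0)] for i in item_ids]


batch = toy_get_item_features([42, 17, 999])
print(batch)  # [[5, 2], [3, 1], [0, 0]]
```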

from rec.common.data import FeatureStore

fs = FeatureStore(
    user_df=users,
    item_df=items,
    user_encoders=user_encoders,
    item_encoders=item_encoders,
    feature_cfg = feature_config
)

print(dir(fs))
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'feature_cfg', 'get_all_item_features', 'get_all_item_ids', 'get_item_features', 'get_user_features', 'item_encoders', 'item_features', 'item_id_tensor', 'item_index', 'map_item_ids_to_indices', 'user_encoders', 'user_features', 'user_index']

Build Dataset

For this, I want a dataset that generates a padded sequence of items for each user.

First, work out a good maximum sequence length based on the distribution of sequence lengths in the data.

import numpy as np

lens = {k : len(v) for k, v in train_user_item_map.items()}

print("Max len:", max(lens.values()))
print("Avg len:", np.mean(list(lens.values())))
print("Std len:", np.std(list(lens.values())))
Max len: 2314
Avg len: 163.83573439311144
Std len: 190.44394716254214
max_len = 200

print(f"Pct > {max_len}:", round(len([None for l in lens.values() if l > max_len]) / len(lens) * 100, 4) , "%")
Pct > 200: 25.7824 %

With a max sequence length of 200, we capture the full history for ~74% of users; the remaining ~26% are truncated to their most recent interactions. While this loses some information, it allows us to work in a compute-limited environment.
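
A quick way to compare several candidate lengths is to compute the share of users whose whole history fits under each one. The sketch below does this on made-up lengths. The coverage helper and the toy values are illustrative only and are not part of the rec framework.

```python
from typing import Dict, List


def coverage(lengths: List[int], max_len: int) -> float:
    """Fraction of users whose full history fits within max_len items."""
    return sum(1 for n in lengths if n <= max_len) / len(lengths)


# Toy sequence lengths (made up for illustration, not MovieLens values)
toy_lengths = [20, 35, 50, 120, 180, 200, 260, 400, 900, 2314]

report: Dict[int, float] = {m: coverage(toy_lengths, m) for m in (50, 100, 200, 400)}
for max_len_candidate, frac in report.items():
    print(f"max_len={max_len_candidate}: {frac:.0%} of users fully covered")
```

Running the same function over the values of lens from the cell above would give the real trade-off curve for this dataset.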

Now let’s build the datasets. For sequential recommendation, we need two different dataset types:

  1. Training dataset: Uses a sliding window approach where, given a sequence of items [A, B, C, D], the model learns to predict each next item from the preceding context: A→B, [A,B]→C, [A,B,C]→D. This is implemented by shifting input and labels by one position.

  2. Evaluation dataset: Uses the full training history as context and held-out validation items as targets. This mirrors the real inference scenario: given everything we know about a user’s past behavior, can we predict what they’ll interact with next?

Both datasets use left-padding (padding at the start of sequences) so the most recent item is always at the same position—this works naturally with causal attention where we predict the next token based on previous ones.
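
The shift-by-one and left-padding steps are easy to check by hand on a tiny sequence. The helper below is a plain-Python sketch of the same logic with lists instead of tensors. The name shift_and_left_pad is made up for this example.

```python
from typing import List, Tuple

PAD = 0  # padding index, matching PAD_TOKEN in the datasets


def shift_and_left_pad(
    items: List[int], max_length: int
) -> Tuple[List[int], List[int], List[int]]:
    """Return (input_ids, labels, attention_mask), each of length max_length."""
    items = items[-(max_length + 1):]       # keep the most recent items
    inputs, labels = items[:-1], items[1:]  # labels are inputs shifted by one
    pad = [PAD] * (max_length - len(inputs))
    mask = [0] * len(pad) + [1] * len(inputs)
    return pad + inputs, pad + labels, mask


# Items A, B, C, D encoded as 11, 12, 13, 14
input_ids, labels, attention_mask = shift_and_left_pad([11, 12, 13, 14], max_length=5)
print(input_ids)       # [0, 0, 11, 12, 13]
print(labels)          # [0, 0, 12, 13, 14]
print(attention_mask)  # [0, 0, 1, 1, 1]
```

Each real position lines up an input item with the item that followed it, and the padded positions carry label 0 so they can be ignored later.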

import torch

from torch.utils.data import Dataset, DataLoader

class SequentialTrainDataset(Dataset):
    """Training dataset: generates sequences for next-item prediction."""
    PAD_TOKEN = 0
    
    def __init__(
        self,
        user_item_map: Dict[int, List[int]],
        max_length: int = 50,
        min_length: int = 2,
    ) -> None:
        super().__init__()
        self.max_length = max_length
        self.user_item_map = user_item_map
        self.user_ids = [
            uid for uid, items in user_item_map.items() 
            if len(items) >= min_length
        ]
    
    def __len__(self) -> int:
        return len(self.user_ids)
    
    def __getitem__(self, idx: int) -> Dict[str, torch.Tensor]:
        user_id = self.user_ids[idx]
        items = self.user_item_map[user_id]
        
        # Need max_length + 1 items to get max_length input/target pairs
        if len(items) > self.max_length + 1:
            items = items[-(self.max_length + 1):]
        
        # input/target shifted by 1
        input_items = items[:-1]
        labels = items[1:]
        actual_len = len(input_items)
        
        # Left-pad to max_length
        pad_len = self.max_length - actual_len
        input_seq = np.full(self.max_length, self.PAD_TOKEN, dtype=np.int64)
        label_seq = np.full(self.max_length, self.PAD_TOKEN, dtype=np.int64)
        input_seq[pad_len:] = input_items
        label_seq[pad_len:] = labels
        
        attention_mask = np.zeros(self.max_length, dtype=np.float32)
        attention_mask[pad_len:] = 1.0
        
        return {
            "user_id": torch.tensor(user_id, dtype=torch.long),
            "input_ids": torch.from_numpy(input_seq),
            "labels": torch.from_numpy(label_seq),
            "attention_mask": torch.from_numpy(attention_mask),
            "seq_length": torch.tensor(actual_len, dtype=torch.long),
        }


class SequentialEvalDataset(Dataset):
    """
    Eval dataset for retrieval metrics.
    
    Returns user's training history as context, and val items as targets.
    Compatible with evaluate_retrieval pattern - model produces scores,
    we compare top-k against val items.
    """
    PAD_TOKEN = 0
    
    def __init__(
        self,
        train_user_item_map: Dict[int, List[int]],
        val_user_item_map: Dict[int, List[int]],
        max_length: int = 50,
    ) -> None:
        super().__init__()
        self.max_length = max_length
        self.train_map = train_user_item_map
        self.val_map = val_user_item_map
        
        # Users with val items AND some training history
        self.user_ids = [
            uid for uid in val_user_item_map
            if len(val_user_item_map[uid]) >= 1 and uid in train_user_item_map
        ]
    
    def __len__(self) -> int:
        return len(self.user_ids)
    
    def __getitem__(self, idx: int) -> Dict[str, torch.Tensor]:
        user_id = self.user_ids[idx]
        context_items = self.train_map.get(user_id, [])
        target_items = self.val_map[user_id]
        
        # Truncate context to max_length (keep most recent)
        if len(context_items) > self.max_length:
            context_items = context_items[-self.max_length:]

        actual_len = len(context_items)
        pad_len = self.max_length - actual_len
        
        input_seq = np.full(self.max_length, self.PAD_TOKEN, dtype=np.int64)
        input_seq[pad_len:] = context_items
        
        attention_mask = np.zeros(self.max_length, dtype=np.float32)
        attention_mask[pad_len:] = 1.0
        
        return {
            "user_id": torch.tensor(user_id, dtype=torch.long),
            "input_ids": torch.from_numpy(input_seq),
            "attention_mask": torch.from_numpy(attention_mask),
            "seq_length": torch.tensor(actual_len, dtype=torch.long),
            # Targets for metric computation (variable length)
            "target_items": torch.tensor(target_items, dtype=torch.long),
        }

def collate_eval_batches(batch):
    return {
        "user_id" : torch.stack([ex["user_id"] for ex in batch]),
        "input_ids": torch.stack([ex["input_ids"] for ex in batch]),
        "attention_mask": torch.stack([ex["attention_mask"] for ex in batch]),
        "seq_length": torch.stack([ex["seq_length"] for ex in batch]),
        "target_items": [ex["target_items"] for ex in batch],  # keep as list of tensors for ragged
    }

A few implementation details worth noting:

  • PAD_TOKEN = 0: We reserve index 0 for padding. The encoders already reserve index 0 for unknown values, so real items start at index 1 and padding never collides with a real item
  • Minimum length filtering: Training requires at least 2 items (one for input, one for target), so we filter out users with singleton interactions
  • Variable-length targets: The eval dataset keeps targets as a list of tensors rather than padding them, since different users have different numbers of validation interactions. The custom collate_eval_batches function handles this ragged structure.
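
To show why ragged targets are convenient, here is a rough sketch of a per-user Recall@K computed against variable-length target lists. The recall_at_k function and the toy scores are assumptions for illustration. The metric is defined here as hits divided by the number of targets, which is only one of several common conventions.

```python
from typing import List


def recall_at_k(scores: List[float], targets: List[int], k: int) -> float:
    """Fraction of a user's target items that appear in the top-k scored items."""
    # Rank item indices by score, highest first; index 0 (padding) is excluded
    ranked = sorted(range(1, len(scores)), key=lambda i: scores[i], reverse=True)
    top_k = set(ranked[:k])
    return len(top_k & set(targets)) / len(targets)


# Toy scores over a 6-entry vocabulary (index 0 is padding), one row per user
batch_scores = [
    [0.0, 0.9, 0.1, 0.8, 0.2, 0.3],
    [0.0, 0.1, 0.7, 0.2, 0.6, 0.5],
]
# Ragged targets: each user has a different number of held-out items
batch_targets = [[1, 4], [2]]

recalls = [recall_at_k(s, t, k=2) for s, t in zip(batch_scores, batch_targets)]
print(recalls)  # [0.5, 1.0]
```
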
# Training
train_dataset = SequentialTrainDataset(train_user_item_map, max_length=max_len, min_length=2)

# Evaluation - pass train history as context
val_dataset = SequentialEvalDataset(
    train_user_item_map,    
    val_user_item_map, 
    max_length=max_len
)
# Check a train sample
train_dataset[0]
{'user_id': tensor(1),
 'input_ids': tensor([   0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,  149,  258,  528,  591,  605,  656,  712,  903,
          908,  927, 1010, 1016, 1017, 1023, 1082, 1177, 1180, 1190, 1227, 1251,
         1268, 1673, 1769, 1893, 1894, 1950, 1960, 2253, 2272, 2330, 2624, 2694,
         2723, 2729, 2736, 2850, 3037, 3046, 3118, 3340,    1,   48,  524,  585,
          592,  736,  774, 1507, 1527, 1839, 2226, 2287]),
 'labels': tensor([   0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,  258,  528,  591,  605,  656,  712,  903,  908,
          927, 1010, 1016, 1017, 1023, 1082, 1177, 1180, 1190, 1227, 1251, 1268,
         1673, 1769, 1893, 1894, 1950, 1960, 2253, 2272, 2330, 2624, 2694, 2723,
         2729, 2736, 2850, 3037, 3046, 3118, 3340,    1,   48,  524,  585,  592,
          736,  774, 1507, 1527, 1839, 2226, 2287, 2619]),
 'attention_mask': tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
         1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
         1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
         1., 1.]),
 'seq_length': tensor(52)}
# Check a val sample
val_dataset[0]
{'user_id': tensor(36),
 'input_ids': tensor([2790, 2847, 2850, 2891, 2905, 2919, 2967, 2969, 2971, 2984, 2992, 3004,
         3013, 3019, 3046, 3088, 3107, 3108, 3145, 3185, 3187, 3195, 3230, 3233,
         3240, 3290, 3293, 3356, 3436, 3437, 3457, 3458, 3484, 3510, 3523, 3555,
         3603, 3629, 3644, 3677, 3685, 3717, 3725, 3752, 3767, 3794, 3883,    6,
           10,   16,  109,  110,  160,  163,  164,  227,  233,  346,  374,  451,
          454,  463,  471,  524,  590,  605,  642,  725,  779,  856,  901,  913,
          958, 1063, 1065, 1075, 1077, 1079, 1179, 1180, 1183, 1191, 1193, 1197,
         1210, 1215, 1221, 1246, 1253, 1257, 1280, 1288, 1338, 1350, 1386, 1452,
         1546, 1563, 1569, 1575, 1576, 1600, 1629, 1674, 1743, 1852, 1892, 1960,
         1990, 1996, 2050, 2210, 2220, 2266, 2285, 2323, 2325, 2472, 2696, 2747,
         2803, 2815, 2822, 2848, 2899, 2904, 2998, 3033, 3079, 3128, 3188, 3373,
         3380, 3444, 3459, 3483, 3638, 3659, 3695, 1843, 3340,  258,  290,  365,
          504,  538,  552,  589,  607,  848, 1024, 1086, 1181, 1204, 1245, 1354,
         1356, 1557, 1567, 1886, 1933, 1945, 2037, 2126, 2205, 2461, 2549, 2572,
         2734, 2880, 2921, 3036, 3350, 3412, 3508, 3567, 3634,    7, 1273,  197,
          550,  553,  578, 1132, 1175, 1184, 1264, 1272, 1367, 1844, 2047, 2099,
         2459, 2594, 2883, 2917, 3074, 3291, 3758, 2706]),
 'attention_mask': tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
         1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
         1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
         1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
         1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
         1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
         1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
         1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
         1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
         1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
         1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
         1., 1.]),
 'seq_length': tensor(200),
 'target_items': tensor([ 245,  294,  592, 1247, 1276, 1654, 2201, 2301, 2626, 2645, 3106, 3571,
         3718])}
batch_size = 16

train_dl = DataLoader(train_dataset, batch_size = batch_size, shuffle = True)
val_dl = DataLoader(val_dataset, batch_size = batch_size, collate_fn = collate_eval_batches)
train_batch = next(iter(train_dl))
train_batch
{'user_id': tensor([2993, 2311, 5298, 2819, 2634,  144,  230, 4548, 4676,  754, 5792, 3857,
         2707,    3, 2189,   87]),
 'input_ids': tensor([[   0,    0,    0,  ..., 3687, 3741, 3772],
         [   0,    0,    0,  ..., 3257, 3340, 3382],
         [   0,    0,    0,  ..., 3570, 3633, 3634],
         ...,
         [   0,    0,    0,  ..., 3484, 3551, 3603],
         [   0,    0,    0,  ..., 3547, 3686, 3695],
         [   0,    0,    0,  ..., 3604, 3725, 1198]]),
 'labels': tensor([[   0,    0,    0,  ..., 3741, 3772, 3800],
         [   0,    0,    0,  ..., 3340, 3382, 3442],
         [   0,    0,    0,  ..., 3633, 3634, 3635],
         ...,
         [   0,    0,    0,  ..., 3551, 3603, 3799],
         [   0,    0,    0,  ..., 3686, 3695, 3725],
         [   0,    0,    0,  ..., 3725, 1198, 1569]]),
 'attention_mask': tensor([[0., 0., 0.,  ..., 1., 1., 1.],
         [0., 0., 0.,  ..., 1., 1., 1.],
         [0., 0., 0.,  ..., 1., 1., 1.],
         ...,
         [0., 0., 0.,  ..., 1., 1., 1.],
         [0., 0., 0.,  ..., 1., 1., 1.],
         [0., 0., 0.,  ..., 1., 1., 1.]]),
 'seq_length': tensor([ 82,  22,  58,  19, 200,  31, 178,  20,  32, 156, 200,  53,  22,  50,
          34,  58])}
val_batch = next(iter(val_dl))
val_batch
{'user_id': tensor([ 36,  59,  65, 102, 131, 146, 157, 164, 169, 184, 192, 193, 195, 229,
         231, 237]),
 'input_ids': tensor([[2790, 2847, 2850,  ..., 3291, 3758, 2706],
         [   0,    0,    0,  ..., 3743, 3747, 3841],
         [   0,    0,    0,  ..., 1193, 2560, 3752],
         ...,
         [   0,    0,    0,  ..., 3687, 3725, 3847],
         [   0,    0,    0,  ..., 2848, 2917, 3459],
         [   0,    0,    0,  ..., 1637,  354, 2041]]),
 'attention_mask': tensor([[1., 1., 1.,  ..., 1., 1., 1.],
         [0., 0., 0.,  ..., 1., 1., 1.],
         [0., 0., 0.,  ..., 1., 1., 1.],
         ...,
         [0., 0., 0.,  ..., 1., 1., 1.],
         [0., 0., 0.,  ..., 1., 1., 1.],
         [0., 0., 0.,  ..., 1., 1., 1.]]),
 'seq_length': tensor([200,  93, 119,  32, 200, 200, 200,  25, 200,  28, 200, 177, 200,  83,
          48, 161]),
 'target_items': [tensor([ 245,  294,  592, 1247, 1276, 1654, 2201, 2301, 2626, 2645, 3106, 3571,
          3718]),
  tensor([  17,   25,   32,   58,  109,  198,  222,  300,  312,  374,  506,  512,
           538,  587,  605,  741,  888,  889,  890,  892,  894,  903,  919,  927,
           932,  934,  942,  948,  954, 1016, 1023, 1053, 1066, 1073, 1082, 1157,
          1160, 1167, 1182, 1184, 1203, 1209, 1213, 1216, 1222, 1225, 1233, 1247,
          1248, 1250, 1263, 1264, 1265, 1269, 1277, 1284, 1285, 1337, 1360, 1398,
          1540, 1598, 1608, 1867, 1883, 1891, 1892, 1952, 1960, 1997, 1999, 2002,
          2079, 2091, 2175, 2223, 2232, 2268, 2284, 2301, 2328, 2338, 2464, 2497,
          2503, 2576, 2589, 2592, 2602, 2618, 2644, 2648, 2663, 2694, 2851, 2873,
          2875, 2929, 2948, 2950, 2993, 3089, 3117, 3156, 3178, 3192, 3239, 3291,
          3295, 3340, 3346, 3367, 3403, 3477, 3481, 3587, 3616, 3617, 3802, 3842]),
  tensor([2126,  942]),
  tensor([  10,   24,  552,  592,  841,  912,  920,  922, 1080, 1082, 1164, 1247,
          1479, 1674, 1986, 2002, 2265, 2268, 2269, 2287, 2301, 2322, 2327, 2328,
          2366, 2417, 2434, 2473, 2507, 2512, 2531, 2615, 2618, 2624, 2632, 2646,
          2654, 2655, 2656, 2693, 2695, 2773, 2790, 2795, 2840, 2858, 2874, 2879,
          2881, 2891, 2907, 2919, 2922, 2929, 2971, 3089, 3092, 3108, 3151, 3480,
          3512, 3567, 3686, 3718, 3723, 3730, 3794, 3883,  741,  895,  902,  913,
           917,  919,  932,  940,  942, 1069, 1190, 1202, 1250, 1884, 2138, 2330,
          2867, 3367, 3400,  294, 1177, 1232, 3645,  202,  911,  915, 1943, 1944,
          1972, 2144, 2809, 3031]),
  tensor([888]),
  tensor([  10,  291,  887,  939,  967, 1237, 1337, 1998, 2298, 2428, 2544, 2852,
          2879, 2922, 3367, 3571, 3784,  924, 3536, 1164, 1229, 1324, 3224]),
  tensor([ 208,  294,  524,  901, 1075, 1187, 1191, 1200, 1204, 1205, 1244, 1280,
          1386, 1876, 2876, 2994, 3128, 3274, 3451, 3586, 3665, 3685, 3743, 1951,
          2126]),
  tensor([3847]),
  tensor([2869, 1066]),
  tensor([   1,   10,   34,  260,  294,  315,  361,  548,  585,  591,  592,  593,
           908,  912,  922,  975, 1010, 1016, 1020, 1059, 1082, 1164, 1208, 1247,
          1355, 1360, 1527, 1674, 1780, 1839, 1851, 1950, 1986, 2010, 2012, 2013,
          2015, 2017, 2019, 2022, 2028, 2038, 2070, 2072, 2226, 2287, 2619, 2874,
          2879, 2881, 2922, 3328, 3567, 3571, 3683, 3723]),
  tensor([3880]),
  tensor([  39,  233,  331,  354,  374,  454,  477,  590,  941,  944, 1019, 1059,
          1112, 1147, 1157, 1160, 1180, 1181, 1182, 1187, 1207, 1208, 1215, 1245,
          1274, 1354, 1374, 1697, 1817, 1844, 1846, 1889, 1952, 2026, 2031, 2082,
          2140, 2298, 2328, 2503, 2648, 2655, 2723, 2803, 2844, 2869, 2900, 2905,
          2919, 2932, 2951, 3004, 3020, 3028, 3085, 3199, 3266, 3293, 3328, 3338,
          3480, 3508, 3635, 3702, 3858]),
  tensor([2492, 1212, 3124, 1219,  389, 2517, 2476, 3662, 1737,    6, 1341, 2870,
          1996, 2444, 3318, 2562, 2840, 2929, 3024, 3066, 3259, 2272, 3882, 3403,
          1936,  899, 3027, 3020, 3022, 1885, 2994, 2045, 3436,  172,  931, 2420,
          3525, 1816,  889, 1183, 2106, 2228, 2229, 2305,  939, 3618, 1075, 3590,
          1881, 2519, 1277,  628, 3678, 2740, 1235, 1430,  407,  465, 1167, 1282,
          1655, 2680, 2723, 2724, 2939, 3696,  326, 1336, 1354, 1355, 2179]),
  tensor([  26,   34,   62,   86,  109,  160,  171,  174,  258,  263,  316,  452,
           586,  590,  592,  605,  642,  716,  757,  838, 1025, 1046, 1082, 1179,
          1193, 1207, 1213, 1263, 1371, 1373, 1394, 1417, 1459, 1476, 1549, 1594,
          1664, 1681, 1687, 1696, 1740, 1743, 1752, 1766, 1816, 1829, 1841, 1850,
          1934, 1944, 1957, 1991, 1994, 2086, 2163, 2229, 2233, 2237, 2248, 2360,
          2363, 2365, 2374, 2379, 2436, 2473, 2522, 2560, 2589, 2608, 2620, 2644,
          2650, 2678, 2698, 2703, 2822, 2982, 3039, 3047, 3079, 3080, 3092, 3105,
          3106, 3108, 3109, 3110, 3118, 3121, 3187, 3217, 3230, 3249, 3258, 3292,
          3357, 3384, 3385, 3442, 3445, 3468, 3486, 3487, 3495, 3510, 3526, 3530,
          3549, 3649, 3655, 3676, 3677, 3717, 3718, 3756, 3762, 3782, 3788, 3794,
          3813, 3828, 3841, 3846, 3874, 3879, 3880, 2771, 3021]),
  tensor([ 258,  538,  586, 1074, 1183, 1208, 1239, 1240, 1354, 1608, 2125, 2531,
          2596, 2648, 2649, 2929, 3107, 3804, 1112, 1024, 1179, 1181, 1203, 1227,
          1267, 1288, 1629, 1932, 2042, 2076, 2126, 2180, 2729, 2850, 3293, 3329,
          3437,    1,  736, 2919, 3046, 1906, 1907, 1908, 1909, 1910, 1911, 1912,
          1913]),
  tensor([1000, 3618, 2386, 2580, 2013])]}
val_batch["input_ids"].shape
torch.Size([16, 200])
train_batch["input_ids"].shape
torch.Size([16, 200])

Model

Model Architecture

At its core, GPTRec applies the same autoregressive language modeling approach that powers GPT to sequential recommendation. Just as GPT predicts the next word given previous words, GPTRec predicts the next item given a user’s interaction history.

The architecture follows a familiar transformer pattern:

  1. Item Embeddings: Each item gets a learned embedding vector. We also add positional embeddings so the model knows where each item appears in the sequence.

  2. Causal Transformer: The key ingredient. Unlike BERT-style models that can look at the full sequence bidirectionally, we use causal (autoregressive) masking—each position can only attend to earlier positions. This matches our inference scenario: predict what comes next based only on what we’ve seen so far.

  3. Weight Tying: The output projection layer shares weights with the item embedding layer. This is a common trick in language models that reduces parameters and often improves performance—the intuition being that the “meaning” of an item should be consistent whether we’re encoding it as input or predicting it as output.
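As a rough illustration of points 2 and 3, the sketch below builds the causal mask for a four-step sequence and ties a toy output layer to a toy embedding table. The sizes (5 items, 8 dimensions) are made up for the example and are not the values used in this post.

```python
import torch
import torch.nn as nn

# Causal mask for a 4-step sequence: 0.0 = may attend, -inf = blocked.
seq_len = 4
causal_mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
print(causal_mask)

# Weight tying: the output layer reuses the embedding matrix as its weight.
toy_items, toy_dim = 5, 8  # illustrative sizes only
embedding = nn.Embedding(toy_items + 1, toy_dim, padding_idx=0)
output_layer = nn.Linear(toy_dim, toy_items + 1)
output_layer.weight = embedding.weight

# Both modules now point at the same tensor, so there is one set of item vectors.
shares_storage = output_layer.weight.data_ptr() == embedding.weight.data_ptr()
print(shares_storage)
```

Row i of the mask is zero up to column i and -inf afterwards, so position i can see itself and everything before it, but nothing later.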

The compute_loss method implements standard cross-entropy loss with ignore_index=0, so positions whose label is the padding token are excluded from the loss and the model only learns from real next-item targets.
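To make the effect of ignore_index concrete, here is a minimal sketch on made-up logits and labels (a toy vocabulary of 6, with index 0 as padding). It checks that the loss with ignore_index=0 equals the loss computed on the real positions alone.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab = 6  # illustrative: 5 items plus the padding token at index 0
logits = torch.randn(4, vocab)        # 4 positions, already flattened
labels = torch.tensor([3, 1, 0, 0])   # the last two positions are padding

# ignore_index=0 drops the padded positions from the average.
loss_ignoring_pad = F.cross_entropy(logits, labels, ignore_index=0)

# Same value as computing the loss on the two real positions only.
loss_real_only = F.cross_entropy(logits[:2], labels[:2])
print(torch.allclose(loss_ignoring_pad, loss_real_only))
```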

One notable simplification: unlike the full GPTRec paper, which explores SVD-based embedding initialization and various training optimizations, I’m using standard randomly initialized embeddings here. For a small dataset like this, that should work fine.

import torch
import torch.nn as nn
import math

class GPTRecModel(nn.Module):
    """
    GPTRec: GPT-style autoregressive transformer for sequential recommendation.
    
    Uses causal masking so each position only attends to previous positions,
    enabling next-item prediction.
    """
    
    def __init__(
        self,
        n_items: int,
        d_model: int = 64,
        n_heads: int = 2,
        n_layers: int = 2,
        d_ff: int = 256,
        max_seq_len: int = 50,
        dropout: float = 0.1,
        pad_token: int = 0,
    ):
        super().__init__()
        self.pad_token = pad_token
        self.d_model = d_model
        
        # Item embedding (+1 for padding token at index 0)
        self.item_embedding = nn.Embedding(n_items + 1, d_model, padding_idx=pad_token)
        self.pos_embedding = nn.Embedding(max_seq_len, d_model)
        
        self.dropout = nn.Dropout(dropout)
        self.layer_norm = nn.LayerNorm(d_model)
        
        # Transformer encoder with causal masking
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=n_heads,
            dim_feedforward=d_ff,
            dropout=dropout,
            activation='gelu',
            batch_first=True,
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        
        # Output projection to item scores
        self.output_layer = nn.Linear(d_model, n_items + 1)
        self.output_layer.weight = self.item_embedding.weight  # tie weights

    def _generate_causal_mask(self, seq_len: int, device: torch.device) -> torch.Tensor:
        """Generate causal mask: positions can only attend to earlier positions."""
        mask = torch.triu(torch.ones(seq_len, seq_len, device=device), diagonal=1)
        mask = mask.masked_fill(mask == 1, float('-inf'))
        return mask
    
    def forward(
        self,
        input_ids: torch.Tensor,           # (batch, seq_len)
        attention_mask: torch.Tensor = None,  # (batch, seq_len) - 1 for real, 0 for pad
    ) -> torch.Tensor:
        batch_size, seq_len = input_ids.shape
        device = input_ids.device
        
        # Embeddings
        positions = torch.arange(seq_len, device=device).unsqueeze(0).expand(batch_size, -1)
        x = self.item_embedding(input_ids) + self.pos_embedding(positions)
        x = self.dropout(self.layer_norm(x))
        
        # Causal mask
        causal_mask = self._generate_causal_mask(seq_len, device)
        
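        # Padding mask (float: 0.0 = attend, -inf = ignore).
        # If no mask is passed, derive one from the padding token.
        if attention_mask is None:
            attention_mask = (input_ids != self.pad_token).long()
        src_key_padding_mask = torch.where(attention_mask == 1, 0.0, float('-inf'))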

        # Transformer
        x = self.transformer(
            x,
            mask=causal_mask,
            src_key_padding_mask=src_key_padding_mask,
        )
        
        # Project to item logits
        logits = self.output_layer(x)  # (batch, seq_len, n_items+1)
        return logits
    
    def compute_loss(self, batch, ignore_index=0):
        """Cross-entropy loss for next-item prediction, ignoring padding."""
        logits = self(batch['input_ids'], batch['attention_mask'])  # (B, seq_len, n_items+1)
        
        # Reshape for cross-entropy: (B*seq_len, n_items+1) vs (B*seq_len,)
        logits_flat = logits.view(-1, logits.size(-1))
        labels_flat = batch['labels'].view(-1)
        
        loss = nn.functional.cross_entropy(
            logits_flat, 
            labels_flat, 
            ignore_index=ignore_index  # Ignore padding positions
        )
        return loss
n_items = item_cardinalities['item_id']  # 3884 here (logits have n_items + 1 = 3885 columns)

model = GPTRecModel(
    n_items=n_items,
    d_model=64,
    n_heads=2,
    n_layers=2,
    d_ff=256,
    max_seq_len=max_len,
    dropout=0.2,
)

# Test forward pass
logits = model(val_batch['input_ids'], val_batch['attention_mask'])
print("Logits shape:", logits.shape)  # (batch, seq_len, n_items+1)
Logits shape: torch.Size([16, 200, 3885])
model.compute_loss(train_batch)
tensor(36.3798, grad_fn=<NllLossBackward0>)
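As a sanity check on that number: a model that predicts every item with equal probability would score ln(3885) ≈ 8.26, so an initial loss around 36 means the untrained logits are far from uniform. One plausible reason (my assumption, not something I've verified on the model) is that the tied, unit-variance embeddings produce logits with a large spread. The sketch below compares the two cases on random data, using an assumed logit standard deviation of 8.

```python
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab = 3885  # n_items + 1, from the logits shape above
labels = torch.randint(1, vocab, (1024,))

# All-zero logits give a uniform distribution, so the loss is ln(vocab).
uniform_loss = F.cross_entropy(torch.zeros(1024, vocab), labels).item()

# Random logits with standard deviation 8 (an assumed scale for illustration).
wide_loss = F.cross_entropy(8.0 * torch.randn(1024, vocab), labels).item()

print(f"uniform: {uniform_loss:.2f}, ln(vocab): {math.log(vocab):.2f}, wide: {wide_loss:.2f}")
```

If that is the cause, the loss should drop quickly in the first few training steps as the logits shrink towards a sensible scale.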
sum([p.numel() for p in model.parameters()])
365421
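The total can be reproduced by hand. The sketch below recomputes it from the configuration above, taking n_items = 3884 from the logits shape and max_len = 200 from the input shape. Because the output weight is tied to the item embedding, model.parameters() yields that tensor only once, so only the output bias adds to the count.

```python
# Values taken from the model instantiated above.
n_items, d_model, d_ff, n_layers, max_len = 3884, 64, 256, 2, 200

item_emb = (n_items + 1) * d_model   # shared with the output weight (tied)
pos_emb = max_len * d_model
input_norm = 2 * d_model             # LayerNorm weight + bias

# One encoder layer: attention in/out projections, feed-forward, two LayerNorms.
attn = (3 * d_model * d_model + 3 * d_model) + (d_model * d_model + d_model)
ffn = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
layer_norms = 2 * (2 * d_model)
per_layer = attn + ffn + layer_norms

output_bias = n_items + 1            # only the bias; the weight is tied

total = item_emb + pos_emb + input_norm + n_layers * per_layer + output_bias
print(total)  # 365421
```

Roughly two thirds of the parameters sit in the item embedding table, which is typical for small sequential recommenders.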

Train / Evaluate

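from typing import Dict, Iterable, List, Sequence

from torch.utils.data import DataLoader

from rec.retrieval.metrics import *  # metric helpers used below (mrr, recall_at_k, dcg_at_k, ...)
from rec.retrieval.metrics import _as_list  # private helper, imported explicitly

# Local rewrite of the library's aggregate_retrieval_metrics that adds DCG, for comparison with the GPTRec paper.
# I'll integrate this into the library in a future iteration.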
def aggregate_retrieval_metrics(
    topk_indices: torch.Tensor,
    relevant_indices: Sequence[torch.Tensor],
    ks: Iterable[int],
) -> Dict[str, float]:
    ks_list = _as_list(ks)
    if not ks_list or topk_indices.numel() == 0:
        return {}

    max_k = max(ks_list)
    if topk_indices.size(1) < max_k:
        raise ValueError("topk_indices must have at least max(k) columns")

    totals = {f"recall@{k}": 0.0 for k in ks_list}
    totals.update({f"precision@{k}": 0.0 for k in ks_list})
    totals.update({f"dcg@{k}": 0.0 for k in ks_list})
    totals.update({f"ndcg@{k}": 0.0 for k in ks_list})
    totals["mrr"] = 0.0

    num_users = topk_indices.size(0)
    for idx in range(num_users):
        topk = topk_indices[idx]
        rel = relevant_indices[idx]
        if rel.numel() == 0:
            continue
        hits = torch.isin(topk, rel)

        totals["mrr"] += mrr(hits)
        num_rel = int(rel.numel())
        for k in ks_list:
            totals[f"recall@{k}"] += recall_at_k(hits, num_rel, k)
            totals[f"precision@{k}"] += precision_at_k(hits, k)
            dcg = dcg_at_k(hits, k)
            ideal_dcg = idcg_at_k(num_rel, k)
            ndcg = dcg / ideal_dcg if ideal_dcg > 0 else 0.0
            totals[f"dcg@{k}"] += dcg
            totals[f"ndcg@{k}"] += ndcg

    if num_users == 0:
        return {}
    return {k: v / float(num_users) for k, v in totals.items()}

def evaluate_gptrec(
    model: GPTRecModel,
    val_dataloader: DataLoader,
    train_user_item_map: Dict[int, List[int]],
    n_items: int,
    ks: List[int] = [5, 10, 20],
    device: torch.device = None,
) -> Dict[str, float]:
    """
    Evaluate GPTRec model on retrieval metrics.
    
    For each user, we:
    1. Get model's logits from the last position (next-item prediction)
    2. Mask out items the user has already seen in training
    3. Get top-k predictions
    4. Compare against validation targets
    """
    if device is None:
        device = next(model.parameters()).device
    
    model.eval()
    
    topk_indices_list = []
    relevant_indices_list = []
    max_k = max(ks)
    
    with torch.no_grad():
        for batch in tqdm(val_dataloader):
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            user_ids = batch['user_id']
            target_items = batch['target_items']  # list of tensors
            
            # Forward pass
            logits = model(input_ids, attention_mask)  # (B, seq_len, n_items+1)
            
            # Get logits from last position for next-item prediction
            last_logits = logits[:, -1, :]  # (B, n_items+1)
            
            # For each user in batch
            for i in range(len(user_ids)):
                uid = user_ids[i].item()
                scores = last_logits[i].clone()  # (n_items+1,)
                
                # Mask out seen items (set to -inf)
                seen_items = train_user_item_map.get(uid, [])
                if seen_items:
                    seen_tensor = torch.tensor(seen_items, device=device)
                    scores[seen_tensor] = float('-inf')
                
                # Also mask out padding token (index 0)
                scores[0] = float('-inf')
                
                # Get top-k predictions
                topk = torch.topk(scores, min(max_k, n_items)).indices
                topk_indices_list.append(topk.cpu())
                
                # Target items for this user
                relevant_indices_list.append(target_items[i])
    
    if not topk_indices_list:
        return {}
    
    topk_tensor = torch.stack(topk_indices_list, dim=0)
    metrics = aggregate_retrieval_metrics(topk_tensor, relevant_indices_list, ks)
    return metrics
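One thing to watch in evaluate_gptrec: logits[:, -1, :] is the right slice when sequences are left-padded, because the final position then always holds the most recent item. The padding side isn't shown in this section, so as a hedged sketch, here is one way to pick each user's last real position from the attention mask that works for either padding side. The toy mask and sizes are made up for the example.

```python
import torch

# Toy batch: 1 = real item, 0 = padding.
# Row 0 is right-padded, row 1 is left-padded.
attention_mask = torch.tensor([[1, 1, 1, 0, 0],
                               [0, 0, 1, 1, 1]])
batch_size, seq_len = attention_mask.shape
vocab = 7  # illustrative
logits = torch.randn(batch_size, seq_len, vocab)

# Index of the last real position per row: the largest position where mask == 1.
positions = torch.arange(seq_len)
last_idx = (attention_mask * positions).argmax(dim=1)  # tensor([2, 4])

# Gather one row of logits per user at that position.
last_logits = logits[torch.arange(batch_size), last_idx]  # (batch, vocab)
print(last_idx, last_logits.shape)
```

With left padding this reduces to the existing logits[:, -1, :], so it would be a drop-in replacement rather than a change in behaviour for that case.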
from tqdm import tqdm

num_epochs = 5
batch_size = 32
lr = 5e-4

optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

for epoch in range(num_epochs):
    model.train()
    train_losses = []
    
    # Training loop
    pbar = tqdm(train_loader, desc=f"Epoch {epoch+1}/{num_epochs}")
    for batch in pbar:
        optimizer.zero_grad()
        loss = model.compute_loss(batch)
        loss.backward()
        optimizer.step()
        
        train_losses.append(loss.item())
        pbar.set_postfix(loss=f"{loss.item():.2f}")
    
    avg_train_loss = sum(train_losses) / len(train_losses)
    
    # Retrieval metrics (with progress bar)
    metrics = evaluate_gptrec(
        model, val_dl, train_user_item_map,
        n_items=n_items, ks=[5, 10, 20],
    )
    
    # Summary line
    print(f"\nEpoch {epoch+1}: train_loss={avg_train_loss:.4f} | R@10={metrics.get('recall@10', 0):.4f}",  
        f"| DCG@10={metrics.get('dcg@10', 0):.4f} | NDCG@10={metrics.get('ndcg@10', 0):.4f}")
Epoch 1/5:   0%|          | 0/189 [00:00<?, ?it/s]
Epoch 1/5:   1%|          | 1/189 [00:01<05:34,  1.78s/it, loss=36.64]
Epoch 1/5:  10%|█         | 19/189 [00:30<04:32,  1.60s/it, loss=27.85]
Epoch 1/5:  25%|██▍       | 47/189 [01:17<03:52,  1.64s/it, loss=21.65]
Epoch 1/5:  50%|████▉     | 94/189 [02:31<02:31,  1.60s/it, loss=18.26]
Epoch 1/5:  75%|███████▌  | 142/189 [03:50<01:18,  1.67s/it, loss=15.45]
Epoch 1/5:  95%|█████████▌| 180/189 [04:53<00:14,  1.63s/it, loss=13.94]
Epoch 1/5:  96%|█████████▌| 181/189 [04:54<00:13,  1.64s/it, 
loss=13.85]Epoch 1/5:  96%|█████████▌| 181/189 [04:56<00:13,  1.64s/it, loss=13.88]Epoch 1/5:  96%|█████████▋| 182/189 [04:56<00:11,  1.63s/it, loss=13.88]Epoch 1/5:  96%|█████████▋| 182/189 [04:58<00:11,  1.63s/it, loss=13.67]Epoch 1/5:  97%|█████████▋| 183/189 [04:58<00:09,  1.66s/it, loss=13.67]Epoch 1/5:  97%|█████████▋| 183/189 [04:59<00:09,  1.66s/it, loss=13.83]Epoch 1/5:  97%|█████████▋| 184/189 [04:59<00:08,  1.65s/it, loss=13.83]Epoch 1/5:  97%|█████████▋| 184/189 [05:01<00:08,  1.65s/it, loss=13.61]Epoch 1/5:  98%|█████████▊| 185/189 [05:01<00:06,  1.58s/it, loss=13.61]Epoch 1/5:  98%|█████████▊| 185/189 [05:02<00:06,  1.58s/it, loss=13.65]Epoch 1/5:  98%|█████████▊| 186/189 [05:02<00:04,  1.62s/it, loss=13.65]Epoch 1/5:  98%|█████████▊| 186/189 [05:04<00:04,  1.62s/it, loss=13.69]Epoch 1/5:  99%|█████████▉| 187/189 [05:04<00:03,  1.61s/it, loss=13.69]Epoch 1/5:  99%|█████████▉| 187/189 [05:05<00:03,  1.61s/it, loss=13.57]Epoch 1/5:  99%|█████████▉| 188/189 [05:05<00:01,  1.59s/it, loss=13.57]Epoch 1/5:  99%|█████████▉| 188/189 [05:07<00:01,  1.59s/it, loss=13.67]Epoch 1/5: 100%|██████████| 189/189 [05:07<00:00,  1.60s/it, loss=13.67]Epoch 1/5: 100%|██████████| 189/189 [05:07<00:00,  1.63s/it, loss=13.67]
100%|██████████| 23/23 [00:08<00:00,  2.71it/s]

Epoch 1: train_loss=19.5774 | R@10=0.0075 | DCG@10=0.0666 | NDCG@10=0.0156
Epoch 2/5: 100%|██████████| 189/189 [05:03<00:00,  1.60s/it, loss=9.46]
  0%|          | 0/23 [00:00<?, ?it/s]  4%|▍         | 1/23 [00:00<00:08,  2.53it/s]  9%|▊         | 2/23 [00:00<00:08,  2.54it/s] 13%|█▎        | 3/23 [00:01<00:07,  2.62it/s] 17%|█▋        | 4/23 [00:01<00:06,  2.95it/s] 22%|██▏       | 5/23 [00:01<00:06,  2.85it/s] 26%|██▌       | 6/23 [00:02<00:06,  2.82it/s] 30%|███       | 7/23 [00:02<00:05,  2.84it/s] 35%|███▍      | 8/23 [00:02<00:05,  2.98it/s] 39%|███▉      | 9/23 [00:03<00:04,  2.84it/s] 43%|████▎     | 10/23 [00:03<00:04,  2.79it/s] 48%|████▊     | 11/23 [00:03<00:04,  2.80it/s] 52%|█████▏    | 12/23 [00:04<00:03,  2.77it/s] 57%|█████▋    | 13/23 [00:04<00:03,  2.67it/s] 61%|██████    | 14/23 [00:04<00:03,  2.86it/s] 65%|██████▌   | 15/23 [00:05<00:02,  3.00it/s] 70%|██████▉   | 16/23 [00:05<00:02,  2.87it/s] 74%|███████▍  | 17/23 [00:06<00:02,  2.81it/s] 78%|███████▊  | 18/23 [00:06<00:01,  2.95it/s] 83%|████████▎ | 19/23 [00:06<00:01,  2.89it/s] 87%|████████▋ | 20/23 [00:07<00:01,  2.98it/s] 91%|█████████▏| 21/23 [00:07<00:00,  2.99it/s] 96%|█████████▌| 22/23 [00:07<00:00,  3.13it/s]100%|██████████| 23/23 [00:07<00:00,  3.31it/s]100%|██████████| 23/23 [00:07<00:00,  2.91it/s]

Epoch 2: train_loss=11.1752 | R@10=0.0129 | DCG@10=0.1115 | NDCG@10=0.0265
Epoch 3/5: 100%|██████████| 189/189 [05:05<00:00,  1.61s/it, loss=8.23]
100%|██████████| 23/23 [00:07<00:00,  2.88it/s]

Epoch 3: train_loss=8.8681 | R@10=0.0193 | DCG@10=0.2030 | NDCG@10=0.0485
Epoch 4/5: 100%|██████████| 189/189 [05:05<00:00,  1.61s/it, loss=7.76]
100%|██████████| 23/23 [00:07<00:00,  2.95it/s]

Epoch 4: train_loss=8.0537 | R@10=0.0265 | DCG@10=0.2720 | NDCG@10=0.0660
loss=7.72]Epoch 5/5:  46%|████▌     | 86/189 [02:17<02:44,  1.60s/it, loss=7.72]Epoch 5/5:  46%|████▌     | 86/189 [02:19<02:44,  1.60s/it, loss=7.68]Epoch 5/5:  46%|████▌     | 87/189 [02:19<02:41,  1.58s/it, loss=7.68]Epoch 5/5:  46%|████▌     | 87/189 [02:20<02:41,  1.58s/it, loss=7.72]Epoch 5/5:  47%|████▋     | 88/189 [02:20<02:41,  1.60s/it, loss=7.72]Epoch 5/5:  47%|████▋     | 88/189 [02:22<02:41,  1.60s/it, loss=7.60]Epoch 5/5:  47%|████▋     | 89/189 [02:22<02:40,  1.60s/it, loss=7.60]Epoch 5/5:  47%|████▋     | 89/189 [02:24<02:40,  1.60s/it, loss=7.75]Epoch 5/5:  48%|████▊     | 90/189 [02:24<02:42,  1.64s/it, loss=7.75]Epoch 5/5:  48%|████▊     | 90/189 [02:25<02:42,  1.64s/it, loss=7.84]Epoch 5/5:  48%|████▊     | 91/189 [02:25<02:39,  1.63s/it, loss=7.84]Epoch 5/5:  48%|████▊     | 91/189 [02:27<02:39,  1.63s/it, loss=7.83]Epoch 5/5:  49%|████▊     | 92/189 [02:27<02:28,  1.53s/it, loss=7.83]Epoch 5/5:  49%|████▊     | 92/189 [02:28<02:28,  1.53s/it, loss=7.78]Epoch 5/5:  49%|████▉     | 93/189 [02:28<02:30,  1.57s/it, loss=7.78]Epoch 5/5:  49%|████▉     | 93/189 [02:30<02:30,  1.57s/it, loss=7.77]Epoch 5/5:  50%|████▉     | 94/189 [02:30<02:33,  1.61s/it, loss=7.77]Epoch 5/5:  50%|████▉     | 94/189 [02:32<02:33,  1.61s/it, loss=7.85]Epoch 5/5:  50%|█████     | 95/189 [02:32<02:31,  1.62s/it, loss=7.85]Epoch 5/5:  50%|█████     | 95/189 [02:33<02:31,  1.62s/it, loss=7.67]Epoch 5/5:  51%|█████     | 96/189 [02:33<02:26,  1.58s/it, loss=7.67]Epoch 5/5:  51%|█████     | 96/189 [02:35<02:26,  1.58s/it, loss=7.77]Epoch 5/5:  51%|█████▏    | 97/189 [02:35<02:21,  1.54s/it, loss=7.77]Epoch 5/5:  51%|█████▏    | 97/189 [02:36<02:21,  1.54s/it, loss=7.68]Epoch 5/5:  52%|█████▏    | 98/189 [02:36<02:23,  1.58s/it, loss=7.68]Epoch 5/5:  52%|█████▏    | 98/189 [02:38<02:23,  1.58s/it, loss=7.81]Epoch 5/5:  52%|█████▏    | 99/189 [02:38<02:25,  1.61s/it, loss=7.81]Epoch 5/5:  52%|█████▏    | 99/189 [02:40<02:25,  1.61s/it, loss=7.78]Epoch 5/5:  53%|█████▎    | 
100/189 [02:40<02:25,  1.63s/it, loss=7.78]Epoch 5/5:  53%|█████▎    | 100/189 [02:41<02:25,  1.63s/it, loss=7.66]Epoch 5/5:  53%|█████▎    | 101/189 [02:41<02:24,  1.64s/it, loss=7.66]Epoch 5/5:  53%|█████▎    | 101/189 [02:43<02:24,  1.64s/it, loss=7.66]Epoch 5/5:  54%|█████▍    | 102/189 [02:43<02:21,  1.62s/it, loss=7.66]Epoch 5/5:  54%|█████▍    | 102/189 [02:45<02:21,  1.62s/it, loss=7.67]Epoch 5/5:  54%|█████▍    | 103/189 [02:45<02:22,  1.66s/it, loss=7.67]Epoch 5/5:  54%|█████▍    | 103/189 [02:46<02:22,  1.66s/it, loss=7.71]Epoch 5/5:  55%|█████▌    | 104/189 [02:46<02:18,  1.62s/it, loss=7.71]Epoch 5/5:  55%|█████▌    | 104/189 [02:48<02:18,  1.62s/it, loss=7.64]Epoch 5/5:  56%|█████▌    | 105/189 [02:48<02:18,  1.65s/it, loss=7.64]Epoch 5/5:  56%|█████▌    | 105/189 [02:50<02:18,  1.65s/it, loss=7.69]Epoch 5/5:  56%|█████▌    | 106/189 [02:50<02:16,  1.64s/it, loss=7.69]Epoch 5/5:  56%|█████▌    | 106/189 [02:51<02:16,  1.64s/it, loss=7.73]Epoch 5/5:  57%|█████▋    | 107/189 [02:51<02:13,  1.63s/it, loss=7.73]Epoch 5/5:  57%|█████▋    | 107/189 [02:53<02:13,  1.63s/it, loss=7.70]Epoch 5/5:  57%|█████▋    | 108/189 [02:53<02:08,  1.58s/it, loss=7.70]Epoch 5/5:  57%|█████▋    | 108/189 [02:54<02:08,  1.58s/it, loss=7.76]Epoch 5/5:  58%|█████▊    | 109/189 [02:54<02:04,  1.55s/it, loss=7.76]Epoch 5/5:  58%|█████▊    | 109/189 [02:56<02:04,  1.55s/it, loss=7.62]Epoch 5/5:  58%|█████▊    | 110/189 [02:56<02:02,  1.56s/it, loss=7.62]Epoch 5/5:  58%|█████▊    | 110/189 [02:57<02:02,  1.56s/it, loss=7.79]Epoch 5/5:  59%|█████▊    | 111/189 [02:57<02:00,  1.54s/it, loss=7.79]Epoch 5/5:  59%|█████▊    | 111/189 [02:59<02:00,  1.54s/it, loss=7.76]Epoch 5/5:  59%|█████▉    | 112/189 [02:59<01:58,  1.54s/it, loss=7.76]Epoch 5/5:  59%|█████▉    | 112/189 [03:00<01:58,  1.54s/it, loss=7.63]Epoch 5/5:  60%|█████▉    | 113/189 [03:00<01:58,  1.56s/it, loss=7.63]Epoch 5/5:  60%|█████▉    | 113/189 [03:02<01:58,  1.56s/it, loss=7.92]Epoch 5/5:  60%|██████    | 114/189 
[03:02<01:54,  1.52s/it, loss=7.92]Epoch 5/5:  60%|██████    | 114/189 [03:03<01:54,  1.52s/it, loss=7.59]Epoch 5/5:  61%|██████    | 115/189 [03:03<01:52,  1.52s/it, loss=7.59]Epoch 5/5:  61%|██████    | 115/189 [03:05<01:52,  1.52s/it, loss=7.76]Epoch 5/5:  61%|██████▏   | 116/189 [03:05<01:52,  1.54s/it, loss=7.76]Epoch 5/5:  61%|██████▏   | 116/189 [03:06<01:52,  1.54s/it, loss=7.61]Epoch 5/5:  62%|██████▏   | 117/189 [03:06<01:50,  1.53s/it, loss=7.61]Epoch 5/5:  62%|██████▏   | 117/189 [03:08<01:50,  1.53s/it, loss=7.73]Epoch 5/5:  62%|██████▏   | 118/189 [03:08<01:46,  1.50s/it, loss=7.73]Epoch 5/5:  62%|██████▏   | 118/189 [03:09<01:46,  1.50s/it, loss=7.88]Epoch 5/5:  63%|██████▎   | 119/189 [03:09<01:47,  1.54s/it, loss=7.88]Epoch 5/5:  63%|██████▎   | 119/189 [03:11<01:47,  1.54s/it, loss=7.71]Epoch 5/5:  63%|██████▎   | 120/189 [03:11<01:43,  1.50s/it, loss=7.71]Epoch 5/5:  63%|██████▎   | 120/189 [03:12<01:43,  1.50s/it, loss=7.70]Epoch 5/5:  64%|██████▍   | 121/189 [03:12<01:44,  1.54s/it, loss=7.70]Epoch 5/5:  64%|██████▍   | 121/189 [03:14<01:44,  1.54s/it, loss=7.70]Epoch 5/5:  65%|██████▍   | 122/189 [03:14<01:43,  1.55s/it, loss=7.70]Epoch 5/5:  65%|██████▍   | 122/189 [03:16<01:43,  1.55s/it, loss=7.76]Epoch 5/5:  65%|██████▌   | 123/189 [03:16<01:45,  1.60s/it, loss=7.76]Epoch 5/5:  65%|██████▌   | 123/189 [03:17<01:45,  1.60s/it, loss=7.71]Epoch 5/5:  66%|██████▌   | 124/189 [03:17<01:45,  1.62s/it, loss=7.71]Epoch 5/5:  66%|██████▌   | 124/189 [03:19<01:45,  1.62s/it, loss=7.87]Epoch 5/5:  66%|██████▌   | 125/189 [03:19<01:41,  1.59s/it, loss=7.87]Epoch 5/5:  66%|██████▌   | 125/189 [03:21<01:41,  1.59s/it, loss=7.70]Epoch 5/5:  67%|██████▋   | 126/189 [03:21<01:42,  1.62s/it, loss=7.70]Epoch 5/5:  67%|██████▋   | 126/189 [03:22<01:42,  1.62s/it, loss=7.93]Epoch 5/5:  67%|██████▋   | 127/189 [03:22<01:41,  1.63s/it, loss=7.93]Epoch 5/5:  67%|██████▋   | 127/189 [03:24<01:41,  1.63s/it, loss=7.88]Epoch 5/5:  68%|██████▊   | 128/189 
[03:24<01:40,  1.65s/it, loss=7.88]Epoch 5/5:  68%|██████▊   | 128/189 [03:26<01:40,  1.65s/it, loss=7.75]Epoch 5/5:  68%|██████▊   | 129/189 [03:26<01:39,  1.65s/it, loss=7.75]Epoch 5/5:  68%|██████▊   | 129/189 [03:27<01:39,  1.65s/it, loss=7.55]Epoch 5/5:  69%|██████▉   | 130/189 [03:27<01:36,  1.63s/it, loss=7.55]Epoch 5/5:  69%|██████▉   | 130/189 [03:29<01:36,  1.63s/it, loss=7.70]Epoch 5/5:  69%|██████▉   | 131/189 [03:29<01:34,  1.62s/it, loss=7.70]Epoch 5/5:  69%|██████▉   | 131/189 [03:30<01:34,  1.62s/it, loss=7.82]Epoch 5/5:  70%|██████▉   | 132/189 [03:30<01:31,  1.60s/it, loss=7.82]Epoch 5/5:  70%|██████▉   | 132/189 [03:32<01:31,  1.60s/it, loss=7.72]Epoch 5/5:  70%|███████   | 133/189 [03:32<01:27,  1.56s/it, loss=7.72]Epoch 5/5:  70%|███████   | 133/189 [03:34<01:27,  1.56s/it, loss=7.75]Epoch 5/5:  71%|███████   | 134/189 [03:34<01:29,  1.62s/it, loss=7.75]Epoch 5/5:  71%|███████   | 134/189 [03:35<01:29,  1.62s/it, loss=7.70]Epoch 5/5:  71%|███████▏  | 135/189 [03:35<01:29,  1.65s/it, loss=7.70]Epoch 5/5:  71%|███████▏  | 135/189 [03:37<01:29,  1.65s/it, loss=7.73]Epoch 5/5:  72%|███████▏  | 136/189 [03:37<01:26,  1.63s/it, loss=7.73]Epoch 5/5:  72%|███████▏  | 136/189 [03:39<01:26,  1.63s/it, loss=7.61]Epoch 5/5:  72%|███████▏  | 137/189 [03:39<01:25,  1.64s/it, loss=7.61]Epoch 5/5:  72%|███████▏  | 137/189 [03:40<01:25,  1.64s/it, loss=7.61]Epoch 5/5:  73%|███████▎  | 138/189 [03:40<01:23,  1.64s/it, loss=7.61]Epoch 5/5:  73%|███████▎  | 138/189 [03:42<01:23,  1.64s/it, loss=7.74]Epoch 5/5:  74%|███████▎  | 139/189 [03:42<01:20,  1.61s/it, loss=7.74]Epoch 5/5:  74%|███████▎  | 139/189 [03:43<01:20,  1.61s/it, loss=7.77]Epoch 5/5:  74%|███████▍  | 140/189 [03:43<01:18,  1.60s/it, loss=7.77]Epoch 5/5:  74%|███████▍  | 140/189 [03:45<01:18,  1.60s/it, loss=7.84]Epoch 5/5:  75%|███████▍  | 141/189 [03:45<01:18,  1.64s/it, loss=7.84]Epoch 5/5:  75%|███████▍  | 141/189 [03:47<01:18,  1.64s/it, loss=7.75]Epoch 5/5:  75%|███████▌  | 142/189 
[03:47<01:16,  1.63s/it, loss=7.75]Epoch 5/5:  75%|███████▌  | 142/189 [03:48<01:16,  1.63s/it, loss=7.62]Epoch 5/5:  76%|███████▌  | 143/189 [03:48<01:14,  1.61s/it, loss=7.62]Epoch 5/5:  76%|███████▌  | 143/189 [03:50<01:14,  1.61s/it, loss=7.75]Epoch 5/5:  76%|███████▌  | 144/189 [03:50<01:11,  1.59s/it, loss=7.75]Epoch 5/5:  76%|███████▌  | 144/189 [03:51<01:11,  1.59s/it, loss=7.64]Epoch 5/5:  77%|███████▋  | 145/189 [03:51<01:10,  1.60s/it, loss=7.64]Epoch 5/5:  77%|███████▋  | 145/189 [03:53<01:10,  1.60s/it, loss=7.44]Epoch 5/5:  77%|███████▋  | 146/189 [03:53<01:08,  1.59s/it, loss=7.44]Epoch 5/5:  77%|███████▋  | 146/189 [03:55<01:08,  1.59s/it, loss=7.66]Epoch 5/5:  78%|███████▊  | 147/189 [03:55<01:06,  1.59s/it, loss=7.66]Epoch 5/5:  78%|███████▊  | 147/189 [03:56<01:06,  1.59s/it, loss=7.79]Epoch 5/5:  78%|███████▊  | 148/189 [03:56<01:06,  1.63s/it, loss=7.79]Epoch 5/5:  78%|███████▊  | 148/189 [03:58<01:06,  1.63s/it, loss=7.79]Epoch 5/5:  79%|███████▉  | 149/189 [03:58<01:05,  1.65s/it, loss=7.79]Epoch 5/5:  79%|███████▉  | 149/189 [04:00<01:05,  1.65s/it, loss=7.73]Epoch 5/5:  79%|███████▉  | 150/189 [04:00<01:04,  1.66s/it, loss=7.73]Epoch 5/5:  79%|███████▉  | 150/189 [04:01<01:04,  1.66s/it, loss=7.68]Epoch 5/5:  80%|███████▉  | 151/189 [04:01<01:03,  1.66s/it, loss=7.68]Epoch 5/5:  80%|███████▉  | 151/189 [04:03<01:03,  1.66s/it, loss=7.79]Epoch 5/5:  80%|████████  | 152/189 [04:03<01:01,  1.67s/it, loss=7.79]Epoch 5/5:  80%|████████  | 152/189 [04:05<01:01,  1.67s/it, loss=7.63]Epoch 5/5:  81%|████████  | 153/189 [04:05<01:00,  1.68s/it, loss=7.63]Epoch 5/5:  81%|████████  | 153/189 [04:06<01:00,  1.68s/it, loss=7.74]Epoch 5/5:  81%|████████▏ | 154/189 [04:06<00:58,  1.67s/it, loss=7.74]Epoch 5/5:  81%|████████▏ | 154/189 [04:08<00:58,  1.67s/it, loss=7.77]Epoch 5/5:  82%|████████▏ | 155/189 [04:08<00:57,  1.70s/it, loss=7.77]Epoch 5/5:  82%|████████▏ | 155/189 [04:10<00:57,  1.70s/it, loss=7.56]Epoch 5/5:  83%|████████▎ | 156/189 
[04:10<00:55,  1.68s/it, loss=7.56]Epoch 5/5:  83%|████████▎ | 156/189 [04:11<00:55,  1.68s/it, loss=7.86]Epoch 5/5:  83%|████████▎ | 157/189 [04:11<00:51,  1.62s/it, loss=7.86]Epoch 5/5:  83%|████████▎ | 157/189 [04:13<00:51,  1.62s/it, loss=7.72]Epoch 5/5:  84%|████████▎ | 158/189 [04:13<00:50,  1.63s/it, loss=7.72]Epoch 5/5:  84%|████████▎ | 158/189 [04:14<00:50,  1.63s/it, loss=7.66]Epoch 5/5:  84%|████████▍ | 159/189 [04:14<00:48,  1.62s/it, loss=7.66]Epoch 5/5:  84%|████████▍ | 159/189 [04:16<00:48,  1.62s/it, loss=7.76]Epoch 5/5:  85%|████████▍ | 160/189 [04:16<00:45,  1.58s/it, loss=7.76]Epoch 5/5:  85%|████████▍ | 160/189 [04:18<00:45,  1.58s/it, loss=7.86]Epoch 5/5:  85%|████████▌ | 161/189 [04:18<00:44,  1.61s/it, loss=7.86]Epoch 5/5:  85%|████████▌ | 161/189 [04:19<00:44,  1.61s/it, loss=7.72]Epoch 5/5:  86%|████████▌ | 162/189 [04:19<00:41,  1.55s/it, loss=7.72]Epoch 5/5:  86%|████████▌ | 162/189 [04:21<00:41,  1.55s/it, loss=7.62]Epoch 5/5:  86%|████████▌ | 163/189 [04:21<00:40,  1.55s/it, loss=7.62]Epoch 5/5:  86%|████████▌ | 163/189 [04:22<00:40,  1.55s/it, loss=7.76]Epoch 5/5:  87%|████████▋ | 164/189 [04:22<00:39,  1.57s/it, loss=7.76]Epoch 5/5:  87%|████████▋ | 164/189 [04:24<00:39,  1.57s/it, loss=7.89]Epoch 5/5:  87%|████████▋ | 165/189 [04:24<00:37,  1.55s/it, loss=7.89]Epoch 5/5:  87%|████████▋ | 165/189 [04:25<00:37,  1.55s/it, loss=7.70]Epoch 5/5:  88%|████████▊ | 166/189 [04:25<00:36,  1.57s/it, loss=7.70]Epoch 5/5:  88%|████████▊ | 166/189 [04:27<00:36,  1.57s/it, loss=7.61]Epoch 5/5:  88%|████████▊ | 167/189 [04:27<00:34,  1.56s/it, loss=7.61]Epoch 5/5:  88%|████████▊ | 167/189 [04:29<00:34,  1.56s/it, loss=7.61]Epoch 5/5:  89%|████████▉ | 168/189 [04:29<00:33,  1.60s/it, loss=7.61]Epoch 5/5:  89%|████████▉ | 168/189 [04:30<00:33,  1.60s/it, loss=7.61]Epoch 5/5:  89%|████████▉ | 169/189 [04:30<00:32,  1.63s/it, loss=7.61]Epoch 5/5:  89%|████████▉ | 169/189 [04:32<00:32,  1.63s/it, loss=7.71]Epoch 5/5:  90%|████████▉ | 170/189 
[04:32<00:30,  1.63s/it, loss=7.71]Epoch 5/5:  90%|████████▉ | 170/189 [04:34<00:30,  1.63s/it, loss=7.57]Epoch 5/5:  90%|█████████ | 171/189 [04:34<00:29,  1.66s/it, loss=7.57]Epoch 5/5:  90%|█████████ | 171/189 [04:35<00:29,  1.66s/it, loss=7.71]Epoch 5/5:  91%|█████████ | 172/189 [04:35<00:27,  1.64s/it, loss=7.71]Epoch 5/5:  91%|█████████ | 172/189 [04:37<00:27,  1.64s/it, loss=7.65]Epoch 5/5:  92%|█████████▏| 173/189 [04:37<00:26,  1.65s/it, loss=7.65]Epoch 5/5:  92%|█████████▏| 173/189 [04:39<00:26,  1.65s/it, loss=7.61]Epoch 5/5:  92%|█████████▏| 174/189 [04:39<00:24,  1.64s/it, loss=7.61]Epoch 5/5:  92%|█████████▏| 174/189 [04:40<00:24,  1.64s/it, loss=7.82]Epoch 5/5:  93%|█████████▎| 175/189 [04:40<00:22,  1.57s/it, loss=7.82]Epoch 5/5:  93%|█████████▎| 175/189 [04:41<00:22,  1.57s/it, loss=7.57]Epoch 5/5:  93%|█████████▎| 176/189 [04:41<00:20,  1.56s/it, loss=7.57]Epoch 5/5:  93%|█████████▎| 176/189 [04:43<00:20,  1.56s/it, loss=7.75]Epoch 5/5:  94%|█████████▎| 177/189 [04:43<00:19,  1.59s/it, loss=7.75]Epoch 5/5:  94%|█████████▎| 177/189 [04:45<00:19,  1.59s/it, loss=7.71]Epoch 5/5:  94%|█████████▍| 178/189 [04:45<00:17,  1.60s/it, loss=7.71]Epoch 5/5:  94%|█████████▍| 178/189 [04:47<00:17,  1.60s/it, loss=7.75]Epoch 5/5:  95%|█████████▍| 179/189 [04:47<00:16,  1.65s/it, loss=7.75]Epoch 5/5:  95%|█████████▍| 179/189 [04:48<00:16,  1.65s/it, loss=7.66]Epoch 5/5:  95%|█████████▌| 180/189 [04:48<00:14,  1.66s/it, loss=7.66]Epoch 5/5:  95%|█████████▌| 180/189 [04:50<00:14,  1.66s/it, loss=7.64]Epoch 5/5:  96%|█████████▌| 181/189 [04:50<00:12,  1.59s/it, loss=7.64]Epoch 5/5:  96%|█████████▌| 181/189 [04:51<00:12,  1.59s/it, loss=7.60]Epoch 5/5:  96%|█████████▋| 182/189 [04:51<00:11,  1.60s/it, loss=7.60]Epoch 5/5:  96%|█████████▋| 182/189 [04:53<00:11,  1.60s/it, loss=7.72]Epoch 5/5:  97%|█████████▋| 183/189 [04:53<00:09,  1.58s/it, loss=7.72]Epoch 5/5:  97%|█████████▋| 183/189 [04:54<00:09,  1.58s/it, loss=7.76]Epoch 5/5:  97%|█████████▋| 184/189 
[04:54<00:07,  1.56s/it, loss=7.76]Epoch 5/5:  97%|█████████▋| 184/189 [04:56<00:07,  1.56s/it, loss=7.67]Epoch 5/5:  98%|█████████▊| 185/189 [04:56<00:06,  1.60s/it, loss=7.67]Epoch 5/5:  98%|█████████▊| 185/189 [04:58<00:06,  1.60s/it, loss=7.57]Epoch 5/5:  98%|█████████▊| 186/189 [04:58<00:04,  1.61s/it, loss=7.57]Epoch 5/5:  98%|█████████▊| 186/189 [04:59<00:04,  1.61s/it, loss=7.77]Epoch 5/5:  99%|█████████▉| 187/189 [04:59<00:03,  1.62s/it, loss=7.77]Epoch 5/5:  99%|█████████▉| 187/189 [05:01<00:03,  1.62s/it, loss=7.66]Epoch 5/5:  99%|█████████▉| 188/189 [05:01<00:01,  1.58s/it, loss=7.66]Epoch 5/5:  99%|█████████▉| 188/189 [05:02<00:01,  1.58s/it, loss=7.65]Epoch 5/5: 100%|██████████| 189/189 [05:02<00:00,  1.53s/it, loss=7.65]Epoch 5/5: 100%|██████████| 189/189 [05:02<00:00,  1.60s/it, loss=7.65]
  0%|          | 0/23 [00:00<?, ?it/s]  4%|▍         | 1/23 [00:00<00:06,  3.64it/s]  9%|▊         | 2/23 [00:00<00:06,  3.30it/s] 13%|█▎        | 3/23 [00:00<00:06,  3.02it/s] 17%|█▋        | 4/23 [00:01<00:06,  3.02it/s] 22%|██▏       | 5/23 [00:01<00:05,  3.16it/s] 26%|██▌       | 6/23 [00:01<00:05,  2.89it/s] 30%|███       | 7/23 [00:02<00:05,  3.01it/s] 35%|███▍      | 8/23 [00:02<00:05,  2.90it/s] 39%|███▉      | 9/23 [00:03<00:05,  2.70it/s] 43%|████▎     | 10/23 [00:03<00:04,  2.83it/s] 48%|████▊     | 11/23 [00:03<00:04,  2.82it/s] 52%|█████▏    | 12/23 [00:04<00:03,  2.88it/s] 57%|█████▋    | 13/23 [00:04<00:03,  3.19it/s] 61%|██████    | 14/23 [00:04<00:02,  3.18it/s] 65%|██████▌   | 15/23 [00:05<00:02,  2.98it/s] 70%|██████▉   | 16/23 [00:05<00:02,  3.16it/s] 74%|███████▍  | 17/23 [00:05<00:01,  3.10it/s] 78%|███████▊  | 18/23 [00:05<00:01,  3.07it/s] 83%|████████▎ | 19/23 [00:06<00:01,  3.23it/s] 87%|████████▋ | 20/23 [00:06<00:00,  3.15it/s] 91%|█████████▏| 21/23 [00:06<00:00,  3.09it/s] 96%|█████████▌| 22/23 [00:07<00:00,  3.16it/s]100%|██████████| 23/23 [00:07<00:00,  3.06it/s]100%|██████████| 23/23 [00:07<00:00,  3.04it/s]

Epoch 5: train_loss=7.7534 | R@10=0.0242 | DCG@10=0.2592 | NDCG@10=0.0612
metrics
{'recall@5': 0.01575394982720289,
 'recall@10': 0.0241689173872425,
 'recall@20': 0.04311617966492117,
 'precision@5': 0.05770308123249291,
 'precision@10': 0.05546218487394955,
 'precision@20': 0.05042016806722694,
 'dcg@5': 0.17425028015585506,
 'dcg@10': 0.2591816378073866,
 'dcg@20': 0.37284562258827253,
 'ndcg@5': 0.06126104414152299,
 'ndcg@10': 0.061165418642456525,
 'ndcg@20': 0.06258607128889172,
 'mrr': 0.13528584705055305}

The model shows promising results: metrics improve across epochs and are beginning to converge, ending at recall@10 ≈ 0.024 and NDCG@10 ≈ 0.061 after five epochs. I’d like to scale up with larger embedding dimensions and more training epochs, but I’m limited by compute in this environment.

One challenge I’ve encountered is the difficulty of fairly evaluating recommendation models. There’s significant variation in how metrics are calculated across papers and implementations, making it hard to compare results directly. I plan to dig deeper into this topic in a future post.
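
To make that variation concrete, here is a minimal sketch of how two common recall@k conventions diverge for the same ranking. This is not the rec framework's metric code; the function names, the `cap_denominator` flag, and the toy user are all assumptions made up for illustration. NDCG is included with binary relevance and a log2 discount, which is one common definition among several.

```python
import math


def recall_at_k(ranked, relevant, k, cap_denominator=False):
    """Share of relevant items that appear in the top-k.

    cap_denominator=False divides by |relevant|.
    cap_denominator=True divides by min(k, |relevant|), so a perfect
    top-k list always scores 1.0. Both conventions appear in practice.
    """
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    denom = min(k, len(relevant)) if cap_denominator else len(relevant)
    return hits / denom


def dcg_at_k(ranked, relevant, k):
    """Binary-relevance DCG with a log2 position discount."""
    return sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked[:k])
        if item in relevant
    )


def ndcg_at_k(ranked, relevant, k):
    """DCG divided by the best achievable DCG for this user at cutoff k."""
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(k, len(relevant))))
    return dcg_at_k(ranked, relevant, k) / ideal if ideal > 0 else 0.0


# Hypothetical user: four held-out items, two of which land in the top 3.
ranked = ["a", "x", "b", "y", "z"]
relevant = {"a", "b", "c", "d"}

print(recall_at_k(ranked, relevant, k=3))                        # 2 hits / 4 relevant
print(recall_at_k(ranked, relevant, k=3, cap_denominator=True))  # 2 hits / min(3, 4)
print(ndcg_at_k(ranked, relevant, k=3))
```

The same ranking scores 0.5 under one recall convention and roughly 0.67 under the other, before even considering differences in candidate sampling or train/test splitting, which is why numbers from different papers are hard to line up.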

Cleanup

Remove the cloned rec repo and the generated data so that the notebook runs end to end on restart and unnecessary files stay out of the blog.

!rm -rf rec
!rm -rf ../../assets/movielens_rec_data

Conclusion

This post presents a GPTRec implementation using my rec framework. The key contributions are:

  • A minimal, reproducible PyTorch implementation of GPTRec, contrasting with the original version which is more general but implemented in TensorFlow
  • A successful demonstration of the rec framework’s capabilities

The model shows reasonable performance, which suggests the architecture is implemented correctly.

Looking ahead, I’d like to integrate sequential models into the rec framework. The framework currently supports a Retrieval → Ranking pipeline via the train_all script. I’m considering two approaches:

  1. Three-stage pipeline (Retrieval → Sequential → Ranking): The ranking model would either become a hybrid combining traditional ranking with sequential signals, or incorporate the sequential model’s logits into the ranking embeddings.

  2. Sequential ranking: Replace the ranking stage with a sequential model that also leverages user/item features. This aligns with industry trends—see Meta’s recent work on sequence learning for personalized recommendations.

Finally, a note on tooling: I found solveit’s compute limitations frustrating for this implementation-heavy post, requiring many restarts. For future implementation work, I’ll likely develop locally and reserve solveit for paper reviews and lighter research tasks.