!git clone https://github.com/AndrewBoney/rec.git && cd rec && git checkout 7a6475f
fatal: destination path 'rec' already exists and is not an empty directory.
Andrew Boney
February 11, 2026
Following up from my previous post on the GPTRec paper, I want to implement a GPTRec-style system.
The original author has existing implementations across a couple of repos: gptrec_rl and bert4rec_repro. However, these are broader in scope than I need, and are implemented in TensorFlow (boo!!!).
Instead, I’ll be implementing this using a framework I’m building: rec. This is a work in progress aiming to provide an end-to-end implementation of the kinds of recommendation systems used in industry—encompassing the whole lifecycle from data prep to deployment. While there’s still plenty to add, the baseline should now be robust enough to work at scale. I’ll likely use this framework in future posts too.
In this implementation, I’m going to focus on the GPTRec architecture - ignoring the sub-item tokenisation and Next-K prediction aspects of the paper.
First, we’ll use the rec framework to prepare the data.
I’ll use the MovieLens 1M dataset, a classic benchmark in recommendation systems research containing roughly 1 million ratings from about 6,000 users on about 4,000 movies. The rec framework includes a data preparation module that downloads and processes this dataset into a consistent format for training.
paths = DataPaths(
    users_path=os.path.join(root_folder, "prepared", "users.parquet"),
    items_path=os.path.join(root_folder, "prepared", "items.parquet"),
    interactions_train_path=os.path.join(root_folder, "prepared", "interactions_train.parquet"),
    interactions_val_path=os.path.join(root_folder, "prepared", "interactions_val.parquet"),
)

   user_id   age gender age_group occupation    zip zip_prefix
0        1   1.0      F   age_1.0     occ_10  48067        480
1        2  56.0      M  age_56.0     occ_16  70072        700
2        3  25.0      M  age_25.0     occ_15  55117        551
3        4  45.0      M  age_45.0      occ_7  02460        024
4        5  25.0      M  age_25.0     occ_20  55455        554

   item_id                        genres                 genre_grouped  \
0        1   Animation|Children's|Comedy                         other
1        2  Adventure|Children's|Fantasy  Adventure|Children's|Fantasy
2        3                Comedy|Romance                Comedy|Romance
3        4                  Comedy|Drama                  Comedy|Drama
4        5                        Comedy                        Comedy

                                title                    title_raw  year  \
0                    Toy Story (1995)                    Toy Story  1995
1                      Jumanji (1995)                      Jumanji  1995
2             Grumpier Old Men (1995)             Grumpier Old Men  1995
3            Waiting to Exhale (1995)            Waiting to Exhale  1995
4  Father of the Bride Part II (1995)  Father of the Bride Part II  1995

  year_bucket
0  year_1990s
1  year_1990s
2  year_1990s
3  year_1990s
4  year_1990s

   user_id  item_id  rating   timestamp
0        1     1193       5  2000-12-31
1        1      661       3  2000-12-31
2        1      914       3  2000-12-31
3        1     3408       4  2000-12-31
4        1     2355       5  2001-01-06
The feature configuration defines the columns used in the dataset and the feature types they should be converted into.
Before training, we need to convert raw IDs (like u_000001 or i_000042) into integer indices that can be used for embedding lookups—much like tokenization in NLP. The build_encoders function creates a CategoryEncoder for each categorical column, mapping each unique value to an integer index while reserving 0 for unknown values.
We also need to define the feature cardinalities, i.e. the number of unique values for each categorical feature. This is used to determine the size of the embeddings.
In this case we only need user_id and item_id. Note that the cardinality is the number of unique values, plus one for the unknown value.
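To make the encoding step concrete, here is a minimal sketch of the idea in plain Python. SimpleCategoryEncoder is a hypothetical stand-in that I made up for illustration, not the rec framework's CategoryEncoder, whose actual interface may differ. It only shows index 0 being reserved for unknown values, and why the cardinality ends up as the number of unique values plus one.

```python
from typing import Dict, Iterable, List


class SimpleCategoryEncoder:
    """Hypothetical encoder: each distinct value gets an index >= 1; 0 means unknown."""

    UNKNOWN_INDEX = 0

    def __init__(self) -> None:
        self.mapping: Dict[str, int] = {}

    def fit(self, values: Iterable[str]) -> "SimpleCategoryEncoder":
        for value in values:
            if value not in self.mapping:
                # Indices start at 1 because 0 is reserved for unknown values
                self.mapping[value] = len(self.mapping) + 1
        return self

    def transform(self, values: Iterable[str]) -> List[int]:
        # Values never seen during fit fall back to the unknown index
        return [self.mapping.get(v, self.UNKNOWN_INDEX) for v in values]

    @property
    def cardinality(self) -> int:
        # Unique values plus one slot for the unknown index
        return len(self.mapping) + 1


encoder = SimpleCategoryEncoder().fit(["i_000042", "i_000007", "i_000042"])
encoded = encoder.transform(["i_000007", "i_000042", "i_999999"])
print(encoded, encoder.cardinality)
```

The cardinality is what an embedding table would be sized with, so that index 0 has its own row.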
Sequential recommendation models like GPTRec learn from the order in which users interact with items—predicting the next item based on the sequence of previous ones. To train such models, we need to transform our flat interaction table into an ordered mapping: for each user, a chronologically sorted list of item IDs.
While the rec framework includes a build_user_item_map function, this was designed for non-sequential models where interaction order doesn’t matter—it simply collects the set of items each user has interacted with. For GPTRec, we need a modified version that preserves temporal ordering by sorting on the timestamp column. I’ll likely integrate this into the framework in a future update.
from typing import Dict, List

from rec.common.io import read_parquet_batches
from rec.common.data import FeatureConfig, CategoryEncoder


def build_user_item_map_ordered(
    interactions_path: str,
    feature_cfg: FeatureConfig,
    user_encoders: Dict[str, CategoryEncoder],
    item_encoders: Dict[str, CategoryEncoder],
    chunksize: int = 200_000,
) -> Dict[int, List[int]]:
    """Build user->items map ordered by timestamp (ascending)."""
    user_to_items: Dict[int, List[tuple]] = {}  # uid -> [(timestamp, item_id), ...]
    for chunk in read_parquet_batches(interactions_path, chunksize):
        user_ids = user_encoders[feature_cfg.user_id_col].transform(
            chunk[feature_cfg.interaction_user_col].astype(str).tolist()
        )
        item_ids = item_encoders[feature_cfg.item_id_col].transform(
            chunk[feature_cfg.interaction_item_col].astype(str).tolist()
        )
        timestamps = chunk[feature_cfg.interaction_time_col].tolist()
        for uid, iid, ts in zip(user_ids, item_ids, timestamps):
            uid, iid = int(uid), int(iid)
            if uid not in user_to_items:
                user_to_items[uid] = []
            user_to_items[uid].append((ts, iid))
    # Sort by timestamp and extract just the item ids
    return {
        uid: [iid for _, iid in sorted(items)]
        for uid, items in user_to_items.items()
    }


train_user_item_map = build_user_item_map_ordered(
    paths.interactions_train_path,
    feature_config,
    user_encoders,
    item_encoders,
)
val_user_item_map = build_user_item_map_ordered(
    paths.interactions_val_path,
    feature_config,
    user_encoders,
    item_encoders,
)
[1, 11, 21, 30, 32, 34, 47, 143, 171, 177, 195, 221, 230, 231, 244, 254, 314, 326, 353, 377, 438, 446, 477, 497, 521, 548, 584, 585, 586, 594, 643, 700, 771, 776, 842, 1064, 1066, 1082, 1112, 1120, 1121, 1157, 1173, 1225, 1230, 1231, 1240, 1251, 1255, 1259, 1266, 1278, 1301, 1336, 1352, 1373, 1375, 1376, 1383, 1425, 1449, 1456, 1483, 1492, 1506, 1530, 1534, 1540, 1544, 1596, 1608, 1618, 1631, 1696, 1761, 1764, 1807, 1808, 1814, 1839, 1841, 1855, 1895, 1900, 1932, 1944, 1955, 1997, 2010, 2013, 2026, 2032, 2040, 2069, 2077, 2087, 2175, 2177, 2180, 2182, 2226, 2244, 2253, 2284, 2287, 2303, 2327, 2338, 2339, 2350, 2356, 2364, 2365, 2401, 2422, 2434, 2473, 2503, 2512, 2531, 2537, 2560, 2603, 2615, 2619, 2625, 2631, 2632, 2634, 2638, 2648, 2654, 2678, 2693, 2694, 2695, 2723, 2729, 2790, 2847, 2850, 2891, 2905, 2919, 2967, 2969, 2971, 2984, 2992, 3004, 3013, 3019, 3046, 3088, 3107, 3108, 3145, 3185, 3187, 3195, 3230, 3233, 3240, 3290, 3293, 3356, 3436, 3437, 3457, 3458, 3484, 3510, 3523, 3555, 3603, 3629, 3644, 3677, 3685, 3717, 3725, 3752, 3767, 3794, 3883, 6, 10, 16, 109, 110, 160, 163, 164, 227, 233, 346, 374, 451, 454, 463, 471, 524, 590, 605, 642, 725, 779, 856, 901, 913, 958, 1063, 1065, 1075, 1077, 1079, 1179, 1180, 1183, 1191, 1193, 1197, 1210, 1215, 1221, 1246, 1253, 1257, 1280, 1288, 1338, 1350, 1386, 1452, 1546, 1563, 1569, 1575, 1576, 1600, 1629, 1674, 1743, 1852, 1892, 1960, 1990, 1996, 2050, 2210, 2220, 2266, 2285, 2323, 2325, 2472, 2696, 2747, 2803, 2815, 2822, 2848, 2899, 2904, 2998, 3033, 3079, 3128, 3188, 3373, 3380, 3444, 3459, 3483, 3638, 3659, 3695, 1843, 3340, 258, 290, 365, 504, 538, 552, 589, 607, 848, 1024, 1086, 1181, 1204, 1245, 1354, 1356, 1557, 1567, 1886, 1933, 1945, 2037, 2126, 2205, 2461, 2549, 2572, 2734, 2880, 2921, 3036, 3350, 3412, 3508, 3567, 3634, 7, 1273, 197, 550, 553, 578, 1132, 1175, 1184, 1264, 1272, 1367, 1844, 2047, 2099, 2459, 2594, 2883, 2917, 3074, 3291, 3758, 2706]
[245, 294, 592, 1247, 1276, 1654, 2201, 2301, 2626, 2645, 3106, 3571, 3718]
In a future iteration, I may extend this to include timestamps in the output mapping. This would enable time-based positional embeddings—encoding when interactions occurred rather than just their relative order. For now, I’ll keep things simple with index-based positional embeddings.
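As a rough sketch of what that extension could look like, the helper below keeps timestamps alongside item IDs. The name order_with_timestamps is hypothetical and not part of rec. It sorts on the timestamp alone, so interactions sharing a timestamp keep their input order (Python's sort is stable). Sorting the full (timestamp, item_id) tuples, as build_user_item_map_ordered does, breaks such ties by item ID instead, which may matter when timestamps are coarse.

```python
from typing import List, Tuple


def order_with_timestamps(
    events: List[Tuple[int, int]],
) -> Tuple[List[int], List[int]]:
    """Sort (timestamp, item_id) pairs by timestamp only; return (item_ids, timestamps)."""
    # key= restricts the comparison to the timestamp, so ties keep their input order
    ordered = sorted(events, key=lambda pair: pair[0])
    item_ids = [iid for _, iid in ordered]
    timestamps = [ts for ts, _ in ordered]
    return item_ids, timestamps


# Toy events as (timestamp, item_id); two events share timestamp 20
events = [(20, 914), (10, 1193), (20, 661)]
item_ids, timestamps = order_with_timestamps(events)
print(item_ids)    # [1193, 914, 661]
print(timestamps)  # [10, 20, 20]

# For comparison, sorting whole tuples orders the tie by item ID
tuple_sorted_ids = [iid for _, iid in sorted(events)]
print(tuple_sorted_ids)  # [1193, 661, 914]
```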
The FeatureStore class provides efficient feature lookup for users and items during training and inference. Rather than repeatedly encoding features on-the-fly, it pre-encodes all user and item features into tensors at initialization—storing them in memory for fast indexed access.
Key functionality:
- Pre-encoded tensors: All categorical and dense features are encoded once and stored as PyTorch tensors with zero-padding at index 0 (for unknown/missing values)
- Index mappings: Maintains user_index and item_index dictionaries that map encoded IDs to their row positions in the feature tensors
- Batch lookups: get_user_features() and get_item_features() retrieve all features for a batch of IDs in a single operation
- Item catalog access: get_all_item_features() and get_all_item_ids() provide full item catalog access, useful for scoring all items during inference
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'feature_cfg', 'get_all_item_features', 'get_all_item_ids', 'get_item_features', 'get_user_features', 'item_encoders', 'item_features', 'item_id_tensor', 'item_index', 'map_item_ids_to_indices', 'user_encoders', 'user_features', 'user_index']
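The lookup pattern described above can be sketched without any tensors. MiniFeatureStore below is a hypothetical, list-based toy and not the rec FeatureStore, whose internals I am not reproducing here. It only illustrates the three ideas: a zero row at index 0, an ID-to-row index, and batch lookups that fall back to the zero row.

```python
from typing import Dict, List


class MiniFeatureStore:
    """Hypothetical list-based stand-in for a pre-encoded feature table."""

    def __init__(self, features_by_id: Dict[int, List[int]], n_features: int) -> None:
        # Row 0 is an all-zero row used for unknown or missing IDs
        self.rows: List[List[int]] = [[0] * n_features]
        self.index: Dict[int, int] = {}
        for encoded_id, feats in features_by_id.items():
            self.index[encoded_id] = len(self.rows)  # row position of this ID
            self.rows.append(list(feats))

    def get_features(self, ids: List[int]) -> List[List[int]]:
        # One pass over the batch; IDs without an entry map to row 0
        return [self.rows[self.index.get(i, 0)] for i in ids]


store = MiniFeatureStore({7: [3, 1], 9: [2, 5]}, n_features=2)
batch = store.get_features([9, 42, 7])
print(batch)  # [[2, 5], [0, 0], [3, 1]]
```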
For training, I want a dataset that generates a padded sequence of items for each user.
First, let's work out a good maximum sequence length based on the distribution of sequence lengths in the data.
Max len: 2314
Avg len: 163.83573439311144
Std len: 190.44394716254214
Pct > 200: 25.7824 %
With a max sequence length of 200, we capture the full history for roughly 74% of users. While this loses some information, it allows us to work in a compute-limited environment.
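For reference, statistics like the ones above can be computed along these lines. This is a sketch on toy data: sequence_length_stats is a name I made up, and I am assuming a population standard deviation, which may not match how the figures above were produced.

```python
import statistics
from typing import Dict, List


def sequence_length_stats(
    user_item_map: Dict[int, List[int]], threshold: int
) -> Dict[str, float]:
    """Summarise per-user sequence lengths and the share longer than threshold."""
    lengths = [len(items) for items in user_item_map.values()]
    over = sum(1 for n in lengths if n > threshold)
    return {
        "max": max(lengths),
        "avg": statistics.mean(lengths),
        "std": statistics.pstdev(lengths),  # population std (an assumption)
        "pct_over": 100.0 * over / len(lengths),
    }


# Toy map with sequence lengths 3, 1 and 5
toy_map = {1: [5, 6, 7], 2: [8], 3: [9, 10, 11, 12, 13]}
stats = sequence_length_stats(toy_map, threshold=2)
print(stats)
```

On the real data, the same call with train_user_item_map and threshold=200 would give the share of users whose history gets truncated.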
Now let’s build the datasets. For sequential recommendation, we need two different dataset types:
Training dataset: Uses a sliding window approach where, given a sequence of items [A, B, C, D], the model learns to predict each next item from the preceding context: A→B, [A,B]→C, [A,B,C]→D. This is implemented by shifting input and labels by one position.
Evaluation dataset: Uses the full training history as context and held-out validation items as targets. This mirrors the real inference scenario: given everything we know about a user’s past behavior, can we predict what they’ll interact with next?
Both datasets use left-padding (padding at the start of sequences) so the most recent item is always at the same position—this works naturally with causal attention where we predict the next token based on previous ones.
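Here is a toy version of the shift-and-pad logic in plain Python lists, before the real dataset classes. make_training_example is a hypothetical helper written for this illustration only; it mirrors the input/label shift and left-padding described above on a four-item history.

```python
from typing import List, Tuple

PAD_TOKEN = 0


def make_training_example(
    items: List[int], max_length: int
) -> Tuple[List[int], List[int], List[int]]:
    """Return left-padded (input_ids, labels, attention_mask) for one user."""
    items = items[-(max_length + 1):]       # keep the most recent max_length + 1 items
    inputs, labels = items[:-1], items[1:]  # labels are the inputs shifted by one
    pad_len = max_length - len(inputs)
    input_ids = [PAD_TOKEN] * pad_len + inputs
    label_ids = [PAD_TOKEN] * pad_len + labels
    attention_mask = [0] * pad_len + [1] * len(inputs)
    return input_ids, label_ids, attention_mask


input_ids, label_ids, attention_mask = make_training_example([11, 12, 13, 14], max_length=5)
print(input_ids)       # [0, 0, 11, 12, 13]
print(label_ids)       # [0, 0, 12, 13, 14]
print(attention_mask)  # [0, 0, 1, 1, 1]
```

Each non-padded position pairs an input with the item that followed it, so one padded sequence carries all of the A→B, [A,B]→C style training targets at once.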
from typing import Dict, List

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class SequentialTrainDataset(Dataset):
    """Training dataset: generates sequences for next-item prediction."""

    PAD_TOKEN = 0

    def __init__(
        self,
        user_item_map: Dict[int, List[int]],
        max_length: int = 50,
        min_length: int = 2,
    ) -> None:
        super().__init__()
        self.max_length = max_length
        self.user_item_map = user_item_map
        # Users need at least min_length items to form one input/label pair
        self.user_ids = [
            uid for uid, items in user_item_map.items()
            if len(items) >= min_length
        ]

    def __len__(self) -> int:
        return len(self.user_ids)

    def __getitem__(self, idx: int) -> Dict[str, torch.Tensor]:
        user_id = self.user_ids[idx]
        items = self.user_item_map[user_id]
        # Need max_length + 1 items to get max_length input/target pairs
        if len(items) > self.max_length + 1:
            items = items[-(self.max_length + 1):]
        # input/target shifted by 1
        input_items = items[:-1]
        labels = items[1:]
        actual_len = len(input_items)
        # Left-pad to max_length
        pad_len = self.max_length - actual_len
        input_seq = np.full(self.max_length, self.PAD_TOKEN, dtype=np.int64)
        label_seq = np.full(self.max_length, self.PAD_TOKEN, dtype=np.int64)
        input_seq[pad_len:] = input_items
        label_seq[pad_len:] = labels
        attention_mask = np.zeros(self.max_length, dtype=np.float32)
        attention_mask[pad_len:] = 1.0
        return {
            "user_id": torch.tensor(user_id, dtype=torch.long),
            "input_ids": torch.from_numpy(input_seq),
            "labels": torch.from_numpy(label_seq),
            "attention_mask": torch.from_numpy(attention_mask),
            "seq_length": torch.tensor(actual_len, dtype=torch.long),
        }


class SequentialEvalDataset(Dataset):
    """
    Eval dataset for retrieval metrics.
    Returns user's training history as context, and val items as targets.
    Compatible with evaluate_retrieval pattern - model produces scores,
    we compare top-k against val items.
    """

    PAD_TOKEN = 0

    def __init__(
        self,
        train_user_item_map: Dict[int, List[int]],
        val_user_item_map: Dict[int, List[int]],
        max_length: int = 50,
    ) -> None:
        super().__init__()
        self.max_length = max_length
        self.train_map = train_user_item_map
        self.val_map = val_user_item_map
        # Users with val items AND some training history
        self.user_ids = [
            uid for uid in val_user_item_map
            if len(val_user_item_map[uid]) >= 1 and uid in train_user_item_map
        ]

    def __len__(self) -> int:
        return len(self.user_ids)

    def __getitem__(self, idx: int) -> Dict[str, torch.Tensor]:
        user_id = self.user_ids[idx]
        context_items = self.train_map.get(user_id, [])
        target_items = self.val_map[user_id]
        # Truncate context to max_length (keep most recent)
        if len(context_items) > self.max_length:
            context_items = context_items[-self.max_length:]
        actual_len = len(context_items)
        pad_len = self.max_length - actual_len
        input_seq = np.full(self.max_length, self.PAD_TOKEN, dtype=np.int64)
        input_seq[pad_len:] = context_items
        attention_mask = np.zeros(self.max_length, dtype=np.float32)
        attention_mask[pad_len:] = 1.0
        return {
            "user_id": torch.tensor(user_id, dtype=torch.long),
            "input_ids": torch.from_numpy(input_seq),
            "attention_mask": torch.from_numpy(attention_mask),
            "seq_length": torch.tensor(actual_len, dtype=torch.long),
            # Targets for metric computation (variable length)
            "target_items": torch.tensor(target_items, dtype=torch.long),
        }


def collate_eval_batches(batch):
    """Stack fixed-size fields; keep variable-length targets as a list."""
    return {
        "user_id": torch.stack([ex["user_id"] for ex in batch]),
        "input_ids": torch.stack([ex["input_ids"] for ex in batch]),
        "attention_mask": torch.stack([ex["attention_mask"] for ex in batch]),
        "seq_length": torch.stack([ex["seq_length"] for ex in batch]),
        "target_items": [ex["target_items"] for ex in batch],  # keep as list of tensors for ragged
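    }
A few implementation details worth noting:
- Index 0 is used as the padding token, which lines up with the +1 offset we built into our encoders earlier.
- Each user has a different number of validation targets, so they cannot be stacked into a single tensor. The collate_eval_batches function handles this ragged structure.
{'user_id': tensor(1),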
'input_ids': tensor([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 149, 258, 528, 591, 605, 656, 712, 903,
908, 927, 1010, 1016, 1017, 1023, 1082, 1177, 1180, 1190, 1227, 1251,
1268, 1673, 1769, 1893, 1894, 1950, 1960, 2253, 2272, 2330, 2624, 2694,
2723, 2729, 2736, 2850, 3037, 3046, 3118, 3340, 1, 48, 524, 585,
592, 736, 774, 1507, 1527, 1839, 2226, 2287]),
'labels': tensor([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 258, 528, 591, 605, 656, 712, 903, 908,
927, 1010, 1016, 1017, 1023, 1082, 1177, 1180, 1190, 1227, 1251, 1268,
1673, 1769, 1893, 1894, 1950, 1960, 2253, 2272, 2330, 2624, 2694, 2723,
2729, 2736, 2850, 3037, 3046, 3118, 3340, 1, 48, 524, 585, 592,
736, 774, 1507, 1527, 1839, 2226, 2287, 2619]),
'attention_mask': tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1.]),
'seq_length': tensor(52)}
{'user_id': tensor(36),
'input_ids': tensor([2790, 2847, 2850, 2891, 2905, 2919, 2967, 2969, 2971, 2984, 2992, 3004,
3013, 3019, 3046, 3088, 3107, 3108, 3145, 3185, 3187, 3195, 3230, 3233,
3240, 3290, 3293, 3356, 3436, 3437, 3457, 3458, 3484, 3510, 3523, 3555,
3603, 3629, 3644, 3677, 3685, 3717, 3725, 3752, 3767, 3794, 3883, 6,
10, 16, 109, 110, 160, 163, 164, 227, 233, 346, 374, 451,
454, 463, 471, 524, 590, 605, 642, 725, 779, 856, 901, 913,
958, 1063, 1065, 1075, 1077, 1079, 1179, 1180, 1183, 1191, 1193, 1197,
1210, 1215, 1221, 1246, 1253, 1257, 1280, 1288, 1338, 1350, 1386, 1452,
1546, 1563, 1569, 1575, 1576, 1600, 1629, 1674, 1743, 1852, 1892, 1960,
1990, 1996, 2050, 2210, 2220, 2266, 2285, 2323, 2325, 2472, 2696, 2747,
2803, 2815, 2822, 2848, 2899, 2904, 2998, 3033, 3079, 3128, 3188, 3373,
3380, 3444, 3459, 3483, 3638, 3659, 3695, 1843, 3340, 258, 290, 365,
504, 538, 552, 589, 607, 848, 1024, 1086, 1181, 1204, 1245, 1354,
1356, 1557, 1567, 1886, 1933, 1945, 2037, 2126, 2205, 2461, 2549, 2572,
2734, 2880, 2921, 3036, 3350, 3412, 3508, 3567, 3634, 7, 1273, 197,
550, 553, 578, 1132, 1175, 1184, 1264, 1272, 1367, 1844, 2047, 2099,
2459, 2594, 2883, 2917, 3074, 3291, 3758, 2706]),
'attention_mask': tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1.]),
'seq_length': tensor(200),
'target_items': tensor([ 245, 294, 592, 1247, 1276, 1654, 2201, 2301, 2626, 2645, 3106, 3571,
3718])}
{'user_id': tensor([2993, 2311, 5298, 2819, 2634, 144, 230, 4548, 4676, 754, 5792, 3857,
2707, 3, 2189, 87]),
'input_ids': tensor([[ 0, 0, 0, ..., 3687, 3741, 3772],
[ 0, 0, 0, ..., 3257, 3340, 3382],
[ 0, 0, 0, ..., 3570, 3633, 3634],
...,
[ 0, 0, 0, ..., 3484, 3551, 3603],
[ 0, 0, 0, ..., 3547, 3686, 3695],
[ 0, 0, 0, ..., 3604, 3725, 1198]]),
'labels': tensor([[ 0, 0, 0, ..., 3741, 3772, 3800],
[ 0, 0, 0, ..., 3340, 3382, 3442],
[ 0, 0, 0, ..., 3633, 3634, 3635],
...,
[ 0, 0, 0, ..., 3551, 3603, 3799],
[ 0, 0, 0, ..., 3686, 3695, 3725],
[ 0, 0, 0, ..., 3725, 1198, 1569]]),
'attention_mask': tensor([[0., 0., 0., ..., 1., 1., 1.],
[0., 0., 0., ..., 1., 1., 1.],
[0., 0., 0., ..., 1., 1., 1.],
...,
[0., 0., 0., ..., 1., 1., 1.],
[0., 0., 0., ..., 1., 1., 1.],
[0., 0., 0., ..., 1., 1., 1.]]),
'seq_length': tensor([ 82, 22, 58, 19, 200, 31, 178, 20, 32, 156, 200, 53, 22, 50,
34, 58])}
{'user_id': tensor([ 36, 59, 65, 102, 131, 146, 157, 164, 169, 184, 192, 193, 195, 229,
231, 237]),
'input_ids': tensor([[2790, 2847, 2850, ..., 3291, 3758, 2706],
[ 0, 0, 0, ..., 3743, 3747, 3841],
[ 0, 0, 0, ..., 1193, 2560, 3752],
...,
[ 0, 0, 0, ..., 3687, 3725, 3847],
[ 0, 0, 0, ..., 2848, 2917, 3459],
[ 0, 0, 0, ..., 1637, 354, 2041]]),
'attention_mask': tensor([[1., 1., 1., ..., 1., 1., 1.],
[0., 0., 0., ..., 1., 1., 1.],
[0., 0., 0., ..., 1., 1., 1.],
...,
[0., 0., 0., ..., 1., 1., 1.],
[0., 0., 0., ..., 1., 1., 1.],
[0., 0., 0., ..., 1., 1., 1.]]),
'seq_length': tensor([200, 93, 119, 32, 200, 200, 200, 25, 200, 28, 200, 177, 200, 83,
48, 161]),
'target_items': [tensor([ 245, 294, 592, 1247, 1276, 1654, 2201, 2301, 2626, 2645, 3106, 3571,
3718]),
tensor([ 17, 25, 32, 58, 109, 198, 222, 300, 312, 374, 506, 512,
538, 587, 605, 741, 888, 889, 890, 892, 894, 903, 919, 927,
932, 934, 942, 948, 954, 1016, 1023, 1053, 1066, 1073, 1082, 1157,
1160, 1167, 1182, 1184, 1203, 1209, 1213, 1216, 1222, 1225, 1233, 1247,
1248, 1250, 1263, 1264, 1265, 1269, 1277, 1284, 1285, 1337, 1360, 1398,
1540, 1598, 1608, 1867, 1883, 1891, 1892, 1952, 1960, 1997, 1999, 2002,
2079, 2091, 2175, 2223, 2232, 2268, 2284, 2301, 2328, 2338, 2464, 2497,
2503, 2576, 2589, 2592, 2602, 2618, 2644, 2648, 2663, 2694, 2851, 2873,
2875, 2929, 2948, 2950, 2993, 3089, 3117, 3156, 3178, 3192, 3239, 3291,
3295, 3340, 3346, 3367, 3403, 3477, 3481, 3587, 3616, 3617, 3802, 3842]),
tensor([2126, 942]),
tensor([ 10, 24, 552, 592, 841, 912, 920, 922, 1080, 1082, 1164, 1247,
1479, 1674, 1986, 2002, 2265, 2268, 2269, 2287, 2301, 2322, 2327, 2328,
2366, 2417, 2434, 2473, 2507, 2512, 2531, 2615, 2618, 2624, 2632, 2646,
2654, 2655, 2656, 2693, 2695, 2773, 2790, 2795, 2840, 2858, 2874, 2879,
2881, 2891, 2907, 2919, 2922, 2929, 2971, 3089, 3092, 3108, 3151, 3480,
3512, 3567, 3686, 3718, 3723, 3730, 3794, 3883, 741, 895, 902, 913,
917, 919, 932, 940, 942, 1069, 1190, 1202, 1250, 1884, 2138, 2330,
2867, 3367, 3400, 294, 1177, 1232, 3645, 202, 911, 915, 1943, 1944,
1972, 2144, 2809, 3031]),
tensor([888]),
tensor([ 10, 291, 887, 939, 967, 1237, 1337, 1998, 2298, 2428, 2544, 2852,
2879, 2922, 3367, 3571, 3784, 924, 3536, 1164, 1229, 1324, 3224]),
tensor([ 208, 294, 524, 901, 1075, 1187, 1191, 1200, 1204, 1205, 1244, 1280,
1386, 1876, 2876, 2994, 3128, 3274, 3451, 3586, 3665, 3685, 3743, 1951,
2126]),
tensor([3847]),
tensor([2869, 1066]),
tensor([ 1, 10, 34, 260, 294, 315, 361, 548, 585, 591, 592, 593,
908, 912, 922, 975, 1010, 1016, 1020, 1059, 1082, 1164, 1208, 1247,
1355, 1360, 1527, 1674, 1780, 1839, 1851, 1950, 1986, 2010, 2012, 2013,
2015, 2017, 2019, 2022, 2028, 2038, 2070, 2072, 2226, 2287, 2619, 2874,
2879, 2881, 2922, 3328, 3567, 3571, 3683, 3723]),
tensor([3880]),
tensor([ 39, 233, 331, 354, 374, 454, 477, 590, 941, 944, 1019, 1059,
1112, 1147, 1157, 1160, 1180, 1181, 1182, 1187, 1207, 1208, 1215, 1245,
1274, 1354, 1374, 1697, 1817, 1844, 1846, 1889, 1952, 2026, 2031, 2082,
2140, 2298, 2328, 2503, 2648, 2655, 2723, 2803, 2844, 2869, 2900, 2905,
2919, 2932, 2951, 3004, 3020, 3028, 3085, 3199, 3266, 3293, 3328, 3338,
3480, 3508, 3635, 3702, 3858]),
tensor([2492, 1212, 3124, 1219, 389, 2517, 2476, 3662, 1737, 6, 1341, 2870,
1996, 2444, 3318, 2562, 2840, 2929, 3024, 3066, 3259, 2272, 3882, 3403,
1936, 899, 3027, 3020, 3022, 1885, 2994, 2045, 3436, 172, 931, 2420,
3525, 1816, 889, 1183, 2106, 2228, 2229, 2305, 939, 3618, 1075, 3590,
1881, 2519, 1277, 628, 3678, 2740, 1235, 1430, 407, 465, 1167, 1282,
1655, 2680, 2723, 2724, 2939, 3696, 326, 1336, 1354, 1355, 2179]),
tensor([ 26, 34, 62, 86, 109, 160, 171, 174, 258, 263, 316, 452,
586, 590, 592, 605, 642, 716, 757, 838, 1025, 1046, 1082, 1179,
1193, 1207, 1213, 1263, 1371, 1373, 1394, 1417, 1459, 1476, 1549, 1594,
1664, 1681, 1687, 1696, 1740, 1743, 1752, 1766, 1816, 1829, 1841, 1850,
1934, 1944, 1957, 1991, 1994, 2086, 2163, 2229, 2233, 2237, 2248, 2360,
2363, 2365, 2374, 2379, 2436, 2473, 2522, 2560, 2589, 2608, 2620, 2644,
2650, 2678, 2698, 2703, 2822, 2982, 3039, 3047, 3079, 3080, 3092, 3105,
3106, 3108, 3109, 3110, 3118, 3121, 3187, 3217, 3230, 3249, 3258, 3292,
3357, 3384, 3385, 3442, 3445, 3468, 3486, 3487, 3495, 3510, 3526, 3530,
3549, 3649, 3655, 3676, 3677, 3717, 3718, 3756, 3762, 3782, 3788, 3794,
3813, 3828, 3841, 3846, 3874, 3879, 3880, 2771, 3021]),
tensor([ 258, 538, 586, 1074, 1183, 1208, 1239, 1240, 1354, 1608, 2125, 2531,
2596, 2648, 2649, 2929, 3107, 3804, 1112, 1024, 1179, 1181, 1203, 1227,
1267, 1288, 1629, 1932, 2042, 2076, 2126, 2180, 2729, 2850, 3293, 3329,
3437, 1, 736, 2919, 3046, 1906, 1907, 1908, 1909, 1910, 1911, 1912,
1913]),
tensor([1000, 3618, 2386, 2580, 2013])]}
At its core, GPTRec applies the same autoregressive language modeling approach that powers GPT to sequential recommendation. Just as GPT predicts the next word given previous words, GPTRec predicts the next item given a user’s interaction history.
The architecture follows a familiar transformer pattern:
Item Embeddings: Each item gets a learned embedding vector. We also add positional embeddings so the model knows where each item appears in the sequence.
Causal Transformer: The key ingredient. Unlike BERT-style models that can look at the full sequence bidirectionally, we use causal (autoregressive) masking—each position can only attend to earlier positions. This matches our inference scenario: predict what comes next based only on what we’ve seen so far.
Weight Tying: The output projection layer shares weights with the item embedding layer. This is a common trick in language models that reduces parameters and often improves performance—the intuition being that the “meaning” of an item should be consistent whether we’re encoding it as input or predicting it as output.
The compute_loss method implements standard cross-entropy loss with ignore_index=0 to skip padding tokens—we only want to learn from real predictions, not from predicting padding.
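To show what ignoring index 0 means in practice, here is a dependency-free sketch of the same calculation. masked_cross_entropy is a hypothetical helper, not the PyTorch implementation: it averages the negative log-likelihood over non-padding positions only. Returning 0.0 when every position is padding is a choice made for this sketch.

```python
import math
from typing import List


def masked_cross_entropy(
    logits: List[List[float]], labels: List[int], ignore_index: int = 0
) -> float:
    """Mean negative log-likelihood over positions whose label is not ignore_index."""
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue  # padding position: contributes nothing to the loss
        log_norm = math.log(sum(math.exp(v) for v in row))  # log of softmax denominator
        total += log_norm - row[label]
        count += 1
    return total / count if count else 0.0


# Three positions over a 3-item vocabulary; the first position is padding (label 0)
logits = [[5.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]]
labels = [0, 1, 2]
loss = masked_cross_entropy(logits, labels)

# Changing the logits at the padded position leaves the loss untouched
loss_changed_pad = masked_cross_entropy([[-9.0, 4.0, 1.0]] + logits[1:], labels)
print(loss, loss_changed_pad)
```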
One notable simplification: unlike the full GPTRec paper which explores SVD-based embedding initialization and various training optimizations, I’m using standard randomly-initialized embeddings here. For a small dataset like this, it should work fine.
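Going back to the causal masking described above, it helps to see how it combines with left-padding. The sketch below is a boolean illustration only: combined_attention_mask is a hypothetical helper, whereas the model itself passes additive masks (0.0 or -inf) to the transformer. A query position may attend to a key position only if the key is not in the future and is not padding.

```python
from typing import List


def combined_attention_mask(attention_mask: List[int]) -> List[List[bool]]:
    """allowed[q][k] is True when query position q may attend to key position k."""
    seq_len = len(attention_mask)
    return [
        # Causal rule (k <= q) combined with the padding rule (key k is a real item)
        [k <= q and attention_mask[k] == 1 for k in range(seq_len)]
        for q in range(seq_len)
    ]


# Left-padded sequence of length 4 with one padding slot at the front
allowed = combined_attention_mask([0, 1, 1, 1])
for row in allowed:
    print("".join("x" if ok else "." for ok in row))
```

The first row has no allowed keys because that query is itself padding. Its label is also padding, so it does not contribute to the loss.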
import math
from typing import Optional

import torch
import torch.nn as nn


class GPTRecModel(nn.Module):
    """
    GPTRec: GPT-style autoregressive transformer for sequential recommendation.
    Uses causal masking so each position only attends to previous positions,
    enabling next-item prediction.
    """

    def __init__(
        self,
        n_items: int,
        d_model: int = 64,
        n_heads: int = 2,
        n_layers: int = 2,
        d_ff: int = 256,
        max_seq_len: int = 50,
        dropout: float = 0.1,
        pad_token: int = 0,
    ):
        super().__init__()
        self.pad_token = pad_token
        self.d_model = d_model
        # Item embedding (+1 for padding token at index 0)
        self.item_embedding = nn.Embedding(n_items + 1, d_model, padding_idx=pad_token)
        self.pos_embedding = nn.Embedding(max_seq_len, d_model)
        self.dropout = nn.Dropout(dropout)
        self.layer_norm = nn.LayerNorm(d_model)
        # Transformer encoder with causal masking
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=n_heads,
            dim_feedforward=d_ff,
            dropout=dropout,
            activation='gelu',
            batch_first=True,
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Output projection to item scores
        self.output_layer = nn.Linear(d_model, n_items + 1)
        self.output_layer.weight = self.item_embedding.weight  # tie weights

    def _generate_causal_mask(self, seq_len: int, device: torch.device) -> torch.Tensor:
        """Generate causal mask: positions can only attend to earlier positions."""
        mask = torch.triu(torch.ones(seq_len, seq_len, device=device), diagonal=1)
        mask = mask.masked_fill(mask == 1, float('-inf'))
        return mask

    def forward(
        self,
        input_ids: torch.Tensor,  # (batch, seq_len)
        attention_mask: Optional[torch.Tensor] = None,  # (batch, seq_len) - 1 for real, 0 for pad
    ) -> torch.Tensor:
        batch_size, seq_len = input_ids.shape
        device = input_ids.device
        # Embeddings
        positions = torch.arange(seq_len, device=device).unsqueeze(0).expand(batch_size, -1)
        x = self.item_embedding(input_ids) + self.pos_embedding(positions)
        x = self.dropout(self.layer_norm(x))
        # Causal mask
        causal_mask = self._generate_causal_mask(seq_len, device)
        # Padding mask (float: 0.0 = attend, -inf = ignore); skipped when no mask is passed
        src_key_padding_mask = None
        if attention_mask is not None:
            src_key_padding_mask = torch.where(attention_mask == 1, 0.0, float('-inf'))
        # Transformer
        x = self.transformer(
            x,
            mask=causal_mask,
            src_key_padding_mask=src_key_padding_mask,
        )
        # Project to item logits
        logits = self.output_layer(x)  # (batch, seq_len, n_items+1)
        return logits

    def compute_loss(self, batch, ignore_index=0):
        """Cross-entropy loss for next-item prediction, ignoring padding."""
        logits = self(batch['input_ids'], batch['attention_mask'])  # (B, seq_len, n_items+1)
        # Reshape for cross-entropy: (B*seq_len, n_items+1) vs (B*seq_len,)
        logits_flat = logits.view(-1, logits.size(-1))
        labels_flat = batch['labels'].view(-1)
        loss = nn.functional.cross_entropy(
            logits_flat,
            labels_flat,
            ignore_index=ignore_index,  # Ignore padding positions
        )
        return loss


n_items = item_cardinalities['item_id']  # 3884 here, including the +1 for unknown
model = GPTRecModel(
    n_items=n_items,
    d_model=64,
    n_heads=2,
    n_layers=2,
    d_ff=256,
    max_seq_len=max_len,
    dropout=0.2,
)

# Test forward pass
logits = model(val_batch['input_ids'], val_batch['attention_mask'])
print("Logits shape:", logits.shape)  # (batch, seq_len, n_items+1)
Logits shape: torch.Size([16, 200, 3885])
#from rec.retrieval.metrics import aggregate_retrieval_metrics
from rec.retrieval.metrics import *
from rec.retrieval.metrics import _as_list
# requires rewrite from library version to add dcg for comparison with GPTRec paper. will integrate into library in future iterations
def aggregate_retrieval_metrics(
topk_indices: torch.Tensor,
relevant_indices: Sequence[torch.Tensor],
ks: Iterable[int],
) -> Dict[str, float]:
ks_list = _as_list(ks)
if not ks_list or topk_indices.numel() == 0:
return {}
max_k = max(ks_list)
if topk_indices.size(1) < max_k:
raise ValueError("topk_indices must have at least max(k) columns")
totals = {f"recall@{k}": 0.0 for k in ks_list}
totals.update({f"precision@{k}": 0.0 for k in ks_list})
totals.update({f"dcg@{k}": 0.0 for k in ks_list})
totals.update({f"ndcg@{k}": 0.0 for k in ks_list})
totals["mrr"] = 0.0
num_users = topk_indices.size(0)
for idx in range(num_users):
topk = topk_indices[idx]
rel = relevant_indices[idx]
if rel.numel() == 0:
continue
hits = torch.isin(topk, rel)
totals["mrr"] += mrr(hits)
num_rel = int(rel.numel())
for k in ks_list:
totals[f"recall@{k}"] += recall_at_k(hits, num_rel, k)
totals[f"precision@{k}"] += precision_at_k(hits, k)
dcg = dcg_at_k(hits, k)
ideal_dcg = idcg_at_k(num_rel, k)
ndcg = dcg / ideal_dcg if ideal_dcg > 0 else 0.0
totals[f"dcg@{k}"] += dcg
totals[f"ndcg@{k}"] += ndcg
if num_users == 0:
return {}
return {k: v / float(num_users) for k, v in totals.items()}
from tqdm import tqdm

def evaluate_gptrec(
    model: GPTRecModel,
    val_dataloader: DataLoader,
    train_user_item_map: Dict[int, List[int]],
    n_items: int,
    ks: List[int] = [5, 10, 20],
    device: torch.device = None,
) -> Dict[str, float]:
    """
    Evaluate GPTRec model on retrieval metrics.
    For each user, we:
    1. Get model's logits from the last position (next-item prediction)
    2. Mask out items the user has already seen in training
    3. Get top-k predictions
    4. Compare against validation targets
    """
    if device is None:
        device = next(model.parameters()).device
    model.eval()
    topk_indices_list = []
    relevant_indices_list = []
    max_k = max(ks)
    with torch.no_grad():
        for batch in tqdm(val_dataloader):
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            user_ids = batch['user_id']
            target_items = batch['target_items']  # list of tensors
            # Forward pass
            logits = model(input_ids, attention_mask)  # (B, seq_len, n_items+1)
            # Get logits from last position for next-item prediction
            # (assumes sequences are left-padded, so position -1 holds the most recent item)
            last_logits = logits[:, -1, :]  # (B, n_items+1)
            # For each user in batch
            for i in range(len(user_ids)):
                uid = user_ids[i].item()
                scores = last_logits[i].clone()  # (n_items+1,)
                # Mask out seen items (set to -inf)
                seen_items = train_user_item_map.get(uid, [])
                if seen_items:
                    seen_tensor = torch.tensor(seen_items, device=device)
                    scores[seen_tensor] = float('-inf')
                # Also mask out padding token (index 0)
                scores[0] = float('-inf')
                # Get top-k predictions
                topk = torch.topk(scores, min(max_k, n_items)).indices
                topk_indices_list.append(topk.cpu())
                # Target items for this user
                relevant_indices_list.append(target_items[i])
    if not topk_indices_list:
        return {}
    topk_tensor = torch.stack(topk_indices_list, dim=0)
    metrics = aggregate_retrieval_metrics(topk_tensor, relevant_indices_list, ks)
    return metrics
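To make the evaluation steps concrete, here is a minimal sketch of the same idea for a single toy user, written in plain Python rather than PyTorch so it runs on its own. The helper names `masked_top_k` and `recall_and_ndcg` are hypothetical and not part of rec, and the NDCG here uses the common binary-relevance log2 discount, which may differ in detail from the `dcg_at_k` and `idcg_at_k` helpers used above.

```python
import math

def masked_top_k(scores, seen_items, k, pad_index=0):
    """Return the indices of the k highest scores, skipping seen items and padding."""
    blocked = set(seen_items) | {pad_index}
    candidates = [i for i in range(len(scores)) if i not in blocked]
    # Sort by score descending; ties are broken by the lower index.
    candidates.sort(key=lambda i: (-scores[i], i))
    return candidates[:k]

def recall_and_ndcg(top_k, relevant):
    """Binary-relevance recall@k and NDCG@k for one user (assumed log2 discount)."""
    relevant = set(relevant)
    hits = [1 if item in relevant else 0 for item in top_k]
    recall = sum(hits) / len(relevant) if relevant else 0.0
    dcg = sum(h / math.log2(rank + 2) for rank, h in enumerate(hits))
    # Ideal ranking places every relevant item (up to k of them) at the top.
    ideal_hits = min(len(relevant), len(top_k))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return recall, ndcg

# Toy user: index 0 is padding, items 1 and 2 were already seen in training.
scores = [9.0, 5.0, 4.0, 0.5, 3.0, 2.0]
top_3 = masked_top_k(scores, seen_items=[1, 2], k=3)
recall, ndcg = recall_and_ndcg(top_3, relevant=[5, 3])
print(top_3, round(recall, 3), round(ndcg, 3))  # [4, 5, 3] 1.0 0.693
```

Even though padding and the two seen items have the highest raw scores, they never reach the ranking. Both validation targets are retrieved, so recall is 1.0, but NDCG is below 1 because they land at ranks 2 and 3 instead of 1 and 2.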
num_epochs = 5
batch_size = 32
lr = 5e-4
optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
for epoch in range(num_epochs):
    model.train()
    train_losses = []
    # Training loop
    pbar = tqdm(train_loader, desc=f"Epoch {epoch+1}/{num_epochs}")
    for batch in pbar:
        optimizer.zero_grad()
        loss = model.compute_loss(batch)
        loss.backward()
        optimizer.step()
        train_losses.append(loss.item())
        pbar.set_postfix(loss=f"{loss.item():.2f}")
    avg_train_loss = sum(train_losses) / len(train_losses)
    # Retrieval metrics (with progress bar)
    metrics = evaluate_gptrec(
        model, val_dl, train_user_item_map,
        n_items=n_items, ks=[5, 10, 20],
    )
    # Summary line
    print(f"\nEpoch {epoch+1}: train_loss={avg_train_loss:.4f} | R@10={metrics.get('recall@10', 0):.4f}",
          f"| DCG@10={metrics.get('dcg@10', 0):.4f} | NDCG@10={metrics.get('ndcg@10', 0):.4f}")

Epoch 1/5: 100%|██████████| 189/189 [05:07<00:00, 1.63s/it, loss=13.67]
100%|██████████| 23/23 [00:08<00:00, 2.71it/s]
Epoch 1: train_loss=19.5774 | R@10=0.0075 | DCG@10=0.0666 | NDCG@10=0.0156
loss=11.34]Epoch 2/5: 42%|████▏ | 80/189 [02:10<02:45, 1.52s/it, loss=11.21]Epoch 2/5: 43%|████▎ | 81/189 [02:10<02:44, 1.53s/it, loss=11.21]Epoch 2/5: 43%|████▎ | 81/189 [02:12<02:44, 1.53s/it, loss=11.26]Epoch 2/5: 43%|████▎ | 82/189 [02:12<02:47, 1.57s/it, loss=11.26]Epoch 2/5: 43%|████▎ | 82/189 [02:14<02:47, 1.57s/it, loss=11.34]Epoch 2/5: 44%|████▍ | 83/189 [02:14<02:51, 1.62s/it, loss=11.34]Epoch 2/5: 44%|████▍ | 83/189 [02:15<02:51, 1.62s/it, loss=11.28]Epoch 2/5: 44%|████▍ | 84/189 [02:15<02:48, 1.60s/it, loss=11.28]Epoch 2/5: 44%|████▍ | 84/189 [02:17<02:48, 1.60s/it, loss=11.10]Epoch 2/5: 45%|████▍ | 85/189 [02:17<02:46, 1.61s/it, loss=11.10]Epoch 2/5: 45%|████▍ | 85/189 [02:18<02:46, 1.61s/it, loss=10.87]Epoch 2/5: 46%|████▌ | 86/189 [02:18<02:41, 1.57s/it, loss=10.87]Epoch 2/5: 46%|████▌ | 86/189 [02:20<02:41, 1.57s/it, loss=11.29]Epoch 2/5: 46%|████▌ | 87/189 [02:20<02:37, 1.54s/it, loss=11.29]Epoch 2/5: 46%|████▌ | 87/189 [02:22<02:37, 1.54s/it, loss=11.03]Epoch 2/5: 47%|████▋ | 88/189 [02:22<02:38, 1.57s/it, loss=11.03]Epoch 2/5: 47%|████▋ | 88/189 [02:23<02:38, 1.57s/it, loss=11.04]Epoch 2/5: 47%|████▋ | 89/189 [02:23<02:35, 1.56s/it, loss=11.04]Epoch 2/5: 47%|████▋ | 89/189 [02:25<02:35, 1.56s/it, loss=11.05]Epoch 2/5: 48%|████▊ | 90/189 [02:25<02:36, 1.58s/it, loss=11.05]Epoch 2/5: 48%|████▊ | 90/189 [02:26<02:36, 1.58s/it, loss=11.04]Epoch 2/5: 48%|████▊ | 91/189 [02:26<02:30, 1.54s/it, loss=11.04]Epoch 2/5: 48%|████▊ | 91/189 [02:28<02:30, 1.54s/it, loss=11.02]Epoch 2/5: 49%|████▊ | 92/189 [02:28<02:27, 1.52s/it, loss=11.02]Epoch 2/5: 49%|████▊ | 92/189 [02:29<02:27, 1.52s/it, loss=10.98]Epoch 2/5: 49%|████▉ | 93/189 [02:29<02:28, 1.55s/it, loss=10.98]Epoch 2/5: 49%|████▉ | 93/189 [02:31<02:28, 1.55s/it, loss=11.15]Epoch 2/5: 50%|████▉ | 94/189 [02:31<02:26, 1.54s/it, loss=11.15]Epoch 2/5: 50%|████▉ | 94/189 [02:32<02:26, 1.54s/it, loss=11.03]Epoch 2/5: 50%|█████ | 95/189 [02:32<02:18, 1.47s/it, loss=11.03]Epoch 2/5: 50%|█████ | 95/189 
[02:34<02:18, 1.47s/it, loss=10.97]Epoch 2/5: 51%|█████ | 96/189 [02:34<02:16, 1.47s/it, loss=10.97]Epoch 2/5: 51%|█████ | 96/189 [02:35<02:16, 1.47s/it, loss=10.85]Epoch 2/5: 51%|█████▏ | 97/189 [02:35<02:22, 1.55s/it, loss=10.85]Epoch 2/5: 51%|█████▏ | 97/189 [02:37<02:22, 1.55s/it, loss=10.86]Epoch 2/5: 52%|█████▏ | 98/189 [02:37<02:28, 1.63s/it, loss=10.86]Epoch 2/5: 52%|█████▏ | 98/189 [02:39<02:28, 1.63s/it, loss=10.85]Epoch 2/5: 52%|█████▏ | 99/189 [02:39<02:29, 1.67s/it, loss=10.85]Epoch 2/5: 52%|█████▏ | 99/189 [02:41<02:29, 1.67s/it, loss=10.84]Epoch 2/5: 53%|█████▎ | 100/189 [02:41<02:28, 1.67s/it, loss=10.84]Epoch 2/5: 53%|█████▎ | 100/189 [02:42<02:28, 1.67s/it, loss=10.78]Epoch 2/5: 53%|█████▎ | 101/189 [02:42<02:28, 1.68s/it, loss=10.78]Epoch 2/5: 53%|█████▎ | 101/189 [02:44<02:28, 1.68s/it, loss=10.84]Epoch 2/5: 54%|█████▍ | 102/189 [02:44<02:21, 1.63s/it, loss=10.84]Epoch 2/5: 54%|█████▍ | 102/189 [02:45<02:21, 1.63s/it, loss=10.75]Epoch 2/5: 54%|█████▍ | 103/189 [02:45<02:17, 1.59s/it, loss=10.75]Epoch 2/5: 54%|█████▍ | 103/189 [02:47<02:17, 1.59s/it, loss=10.67]Epoch 2/5: 55%|█████▌ | 104/189 [02:47<02:15, 1.60s/it, loss=10.67]Epoch 2/5: 55%|█████▌ | 104/189 [02:48<02:15, 1.60s/it, loss=10.66]Epoch 2/5: 56%|█████▌ | 105/189 [02:48<02:14, 1.60s/it, loss=10.66]Epoch 2/5: 56%|█████▌ | 105/189 [02:50<02:14, 1.60s/it, loss=10.69]Epoch 2/5: 56%|█████▌ | 106/189 [02:50<02:12, 1.60s/it, loss=10.69]Epoch 2/5: 56%|█████▌ | 106/189 [02:52<02:12, 1.60s/it, loss=10.86]Epoch 2/5: 57%|█████▋ | 107/189 [02:52<02:13, 1.62s/it, loss=10.86]Epoch 2/5: 57%|█████▋ | 107/189 [02:53<02:13, 1.62s/it, loss=10.67]Epoch 2/5: 57%|█████▋ | 108/189 [02:53<02:10, 1.62s/it, loss=10.67]Epoch 2/5: 57%|█████▋ | 108/189 [02:55<02:10, 1.62s/it, loss=10.60]Epoch 2/5: 58%|█████▊ | 109/189 [02:55<02:08, 1.61s/it, loss=10.60]Epoch 2/5: 58%|█████▊ | 109/189 [02:57<02:08, 1.61s/it, loss=10.68]Epoch 2/5: 58%|█████▊ | 110/189 [02:57<02:10, 1.66s/it, loss=10.68]Epoch 2/5: 58%|█████▊ | 110/189 
[02:58<02:10, 1.66s/it, loss=10.80]Epoch 2/5: 59%|█████▊ | 111/189 [02:58<02:05, 1.61s/it, loss=10.80]Epoch 2/5: 59%|█████▊ | 111/189 [03:00<02:05, 1.61s/it, loss=10.63]Epoch 2/5: 59%|█████▉ | 112/189 [03:00<02:02, 1.59s/it, loss=10.63]Epoch 2/5: 59%|█████▉ | 112/189 [03:01<02:02, 1.59s/it, loss=10.75]Epoch 2/5: 60%|█████▉ | 113/189 [03:01<02:00, 1.58s/it, loss=10.75]Epoch 2/5: 60%|█████▉ | 113/189 [03:03<02:00, 1.58s/it, loss=10.67]Epoch 2/5: 60%|██████ | 114/189 [03:03<02:00, 1.60s/it, loss=10.67]Epoch 2/5: 60%|██████ | 114/189 [03:05<02:00, 1.60s/it, loss=10.79]Epoch 2/5: 61%|██████ | 115/189 [03:05<01:58, 1.60s/it, loss=10.79]Epoch 2/5: 61%|██████ | 115/189 [03:06<01:58, 1.60s/it, loss=10.52]Epoch 2/5: 61%|██████▏ | 116/189 [03:06<01:53, 1.55s/it, loss=10.52]Epoch 2/5: 61%|██████▏ | 116/189 [03:08<01:53, 1.55s/it, loss=10.52]Epoch 2/5: 62%|██████▏ | 117/189 [03:08<01:52, 1.57s/it, loss=10.52]Epoch 2/5: 62%|██████▏ | 117/189 [03:09<01:52, 1.57s/it, loss=10.55]Epoch 2/5: 62%|██████▏ | 118/189 [03:09<01:53, 1.59s/it, loss=10.55]Epoch 2/5: 62%|██████▏ | 118/189 [03:11<01:53, 1.59s/it, loss=10.54]Epoch 2/5: 63%|██████▎ | 119/189 [03:11<01:51, 1.59s/it, loss=10.54]Epoch 2/5: 63%|██████▎ | 119/189 [03:12<01:51, 1.59s/it, loss=10.55]Epoch 2/5: 63%|██████▎ | 120/189 [03:12<01:49, 1.58s/it, loss=10.55]Epoch 2/5: 63%|██████▎ | 120/189 [03:14<01:49, 1.58s/it, loss=10.53]Epoch 2/5: 64%|██████▍ | 121/189 [03:14<01:48, 1.60s/it, loss=10.53]Epoch 2/5: 64%|██████▍ | 121/189 [03:16<01:48, 1.60s/it, loss=10.71]Epoch 2/5: 65%|██████▍ | 122/189 [03:16<01:47, 1.61s/it, loss=10.71]Epoch 2/5: 65%|██████▍ | 122/189 [03:17<01:47, 1.61s/it, loss=10.54]Epoch 2/5: 65%|██████▌ | 123/189 [03:17<01:45, 1.60s/it, loss=10.54]Epoch 2/5: 65%|██████▌ | 123/189 [03:19<01:45, 1.60s/it, loss=10.54]Epoch 2/5: 66%|██████▌ | 124/189 [03:19<01:43, 1.60s/it, loss=10.54]Epoch 2/5: 66%|██████▌ | 124/189 [03:20<01:43, 1.60s/it, loss=10.51]Epoch 2/5: 66%|██████▌ | 125/189 [03:20<01:41, 1.59s/it, 
loss=10.51]Epoch 2/5: 66%|██████▌ | 125/189 [03:22<01:41, 1.59s/it, loss=10.56]Epoch 2/5: 67%|██████▋ | 126/189 [03:22<01:41, 1.60s/it, loss=10.56]Epoch 2/5: 67%|██████▋ | 126/189 [03:24<01:41, 1.60s/it, loss=10.40]Epoch 2/5: 67%|██████▋ | 127/189 [03:24<01:40, 1.62s/it, loss=10.40]Epoch 2/5: 67%|██████▋ | 127/189 [03:25<01:40, 1.62s/it, loss=10.46]Epoch 2/5: 68%|██████▊ | 128/189 [03:25<01:40, 1.64s/it, loss=10.46]Epoch 2/5: 68%|██████▊ | 128/189 [03:27<01:40, 1.64s/it, loss=10.54]Epoch 2/5: 68%|██████▊ | 129/189 [03:27<01:36, 1.62s/it, loss=10.54]Epoch 2/5: 68%|██████▊ | 129/189 [03:29<01:36, 1.62s/it, loss=10.52]Epoch 2/5: 69%|██████▉ | 130/189 [03:29<01:35, 1.63s/it, loss=10.52]Epoch 2/5: 69%|██████▉ | 130/189 [03:30<01:35, 1.63s/it, loss=10.44]Epoch 2/5: 69%|██████▉ | 131/189 [03:30<01:35, 1.64s/it, loss=10.44]Epoch 2/5: 69%|██████▉ | 131/189 [03:32<01:35, 1.64s/it, loss=10.26]Epoch 2/5: 70%|██████▉ | 132/189 [03:32<01:31, 1.61s/it, loss=10.26]Epoch 2/5: 70%|██████▉ | 132/189 [03:33<01:31, 1.61s/it, loss=10.31]Epoch 2/5: 70%|███████ | 133/189 [03:33<01:29, 1.61s/it, loss=10.31]Epoch 2/5: 70%|███████ | 133/189 [03:35<01:29, 1.61s/it, loss=10.32]Epoch 2/5: 71%|███████ | 134/189 [03:35<01:27, 1.60s/it, loss=10.32]Epoch 2/5: 71%|███████ | 134/189 [03:37<01:27, 1.60s/it, loss=10.37]Epoch 2/5: 71%|███████▏ | 135/189 [03:37<01:25, 1.58s/it, loss=10.37]Epoch 2/5: 71%|███████▏ | 135/189 [03:38<01:25, 1.58s/it, loss=10.20]Epoch 2/5: 72%|███████▏ | 136/189 [03:38<01:20, 1.52s/it, loss=10.20]Epoch 2/5: 72%|███████▏ | 136/189 [03:39<01:20, 1.52s/it, loss=10.31]Epoch 2/5: 72%|███████▏ | 137/189 [03:39<01:20, 1.54s/it, loss=10.31]Epoch 2/5: 72%|███████▏ | 137/189 [03:41<01:20, 1.54s/it, loss=10.19]Epoch 2/5: 73%|███████▎ | 138/189 [03:41<01:20, 1.57s/it, loss=10.19]Epoch 2/5: 73%|███████▎ | 138/189 [03:43<01:20, 1.57s/it, loss=10.21]Epoch 2/5: 74%|███████▎ | 139/189 [03:43<01:16, 1.52s/it, loss=10.21]Epoch 2/5: 74%|███████▎ | 139/189 [03:44<01:16, 1.52s/it, loss=10.06]Epoch 
2/5: 74%|███████▍ | 140/189 [03:44<01:17, 1.58s/it, loss=10.06]Epoch 2/5: 74%|███████▍ | 140/189 [03:46<01:17, 1.58s/it, loss=10.29]Epoch 2/5: 75%|███████▍ | 141/189 [03:46<01:14, 1.55s/it, loss=10.29]Epoch 2/5: 75%|███████▍ | 141/189 [03:47<01:14, 1.55s/it, loss=10.24]Epoch 2/5: 75%|███████▌ | 142/189 [03:47<01:12, 1.55s/it, loss=10.24]Epoch 2/5: 75%|███████▌ | 142/189 [03:49<01:12, 1.55s/it, loss=10.08]Epoch 2/5: 76%|███████▌ | 143/189 [03:49<01:11, 1.56s/it, loss=10.08]Epoch 2/5: 76%|███████▌ | 143/189 [03:50<01:11, 1.56s/it, loss=10.15]Epoch 2/5: 76%|███████▌ | 144/189 [03:50<01:09, 1.55s/it, loss=10.15]Epoch 2/5: 76%|███████▌ | 144/189 [03:52<01:09, 1.55s/it, loss=10.10]Epoch 2/5: 77%|███████▋ | 145/189 [03:52<01:06, 1.51s/it, loss=10.10]Epoch 2/5: 77%|███████▋ | 145/189 [03:53<01:06, 1.51s/it, loss=10.13]Epoch 2/5: 77%|███████▋ | 146/189 [03:53<01:07, 1.56s/it, loss=10.13]Epoch 2/5: 77%|███████▋ | 146/189 [03:55<01:07, 1.56s/it, loss=10.02]Epoch 2/5: 78%|███████▊ | 147/189 [03:55<01:05, 1.55s/it, loss=10.02]Epoch 2/5: 78%|███████▊ | 147/189 [03:57<01:05, 1.55s/it, loss=10.05]Epoch 2/5: 78%|███████▊ | 148/189 [03:57<01:03, 1.56s/it, loss=10.05]Epoch 2/5: 78%|███████▊ | 148/189 [03:58<01:03, 1.56s/it, loss=9.91] Epoch 2/5: 79%|███████▉ | 149/189 [03:58<01:01, 1.54s/it, loss=9.91]Epoch 2/5: 79%|███████▉ | 149/189 [04:00<01:01, 1.54s/it, loss=10.09]Epoch 2/5: 79%|███████▉ | 150/189 [04:00<01:01, 1.57s/it, loss=10.09]Epoch 2/5: 79%|███████▉ | 150/189 [04:01<01:01, 1.57s/it, loss=10.13]Epoch 2/5: 80%|███████▉ | 151/189 [04:01<00:59, 1.56s/it, loss=10.13]Epoch 2/5: 80%|███████▉ | 151/189 [04:03<00:59, 1.56s/it, loss=10.09]Epoch 2/5: 80%|████████ | 152/189 [04:03<00:58, 1.59s/it, loss=10.09]Epoch 2/5: 80%|████████ | 152/189 [04:05<00:58, 1.59s/it, loss=10.00]Epoch 2/5: 81%|████████ | 153/189 [04:05<00:58, 1.63s/it, loss=10.00]Epoch 2/5: 81%|████████ | 153/189 [04:06<00:58, 1.63s/it, loss=10.14]Epoch 2/5: 81%|████████▏ | 154/189 [04:06<00:56, 1.62s/it, 
loss=10.14]Epoch 2/5: 81%|████████▏ | 154/189 [04:08<00:56, 1.62s/it, loss=10.18]Epoch 2/5: 82%|████████▏ | 155/189 [04:08<00:55, 1.64s/it, loss=10.18]Epoch 2/5: 82%|████████▏ | 155/189 [04:09<00:55, 1.64s/it, loss=10.18]Epoch 2/5: 83%|████████▎ | 156/189 [04:09<00:52, 1.60s/it, loss=10.18]Epoch 2/5: 83%|████████▎ | 156/189 [04:11<00:52, 1.60s/it, loss=9.97] Epoch 2/5: 83%|████████▎ | 157/189 [04:11<00:51, 1.61s/it, loss=9.97]Epoch 2/5: 83%|████████▎ | 157/189 [04:13<00:51, 1.61s/it, loss=9.97]Epoch 2/5: 84%|████████▎ | 158/189 [04:13<00:49, 1.60s/it, loss=9.97]Epoch 2/5: 84%|████████▎ | 158/189 [04:14<00:49, 1.60s/it, loss=9.90]Epoch 2/5: 84%|████████▍ | 159/189 [04:14<00:47, 1.58s/it, loss=9.90]Epoch 2/5: 84%|████████▍ | 159/189 [04:16<00:47, 1.58s/it, loss=10.04]Epoch 2/5: 85%|████████▍ | 160/189 [04:16<00:46, 1.60s/it, loss=10.04]Epoch 2/5: 85%|████████▍ | 160/189 [04:17<00:46, 1.60s/it, loss=10.13]Epoch 2/5: 85%|████████▌ | 161/189 [04:17<00:44, 1.60s/it, loss=10.13]Epoch 2/5: 85%|████████▌ | 161/189 [04:19<00:44, 1.60s/it, loss=9.84] Epoch 2/5: 86%|████████▌ | 162/189 [04:19<00:44, 1.66s/it, loss=9.84]Epoch 2/5: 86%|████████▌ | 162/189 [04:21<00:44, 1.66s/it, loss=10.01]Epoch 2/5: 86%|████████▌ | 163/189 [04:21<00:43, 1.66s/it, loss=10.01]Epoch 2/5: 86%|████████▌ | 163/189 [04:22<00:43, 1.66s/it, loss=9.88] Epoch 2/5: 87%|████████▋ | 164/189 [04:22<00:41, 1.64s/it, loss=9.88]Epoch 2/5: 87%|████████▋ | 164/189 [04:24<00:41, 1.64s/it, loss=9.91]Epoch 2/5: 87%|████████▋ | 165/189 [04:24<00:37, 1.55s/it, loss=9.91]Epoch 2/5: 87%|████████▋ | 165/189 [04:25<00:37, 1.55s/it, loss=9.69]Epoch 2/5: 88%|████████▊ | 166/189 [04:26<00:36, 1.59s/it, loss=9.69]Epoch 2/5: 88%|████████▊ | 166/189 [04:27<00:36, 1.59s/it, loss=9.80]Epoch 2/5: 88%|████████▊ | 167/189 [04:27<00:35, 1.63s/it, loss=9.80]Epoch 2/5: 88%|████████▊ | 167/189 [04:29<00:35, 1.63s/it, loss=9.80]Epoch 2/5: 89%|████████▉ | 168/189 [04:29<00:35, 1.67s/it, loss=9.80]Epoch 2/5: 89%|████████▉ | 168/189 
[04:31<00:35, 1.67s/it, loss=9.77]Epoch 2/5: 89%|████████▉ | 169/189 [04:31<00:33, 1.67s/it, loss=9.77]Epoch 2/5: 89%|████████▉ | 169/189 [04:32<00:33, 1.67s/it, loss=9.89]Epoch 2/5: 90%|████████▉ | 170/189 [04:32<00:29, 1.57s/it, loss=9.89]Epoch 2/5: 90%|████████▉ | 170/189 [04:34<00:29, 1.57s/it, loss=9.91]Epoch 2/5: 90%|█████████ | 171/189 [04:34<00:28, 1.56s/it, loss=9.91]Epoch 2/5: 90%|█████████ | 171/189 [04:35<00:28, 1.56s/it, loss=9.86]Epoch 2/5: 91%|█████████ | 172/189 [04:35<00:27, 1.62s/it, loss=9.86]Epoch 2/5: 91%|█████████ | 172/189 [04:37<00:27, 1.62s/it, loss=9.86]Epoch 2/5: 92%|█████████▏| 173/189 [04:37<00:25, 1.58s/it, loss=9.86]Epoch 2/5: 92%|█████████▏| 173/189 [04:38<00:25, 1.58s/it, loss=9.79]Epoch 2/5: 92%|█████████▏| 174/189 [04:38<00:24, 1.61s/it, loss=9.79]Epoch 2/5: 92%|█████████▏| 174/189 [04:40<00:24, 1.61s/it, loss=9.75]Epoch 2/5: 93%|█████████▎| 175/189 [04:40<00:22, 1.64s/it, loss=9.75]Epoch 2/5: 93%|█████████▎| 175/189 [04:42<00:22, 1.64s/it, loss=9.79]Epoch 2/5: 93%|█████████▎| 176/189 [04:42<00:20, 1.60s/it, loss=9.79]Epoch 2/5: 93%|█████████▎| 176/189 [04:43<00:20, 1.60s/it, loss=9.85]Epoch 2/5: 94%|█████████▎| 177/189 [04:43<00:19, 1.61s/it, loss=9.85]Epoch 2/5: 94%|█████████▎| 177/189 [04:45<00:19, 1.61s/it, loss=9.70]Epoch 2/5: 94%|█████████▍| 178/189 [04:45<00:16, 1.54s/it, loss=9.70]Epoch 2/5: 94%|█████████▍| 178/189 [04:46<00:16, 1.54s/it, loss=9.86]Epoch 2/5: 95%|█████████▍| 179/189 [04:46<00:16, 1.60s/it, loss=9.86]Epoch 2/5: 95%|█████████▍| 179/189 [04:48<00:16, 1.60s/it, loss=9.78]Epoch 2/5: 95%|█████████▌| 180/189 [04:48<00:14, 1.64s/it, loss=9.78]Epoch 2/5: 95%|█████████▌| 180/189 [04:50<00:14, 1.64s/it, loss=9.79]Epoch 2/5: 96%|█████████▌| 181/189 [04:50<00:13, 1.64s/it, loss=9.79]Epoch 2/5: 96%|█████████▌| 181/189 [04:51<00:13, 1.64s/it, loss=9.75]Epoch 2/5: 96%|█████████▋| 182/189 [04:51<00:11, 1.58s/it, loss=9.75]Epoch 2/5: 96%|█████████▋| 182/189 [04:53<00:11, 1.58s/it, loss=9.74]Epoch 2/5: 97%|█████████▋| 
183/189 [04:53<00:09, 1.52s/it, loss=9.74]Epoch 2/5: 97%|█████████▋| 183/189 [04:54<00:09, 1.52s/it, loss=9.68]Epoch 2/5: 97%|█████████▋| 184/189 [04:54<00:07, 1.58s/it, loss=9.68]Epoch 2/5: 97%|█████████▋| 184/189 [04:56<00:07, 1.58s/it, loss=9.75]Epoch 2/5: 98%|█████████▊| 185/189 [04:56<00:06, 1.63s/it, loss=9.75]Epoch 2/5: 98%|█████████▊| 185/189 [04:58<00:06, 1.63s/it, loss=9.68]Epoch 2/5: 98%|█████████▊| 186/189 [04:58<00:04, 1.66s/it, loss=9.68]Epoch 2/5: 98%|█████████▊| 186/189 [05:00<00:04, 1.66s/it, loss=9.66]Epoch 2/5: 99%|█████████▉| 187/189 [05:00<00:03, 1.68s/it, loss=9.66]Epoch 2/5: 99%|█████████▉| 187/189 [05:01<00:03, 1.68s/it, loss=9.63]Epoch 2/5: 99%|█████████▉| 188/189 [05:01<00:01, 1.63s/it, loss=9.63]Epoch 2/5: 99%|█████████▉| 188/189 [05:03<00:01, 1.63s/it, loss=9.46]Epoch 2/5: 100%|██████████| 189/189 [05:03<00:00, 1.61s/it, loss=9.46]Epoch 2/5: 100%|██████████| 189/189 [05:03<00:00, 1.60s/it, loss=9.46]
100%|██████████| 23/23 [00:07<00:00, 2.91it/s]
Epoch 2: train_loss=11.1752 | R@10=0.0129 | DCG@10=0.1115 | NDCG@10=0.0265
Epoch 3/5: 
75%|███████▌ | 142/189 [03:50<01:15, 1.62s/it, loss=8.59]Epoch 3/5: 75%|███████▌ | 142/189 [03:51<01:15, 1.62s/it, loss=8.58]Epoch 3/5: 76%|███████▌ | 143/189 [03:51<01:13, 1.59s/it, loss=8.58]Epoch 3/5: 76%|███████▌ | 143/189 [03:53<01:13, 1.59s/it, loss=8.39]Epoch 3/5: 76%|███████▌ | 144/189 [03:53<01:07, 1.51s/it, loss=8.39]Epoch 3/5: 76%|███████▌ | 144/189 [03:54<01:07, 1.51s/it, loss=8.46]Epoch 3/5: 77%|███████▋ | 145/189 [03:54<01:04, 1.47s/it, loss=8.46]Epoch 3/5: 77%|███████▋ | 145/189 [03:56<01:04, 1.47s/it, loss=8.59]Epoch 3/5: 77%|███████▋ | 146/189 [03:56<01:05, 1.52s/it, loss=8.59]Epoch 3/5: 77%|███████▋ | 146/189 [03:57<01:05, 1.52s/it, loss=8.44]Epoch 3/5: 78%|███████▊ | 147/189 [03:57<01:05, 1.56s/it, loss=8.44]Epoch 3/5: 78%|███████▊ | 147/189 [03:59<01:05, 1.56s/it, loss=8.39]Epoch 3/5: 78%|███████▊ | 148/189 [03:59<01:06, 1.63s/it, loss=8.39]Epoch 3/5: 78%|███████▊ | 148/189 [04:01<01:06, 1.63s/it, loss=8.53]Epoch 3/5: 79%|███████▉ | 149/189 [04:01<01:05, 1.65s/it, loss=8.53]Epoch 3/5: 79%|███████▉ | 149/189 [04:02<01:05, 1.65s/it, loss=8.46]Epoch 3/5: 79%|███████▉ | 150/189 [04:02<01:03, 1.63s/it, loss=8.46]Epoch 3/5: 79%|███████▉ | 150/189 [04:04<01:03, 1.63s/it, loss=8.47]Epoch 3/5: 80%|███████▉ | 151/189 [04:04<01:00, 1.60s/it, loss=8.47]Epoch 3/5: 80%|███████▉ | 151/189 [04:06<01:00, 1.60s/it, loss=8.41]Epoch 3/5: 80%|████████ | 152/189 [04:06<01:00, 1.63s/it, loss=8.41]Epoch 3/5: 80%|████████ | 152/189 [04:07<01:00, 1.63s/it, loss=8.49]Epoch 3/5: 81%|████████ | 153/189 [04:07<00:58, 1.64s/it, loss=8.49]Epoch 3/5: 81%|████████ | 153/189 [04:09<00:58, 1.64s/it, loss=8.54]Epoch 3/5: 81%|████████▏ | 154/189 [04:09<00:56, 1.61s/it, loss=8.54]Epoch 3/5: 81%|████████▏ | 154/189 [04:10<00:56, 1.61s/it, loss=8.40]Epoch 3/5: 82%|████████▏ | 155/189 [04:10<00:55, 1.63s/it, loss=8.40]Epoch 3/5: 82%|████████▏ | 155/189 [04:12<00:55, 1.63s/it, loss=8.48]Epoch 3/5: 83%|████████▎ | 156/189 [04:12<00:52, 1.61s/it, loss=8.48]Epoch 3/5: 83%|████████▎ | 
156/189 [04:14<00:52, 1.61s/it, loss=8.55]Epoch 3/5: 83%|████████▎ | 157/189 [04:14<00:51, 1.59s/it, loss=8.55]Epoch 3/5: 83%|████████▎ | 157/189 [04:15<00:51, 1.59s/it, loss=8.48]Epoch 3/5: 84%|████████▎ | 158/189 [04:15<00:48, 1.57s/it, loss=8.48]Epoch 3/5: 84%|████████▎ | 158/189 [04:17<00:48, 1.57s/it, loss=8.35]Epoch 3/5: 84%|████████▍ | 159/189 [04:17<00:47, 1.59s/it, loss=8.35]Epoch 3/5: 84%|████████▍ | 159/189 [04:18<00:47, 1.59s/it, loss=8.38]Epoch 3/5: 85%|████████▍ | 160/189 [04:18<00:46, 1.60s/it, loss=8.38]Epoch 3/5: 85%|████████▍ | 160/189 [04:20<00:46, 1.60s/it, loss=8.43]Epoch 3/5: 85%|████████▌ | 161/189 [04:20<00:44, 1.61s/it, loss=8.43]Epoch 3/5: 85%|████████▌ | 161/189 [04:22<00:44, 1.61s/it, loss=8.36]Epoch 3/5: 86%|████████▌ | 162/189 [04:22<00:42, 1.59s/it, loss=8.36]Epoch 3/5: 86%|████████▌ | 162/189 [04:23<00:42, 1.59s/it, loss=8.53]Epoch 3/5: 86%|████████▌ | 163/189 [04:23<00:40, 1.57s/it, loss=8.53]Epoch 3/5: 86%|████████▌ | 163/189 [04:25<00:40, 1.57s/it, loss=8.36]Epoch 3/5: 87%|████████▋ | 164/189 [04:25<00:39, 1.58s/it, loss=8.36]Epoch 3/5: 87%|████████▋ | 164/189 [04:26<00:39, 1.58s/it, loss=8.48]Epoch 3/5: 87%|████████▋ | 165/189 [04:26<00:38, 1.61s/it, loss=8.48]Epoch 3/5: 87%|████████▋ | 165/189 [04:28<00:38, 1.61s/it, loss=8.53]Epoch 3/5: 88%|████████▊ | 166/189 [04:28<00:37, 1.63s/it, loss=8.53]Epoch 3/5: 88%|████████▊ | 166/189 [04:30<00:37, 1.63s/it, loss=8.50]Epoch 3/5: 88%|████████▊ | 167/189 [04:30<00:36, 1.65s/it, loss=8.50]Epoch 3/5: 88%|████████▊ | 167/189 [04:31<00:36, 1.65s/it, loss=8.52]Epoch 3/5: 89%|████████▉ | 168/189 [04:31<00:34, 1.64s/it, loss=8.52]Epoch 3/5: 89%|████████▉ | 168/189 [04:33<00:34, 1.64s/it, loss=8.48]Epoch 3/5: 89%|████████▉ | 169/189 [04:33<00:32, 1.62s/it, loss=8.48]Epoch 3/5: 89%|████████▉ | 169/189 [04:35<00:32, 1.62s/it, loss=8.49]Epoch 3/5: 90%|████████▉ | 170/189 [04:35<00:30, 1.63s/it, loss=8.49]Epoch 3/5: 90%|████████▉ | 170/189 [04:36<00:30, 1.63s/it, loss=8.54]Epoch 3/5: 90%|█████████ 
| 171/189 [04:36<00:28, 1.60s/it, loss=8.54]Epoch 3/5: 90%|█████████ | 171/189 [04:38<00:28, 1.60s/it, loss=8.48]Epoch 3/5: 91%|█████████ | 172/189 [04:38<00:27, 1.61s/it, loss=8.48]Epoch 3/5: 91%|█████████ | 172/189 [04:39<00:27, 1.61s/it, loss=8.46]Epoch 3/5: 92%|█████████▏| 173/189 [04:39<00:25, 1.60s/it, loss=8.46]Epoch 3/5: 92%|█████████▏| 173/189 [04:41<00:25, 1.60s/it, loss=8.47]Epoch 3/5: 92%|█████████▏| 174/189 [04:41<00:23, 1.60s/it, loss=8.47]Epoch 3/5: 92%|█████████▏| 174/189 [04:42<00:23, 1.60s/it, loss=8.33]Epoch 3/5: 93%|█████████▎| 175/189 [04:42<00:22, 1.58s/it, loss=8.33]Epoch 3/5: 93%|█████████▎| 175/189 [04:44<00:22, 1.58s/it, loss=8.39]Epoch 3/5: 93%|█████████▎| 176/189 [04:44<00:20, 1.57s/it, loss=8.39]Epoch 3/5: 93%|█████████▎| 176/189 [04:45<00:20, 1.57s/it, loss=8.47]Epoch 3/5: 94%|█████████▎| 177/189 [04:45<00:18, 1.56s/it, loss=8.47]Epoch 3/5: 94%|█████████▎| 177/189 [04:47<00:18, 1.56s/it, loss=8.23]Epoch 3/5: 94%|█████████▍| 178/189 [04:47<00:17, 1.61s/it, loss=8.23]Epoch 3/5: 94%|█████████▍| 178/189 [04:49<00:17, 1.61s/it, loss=8.32]Epoch 3/5: 95%|█████████▍| 179/189 [04:49<00:16, 1.63s/it, loss=8.32]Epoch 3/5: 95%|█████████▍| 179/189 [04:50<00:16, 1.63s/it, loss=8.36]Epoch 3/5: 95%|█████████▌| 180/189 [04:50<00:14, 1.61s/it, loss=8.36]Epoch 3/5: 95%|█████████▌| 180/189 [04:52<00:14, 1.61s/it, loss=8.42]Epoch 3/5: 96%|█████████▌| 181/189 [04:52<00:12, 1.59s/it, loss=8.42]Epoch 3/5: 96%|█████████▌| 181/189 [04:54<00:12, 1.59s/it, loss=8.34]Epoch 3/5: 96%|█████████▋| 182/189 [04:54<00:11, 1.58s/it, loss=8.34]Epoch 3/5: 96%|█████████▋| 182/189 [04:55<00:11, 1.58s/it, loss=8.30]Epoch 3/5: 97%|█████████▋| 183/189 [04:55<00:09, 1.61s/it, loss=8.30]Epoch 3/5: 97%|█████████▋| 183/189 [04:57<00:09, 1.61s/it, loss=8.43]Epoch 3/5: 97%|█████████▋| 184/189 [04:57<00:07, 1.59s/it, loss=8.43]Epoch 3/5: 97%|█████████▋| 184/189 [04:58<00:07, 1.59s/it, loss=8.47]Epoch 3/5: 98%|█████████▊| 185/189 [04:58<00:06, 1.56s/it, loss=8.47]Epoch 3/5: 
98%|█████████▊| 185/189 [05:00<00:06, 1.56s/it, loss=8.35]Epoch 3/5: 98%|█████████▊| 186/189 [05:00<00:04, 1.50s/it, loss=8.35]Epoch 3/5: 98%|█████████▊| 186/189 [05:01<00:04, 1.50s/it, loss=8.30]Epoch 3/5: 99%|█████████▉| 187/189 [05:01<00:03, 1.57s/it, loss=8.30]Epoch 3/5: 99%|█████████▉| 187/189 [05:03<00:03, 1.57s/it, loss=8.30]Epoch 3/5: 99%|█████████▉| 188/189 [05:03<00:01, 1.59s/it, loss=8.30]Epoch 3/5: 99%|█████████▉| 188/189 [05:05<00:01, 1.59s/it, loss=8.23]Epoch 3/5: 100%|██████████| 189/189 [05:05<00:00, 1.57s/it, loss=8.23]Epoch 3/5: 100%|██████████| 189/189 [05:05<00:00, 1.61s/it, loss=8.23]
0%| | 0/23 [00:00<?, ?it/s] 4%|▍ | 1/23 [00:00<00:05, 3.91it/s] 9%|▊ | 2/23 [00:00<00:06, 3.04it/s] 13%|█▎ | 3/23 [00:00<00:06, 2.91it/s] 17%|█▋ | 4/23 [00:01<00:06, 3.04it/s] 22%|██▏ | 5/23 [00:01<00:06, 2.85it/s] 26%|██▌ | 6/23 [00:02<00:05, 2.97it/s] 30%|███ | 7/23 [00:02<00:05, 3.07it/s] 35%|███▍ | 8/23 [00:02<00:05, 2.98it/s] 39%|███▉ | 9/23 [00:03<00:04, 2.83it/s] 43%|████▎ | 10/23 [00:03<00:04, 2.87it/s] 48%|████▊ | 11/23 [00:03<00:04, 2.68it/s] 52%|█████▏ | 12/23 [00:04<00:04, 2.66it/s] 57%|█████▋ | 13/23 [00:04<00:03, 2.74it/s] 61%|██████ | 14/23 [00:04<00:03, 2.95it/s] 65%|██████▌ | 15/23 [00:05<00:02, 2.91it/s] 70%|██████▉ | 16/23 [00:05<00:02, 2.85it/s] 74%|███████▍ | 17/23 [00:05<00:02, 2.74it/s] 78%|███████▊ | 18/23 [00:06<00:01, 2.87it/s] 83%|████████▎ | 19/23 [00:06<00:01, 2.83it/s] 87%|████████▋ | 20/23 [00:06<00:01, 2.78it/s] 91%|█████████▏| 21/23 [00:07<00:00, 2.83it/s] 96%|█████████▌| 22/23 [00:07<00:00, 2.79it/s]100%|██████████| 23/23 [00:07<00:00, 2.96it/s]100%|██████████| 23/23 [00:07<00:00, 2.88it/s]
Epoch 3: train_loss=8.8681 | R@10=0.0193 | DCG@10=0.2030 | NDCG@10=0.0485
Epoch 4/5: 100%|██████████| 189/189 [05:05<00:00, 1.61s/it, loss=7.76]
0%| | 0/23 [00:00<?, ?it/s] 4%|▍ | 1/23 [00:00<00:07, 2.82it/s] 9%|▊ | 2/23 [00:00<00:07, 2.96it/s] 13%|█▎ | 3/23 [00:00<00:05, 3.48it/s] 17%|█▋ | 4/23 [00:01<00:06, 2.97it/s] 22%|██▏ | 5/23 [00:01<00:06, 2.89it/s] 26%|██▌ | 6/23 [00:02<00:05, 2.87it/s] 30%|███ | 7/23 [00:02<00:05, 2.90it/s] 35%|███▍ | 8/23 [00:02<00:05, 2.97it/s] 39%|███▉ | 9/23 [00:03<00:04, 3.02it/s] 43%|████▎ | 10/23 [00:03<00:04, 2.97it/s] 48%|████▊ | 11/23 [00:03<00:04, 2.92it/s] 52%|█████▏ | 12/23 [00:04<00:03, 3.05it/s] 57%|█████▋ | 13/23 [00:04<00:03, 2.98it/s] 61%|██████ | 14/23 [00:04<00:03, 2.97it/s] 65%|██████▌ | 15/23 [00:05<00:02, 2.88it/s] 70%|██████▉ | 16/23 [00:05<00:02, 2.88it/s] 74%|███████▍ | 17/23 [00:05<00:02, 2.81it/s] 78%|███████▊ | 18/23 [00:06<00:01, 2.89it/s] 83%|████████▎ | 19/23 [00:06<00:01, 2.94it/s] 87%|████████▋ | 20/23 [00:06<00:00, 3.06it/s] 91%|█████████▏| 21/23 [00:07<00:00, 3.05it/s] 96%|█████████▌| 22/23 [00:07<00:00, 2.95it/s]100%|██████████| 23/23 [00:07<00:00, 2.92it/s]100%|██████████| 23/23 [00:07<00:00, 2.95it/s]
Epoch 4: train_loss=8.0537 | R@10=0.0265 | DCG@10=0.2720 | NDCG@10=0.0660
Epoch 5: train_loss=7.7534 | R@10=0.0242 | DCG@10=0.2592 | NDCG@10=0.0612
{'recall@5': 0.01575394982720289,
'recall@10': 0.0241689173872425,
'recall@20': 0.04311617966492117,
'precision@5': 0.05770308123249291,
'precision@10': 0.05546218487394955,
'precision@20': 0.05042016806722694,
'dcg@5': 0.17425028015585506,
'dcg@10': 0.2591816378073866,
'dcg@20': 0.37284562258827253,
'ndcg@5': 0.06126104414152299,
'ndcg@10': 0.061165418642456525,
'ndcg@20': 0.06258607128889172,
'mrr': 0.13528584705055305}
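The model shows promising results: training loss is still falling (8.05 to 7.75 from epoch 4 to epoch 5), while the validation metrics look to be levelling off (R@10 went from 0.0265 to 0.0242 and NDCG@10 from 0.0660 to 0.0612 in the final epoch). I’d like to scale up with larger embedding dimensions and more training epochs, but I’m limited by compute in this environment.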
One challenge I’ve encountered is the difficulty of fairly evaluating recommendation models. There’s significant variation in how metrics are calculated across papers and implementations, making it hard to compare results directly. I plan to dig deeper into this topic in a future post.
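To make that variation concrete, here is a minimal sketch of two common ways to compute recall@k, alongside a binary-relevance NDCG@k. This is not the metric code used by rec; the function names and the toy data are assumptions for illustration only.

```python
import math

# Hypothetical helper names; this is NOT the rec framework's implementation.

def _hits(ranked, relevant, k):
    """Count how many of the top-k ranked items are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant)

def recall_at_k(ranked, relevant, k):
    """Recall@k with the full relevant set as the denominator."""
    return _hits(ranked, relevant, k) / len(relevant) if relevant else 0.0

def recall_at_k_capped(ranked, relevant, k):
    """Variant that divides by min(k, |relevant|), so a perfect top-k scores 1.0."""
    return _hits(ranked, relevant, k) / min(k, len(relevant)) if relevant else 0.0

def dcg_at_k(ranked, relevant, k):
    """Binary-relevance DCG: each hit at 0-based position p adds 1 / log2(p + 2)."""
    return sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )

def ndcg_at_k(ranked, relevant, k):
    """DCG divided by the DCG of an ideal ranking (all hits at the top)."""
    ideal = sum(1.0 / math.log2(pos + 2) for pos in range(min(k, len(relevant))))
    return dcg_at_k(ranked, relevant, k) / ideal if ideal > 0 else 0.0

# Toy example: one user, five recommendations, four held-out relevant items.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"b", "e", "x", "y"}

plain = recall_at_k(ranked, relevant, 2)
capped = recall_at_k_capped(ranked, relevant, 2)
ndcg = ndcg_at_k(ranked, relevant, 2)
print(plain, capped, round(ndcg, 4))
```

With the same ranking and the same held-out items, the two recall definitions differ by a factor of two here, which is the kind of gap that makes numbers from different papers hard to compare.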
Remove the rec repo so that the notebook runs end to end on restart, and so that unnecessary files are kept out of the blog.
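A minimal sketch of that cleanup step, assuming the clone lives in a directory called `rec` next to the notebook. The helper name `remove_clone` is hypothetical, and the demonstration below runs against a throwaway directory rather than the real clone.

```python
import shutil
import tempfile
from pathlib import Path

def remove_clone(path):
    """Delete a cloned repo directory if it exists; return True if it was removed."""
    target = Path(path)
    if target.is_dir():
        shutil.rmtree(target)
        return True
    return False

# Demonstrate on a scratch directory (assumed layout, not the real clone).
scratch = Path(tempfile.mkdtemp())
fake_clone = scratch / "rec"
fake_clone.mkdir()
(fake_clone / "README.md").write_text("placeholder")

first = remove_clone(fake_clone)   # directory exists, so it is deleted
second = remove_clone(fake_clone)  # already gone, so nothing happens
shutil.rmtree(scratch)             # tidy up the scratch directory itself
```

In the notebook itself this would be a single call such as `remove_clone("rec")` in the final cell, which avoids the "destination path 'rec' already exists" error from `git clone` on the next run.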
This post presents a GPTRec implementation using my rec framework. The key contributions are:
rec framework’s capabilities
The model shows reasonable performance, which suggests the architecture is implemented correctly.
Looking ahead, I’d like to integrate sequential models into the rec framework. The framework currently supports a Retrieval → Ranking pipeline via the train_all script. I’m considering two approaches:
Three-stage pipeline (Retrieval → Sequential → Ranking): The ranking model would either become a hybrid combining traditional ranking with sequential signals, or incorporate the sequential model’s logits into the ranking embeddings.
Sequential ranking: Replace the ranking stage with a sequential model that also leverages user/item features. This aligns with industry trends—see Meta’s recent work on sequence learning for personalized recommendations.
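As a rough illustration of the hybrid idea in the three-stage option, the sketch below blends a ranking model's scores with a sequential model's logits for the same retrieved candidates. This is only one very simple reading of "hybrid" and is not something rec implements; `min_max`, `blend_and_rank`, and the `alpha` weight are all hypothetical names, and the scores are made up.

```python
def min_max(scores):
    """Rescale a dict of scores to [0, 1] so the two models are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {item: (s - lo) / span if span else 0.0 for item, s in scores.items()}

def blend_and_rank(ranking_scores, sequential_logits, alpha=0.5):
    """Order candidates by a weighted sum of normalised scores.

    alpha=1.0 uses only the ranking model; alpha=0.0 uses only the
    sequential model. Both dicts are assumed to cover the same candidates.
    """
    rank_norm = min_max(ranking_scores)
    seq_norm = min_max({item: sequential_logits[item] for item in ranking_scores})
    blended = {
        item: alpha * rank_norm[item] + (1 - alpha) * seq_norm[item]
        for item in ranking_scores
    }
    return sorted(blended, key=blended.get, reverse=True)

# Made-up scores for three candidates that survived retrieval.
ranking_scores = {"a": 0.9, "b": 0.5, "c": 0.1}
sequential_logits = {"a": -1.0, "b": 3.0, "c": 1.0}

blended_order = blend_and_rank(ranking_scores, sequential_logits, alpha=0.5)
ranking_only = blend_and_rank(ranking_scores, sequential_logits, alpha=1.0)
print(blended_order, ranking_only)
```

A learned combination (for example, feeding the sequential logits in as ranking features) would replace the fixed `alpha`, but the data flow between the stages would look much the same.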
Finally, a note on tooling: I found solveit’s compute limitations frustrating for this implementation-heavy post, requiring many restarts. For future implementation work, I’ll likely develop locally and reserve solveit for paper reviews and lighter research tasks.