Example 3: Groupwise Attention Scoring Function¶

Architecture¶

The architecture of the GroupwiseMultiHeadAttentionNetwork is more complex than the others. Similar to the GSF model, it accepts inputs produced by a sampler. The input is then run through a multi-head self attention network as first described by Vaswani et al. [Vaswani2017]. The attention network outputs scores of the same structure as the GSF model, which is reshaped and aggregated in the same way. It is out of the scope of this tutorial to explain the architecture and inner workings of the attention model.

The implementation is based on the Transformer Model for Language Understanding tutorial on the TensorFlow website. There are also many good tutorials on the web that explain the details step by step, such as this one and this one.

Overview¶

The steps for this model are the same as the GSF example.

Load the initial dataset.
Apply the same preprocessing steps as in the first example.
For each query, generate M samples that contain a subset documents of size G.
Configure the neural network model.
Train the model.
Evaluate the model.

Load The Data¶

Loading the data can be accomplished by following the steps outlined in the Load The Data page.

Preprocessing¶

The preprocessing steps are the same as in the MLP and GSF examples.

Sample Documents¶

Sampling is the same as in the GSF example.

Set Up The Model¶

The scoring function for this example is implemented in the GroupwiseMultiHeadAttentionNetwork class. This model accepts 10 parameters as defined in the ModelParameter class and is setup in the same way as the others.

n_layers
This specifies the number of attention layers to use in the self attention mechanism.
n_model
This is the base number of network units used for various parts of the model. This includes the final layer of the feed forward sub network as part of the encoding layers and to determine the size of the weight matrices for the key, query and values as part of the attention mechanism. Note that this value has to be exactly divisible by the number of heads (n_heads).
n_feed_forward_units
This is the number of hidden units in a feed forward sub network.
n_heads
The number of heads to split into. Note n_model must be an exact multiple of this value.
use_layer_norm
Optionally disable using layer normalization.
n_features
The number of (dense sequential) features.
group_size
The number of documents in a group.
use_average
Ai et al. aggregate the scores of each document across samples by summing their values. However, my sampling strategy does not guarantee that each document is seen with equal frequency and so averaging the scores over the number of times a document is actually seen usually produces better results.
share_weights
Setting this to true will first run each individual document through a shared hidden layer before concatenating document features within the same group.
dropout_rate
An optional float or list of floats specifying the dropout rate for each layer. If it is a single value then the dropout rate will be the same across all layers. If it is a list it must be the same length as n_units. If it is None then dropout will not be used.
random_seed
An optional integer that can be used to seed random number generation.

Train The Model¶

Training is performed exactly as in the GSF example.

Test The Model¶

Testing is also performed exactly as in the MLP and GSF example.

Running The Script¶

A script to train and evaluate the GroupwiseMultiHeadAttentionNetwork can be found in the examples package here.

This model is much more sensitive to the hyper-parameters and does not appear to be as effective as the others.

usage: examples.attn.mltr30k.py [-h] --train-file TRAIN_FILE --valid-file
                                VALID_FILE --test-file TEST_FILE
                                --checkpoint-dir CHECKPOINT_DIR
                                [--run-eagerly] [--scaler SCALER SCALER]
                                [--max-epochs MAX_EPOCHS]
                                [--optimizer {adagrad,adam,sgd,nesterov,rmsprop}]
                                [--learning-rate LEARNING_RATE]
                                [--loss {ndcg,softmax,cross_entropy,mse}]
                                [--list-size LIST_SIZE]
                                [--group-size GROUP_SIZE] [--sample-pre-batch]
                                [--multiples MULTIPLES]
                                [--training-batch-size TRAINING_BATCH_SIZE]
                                [--evaluation-batch-size EVALUATION_BATCH_SIZE]
                                [--use-average] [--share-weights]
                                [--n-model N_MODEL] [--use-layer-norm]
                                [--n-layers N_LAYERS] [--n-heads N_HEADS]
                                [--n-feed-forward-units N_FEED_FORWARD_UNITS]
                                [--dropout-rate DROPOUT_RATE]
                                [--drop-remainder] [--random-seed RANDOM_SEED]

Named Arguments¶

--train-file

The training tfrecords file.

--valid-file

The validation tfrecords file.

--test-file

The test tfrecords file.

--checkpoint-dir

The directory where model checkpoints will be saved.

--run-eagerly

Default: False

--scaler

This argument requires two parameters. The first is the path to a scaler file created with the build dataset script. The second is the name of the scaler to use. Choose one of: minmax, standard, robust, power.

--max-epochs

The maximum number of epochs before the training terminates no matter what.

Default: 500

--optimizer

Possible choices: adagrad, adam, sgd, nesterov, rmsprop

Default: “adagrad”

--learning-rate

Default: 0.001

--loss

Possible choices: ndcg, softmax, cross_entropy, mse

Default: “ndcg”

--list-size

The maximum number of documents per query or no maximum if not set.

--group-size

The group size to use.

Default: 16

--sample-pre-batch

If this flag is set then the alternate form of training will be performed where documents are sampled before training.

Default: False

--multiples

The sampling multiplier.

Default: 1

--training-batch-size

Default: 128

--evaluation-batch-size

Default: 128

--use-average

According to the paper, when a document is sampled more than once its scores are summed. When this option is set the scores are averaged over the number of times each document is seen instead.

Default: False

--share-weights

Apply each document through a shared dense layer before concatenating them.

Default: False

--n-model

Default: 128

--use-layer-norm

Default: False

--n-layers

Default: 4

--n-heads

Default: 8

--n-feed-forward-units

Default: 128

--dropout-rate

Default: 0.0

--drop-remainder

This is necessary when using the keras training/eval loops.

Default: False

--random-seed

The random seed to use for sampling query results.

python examples/attn/mltr30k.py                     \
    --train-file data/mltr30k/train.tfrecords.gz    \
    --valid-file data/mltr30k/valid.tfrecords.gz    \
    --test-file data/mltr30k/test.tfrecords.gz      \
    --scaler data/mltr30k/train.scalers.db power    \
    --max-epochs 100                                \
    --training-batch-size 4                         \
    --evaluation-batch-size 4                       \
    --multiples 2                                   \
    --group-size 16                                 \
    --optimizer adam                                \
    --learning-rate 0.0001                          \
    --n-layers 4                                    \
    --n-model 64                                    \
    --n-feed-forward-units 128                      \
    --n-heads 2                                     \
    --dropout-rate 0.1                              \
    --use-layer-norm                                \
    --use-average                                   \
    --checkpoint-dir data/mltr30k/models/attn

tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:09:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.8475GHz coreCount: 20 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 298.32GiB/s
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties:
pciBusID: 0000:0a:00.0 name: GeForce GTX 1060 6GB computeCapability: 6.1
coreClock: 1.7085GHz coreCount: 10 deviceMemorySize: 5.93GiB deviceMemoryBandwidth: 178.99GiB/s
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0, 1
tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3792875000 Hz
tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5640b5484b50 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:09:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.8475GHz coreCount: 20 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 298.32GiB/s
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0
tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7428 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:09:00.0, compute capability: 6.1)
tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5640b5b25610 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1080, Compute Capability 6.1
Removing existing checkpoint directory: data/mltr30k/models/attn-mult5-drp0.4-m64-use_average=T-share_weights=T
2020-07-22 18:33:14.393783: W tensorflow/core/kernels/data/cache_dataset_ops.cc:794] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
Model: "groupwise_multi_head_attention_network"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
encoder (Encoder)            multiple                  440608
_________________________________________________________________
dense_24 (Dense)             multiple                  34832
=================================================================
Total params: 475,440
Trainable params: 475,440
Non-trainable params: 0
_________________________________________________________________
epoch:     1 step:     4577 elapsed time:   198.59s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.6716 val/ndcg@01:     0.3990 val/ndcg@05:     0.3838 val/ndcg@10:     0.3867 *
epoch:     2 step:     9154 elapsed time:   385.58s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.7924 val/ndcg@01:     0.4094 val/ndcg@05:     0.3941 val/ndcg@10:     0.3983 *
epoch:     3 step:    13731 elapsed time:   571.72s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8115 val/ndcg@01:     0.4100 val/ndcg@05:     0.4026 val/ndcg@10:     0.4137 *
epoch:     4 step:    18308 elapsed time:   757.03s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8186 val/ndcg@01:     0.4091 val/ndcg@05:     0.3959 val/ndcg@10:     0.4040
epoch:     5 step:    22885 elapsed time:   943.77s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8279 val/ndcg@01:     0.4050 val/ndcg@05:     0.3948 val/ndcg@10:     0.3993
epoch:     6 step:    27462 elapsed time:  1130.05s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8259 val/ndcg@01:     0.4260 val/ndcg@05:     0.4185 val/ndcg@10:     0.4336 *
epoch:     7 step:    32039 elapsed time:  1315.31s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8346 val/ndcg@01:     0.4131 val/ndcg@05:     0.4041 val/ndcg@10:     0.4156
epoch:     8 step:    36616 elapsed time:  1501.99s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8346 val/ndcg@01:     0.4163 val/ndcg@05:     0.4116 val/ndcg@10:     0.4279
epoch:     9 step:    41193 elapsed time:  1688.24s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8360 val/ndcg@01:     0.4186 val/ndcg@05:     0.4198 val/ndcg@10:     0.4373 *
epoch:    10 step:    45770 elapsed time:  1874.92s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8392 val/ndcg@01:     0.4221 val/ndcg@05:     0.4186 val/ndcg@10:     0.4318
epoch:    11 step:    50347 elapsed time:  2061.17s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8475 val/ndcg@01:     0.4314 val/ndcg@05:     0.4152 val/ndcg@10:     0.4245
epoch:    12 step:    54924 elapsed time:  2247.55s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8424 val/ndcg@01:     0.4247 val/ndcg@05:     0.4182 val/ndcg@10:     0.4358
epoch:    13 step:    59501 elapsed time:  2434.37s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8430 val/ndcg@01:     0.4174 val/ndcg@05:     0.4208 val/ndcg@10:     0.4408 *
epoch:    14 step:    64078 elapsed time:  2620.02s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8422 val/ndcg@01:     0.4202 val/ndcg@05:     0.4071 val/ndcg@10:     0.4185
epoch:    15 step:    68655 elapsed time:  2806.09s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8417 val/ndcg@01:     0.4113 val/ndcg@05:     0.4067 val/ndcg@10:     0.4222
epoch:    16 step:    73232 elapsed time:  2992.29s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8489 val/ndcg@01:     0.4282 val/ndcg@05:     0.4238 val/ndcg@10:     0.4385 *
epoch:    17 step:    77809 elapsed time:  3178.73s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8450 val/ndcg@01:     0.4239 val/ndcg@05:     0.4216 val/ndcg@10:     0.4411
epoch:    18 step:    82386 elapsed time:  3364.77s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8521 val/ndcg@01:     0.4168 val/ndcg@05:     0.4216 val/ndcg@10:     0.4380
epoch:    19 step:    86963 elapsed time:  3551.37s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8490 val/ndcg@01:     0.4342 val/ndcg@05:     0.4259 val/ndcg@10:     0.4432 *
epoch:    20 step:    91540 elapsed time:  3737.84s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8520 val/ndcg@01:     0.4250 val/ndcg@05:     0.4202 val/ndcg@10:     0.4363
epoch:    21 step:    96117 elapsed time:  3924.04s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8510 val/ndcg@01:     0.4269 val/ndcg@05:     0.4178 val/ndcg@10:     0.4264
epoch:    22 step:   100694 elapsed time:  4110.23s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8483 val/ndcg@01:     0.4049 val/ndcg@05:     0.4090 val/ndcg@10:     0.4326
epoch:    23 step:   105271 elapsed time:  4296.78s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8526 val/ndcg@01:     0.4307 val/ndcg@05:     0.4204 val/ndcg@10:     0.4328
epoch:    24 step:   109848 elapsed time:  4482.93s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8504 val/ndcg@01:     0.4378 val/ndcg@05:     0.4281 val/ndcg@10:     0.4440 *
epoch:    25 step:   114425 elapsed time:  4669.89s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8542 val/ndcg@01:     0.4322 val/ndcg@05:     0.4284 val/ndcg@10:     0.4476 *
epoch:    26 step:   119002 elapsed time:  4856.38s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8559 val/ndcg@01:     0.4322 val/ndcg@05:     0.4246 val/ndcg@10:     0.4356
epoch:    27 step:   123579 elapsed time:  5042.27s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8509 val/ndcg@01:     0.4234 val/ndcg@05:     0.4242 val/ndcg@10:     0.4429
epoch:    28 step:   128156 elapsed time:  5228.48s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8524 val/ndcg@01:     0.4439 val/ndcg@05:     0.4337 val/ndcg@10:     0.4537 *
epoch:    29 step:   132733 elapsed time:  5414.89s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8562 val/ndcg@01:     0.4348 val/ndcg@05:     0.4298 val/ndcg@10:     0.4451
epoch:    30 step:   137310 elapsed time:  5601.20s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8529 val/ndcg@01:     0.4280 val/ndcg@05:     0.4293 val/ndcg@10:     0.4464
epoch:    31 step:   141887 elapsed time:  5787.70s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8553 val/ndcg@01:     0.4176 val/ndcg@05:     0.4224 val/ndcg@10:     0.4417
epoch:    32 step:   146464 elapsed time:  5974.22s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8547 val/ndcg@01:     0.4233 val/ndcg@05:     0.4232 val/ndcg@10:     0.4464
epoch:    33 step:   151041 elapsed time:  6160.91s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8580 val/ndcg@01:     0.4283 val/ndcg@05:     0.4260 val/ndcg@10:     0.4462
epoch:    34 step:   155618 elapsed time:  6347.56s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8535 val/ndcg@01:     0.4348 val/ndcg@05:     0.4297 val/ndcg@10:     0.4530
epoch:    35 step:   160195 elapsed time:  6534.12s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8552 val/ndcg@01:     0.4344 val/ndcg@05:     0.4266 val/ndcg@10:     0.4465
epoch:    36 step:   164772 elapsed time:  6720.32s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8582 val/ndcg@01:     0.4401 val/ndcg@05:     0.4323 val/ndcg@10:     0.4506
epoch:    37 step:   169349 elapsed time:  6906.39s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8549 val/ndcg@01:     0.4233 val/ndcg@05:     0.4208 val/ndcg@10:     0.4383
epoch:    38 step:   173926 elapsed time:  7093.12s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8527 val/ndcg@01:     0.4355 val/ndcg@05:     0.4327 val/ndcg@10:     0.4556
epoch:    39 step:   178503 elapsed time:  7280.01s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8565 val/ndcg@01:     0.4350 val/ndcg@05:     0.4323 val/ndcg@10:     0.4477
epoch:    40 step:   183080 elapsed time:  7466.58s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8580 val/ndcg@01:     0.4234 val/ndcg@05:     0.4221 val/ndcg@10:     0.4363
epoch:    41 step:   187657 elapsed time:  7652.91s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8597 val/ndcg@01:     0.4292 val/ndcg@05:     0.4255 val/ndcg@10:     0.4435
epoch:    42 step:   192234 elapsed time:  7839.61s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8591 val/ndcg@01:     0.4309 val/ndcg@05:     0.4276 val/ndcg@10:     0.4482
epoch:    43 step:   196811 elapsed time:  8025.63s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8583 val/ndcg@01:     0.4157 val/ndcg@05:     0.4036 val/ndcg@10:     0.4089
epoch:    44 step:   201388 elapsed time:  8212.23s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8576 val/ndcg@01:     0.4370 val/ndcg@05:     0.4287 val/ndcg@10:     0.4454
epoch:    45 step:   205965 elapsed time:  8398.27s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8546 val/ndcg@01:     0.4237 val/ndcg@05:     0.4229 val/ndcg@10:     0.4451
epoch:    46 step:   210542 elapsed time:  8584.90s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8611 val/ndcg@01:     0.4329 val/ndcg@05:     0.4285 val/ndcg@10:     0.4489
epoch:    47 step:   215119 elapsed time:  8771.20s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8628 val/ndcg@01:     0.4386 val/ndcg@05:     0.4302 val/ndcg@10:     0.4490
epoch:    48 step:   219696 elapsed time:  8958.07s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8615 val/ndcg@01:     0.4354 val/ndcg@05:     0.4316 val/ndcg@10:     0.4471
epoch:    49 step:   224273 elapsed time:  9143.93s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8622 val/ndcg@01:     0.4333 val/ndcg@05:     0.4310 val/ndcg@10:     0.4522
epoch:    50 step:   228850 elapsed time:  9329.97s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8627 val/ndcg@01:     0.4356 val/ndcg@05:     0.4317 val/ndcg@10:     0.4508
epoch:    51 step:   233427 elapsed time:  9516.51s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8658 val/ndcg@01:     0.4386 val/ndcg@05:     0.4295 val/ndcg@10:     0.4496
epoch:    52 step:   238004 elapsed time:  9702.59s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8619 val/ndcg@01:     0.4257 val/ndcg@05:     0.4293 val/ndcg@10:     0.4506
epoch:    53 step:   242581 elapsed time:  9889.15s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8658 val/ndcg@01:     0.4413 val/ndcg@05:     0.4346 val/ndcg@10:     0.4545 *
epoch:    54 step:   247158 elapsed time: 10075.51s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8671 val/ndcg@01:     0.4293 val/ndcg@05:     0.4275 val/ndcg@10:     0.4492
epoch:    55 step:   251735 elapsed time: 10262.27s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8665 val/ndcg@01:     0.4263 val/ndcg@05:     0.4215 val/ndcg@10:     0.4384
epoch:    56 step:   256312 elapsed time: 10448.79s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8684 val/ndcg@01:     0.4262 val/ndcg@05:     0.4042 val/ndcg@10:     0.4069
epoch:    57 step:   260889 elapsed time: 10634.69s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8643 val/ndcg@01:     0.4352 val/ndcg@05:     0.4290 val/ndcg@10:     0.4469
epoch:    58 step:   265466 elapsed time: 10821.48s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8617 val/ndcg@01:     0.4387 val/ndcg@05:     0.4280 val/ndcg@10:     0.4438
epoch:    59 step:   270043 elapsed time: 11007.65s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8622 val/ndcg@01:     0.4251 val/ndcg@05:     0.4225 val/ndcg@10:     0.4365
epoch:    60 step:   274620 elapsed time: 11193.97s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8628 val/ndcg@01:     0.4311 val/ndcg@05:     0.4334 val/ndcg@10:     0.4560
epoch:    61 step:   279197 elapsed time: 11380.67s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8636 val/ndcg@01:     0.4342 val/ndcg@05:     0.4327 val/ndcg@10:     0.4549
epoch:    62 step:   283774 elapsed time: 11567.60s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8687 val/ndcg@01:     0.4404 val/ndcg@05:     0.4337 val/ndcg@10:     0.4497
epoch:    63 step:   288351 elapsed time: 11754.40s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8683 val/ndcg@01:     0.4273 val/ndcg@05:     0.4217 val/ndcg@10:     0.4349
epoch:    64 step:   292928 elapsed time: 11941.03s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8700 val/ndcg@01:     0.4384 val/ndcg@05:     0.4349 val/ndcg@10:     0.4541 *
epoch:    65 step:   297505 elapsed time: 12127.10s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8695 val/ndcg@01:     0.3998 val/ndcg@05:     0.4109 val/ndcg@10:     0.4345
epoch:    66 step:   302082 elapsed time: 12313.08s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8694 val/ndcg@01:     0.4379 val/ndcg@05:     0.4305 val/ndcg@10:     0.4480
epoch:    67 step:   306659 elapsed time: 12499.40s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8722 val/ndcg@01:     0.4366 val/ndcg@05:     0.4277 val/ndcg@10:     0.4406
epoch:    68 step:   311236 elapsed time: 12685.55s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8720 val/ndcg@01:     0.4344 val/ndcg@05:     0.4297 val/ndcg@10:     0.4508
epoch:    69 step:   315813 elapsed time: 12871.95s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8676 val/ndcg@01:     0.4307 val/ndcg@05:     0.4312 val/ndcg@10:     0.4515
epoch:    70 step:   320390 elapsed time: 13058.21s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8695 val/ndcg@01:     0.4334 val/ndcg@05:     0.4292 val/ndcg@10:     0.4531
epoch:    71 step:   324967 elapsed time: 13244.83s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8694 val/ndcg@01:     0.4389 val/ndcg@05:     0.4343 val/ndcg@10:     0.4540
epoch:    72 step:   329544 elapsed time: 13431.24s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8708 val/ndcg@01:     0.4312 val/ndcg@05:     0.4310 val/ndcg@10:     0.4516
epoch:    73 step:   334121 elapsed time: 13617.28s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8692 val/ndcg@01:     0.4420 val/ndcg@05:     0.4356 val/ndcg@10:     0.4518 *
epoch:    74 step:   338698 elapsed time: 13803.08s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8700 val/ndcg@01:     0.4439 val/ndcg@05:     0.4334 val/ndcg@10:     0.4523
epoch:    75 step:   343275 elapsed time: 13989.09s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8663 val/ndcg@01:     0.4456 val/ndcg@05:     0.4347 val/ndcg@10:     0.4496
epoch:    76 step:   347852 elapsed time: 14175.29s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8651 val/ndcg@01:     0.4233 val/ndcg@05:     0.4266 val/ndcg@10:     0.4453
epoch:    77 step:   352429 elapsed time: 14361.27s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8648 val/ndcg@01:     0.4269 val/ndcg@05:     0.4176 val/ndcg@10:     0.4243
epoch:    78 step:   357006 elapsed time: 14547.40s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8614 val/ndcg@01:     0.4394 val/ndcg@05:     0.4329 val/ndcg@10:     0.4511
epoch:    79 step:   361583 elapsed time: 14733.36s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8690 val/ndcg@01:     0.4441 val/ndcg@05:     0.4341 val/ndcg@10:     0.4510
epoch:    80 step:   366160 elapsed time: 14919.43s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8716 val/ndcg@01:     0.4292 val/ndcg@05:     0.4262 val/ndcg@10:     0.4458
epoch:    81 step:   370737 elapsed time: 15105.86s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8703 val/ndcg@01:     0.4327 val/ndcg@05:     0.4273 val/ndcg@10:     0.4457
epoch:    82 step:   375314 elapsed time: 15291.96s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8690 val/ndcg@01:     0.4411 val/ndcg@05:     0.4345 val/ndcg@10:     0.4533
epoch:    83 step:   379891 elapsed time: 15477.95s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8656 val/ndcg@01:     0.4451 val/ndcg@05:     0.4402 val/ndcg@10:     0.4568 *
epoch:    84 step:   384468 elapsed time: 15663.99s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8725 val/ndcg@01:     0.4384 val/ndcg@05:     0.4336 val/ndcg@10:     0.4507
epoch:    85 step:   389045 elapsed time: 15849.70s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8732 val/ndcg@01:     0.4450 val/ndcg@05:     0.4384 val/ndcg@10:     0.4548
epoch:    86 step:   393622 elapsed time: 16035.58s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8695 val/ndcg@01:     0.4421 val/ndcg@05:     0.4384 val/ndcg@10:     0.4605
epoch:    87 step:   398199 elapsed time: 16222.36s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8729 val/ndcg@01:     0.4394 val/ndcg@05:     0.4353 val/ndcg@10:     0.4535
epoch:    88 step:   402776 elapsed time: 16408.79s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8717 val/ndcg@01:     0.4256 val/ndcg@05:     0.4257 val/ndcg@10:     0.4430
epoch:    89 step:   407353 elapsed time: 16595.00s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8649 val/ndcg@01:     0.4366 val/ndcg@05:     0.4332 val/ndcg@10:     0.4524
epoch:    90 step:   411930 elapsed time: 16781.43s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8697 val/ndcg@01:     0.4322 val/ndcg@05:     0.4317 val/ndcg@10:     0.4516
epoch:    91 step:   416507 elapsed time: 16967.26s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8687 val/ndcg@01:     0.4199 val/ndcg@05:     0.4177 val/ndcg@10:     0.4407
epoch:    92 step:   421084 elapsed time: 17153.37s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8651 val/ndcg@01:     0.4266 val/ndcg@05:     0.4278 val/ndcg@10:     0.4501
epoch:    93 step:   425661 elapsed time: 17339.56s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8712 val/ndcg@01:     0.4377 val/ndcg@05:     0.4309 val/ndcg@10:     0.4540
epoch:    94 step:   430238 elapsed time: 17525.42s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8699 val/ndcg@01:     0.4449 val/ndcg@05:     0.4335 val/ndcg@10:     0.4512
epoch:    95 step:   434815 elapsed time: 17711.62s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8690 val/ndcg@01:     0.4382 val/ndcg@05:     0.4322 val/ndcg@10:     0.4545
epoch:    96 step:   439392 elapsed time: 17897.93s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8709 val/ndcg@01:     0.4317 val/ndcg@05:     0.4254 val/ndcg@10:     0.4450
epoch:    97 step:   443969 elapsed time: 18083.42s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8677 val/ndcg@01:     0.4333 val/ndcg@05:     0.4300 val/ndcg@10:     0.4443
epoch:    98 step:   448546 elapsed time: 18269.44s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8676 val/ndcg@01:     0.4339 val/ndcg@05:     0.4295 val/ndcg@10:     0.4523
epoch:    99 step:   453123 elapsed time: 18455.35s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8625 val/ndcg@01:     0.4349 val/ndcg@05:     0.4324 val/ndcg@10:     0.4507
epoch:   100 step:   457700 elapsed time: 18640.85s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:    -2.8646 val/ndcg@01:     0.4434 val/ndcg@05:     0.4326 val/ndcg@10:     0.4476
test/ndcg@01:     0.4446 test/ndcg@05:     0.4397 test/ndcg@10:     0.4594