Example 2: Groupwise Scoring Function

Architecture

Fig. 2 Model architecture for the GroupwiseScoringNetwork model.

The second example demonstrates how to use the GroupwiseScoringNetwork, which implements what Ai et al. [Ai2019] call a Groupwise Scoring Function (GSF). The architecture of their model (as I understand it) is shown in Fig. 2. The preprocessing and model architecture are somewhat more involved than in the first example and are described in more detail below.

Overview

The SimpleScoringNetwork from the previous example scores each document independently. However, independence is an unrealistic assumption about how documents are rated or ranked: observing one document likely affects how a subsequent one is rated. Additionally, modeling the dependence between documents should help improve the relative ranking of documents that are similar.

A naive approach to modeling dependencies between the documents of a query, suggested by Ai et al., is to construct a model whose input is the concatenation of all the documents' features and whose output is a score for each document. The downside of this approach is that it can lead to a very high-dimensional input space. The maximum number of documents per query in the MSLR dataset is 1251, which, with 136 features per document, would result in a 170,136-dimensional input vector. This would drastically increase the number of parameters that need to be learned and would likely require more training data than is available. Additionally, most queries contain far fewer documents than the maximum. Padding could overcome this, but it would be very inefficient and could make training more difficult.
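To put these numbers in perspective, the arithmetic below computes the naive input size and the parameter count of a single dense layer on top of it. It assumes 136 features per document (the MSLR feature count, and the n_features value in the example run at the end of this page) and a hypothetical first hidden layer of 64 units.

```python
# Size of the naive "concatenate every document" input and of one dense
# layer on top of it. FIRST_LAYER_UNITS is a hypothetical choice.
N_MAX_DOCS = 1251        # largest query in the MSLR dataset
N_FEATURES = 136         # MSLR features per document
FIRST_LAYER_UNITS = 64   # hypothetical hidden-layer width


def naive_input_dim(n_docs: int, n_features: int) -> int:
    """Length of the input vector when all documents are concatenated."""
    return n_docs * n_features


def dense_params(n_inputs: int, n_units: int) -> int:
    """Number of weights plus biases in a fully connected layer."""
    return n_inputs * n_units + n_units


full = naive_input_dim(N_MAX_DOCS, N_FEATURES)
print(full)                                    # 170136
print(dense_params(full, FIRST_LAYER_UNITS))   # 10888768
# For comparison, the same layer on a group of only 16 documents:
print(dense_params(naive_input_dim(16, N_FEATURES), FIRST_LAYER_UNITS))  # 139328
```

The first layer alone would need almost 11 million parameters, roughly 78 times more than the same layer applied to a group of 16 documents.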

Instead, Ai et al. propose a model that uses the concatenation technique, but each input (and output) considers only a subset of documents of size G. One can then generate many subsets and aggregate the scores across all of them to obtain a final score for each document. Unfortunately, if there are N documents, then there are \(\frac{N!}{(N-G)!}\) possible ordered groups, which is intractably large for most real-world datasets. Their solution is to shuffle the documents of the query and select M subsequences of length G (i.e., groups). They call a network that processes samples of this type a Groupwise Scoring Function, because it assigns scores to a group of documents simultaneously and then aggregates the results over many groups of a query into a final score for each document.
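Python's math.perm computes \(\frac{N!}{(N-G)!}\) directly, which makes it easy to see how quickly the number of ordered groups grows. The document counts used here are arbitrary examples.

```python
import math


def n_ordered_groups(n_docs: int, group_size: int) -> int:
    """Number of ordered groups of size G drawn from N documents: N! / (N - G)!."""
    return math.perm(n_docs, group_size)


print(n_ordered_groups(5, 3))     # 60
print(n_ordered_groups(100, 16))  # about 2.8e31, a 32-digit number
```

Even a modest query of 100 documents with a group size of 16 has far too many ordered groups to enumerate, which is why sampling is needed.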

The basic procedure is similar to the first example, with one additional step for document sampling:

Load the initial dataset.
Apply the same preprocessing steps as in the first example.
For each query, generate M samples that each contain a subset of documents of size G.
Configure the neural network model.
Train the model.
Evaluate the model.

Load The Data

See the description in the first example.

Preprocessing

See the description in the first example. Note that it is not necessary to shuffle the documents here, because they will be shuffled during the sampling process anyway. Fig. 2 starts after the data has been batched. In the figure there are 4 documents per query, 5 features per document, and a batch size of 2, so the output shape after batching is (2, 4, 5).

Sample Documents

There are many possible ways to sample documents from a query, and I don't know the details of Ai et al.'s approach. I will first describe several of these possibilities and then highlight the method implemented in the toolkit and how to use it. Currently only one method is supported, IndependentMultiOutputSampler (see No Duplicates In Group), but I hope to implement others in the future. The output of a sampler is a tensor of shape (batch size, number of samples, group size, number of features), as illustrated in the second step of Fig. 2.

Possible Sampling Methods

With Replacement

Probably the simplest method (to implement) is to sample each document with replacement. This is easily accomplished by generating random integers in the interval [0, N) and selecting the documents corresponding to those indices, which can be repeated for as many samples as you would like to generate. This introduces the possibility that a document is included in a group more than once, which may or may not be desirable. There is also no guarantee that documents will appear with the same frequency (e.g., document 1 may appear in 3 groups while document 2 may not appear in any).
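A minimal NumPy sketch of sampling with replacement, assuming a query's features are stored as an (N, number of features) array. The function name is illustrative and not part of the toolkit.

```python
import numpy as np


def sample_with_replacement(n_docs: int, group_size: int, n_samples: int,
                            rng: np.random.Generator) -> np.ndarray:
    """Draw document indices uniformly from [0, n_docs) with replacement.

    Returns an (n_samples, group_size) array of indices. A document may occur
    more than once in a group, and some documents may never be drawn.
    """
    return rng.integers(0, n_docs, size=(n_samples, group_size))


rng = np.random.default_rng(0)
features = rng.normal(size=(5, 136))   # toy query: 5 documents, 136 features
idx = sample_with_replacement(5, group_size=3, n_samples=4, rng=rng)
groups = features[idx]                 # (4, 3, 136): 4 samples of 3 documents
```

Fancy indexing with the (4, 3) index array gathers the feature rows directly, so the sampled tensor already has the (number of samples, group size, number of features) layout for a single query.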

Equal Frequency Guarantee

Fig. 3 An example of generating samples that guarantees each document is seen exactly the same number of times. In this example we generate groups of size 4 from a list of 6 documents. We apply the procedure twice (K = 2) to produce a total of 4 samples.

If the number of documents is a multiple of the group size, this method is also very simple: shuffle the documents and reshape the list to \((\frac{N}{G}, G)\). This guarantees each document is seen exactly once, and repeating the process K times guarantees each document is seen exactly K times. If the number of documents is not a multiple of the group size, then one group will have fewer documents than the others. This can be handled in several different ways.

One solution is to pad the remaining group, which is shown in Fig. 3. Although padding simplifies the sampling implementation, it can complicate the scoring aggregation later on.
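A sketch of the padded equal-frequency scheme from Fig. 3, assuming padded slots are marked with a sentinel index of -1 (my choice; the toolkit does not implement this method).

```python
import numpy as np

PAD = -1  # hypothetical sentinel for padded slots


def equal_frequency_padded(n_docs: int, group_size: int, k: int,
                           rng: np.random.Generator) -> np.ndarray:
    """Shuffle, pad to a multiple of group_size, reshape; repeat k times.

    Every real document appears exactly k times. Slots holding PAD must be
    masked out later, when the scores are aggregated.
    """
    n_groups = -(-n_docs // group_size)            # ceil(n_docs / group_size)
    n_pad = n_groups * group_size - n_docs
    passes = []
    for _ in range(k):
        order = rng.permutation(n_docs)
        padded = np.concatenate([order, np.full(n_pad, PAD)])
        passes.append(padded.reshape(n_groups, group_size))
    return np.concatenate(passes, axis=0)          # (k * n_groups, group_size)


rng = np.random.default_rng(0)
samples = equal_frequency_padded(n_docs=6, group_size=4, k=2, rng=rng)
# 6 documents -> 2 groups of 4 per pass (2 padded slots); 2 passes -> 4 samples.
```

The sentinel is where the aggregation complication shows up: any later scatter or average has to skip entries equal to PAD.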

Fig. 4 An example of generating samples that guarantees each document is seen exactly the same number of times even when N is not divisible by G. In this example we generate groups of size 3 from a list of 5 documents. We first copy/tile the document indices 3 times, then reshape the array so that there are 5 samples, each with a group size of 3.

An alternative solution that avoids padding is to make enough copies of the document list that the total length is exactly divisible by G. Making G copies is the simplest way to guarantee this property, but sometimes fewer suffice: the smallest number that works is \(G / \gcd(N, G)\). See Fig. 4. As with sampling with replacement, a document can now appear more than once in a group. However, all documents are guaranteed to be present and to occur with equal frequency.
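The tiling variant can be sketched as follows. Shuffling each copy independently before concatenating is my assumption about how the copies are ordered; min_copies computes \(G / \gcd(N, G)\), the smallest number of copies whose total length is divisible by G.

```python
import math
from typing import Optional

import numpy as np


def min_copies(n_docs: int, group_size: int) -> int:
    """Smallest k for which k * n_docs is divisible by group_size."""
    return group_size // math.gcd(n_docs, group_size)


def equal_frequency_tiled(n_docs: int, group_size: int, rng: np.random.Generator,
                          copies: Optional[int] = None) -> np.ndarray:
    """Concatenate independently shuffled copies of the document indices and
    reshape them into groups. No padding is needed and every document appears
    exactly `copies` times, but a group may contain the same document twice."""
    if copies is None:
        copies = min_copies(n_docs, group_size)
    if (copies * n_docs) % group_size != 0:
        raise ValueError("copies * n_docs must be divisible by group_size")
    tiled = np.concatenate([rng.permutation(n_docs) for _ in range(copies)])
    return tiled.reshape(-1, group_size)


rng = np.random.default_rng(0)
samples = equal_frequency_tiled(n_docs=5, group_size=3, rng=rng)
# 3 copies of 5 documents -> 5 groups of 3, as in Fig. 4.
```

Duplicates within a group can only occur in groups that straddle the boundary between two copies.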

No Duplicates In Group

Fig. 5 An illustration of generating samples that guarantees no document is included more than once in a group. In this example we start with 5 documents and create 4 samples with 3 documents per group.

This last approach generates samples that guarantee there are no duplicate documents in a group, but does not guarantee that all documents will be seen at the same frequency (or at all). For each sample, we copy the list of documents, shuffle the copy, and keep only its first G documents, as illustrated in Fig. 5. Because each group is taken from a single permutation, a document can never appear twice within it.
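The no-duplicates strategy can be sketched in a few lines of NumPy. This only illustrates the idea shown in Fig. 5; it is not the code of IndependentMultiOutputSampler.

```python
import numpy as np


def sample_without_duplicates(n_docs: int, group_size: int, n_samples: int,
                              rng: np.random.Generator) -> np.ndarray:
    """For each sample, shuffle all document indices and keep the first
    group_size of them. Returns an (n_samples, group_size) index array."""
    if group_size > n_docs:
        # Groups larger than the query need padding or sampling with replacement.
        raise ValueError("group_size must not exceed n_docs")
    return np.stack([rng.permutation(n_docs)[:group_size]
                     for _ in range(n_samples)])


rng = np.random.default_rng(0)
samples = sample_without_duplicates(n_docs=5, group_size=3, n_samples=4, rng=rng)
# samples has shape (4, 3), matching Fig. 5; no row repeats a document.
```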

However, this approach shares a problem with the sampling with replacement strategy: some documents may occur more frequently than others, or not at all. We cannot make any hard guarantees, but with a large enough number of samples we can be reasonably sure that every document appears at least once. The number of samples needed for a given level of confidence can be estimated using the coupon collector's problem.
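Assuming each group is an independent, uniformly random subset of G distinct documents (which is what keeping the first G entries of a fresh shuffle gives), a given document is missed by one group with probability \(1 - G/N\) and by all M groups with probability \((1 - G/N)^M\). This is a variant of the coupon collector's problem in which each draw yields G distinct coupons. The sketch below uses a simple union bound rather than the exact distribution, and both function names are my own.

```python
import math


def expected_unseen(n_docs: int, group_size: int, n_samples: int) -> float:
    """Expected number of documents that appear in none of n_samples groups."""
    p_missing_once = 1.0 - group_size / n_docs
    return n_docs * p_missing_once ** n_samples


def samples_for_coverage(n_docs: int, group_size: int, delta: float) -> int:
    """Smallest M with N * (1 - G/N)**M <= delta. By the union bound this
    keeps P(at least one document is never sampled) <= delta."""
    if not 0 < group_size <= n_docs:
        raise ValueError("expected 0 < group_size <= n_docs")
    if group_size == n_docs:
        return 1  # a single group already contains every document
    p_missing_once = 1.0 - group_size / n_docs
    return math.ceil(math.log(delta / n_docs) / math.log(p_missing_once))


print(samples_for_coverage(100, 16, 0.01))  # 53
```

For a query of 100 documents and groups of 16, about 53 groups keep the chance of missing any document below 1%.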

In practice there may be queries that have fewer documents than a specified group size. There are at least two strategies for dealing with this case. The first is to pad and the second is to sample with replacement.

Sampling With The Toolkit

The toolkit currently supports only one sampling method, implemented in the class IndependentMultiOutputSampler. The current implementation hard-codes the minimum number of samples generated based on the expectation given by the coupon collector's problem:

\[\mathop{\mathbb{E}}[X] = n H_{n}\]

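where \(n\) is the number of documents in the query and \(H_{n} = \sum_{k=1}^{n} \frac{1}{k}\) is the \(n\)-th harmonic number. You can, however, generate additional multiples of this value to increase the probability that every document in a query is included at least once. When a query has fewer documents than the group size, the class falls back on the second strategy described above: sampling with replacement.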
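The expectation itself is easy to evaluate exactly with fractions. How the toolkit turns it into a number of groups of size G is not described here, so scaled_draws below only illustrates "multiples of this value" and is not the toolkit's formula.

```python
import math
from fractions import Fraction


def harmonic(n: int) -> Fraction:
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n, computed exactly."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def expected_draws(n: int) -> Fraction:
    """Coupon collector expectation E[X] = n * H_n: the expected number of
    single-document draws (with replacement) needed to see all n documents."""
    return n * harmonic(n)


def scaled_draws(n: int, multiples: int = 1) -> int:
    """A multiple of the expectation, rounded up to a whole number of draws."""
    return math.ceil(multiples * expected_draws(n))


print(expected_draws(4))   # 25/3
print(scaled_draws(4))     # 9
print(scaled_draws(4, 3))  # 25
```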

Before the sampler is applied, each instance returned by the dataset is an (\(X\), \(y\)) pair, with the structure of \(X\) and \(y\) described in Parse A Single Query Instance. After the sampler is applied, the output structure is augmented with several new items.

A new entry in the \(X\) dictionary with the key sample_dense holds the sampled documents (the original feature tensor(s) are preserved).
A new entry in the \(X\) dictionary with the key scatter_idx maps each document in the sampled feature space back to its original index in the full document list. This is essential for aggregating the scores of each document in a query.
A new entry in the \(X\) dictionary with the key document_counts records how many times each document has been sampled. This is useful if we want to average the scores of each document in a query instead of summing them.
To preserve the original target values, \(y\) is returned as a tuple whose first element is the original targets and whose second element is the sampled targets: (\(y_{o}\), \(y_{s}\)). Each instance returned by the dataset therefore has the structure (\(X\), (\(y_{o}\), \(y_{s}\))).
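A minimal per-query sketch of these extra entries, written in NumPy rather than TensorFlow and assuming the no-duplicates strategy. The key "dense" for the original features is a placeholder I made up, and the real sampler operates on batched tensors, so its scatter_idx layout (for example, whether it includes a batch index for tensorflow.scatter_nd()) may differ from the plain index array used here.

```python
import numpy as np


def sample_query(features: np.ndarray, labels: np.ndarray, group_size: int,
                 n_samples: int, rng: np.random.Generator):
    """Return (X, (y_o, y_s)) for one query, mimicking the structure above.

    features: (n_docs, n_features); labels: (n_docs,). group_size <= n_docs.
    """
    n_docs = features.shape[0]
    idx = np.stack([rng.permutation(n_docs)[:group_size]
                    for _ in range(n_samples)])          # (n_samples, group_size)
    X = {
        "dense": features,                # hypothetical key for the original features
        "sample_dense": features[idx],    # (n_samples, group_size, n_features)
        "scatter_idx": idx,               # sampled slot -> original document index
        "document_counts": np.bincount(idx.ravel(), minlength=n_docs),
    }
    return X, (labels, labels[idx])       # (y_o, y_s)


rng = np.random.default_rng(0)
features = np.arange(10.0).reshape(5, 2)  # 5 documents with 2 features each
labels = np.array([0, 1, 2, 1, 0])
X, (y_o, y_s) = sample_query(features, labels, group_size=3, n_samples=4, rng=rng)
```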

Putting It Together

To prepare the data for input to the GroupwiseScoringNetwork, we only need to make a small addition to the preprocessing function in the MLP example.

# Add this just after batching
sampler = IndependentMultiOutputSampler(group_size)

train_data = train_data.map(sampler)
valid_data = valid_data.map(sampler)
test_data = test_data.map(sampler)

This is implemented in the corresponding prepare_data() function.

Setup The Model

In this example we use the GroupwiseScoringNetwork as the scoring function. The model accepts the 7 parameters defined in the ModelParameter class.

n_features
The number of dense sequential features in the data. This could potentially be inferred from the data at run time, but it is currently a required parameter.
n_units
A list of integers that specifies the number of hidden units in each layer (the size of the list determines the number of layers).
group_size
The number of documents in a group.
use_average
Ai et al. aggregate the scores of each document across samples by summing them. However, my sampling strategy does not guarantee that each document is seen with equal frequency, so averaging the scores over the number of times a document is actually seen usually produces better results.
share_weights
I was unable to reproduce the results of Ai et al. without modifying the architecture of the network. Setting this to True applies that modification (see Adjusted Model) and will likely improve performance.
dropout_rate
An optional float or list of floats specifying the dropout rate for each layer. If it is a single value then the dropout rate will be the same across all layers. If it is a list it must be the same length as n_units. If it is None then dropout will not be used.
random_seed
An optional integer that can be used to seed random number generation.
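The three cases allowed for dropout_rate (a single value, a per-layer list, or None) can be sketched as a small normalization helper. The function name dropout_per_layer is hypothetical; this is not the toolkit's code, just one way to express the rules described above.

```python
from typing import List, Optional, Sequence, Union


def dropout_per_layer(
    dropout_rate: Optional[Union[float, Sequence[float]]],
    n_units: Sequence[int],
) -> List[Optional[float]]:
    """Expand dropout_rate into one entry per hidden layer (None = no dropout)."""
    if dropout_rate is None:
        return [None] * len(n_units)
    if isinstance(dropout_rate, (int, float)):
        # A single value applies to every layer.
        return [float(dropout_rate)] * len(n_units)
    rates = [float(r) for r in dropout_rate]
    if len(rates) != len(n_units):
        raise ValueError("dropout_rate must have the same length as n_units")
    return rates


print(dropout_per_layer(0.1, [64, 32, 16]))  # [0.1, 0.1, 0.1]
```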

Standard Model

This brings us to the middle column of Fig. 2, labeled Network Input. The call() function expects the (dense sequential) input data to have shape (batch size, number of samples, group size, number of features). Data with this shape is produced by the IndependentMultiOutputSampler and is shown in the previous column, labeled Sample Output. In the figure, it has a batch size of 2, 3 samples, a group size of 2, and 5 features per document.

The goal of the GSF is to model the dependencies between documents within a query. In this model, this is accomplished by concatenating the features of the documents within a group, which can be done easily and efficiently by reshaping the data produced by the sampler. This is shown at the bottom of the column labeled Network Input.

The tensor now has shape (6, 10). The first three rows correspond to the samples of the first query in the batch and the last three to the samples of the second query. The first 5 columns contain the features of the first document in each group and the last 5 columns those of the second document.
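The reshape from the sampler output to the network input can be checked with a toy tensor of the same shape as in Fig. 2. NumPy is used here for brevity; tf.reshape performs the same row-major reshape.

```python
import numpy as np

batch_size, n_samples, group_size, n_features = 2, 3, 2, 5
sampled = np.arange(batch_size * n_samples * group_size * n_features, dtype=np.float32)
sampled = sampled.reshape(batch_size, n_samples, group_size, n_features)

# Concatenate the features of the documents in each group by reshaping:
# rows are (query, sample) pairs, columns are [doc 0 features | doc 1 features].
network_input = sampled.reshape(batch_size * n_samples, group_size * n_features)
print(network_input.shape)  # (6, 10)

# Row 4 is the second sample of the second query (row = query * n_samples + sample).
assert np.array_equal(network_input[4, :5], sampled[1, 1, 0])
assert np.array_equal(network_input[4, 5:], sampled[1, 1, 1])
```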

After the data is reshaped, it is passed through a standard neural network. The figure shows two hidden layers, but the network can be as deep and wide as you like. As in the MLP model, a batch normalization layer and a parameterized ReLU (PReLU) activation are applied to each hidden layer, with optional dropout afterwards.

The network now outputs a score for each document in a group (i.e., each column of the output) for each of the 6 samples (3 per query and 2 queries in the batch). This output is reshaped to (batch size, number of samples, group size), and then the scores of each document are summed across all samples of its query using tensorflow.scatter_nd() (or averaged, when use_average is set), which gives a final shape of (2, 4): one score for each of the 4 documents in each query.
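The scatter-and-sum step can be sketched in NumPy, where np.add.at accumulates repeated indices in the same way tf.scatter_nd does. The function name aggregate_scores is hypothetical. The optional averaging divides by per-document counts, as use_average is described as doing; giving documents that were never sampled a score of 0 is my own choice.

```python
import numpy as np


def aggregate_scores(scores: np.ndarray, scatter_idx: np.ndarray, n_docs: int,
                     use_average: bool = False) -> np.ndarray:
    """Sum (or average) group scores back onto the documents of each query.

    scores:      (batch, n_samples, group_size) network outputs
    scatter_idx: (batch, n_samples, group_size) original document indices
    Returns a (batch, n_docs) array.
    """
    batch = scores.shape[0]
    totals = np.zeros((batch, n_docs), dtype=scores.dtype)
    counts = np.zeros((batch, n_docs), dtype=scores.dtype)
    rows = np.broadcast_to(np.arange(batch)[:, None, None], scatter_idx.shape)
    np.add.at(totals, (rows, scatter_idx), scores)   # repeated indices accumulate
    np.add.at(counts, (rows, scatter_idx), 1)
    if use_average:
        totals = np.divide(totals, counts, out=np.zeros_like(totals), where=counts > 0)
    return totals


scores = np.ones((2, 3, 2))
scatter_idx = np.array([[[0, 1], [2, 3], [0, 2]],
                        [[3, 1], [0, 2], [1, 3]]])
summed = aggregate_scores(scores, scatter_idx, n_docs=4)   # shape (2, 4)
averaged = aggregate_scores(scores, scatter_idx, n_docs=4, use_average=True)
```

With all scores equal to 1, the summed result simply counts how often each document was sampled, which shows why summing favors frequently sampled documents while averaging does not.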

Adjusted Model

Fig. 6 Adjustments to the standard GSF model.

I was unable to reproduce the results of Ai et al. [Ai2019] with the model as described above. To achieve comparable results, some modifications to the network structure were required (in addition to using a special scaler). These architecture changes are described here.

The primary change occurs at the input to the network. We take the output of the sampler and flatten it so that each document across all samples and queries in the batch is treated as an individual input to the model. This creates a tensor of shape (batch size \(\times\) number of samples \(\times\) group size, number of features), which is (12, 5) for the example in the figure. This input is passed through a standard dense layer with \(D\) units (3 in the figure example), which produces an output tensor of shape (batch size \(\times\) number of samples \(\times\) group size, \(D\)). The features in this reduced-dimensional space are then concatenated by reshaping, as before. The result is the tensor shown in the right column, labeled New Input, which has shape (6, 6) in the figure example. This tensor is then run through the same hidden layers and output process as in the standard model.
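The shape bookkeeping of the adjusted input can be sketched with the figure's sizes, using a random weight matrix in place of the trained shared layer and omitting its activation and batch normalization. In the example run below, the parameter counts in the model summary (shared_input: 8,768 = 136 × 64 + 64; dense: 131,200 = 16 × 64 × 128 + 128) are consistent with \(D = 64\), which appears to be taken from the first entry of --n-units; that reading is inferred from the counts, not from the code.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, n_samples, group_size, n_features, d = 2, 3, 2, 5, 3

sampled = rng.normal(size=(batch_size, n_samples, group_size, n_features))

# 1. Flatten so that every sampled document is an independent row.
docs = sampled.reshape(-1, n_features)                         # (12, 5)

# 2. Shared dense layer: the same weights are applied to every document.
w = rng.normal(size=(n_features, d))
b = np.zeros(d)
reduced = docs @ w + b                                         # (12, 3)

# 3. Concatenate the reduced documents of each group, as in the standard model.
new_input = reduced.reshape(batch_size * n_samples, group_size * d)  # (6, 6)
```

Because the weights are shared, the first hidden layer sees group size \(\times\) \(D\) inputs instead of group size \(\times\) number of features, and every document is projected in the same way regardless of its position in the group.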

Train The Model

Training is performed exactly as in the MLP example. In this case, however, the second element of the \((x, y)\) tuple returned when iterating over the dataset is itself a tuple, as described above, so we now enter the branch where isinstance(y, (tuple, list)) is true.

Test The Model

Testing is also performed exactly as in the MLP example.

Running The Script

A script to train and evaluate the GroupwiseScoringNetwork can be found in the examples package at examples/gsf/mltr30k.py.

usage: examples.gsf.mltr30k.py [-h] --train-file TRAIN_FILE --valid-file
                               VALID_FILE --test-file TEST_FILE
                               --checkpoint-dir CHECKPOINT_DIR
                               [--scaler SCALER SCALER] [--run-eagerly]
                               [--max-epochs MAX_EPOCHS]
                               [--optimizer {adagrad,adam,sgd,nesterov,rmsprop}]
                               [--learning-rate LEARNING_RATE]
                               [--loss {ndcg,bidi_ndcg,softmax,cross_entropy,mse}]
                               [--list-size LIST_SIZE]
                               [--group-size GROUP_SIZE] [--sample-pre-batch]
                               [--multiples MULTIPLES]
                               [--training-batch-size TRAINING_BATCH_SIZE]
                               [--evaluation-batch-size EVALUATION_BATCH_SIZE]
                               [--use-average] [--share-weights]
                               [--n-units N_UNITS [N_UNITS ...]]
                               [--dropout-rate DROPOUT_RATE]
                               [--drop-remainder] [--random-seed RANDOM_SEED]

Named Arguments

--train-file

The training tfrecords file.

--valid-file

The validation tfrecords file.

--test-file

The test tfrecords file.

--checkpoint-dir

The directory where model checkpoints will be saved.

--scaler

This argument requires two parameters. The first is the path to a scaler file created with the build dataset script. The second is the name of the scaler to use. Choose one of: minmax, standard, robust, power.

--run-eagerly

Default: False

--max-epochs

The maximum number of epochs before the training terminates no matter what.

Default: 500

--optimizer

Possible choices: adagrad, adam, sgd, nesterov, rmsprop

Default: “adagrad”

--learning-rate

Default: 0.001

--loss

Possible choices: ndcg, bidi_ndcg, softmax, cross_entropy, mse

Default: “ndcg”

--list-size

The maximum number of documents per query or no maximum if not set.

--group-size

The group size to use.

Default: 16

--sample-pre-batch

If this flag is set then the alternate form of training will be performed where documents are sampled before training.

Default: False

--multiples

The sampling multiplier.

Default: 1

--training-batch-size

Default: 128

--evaluation-batch-size

Default: 128

--use-average

According to the paper, when a document is sampled more than once its scores are summed. When this option is set the scores are averaged over the number of times each document is seen instead.

Default: False

--share-weights

Pass each document through a shared dense layer before concatenating them.

Default: False

--n-units

Default: [64, 32, 16]

--dropout-rate

Default: 0.0

--drop-remainder

This is necessary when using the keras training/eval loops.

Default: False

--random-seed

The random seed to use for sampling query results.

Example Usage

> python examples/gsf/mltr30k.py                    \
    --train-file data/mltr30k/train.tfrecords.gz    \
    --valid-file data/mltr30k/valid.tfrecords.gz    \
    --test-file data/mltr30k/test.tfrecords.gz      \
    --scaler data/mltr30k/train.scalers.db power    \
    --max-epochs 100                                \
    --checkpoint-dir data/mltr30k/models/           \
    --training-batch-size 32                        \
    --evaluation-batch-size 64                      \
    --multiples 3                                   \
    --group-size 16                                 \
    --optimizer adam                                \
    --learning-rate 0.0005                          \
    --n-units 64 128 64 32                          \
    --loss ndcg                                     \
    --use-average                                   \
    --share-weights

tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:09:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.8475GHz coreCount: 20 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 298.32GiB/s
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties:
pciBusID: 0000:0a:00.0 name: GeForce GTX 1060 6GB computeCapability: 6.1
coreClock: 1.7085GHz coreCount: 10 deviceMemorySize: 5.93GiB deviceMemoryBandwidth: 178.99GiB/s
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0, 1
tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3792875000 Hz
tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56083e4c4ea0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:09:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.8475GHz coreCount: 20 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 298.32GiB/s
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0
tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7428 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:09:00.0, compute capability: 6.1)
tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56083eb65340 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1080, Compute Capability 6.1
Removing existing checkpoint directory: data/mltr30k/models/
model_params: {'n_features': 136, 'n_units': [64, 128, 64, 32], 'group_size': 16, 'use_average': True, 'share_weights': True, 'dropout_rate': 0.0}
2020-07-21 17:43:02.726040: W tensorflow/core/kernels/data/cache_dataset_ops.cc:794] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
Model: "groupwise_scoring_network"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
shared_input (Dense)         multiple                  8768
_________________________________________________________________
shared_activation (PReLU)    multiple                  64
_________________________________________________________________
shared_batch_norm (BatchNorm multiple                  256
_________________________________________________________________
dense (Dense)                multiple                  131200
_________________________________________________________________
dense_1 (Dense)              multiple                  8256
_________________________________________________________________
dense_2 (Dense)              multiple                  2080
_________________________________________________________________
p_re_lu (PReLU)              multiple                  128
_________________________________________________________________
p_re_lu_1 (PReLU)            multiple                  64
_________________________________________________________________
p_re_lu_2 (PReLU)            multiple                  32
_________________________________________________________________
batch_normalization (BatchNo multiple                  512
_________________________________________________________________
batch_normalization_1 (Batch multiple                  256
_________________________________________________________________
batch_normalization_2 (Batch multiple                  128
_________________________________________________________________
dense_3 (Dense)              multiple                  528
=================================================================
Total params: 152,272
Trainable params: 151,696
Non-trainable params: 576
_________________________________________________________________
epoch:     1 step:      573 elapsed time:    45.20s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -21.5078 val/ndcg@01:     0.4177 val/ndcg@05:     0.4111 val/ndcg@10:     0.4331 *
epoch:     2 step:     1146 elapsed time:    81.91s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -22.4067 val/ndcg@01:     0.4282 val/ndcg@05:     0.4250 val/ndcg@10:     0.4458 *
epoch:     3 step:     1719 elapsed time:   119.16s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -22.5617 val/ndcg@01:     0.4345 val/ndcg@05:     0.4281 val/ndcg@10:     0.4486 *
epoch:     4 step:     2292 elapsed time:   156.27s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -22.6476 val/ndcg@01:     0.4383 val/ndcg@05:     0.4313 val/ndcg@10:     0.4514 *
epoch:     5 step:     2865 elapsed time:   193.18s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -22.7265 val/ndcg@01:     0.4351 val/ndcg@05:     0.4335 val/ndcg@10:     0.4516 *
epoch:     6 step:     3438 elapsed time:   230.42s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -22.7566 val/ndcg@01:     0.4380 val/ndcg@05:     0.4354 val/ndcg@10:     0.4542 *
epoch:     7 step:     4011 elapsed time:   267.74s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -22.8088 val/ndcg@01:     0.4468 val/ndcg@05:     0.4386 val/ndcg@10:     0.4579 *
epoch:     8 step:     4584 elapsed time:   304.80s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -22.8386 val/ndcg@01:     0.4412 val/ndcg@05:     0.4352 val/ndcg@10:     0.4563
epoch:     9 step:     5157 elapsed time:   342.18s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -22.8561 val/ndcg@01:     0.4458 val/ndcg@05:     0.4412 val/ndcg@10:     0.4598 *
epoch:    10 step:     5730 elapsed time:   379.66s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -22.8709 val/ndcg@01:     0.4493 val/ndcg@05:     0.4384 val/ndcg@10:     0.4573
epoch:    11 step:     6303 elapsed time:   416.99s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -22.8965 val/ndcg@01:     0.4489 val/ndcg@05:     0.4387 val/ndcg@10:     0.4580
epoch:    12 step:     6876 elapsed time:   454.44s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -22.9092 val/ndcg@01:     0.4517 val/ndcg@05:     0.4417 val/ndcg@10:     0.4595 *
epoch:    13 step:     7449 elapsed time:   491.64s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -22.9131 val/ndcg@01:     0.4561 val/ndcg@05:     0.4414 val/ndcg@10:     0.4604
epoch:    14 step:     8022 elapsed time:   528.96s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -22.9286 val/ndcg@01:     0.4504 val/ndcg@05:     0.4423 val/ndcg@10:     0.4607 *
epoch:    15 step:     8595 elapsed time:   566.26s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -22.9458 val/ndcg@01:     0.4554 val/ndcg@05:     0.4431 val/ndcg@10:     0.4631 *
epoch:    16 step:     9168 elapsed time:   603.61s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -22.9637 val/ndcg@01:     0.4552 val/ndcg@05:     0.4434 val/ndcg@10:     0.4609 *
epoch:    17 step:     9741 elapsed time:   641.08s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -22.9777 val/ndcg@01:     0.4489 val/ndcg@05:     0.4412 val/ndcg@10:     0.4616
epoch:    18 step:    10314 elapsed time:   678.55s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -22.9831 val/ndcg@01:     0.4467 val/ndcg@05:     0.4411 val/ndcg@10:     0.4605
epoch:    19 step:    10887 elapsed time:   716.10s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -22.9864 val/ndcg@01:     0.4514 val/ndcg@05:     0.4421 val/ndcg@10:     0.4620
epoch:    20 step:    11460 elapsed time:   753.60s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -22.9827 val/ndcg@01:     0.4567 val/ndcg@05:     0.4441 val/ndcg@10:     0.4628 *
epoch:    21 step:    12033 elapsed time:   790.94s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.0061 val/ndcg@01:     0.4557 val/ndcg@05:     0.4447 val/ndcg@10:     0.4639 *
epoch:    22 step:    12606 elapsed time:   828.12s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.0245 val/ndcg@01:     0.4535 val/ndcg@05:     0.4420 val/ndcg@10:     0.4603
epoch:    23 step:    13179 elapsed time:   865.73s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.0198 val/ndcg@01:     0.4506 val/ndcg@05:     0.4414 val/ndcg@10:     0.4607
epoch:    24 step:    13752 elapsed time:   903.17s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.0212 val/ndcg@01:     0.4553 val/ndcg@05:     0.4450 val/ndcg@10:     0.4627 *
epoch:    25 step:    14325 elapsed time:   940.71s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.0330 val/ndcg@01:     0.4550 val/ndcg@05:     0.4454 val/ndcg@10:     0.4637 *
epoch:    26 step:    14898 elapsed time:   978.04s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.0471 val/ndcg@01:     0.4421 val/ndcg@05:     0.4403 val/ndcg@10:     0.4606
epoch:    27 step:    15471 elapsed time:  1015.58s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.0571 val/ndcg@01:     0.4523 val/ndcg@05:     0.4412 val/ndcg@10:     0.4624
epoch:    28 step:    16044 elapsed time:  1052.86s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.0566 val/ndcg@01:     0.4589 val/ndcg@05:     0.4458 val/ndcg@10:     0.4668 *
epoch:    29 step:    16617 elapsed time:  1090.18s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.0674 val/ndcg@01:     0.4542 val/ndcg@05:     0.4455 val/ndcg@10:     0.4652
epoch:    30 step:    17190 elapsed time:  1127.45s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1005 val/ndcg@01:     0.4583 val/ndcg@05:     0.4475 val/ndcg@10:     0.4664 *
epoch:    31 step:    17763 elapsed time:  1164.63s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.0791 val/ndcg@01:     0.4508 val/ndcg@05:     0.4433 val/ndcg@10:     0.4625
epoch:    32 step:    18336 elapsed time:  1201.79s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.0695 val/ndcg@01:     0.4574 val/ndcg@05:     0.4471 val/ndcg@10:     0.4663
epoch:    33 step:    18909 elapsed time:  1239.13s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.0790 val/ndcg@01:     0.4548 val/ndcg@05:     0.4447 val/ndcg@10:     0.4640
epoch:    34 step:    19482 elapsed time:  1276.69s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.0878 val/ndcg@01:     0.4585 val/ndcg@05:     0.4471 val/ndcg@10:     0.4657
epoch:    35 step:    20055 elapsed time:  1314.24s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.0940 val/ndcg@01:     0.4522 val/ndcg@05:     0.4449 val/ndcg@10:     0.4642
epoch:    36 step:    20628 elapsed time:  1351.75s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1033 val/ndcg@01:     0.4575 val/ndcg@05:     0.4456 val/ndcg@10:     0.4635
epoch:    37 step:    21201 elapsed time:  1388.93s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1133 val/ndcg@01:     0.4551 val/ndcg@05:     0.4461 val/ndcg@10:     0.4655
epoch:    38 step:    21774 elapsed time:  1426.12s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1212 val/ndcg@01:     0.4509 val/ndcg@05:     0.4434 val/ndcg@10:     0.4639
epoch:    39 step:    22347 elapsed time:  1463.43s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1215 val/ndcg@01:     0.4505 val/ndcg@05:     0.4435 val/ndcg@10:     0.4630
epoch:    40 step:    22920 elapsed time:  1500.58s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1082 val/ndcg@01:     0.4566 val/ndcg@05:     0.4462 val/ndcg@10:     0.4648
epoch:    41 step:    23493 elapsed time:  1538.32s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1498 val/ndcg@01:     0.4610 val/ndcg@05:     0.4465 val/ndcg@10:     0.4653
epoch:    42 step:    24066 elapsed time:  1576.44s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1505 val/ndcg@01:     0.4568 val/ndcg@05:     0.4449 val/ndcg@10:     0.4648
epoch:    43 step:    24639 elapsed time:  1614.07s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1335 val/ndcg@01:     0.4582 val/ndcg@05:     0.4466 val/ndcg@10:     0.4643
epoch:    44 step:    25212 elapsed time:  1652.21s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1488 val/ndcg@01:     0.4500 val/ndcg@05:     0.4412 val/ndcg@10:     0.4620
epoch:    45 step:    25785 elapsed time:  1690.21s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1272 val/ndcg@01:     0.4515 val/ndcg@05:     0.4443 val/ndcg@10:     0.4646
epoch:    46 step:    26358 elapsed time:  1727.89s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1308 val/ndcg@01:     0.4507 val/ndcg@05:     0.4460 val/ndcg@10:     0.4667
epoch:    47 step:    26931 elapsed time:  1765.27s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1637 val/ndcg@01:     0.4558 val/ndcg@05:     0.4475 val/ndcg@10:     0.4670 *
epoch:    48 step:    27504 elapsed time:  1803.30s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1494 val/ndcg@01:     0.4546 val/ndcg@05:     0.4489 val/ndcg@10:     0.4665 *
epoch:    49 step:    28077 elapsed time:  1840.98s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1480 val/ndcg@01:     0.4587 val/ndcg@05:     0.4470 val/ndcg@10:     0.4660
epoch:    50 step:    28650 elapsed time:  1878.74s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1638 val/ndcg@01:     0.4598 val/ndcg@05:     0.4479 val/ndcg@10:     0.4670
epoch:    51 step:    29223 elapsed time:  1916.80s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1759 val/ndcg@01:     0.4526 val/ndcg@05:     0.4451 val/ndcg@10:     0.4652
epoch:    52 step:    29796 elapsed time:  1954.50s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1610 val/ndcg@01:     0.4459 val/ndcg@05:     0.4427 val/ndcg@10:     0.4630
epoch:    53 step:    30369 elapsed time:  1992.27s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1555 val/ndcg@01:     0.4526 val/ndcg@05:     0.4452 val/ndcg@10:     0.4642
epoch:    54 step:    30942 elapsed time:  2030.26s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1905 val/ndcg@01:     0.4589 val/ndcg@05:     0.4485 val/ndcg@10:     0.4661
epoch:    55 step:    31515 elapsed time:  2068.36s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1795 val/ndcg@01:     0.4537 val/ndcg@05:     0.4492 val/ndcg@10:     0.4664 *
epoch:    56 step:    32088 elapsed time:  2106.37s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1818 val/ndcg@01:     0.4606 val/ndcg@05:     0.4505 val/ndcg@10:     0.4678 *
epoch:    57 step:    32661 elapsed time:  2144.35s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1969 val/ndcg@01:     0.4654 val/ndcg@05:     0.4513 val/ndcg@10:     0.4688 *
epoch:    58 step:    33234 elapsed time:  2182.22s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1861 val/ndcg@01:     0.4624 val/ndcg@05:     0.4493 val/ndcg@10:     0.4688
epoch:    59 step:    33807 elapsed time:  2220.20s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1890 val/ndcg@01:     0.4536 val/ndcg@05:     0.4464 val/ndcg@10:     0.4661
epoch:    60 step:    34380 elapsed time:  2257.95s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1850 val/ndcg@01:     0.4638 val/ndcg@05:     0.4503 val/ndcg@10:     0.4693
epoch:    61 step:    34953 elapsed time:  2296.06s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2022 val/ndcg@01:     0.4596 val/ndcg@05:     0.4495 val/ndcg@10:     0.4680
epoch:    62 step:    35526 elapsed time:  2334.07s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2031 val/ndcg@01:     0.4556 val/ndcg@05:     0.4491 val/ndcg@10:     0.4692
epoch:    63 step:    36099 elapsed time:  2372.29s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1950 val/ndcg@01:     0.4529 val/ndcg@05:     0.4444 val/ndcg@10:     0.4644
epoch:    64 step:    36672 elapsed time:  2410.00s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1935 val/ndcg@01:     0.4569 val/ndcg@05:     0.4483 val/ndcg@10:     0.4670
epoch:    65 step:    37245 elapsed time:  2448.36s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1870 val/ndcg@01:     0.4623 val/ndcg@05:     0.4488 val/ndcg@10:     0.4673
epoch:    66 step:    37818 elapsed time:  2487.12s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1914 val/ndcg@01:     0.4627 val/ndcg@05:     0.4503 val/ndcg@10:     0.4688
epoch:    67 step:    38391 elapsed time:  2525.15s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2347 val/ndcg@01:     0.4592 val/ndcg@05:     0.4509 val/ndcg@10:     0.4679
epoch:    68 step:    38964 elapsed time:  2563.05s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2116 val/ndcg@01:     0.4583 val/ndcg@05:     0.4496 val/ndcg@10:     0.4686
epoch:    69 step:    39537 elapsed time:  2601.08s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2063 val/ndcg@01:     0.4622 val/ndcg@05:     0.4514 val/ndcg@10:     0.4704 *
epoch:    70 step:    40110 elapsed time:  2639.05s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2217 val/ndcg@01:     0.4595 val/ndcg@05:     0.4494 val/ndcg@10:     0.4682
epoch:    71 step:    40683 elapsed time:  2677.31s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2311 val/ndcg@01:     0.4574 val/ndcg@05:     0.4485 val/ndcg@10:     0.4678
epoch:    72 step:    41256 elapsed time:  2715.56s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2395 val/ndcg@01:     0.4545 val/ndcg@05:     0.4508 val/ndcg@10:     0.4677
epoch:    73 step:    41829 elapsed time:  2753.74s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2422 val/ndcg@01:     0.4582 val/ndcg@05:     0.4496 val/ndcg@10:     0.4688
epoch:    74 step:    42402 elapsed time:  2792.18s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2230 val/ndcg@01:     0.4591 val/ndcg@05:     0.4498 val/ndcg@10:     0.4679
epoch:    75 step:    42975 elapsed time:  2830.58s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2334 val/ndcg@01:     0.4536 val/ndcg@05:     0.4465 val/ndcg@10:     0.4668
epoch:    76 step:    43548 elapsed time:  2868.62s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2297 val/ndcg@01:     0.4511 val/ndcg@05:     0.4444 val/ndcg@10:     0.4645
epoch:    77 step:    44121 elapsed time:  2906.85s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2135 val/ndcg@01:     0.4578 val/ndcg@05:     0.4496 val/ndcg@10:     0.4667
epoch:    78 step:    44694 elapsed time:  2945.05s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2309 val/ndcg@01:     0.4651 val/ndcg@05:     0.4519 val/ndcg@10:     0.4691 *
epoch:    79 step:    45267 elapsed time:  2983.11s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2346 val/ndcg@01:     0.4586 val/ndcg@05:     0.4495 val/ndcg@10:     0.4684
epoch:    80 step:    45840 elapsed time:  3021.44s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2531 val/ndcg@01:     0.4606 val/ndcg@05:     0.4510 val/ndcg@10:     0.4702
epoch:    81 step:    46413 elapsed time:  3059.58s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2571 val/ndcg@01:     0.4567 val/ndcg@05:     0.4494 val/ndcg@10:     0.4688
epoch:    82 step:    46986 elapsed time:  3097.57s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2322 val/ndcg@01:     0.4635 val/ndcg@05:     0.4503 val/ndcg@10:     0.4682
epoch:    83 step:    47559 elapsed time:  3135.57s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2514 val/ndcg@01:     0.4578 val/ndcg@05:     0.4488 val/ndcg@10:     0.4678
epoch:    84 step:    48132 elapsed time:  3173.87s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2400 val/ndcg@01:     0.4616 val/ndcg@05:     0.4495 val/ndcg@10:     0.4682
epoch:    85 step:    48705 elapsed time:  3211.88s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2550 val/ndcg@01:     0.4639 val/ndcg@05:     0.4528 val/ndcg@10:     0.4707 *
epoch:    86 step:    49278 elapsed time:  3249.82s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2399 val/ndcg@01:     0.4573 val/ndcg@05:     0.4495 val/ndcg@10:     0.4694
epoch:    87 step:    49851 elapsed time:  3288.27s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2528 val/ndcg@01:     0.4597 val/ndcg@05:     0.4496 val/ndcg@10:     0.4682
epoch:    88 step:    50424 elapsed time:  3327.27s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2565 val/ndcg@01:     0.4568 val/ndcg@05:     0.4500 val/ndcg@10:     0.4680
epoch:    89 step:    50997 elapsed time:  3366.02s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2689 val/ndcg@01:     0.4593 val/ndcg@05:     0.4503 val/ndcg@10:     0.4697
epoch:    90 step:    51570 elapsed time:  3404.73s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2637 val/ndcg@01:     0.4552 val/ndcg@05:     0.4485 val/ndcg@10:     0.4684
epoch:    91 step:    52143 elapsed time:  3443.13s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2742 val/ndcg@01:     0.4586 val/ndcg@05:     0.4494 val/ndcg@10:     0.4698
epoch:    92 step:    52716 elapsed time:  3481.31s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2537 val/ndcg@01:     0.4577 val/ndcg@05:     0.4512 val/ndcg@10:     0.4700
epoch:    93 step:    53289 elapsed time:  3519.28s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2758 val/ndcg@01:     0.4622 val/ndcg@05:     0.4510 val/ndcg@10:     0.4707
epoch:    94 step:    53862 elapsed time:  3557.19s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2798 val/ndcg@01:     0.4569 val/ndcg@05:     0.4505 val/ndcg@10:     0.4684
epoch:    95 step:    54435 elapsed time:  3595.72s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2692 val/ndcg@01:     0.4557 val/ndcg@05:     0.4519 val/ndcg@10:     0.4698
epoch:    96 step:    55008 elapsed time:  3633.94s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2911 val/ndcg@01:     0.4603 val/ndcg@05:     0.4503 val/ndcg@10:     0.4695
epoch:    97 step:    55581 elapsed time:  3672.35s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2798 val/ndcg@01:     0.4549 val/ndcg@05:     0.4485 val/ndcg@10:     0.4684
epoch:    98 step:    56154 elapsed time:  3710.39s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2695 val/ndcg@01:     0.4570 val/ndcg@05:     0.4519 val/ndcg@10:     0.4710
epoch:    99 step:    56727 elapsed time:  3748.52s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2724 val/ndcg@01:     0.4557 val/ndcg@05:     0.4494 val/ndcg@10:     0.4678
epoch:   100 step:    57300 elapsed time:  3786.81s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2981 val/ndcg@01:     0.4616 val/ndcg@05:     0.4518 val/ndcg@10:     0.4703
Loading checkpoint from: data/mltr30k/models/
test/ndcg@01:     0.4638 test/ndcg@05:     0.4563 test/ndcg@10:     0.4752
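The log does not say what the trailing `*` means. In this excerpt, every starred epoch is one where `val/ndcg@05` matched or beat its best value so far, and no unstarred epoch does. That suggests the trainer keeps a checkpoint selected on that metric and reloads it (the `Loading checkpoint` line) before the test evaluation, but this is an inference from the numbers, not documented behavior. The hypothetical helper `mark_new_best` below is a minimal sketch of that selection rule. It uses `>=` because, at the four decimals printed, epoch 47 only ties epoch 30 (0.4475); the real code may instead compare unrounded values with `>`.

```python
from typing import List


def mark_new_best(scores: List[float]) -> List[bool]:
    """Flag each epoch whose score ties or beats every earlier score.

    Hypothetical reconstruction of the '*' marker, not the library's code.
    """
    best = float("-inf")
    marks = []
    for score in scores:
        is_best = score >= best  # '>=' so a tie at printed precision still counts
        if is_best:
            best = score
        marks.append(is_best)
    return marks


# val/ndcg@05 for epochs 14-30, copied from the log above.
ndcg5 = [0.4423, 0.4431, 0.4434, 0.4412, 0.4411, 0.4421, 0.4441, 0.4447,
         0.4420, 0.4414, 0.4450, 0.4454, 0.4403, 0.4412, 0.4458, 0.4455, 0.4475]
starred = [epoch for epoch, m in zip(range(14, 31), mark_new_best(ndcg5)) if m]
print(starred)  # → [14, 15, 16, 20, 21, 24, 25, 28, 30]
```

For epochs 14 to 30 this reproduces exactly the starred lines in the log.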