Example 1: Multi-Layer Perceptron

Architecture

../_images/mlp-architecture.svg

Fig. 1 Model architecture for the SimpleScoringNetwork.

The first example will demonstrate how to use the SimpleScoringNetwork. The architecture of this model is shown in Fig. 1 above. In the simplest case, excluding all other features except dense sequential ones, the __call__ function takes inputs of shape (batch size, number of documents, number of features). In the example above the batch size is 2 (queries). Each query has 3 documents and there are 5 features (dense sequence features only).

The core of the neural network is shown in the center column. It has a standard 2 dimensional input layer (batch size, number of features), which is derived from the raw input by reshaping it to (batch size \(\times\) number of document, number of features). The reshaped input is then passed through several hidden layers (2 are shown here, but it is easily configurable) and then to an output layer. Each hidden layer applies batch normalization and a parameterized ReLU activation afterwards. After the activation dropout can optionally be performed. This output is a column vector that has a single score for each document for all the queries in the batch. Before returning the output it is reshaped so that it has the final shape of (batch size, number of documents), so that each row represents a query and each column represents the score for the corresponding document.

Overview

There are 5 basic steps in this example:

  1. Load the initial dataset.

  2. Apply any preprocessing steps.

  3. Configure the neural network model (i.e., the scoring function).

  4. Train the model.

  5. Evaluate the model.

Load The Data

Loading the data can be accomplished by following the steps outlined in the Load The Data page.

Preprocessing

There are a few additional preprocessing steps we will apply for this model.

  1. We will filter out any queries that only have a single document and any queries that do not have at least one document with a relevancy score greater than 0.

  2. We will optionally truncate the document list for each query so that it contains no more than some threshold.

  3. We will ensure that the order of documents for each query are shuffled before each epoch.

  4. We will aggregate individual instances into mini-batches.

Filter Documents

The basic method for filtering documents is provided in the snippet below.

def is_valid_query(x, y):
    # Get the number of documents for this query example
    n_docs = tf.shape(y)[0]

    # Get the total sum of the relevance values for all documents in
    # the query.
    n_rel = tf.math.reduce_sum(tf.cast(tf.math.greater(y, 0), tf.float32))

    # Keep only queries that have more than 1 document and at least one
    # of those documents has a relevance greater than 0.
    return n_docs > 1 and n_rel > 0

dataset = dataset.filter(is_valid_query)

Truncate Documents

While debugging it might be useful to reduce the maximum number of documents a query can have. The method below can be used to keep only some of the documents by truncating the list at some threshold.

def truncate_document_list(x, y, list_size: int):
    # In our data we only have dense sequental features. If we had other sequential features
    # we might need to repeat this process for those as well.
    dense_features = x['sequence_dense']
    n_docs = tf.shape(dense_features)[0]
    keep_docs = tf.minimum(n_docs, list_size)

    x['sequence_dense'] = dense_features[:keep_docs]
    y = y[:keep_docs]

    return x, y

# Simply apply the above function to every query in the dataset using
# tf.Dataset.map. Instead of using partial function or closure I use a
# lambda function instead.
dataset = dataset.map(lambda x, y: truncate_document_list(x, y, list_size))

Shuffle The Queries

Shuffling the training data can be done using tensorflow.data.Dataset.shuffle(). In order to handle large datasets that don’t fit into memory the shuffle method maintains a pool of instances in a buffer that does fit in memory. It then returns instances from this pool in a random order. The larger the buffer is the closer the returned samples will be to a truely random distribution.

# The BUFFER_SIZE variable should be some relatively large number.
# For this dataset the entire dataset could probably be loaded into memory.
dataset = dataset.shuffle(BUFFER_SIZE, reshuffle_each_iteration=True)

Shuffle The Documents

TODO

Batching The Data

Batching the data can also be easily accomplished using tensorflow.data.Dataset.padded_batch().

The only tricky part of using this function is defining the shapes and values parameters correctly. The methods make_padded_shapes() and make_padding_values() will return the appropriate values. The basic code is presented below. The actual code is slightly more complicated to handle some experimental cases I was playing with.

# The padded_shapes parameter must be a tuple that has the exact same structure as what is
# returned by the Dataset object when iterating over it. 'sequence_dense` is the only value
# that is actually used, but the others must be included (with an empty shape). In theory
# you can simply use a python list to specify the shape (and do not need to use a
# tf.TensorShape). However, I was having problems getting the Keras training
# (i.e., model.fit) and evaluation functions to work and eventually got them working
# after using tf.TensorShape. However, it is unclear whether it was using a TensorShape
# or explicitly specifying the number of features (n_features) that the solution. I suspect
# it was making sure that n_features was set explicitly and not None, although I have not
# had a chance to test this yet. When n_features is None, will get a cryptic error about
# how the last dimension of a tensor cannot be None, but it will not tell you which tensor
# the error applies to. After a lot of trial and error, it appears to be here, when defining
# the padded shapes. The list_size size, however, can and usually will be None.
padded_shapes = (
    {
        'context_meta_qid': (),
        'context_one_hot': (),
        'context_multi_hot': (),
        'context_dense': (),
        'sequence_one_hot': (),
        'sequence_multi_hot': (),
        'sequence_dense': tf.TensorShape([list_size, n_features])
    },
    tf.TensorShape([list_size])
)

# The padded_values tuple tells tensorflow what number to use to pad the values.
# MIN_FLOAT is a constant defined elsewhere that is unlikely to ever occur in real data.
padded_values = (
    {
        'context_meta_qid': tf.cast(0, tf.int64),
        'context_one_hot': 0.,
        'context_multi_hot': 0.,
        'context_dense': 0.,
        'sequence_one_hot': 0.,
        'sequence_multi_hot': 0.,
        'sequence_dense': 0.
    },
    MIN_FLOAT_32
)

Putting It Together

The full data loading and preprocessing is implemented in the prepare_data() function. It is reproduced below with some additional comments.

def prepare_data(
    args: argparse.Namespace,
    list_size: Optional[int],
    train_bsz: int,
    eval_bsz: int,
    drop_remainder: bool = False
):
    # Load each dataset and indicate which scaler to use if any.
    train_data = load_dataset(args.train_file, args.scaler)
    valid_data = load_dataset(args.valid_file, args.scaler)
    test_data = load_dataset(args.test_file, args.scaler)

    # Remove queries that don't have at least one non-zero relevancy rating
    train_data = train_data.filter(is_valid_query)
    valid_data = valid_data.filter(is_valid_query)
    test_data = test_data.filter(is_valid_query)

    # Keep only the first ``list_size`` number of documents if we specify a maximum document
    # limit.
    if list_size:
        train_data = train_data.map(lambda x, y: truncate_document_list(x, y, list_size))
        valid_data = valid_data.map(lambda x, y: truncate_document_list(x, y, list_size))
        test_data = test_data.map(lambda x, y: truncate_document_list(x, y, list_size))

    # Cache the datasets at this point. Each time the dataset is restarted we will
    # load all the steps up to this point from memory instead of performing them from scratch.
    train_data = train_data.cache()
    valid_data = valid_data.cache()
    test_data = test_data.cache()

    # Steps after this point will be performed anew each time the dataset is iterated over.
    # Shuffle the order of the documents in each query.
    train_data = train_data.map(lambda x, y: shuffle_documents(x, y))

    # Buffer 1000 documents and draw from this buffer randomly when iterating over the
    # dataset.
    train_data = train_data.shuffle(1000, seed=args.random_seed, reshuffle_each_iteration=True)

    # Define the data structures used to padd the sequences when batching.
    padded_shapes = make_padded_shapes(list_size)
    padding_values = make_padding_values()

    # Now pad and batch.
    train_data = train_data.padded_batch(
        train_bsz, padded_shapes, padding_values, drop_remainder
    )
    valid_data = valid_data.padded_batch(
        eval_bsz, padded_shapes, padding_values, drop_remainder
    )
    test_data = test_data.padded_batch(
        eval_bsz, padded_shapes, padding_values, drop_remainder
    )

    # Specify that we want to prefetch the data into the GPU if possible for additional
    # performance.
    train_data = train_data.prefetch(AUTOTUNE)
    valid_data = valid_data.prefetch(AUTOTUNE)
    test_data = test_data.prefetch(AUTOTUNE)

    return train_data, valid_data, test_data

Setup The Model

This example will use the SimpleScoringNetwork (a basic multi-layer perceptron model) as the scoring function. This model accepts 4 parameters as defined by the ModelParameter class. These are:

  • list_size

    An optional integer (or None for no maximum) that indicates the maximum number of documents a query can have.

  • n_units

    A list of integers that specifies the number of hidden units in each layer (the size of the list determines the number of layers).

  • dropout_rate

    An optional float or list of floats specifying the dropout rate for each layer. If it is a single value then the dropout rate will be the same across all layers. If it is a list it must be the same length as n_units. If it is None then dropout will not be used.

  • random_seed

    An optional integer that can be used to seed random number generation.

The model is created and setup in the function setup_model().

def setup_model(model_params: Dict[str, Any]):
    # Construct the model
    model = SimpleScoringNetwork(model_params)

    # The optimizer is hard coded, but could easily be changed.
    # Ai et al. and Bruch et al. use Adagrad, but any modern optimizer should work with
    # appropriate hyperparameters.
    optimizer = tf.keras.optimizers.Adagrad(0.0075)

    # Other loss functions are possible, but this one works well. If you use Keras instead
    # of the custom training loop, reduce should set to False.
    loss = ApproximateNormalizedDiscountedCumulativeGain(reduce=True)

    # These are the metrics used in the referenced papers, but others are also possible to
    # add here.
    metrics = [
        NormalizedDiscountedCumulativeGain(k=1),
        NormalizedDiscountedCumulativeGain(k=5),
        NormalizedDiscountedCumulativeGain(k=10),
    ]
    model.compile(optimizer=optimizer, loss=loss, metrics=metrics)

    return model

Train The Model

The model can be trained either using Keras or a custom training loop. In this example we use a custom training loop. The high level structure of the loop is implemented in the main() function. However, this loop utilizes two utility methods implemented in the functions train() and evaluate().

Training

The train function is responsible for training the model and is shown below (with some minor omissions for clarity). It is annotated with @tf.function in order to maximize performance by telling tensorflow to automatically convert the function into a computational graph (See Better performance with tf.function).

@tf.function
def train(model: tf.keras.Model, train_data: tf.data.Dataset, train_meta: Dict[str, Any]):
    """
    Train a model with the given training data.

    :param model: The model to train.
    :param train_data: The training data.
    :param train_meta: A dictionary containing variables to store the results in.
    """
    # This is a variable used for experimental behavior. For all the given examples we
    # will assume this is False.
    sample_pre_batch = train_meta['sample_pre_batch']

    # Get the optimizer and loss function that was assigned to the model when we set it up.
    optimizer = model.optimizer
    loss_fn = model.loss

    # Loop through each instance of the training data. Note `x` here is actually a
    # dictionary as described in the Data Preparation guide.
    for x, y in train_data:
        # In some examples we might modify the target data and return a tuple, so we
        # check for that here.
        if isinstance(y, (tuple, list)):
            y, y_sample = y
            y_true = y_sample if sample_pre_batch else y
        else:
            # This will be the path taken for this example.
            y_true = y

        with tf.GradientTape() as tape:
            # Perform the forward pass by running the input through the network.
            train_scores = model(x, training=True)

            # Calculate the loss
            train_loss = loss_fn(y_true, train_scores)

            # Keep track of the losses in a variable.
            train_meta['train_loss'].update_state([train_loss])

        # Compute the gradients using the gradient tape and update the weights using
        # the optimizer.
        weights = model.trainable_weights
        grads = tape.gradient(train_loss, weights)
        optimizer.apply_gradients(zip(grads, weights))

Evaluating

The evaluate function is used to apply a trained model on held out data. It can be used both for validation (in the main training loop) or for test data after training is completed.

@tf.function
def evaluate(net: tf.keras.Model, eval_data: tf.data.Dataset, train_meta: Dict[str, Any]):
    metric_fns = dict()

    # The train_meta dictionary is used to store the results of the evaluation under the
    # 'metric' key. This maps from an integer ``k``, which is used to limit the NDCG to the
    # top k results, to a Keras data structure used to incrementally update the mean of a
    # target value.

    # We first construct the function objects for our metrics, which is the NDCG at various
    # values of ``k``.
    # Then we reset the Keras accumulator.
    for k, metric in train_meta['metrics'].items():
        metric_fns[k] = lwmetrics.NormalizedDiscountedCumulativeGain(k=k)
        metric.reset_states()

    # Now we loop through each data point in the evaluation dataset.
    for x, y in eval_data:
        if isinstance(y, (tuple, list)):
            y, y_sample = y

        # We get the predictions by passing the input data through the model.
        # Setting training to False ensures that any dropout layers are not
        # used and any batch normalization layers are used in prediction mode.
        y_pred = net(x, training=False)

        # Finally we compute the metric value for each NDCG function we defined above
        # and increment the mean value (metric.update_state).
        for k, metric in train_meta['metrics'].items():
            metric.update_state(metric_fns[k](y, y_pred))

Training Loop

With the train and evaluate functions defined above, the core of the main loop is shown below.

# train_meta is a dictionary of tensorflow variables that allow us to track
# statistics about the training process and return the metrics on the
# validation data.

# The checkpoint is used for saving the model. We'll only save it when the ndcg@5
# improves on the validation dataset.
ckpt = tf.train.Checkpoint(epoch=tf.Variable(0), optimizer=model.optimizer, model=model)
manager = tf.train.CheckpointManager(ckpt, args.checkpoint_dir, max_to_keep=1)

# Loop for some number of epochs.
for epoch in range(train_meta['max_epochs']):
    # Train the model
    train(model, train_data, train_meta)

    # Calculate the evaluation metrics on the validation dataset.
    evaluate(model, valid_data, train_meta)

    # Check to see if this is the best result so far and save the model if it is.
    if train_meta['metrics'][5].result() > train_meta['best_result']:
        manager.save()
        train_meta['best_result'].assign(train_meta['metrics'][5].result())

Test The Model

Testing the model is easily accomplisehd the the evaluate method.

# First we load the checkpointed model from disk.
ckpt = tf.train.Checkpoint(epoch=tf.Variable(0), model=model, optimizer=model.optimizer)
ckpt.restore(tf.train.latest_checkpoint(args.checkpoint_dir))

# Then we simply evaluate the model on some other held out test data.

evaluate(model, test_data, train_meta)

# We can retrieve (and print) the metric scores from the ``train_meta`` dictionary.
log.info(
    f"test/ndcg@01: {train_meta['metrics'][1].result():10.4f} "
    f"test/ndcg@05: {train_meta['metrics'][5].result():10.4f} "
    f"test/ndcg@10: {train_meta['metrics'][10].result():10.4f}"
)

Running The Script

A script to train and evaluate the SimpleScoringNetwork can be found in the examples package here.

usage: examples.mlp.mltr30k.py [-h] --train-file TRAIN_FILE --valid-file
                               VALID_FILE --test-file TEST_FILE --max-epochs
                               MAX_EPOCHS --checkpoint-dir CHECKPOINT_DIR
                               [--scaler SCALER SCALER]
                               [--random-seed RANDOM_SEED]

Named Arguments

--train-file

The training tfrecords file.

--valid-file

The validation tfrecords file.

--test-file

The test tfrecords file.

--max-epochs

The number of epochs to train for.

--checkpoint-dir

The directory where model checkpoints will be saved.

--scaler

This argument requires two parameters. The first is the path to a scaler file created with the build dataset script. The second is the name of the scaler to use. Choose one of: minmax, standard, robust, power.

--random-seed

The random seed to use for sampling query results.

Example Usage

>python examples/mlp/mltr30k.py                     \
    --train-file data/mltr30k/train.tfrecords.gz    \
    --valid-file data/mltr30k/valid.tfrecords.gz    \
    --test-file data/mltr30k/test.tfrecords.gz      \
    --scaler data/mltr30k/train.scalers.db power    \
    --max-epochs 100                                \
    --checkpoint-dir data/mltr30k/models/mlp
Successfully opened dynamic library libcuda.so.1
successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Found device 0 with properties:
pciBusID: 0000:09:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.8475GHz coreCount: 20 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 298.32GiB/s
successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Found device 1 with properties:
pciBusID: 0000:0a:00.0 name: GeForce GTX 1060 6GB computeCapability: 6.1
coreClock: 1.7085GHz coreCount: 10 deviceMemorySize: 5.93GiB deviceMemoryBandwidth: 178.99GiB/s
Successfully opened dynamic library libcudart.so.10.1
Successfully opened dynamic library libcublas.so.10
Successfully opened dynamic library libcufft.so.10
Successfully opened dynamic library libcurand.so.10
Successfully opened dynamic library libcusolver.so.10
Successfully opened dynamic library libcusparse.so.10
Successfully opened dynamic library libcudnn.so.7
successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Adding visible gpu devices: 0, 1
CPU Frequency: 3792880000 Hz
XLA service 0x5644b92a4940 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
StreamExecutor device (0): Host, Default Version
successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Found device 0 with properties:
pciBusID: 0000:09:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.8475GHz coreCount: 20 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 298.32GiB/s
successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Found device 1 with properties:
pciBusID: 0000:0a:00.0 name: GeForce GTX 1060 6GB computeCapability: 6.1
coreClock: 1.7085GHz coreCount: 10 deviceMemorySize: 5.93GiB deviceMemoryBandwidth: 178.99GiB/s
Successfully opened dynamic library libcudart.so.10.1
Successfully opened dynamic library libcublas.so.10
Successfully opened dynamic library libcufft.so.10
Successfully opened dynamic library libcurand.so.10
Successfully opened dynamic library libcusolver.so.10
Successfully opened dynamic library libcusparse.so.10
Successfully opened dynamic library libcudnn.so.7
successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Adding visible gpu devices: 0, 1
Successfully opened dynamic library libcudart.so.10.1
Device interconnect StreamExecutor with strength 1 edge matrix:
0 1
0:   N N
1:   N N
successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7526 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:09:00.0, compute capability: 6.1)
successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 750 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1060 6GB, pci bus id: 0000:0a:00.0, compute capability: 6.1)
XLA service 0x5644b9e632f0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
StreamExecutor device (0): GeForce GTX 1080, Compute Capability 6.1
StreamExecutor device (1): GeForce GTX 1060 6GB, Compute Capability 6.1
2020-07-15 15:07:42,363 [mlp/mltr30k]:114 INFO Removing existing checkpoint directory: data/mltr30k/models/mlp
The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
Successfully opened dynamic library libcublas.so.10
Model: "simple_scoring_network"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                multiple                  35072
_________________________________________________________________
dense_1 (Dense)              multiple                  32896
_________________________________________________________________
dense_2 (Dense)              multiple                  8256
_________________________________________________________________
dense_3 (Dense)              multiple                  2080
_________________________________________________________________
dense_4 (Dense)              multiple                  528
_________________________________________________________________
p_re_lu (PReLU)              multiple                  256
_________________________________________________________________
p_re_lu_1 (PReLU)            multiple                  128
_________________________________________________________________
p_re_lu_2 (PReLU)            multiple                  64
_________________________________________________________________
p_re_lu_3 (PReLU)            multiple                  32
_________________________________________________________________
p_re_lu_4 (PReLU)            multiple                  16
_________________________________________________________________
batch_normalization (BatchNo multiple                  1024
_________________________________________________________________
batch_normalization_1 (Batch multiple                  512
_________________________________________________________________
batch_normalization_2 (Batch multiple                  256
_________________________________________________________________
batch_normalization_3 (Batch multiple                  128
_________________________________________________________________
batch_normalization_4 (Batch multiple                  64
_________________________________________________________________
dense_5 (Dense)              multiple                  17
=================================================================
Total params: 81,329
Trainable params: 80,337
Non-trainable params: 992
_________________________________________________________________
epoch:     1 step:      572 elapsed time:   55.70s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -22.3153 val/ndcg@01:     0.4346 val/ndcg@05:     0.4204 val/ndcg@10:     0.4339 *
epoch:     2 step:     1144 elapsed time:   65.12s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -22.7247 val/ndcg@01:     0.4354 val/ndcg@05:     0.4200 val/ndcg@10:     0.4330
epoch:     3 step:     1716 elapsed time:   74.71s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -22.8346 val/ndcg@01:     0.4420 val/ndcg@05:     0.4249 val/ndcg@10:     0.4382 *
epoch:     4 step:     2288 elapsed time:   84.11s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -22.8933 val/ndcg@01:     0.4410 val/ndcg@05:     0.4240 val/ndcg@10:     0.4374
epoch:     5 step:     2860 elapsed time:   93.55s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -22.9284 val/ndcg@01:     0.4395 val/ndcg@05:     0.4240 val/ndcg@10:     0.4372
epoch:     6 step:     3432 elapsed time:  102.87s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -22.9926 val/ndcg@01:     0.4449 val/ndcg@05:     0.4258 val/ndcg@10:     0.4380 *
epoch:     7 step:     4004 elapsed time:  112.29s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.0165 val/ndcg@01:     0.4373 val/ndcg@05:     0.4222 val/ndcg@10:     0.4347
epoch:     8 step:     4576 elapsed time:  121.80s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.0517 val/ndcg@01:     0.4457 val/ndcg@05:     0.4265 val/ndcg@10:     0.4369 *
epoch:     9 step:     5148 elapsed time:  131.23s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.0593 val/ndcg@01:     0.4507 val/ndcg@05:     0.4302 val/ndcg@10:     0.4413 *
epoch:    10 step:     5720 elapsed time:  141.05s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.0921 val/ndcg@01:     0.4538 val/ndcg@05:     0.4307 val/ndcg@10:     0.4418 *
epoch:    11 step:     6292 elapsed time:  150.60s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1165 val/ndcg@01:     0.4502 val/ndcg@05:     0.4325 val/ndcg@10:     0.4451 *
epoch:    12 step:     6864 elapsed time:  160.13s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1178 val/ndcg@01:     0.4480 val/ndcg@05:     0.4290 val/ndcg@10:     0.4391
epoch:    13 step:     7436 elapsed time:  169.72s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1490 val/ndcg@01:     0.4510 val/ndcg@05:     0.4311 val/ndcg@10:     0.4431
epoch:    14 step:     8008 elapsed time:  179.18s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.1745 val/ndcg@01:     0.4464 val/ndcg@05:     0.4288 val/ndcg@10:     0.4419
epoch:    15 step:     8580 elapsed time:  188.54s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2013 val/ndcg@01:     0.4516 val/ndcg@05:     0.4307 val/ndcg@10:     0.4425
epoch:    16 step:     9152 elapsed time:  197.89s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2076 val/ndcg@01:     0.4500 val/ndcg@05:     0.4312 val/ndcg@10:     0.4422
epoch:    17 step:     9724 elapsed time:  207.25s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2225 val/ndcg@01:     0.4500 val/ndcg@05:     0.4311 val/ndcg@10:     0.4420
epoch:    18 step:    10296 elapsed time:  216.38s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2415 val/ndcg@01:     0.4481 val/ndcg@05:     0.4305 val/ndcg@10:     0.4432
epoch:    19 step:    10868 elapsed time:  225.66s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2323 val/ndcg@01:     0.4468 val/ndcg@05:     0.4275 val/ndcg@10:     0.4384
epoch:    20 step:    11440 elapsed time:  235.05s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2630 val/ndcg@01:     0.4514 val/ndcg@05:     0.4314 val/ndcg@10:     0.4419
epoch:    21 step:    12012 elapsed time:  244.36s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2877 val/ndcg@01:     0.4522 val/ndcg@05:     0.4313 val/ndcg@10:     0.4419
epoch:    22 step:    12584 elapsed time:  253.98s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2917 val/ndcg@01:     0.4547 val/ndcg@05:     0.4338 val/ndcg@10:     0.4458 *
epoch:    23 step:    13156 elapsed time:  263.74s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.2975 val/ndcg@01:     0.4489 val/ndcg@05:     0.4332 val/ndcg@10:     0.4457
epoch:    24 step:    13728 elapsed time:  273.34s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.3056 val/ndcg@01:     0.4498 val/ndcg@05:     0.4330 val/ndcg@10:     0.4431
epoch:    25 step:    14300 elapsed time:  282.71s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.3178 val/ndcg@01:     0.4488 val/ndcg@05:     0.4304 val/ndcg@10:     0.4413
epoch:    26 step:    14872 elapsed time:  292.13s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.3177 val/ndcg@01:     0.4549 val/ndcg@05:     0.4357 val/ndcg@10:     0.4476 *
epoch:    27 step:    15444 elapsed time:  301.58s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.3279 val/ndcg@01:     0.4552 val/ndcg@05:     0.4362 val/ndcg@10:     0.4474 *
epoch:    28 step:    16016 elapsed time:  310.93s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.3334 val/ndcg@01:     0.4536 val/ndcg@05:     0.4335 val/ndcg@10:     0.4455
epoch:    29 step:    16588 elapsed time:  320.14s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.3563 val/ndcg@01:     0.4539 val/ndcg@05:     0.4329 val/ndcg@10:     0.4420
epoch:    30 step:    17160 elapsed time:  329.37s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.3635 val/ndcg@01:     0.4486 val/ndcg@05:     0.4329 val/ndcg@10:     0.4449
epoch:    31 step:    17732 elapsed time:  338.80s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.3730 val/ndcg@01:     0.4526 val/ndcg@05:     0.4337 val/ndcg@10:     0.4434
epoch:    32 step:    18304 elapsed time:  348.49s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.3800 val/ndcg@01:     0.4538 val/ndcg@05:     0.4334 val/ndcg@10:     0.4441
epoch:    33 step:    18876 elapsed time:  358.01s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.3976 val/ndcg@01:     0.4571 val/ndcg@05:     0.4357 val/ndcg@10:     0.4465
epoch:    34 step:    19448 elapsed time:  368.04s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.4098 val/ndcg@01:     0.4560 val/ndcg@05:     0.4372 val/ndcg@10:     0.4487 *
epoch:    35 step:    20020 elapsed time:  377.64s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.4139 val/ndcg@01:     0.4543 val/ndcg@05:     0.4332 val/ndcg@10:     0.4446
epoch:    36 step:    20592 elapsed time:  387.52s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.4102 val/ndcg@01:     0.4525 val/ndcg@05:     0.4319 val/ndcg@10:     0.4434
epoch:    37 step:    21164 elapsed time:  396.94s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.4245 val/ndcg@01:     0.4545 val/ndcg@05:     0.4348 val/ndcg@10:     0.4460
epoch:    38 step:    21736 elapsed time:  406.39s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.4284 val/ndcg@01:     0.4547 val/ndcg@05:     0.4354 val/ndcg@10:     0.4460
epoch:    39 step:    22308 elapsed time:  415.73s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.4539 val/ndcg@01:     0.4532 val/ndcg@05:     0.4338 val/ndcg@10:     0.4449
epoch:    40 step:    22880 elapsed time:  425.23s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.4506 val/ndcg@01:     0.4548 val/ndcg@05:     0.4348 val/ndcg@10:     0.4459
epoch:    41 step:    23452 elapsed time:  434.54s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.4627 val/ndcg@01:     0.4553 val/ndcg@05:     0.4373 val/ndcg@10:     0.4481 *
epoch:    42 step:    24024 elapsed time:  444.01s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.4617 val/ndcg@01:     0.4589 val/ndcg@05:     0.4357 val/ndcg@10:     0.4459
epoch:    43 step:    24596 elapsed time:  453.67s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.4611 val/ndcg@01:     0.4524 val/ndcg@05:     0.4373 val/ndcg@10:     0.4480 *
epoch:    44 step:    25168 elapsed time:  463.36s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.4695 val/ndcg@01:     0.4537 val/ndcg@05:     0.4358 val/ndcg@10:     0.4457
epoch:    45 step:    25740 elapsed time:  472.93s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.4830 val/ndcg@01:     0.4488 val/ndcg@05:     0.4322 val/ndcg@10:     0.4437
epoch:    46 step:    26312 elapsed time:  482.46s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.4952 val/ndcg@01:     0.4538 val/ndcg@05:     0.4350 val/ndcg@10:     0.4458
epoch:    47 step:    26884 elapsed time:  491.67s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.5006 val/ndcg@01:     0.4549 val/ndcg@05:     0.4347 val/ndcg@10:     0.4434
epoch:    48 step:    27456 elapsed time:  501.16s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.5132 val/ndcg@01:     0.4576 val/ndcg@05:     0.4364 val/ndcg@10:     0.4452
epoch:    49 step:    28028 elapsed time:  510.30s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.4997 val/ndcg@01:     0.4517 val/ndcg@05:     0.4339 val/ndcg@10:     0.4442
epoch:    50 step:    28600 elapsed time:  519.86s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.5144 val/ndcg@01:     0.4553 val/ndcg@05:     0.4349 val/ndcg@10:     0.4452
epoch:    51 step:    29172 elapsed time:  529.36s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.5232 val/ndcg@01:     0.4573 val/ndcg@05:     0.4378 val/ndcg@10:     0.4480 *
epoch:    52 step:    29744 elapsed time:  538.90s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.5286 val/ndcg@01:     0.4559 val/ndcg@05:     0.4356 val/ndcg@10:     0.4457
epoch:    53 step:    30316 elapsed time:  548.49s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.5244 val/ndcg@01:     0.4560 val/ndcg@05:     0.4348 val/ndcg@10:     0.4438
epoch:    54 step:    30888 elapsed time:  557.92s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.5391 val/ndcg@01:     0.4554 val/ndcg@05:     0.4349 val/ndcg@10:     0.4462
epoch:    55 step:    31460 elapsed time:  567.47s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.5467 val/ndcg@01:     0.4525 val/ndcg@05:     0.4376 val/ndcg@10:     0.4481
epoch:    56 step:    32032 elapsed time:  576.73s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.5385 val/ndcg@01:     0.4533 val/ndcg@05:     0.4356 val/ndcg@10:     0.4463
epoch:    57 step:    32604 elapsed time:  586.13s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.5534 val/ndcg@01:     0.4554 val/ndcg@05:     0.4358 val/ndcg@10:     0.4445
epoch:    58 step:    33176 elapsed time:  595.55s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.5661 val/ndcg@01:     0.4579 val/ndcg@05:     0.4377 val/ndcg@10:     0.4471
epoch:    59 step:    33748 elapsed time:  604.95s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.5698 val/ndcg@01:     0.4547 val/ndcg@05:     0.4355 val/ndcg@10:     0.4447
epoch:    60 step:    34320 elapsed time:  614.23s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.5676 val/ndcg@01:     0.4545 val/ndcg@05:     0.4370 val/ndcg@10:     0.4467
epoch:    61 step:    34892 elapsed time:  623.53s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.5695 val/ndcg@01:     0.4577 val/ndcg@05:     0.4370 val/ndcg@10:     0.4465
epoch:    62 step:    35464 elapsed time:  632.85s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.5760 val/ndcg@01:     0.4556 val/ndcg@05:     0.4355 val/ndcg@10:     0.4461
epoch:    63 step:    36036 elapsed time:  642.41s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.5866 val/ndcg@01:     0.4545 val/ndcg@05:     0.4362 val/ndcg@10:     0.4454
epoch:    64 step:    36608 elapsed time:  651.84s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.5965 val/ndcg@01:     0.4538 val/ndcg@05:     0.4330 val/ndcg@10:     0.4427
epoch:    65 step:    37180 elapsed time:  661.32s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.6041 val/ndcg@01:     0.4570 val/ndcg@05:     0.4359 val/ndcg@10:     0.4450
epoch:    66 step:    37752 elapsed time:  670.85s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.6030 val/ndcg@01:     0.4563 val/ndcg@05:     0.4359 val/ndcg@10:     0.4435
epoch:    67 step:    38324 elapsed time:  680.68s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.6110 val/ndcg@01:     0.4534 val/ndcg@05:     0.4352 val/ndcg@10:     0.4433
epoch:    68 step:    38896 elapsed time:  690.12s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.6078 val/ndcg@01:     0.4587 val/ndcg@05:     0.4376 val/ndcg@10:     0.4473
epoch:    69 step:    39468 elapsed time:  699.47s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.6219 val/ndcg@01:     0.4570 val/ndcg@05:     0.4378 val/ndcg@10:     0.4486
epoch:    70 step:    40040 elapsed time:  709.03s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.6286 val/ndcg@01:     0.4560 val/ndcg@05:     0.4369 val/ndcg@10:     0.4470
epoch:    71 step:    40612 elapsed time:  718.48s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.6275 val/ndcg@01:     0.4567 val/ndcg@05:     0.4366 val/ndcg@10:     0.4467
epoch:    72 step:    41184 elapsed time:  727.83s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.6394 val/ndcg@01:     0.4535 val/ndcg@05:     0.4355 val/ndcg@10:     0.4454
epoch:    73 step:    41756 elapsed time:  737.13s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.6322 val/ndcg@01:     0.4561 val/ndcg@05:     0.4361 val/ndcg@10:     0.4468
epoch:    74 step:    42328 elapsed time:  746.64s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.6475 val/ndcg@01:     0.4567 val/ndcg@05:     0.4377 val/ndcg@10:     0.4481
epoch:    75 step:    42900 elapsed time:  756.38s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.6417 val/ndcg@01:     0.4575 val/ndcg@05:     0.4386 val/ndcg@10:     0.4490 *
epoch:    76 step:    43472 elapsed time:  766.21s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.6540 val/ndcg@01:     0.4534 val/ndcg@05:     0.4343 val/ndcg@10:     0.4448
epoch:    77 step:    44044 elapsed time:  775.84s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.6434 val/ndcg@01:     0.4586 val/ndcg@05:     0.4369 val/ndcg@10:     0.4463
epoch:    78 step:    44616 elapsed time:  785.16s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.6634 val/ndcg@01:     0.4575 val/ndcg@05:     0.4369 val/ndcg@10:     0.4471
epoch:    79 step:    45188 elapsed time:  794.76s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.6610 val/ndcg@01:     0.4529 val/ndcg@05:     0.4358 val/ndcg@10:     0.4459
epoch:    80 step:    45760 elapsed time:  804.44s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.6525 val/ndcg@01:     0.4567 val/ndcg@05:     0.4363 val/ndcg@10:     0.4456
epoch:    81 step:    46332 elapsed time:  814.03s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.6603 val/ndcg@01:     0.4563 val/ndcg@05:     0.4372 val/ndcg@10:     0.4466
epoch:    82 step:    46904 elapsed time:  823.48s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.6663 val/ndcg@01:     0.4552 val/ndcg@05:     0.4353 val/ndcg@10:     0.4447
epoch:    83 step:    47476 elapsed time:  832.78s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.6595 val/ndcg@01:     0.4569 val/ndcg@05:     0.4363 val/ndcg@10:     0.4463
epoch:    84 step:    48048 elapsed time:  842.22s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.6672 val/ndcg@01:     0.4581 val/ndcg@05:     0.4372 val/ndcg@10:     0.4464
epoch:    85 step:    48620 elapsed time:  852.10s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.6790 val/ndcg@01:     0.4564 val/ndcg@05:     0.4370 val/ndcg@10:     0.4458
epoch:    86 step:    49192 elapsed time:  861.79s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.6808 val/ndcg@01:     0.4540 val/ndcg@05:     0.4354 val/ndcg@10:     0.4450
epoch:    87 step:    49764 elapsed time:  871.32s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.6722 val/ndcg@01:     0.4548 val/ndcg@05:     0.4367 val/ndcg@10:     0.4454
epoch:    88 step:    50336 elapsed time:  880.54s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.6869 val/ndcg@01:     0.4539 val/ndcg@05:     0.4346 val/ndcg@10:     0.4430
epoch:    89 step:    50908 elapsed time:  889.92s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.6948 val/ndcg@01:     0.4560 val/ndcg@05:     0.4362 val/ndcg@10:     0.4451
epoch:    90 step:    51480 elapsed time:  899.43s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.7077 val/ndcg@01:     0.4560 val/ndcg@05:     0.4368 val/ndcg@10:     0.4457
epoch:    91 step:    52052 elapsed time:  909.41s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.7078 val/ndcg@01:     0.4546 val/ndcg@05:     0.4362 val/ndcg@10:     0.4459
epoch:    92 step:    52624 elapsed time:  919.41s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.6988 val/ndcg@01:     0.4546 val/ndcg@05:     0.4358 val/ndcg@10:     0.4450
epoch:    93 step:    53196 elapsed time:  929.34s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.7232 val/ndcg@01:     0.4574 val/ndcg@05:     0.4368 val/ndcg@10:     0.4454
epoch:    94 step:    53768 elapsed time:  939.17s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.7228 val/ndcg@01:     0.4566 val/ndcg@05:     0.4376 val/ndcg@10:     0.4473
epoch:    95 step:    54340 elapsed time:  949.05s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.7144 val/ndcg@01:     0.4554 val/ndcg@05:     0.4365 val/ndcg@10:     0.4473
epoch:    96 step:    54912 elapsed time:  958.67s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.7267 val/ndcg@01:     0.4576 val/ndcg@05:     0.4378 val/ndcg@10:     0.4478
epoch:    97 step:    55484 elapsed time:  967.99s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.7260 val/ndcg@01:     0.4571 val/ndcg@05:     0.4370 val/ndcg@10:     0.4460
epoch:    98 step:    56056 elapsed time:  977.39s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.7383 val/ndcg@01:     0.4521 val/ndcg@05:     0.4351 val/ndcg@10:     0.4449
epoch:    99 step:    56628 elapsed time:  986.71s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.7304 val/ndcg@01:     0.4545 val/ndcg@05:     0.4380 val/ndcg@10:     0.4495
epoch:   100 step:    57200 elapsed time:  996.16s train time:   0.00s secs/step:  0.000 val time:   0.00 train/loss:   -23.7328 val/ndcg@01:     0.4552 val/ndcg@05:     0.4351 val/ndcg@10:     0.4448
test/ndcg@01:     0.4596 test/ndcg@05:     0.4430 test/ndcg@10:     0.4531