Deletor: Deep Learning To Rank

Deletor is a toolkit for developing learning-to-rank algorithms using deep neural networks. It is similar to tensorflow_ranking (and borrows some code from it), but it is implemented using TensorFlow 2, which is generally easier to work with.

A few reasons you might want to use this instead of tensorflow_ranking:

  • It is written using TensorFlow 2, which is easier to use than TensorFlow 1 in general.

  • Writing, training and evaluating models is done using standard TensorFlow and Keras functionality, instead of complicated, hard-to-follow callback mechanisms.

  • Because everything is based on standard TensorFlow/Keras functionality, it integrates with TensorBoard without extra effort.

  • It achieves state-of-the-art or near state-of-the-art performance for neural network ranking methods.

Despite these advantages, there are also a lot of caveats:

  • It is smaller in scope.

  • It is missing significant functionality.

  • It is in a state of flux and contains dead or broken code written for earlier versions of the API, which has since changed.

  • It is not particularly well documented.

  • It is developed by a single developer as a hobby (whose area of expertise is not information retrieval) who does not have the resources to thoroughly test and document everything.

  • Gradient boosting machines achieve substantially better results on this task (at least for this dataset).

With that said, there are four primary building blocks for implementing a ranking system with this toolkit:

  1. Prepare the data (this is probably the hardest part).

  2. Create a Keras Model that produces scores for each document.

  3. Use a loss function from the deletor.losses module to train the model.

  4. Use a set of metric functions from the deletor.metrics module to evaluate the model.
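Step 1 depends heavily on the dataset, and the exact input format deletor expects is not described here. As a minimal sketch, assuming a common convention (tensorflow_ranking, for example, treats negative labels as invalid entries), the function below groups (query_id, label, features) rows by query and pads or truncates each query to a fixed list size, marking padded slots with the label -1. The names `group_and_pad` and `PAD_LABEL` are hypothetical and not part of deletor.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical convention: labels < 0 mark padded (non-existent) documents.
PAD_LABEL = -1.0


def group_and_pad(
    rows: Sequence[Tuple[str, float, List[float]]],
    list_size: int,
) -> Tuple[List[str], List[List[List[float]]], List[List[float]]]:
    """Group (query_id, label, features) rows by query, then pad or truncate.

    Returns (query_ids, features, labels) where features[q] has list_size
    rows of num_features values and labels[q] has list_size entries.
    """
    groups: Dict[str, List[Tuple[float, List[float]]]] = defaultdict(list)
    num_features: Optional[int] = None
    for qid, label, feats in rows:
        if num_features is None:
            num_features = len(feats)
        elif len(feats) != num_features:
            raise ValueError("every row must have the same number of features")
        groups[qid].append((label, list(feats)))

    query_ids, features, labels = [], [], []
    for qid, docs in groups.items():  # dicts keep first-seen query order
        docs = docs[:list_size]  # truncate long lists (keeps input order)
        pad = list_size - len(docs)
        query_ids.append(qid)
        # Padded slots get all-zero feature vectors and the padding label.
        features.append([f for _, f in docs] + [[0.0] * num_features for _ in range(pad)])
        labels.append([lab for lab, _ in docs] + [PAD_LABEL] * pad)
    return query_ids, features, labels
```

The nested lists can then be converted to arrays of shape [num_queries, list_size, num_features] and [num_queries, list_size] before being passed to a Keras model. Because truncation keeps documents in input order, long lists may lose relevant documents; sorting or sampling each query's documents first may be preferable.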

Training and evaluation can be performed using a custom loop or via the built-in Keras methods.

Performance

The performance of the three primary models implemented in this toolkit, trained with an approximate NDCG loss, is given in the table below. These results are in line with the values reported for current state-of-the-art neural network ranking algorithms.

Model    NDCG@1    NDCG@5    NDCG@10
MLP      45.96     44.30     45.31
GSF      46.38     45.63     47.52
GASF     44.46     43.97     45.94
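The NDCG@k values in the table (which appear to be scaled by 100) and the approximate NDCG loss can be illustrated with a small pure-Python sketch for a single query. It assumes the common exponential gain 2^rel - 1 with a log2(rank + 1) discount, treats negative labels as padding, and replaces each document's hard rank with the smooth sigmoid estimate used by ApproxNDCG-style losses (tensorflow_ranking provides a loss of this kind). The function names and the `temperature` parameter are illustrative assumptions; deletor.metrics and deletor.losses may differ in gain, tie handling, and batching.

```python
import math
from typing import Optional, Sequence


def _gain(label: float) -> float:
    # Exponential gain 2^rel - 1 (a common choice, assumed here).
    return 2.0 ** label - 1.0


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def _ideal_dcg(labels: Sequence[float], k: int) -> float:
    ideal = sorted(labels, reverse=True)[:k]
    return sum(_gain(l) / math.log2(rank + 1) for rank, l in enumerate(ideal, start=1))


def ndcg_at_k(scores: Sequence[float], labels: Sequence[float], k: Optional[int] = None) -> float:
    """Exact NDCG@k for one query; labels < 0 are treated as padding."""
    docs = [(s, l) for s, l in zip(scores, labels) if l >= 0]
    k = len(docs) if k is None else k
    ideal = _ideal_dcg([l for _, l in docs], k)
    if ideal == 0.0:
        return 0.0  # no relevant documents (conventions vary)
    ranked = sorted(docs, key=lambda d: d[0], reverse=True)[:k]
    dcg = sum(_gain(l) / math.log2(rank + 1) for rank, (_, l) in enumerate(ranked, start=1))
    return dcg / ideal


def approx_ndcg_loss(
    scores: Sequence[float], labels: Sequence[float], temperature: float = 0.1
) -> float:
    """Negative approximate NDCG over the whole list, using smooth ranks."""
    docs = [(s, l) for s, l in zip(scores, labels) if l >= 0]
    ideal = _ideal_dcg([l for _, l in docs], len(docs))
    if ideal == 0.0:
        return 0.0
    dcg = 0.0
    for i, (s_i, l_i) in enumerate(docs):
        # approx_rank_i = 1 + sum over j != i of sigmoid((s_j - s_i) / T)
        approx_rank = 1.0 + sum(
            _sigmoid((s_j - s_i) / temperature)
            for j, (s_j, _) in enumerate(docs)
            if j != i
        )
        dcg += _gain(l_i) / math.log2(1.0 + approx_rank)
    return -dcg / ideal
```

As the temperature shrinks, each sigmoid term approaches 0 or 1, the approximate ranks approach the true ranks, and the negative loss approaches the exact (uncut) NDCG. Larger temperatures give a smoother but less faithful surrogate, which is what makes the loss usable with gradient-based training.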

Dependencies

The primary dependencies are:

  • sklearn

  • TensorFlow

For a more complete list of dependencies see the requirements.txt file.

Install

The project is not currently on PyPI. The easiest way to use it is to clone the repository, or to install it with pip using pip's native Git support:

pip install git+https://bitbucket.org/reidswanson/deletor.git
