deletor.random package
Submodules
deletor.random.sample module
Documentation
-
class deletor.random.sample.DocumentSampler(sample_size: int, n_samples: Optional[int] = None, multiple: Optional[int] = None, sample_pre_batch: bool = False, pad_value: float = -3.4028235e+38)
Bases: abc.ABC
A base class for various methods of sampling a set of documents from a batch.
-
sample(x: Dict[str, tensorflow.python.framework.ops.Tensor], y: tensorflow.python.framework.ops.Tensor, w=None, **kwargs)
-
-
class deletor.random.sample.IndependentMultiOutputSampler(sample_size: int, n_samples: Optional[int] = None, multiple: Optional[int] = None, sample_pre_batch: bool = False, pad_value: float = -3.4028235e+38)
Bases: deletor.random.sample.DocumentSampler
A sampling method that generates each sample independently of the others and guarantees that no document is included in a sample more than once, as long as the sample_size (i.e., group size) is less than the number of documents. However, documents are not guaranteed to appear with the same frequency, and some documents may be excluded completely.
After applying this sampler, the input data (\(X\)) dictionary will have 3 new entries and the \(y\) value will be a 2-element tuple.
The new \(X\) entries are:
- sample_dense
Contains the sampled documents (the original feature tensor(s) are preserved in the sequential_dense entry).
- scatter_idx
Contains a set of indexes for use with tensorflow.scatter_nd() (or tensorflow.gather_nd()) to map the order of sampled documents back to their original order in the query. This is useful, if not essential, for aggregating the scores of each document in a query.
- document_counts
Contains a tensor that keeps track of how many times each document has been sampled. This is useful if we want to average the scores over documents in a query instead of summing them.
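As a rough illustration of how scatter_idx and document_counts could be used together, the sketch below maps per-sample scores back to per-document scores in plain Python. It mimics the way tensorflow.scatter_nd() sums updates that share an index, but it is not the library's implementation; the function name aggregate_scores and the nested-list index layout are assumptions made for the example.

```python
from typing import List


def aggregate_scores(
    sample_indices: List[List[int]],
    sample_scores: List[List[float]],
    n_documents: int,
    average: bool = True,
) -> List[float]:
    """Map scores of sampled documents back to their original positions.

    sample_indices[s][j] is the original position of the j-th document in
    sample s (a hypothetical stand-in for scatter_idx), and
    sample_scores[s][j] is the score that document received in that sample.
    """
    totals = [0.0] * n_documents
    counts = [0] * n_documents  # plays the role of document_counts
    for indices, scores in zip(sample_indices, sample_scores):
        for doc, score in zip(indices, scores):
            totals[doc] += score  # duplicates accumulate, as in a scatter-add
            counts[doc] += 1
    if not average:
        return totals
    # Documents that were never sampled keep a score of 0.0.
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


# Three samples of size 2 drawn from a query with 4 documents.
indices = [[0, 2], [2, 3], [0, 3]]
scores = [[1.0, 3.0], [5.0, 2.0], [3.0, 4.0]]
summed = aggregate_scores(indices, scores, n_documents=4, average=False)
averaged = aggregate_scores(indices, scores, n_documents=4)
```

Document 1 is never sampled in this toy input, which is the exclusion case the class description warns about. Dividing by the counts keeps documents that happen to be sampled more often from accumulating a larger score.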
- Parameters
sample_size – The number of documents in each sample (i.e., the group size).
n_samples – The number of samples to generate.
multiple – The class generates \(n H_{n}\) samples by default, where \(H_{n}\) is the \(n\)-th harmonic number (see the coupon collector’s problem). Set this parameter to multiply that default number of samples.
sample_pre_batch – Whether the sampler assumes the input has been padded and batched already or not. You almost certainly want this to be False.
pad_value – The value used to pad entries in the tensor (from tensorflow.data.Dataset.padded_batch()).
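For a sense of scale, the \(n H_{n}\) default can be computed directly. The sketch below is a back-of-the-envelope calculation, not the class's code: it assumes \(n\) is the number of items to be covered, that the result is rounded up, and that multiple scales it linearly. The function name default_n_samples is hypothetical.

```python
import math
from typing import Optional


def harmonic(n: int) -> float:
    """Return the n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


def default_n_samples(n: int, multiple: Optional[int] = None) -> int:
    """Estimate multiple * n * H_n, rounded up (the rounding is an assumption)."""
    factor = 1 if multiple is None else multiple
    return math.ceil(factor * n * harmonic(n))


expected_draws = 10 * harmonic(10)  # about 29.29
baseline = default_n_samples(10)
doubled = default_n_samples(10, multiple=2)
```

In the coupon collector's problem, \(n H_{n}\) is the expected number of uniform random draws needed to see each of \(n\) items at least once, which makes it a natural baseline when individual samples cannot guarantee coverage.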
-
sample_after_batching(x: Dict[str, tensorflow.python.framework.ops.Tensor], y: tensorflow.python.framework.ops.Tensor, w=None, **kwargs)
Each sample is generated independently of the other samples. Each sample is guaranteed to contain unique documents; however, not every document is guaranteed to be included in the output, and some documents may appear more (or less) often than others.
- Parameters
x –
y –
w –
- Returns
-
sample_before_batching(x: Dict[str, tensorflow.python.framework.ops.Tensor], y: tensorflow.python.framework.ops.Tensor, w=None, **kwargs)
Each sample is generated independently of the other samples. Each sample is guaranteed to contain unique documents; however, not every document is guaranteed to be included in the output, and some documents may appear more (or less) often than others.
- Parameters
x –
y –
w –
- Returns
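The independent sampling described for sample_after_batching and sample_before_batching can be illustrated with the standard library alone. This is a minimal sketch of the idea (each sample drawn without replacement, samples drawn independently of one another), not the TensorFlow code these methods run; independent_samples is a hypothetical name, and capping the sample size at the number of documents is an assumption of the sketch.

```python
import random
from typing import List, Optional


def independent_samples(
    n_documents: int,
    sample_size: int,
    n_samples: int,
    seed: Optional[int] = None,
) -> List[List[int]]:
    """Draw n_samples groups of document positions, each without replacement."""
    rng = random.Random(seed)
    # Assumption: never ask for more unique documents than exist.
    size = min(sample_size, n_documents)
    # Each call to rng.sample is independent of the previous ones.
    return [rng.sample(range(n_documents), size) for _ in range(n_samples)]


samples = independent_samples(n_documents=6, sample_size=3, n_samples=4, seed=0)
covered = {doc for sample in samples for doc in sample}
missing = set(range(6)) - covered  # may be non-empty: coverage is not guaranteed
```

Within any one sample the positions are distinct, but nothing ties one sample to the next, so a document can appear in several samples or in none.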
-
class deletor.random.sample.IndependentSingleOutputSampler(sample_size: int, n_samples: Optional[int] = None, multiple: Optional[int] = None, sample_pre_batch: bool = False, pad_value: float = -3.4028235e+38)
Bases: deletor.random.sample.IndependentMultiOutputSampler
A sampling method that generates each sample independently of the others and guarantees that no document is included in a sample more than once, as long as the sample_size (i.e., group size) is less than the number of documents. However, documents are not guaranteed to appear with the same frequency, and some documents may be excluded completely.
After applying this sampler, the input data (\(X\)) dictionary will have 3 new entries and the \(y\) value will be a 2-element tuple.
The new \(X\) entries are:
- sample_dense
Contains the sampled documents (the original feature tensor(s) are preserved in the sequential_dense entry).
- scatter_idx
Contains a set of indexes for use with tensorflow.scatter_nd() (or tensorflow.gather_nd()) to map the order of sampled documents back to their original order in the query. This is useful, if not essential, for aggregating the scores of each document in a query.
- document_counts
Contains a tensor that keeps track of how many times each document has been sampled. This is useful if we want to average the scores over documents in a query instead of summing them.
- Parameters
sample_size – The number of documents in each sample (i.e., the group size).
n_samples – The number of samples to generate.
multiple – The class generates \(n H_{n}\) samples by default, where \(H_{n}\) is the \(n\)-th harmonic number (see the coupon collector’s problem). Set this parameter to multiply that default number of samples.
sample_pre_batch – Whether the sampler assumes the input has been padded and batched already or not. You almost certainly want this to be False.
pad_value – The value used to pad entries in the tensor (from tensorflow.data.Dataset.padded_batch()).
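The default pad_value matches the most negative finite float32 value, so it is unlikely to collide with a real feature. One way padded documents might be told apart from real ones is sketched below in plain Python; the helper real_document_mask and the rule that a row is padding only when every feature equals pad_value are assumptions of this example, not behaviour documented for the sampler.

```python
from typing import List

PAD_VALUE = -3.4028235e+38  # default pad_value from the signature above


def real_document_mask(
    features: List[List[float]], pad_value: float = PAD_VALUE
) -> List[bool]:
    """Return True for rows holding a real document, False for padding rows.

    Assumption: a row is padding only when every feature equals pad_value.
    """
    return [any(value != pad_value for value in row) for row in features]


# A query padded from 2 documents to 4 (two features per document).
query = [
    [0.3, 1.2],
    [0.9, 0.1],
    [PAD_VALUE, PAD_VALUE],
    [PAD_VALUE, PAD_VALUE],
]
mask = real_document_mask(query)
n_real = sum(mask)
```

A mask like this gives the number of real documents in a query, which is the quantity the sample_size condition in the class description refers to.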