DRecPy.Recommender package

DRecPy.Recommender.cdae module

Implementation of the CDAE model (Collaborative Denoising Auto-Encoder). Paper: Wu, Yao, et al. “Collaborative denoising auto-encoders for top-n recommender systems.” Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. ACM, 2016.

Note: gradients are evaluated for all output units (the output unit selection step discussed in the paper is not done).

class DRecPy.Recommender.cdae.CDAE(hidden_factors=50, corruption_level=0.2, loss='bce', **kwds)

Bases: DRecPy.Recommender.recommender_abc.RecommenderABC

Collaborative Denoising Auto-Encoder (CDAE) recommender model.

Parameters:
  • hidden_factors – An integer defining the number of units for the hidden layer.
  • corruption_level – A decimal value representing the level of corruption to apply to the given interactions / ratings during training.
  • loss – A string that represents the loss function used to optimize the model. Supported: mse, bce. Default: bce.

For more arguments, refer to the base class: DRecPy.Recommender.RecommenderABC.
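
To illustrate what corruption_level controls, the sketch below applies the mask-out/drop-out corruption described in the CDAE paper to one user's interaction vector: each entry is zeroed with probability corruption_level, and surviving entries are scaled by 1 / (1 - corruption_level). The function name corrupt and the use of Python's random module are assumptions made for this example; the library's internal implementation may differ.

```python
import random


def corrupt(interactions, corruption_level=0.2, seed=None):
    """Zero out each entry with probability `corruption_level`.

    Surviving entries are scaled by 1 / (1 - corruption_level) so that the
    expected value of every entry stays the same. Hypothetical helper, not
    part of DRecPy.
    """
    if not 0.0 <= corruption_level < 1.0:
        raise ValueError("corruption_level must be in [0, 1)")
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - corruption_level)
    return [0.0 if rng.random() < corruption_level else x * scale
            for x in interactions]


user_vector = [1.0, 0.0, 1.0, 1.0, 0.0]
print(corrupt(user_vector, corruption_level=0.2, seed=42))
```

With corruption_level=0.2, every non-zero entry of the result is either dropped to 0 or scaled to 1.25 times its original value.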

DRecPy.Recommender.dmf module

Implementation of the DMF model (Deep Matrix Factorization). Paper: Xue, Hong-Jian, et al. “Deep Matrix Factorization Models for Recommender Systems.” IJCAI. 2017.

class DRecPy.Recommender.dmf.DMF(user_factors=None, item_factors=None, use_nce=True, l2_norm_vectors=True, **kwds)

Bases: DRecPy.Recommender.recommender_abc.RecommenderABC

Deep Matrix Factorization (DMF) recommender model.

Parameters:
  • user_factors – A list containing the number of hidden neurons in each layer of the user NN. Default: [64, 32].
  • item_factors – A list containing the number of hidden neurons in each layer of the item NN. Default: [64, 32].
  • use_nce – A boolean indicating whether to use the normalized cross-entropy described in the paper as the loss function, instead of the regular cross-entropy. Default: True.
  • l2_norm_vectors – A boolean indicating whether user and item interaction vectors should be l2 normalized before being used as input for their respective NNs. Default: True.

For more arguments, refer to the base class: DRecPy.Recommender.RecommenderABC.
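
The two boolean options above can be pictured with the following sketch, which l2-normalizes an interaction vector and computes a normalized cross-entropy in which targets are rescaled by the maximum rating, following the formulation in the DMF paper. The helper names, the eps clipping and the averaging over samples (the paper sums) are assumptions made for this example, not the library's code.

```python
import math


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length (zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return list(vector) if norm < eps else [x / norm for x in vector]


def normalized_cross_entropy(y_true, y_pred, max_rating, eps=1e-7):
    """Cross-entropy with targets rescaled to [0, 1] by the maximum rating."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so that log() stays finite
        t = y / max_rating               # rescaled target in [0, 1]
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(y_true)


print(l2_normalize([3.0, 4.0]))
print(normalized_cross_entropy([5.0, 0.0], [0.9, 0.2], max_rating=5.0))
```

Setting every non-zero target to max_rating reduces this loss to the regular binary cross-entropy, which is the alternative selected by use_nce=False.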

DRecPy.Recommender.caser module

Implementation of the Caser model. Paper: Tang, Jiaxi, and Ke Wang. “Personalized top-n sequential recommendation via convolutional sequence embedding.” Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. 2018.

class DRecPy.Recommender.caser.Caser(L=5, T=3, d=50, n_v=4, n_h=16, act_h=<function relu>, act_mlp=<function relu>, dropout_rate=0.5, sort_column='timestamp', **kwds)

Bases: DRecPy.Recommender.recommender_abc.RecommenderABC

Caser recommender model.

Parameters:
  • L – An integer representing the sequence length. Default: 5.
  • T – An integer representing the number of targets. Default: 3.
  • d – An integer representing the number of latent dimensions. Default: 50.
  • n_v – An integer representing the number of vertical filters. Default: 4.
  • n_h – An integer representing the number of horizontal filters. Default: 16.
  • act_h – The activation function used for the horizontal convolutional layer. Default: tf.nn.relu.
  • act_mlp – The activation function used for the dense layer. Default: tf.nn.relu.
  • dropout_rate – The dropout ratio when performing dropout between the convolutional and dense layers. Default: 0.5.
  • sort_column – An optional string representing the name of the column used to sort the sequence records. If none is provided, the natural order (present in the data set) will be preserved. Default: ‘timestamp’.

For more arguments, refer to the base class: DRecPy.Recommender.RecommenderABC.
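
As a rough picture of how L, T and sort_column interact, the sketch below sorts one user's records and slides a window over them, producing input sequences of L items followed by T target items. The function build_sequences and the record layout (dicts with an "item" key) are hypothetical and only serve the example; the library's own sequence construction may differ.

```python
def build_sequences(records, L=5, T=3, sort_column="timestamp"):
    """Return (input_items, target_items) pairs from one user's records.

    `records` is a list of dicts with an "item" key and, optionally, the
    `sort_column` key. When `sort_column` is None the given order is kept.
    """
    if sort_column is not None:
        records = sorted(records, key=lambda r: r[sort_column])
    items = [r["item"] for r in records]
    pairs = []
    # Each window needs L input items plus T target items.
    for start in range(len(items) - L - T + 1):
        pairs.append((items[start:start + L],
                      items[start + L:start + L + T]))
    return pairs


history = [
    {"item": "c", "timestamp": 3},
    {"item": "a", "timestamp": 1},
    {"item": "b", "timestamp": 2},
    {"item": "d", "timestamp": 4},
    {"item": "e", "timestamp": 5},
]
for inputs, targets in build_sequences(history, L=3, T=1):
    print(inputs, "->", targets)
```

A user with fewer than L + T records yields no training pairs in this sketch.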

DRecPy.Recommender.recommender_abc module

class DRecPy.Recommender.recommender_abc.RecommenderABC(**kwds)

Bases: abc.ABC

Base recommender abstract class.

This class implements the skeleton methods required for building a recommender. It provides id-abstraction (handling conversion between raw and internal ids), automatic identifier validation (checking whether a given user/item is known), automatic progress logging, weight updates, per-epoch loss tracking, and support for other features such as epoch callbacks and early stopping. Its structure allows it to be fully extensible, whilst promoting model-specific behavior for improved flexibility. All private methods are called with internal ids only, and all public methods must be called with raw ids only.
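
The id-abstraction idea can be sketched as a two-way mapping between raw ids (as they appear in the data) and dense internal ids. The IdMapper class below is a hypothetical stand-in written for this example; it is not the mechanism the library uses internally.

```python
class IdMapper:
    """Hypothetical helper mapping raw ids to dense internal ids and back."""

    def __init__(self):
        self._raw_to_internal = {}
        self._internal_to_raw = []

    def add(self, raw_id):
        """Register a raw id (if new) and return its internal id."""
        if raw_id not in self._raw_to_internal:
            self._raw_to_internal[raw_id] = len(self._internal_to_raw)
            self._internal_to_raw.append(raw_id)
        return self._raw_to_internal[raw_id]

    def to_internal(self, raw_id):
        """Return the internal id, or None when the raw id is unknown."""
        return self._raw_to_internal.get(raw_id)

    def to_raw(self, internal_id):
        """Return the raw id registered under the given internal id."""
        return self._internal_to_raw[internal_id]


users = IdMapper()
for raw in ["u42", "u7", "u42"]:
    users.add(raw)
print(users.to_internal("u7"), users.to_raw(0), users.to_internal("unknown"))
```

Returning None for an unknown raw id is how this sketch models identifier validation: public methods can check the result before handing internal ids to private methods.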

The following methods are still required to be implemented: _pre_fit(), _sample_batch(), _predict_batch(), _compute_batch_loss() and _predict(). If no trainable variables are set during _pre_fit(), batch training is skipped (useful for non-deep learning models). Trainable variables can be registered via _register_trainable() or _register_trainables().

If the provided trainable variables are of type tf.keras.models.Model or tf.keras.layers.Layer, then the regularizers added when instantiating these variables (e.g. via kernel_regularizer attribute of a layer) will be used to automatically compute the regularization loss. To add a custom regularization loss (or if the implemented model uses tf.Variables), implement self._compute_reg_loss().

Optionally, these methods can be overridden: _rank() and _recommend().

Parameters:
  • verbose – Optional boolean indicating if the recommender should print progress logs or not. Default: True.
  • log_file – Optional boolean indicating if a file containing all produced logs should be created or not. It will be created in the current directory, following the pattern: drecpy__DATE_TIME_RECNAME.log. Default: False.
  • interaction_threshold – An optional number that is used as the boundary interaction value between positive and negative interaction pairs. All values greater than or equal to interaction_threshold are considered positive, and all values below it are considered negative. Default: 0.001.
  • seed – Optional integer representing the seed value for the model pseudo-random number generator. Default: None.

fit(interaction_dataset, epochs=50, batch_size=32, learning_rate=0.001, neg_ratio=5, reg_rate=0.001, copy_dataset=False, **kwds)

Processes the provided dataset, builds the id-abstraction, infers the minimum and maximum interaction values (if not passed through the constructor), and calls _fit() to fit the current model.

Parameters:
  • interaction_dataset – An InteractionDataset instance containing the training data.
  • epochs – Optional number of epochs to train the model. Default: 50.
  • batch_size – Optional number of data points to use for each epoch to train the model. Default: 32.
  • learning_rate – Optional decimal representing the learning rate of the model. Default: 0.001.
  • neg_ratio – Optional integer that represents the number of negative instances for each positive one. Default: 5.
  • reg_rate – Optional decimal representing the model regularization rate. Default: 0.001.
  • epoch_callback_fn – Optional function that is called every epoch_callback_freq epochs with the model at its current state. It receives one argument, the model at its current state, and should return a dict mapping each metric’s name to the corresponding value. The results will be displayed in a graph at the end of the model fit and during the fit process on the logged progress bar description only if verbose is set to True. Default: None.
  • epoch_callback_freq – Optional integer representing the frequency in which the epoch_callback_fn is called. If epoch_callback_fn is not defined, this parameter is ignored. Default: 5 (called every 5 epochs).
  • early_stopping_rule – Optional instance of EarlyStoppingRuleABC used to compute the early stopping epoch. Depending on how the rule works, it might stop training before the total number of epochs is reached. Since this relies on epoch callbacks, the rule is only used if an epoch_callback_fn is defined. Default: None.
  • early_stopping_freq – Optional integer representing the frequency in which the early_stopping_rule is computed. If early_stopping_rule is not defined, this parameter is ignored. Default: 5 (called every 5 epochs).
  • copy_dataset – Optional boolean indicating whether a copy of the given dataset should be made. If set to False, the given dataset instance is used. Default: False.
  • optimizer – Optional instance of a tf/keras optimizer that will be used even if there’s a model specific optimizer. Default: Adam optimizer with the learning rate set with the value provided in the learning_rate argument; if there’s a model specific optimizer (set during the model’s _pre_fit), this default optimizer will not be used.
Returns:

None.
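
The epoch_callback_fn contract described above (one argument, the model; return value, a dict of metric names to values) can be sketched as follows. The held-out triples, the metric name val_rmse and the ConstantModel stand-in are all made up for this example so that it runs without a fitted recommender.

```python
def make_rmse_callback(held_out):
    """Build a callback computing RMSE on (user, item, rating) triples."""
    def epoch_callback_fn(model):
        errors = []
        for user, item, rating in held_out:
            pred = model.predict(user, item, skip_errors=True)
            if pred is not None:  # unknown users/items are skipped
                errors.append((pred - rating) ** 2)
        rmse = (sum(errors) / len(errors)) ** 0.5 if errors else float("nan")
        return {"val_rmse": rmse}
    return epoch_callback_fn


class ConstantModel:
    """Stand-in for a fitted recommender; always predicts the same value."""

    def predict(self, user_id, item_id, skip_errors=False, **kwds):
        return 3.0


callback = make_rmse_callback([(1, 10, 4.0), (2, 20, 2.0)])
print(callback(ConstantModel()))  # {'val_rmse': 1.0}
```

A function built this way would then be handed to fit() through the epoch_callback_fn argument, with epoch_callback_freq controlling how often it runs.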

static load(load_path)

Load/import a saved/exported model.

Parameters: load_path – A string that represents the path to the saved/exported model.
Returns: Recommender model.

predict(user_id, item_id, skip_errors=False, **kwds)

Performs a prediction using the provided user_id and item_id.

Parameters:
  • user_id – An integer representing the raw user id.
  • item_id – An integer representing the raw item id.
  • skip_errors – A boolean that controls whether errors should be suppressed or thrown. Default: False. An example would be calling predict(None, None): if skip_errors is True, it returns None; otherwise it throws an error.
Returns:

A float value representing the predicted interaction for the provided user, item pair, or None if an error occurs and skip_errors is True.
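
The skip_errors behavior can be mimicked with a small stand-in function. The lookup table known_scores and the exception types below are assumptions chosen for this example; the exceptions actually raised by the library are not specified here.

```python
def predict(known_scores, user_id, item_id, skip_errors=False):
    """Stand-in predictor: look up a score, honoring `skip_errors`."""
    try:
        if user_id is None or item_id is None:
            raise ValueError("user_id and item_id must not be None")
        return float(known_scores[(user_id, item_id)])
    except (ValueError, KeyError):
        if skip_errors:
            return None  # swallow the error and signal "no prediction"
        raise


scores = {(1, 10): 4.5}
print(predict(scores, 1, 10))                         # 4.5
print(predict(scores, None, None, skip_errors=True))  # None
```

Callers that pass skip_errors=True need to check for None before using the result in arithmetic.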

rank(user_id, item_ids, novelty=True, skip_invalid_items=True, **kwds)

Ranks the provided item list for the given user and with the requested characteristics.

Parameters:
  • user_id – A string or integer representing the user id.
  • item_ids – A list of strings or integers representing the ids of the items to rank.
  • novelty – Optional boolean indicating whether only novel recommendations should be returned. Default: True.
  • n – Optional integer representing the number of best items to return. Default: len(item_ids).
  • skip_invalid_items – Optional boolean indicating if invalid items should be skipped. If set to False, will throw an exception when one is found. Default: True.
Returns:

A ranked item list in the form of (similarity, item) tuples.
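
The shape of the return value and the effect of n and skip_invalid_items can be sketched with a stand-in scoring function. Here score_fn is a hypothetical callable returning None for unknown items, and raising ValueError for invalid items is an assumption of the example.

```python
def rank(score_fn, user_id, item_ids, n=None, skip_invalid_items=True):
    """Return the top-n (score, item) tuples, best first.

    `score_fn(user_id, item_id)` returns a float, or None for unknown items.
    """
    scored = []
    for item_id in item_ids:
        score = score_fn(user_id, item_id)
        if score is None:
            if skip_invalid_items:
                continue
            raise ValueError(f"invalid item id: {item_id!r}")
        scored.append((score, item_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:len(item_ids) if n is None else n]


table = {"a": 0.2, "b": 0.9, "c": 0.5}
print(rank(lambda user, item: table.get(item), "u1", ["a", "b", "zzz", "c"], n=2))
# [(0.9, 'b'), (0.5, 'c')]
```

Because invalid items are dropped before truncation, the result can contain fewer than n tuples when skip_invalid_items is True.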

recommend(user_id, n=None, novelty=True, interaction_threshold=None, **kwds)

Computes a recommendation list for the given user and with the requested characteristics.

Parameters:
  • user_id – A string or integer representing the user id.
  • n – An integer representing the number of recommended items.
  • novelty – An optional boolean indicating whether only novel recommendations should be returned. Default: True.
  • interaction_threshold – Optional float value that represents the minimum value required to consider an item a useful recommendation. Default: None.
Returns:

A list containing recommendations in the form of (similarity, item) tuples.
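
One plausible reading of how n, novelty and interaction_threshold combine is sketched below. It assumes that novelty means excluding items the user has already interacted with and that the threshold is applied as a lower bound on the predicted score; both are assumptions of this example, as are the scores mapping and the seen_items set.

```python
def recommend(scores, seen_items, n=None, novelty=True,
              interaction_threshold=None):
    """Select recommendations from an {item: predicted_score} mapping."""
    candidates = []
    for item, score in scores.items():
        if novelty and item in seen_items:
            continue  # assumed meaning of novelty: skip already-seen items
        if interaction_threshold is not None and score < interaction_threshold:
            continue  # predicted score too low to count as useful
        candidates.append((score, item))
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return candidates if n is None else candidates[:n]


predicted = {"a": 0.9, "b": 0.4, "c": 0.7, "d": 0.1}
print(recommend(predicted, seen_items={"a"}, n=2))
# [(0.7, 'c'), (0.4, 'b')]
```

With novelty=False the already-seen item "a" would head the list, since it has the highest predicted score.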

save(save_path)

Save/export the current model.

Parameters: save_path – A string that represents the path in which the model will be saved.
Returns: None.