DRecPy.Evaluation.Processes package

DRecPy.Evaluation.Processes.predictive_evaluation module

DRecPy.Evaluation.Processes.predictive_evaluation.predictive_evaluation(model, ds_test=None, count_none_predictions=False, n_test_predictions=None, skip_errors=True, **kwds)

Executes a predictive evaluation process, where the given model will be evaluated under the provided settings.

Parameters:
  • model – An instance of a Recommender to be evaluated.
  • ds_test – An optional test InteractionDataset. If none is provided, then the test data will be the model training data. Evaluating on train data is not ideal for assessing the model’s performance.
  • count_none_predictions – An optional boolean indicating whether to count none predictions (i.e. when the model predicts None, count it as being a 0) or not (i.e. skip that user-item pair). Default: False.
  • n_test_predictions – An optional integer representing the number of predictions to evaluate. Default: predict for every (user, item) pair on the test dataset.
  • skip_errors – A boolean indicating whether to ignore errors produced during the predict calls, or not. Default: True.
  • metrics – An optional list containing instances of PredictiveMetricABC. Default: [RMSE(), MSE()].
  • verbose – A boolean indicating whether state logs should be produced or not. Default: True.
Returns:

A dict containing each metric name mapping to the corresponding metric value.
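Conceptually, this process predicts each test (user, item) pair and aggregates the errors with the chosen metrics. The following is a minimal pure-Python sketch of that loop, illustrating how count_none_predictions and skip_errors affect the result; the StubModel and tuple-based test set are illustrative stand-ins, not the actual DRecPy Recommender or InteractionDataset interfaces:

```python
from math import sqrt

def predictive_eval_sketch(model, ds_test, count_none_predictions=False,
                           skip_errors=True):
    """Predict every test (user, item) pair and aggregate squared errors."""
    squared_errors = []
    for user, item, interaction in ds_test:
        try:
            pred = model.predict(user, item)
        except Exception:
            if skip_errors:
                continue  # ignore errors produced during predict calls
            raise
        if pred is None:
            if not count_none_predictions:
                continue  # skip this user-item pair
            pred = 0  # count a None prediction as being a 0
        squared_errors.append((pred - interaction) ** 2)
    mse = sum(squared_errors) / len(squared_errors)
    return {'MSE': mse, 'RMSE': sqrt(mse)}

# Constant-prediction stub standing in for a trained Recommender.
class StubModel:
    def predict(self, user, item):
        return 3.0

print(predictive_eval_sketch(StubModel(), [(1, 10, 4.0), (2, 20, 2.0)]))
# → {'MSE': 1.0, 'RMSE': 1.0}
```

Note that with count_none_predictions=True, a None prediction is scored as 0, which can heavily penalize the error metrics for sparse models.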

DRecPy.Evaluation.Processes.ranking_evaluation module

DRecPy.Evaluation.Processes.ranking_evaluation.ranking_evaluation(model, ds_test=None, n_test_users=None, k=10, n_pos_interactions=None, n_neg_interactions=None, generate_negative_pairs=False, novelty=False, seed=0, max_concurrent_threads=4, **kwds)

Executes a ranking evaluation process, where the given model will be evaluated under the provided settings. This function is not thread-safe (i.e. concurrent calls might produce unexpected results). Instead of making concurrent calls, increase the max_concurrent_threads argument to speed up the process (if you have the available cores).

Parameters:
  • model – An instance of a Recommender to be evaluated.
  • ds_test – An optional test InteractionDataset. If none is provided, then the test data will be the model training data. Evaluating on train data is not ideal for assessing the model’s performance.
  • n_test_users – An optional integer representing the number of users to evaluate the produced rankings. Default: Number of unique users of the provided test dataset.
  • k – An optional integer (or a list of integers) representing the truncation factor (keep the first k elements for each ranked list), which then affects the produced metric evaluation. Default: 10.
  • n_pos_interactions – The number of positive interactions to sample into the list that is going to be ranked and evaluated for each user. If a given user has fewer than n_pos_interactions positive interactions, that user’s evaluation will be skipped. When this argument is not provided, all of each user’s positive interactions on the test set will be sampled. Default: None.
  • n_neg_interactions – The max. number of negative interactions to sample into the list that is going to be ranked and evaluated for each user. If a float value is provided, the max. number of sampled negative interactions will be that percentage of the positive interactions present on each user’s test set. If this argument is not defined, all negative interactions present on each user’s test set will be sampled. Default: None.
  • generate_negative_pairs – An optional boolean that controls whether negative interaction pairs should also be generated (interaction pairs not present on the train or test data sets are also sampled) or not (i.e. only gathered from the test data set, where interaction values are below the interaction_threshold). If this parameter is set to True, then the number of sampled negative interactions for each user will always match n_neg_interactions. Default: False.
  • interaction_threshold – The interaction value threshold used to consider an interaction positive or negative. All values greater than or equal to interaction_threshold are considered positive, and all values below it are considered negative. Default: model.interaction_threshold.
  • novelty – A boolean indicating whether only novel recommendations should be taken into account or not. Default: False.
  • metrics – An optional list containing instances of RankingMetricABC. Default: [Precision(), Recall(), HitRatio(), NDCG()].
  • max_concurrent_threads – An optional integer representing the max concurrent threads to use. Default: 4.
  • seed – An optional integer representing the seed for the random number generator used to sample positive and negative interaction pairs. Default: 0.
  • verbose – A boolean indicating whether state logs and a final graph should be produced or not. Default: True.
  • block – A boolean indicating whether the displayed graph blocks code execution or not. Note that this graph is only displayed when verbose=True. Default: True.
Returns:

A dict containing each metric name mapping to the corresponding metric value.
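At its core, this process builds a candidate list of sampled positive and negative interactions per user, ranks it by the model’s predictions, truncates it at k, and averages the ranking metrics over users. A minimal sketch of that idea, assuming per-user positive/negative item dicts and a stub predictor (neither of which is the real DRecPy interface):

```python
def ranking_eval_sketch(model, pos_items, neg_items, k=10):
    """Rank each user's sampled positive + negative items by the model's
    predictions, truncate at k, and average Precision@k and Recall@k."""
    precisions, recalls = [], []
    for user, positives in pos_items.items():
        candidates = positives + neg_items.get(user, [])
        ranked = sorted(candidates,
                        key=lambda item: model.predict(user, item),
                        reverse=True)
        top_k = ranked[:k]  # truncation factor k
        hits = sum(1 for item in top_k if item in positives)
        precisions.append(hits / k)
        recalls.append(hits / len(positives))
    n_users = len(pos_items)
    return {f'Precision@{k}': sum(precisions) / n_users,
            f'Recall@{k}': sum(recalls) / n_users}

# Stub that scores lower item ids higher (ranked list: 1, 2, 3, 4).
class StubModel:
    def predict(self, user, item):
        return -item

print(ranking_eval_sketch(StubModel(), {1: [1, 4]}, {1: [2, 3]}, k=2))
# → {'Precision@2': 0.5, 'Recall@2': 0.5}
```

When k is a list of integers, the actual process repeats this truncation step once per k value, which is why each metric name is suffixed with the truncation factor.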

DRecPy.Evaluation.Processes.recommendation_evaluation module

DRecPy.Evaluation.Processes.recommendation_evaluation.recommendation_evaluation(model, ds_test=None, n_test_users=None, k=10, n_pos_interactions=None, novelty=False, ignore_low_predictions_threshold=None, seed=0, max_concurrent_threads=4, **kwds)

Executes a recommendation evaluation process, where the given model will be evaluated under the provided settings. This function is not thread-safe (i.e. concurrent calls might produce unexpected results). Instead of making concurrent calls, increase the max_concurrent_threads argument to speed up the process (if you have the available cores).

Parameters:
  • model – An instance of a Recommender to be evaluated.
  • ds_test – An optional test InteractionDataset. If none is provided, then the test data will be the model training data. Evaluating on train data is not ideal for assessing the model’s performance.
  • n_test_users – An optional integer representing the number of users to evaluate the produced rankings. Defaults to the number of unique users of the provided test dataset.
  • k – An optional integer (or a list of integers) representing the truncation factor (keep the first k elements for each ranked list), which then affects the produced metric evaluation. Default: 10.
  • n_pos_interactions – The number of positive interactions to sample into the list of positive items considered when evaluating the model’s recommendations for each user. If a given user has fewer than n_pos_interactions positive interactions, that user’s evaluation will be skipped. When this argument is not provided, all of each user’s positive interactions on the test set will be sampled. Default: None.
  • interaction_threshold – The interaction value threshold used to consider an interaction positive or negative. All values greater than or equal to interaction_threshold are considered positive, and all values below it are considered negative. Default: model.interaction_threshold.
  • novelty – A boolean indicating whether only novel recommendations should be taken into account or not. Default: False.
  • ignore_low_predictions_threshold – An optional value representing the prediction threshold used to filter the recommendation list: recommended items whose predicted value is greater than or equal to this threshold are kept, and items with lower predictions are skipped (i.e. removed from the recommendation list). Default: None.
  • metrics – An optional list containing instances of RankingMetricABC. Default: [Precision(), Recall(), HitRatio(), NDCG()].
  • max_concurrent_threads – An optional integer representing the max concurrent threads to use. Default: 4.
  • seed – An optional integer representing the seed for the random number generator used to sample positive and negative interaction pairs. Default: 0.
  • verbose – A boolean indicating whether state logs and a final graph should be produced or not. Default: True.
  • block – A boolean indicating whether the displayed graph blocks code execution or not. Note that this graph is only displayed when verbose=True. Default: True.
Returns:

A dict containing each metric name mapping to the corresponding metric value.
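Unlike the ranking process, which ranks a fixed candidate list, this process asks the model for its top-k recommendations and then scores them against each user’s held-out positive items, optionally discarding low-valued predictions. A minimal sketch of that flow, assuming a stub whose recommend method returns (prediction, item) pairs (an illustrative convention, not necessarily the real DRecPy return shape):

```python
def recommendation_eval_sketch(model, test_positives, k=10,
                               ignore_low_predictions_threshold=None):
    """Fetch each user's top-k recommendations, optionally drop low-valued
    predictions, and average Precision@k against the held-out positives."""
    precisions = []
    for user, positives in test_positives.items():
        recs = model.recommend(user, k)  # list of (prediction, item) pairs
        if ignore_low_predictions_threshold is not None:
            # keep only recommendations whose prediction meets the threshold
            recs = [(pred, item) for pred, item in recs
                    if pred >= ignore_low_predictions_threshold]
        hits = sum(1 for _, item in recs if item in positives)
        precisions.append(hits / k)
    return {f'Precision@{k}': sum(precisions) / len(precisions)}

# Stub returning a fixed recommendation list for every user.
class StubModel:
    def recommend(self, user, k):
        return [(0.9, 1), (0.2, 5)]

print(recommendation_eval_sketch(StubModel(), {1: {1, 2}}, k=2,
                                 ignore_low_predictions_threshold=0.5))
# → {'Precision@2': 0.5}
```

Note that filtering by ignore_low_predictions_threshold can shrink the recommendation list below k, which lowers precision since the denominator stays at k.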