dhg.metrics
Basic Metrics
Classification
- dhg.metrics.available_classification_metrics()[source]
Return available metrics for the classification task.
The available metrics are: accuracy, f1_score, confusion_matrix.
- dhg.metrics.classification.accuracy(y_true, y_pred)[source]
Calculate the accuracy score for the classification task.
\[\text{Accuracy} = \frac{1}{N} \sum_{i=1}^{N} \mathcal{I}(y_i, \hat{y}_i),\]
where \(\mathcal{I}(\cdot, \cdot)\) is the indicator function, which is 1 if the two inputs are equal and 0 otherwise, \(y_i\) and \(\hat{y}_i\) are the ground truth and predicted labels for the i-th sample, and \(N\) is the number of samples.
- Parameters
y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, )\).
y_pred (torch.Tensor) – The predicted labels. Size \((N_{samples}, N_{class})\) or \((N_{samples}, )\).
Examples
>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([3, 2, 4])
>>> y_pred = torch.tensor([
...     [0.2, 0.3, 0.5, 0.4, 0.3],
...     [0.8, 0.2, 0.3, 0.5, 0.4],
...     [0.2, 0.4, 0.5, 0.2, 0.8],
... ])
>>> dm.classification.accuracy(y_true, y_pred)
0.3333333432674408
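The accuracy formula can be sketched in plain Python, without tensors. This is only an illustration: the helper name `accuracy_sketch` is hypothetical and not part of DHG, and it assumes that a row of class scores is turned into a label by taking the index of its largest value.

```python
from typing import List, Sequence, Union

Row = Union[int, Sequence[float]]


def accuracy_sketch(y_true: Sequence[int], y_pred: Sequence[Row]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    labels: List[int] = []
    for row in y_pred:
        if isinstance(row, int):
            labels.append(row)  # already a hard label
        else:
            # Assumption: the predicted label is the argmax of the score row.
            labels.append(max(range(len(row)), key=lambda j: row[j]))
    hits = sum(1 for t, p in zip(y_true, labels) if t == p)
    return hits / len(y_true)


y_true = [3, 2, 4]
y_pred = [
    [0.2, 0.3, 0.5, 0.4, 0.3],
    [0.8, 0.2, 0.3, 0.5, 0.4],
    [0.2, 0.4, 0.5, 0.2, 0.8],
]
print(accuracy_sketch(y_true, y_pred))  # one of three samples is correct
```

Only the third sample is predicted correctly (labels 2, 0, 4 against 3, 2, 4), which gives one third, in line with the example above.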
- dhg.metrics.classification.f1_score(y_true, y_pred, average='macro')[source]
Calculate the F1 score for the classification task.
- Parameters
y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, )\).
y_pred (torch.Tensor) – The predicted labels. Size \((N_{samples}, N_{class})\) or \((N_{samples}, )\).
average (str) – The average method. Must be one of "macro", "micro", "weighted".
Examples
>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([3, 2, 4, 0])
>>> y_pred = torch.tensor([
...     [0.2, 0.3, 0.5, 0.4, 0.3],
...     [0.8, 0.2, 0.3, 0.5, 0.4],
...     [0.2, 0.4, 0.5, 0.2, 0.8],
...     [0.8, 0.4, 0.5, 0.2, 0.8]
... ])
>>> dm.classification.f1_score(y_true, y_pred, "macro")
0.41666666666666663
>>> dm.classification.f1_score(y_true, y_pred, "micro")
0.5
>>> dm.classification.f1_score(y_true, y_pred, "weighted")
0.41666666666666663
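To make the three `average` options concrete, here is a plain-Python sketch. It assumes the usual definitions (macro: unweighted mean of per-class F1; weighted: mean weighted by class support; micro: F1 from pooled counts) and takes hard labels rather than score rows. The name `f1_sketch` is hypothetical and the code is not DHG's implementation.

```python
from typing import Sequence


def f1_sketch(y_true: Sequence[int], y_pred: Sequence[int], average: str = "macro") -> float:
    """F1 score over hard labels under the usual averaging conventions."""
    classes = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in classes}
    fp = {c: 0 for c in classes}
    fn = {c: 0 for c in classes}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_: int, fp_: int, fn_: int) -> float:
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    if average == "micro":
        return f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in classes}
    if average == "macro":
        return sum(per_class.values()) / len(classes)
    if average == "weighted":
        support = {c: tp[c] + fn[c] for c in classes}
        return sum(per_class[c] * support[c] for c in classes) / len(y_true)
    raise ValueError("average must be 'macro', 'micro' or 'weighted'")


# Hard labels for the example above: per-row argmax of the scores,
# taking the first maximum in the tied last row (an assumption).
y_true = [3, 2, 4, 0]
y_pred = [2, 0, 4, 0]
for mode in ("macro", "micro", "weighted"):
    print(mode, f1_sketch(y_true, y_pred, mode))
```

Macro and weighted averaging coincide here because every class has exactly one true sample.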
- dhg.metrics.classification.confusion_matrix(y_true, y_pred)[source]
Calculate the confusion matrix for the classification task.
- Parameters
y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, )\).
y_pred (torch.Tensor) – The predicted labels. Size \((N_{samples}, N_{class})\) or \((N_{samples}, )\).
Examples
>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([3, 2, 4, 0])
>>> y_pred = torch.tensor([
...     [0.2, 0.3, 0.5, 0.4, 0.3],
...     [0.8, 0.2, 0.3, 0.5, 0.4],
...     [0.2, 0.4, 0.5, 0.2, 0.8],
...     [0.8, 0.4, 0.5, 0.2, 0.8]
... ])
>>> dm.classification.confusion_matrix(y_true, y_pred)
array([[1, 0, 0, 0],
       [1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 0, 1]])
Recommender
- dhg.metrics.available_recommender_metrics()[source]
Return available metrics for the recommender task.
The available metrics are: precision, recall, and ndcg.
- dhg.metrics.recommender.precision(y_true, y_pred, k=None, ratio=None, ret_batch=False)[source]
Calculate the Precision score for the recommender task.
- Parameters
y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).
ratio (float, optional) – The specified ratio of top-k value. If ratio is not None, k will be ignored. Defaults to None.
ret_batch (bool) – Whether to return the raw score list. Defaults to False.
Examples
>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([0, 1, 0, 0, 1, 1])
>>> y_pred = torch.tensor([0.8, 0.9, 0.6, 0.7, 0.4, 0.5])
>>> dm.recommender.precision(y_true, y_pred, k=2)
0.5
- dhg.metrics.recommender.recall(y_true, y_pred, k=None, ratio=None, ret_batch=False)[source]
Calculate the Recall score for the recommender task.
- Parameters
y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).
ratio (float, optional) – The specified ratio of top-k value. If ratio is not None, k will be ignored. Defaults to None.
ret_batch (bool) – Whether to return the raw score list. Defaults to False.
Examples
>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([0, 1, 0, 0, 1, 1])
>>> y_pred = torch.tensor([0.8, 0.9, 0.6, 0.7, 0.4, 0.5])
>>> dm.recommender.recall(y_true, y_pred, k=5)
0.6666666666666666
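Precision and recall at a top-k cutoff both start from the same ranking of items by predicted score. The sketch below illustrates that for a single sample in plain Python. It assumes binary relevance labels and at least one relevant item; the helper name `precision_recall_at_k` is hypothetical and not part of DHG.

```python
from typing import Sequence, Tuple


def precision_recall_at_k(
    y_true: Sequence[int], y_pred: Sequence[float], k: int
) -> Tuple[float, float]:
    """Return (precision@k, recall@k) for one sample with 0/1 relevance."""
    # sorted() is stable, so tied scores keep their original order.
    order = sorted(range(len(y_pred)), key=lambda i: -y_pred[i])
    hits = sum(y_true[i] for i in order[:k])
    precision = hits / k           # share of the top-k that is relevant
    recall = hits / sum(y_true)    # share of relevant items found in the top-k
    return precision, recall


y_true = [0, 1, 0, 0, 1, 1]
y_pred = [0.8, 0.9, 0.6, 0.7, 0.4, 0.5]
print(precision_recall_at_k(y_true, y_pred, k=2))
print(precision_recall_at_k(y_true, y_pred, k=5))
```

With `k=2` one of the two top-ranked items is relevant (precision 0.5); with `k=5` two of the three relevant items are retrieved (recall two thirds). Both agree with the examples above.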
- dhg.metrics.recommender.ndcg(y_true, y_pred, k=None, ratio=None, ret_batch=False)[source]
Calculate the Normalized Discounted Cumulative Gain (NDCG) for the recommender task.
- Parameters
y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).
ratio (float, optional) – The specified ratio of top-k value. If ratio is not None, k will be ignored. Defaults to None.
ret_batch (bool) – Whether to return the raw score list. Defaults to False.
Examples
>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([10, 0, 0, 1, 5])
>>> y_pred = torch.tensor([.1, .2, .3, 4, 70])
>>> dm.recommender.ndcg(y_true, y_pred)
0.695694088935852
>>> dm.recommender.ndcg(y_true, y_pred, k=3)
0.4123818874359131
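For readers who want to see where these numbers come from, here is a plain-Python sketch of NDCG. It assumes linear gains (the raw relevance values) and a \(\log_2(\text{rank} + 1)\) discount. That choice reproduces the values in the example above, but it is an assumption about the formula, not a statement of how DHG implements it; `ndcg_sketch` is a hypothetical name.

```python
import math
from typing import Optional, Sequence


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain: rank r (1-based) is divided by log2(r + 1)."""
    return sum(g / math.log2(r + 1) for r, g in enumerate(gains, start=1))


def ndcg_sketch(
    y_true: Sequence[float], y_pred: Sequence[float], k: Optional[int] = None
) -> float:
    k = len(y_true) if k is None else k
    # Rank items by predicted score, highest first.
    order = sorted(range(len(y_pred)), key=lambda i: -y_pred[i])
    ranked_gains = [y_true[i] for i in order[:k]]
    # The ideal ranking puts the largest relevance values first.
    ideal_gains = sorted(y_true, reverse=True)[:k]
    return dcg(ranked_gains) / dcg(ideal_gains)


y_true = [10, 0, 0, 1, 5]
y_pred = [0.1, 0.2, 0.3, 4, 70]
print(ndcg_sketch(y_true, y_pred))
print(ndcg_sketch(y_true, y_pred, k=3))
```

The item with relevance 10 receives the lowest predicted score, so it lands at the bottom of the ranking and is cut off entirely when `k=3`, which is why the score drops.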
Retrieval
- dhg.metrics.available_retrieval_metrics()[source]
Return available metrics for the retrieval task.
The available metrics are: precision, recall, map, ndcg, mrr, pr_curve.
- dhg.metrics.retrieval.precision(y_true, y_pred, k=None, ratio=None, ret_batch=False)[source]
Calculate the Precision score for the retrieval task.
- Parameters
y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).
ratio (float, optional) – The specified ratio of top-k value. If ratio is not None, k will be ignored. Defaults to None.
ret_batch (bool) – Whether to return the raw score list. Defaults to False.
Examples
>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([0, 1, 0, 0, 1, 1])
>>> y_pred = torch.tensor([0.8, 0.9, 0.6, 0.7, 0.4, 0.5])
>>> dm.retrieval.precision(y_true, y_pred, k=2)
0.5
- dhg.metrics.retrieval.recall(y_true, y_pred, k=None, ratio=None, ret_batch=False)[source]
Calculate the Recall score for the retrieval task.
- Parameters
y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).
ratio (float, optional) – The specified ratio of top-k value. If ratio is not None, k will be ignored. Defaults to None.
ret_batch (bool) – Whether to return the raw score list. Defaults to False.
Examples
>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([0, 1, 0, 0, 1, 1])
>>> y_pred = torch.tensor([0.8, 0.9, 0.6, 0.7, 0.4, 0.5])
>>> dm.retrieval.recall(y_true, y_pred, k=5)
0.6666666666666666
- dhg.metrics.retrieval.ap(y_true, y_pred, k=None, ratio=None, method='pascal_voc')[source]
Calculate the Average Precision (AP) for the retrieval task.
- Parameters
y_true (torch.Tensor) – A 1-D tensor. Size \((N_{target},)\).
y_pred (torch.Tensor) – A 1-D tensor. Size \((N_{target},)\).
k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).
ratio (float, optional) – The specified ratio of top-k value. If ratio is not None, k will be ignored. Defaults to None.
method (str) – The method used to compute the AP, either legacy or pascal_voc. Defaults to pascal_voc.
Examples
>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([True, False, True])
>>> y_pred = torch.tensor([0.2, 0.3, 0.5])
>>> dm.retrieval.ap(y_true, y_pred, method="legacy")
0.8333333730697632
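Average precision is commonly defined as the mean of the precision values measured at each rank where a relevant item appears. The plain-Python sketch below implements that textbook definition; it agrees with the `legacy` result in the example above. Whether it matches DHG's code in every case is an assumption, and the `pascal_voc` variant is not covered. The name `average_precision_sketch` is hypothetical.

```python
from typing import List, Sequence


def average_precision_sketch(y_true: Sequence[bool], y_pred: Sequence[float]) -> float:
    """Mean of precision@rank over the ranks that hold a relevant item."""
    # Rank items by predicted score, highest first (stable for ties).
    order = sorted(range(len(y_pred)), key=lambda i: -y_pred[i])
    hits = 0
    precisions: List[float] = []
    for rank, i in enumerate(order, start=1):
        if y_true[i]:
            hits += 1
            precisions.append(hits / rank)
    # Convention chosen here: no relevant items means a score of 0.
    return sum(precisions) / len(precisions) if precisions else 0.0


y_true = [True, False, True]
y_pred = [0.2, 0.3, 0.5]
print(average_precision_sketch(y_true, y_pred))
```

The relevant items sit at ranks 1 and 3, so the score is the mean of 1/1 and 2/3.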
- dhg.metrics.retrieval.map(y_true, y_pred, k=None, ratio=None, method='pascal_voc', ret_batch=False)[source]
Calculate the mean Average Precision (mAP) for the retrieval task.
- Parameters
y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).
ratio (float, optional) – The specified ratio of top-k value. If ratio is not None, k will be ignored. Defaults to None.
method (str) – The specified method: legacy or pascal_voc. Defaults to pascal_voc.
ret_batch (bool) – Whether to return the raw score list. Defaults to False.
Examples
>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([
...     [True, False, True, False, True],
...     [False, False, False, True, True],
...     [True, True, False, True, False],
...     [False, True, True, False, True],
... ])
>>> y_pred = torch.tensor([
...     [0.2, 0.8, 0.5, 0.4, 0.3],
...     [0.8, 0.2, 0.3, 0.9, 0.4],
...     [0.2, 0.4, 0.5, 0.9, 0.8],
...     [0.8, 0.2, 0.9, 0.3, 0.7],
... ])
>>> dm.retrieval.map(y_true, y_pred, k=2, method="legacy")
0.7055555880069733
>>> dm.retrieval.map(y_true, y_pred, k=2, method="pascal_voc")
0.7305555790662766
- dhg.metrics.retrieval.ndcg(y_true, y_pred, k=None, ratio=None, ret_batch=False)[source]
Calculate the Normalized Discounted Cumulative Gain (NDCG) for the retrieval task.
- Parameters
y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).
ratio (float, optional) – The specified ratio of top-k value. If ratio is not None, k will be ignored. Defaults to None.
ret_batch (bool) – Whether to return the raw score list. Defaults to False.
Examples
>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([10, 0, 0, 1, 5])
>>> y_pred = torch.tensor([.1, .2, .3, 4, 70])
>>> dm.retrieval.ndcg(y_true, y_pred)
0.695694088935852
>>> dm.retrieval.ndcg(y_true, y_pred, k=3)
0.4123818874359131
- dhg.metrics.retrieval.rr(y_true, y_pred, k=None, ratio=None)[source]
Calculate the Reciprocal Rank (RR) for the retrieval task.
- Parameters
y_true (torch.Tensor) – A 1-D tensor. Size \((N_{target},)\).
y_pred (torch.Tensor) – A 1-D tensor. Size \((N_{target},)\).
k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).
ratio (float, optional) – The specified ratio of top-k value. If ratio is not None, k will be ignored. Defaults to None.
Examples
>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([False, True, False, True])
>>> y_pred = torch.tensor([0.2, 0.3, 0.5, 0.2])
>>> dm.retrieval.rr(y_true, y_pred)
0.375
>>> dm.retrieval.rr(y_true, y_pred, k=2)
0.5
- dhg.metrics.retrieval.mrr(y_true, y_pred, k=None, ratio=None, ret_batch=False)[source]
Calculate the mean Reciprocal Rank (MRR) for the retrieval task.
- Parameters
y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).
ratio (float, optional) – The specified ratio of top-k value. If ratio is not None, k will be ignored. Defaults to None.
ret_batch (bool) – Whether to return the raw score list. Defaults to False.
Examples
>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([False, True, False, True])
>>> y_pred = torch.tensor([0.2, 0.3, 0.5, 0.2])
>>> dm.retrieval.mrr(y_true, y_pred)
0.375
>>> dm.retrieval.mrr(y_true, y_pred, k=2)
0.5
- dhg.metrics.retrieval.pr_curve(y_true, y_pred, k=None, ratio=None, method='pascal_voc', n_points=11, ret_batch=False)[source]
Calculate the Precision-Recall Curve for the retrieval task.
- Parameters
y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).
ratio (float, optional) – The specified ratio of top-k value. If ratio is not None, k will be ignored. Defaults to None.
method (str, optional) – The method used to compute the PR curve, either "legacy" or "pascal_voc". Defaults to "pascal_voc".
n_points (int) – The number of points used to compute the PR curve. Defaults to 11.
ret_batch (bool) – Whether to return the raw score list. Defaults to False.
Examples
>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor(
...     [
...         [0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
...         [1, 0, 1, 0, 0, 1, 0, 1, 0, 0],
...         [0, 1, 0, 0, 1, 0, 0, 0, 1, 1],
...     ]
... )
>>> y_pred = torch.tensor(
...     [
...         [0.23, 0.76, 0.01, 0.91, 0.13, 0.45, 0.12, 0.03, 0.38, 0.11],
...         [0.33, 0.47, 0.21, 0.87, 0.23, 0.65, 0.22, 0.13, 0.58, 0.21],
...         [0.43, 0.57, 0.31, 0.77, 0.33, 0.85, 0.32, 0.23, 0.78, 0.31],
...     ]
... )
>>> precision_coor, recall_coor = dm.retrieval.pr_curve(y_true, y_pred, method="legacy")
>>> precision_coor
[0.6666, 0.6666, 0.6666, 0.6666, 0.6333, 0.6333, 0.6333, 0.5416, 0.5416, 0.5416, 0.4719]
>>> recall_coor
[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
>>> precision_coor, recall_coor = dm.retrieval.pr_curve(y_true, y_pred, method="pascal_voc")
>>> precision_coor
[0.6666, 0.6666, 0.6666, 0.6666, 0.6333, 0.6333, 0.6333, 0.5500, 0.5500, 0.5500, 0.4719]
>>> recall_coor
[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
Evaluators for Different Tasks
- dhg.metrics.build_evaluator(task, metric_configs, validate_index=0)[source]
Return the metric evaluator for the given task.
- Parameters
task (str) – The type of the task. The supported types include: graph_vertex_classification, hypergraph_vertex_classification, and user_item_recommender.
metric_configs (List[Union[str, Dict[str, dict]]]) – The list of metric configurations. Each entry is either a metric name or a dict whose key is the metric name and whose value is the metric parameters.
validate_index (int) – The specified metric index used for validation. Defaults to 0.
Base Class
- class dhg.metrics.BaseEvaluator(task, metric_configs, validate_index=0)[source]
The base class for task-specified metric evaluators.
- Parameters
task (str) – The type of the task. The supported types include: classification, retrieval and recommender.
metric_configs (List[Union[str, Dict[str, dict]]]) – The metric configurations. The key is the metric name and the value is the metric parameters.
validate_index (int) – The specified metric index used for validation. Defaults to 0.
- test(y_true, y_pred)[source]
Return results of the evaluation on all the metrics in metric_configs.
- Parameters
y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, -)\).
y_pred (torch.Tensor) – The predicted labels. Size \((N_{samples}, -)\).
- test_add_batch(batch_y_true, batch_y_pred)[source]
Add batch data for testing.
- Parameters
batch_y_true (torch.Tensor) – The ground truth data. Size \((N_{batch}, -)\).
batch_y_pred (torch.Tensor) – The predicted data. Size \((N_{batch}, -)\).
- test_epoch_res()[source]
For all added batch data, return results of the evaluation on all the metrics in metric_configs.
- validate(y_true, y_pred)[source]
Return the result of the evaluation on the specified validate_index-th metric.
- Parameters
y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, -)\).
y_pred (torch.Tensor) – The predicted labels. Size \((N_{samples}, -)\).
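Each entry of metric_configs is either a bare metric name or a single-key dict mapping the metric name to its keyword arguments. The evaluator examples in this reference return results under keys such as `'f1_score -> average@macro'` and `'recall -> ratio@0.1000'`. The sketch below shows one way such keys could be derived from a config entry. The helper `metric_label` is hypothetical, and the formatting rules (four decimals for floats, `" | "` between parameters) are inferred from those example outputs rather than taken from DHG's source.

```python
from typing import Dict, List, Union

MetricConfig = Union[str, Dict[str, dict]]


def metric_label(config: MetricConfig) -> str:
    """Build a result key for one metric_configs entry (inferred format)."""
    if isinstance(config, str):
        return config  # a bare name has no parameters to show
    # A dict entry is expected to hold exactly one metric name.
    (name, params), = config.items()
    parts: List[str] = []
    for key, value in params.items():
        if isinstance(value, float):
            parts.append(f"{key}@{value:.4f}")  # assumption: 4 decimal places
        else:
            parts.append(f"{key}@{value}")
    return f"{name} -> " + " | ".join(parts)


configs = [
    "accuracy",
    {"f1_score": {"average": "macro"}},
    {"recall": {"ratio": 0.1}},
    {"pr_curve": {"k": 4, "method": "pascal_voc", "n_points": 21}},
]
for cfg in configs:
    print(metric_label(cfg))
```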
Vertex Classification Task
On Graph
- class dhg.metrics.GraphVertexClassificationEvaluator(metric_configs, validate_index=0)[source]
Bases: dhg.metrics.classification.VertexClassificationEvaluator
Return the metric evaluator for the vertex classification task on the graph structure. The supported metrics include: accuracy, f1_score, confusion_matrix.
- Parameters
metric_configs (List[Union[str, Dict[str, dict]]]) – The metric configurations. The key is the metric name and the value is the metric parameters.
validate_index (int) – The specified metric index used for validation. Defaults to 0.
Examples
>>> import torch
>>> import dhg.metrics as dm
>>> evaluator = dm.GraphVertexClassificationEvaluator(
...     [
...         "accuracy",
...         {"f1_score": {"average": "macro"}},
...     ],
...     0
... )
>>> y_true = torch.tensor([0, 0, 1, 1, 2, 2])
>>> y_pred = torch.tensor([0, 2, 1, 2, 1, 2])
>>> evaluator.validate(y_true, y_pred)
0.5
>>> evaluator.test(y_true, y_pred)
{
    'accuracy': 0.5,
    'f1_score -> average@macro': 0.5222222222222221
}
- test(y_true, y_pred)[source]
Return results of the evaluation on all the metrics in metric_configs.
- Parameters
y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, )\).
y_pred (torch.Tensor) – The predicted labels. Size \((N_{samples}, N_{class})\) or \((N_{samples}, )\).
- validate(y_true, y_pred)[source]
Return the result of the evaluation on the specified validate_index-th metric.
- Parameters
y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, )\).
y_pred (torch.Tensor) – The predicted labels. Size \((N_{samples}, N_{class})\) or \((N_{samples}, )\).
On Hypergraph
- class dhg.metrics.HypergraphVertexClassificationEvaluator(metric_configs, validate_index=0)[source]
Bases: dhg.metrics.classification.VertexClassificationEvaluator
Return the metric evaluator for the vertex classification task on the hypergraph structure. The supported metrics include: accuracy, f1_score, confusion_matrix.
- Parameters
metric_configs (List[Union[str, Dict[str, dict]]]) – The metric configurations. The key is the metric name and the value is the metric parameters.
validate_index (int) – The specified metric index used for validation. Defaults to 0.
Examples
>>> import torch
>>> import dhg.metrics as dm
>>> evaluator = dm.HypergraphVertexClassificationEvaluator(
...     [
...         "accuracy",
...         {"f1_score": {"average": "macro"}},
...     ],
...     0
... )
>>> y_true = torch.tensor([0, 0, 1, 1, 2, 2])
>>> y_pred = torch.tensor([0, 2, 1, 2, 1, 2])
>>> evaluator.validate(y_true, y_pred)
0.5
>>> evaluator.test(y_true, y_pred)
{
    'accuracy': 0.5,
    'f1_score -> average@macro': 0.5222222222222221
}
- test(y_true, y_pred)[source]
Return results of the evaluation on all the metrics in metric_configs.
- Parameters
y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, )\).
y_pred (torch.Tensor) – The predicted labels. Size \((N_{samples}, N_{class})\) or \((N_{samples}, )\).
- validate(y_true, y_pred)[source]
Return the result of the evaluation on the specified validate_index-th metric.
- Parameters
y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, )\).
y_pred (torch.Tensor) – The predicted labels. Size \((N_{samples}, N_{class})\) or \((N_{samples}, )\).
Recommender Task
On User-Item Bipartite Graph
- class dhg.metrics.UserItemRecommenderEvaluator(metric_configs, validate_index=0)[source]
Bases: dhg.metrics.base.BaseEvaluator
Return the metric evaluator for the recommender task on the user-item bipartite graph. The supported metrics include: precision, recall, ndcg.
- Parameters
metric_configs (List[Union[str, Dict[str, dict]]]) – The metric configurations. The key is the metric name and the value is the metric parameters.
validate_index (int) – The specified metric index used for validation. Defaults to 0.
Examples
>>> import torch
>>> import dhg.metrics as dm
>>> evaluator = dm.UserItemRecommenderEvaluator(
...     [
...         {"ndcg": {"k": 2}},
...         {"recall": {"k": 4}},
...         {"precision": {"k": 2}},
...         "precision",
...         {"precision": {"k": 6}},
...     ],
...     0,
... )
>>> y_true = torch.tensor([
...     [0, 1, 0, 0, 1, 1],
...     [0, 0, 1, 0, 1, 0],
...     [0, 1, 1, 1, 0, 1],
... ])
>>> y_pred = torch.tensor([
...     [0.8, 0.9, 0.6, 0.7, 0.4, 0.5],
...     [0.2, 0.6, 0.3, 0.3, 0.4, 0.6],
...     [0.7, 0.4, 0.3, 0.2, 0.8, 0.4],
... ])
>>> evaluator.validate_add_batch(y_true, y_pred)
>>> y_true = torch.tensor([
...     [0, 1, 0, 1, 0, 1],
...     [1, 1, 0, 0, 1, 0],
...     [1, 0, 1, 0, 0, 1],
... ])
>>> y_pred = torch.tensor([
...     [0.8, 0.9, 0.9, 0.4, 0.4, 0.5],
...     [0.2, 0.6, 0.3, 0.3, 0.4, 0.6],
...     [0.7, 0.4, 0.3, 0.2, 0.8, 0.4],
... ])
>>> evaluator.validate_add_batch(y_true, y_pred)
>>> evaluator.validate_epoch_res()
0.37104907135168713
>>> y_true = torch.tensor([
...     [0, 1, 0, 0, 1, 1],
...     [0, 0, 1, 0, 1, 0],
...     [0, 1, 1, 1, 0, 1],
... ])
>>> y_pred = torch.tensor([
...     [0.8, 0.9, 0.6, 0.7, 0.4, 0.5],
...     [0.2, 0.6, 0.3, 0.3, 0.4, 0.6],
...     [0.7, 0.4, 0.3, 0.2, 0.8, 0.4],
... ])
>>> evaluator.test_add_batch(y_true, y_pred)
>>> y_true = torch.tensor([
...     [0, 1, 0, 1, 0, 1],
...     [1, 1, 0, 0, 1, 0],
...     [1, 0, 1, 0, 0, 1],
... ])
>>> y_pred = torch.tensor([
...     [0.8, 0.9, 0.9, 0.4, 0.4, 0.5],
...     [0.2, 0.6, 0.3, 0.3, 0.4, 0.6],
...     [0.7, 0.4, 0.3, 0.2, 0.8, 0.4],
... ])
>>> evaluator.test_add_batch(y_true, y_pred)
>>> evaluator.test_epoch_res()
{
    'ndcg -> k@2': 0.37104907135168713,
    'recall -> k@4': 0.638888900478681,
    'precision -> k@2': 0.3333333333333333,
    'precision': 0.5000000049670538,
    'precision -> k@6': 0.5000000049670538
}
- test_add_batch(batch_y_true, batch_y_pred)[source]
Add batch data for testing.
- Parameters
batch_y_true (torch.Tensor) – The ground truth data. Size \((N_{batch}, -)\).
batch_y_pred (torch.Tensor) – The predicted data. Size \((N_{batch}, -)\).
- test_epoch_res()[source]
For all added batch data, return results of the evaluation on all the metrics in metric_configs.
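The add-batch / epoch-result pattern can be pictured as collecting one score per sample across batches and averaging them at the end. The plain-Python sketch below does this for precision at k=2 on the two batches from the example above and arrives at the same value as `'precision -> k@2'`. The class `BatchedPrecisionSketch` is hypothetical, and per-sample averaging is an assumption that happens to fit the documented numbers, not a description of DHG's internals.

```python
from typing import List, Sequence


class BatchedPrecisionSketch:
    """Hypothetical accumulator: store per-sample precision@k, average on demand."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.scores: List[float] = []

    def add_batch(
        self,
        batch_y_true: Sequence[Sequence[int]],
        batch_y_pred: Sequence[Sequence[float]],
    ) -> None:
        for y_true, y_pred in zip(batch_y_true, batch_y_pred):
            # Rank items by score, highest first (stable for ties).
            order = sorted(range(len(y_pred)), key=lambda i: -y_pred[i])
            hits = sum(y_true[i] for i in order[: self.k])
            self.scores.append(hits / self.k)

    def epoch_res(self) -> float:
        # Mean over every sample seen so far, regardless of batch boundaries.
        return sum(self.scores) / len(self.scores)


acc = BatchedPrecisionSketch(k=2)
acc.add_batch(
    [[0, 1, 0, 0, 1, 1], [0, 0, 1, 0, 1, 0], [0, 1, 1, 1, 0, 1]],
    [[0.8, 0.9, 0.6, 0.7, 0.4, 0.5], [0.2, 0.6, 0.3, 0.3, 0.4, 0.6], [0.7, 0.4, 0.3, 0.2, 0.8, 0.4]],
)
acc.add_batch(
    [[0, 1, 0, 1, 0, 1], [1, 1, 0, 0, 1, 0], [1, 0, 1, 0, 0, 1]],
    [[0.8, 0.9, 0.9, 0.4, 0.4, 0.5], [0.2, 0.6, 0.3, 0.3, 0.4, 0.6], [0.7, 0.4, 0.3, 0.2, 0.8, 0.4]],
)
result = acc.epoch_res()
print(result)
```

The six per-sample scores are 0.5, 0, 0, 0.5, 0.5 and 0.5, whose mean is one third.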
Retrieval Task
- class dhg.metrics.RetrievalEvaluator(metric_configs, validate_index=0)[source]
Bases: dhg.metrics.base.BaseEvaluator
Return the metric evaluator for the retrieval task. The supported metrics include: precision, recall, map, ndcg, mrr, pr_curve.
- Parameters
metric_configs (List[Union[str, Dict[str, dict]]]) – The metric configurations. The key is the metric name and the value is the metric parameters.
validate_index (int) – The specified metric index used for validation. Defaults to 0.
Examples
>>> import torch
>>> import dhg.metrics as dm
>>> evaluator = dm.RetrievalEvaluator(
...     [
...         {"recall": {"k": 2}},
...         {"recall": {"k": 4}},
...         {"recall": {"ratio": 0.1}},
...         {"precision": {"k": 4}},
...         {"ndcg": {"k": 4}},
...         "pr_curve",
...         {"pr_curve": {"k": 4, "method": "legacy"}},
...         {"pr_curve": {"k": 4, "method": "pascal_voc", "n_points": 21}},
...     ],
...     0,
... )
>>> y_true = torch.tensor([
...     [0, 1, 0, 0, 1, 1],
...     [0, 0, 1, 0, 1, 0],
...     [0, 1, 1, 1, 0, 1],
... ])
>>> y_pred = torch.tensor([
...     [0.8, 0.9, 0.6, 0.7, 0.4, 0.5],
...     [0.2, 0.6, 0.3, 0.3, 0.4, 0.6],
...     [0.7, 0.4, 0.3, 0.2, 0.8, 0.4],
... ])
>>> evaluator.validate_add_batch(y_true, y_pred)
>>> y_true = torch.tensor([
...     [0, 1, 0, 1, 0, 1],
...     [1, 1, 0, 0, 1, 0],
...     [1, 0, 1, 0, 0, 1],
... ])
>>> y_pred = torch.tensor([
...     [0.8, 0.9, 0.9, 0.4, 0.4, 0.5],
...     [0.2, 0.6, 0.3, 0.3, 0.4, 0.6],
...     [0.7, 0.4, 0.3, 0.2, 0.8, 0.4],
... ])
>>> evaluator.validate_add_batch(y_true, y_pred)
>>> evaluator.validate_epoch_res()
0.2222222238779068
>>> y_true = torch.tensor([
...     [0, 1, 0, 0, 1, 1],
...     [0, 0, 1, 0, 1, 0],
...     [0, 1, 1, 1, 0, 1],
... ])
>>> y_pred = torch.tensor([
...     [0.8, 0.9, 0.6, 0.7, 0.4, 0.5],
...     [0.2, 0.6, 0.3, 0.3, 0.4, 0.6],
...     [0.7, 0.4, 0.3, 0.2, 0.8, 0.4],
... ])
>>> evaluator.test_add_batch(y_true, y_pred)
>>> y_true = torch.tensor([
...     [0, 1, 0, 1, 0, 1],
...     [1, 1, 0, 0, 1, 0],
...     [1, 0, 1, 0, 0, 1],
... ])
>>> y_pred = torch.tensor([
...     [0.8, 0.9, 0.9, 0.4, 0.4, 0.5],
...     [0.2, 0.6, 0.3, 0.3, 0.4, 0.6],
...     [0.7, 0.4, 0.3, 0.2, 0.8, 0.4],
... ])
>>> evaluator.test_add_batch(y_true, y_pred)
>>> evaluator.test_epoch_res()
{
    'recall -> k@2': 0.2222222238779068,
    'recall -> k@4': 0.6388888955116272,
    'recall -> ratio@0.1000': 0.1666666716337204,
    'precision -> k@4': 0.4583333432674408,
    'ndcg -> k@4': 0.5461218953132629,
    'pr_curve': [
        [0.7944444517294565, 0.7944444517294565, 0.7944444517294565, 0.7944444517294565,
         0.7944444517294565, 0.5888889034589132, 0.5888889034589132, 0.5888889034589132,
         0.5888889034589132, 0.5888889034589132, 0.5611111223697662],
        [0.0, 0.09999999999999999, 0.19999999999999998, 0.30000000000000004,
         0.39999999999999997, 0.5, 0.6000000000000001, 0.7000000000000001,
         0.7999999999999999, 0.9, 1.0]
    ],
    'pr_curve -> k@4 | method@legacy': [
        [0.6944444477558136, 0.6944444477558136, 0.6944444477558136, 0.6944444477558136,
         0.7222222238779068, 0.4833333392937978, 0.4833333392937978, 0.5000000099341074,
         0.5000000099341074, 0.5000000099341074, 0.5611111223697662],
        [0.0, 0.09999999999999999, 0.19999999999999998, 0.30000000000000004,
         0.39999999999999997, 0.5, 0.6000000000000001, 0.7000000000000001,
         0.7999999999999999, 0.9, 1.0]
    ],
    'pr_curve -> k@4 | method@pascal_voc | n_points@21': [
        [0.7944444517294565, 0.7944444517294565, 0.7944444517294565, 0.7944444517294565,
         0.7944444517294565, 0.7944444517294565, 0.7944444517294565, 0.7944444517294565,
         0.7944444517294565, 0.7944444517294565, 0.5888889034589132, 0.5888889034589132,
         0.5888889034589132, 0.5888889034589132, 0.5888889034589132, 0.5888889034589132,
         0.5888889034589132, 0.5888889034589132, 0.5888889034589132, 0.5888889034589132,
         0.5611111223697662],
        [0.0, 0.049999999999999996, 0.09999999999999999, 0.15000000000000002,
         0.19999999999999998, 0.25, 0.30000000000000004, 0.35000000000000003,
         0.39999999999999997, 0.45, 0.5, 0.5499999999999999, 0.6000000000000001,
         0.65, 0.7000000000000001, 0.75, 0.7999999999999999, 0.85, 0.9,
         0.9500000000000001, 1.0]
    ]
}
- test_add_batch(batch_y_true, batch_y_pred)[source]
Add batch data for testing.
- Parameters
batch_y_true (torch.Tensor) – The ground truth data. Size \((N_{batch}, -)\).
batch_y_pred (torch.Tensor) – The predicted data. Size \((N_{batch}, -)\).
- test_epoch_res()[source]
For all added batch data, return results of the evaluation on all the metrics in metric_configs.