dhg.metrics

Basic Metrics

Classification

dhg.metrics.available_classification_metrics()[source]

Return available metrics for the classification task.

The available metrics are: accuracy, f1_score, and confusion_matrix.

dhg.metrics.classification.accuracy(y_true, y_pred)[source]

Calculate the accuracy score for the classification task.

\[\text{Accuracy} = \frac{1}{N} \sum_{i=1}^{N} \mathcal{I}(y_i, \hat{y}_i),\]

where \(\mathcal{I}(\cdot, \cdot)\) is the indicator function, equal to 1 if its two inputs are equal and 0 otherwise, and \(y_i\) and \(\hat{y}_i\) are the ground truth and predicted labels of the \(i\)-th sample.

Parameters
  • y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, )\).

  • y_pred (torch.Tensor) – The predicted labels. Size \((N_{samples}, N_{class})\) or \((N_{samples}, )\).

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([3, 2, 4])
>>> y_pred = torch.tensor([
        [0.2, 0.3, 0.5, 0.4, 0.3],
        [0.8, 0.2, 0.3, 0.5, 0.4],
        [0.2, 0.4, 0.5, 0.2, 0.8],
    ])
>>> dm.classification.accuracy(y_true, y_pred)
0.3333333432674408
dhg.metrics.classification.f1_score(y_true, y_pred, average='macro')[source]

Calculate the F1 score for the classification task.

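For reference, the per-class F1 score is the standard harmonic mean of precision \(P\) and recall \(R\):

\[F_1 = \frac{2 P R}{P + R}.\]

With “macro”, the per-class scores are averaged with equal weight; with “micro”, precision and recall are computed from global true/false positive counts; with “weighted”, the per-class scores are weighted by class support. These standard conventions are consistent with the example outputs below.
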
Parameters
  • y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, )\).

  • y_pred (torch.Tensor) – The predicted labels. Size \((N_{samples}, N_{class})\) or \((N_{samples}, )\).

  • average (str) – The averaging method. Must be one of “macro”, “micro”, or “weighted”. Defaults to “macro”.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([3, 2, 4, 0])
>>> y_pred = torch.tensor([
        [0.2, 0.3, 0.5, 0.4, 0.3],
        [0.8, 0.2, 0.3, 0.5, 0.4],
        [0.2, 0.4, 0.5, 0.2, 0.8],
        [0.8, 0.4, 0.5, 0.2, 0.8]
    ])
>>> dm.classification.f1_score(y_true, y_pred, "macro")
0.41666666666666663
>>> dm.classification.f1_score(y_true, y_pred, "micro")
0.5
>>> dm.classification.f1_score(y_true, y_pred, "weighted")
0.41666666666666663
dhg.metrics.classification.confusion_matrix(y_true, y_pred)[source]

Calculate the confusion matrix for the classification task.

Parameters
  • y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, )\).

  • y_pred (torch.Tensor) – The predicted labels. Size \((N_{samples}, N_{class})\) or \((N_{samples}, )\).

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([3, 2, 4, 0])
>>> y_pred = torch.tensor([
        [0.2, 0.3, 0.5, 0.4, 0.3],
        [0.8, 0.2, 0.3, 0.5, 0.4],
        [0.2, 0.4, 0.5, 0.2, 0.8],
        [0.8, 0.4, 0.5, 0.2, 0.8]
    ])
>>> dm.classification.confusion_matrix(y_true, y_pred)
array([[1, 0, 0, 0],
       [1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 0, 1]])
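
Note that although y_pred has five columns, only the four labels that actually occur in y_true or the predictions (0, 2, 3, 4) index the matrix above; reading the output, rows correspond to ground truth labels and columns to predicted labels.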

Recommender

dhg.metrics.available_recommender_metrics()[source]

Return available metrics for the recommender task.

The available metrics are: precision, recall, and ndcg.

dhg.metrics.recommender.precision(y_true, y_pred, k=None, ratio=None, ret_batch=False)[source]

Calculate the Precision score for the recommender task.

Parameters
  • y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).

  • y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).

  • k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).

  • ratio (float, optional) – The ratio used to determine the top-k value. If ratio is not None, k is ignored. Defaults to None.

  • ret_batch (bool) – Whether to return the raw score list. Defaults to False.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([0, 1, 0, 0, 1, 1])
>>> y_pred = torch.tensor([0.8, 0.9, 0.6, 0.7, 0.4, 0.5])
>>> dm.recommender.precision(y_true, y_pred, k=2)
0.5
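
A worked check of the output above: the two largest scores, 0.9 and 0.8, select items with labels 1 and 0, so precision@2 = 1/2 = 0.5.
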
dhg.metrics.recommender.recall(y_true, y_pred, k=None, ratio=None, ret_batch=False)[source]

Calculate the Recall score for the recommender task.

Parameters
  • y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).

  • y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).

  • k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).

  • ratio (float, optional) – The ratio used to determine the top-k value. If ratio is not None, k is ignored. Defaults to None.

  • ret_batch (bool) – Whether to return the raw score list. Defaults to False.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([0, 1, 0, 0, 1, 1])
>>> y_pred = torch.tensor([0.8, 0.9, 0.6, 0.7, 0.4, 0.5])
>>> dm.recommender.recall(y_true, y_pred, k=5)
0.6666666666666666
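
A worked check of the output above: the top-5 scores select items with labels (1, 0, 0, 0, 1), recovering 2 of the 3 relevant items, so recall@5 = 2/3 ≈ 0.6667.
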
dhg.metrics.recommender.ndcg(y_true, y_pred, k=None, ratio=None, ret_batch=False)[source]

Calculate the Normalized Discounted Cumulative Gain (NDCG) for the recommender task.

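The standard linear-gain form, which reproduces the example outputs below, is

\[\text{DCG}@k = \sum_{i=1}^{k} \frac{rel_i}{\log_2(i+1)}, \qquad \text{NDCG}@k = \frac{\text{DCG}@k}{\text{IDCG}@k},\]

where \(rel_i\) is the ground truth gain of the item ranked \(i\)-th by y_pred, and \(\text{IDCG}@k\) is the DCG of the ideal (gain-sorted) ordering.
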
Parameters
  • y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).

  • y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).

  • k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).

  • ratio (float, optional) – The ratio used to determine the top-k value. If ratio is not None, k is ignored. Defaults to None.

  • ret_batch (bool) – Whether to return the raw score list. Defaults to False.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([10, 0, 0, 1, 5])
>>> y_pred = torch.tensor([.1, .2, .3, 4, 70])
>>> dm.recommender.ndcg(y_true, y_pred)
0.695694088935852
>>> dm.recommender.ndcg(y_true, y_pred, k=3)
0.4123818874359131
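
A worked check for k=3: ranking by y_pred yields gains (5, 1, 0), so \(\text{DCG}@3 = 5/\log_2 2 + 1/\log_2 3 \approx 5.631\), while the ideal ordering yields \(\text{IDCG}@3 = 10 + 5/\log_2 3 + 1/\log_2 4 \approx 13.655\); their ratio is \(\approx 0.4124\), matching the output.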

Retrieval

dhg.metrics.available_retrieval_metrics()[source]

Return available metrics for the retrieval task.

The available metrics are: precision, recall, map, ndcg, mrr, and pr_curve.

dhg.metrics.retrieval.precision(y_true, y_pred, k=None, ratio=None, ret_batch=False)[source]

Calculate the Precision score for the retrieval task.

Parameters
  • y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).

  • y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).

  • k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).

  • ratio (float, optional) – The ratio used to determine the top-k value. If ratio is not None, k is ignored. Defaults to None.

  • ret_batch (bool) – Whether to return the raw score list. Defaults to False.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([0, 1, 0, 0, 1, 1])
>>> y_pred = torch.tensor([0.8, 0.9, 0.6, 0.7, 0.4, 0.5])
>>> dm.retrieval.precision(y_true, y_pred, k=2)
0.5
dhg.metrics.retrieval.recall(y_true, y_pred, k=None, ratio=None, ret_batch=False)[source]

Calculate the Recall score for the retrieval task.

Parameters
  • y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).

  • y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).

  • k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).

  • ratio (float, optional) – The ratio used to determine the top-k value. If ratio is not None, k is ignored. Defaults to None.

  • ret_batch (bool) – Whether to return the raw score list. Defaults to False.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([0, 1, 0, 0, 1, 1])
>>> y_pred = torch.tensor([0.8, 0.9, 0.6, 0.7, 0.4, 0.5])
>>> dm.retrieval.recall(y_true, y_pred, k=5)
0.6666666666666666
dhg.metrics.retrieval.ap(y_true, y_pred, k=None, ratio=None, method='pascal_voc')[source]

Calculate the Average Precision (AP) for the retrieval task.

Parameters
  • y_true (torch.Tensor) – A 1-D tensor. Size \((N_{target},)\).

  • y_pred (torch.Tensor) – A 1-D tensor. Size \((N_{target},)\).

  • k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).

  • ratio (float, optional) – The ratio used to determine the top-k value. If ratio is not None, k is ignored. Defaults to None.

  • method (str) – The method used to compute the AP, either "legacy" or "pascal_voc". Defaults to "pascal_voc".

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([True, False, True])
>>> y_pred = torch.tensor([0.2, 0.3, 0.5])
>>> dm.retrieval.ap(y_true, y_pred, method="legacy")
0.8333333730697632
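
A worked check of the output above: sorted by score, the relevant items sit at ranks 1 and 3 with precisions 1/1 and 2/3, and the legacy AP is their mean, \((1 + 2/3)/2 = 5/6 \approx 0.8333\). As its name suggests, the pascal_voc method presumably follows the PASCAL VOC convention of interpolated precision (the maximum precision at or beyond each recall level), which is consistent with it never scoring below legacy in the map example below.
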
dhg.metrics.retrieval.map(y_true, y_pred, k=None, ratio=None, method='pascal_voc', ret_batch=False)[source]

Calculate the mean Average Precision (mAP) for the retrieval task, i.e., the AP averaged over samples.

Parameters
  • y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).

  • y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).

  • k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).

  • ratio (float, optional) – The ratio used to determine the top-k value. If ratio is not None, k is ignored. Defaults to None.

  • method (str) – The method used to compute the AP, either "legacy" or "pascal_voc". Defaults to "pascal_voc".

  • ret_batch (bool) – Whether to return the raw score list. Defaults to False.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([
        [True, False, True, False, True],
        [False, False, False, True, True],
        [True, True, False, True, False],
        [False, True, True, False, True],
    ])
>>> y_pred = torch.tensor([
        [0.2, 0.8, 0.5, 0.4, 0.3],
        [0.8, 0.2, 0.3, 0.9, 0.4],
        [0.2, 0.4, 0.5, 0.9, 0.8],
        [0.8, 0.2, 0.9, 0.3, 0.7],
    ])
>>> dm.retrieval.map(y_true, y_pred, k=2, method="legacy")
0.7055555880069733
>>> dm.retrieval.map(y_true, y_pred, k=2, method="pascal_voc")
0.7305555790662766
dhg.metrics.retrieval.ndcg(y_true, y_pred, k=None, ratio=None, ret_batch=False)[source]

Calculate the Normalized Discounted Cumulative Gain (NDCG) for the retrieval task.

Parameters
  • y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).

  • y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).

  • k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).

  • ratio (float, optional) – The ratio used to determine the top-k value. If ratio is not None, k is ignored. Defaults to None.

  • ret_batch (bool) – Whether to return the raw score list. Defaults to False.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([10, 0, 0, 1, 5])
>>> y_pred = torch.tensor([.1, .2, .3, 4, 70])
>>> dm.retrieval.ndcg(y_true, y_pred)
0.695694088935852
>>> dm.retrieval.ndcg(y_true, y_pred, k=3)
0.4123818874359131
dhg.metrics.retrieval.rr(y_true, y_pred, k=None, ratio=None)[source]

Calculate the Reciprocal Rank (RR) for the retrieval task.

Parameters
  • y_true (torch.Tensor) – A 1-D tensor. Size \((N_{target},)\).

  • y_pred (torch.Tensor) – A 1-D tensor. Size \((N_{target},)\).

  • k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).

  • ratio (float, optional) – The ratio used to determine the top-k value. If ratio is not None, k is ignored. Defaults to None.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([False, True, False, True])
>>> y_pred = torch.tensor([0.2, 0.3, 0.5, 0.2])
>>> dm.retrieval.rr(y_true, y_pred)
0.375
>>> dm.retrieval.rr(y_true, y_pred, k=2)
0.5
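
A worked check of the outputs above: sorted by y_pred in descending order, the relevant items land at ranks 2 and 4 (the tie at 0.2 here resolves with the relevant item last), and the reported score matches the mean of their reciprocal ranks, \((1/2 + 1/4)/2 = 0.375\); with k=2 only the rank-2 hit counts, giving \(1/2 = 0.5\).
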
dhg.metrics.retrieval.mrr(y_true, y_pred, k=None, ratio=None, ret_batch=False)[source]

Calculate the mean Reciprocal Rank (MRR) for the retrieval task, i.e., the RR averaged over samples.

Parameters
  • y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).

  • y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).

  • k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).

  • ratio (float, optional) – The ratio used to determine the top-k value. If ratio is not None, k is ignored. Defaults to None.

  • ret_batch (bool) – Whether to return the raw score list. Defaults to False.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([False, True, False, True])
>>> y_pred = torch.tensor([0.2, 0.3, 0.5, 0.2])
>>> dm.retrieval.mrr(y_true, y_pred)
0.375
>>> dm.retrieval.mrr(y_true, y_pred, k=2)
0.5
dhg.metrics.retrieval.pr_curve(y_true, y_pred, k=None, ratio=None, method='pascal_voc', n_points=11, ret_batch=False)[source]

Calculate the Precision-Recall Curve for the retrieval task.

Parameters
  • y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).

  • y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).

  • k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).

  • ratio (float, optional) – The ratio used to determine the top-k value. If ratio is not None, k is ignored. Defaults to None.

  • method (str, optional) – The method to compute the PR curve can be "legacy" or "pascal_voc". Defaults to "pascal_voc".

  • n_points (int) – The number of evenly spaced recall points at which the PR curve is evaluated. Defaults to 11.

  • ret_batch (bool) – Whether to return the raw score list. Defaults to False.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor(
        [
            [0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
            [1, 0, 1, 0, 0, 1, 0, 1, 0, 0],
            [0, 1, 0, 0, 1, 0, 0, 0, 1, 1],
        ]
    )
>>> y_pred = torch.tensor(
        [
            [0.23, 0.76, 0.01, 0.91, 0.13, 0.45, 0.12, 0.03, 0.38, 0.11],
            [0.33, 0.47, 0.21, 0.87, 0.23, 0.65, 0.22, 0.13, 0.58, 0.21],
            [0.43, 0.57, 0.31, 0.77, 0.33, 0.85, 0.32, 0.23, 0.78, 0.31],
        ]
    )
>>> precision_coor, recall_coor = dm.retrieval.pr_curve(y_true, y_pred, method="legacy")
>>> precision_coor
[0.6666, 0.6666, 0.6666, 0.6666, 0.6333, 0.6333, 0.6333, 0.5416, 0.5416, 0.5416, 0.4719]
>>> recall_coor
[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
>>> precision_coor, recall_coor = dm.retrieval.pr_curve(y_true, y_pred, method="pascal_voc")
>>> precision_coor
[0.6666, 0.6666, 0.6666, 0.6666, 0.6333, 0.6333, 0.6333, 0.5500, 0.5500, 0.5500, 0.4719]
>>> recall_coor
[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
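
The two returned lists are ready to plot as precision over recall. A visualization sketch (matplotlib here is purely illustrative and not part of dhg):

>>> import matplotlib.pyplot as plt
>>> plt.plot(recall_coor, precision_coor, marker="o")
>>> plt.xlabel("Recall")
>>> plt.ylabel("Precision")
>>> plt.show()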

Evaluators for Different Tasks

dhg.metrics.build_evaluator(task, metric_configs, validate_index=0)[source]

Return the metric evaluator for the given task.

Parameters
  • task (str) – The type of the task. The supported types include: graph_vertex_classification, hypergraph_vertex_classification, and user_item_recommender.

  • metric_configs (List[Union[str, Dict[str, dict]]]) – The metric configurations. Each item is either a metric name or a dict mapping the metric name to its parameters.

  • validate_index (int) – The specified metric index used for validation. Defaults to 0.
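
A minimal construction sketch, using only task names and configuration values documented on this page (the returned evaluator classes are described in the sections below):

>>> import dhg.metrics as dm
>>> evaluator = dm.build_evaluator(
        "graph_vertex_classification",
        ["accuracy", {"f1_score": {"average": "macro"}}],
        validate_index=0,
    )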

Base Class

class dhg.metrics.BaseEvaluator(task, metric_configs, validate_index=0)[source]

The base class for task-specific metric evaluators.

Parameters
  • task (str) – The type of the task. The supported types include: classification, retrieval, and recommender.

  • metric_configs (List[Union[str, Dict[str, dict]]]) – The metric configurations. Each item is either a metric name or a dict mapping the metric name to its parameters.

  • validate_index (int) – The specified metric index used for validation. Defaults to 0.

test(y_true, y_pred)[source]

Return results of the evaluation on all the metrics in metric_configs.

Parameters
  • y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, -)\).

  • y_pred (torch.Tensor) – The predicted labels. Size \((N_{samples}, -)\).

test_add_batch(batch_y_true, batch_y_pred)[source]

Add batch data for testing.

Parameters
  • batch_y_true (torch.Tensor) – The ground truth data. Size \((N_{batch}, -)\).

  • batch_y_pred (torch.Tensor) – The predicted data. Size \((N_{batch}, -)\).

test_epoch_res()[source]

For all added batch data, return results of the evaluation on all the metrics in metric_configs.

validate(y_true, y_pred)[source]

Return the result of the evaluation on the specified validate_index-th metric.

Parameters
  • y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, -)\).

  • y_pred (torch.Tensor) – The predicted labels. Size \((N_{samples}, -)\).

validate_add_batch(batch_y_true, batch_y_pred)[source]

Add batch data for validation.

Parameters
  • batch_y_true (torch.Tensor) – The ground truth data. Size \((N_{batch}, -)\).

  • batch_y_pred (torch.Tensor) – The predicted data. Size \((N_{batch}, -)\).

validate_epoch_res()[source]

For all added batch data, return the result of the evaluation on the specified validate_index-th metric.
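
The *_add_batch/*_epoch_res pairs are designed to be driven once per epoch: add every batch, then read the aggregated result. A loop sketch (model and val_loader are placeholders, not part of dhg):

>>> for X, y_true in val_loader:    # placeholder validation data loader
        y_pred = model(X)           # placeholder model forward pass
        evaluator.validate_add_batch(y_true, y_pred)
>>> evaluator.validate_epoch_res()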

Vertex Classification Task

On Graph

class dhg.metrics.GraphVertexClassificationEvaluator(metric_configs, validate_index=0)[source]

Bases: dhg.metrics.classification.VertexClassificationEvaluator

The metric evaluator for the vertex classification task on the graph structure. The supported metrics include: accuracy, f1_score, and confusion_matrix.

Parameters
  • metric_configs (List[Union[str, Dict[str, dict]]]) – The metric configurations. Each item is either a metric name or a dict mapping the metric name to its parameters.

  • validate_index (int) – The specified metric index used for validation. Defaults to 0.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> evaluator = dm.GraphVertexClassificationEvaluator(
        [
            "accuracy",
            {"f1_score": {"average": "macro"}},
        ],
        0
    )
>>> y_true = torch.tensor([0, 0, 1, 1, 2, 2])
>>> y_pred = torch.tensor([0, 2, 1, 2, 1, 2])
>>> evaluator.validate(y_true, y_pred)
0.5
>>> evaluator.test(y_true, y_pred)
{
    'accuracy': 0.5,
    'f1_score -> average@macro': 0.5222222222222221
}
test(y_true, y_pred)[source]

Return results of the evaluation on all the metrics in metric_configs.

Parameters
  • y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, )\).

  • y_pred (torch.Tensor) – The predicted labels. Size \((N_{samples}, N_{class})\) or \((N_{samples}, )\).

validate(y_true, y_pred)[source]

Return the result of the evaluation on the specified validate_index-th metric.

Parameters
  • y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, )\).

  • y_pred (torch.Tensor) – The predicted labels. Size \((N_{samples}, N_{class})\) or \((N_{samples}, )\).

On Hypergraph

class dhg.metrics.HypergraphVertexClassificationEvaluator(metric_configs, validate_index=0)[source]

Bases: dhg.metrics.classification.VertexClassificationEvaluator

The metric evaluator for the vertex classification task on the hypergraph structure. The supported metrics include: accuracy, f1_score, and confusion_matrix.

Parameters
  • metric_configs (List[Union[str, Dict[str, dict]]]) – The metric configurations. Each item is either a metric name or a dict mapping the metric name to its parameters.

  • validate_index (int) – The specified metric index used for validation. Defaults to 0.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> evaluator = dm.HypergraphVertexClassificationEvaluator(
        [
            "accuracy",
            {"f1_score": {"average": "macro"}},
        ],
        0
    )
>>> y_true = torch.tensor([0, 0, 1, 1, 2, 2])
>>> y_pred = torch.tensor([0, 2, 1, 2, 1, 2])
>>> evaluator.validate(y_true, y_pred)
0.5
>>> evaluator.test(y_true, y_pred)
{
    'accuracy': 0.5,
    'f1_score -> average@macro': 0.5222222222222221
}
test(y_true, y_pred)[source]

Return results of the evaluation on all the metrics in metric_configs.

Parameters
  • y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, )\).

  • y_pred (torch.Tensor) – The predicted labels. Size \((N_{samples}, N_{class})\) or \((N_{samples}, )\).

validate(y_true, y_pred)[source]

Return the result of the evaluation on the specified validate_index-th metric.

Parameters
  • y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, )\).

  • y_pred (torch.Tensor) – The predicted labels. Size \((N_{samples}, N_{class})\) or \((N_{samples}, )\).

Recommender Task

On User-Item Bipartite Graph

class dhg.metrics.UserItemRecommenderEvaluator(metric_configs, validate_index=0)[source]

Bases: dhg.metrics.base.BaseEvaluator

The metric evaluator for the recommender task on the user-item bipartite graph. The supported metrics include: precision, recall, and ndcg.

Parameters
  • metric_configs (List[Union[str, Dict[str, dict]]]) – The metric configurations. Each item is either a metric name or a dict mapping the metric name to its parameters.

  • validate_index (int) – The specified metric index used for validation. Defaults to 0.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> evaluator = dm.UserItemRecommenderEvaluator(
        [
            {"ndcg": {"k": 2}},
            {"recall": {"k": 4}},
            {"precision": {"k": 2}},
            "precision",
            {"precision": {"k": 6}},
        ],
        0,
    )
>>> y_true = torch.tensor([
        [0, 1, 0, 0, 1, 1],
        [0, 0, 1, 0, 1, 0],
        [0, 1, 1, 1, 0, 1],
    ])
>>> y_pred = torch.tensor([
        [0.8, 0.9, 0.6, 0.7, 0.4, 0.5],
        [0.2, 0.6, 0.3, 0.3, 0.4, 0.6],
        [0.7, 0.4, 0.3, 0.2, 0.8, 0.4],
    ])
>>> evaluator.validate_add_batch(y_true, y_pred)
>>> y_true = torch.tensor([
        [0, 1, 0, 1, 0, 1],
        [1, 1, 0, 0, 1, 0],
        [1, 0, 1, 0, 0, 1],
    ])
>>> y_pred = torch.tensor([
        [0.8, 0.9, 0.9, 0.4, 0.4, 0.5],
        [0.2, 0.6, 0.3, 0.3, 0.4, 0.6],
        [0.7, 0.4, 0.3, 0.2, 0.8, 0.4],
    ])
>>> evaluator.validate_add_batch(y_true, y_pred)
>>> evaluator.validate_epoch_res()
0.37104907135168713
>>> y_true = torch.tensor([
        [0, 1, 0, 0, 1, 1],
        [0, 0, 1, 0, 1, 0],
        [0, 1, 1, 1, 0, 1],
    ])
>>> y_pred = torch.tensor([
        [0.8, 0.9, 0.6, 0.7, 0.4, 0.5],
        [0.2, 0.6, 0.3, 0.3, 0.4, 0.6],
        [0.7, 0.4, 0.3, 0.2, 0.8, 0.4],
    ])
>>> evaluator.test_add_batch(y_true, y_pred)
>>> y_true = torch.tensor([
        [0, 1, 0, 1, 0, 1],
        [1, 1, 0, 0, 1, 0],
        [1, 0, 1, 0, 0, 1],
    ])
>>> y_pred = torch.tensor([
        [0.8, 0.9, 0.9, 0.4, 0.4, 0.5],
        [0.2, 0.6, 0.3, 0.3, 0.4, 0.6],
        [0.7, 0.4, 0.3, 0.2, 0.8, 0.4],
    ])
>>> evaluator.test_add_batch(y_true, y_pred)
>>> evaluator.test_epoch_res()
{
    'ndcg -> k@2': 0.37104907135168713,
    'recall -> k@4': 0.638888900478681,
    'precision -> k@2': 0.3333333333333333,
    'precision': 0.5000000049670538,
    'precision -> k@6': 0.5000000049670538
}
test_add_batch(batch_y_true, batch_y_pred)[source]

Add batch data for testing.

Parameters
  • batch_y_true (torch.Tensor) – The ground truth data. Size \((N_{batch}, -)\).

  • batch_y_pred (torch.Tensor) – The predicted data. Size \((N_{batch}, -)\).

test_epoch_res()[source]

For all added batch data, return results of the evaluation on all the metrics in metric_configs.

validate_add_batch(batch_y_true, batch_y_pred)[source]

Add batch data for validation.

Parameters
  • batch_y_true (torch.Tensor) – The ground truth data. Size \((N_{batch}, -)\).

  • batch_y_pred (torch.Tensor) – The predicted data. Size \((N_{batch}, -)\).

validate_epoch_res()[source]

For all added batch data, return the result of the evaluation on the specified validate_index-th metric.

Retrieval Task

class dhg.metrics.RetrievalEvaluator(metric_configs, validate_index=0)[source]

Bases: dhg.metrics.base.BaseEvaluator

The metric evaluator for the retrieval task. The supported metrics include: precision, recall, map, ndcg, mrr, and pr_curve.

Parameters
  • metric_configs (List[Union[str, Dict[str, dict]]]) – The metric configurations. Each item is either a metric name or a dict mapping the metric name to its parameters.

  • validate_index (int) – The specified metric index used for validation. Defaults to 0.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> evaluator = dm.RetrievalEvaluator(
        [
            {"recall": {"k": 2}},
            {"recall": {"k": 4}},
            {"recall": {"ratio": 0.1}},
            {"precision": {"k": 4}},
            {"ndcg": {"k": 4}},
            "pr_curve",
            {"pr_curve": {"k": 4, "method": "legacy"}},
            {"pr_curve": {"k": 4, "method": "pascal_voc", "n_points": 21}},
        ],
        0,
    )
>>> y_true = torch.tensor([
        [0, 1, 0, 0, 1, 1],
        [0, 0, 1, 0, 1, 0],
        [0, 1, 1, 1, 0, 1],
    ])
>>> y_pred = torch.tensor([
        [0.8, 0.9, 0.6, 0.7, 0.4, 0.5],
        [0.2, 0.6, 0.3, 0.3, 0.4, 0.6],
        [0.7, 0.4, 0.3, 0.2, 0.8, 0.4],
    ])
>>> evaluator.validate_add_batch(y_true, y_pred)
>>> y_true = torch.tensor([
        [0, 1, 0, 1, 0, 1],
        [1, 1, 0, 0, 1, 0],
        [1, 0, 1, 0, 0, 1],
    ])
>>> y_pred = torch.tensor([
        [0.8, 0.9, 0.9, 0.4, 0.4, 0.5],
        [0.2, 0.6, 0.3, 0.3, 0.4, 0.6],
        [0.7, 0.4, 0.3, 0.2, 0.8, 0.4],
    ])
>>> evaluator.validate_add_batch(y_true, y_pred)
>>> evaluator.validate_epoch_res()
0.2222222238779068
>>> y_true = torch.tensor([
        [0, 1, 0, 0, 1, 1],
        [0, 0, 1, 0, 1, 0],
        [0, 1, 1, 1, 0, 1],
    ])
>>> y_pred = torch.tensor([
        [0.8, 0.9, 0.6, 0.7, 0.4, 0.5],
        [0.2, 0.6, 0.3, 0.3, 0.4, 0.6],
        [0.7, 0.4, 0.3, 0.2, 0.8, 0.4],
    ])
>>> evaluator.test_add_batch(y_true, y_pred)
>>> y_true = torch.tensor([
        [0, 1, 0, 1, 0, 1],
        [1, 1, 0, 0, 1, 0],
        [1, 0, 1, 0, 0, 1],
    ])
>>> y_pred = torch.tensor([
        [0.8, 0.9, 0.9, 0.4, 0.4, 0.5],
        [0.2, 0.6, 0.3, 0.3, 0.4, 0.6],
        [0.7, 0.4, 0.3, 0.2, 0.8, 0.4],
    ])
>>> evaluator.test_add_batch(y_true, y_pred)
>>> evaluator.test_epoch_res()
{
    'recall -> k@2': 0.2222222238779068,
    'recall -> k@4': 0.6388888955116272,
    'recall -> ratio@0.1000': 0.1666666716337204,
    'precision -> k@4': 0.4583333432674408,
    'ndcg -> k@4': 0.5461218953132629,
    'pr_curve': [
        [0.7944444517294565, 0.7944444517294565, 0.7944444517294565, 0.7944444517294565, 0.7944444517294565, 0.5888889034589132, 0.5888889034589132, 0.5888889034589132, 0.5888889034589132, 0.5888889034589132, 0.5611111223697662],
        [0.0, 0.09999999999999999, 0.19999999999999998, 0.30000000000000004, 0.39999999999999997, 0.5, 0.6000000000000001, 0.7000000000000001, 0.7999999999999999, 0.9, 1.0]
    ],
    'pr_curve -> k@4 | method@legacy': [
        [0.6944444477558136, 0.6944444477558136, 0.6944444477558136, 0.6944444477558136, 0.7222222238779068, 0.4833333392937978, 0.4833333392937978, 0.5000000099341074, 0.5000000099341074, 0.5000000099341074, 0.5611111223697662],
        [0.0, 0.09999999999999999, 0.19999999999999998, 0.30000000000000004, 0.39999999999999997, 0.5, 0.6000000000000001, 0.7000000000000001, 0.7999999999999999, 0.9, 1.0]
    ],
    'pr_curve -> k@4 | method@pascal_voc | n_points@21': [
        [0.7944444517294565, 0.7944444517294565, 0.7944444517294565, 0.7944444517294565, 0.7944444517294565, 0.7944444517294565, 0.7944444517294565, 0.7944444517294565, 0.7944444517294565, 0.7944444517294565, 0.5888889034589132, 0.5888889034589132, 0.5888889034589132, 0.5888889034589132, 0.5888889034589132, 0.5888889034589132, 0.5888889034589132, 0.5888889034589132, 0.5888889034589132, 0.5888889034589132, 0.5611111223697662],
        [0.0, 0.049999999999999996, 0.09999999999999999, 0.15000000000000002, 0.19999999999999998, 0.25, 0.30000000000000004, 0.35000000000000003, 0.39999999999999997, 0.45, 0.5, 0.5499999999999999, 0.6000000000000001, 0.65, 0.7000000000000001, 0.75, 0.7999999999999999, 0.85, 0.9, 0.9500000000000001, 1.0]
    ]
}
test_add_batch(batch_y_true, batch_y_pred)[source]

Add batch data for testing.

Parameters
  • batch_y_true (torch.Tensor) – The ground truth data. Size \((N_{batch}, -)\).

  • batch_y_pred (torch.Tensor) – The predicted data. Size \((N_{batch}, -)\).

test_epoch_res()[source]

For all added batch data, return results of the evaluation on all the metrics in metric_configs.

validate_add_batch(batch_y_true, batch_y_pred)[source]

Add batch data for validation.

Parameters
  • batch_y_true (torch.Tensor) – The ground truth data. Size \((N_{batch}, -)\).

  • batch_y_pred (torch.Tensor) – The predicted data. Size \((N_{batch}, -)\).

validate_epoch_res()[source]

For all added batch data, return the result of the evaluation on the specified validate_index-th metric.