dhg.metrics

Basic Metrics

Classification

dhg.metrics.available_classification_metrics()

Return available metrics for the classification task.

The available metrics are: accuracy, f1_score, confusion_matrix.

dhg.metrics.classification.accuracy(y_true, y_pred)

Calculate the accuracy score for the classification task.

\[\text{Accuracy} = \frac{1}{N} \sum_{i=1}^{N} \mathcal{I}(y_i, \hat{y}_i),\]

where \(N\) is the number of samples, \(\mathcal{I}(\cdot, \cdot)\) is the indicator function, which is 1 if its two inputs are equal and 0 otherwise, and \(y_i\) and \(\hat{y}_i\) are the ground-truth and predicted labels of the \(i\)-th sample.

Parameters

y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, )\).
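y_pred (torch.Tensor) – The predicted labels of size \((N_{samples}, )\), or class scores of size \((N_{samples}, N_{class})\), in which case the highest-scoring class of each sample is taken as its prediction.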

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([3, 2, 4])
>>> y_pred = torch.tensor([
...     [0.2, 0.3, 0.5, 0.4, 0.3],
...     [0.8, 0.2, 0.3, 0.5, 0.4],
...     [0.2, 0.4, 0.5, 0.2, 0.8],
... ])
>>> dm.classification.accuracy(y_true, y_pred)
0.3333333432674408
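As a cross-check of the formula above, here is a minimal plain-Python sketch (no torch; the helper names `to_label` and `accuracy` are our own, not part of dhg). It assumes a score row is turned into a label by taking the index of its first maximum, which is what the example's output implies, and it reproduces the 1/3 above; dhg's value differs only by float32 rounding.

```python
from typing import Sequence, Union

Prediction = Union[int, Sequence[float]]


def to_label(pred: Prediction) -> int:
    """Return a class index: ints pass through, score rows are argmax-ed."""
    if isinstance(pred, int):
        return pred
    # max() keeps the first of several equal scores, so ties go to the lower index.
    return max(range(len(pred)), key=lambda c: pred[c])


def accuracy(y_true: Sequence[int], y_pred: Sequence[Prediction]) -> float:
    """(1/N) * sum of I(y_i, y_hat_i) over all N samples."""
    assert len(y_true) == len(y_pred) > 0, "need one prediction per sample"
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == to_label(p))
    return hits / len(y_true)


y_true = [3, 2, 4]
y_pred = [
    [0.2, 0.3, 0.5, 0.4, 0.3],  # argmax -> 2 (wrong)
    [0.8, 0.2, 0.3, 0.5, 0.4],  # argmax -> 0 (wrong)
    [0.2, 0.4, 0.5, 0.2, 0.8],  # argmax -> 4 (correct)
]
print(accuracy(y_true, y_pred))  # 0.3333333333333333
```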

dhg.metrics.classification.f1_score(y_true, y_pred, average='macro')

Calculate the F1 score for the classification task.

Parameters

y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, )\).
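y_pred (torch.Tensor) – The predicted labels of size \((N_{samples}, )\), or class scores of size \((N_{samples}, N_{class})\), in which case the highest-scoring class of each sample is taken as its prediction.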
average (str) – The averaging method. Must be one of “macro”, “micro”, or “weighted”. Defaults to “macro”.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([3, 2, 4, 0])
>>> y_pred = torch.tensor([
...     [0.2, 0.3, 0.5, 0.4, 0.3],
...     [0.8, 0.2, 0.3, 0.5, 0.4],
...     [0.2, 0.4, 0.5, 0.2, 0.8],
...     [0.8, 0.4, 0.5, 0.2, 0.8],
... ])
>>> dm.classification.f1_score(y_true, y_pred, "macro")
0.41666666666666663
>>> dm.classification.f1_score(y_true, y_pred, "micro")
0.5
>>> dm.classification.f1_score(y_true, y_pred, "weighted")
0.41666666666666663
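The three averages can be reproduced by hand. Taking the argmax of each row gives predictions [2, 0, 4, 0] (the last row ties 0.8 at indices 0 and 4, and the first index wins), so two of the four samples are correct. The plain-Python sketch below is our own illustration, not dhg's code. It assumes, consistently with the outputs shown, that the label set is the union of the labels in y_true and in the predictions, that per-class F1 is 2TP / (2TP + FP + FN), and that “weighted” weights each class by its support in y_true.

```python
from collections import Counter
from typing import Dict, Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], average: str = "macro") -> float:
    """F1 over integer labels with "macro", "micro" or "weighted" averaging."""
    labels = sorted(set(y_true) | set(y_pred))
    tp: Counter = Counter()
    fp: Counter = Counter()
    fn: Counter = Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted class gains a false positive
            fn[t] += 1  # the true class gains a false negative

    def f1(tp_: int, fp_: int, fn_: int) -> float:
        # 2TP / (2TP + FP + FN) equals the harmonic mean of precision and
        # recall when TP > 0, and is 0.0 when TP is 0.
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    if average == "micro":
        return f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    per_class: Dict[int, float] = {c: f1(tp[c], fp[c], fn[c]) for c in labels}
    if average == "macro":
        return sum(per_class.values()) / len(labels)
    if average == "weighted":
        support = Counter(y_true)  # classes absent from y_true get weight 0
        return sum(per_class[c] * support[c] for c in labels) / len(y_true)
    raise ValueError(f"unknown average: {average!r}")


y_true = [3, 2, 4, 0]
y_pred = [2, 0, 4, 0]  # argmax of the four score rows in the example
for avg in ("macro", "micro", "weighted"):
    print(avg, round(f1_score(y_true, y_pred, avg), 4))
# macro 0.4167 / micro 0.5 / weighted 0.4167
```

For single-label data, micro-averaged F1 equals accuracy, which is why it matches the 2/4 correct predictions here.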

dhg.metrics.classification.confusion_matrix(y_true, y_pred)

Calculate the confusion matrix for the classification task.

Parameters

y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, )\).
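y_pred (torch.Tensor) – The predicted labels of size \((N_{samples}, )\), or class scores of size \((N_{samples}, N_{class})\), in which case the highest-scoring class of each sample is taken as its prediction.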

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([3, 2, 4, 0])
>>> y_pred = torch.tensor([
...     [0.2, 0.3, 0.5, 0.4, 0.3],
...     [0.8, 0.2, 0.3, 0.5, 0.4],
...     [0.2, 0.4, 0.5, 0.2, 0.8],
...     [0.8, 0.4, 0.5, 0.2, 0.8],
... ])
>>> dm.classification.confusion_matrix(y_true, y_pred)
array([[1, 0, 0, 0],
       [1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 0, 1]])
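In the example output, rows index the true label and columns index the predicted label. Only labels that occur in y_true or in the argmax predictions (0, 2, 3, 4) get a row and a column, which is why the matrix is 4×4 rather than 5×5. The plain-Python sketch below rebuilds that layout. The helper name is our own, and this is an illustration, not dhg's implementation.

```python
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> List[List[int]]:
    """Rows = true label, columns = predicted label, over the sorted labels that occur."""
    labels = sorted(set(y_true) | set(y_pred))
    index: Dict[int, int] = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# y_pred holds the argmax of each score row in the example above.
for row in confusion_matrix([3, 2, 4, 0], [2, 0, 4, 0]):
    print(row)
# [1, 0, 0, 0]
# [1, 0, 0, 0]
# [0, 1, 0, 0]
# [0, 0, 0, 1]
```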

Recommender

dhg.metrics.available_recommender_metrics()

Return available metrics for the recommender task.

The available metrics are: precision, recall, and ndcg.

dhg.metrics.recommender.precision(y_true, y_pred, k=None, ret_batch=False)

Calculate the Precision score for the recommender task.

Parameters

y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).
ret_batch (bool) – Whether to return the raw score list. Defaults to False.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([0, 1, 0, 0, 1, 1])
>>> y_pred = torch.tensor([0.8, 0.9, 0.6, 0.7, 0.4, 0.5])
>>> dm.recommender.precision(y_true, y_pred, k=2)
0.5
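Precision@k counts how many of the k highest-scored items are relevant and divides by k. In the example, the two highest scores (0.9 and 0.8) belong to items 1 and 0, and only item 1 is relevant, which gives 1/2. Below is a plain-Python sketch of that computation. It is an illustration under our own assumptions, not dhg's implementation: it handles 1-D inputs only, treats an item as relevant when its y_true entry is positive, and keeps tied scores in input order.

```python
from typing import Optional, Sequence


def precision_at_k(y_true: Sequence[float], y_pred: Sequence[float], k: Optional[int] = None) -> float:
    """Fraction of the top-k scored items whose y_true entry is positive."""
    n = len(y_pred)
    k = n if k is None else min(k, n)  # default k = N_target, as documented
    if k < 1:
        raise ValueError("k must be at least 1")
    # sorted() is stable, so equal scores keep their original order.
    ranking = sorted(range(n), key=lambda i: -y_pred[i])
    hits = sum(1 for i in ranking[:k] if y_true[i] > 0)
    return hits / k


y_true = [0, 1, 0, 0, 1, 1]
y_pred = [0.8, 0.9, 0.6, 0.7, 0.4, 0.5]
print(precision_at_k(y_true, y_pred, k=2))  # 0.5
print(precision_at_k(y_true, y_pred))       # 0.5 (3 relevant items out of 6)
```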

dhg.metrics.recommender.recall(y_true, y_pred, k=None, ret_batch=False)

Calculate the Recall score for the recommender task.

Parameters

y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).
ret_batch (bool) – Whether to return the raw score list. Defaults to False.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([0, 1, 0, 0, 1, 1])
>>> y_pred = torch.tensor([0.8, 0.9, 0.6, 0.7, 0.4, 0.5])
>>> dm.recommender.recall(y_true, y_pred, k=5)
0.6666666666666666
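Recall@k divides the number of relevant items in the top k by the total number of relevant items. In the example, the five highest scores cover items 1, 0, 3, 2 and 5. Of these, items 1 and 5 are relevant, out of three relevant items overall, which gives 2/3. Below is a plain-Python sketch of the same computation. It is our own illustration, not dhg's code, and it assumes 1-D inputs, positive y_true entries marking relevance, and tied scores kept in input order.

```python
from typing import Optional, Sequence


def recall_at_k(y_true: Sequence[float], y_pred: Sequence[float], k: Optional[int] = None) -> float:
    """Share of all relevant items that appear among the top-k scored items."""
    n = len(y_pred)
    k = n if k is None else min(k, n)  # default k = N_target, as documented
    # sorted() is stable, so equal scores keep their original order.
    ranking = sorted(range(n), key=lambda i: -y_pred[i])
    hits = sum(1 for i in ranking[:k] if y_true[i] > 0)
    total_relevant = sum(1 for t in y_true if t > 0)
    # Assumption: a list with no relevant items scores 0.0 instead of raising.
    return hits / total_relevant if total_relevant else 0.0


y_true = [0, 1, 0, 0, 1, 1]
y_pred = [0.8, 0.9, 0.6, 0.7, 0.4, 0.5]
print(recall_at_k(y_true, y_pred, k=5))  # 0.6666666666666666
print(recall_at_k(y_true, y_pred, k=2))  # 0.3333333333333333
```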

dhg.metrics.recommender.ndcg(y_true, y_pred, k=None, ret_batch=False)

Calculate the Normalized Discounted Cumulative Gain (NDCG) for the recommender task.

Parameters

y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).
ret_batch (bool) – Whether to return the raw score list. Defaults to False.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([10, 0, 0, 1, 5])
>>> y_pred = torch.tensor([.1, .2, .3, 4, 70])
>>> dm.recommender.ndcg(y_true, y_pred)
0.695694088935852
>>> dm.recommender.ndcg(y_true, y_pred, k=3)
0.4123818874359131
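The documented values are consistent with NDCG using linear gains and a log2 rank discount, \(\text{DCG@}k = \sum_{i=1}^{k} \mathrm{rel}_i / \log_2(i+1)\), divided by the DCG of the ideal ordering. Ranking by y_pred visits relevances 5, 1, 0, 0, 10, so DCG = 5 + 1/log2(3) + 10/log2(6) ≈ 9.4995. The ideal order 10, 5, 1, 0, 0 gives IDCG ≈ 13.6547, and the ratio is ≈ 0.6957. With k=3, only the first three ranks count: 5.6309 / 13.6547 ≈ 0.4124. The plain-Python sketch below implements that reading. It is our own illustration, not dhg's implementation, and it keeps tied scores in input order.

```python
import math
from typing import Optional, Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Sum of rel_i / log2(i + 1) with 1-based rank i."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg_at_k(y_true: Sequence[float], y_pred: Sequence[float], k: Optional[int] = None) -> float:
    """DCG of the predicted ranking divided by DCG of the ideal ranking, both cut at k."""
    n = len(y_true)
    k = n if k is None else min(k, n)
    ranking = sorted(range(n), key=lambda i: -y_pred[i])  # stable for ties
    ideal_dcg = dcg(sorted(y_true, reverse=True)[:k])
    if ideal_dcg == 0:
        return 0.0  # assumption: a list with no relevant items scores 0.0
    return dcg([y_true[i] for i in ranking[:k]]) / ideal_dcg


y_true = [10, 0, 0, 1, 5]
y_pred = [0.1, 0.2, 0.3, 4, 70]
print(round(ndcg_at_k(y_true, y_pred), 4))       # 0.6957
print(round(ndcg_at_k(y_true, y_pred, k=3), 4))  # 0.4124
```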

Retrieval

dhg.metrics.available_retrieval_metrics()

Return available metrics for the retrieval task.

The available metrics are: precision, recall, ap, map, ndcg, rr, mrr, pr_curve.

dhg.metrics.retrieval.precision(y_true, y_pred, k=None, ret_batch=False)

Calculate the Precision score for the retrieval task.

Parameters

y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).
ret_batch (bool) – Whether to return the raw score list. Defaults to False.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([0, 1, 0, 0, 1, 1])
>>> y_pred = torch.tensor([0.8, 0.9, 0.6, 0.7, 0.4, 0.5])
>>> dm.retrieval.precision(y_true, y_pred, k=2)
0.5

dhg.metrics.retrieval.recall(y_true, y_pred, k=None, ret_batch=False)

Calculate the Recall score for the retrieval task.

Parameters

y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).
ret_batch (bool) – Whether to return the raw score list. Defaults to False.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([0, 1, 0, 0, 1, 1])
>>> y_pred = torch.tensor([0.8, 0.9, 0.6, 0.7, 0.4, 0.5])
>>> dm.retrieval.recall(y_true, y_pred, k=5)
0.6666666666666666

dhg.metrics.retrieval.ap(y_true, y_pred, k=None, method='pascal_voc')

Calculate the Average Precision (AP) for the retrieval task.

Parameters

y_true (torch.Tensor) – A 1-D tensor. Size \((N_{target},)\).
y_pred (torch.Tensor) – A 1-D tensor. Size \((N_{target},)\).
k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).
method (str) – The method used to compute the AP, either “legacy” or “pascal_voc”. Defaults to “pascal_voc”.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([True, False, True])
>>> y_pred = torch.tensor([0.2, 0.3, 0.5])
>>> dm.retrieval.ap(y_true, y_pred, method="legacy")
0.8333333730697632
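The “legacy” value above is consistent with non-interpolated AP, which averages the precision at each rank where a relevant item appears. Sorting by score gives items 2, 1, 0 with relevance True, False, True, so AP = (1/1 + 2/3) / 2 ≈ 0.8333. The plain-Python sketch below illustrates that definition; it is not dhg's code. It divides by the number of relevant items found within the top k. Some definitions divide by all relevant items instead, and the two differ only when k cuts off relevant items. The “pascal_voc” (interpolated) variant is not sketched here.

```python
from typing import Optional, Sequence


def average_precision(y_true: Sequence[bool], y_pred: Sequence[float], k: Optional[int] = None) -> float:
    """Non-interpolated AP: mean precision at the ranks of relevant items in the top k."""
    n = len(y_true)
    k = n if k is None else min(k, n)
    ranking = sorted(range(n), key=lambda i: -y_pred[i])[:k]  # stable for ties
    hits = 0
    precisions = []
    for rank, i in enumerate(ranking, start=1):
        if y_true[i]:
            hits += 1
            precisions.append(hits / rank)  # precision at this relevant item's rank
    return sum(precisions) / len(precisions) if precisions else 0.0


print(round(average_precision([True, False, True], [0.2, 0.3, 0.5]), 4))  # 0.8333
```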

dhg.metrics.retrieval.map(y_true, y_pred, k=None, method='pascal_voc', ret_batch=False)

Calculate the mean Average Precision (mAP) for the retrieval task.

Parameters

y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).
method (str) – The specified method: legacy or pascal_voc. Defaults to pascal_voc.
ret_batch (bool) – Whether to return the raw score list. Defaults to False.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([
...     [True, False, True, False, True],
...     [False, False, False, True, True],
...     [True, True, False, True, False],
... ])
>>> y_pred = torch.tensor([
...     [0.2, 0.3, 0.5, 0.4, 0.3],
...     [0.8, 0.2, 0.3, 0.5, 0.4],
...     [0.2, 0.4, 0.5, 0.2, 0.8],
... ])
>>> dm.retrieval.map(y_true, y_pred, method="legacy")
0.587037056684494
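mAP is the mean of the per-row AP values. Using the non-interpolated AP that matches the “legacy” result for ap, the three rows give 0.7, 0.5833 and 0.4778, whose mean is ≈ 0.5870. Row 0 contains a tie (two scores of 0.3, at positions 1 and 4). Ordering position 1 first, as the stable sort below does, reproduces the documented value; the opposite order would give ≈ 0.7556 for that row. The sketch is our own plain-Python illustration, not dhg's implementation.

```python
from typing import List, Optional, Sequence


def average_precision(y_true: Sequence[bool], y_pred: Sequence[float], k: Optional[int] = None) -> float:
    """Non-interpolated AP: mean precision at the ranks of relevant items in the top k."""
    n = len(y_true)
    k = n if k is None else min(k, n)
    ranking = sorted(range(n), key=lambda i: -y_pred[i])[:k]  # stable for ties
    hits = 0
    precisions = []
    for rank, i in enumerate(ranking, start=1):
        if y_true[i]:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(y_true: Sequence[Sequence[bool]], y_pred: Sequence[Sequence[float]],
                           k: Optional[int] = None) -> float:
    """Mean of the per-row AP values."""
    scores: List[float] = [average_precision(t, p, k) for t, p in zip(y_true, y_pred)]
    return sum(scores) / len(scores)


y_true = [
    [True, False, True, False, True],
    [False, False, False, True, True],
    [True, True, False, True, False],
]
y_pred = [
    [0.2, 0.3, 0.5, 0.4, 0.3],
    [0.8, 0.2, 0.3, 0.5, 0.4],
    [0.2, 0.4, 0.5, 0.2, 0.8],
]
print(round(mean_average_precision(y_true, y_pred), 6))  # 0.587037
```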

dhg.metrics.retrieval.ndcg(y_true, y_pred, k=None, ret_batch=False)

Calculate the Normalized Discounted Cumulative Gain (NDCG) for the retrieval task.

Parameters

y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).
ret_batch (bool) – Whether to return the raw score list. Defaults to False.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([10, 0, 0, 1, 5])
>>> y_pred = torch.tensor([.1, .2, .3, 4, 70])
>>> dm.retrieval.ndcg(y_true, y_pred)
0.695694088935852
>>> dm.retrieval.ndcg(y_true, y_pred, k=3)
0.4123818874359131

dhg.metrics.retrieval.rr(y_true, y_pred, k=None)

Calculate the Reciprocal Rank (RR) for the retrieval task.

Parameters

y_true (torch.Tensor) – A 1-D tensor. Size \((N_{target},)\).
y_pred (torch.Tensor) – A 1-D tensor. Size \((N_{target},)\).
k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([False, True, False, True])
>>> y_pred = torch.tensor([0.2, 0.3, 0.5, 0.2])
>>> dm.retrieval.rr(y_true, y_pred)
0.375
>>> dm.retrieval.rr(y_true, y_pred, k=2)
0.5
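The first call's 0.375 is not the textbook reciprocal rank, which is 1/rank of the first relevant item. Ranking by score puts the relevant items at ranks 2 and 4, so the textbook definition gives 0.5 with or without k=2. Both documented outputs are instead consistent with averaging 1/rank over the relevant items inside the top k: (1/2 + 1/4) / 2 = 0.375 for the full list and 1/2 for k=2. This reading also assumes that the tie at 0.2 keeps input order, which places the second relevant item at rank 4. The plain-Python sketch below implements both readings side by side. It is inferred from those two outputs, not taken from dhg's source.

```python
from typing import List, Optional, Sequence


def _ranking(y_pred: Sequence[float], k: Optional[int]) -> List[int]:
    """Indices sorted by descending score (stable for ties), cut to the top k."""
    n = len(y_pred)
    k = n if k is None else min(k, n)
    return sorted(range(n), key=lambda i: -y_pred[i])[:k]


def reciprocal_rank(y_true: Sequence[bool], y_pred: Sequence[float], k: Optional[int] = None) -> float:
    """Mean of 1/rank over the relevant items in the top k (fits the outputs above)."""
    ranks = [rank for rank, i in enumerate(_ranking(y_pred, k), start=1) if y_true[i]]
    return sum(1.0 / r for r in ranks) / len(ranks) if ranks else 0.0


def first_hit_reciprocal_rank(y_true: Sequence[bool], y_pred: Sequence[float], k: Optional[int] = None) -> float:
    """Textbook RR: 1/rank of the first relevant item, 0.0 if none is in the top k."""
    for rank, i in enumerate(_ranking(y_pred, k), start=1):
        if y_true[i]:
            return 1.0 / rank
    return 0.0


y_true = [False, True, False, True]
y_pred = [0.2, 0.3, 0.5, 0.2]
print(reciprocal_rank(y_true, y_pred))            # 0.375
print(reciprocal_rank(y_true, y_pred, k=2))       # 0.5
print(first_hit_reciprocal_rank(y_true, y_pred))  # 0.5
```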

dhg.metrics.retrieval.mrr(y_true, y_pred, k=None, ret_batch=False)

Calculate the Mean Reciprocal Rank (MRR) for the retrieval task.

Parameters

y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).
ret_batch (bool) – Whether to return the raw score list. Defaults to False.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([False, True, False, True])
>>> y_pred = torch.tensor([0.2, 0.3, 0.5, 0.2])
>>> dm.retrieval.mrr(y_true, y_pred)
0.375
>>> dm.retrieval.mrr(y_true, y_pred, k=2)
0.5

dhg.metrics.retrieval.pr_curve(y_true, y_pred, k=None, method='pascal_voc', n_points=11, ret_batch=False)

Calculate the Precision-Recall Curve for the retrieval task.

Parameters

y_true (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
y_pred (torch.Tensor) – A 1-D tensor or 2-D tensor. Size \((N_{target},)\) or \((N_{samples}, N_{target})\).
k (int, optional) – The specified top-k value. Defaults to \(N_{target}\).
method (str, optional) – The method used to compute the PR curve, either “legacy” or “pascal_voc”. Defaults to “pascal_voc”.
n_points (int) – The number of points at which the PR curve is evaluated. Defaults to 11.
ret_batch (bool) – Whether to return the raw score list. Defaults to False.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> y_true = torch.tensor([0, 1, 0, 1, 0, 0, 1, 0, 1, 0])
>>> y_pred = torch.tensor([0.23, 0.76, 0.01, 0.91, 0.13, 0.45, 0.12, 0.03, 0.38, 0.11])
>>> precision_coor, recall_coor = dm.retrieval.pr_curve(y_true, y_pred)
>>> precision_coor
[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 0.75, 0.75, 0.5714285969734192]
>>> recall_coor
[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

Evaluators for Different Tasks

dhg.metrics.build_evaluator(task, metric_configs, validate_index=0)

Return the metric evaluator for the given task.

Parameters

task (str) – The type of the task. The supported types include: graph_vertex_classification, hypergraph_vertex_classification, and user_item_recommender.
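metric_configs (List[Union[str, Dict[str, dict]]]) – The metric configurations. Each entry is either a metric name or a dict mapping a metric name to its parameters, e.g. {"f1_score": {"average": "macro"}}.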
validate_index (int) – The specified metric index used for validation. Defaults to 0.

Base Class

class dhg.metrics.BaseEvaluator(task, metric_configs, validate_index=0)

The base class for task-specific metric evaluators.

Parameters

task (str) – The type of the task. The supported types include: classification, retrieval and recommender.
metric_configs (List[Union[str, Dict[str, dict]]]) – The metric configurations. Each entry is either a metric name or a dict mapping a metric name to its parameters, e.g. {"f1_score": {"average": "macro"}}.
validate_index (int) – The specified metric index used for validation. Defaults to 0.
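The metric_configs format used in the examples mixes plain names ("accuracy") with single-key dicts that carry keyword arguments ({"f1_score": {"average": "macro"}}), and validate_index selects one entry for validate(). The hypothetical helper `normalize_metric_configs` below is not part of dhg. It shows one way to turn such a list into uniform (name, params) pairs. Rejecting multi-key dicts is our assumption, not documented behaviour.

```python
from typing import Any, Dict, List, Tuple, Union

MetricConfig = Union[str, Dict[str, Dict[str, Any]]]


def normalize_metric_configs(metric_configs: List[MetricConfig]) -> List[Tuple[str, Dict[str, Any]]]:
    """Hypothetical helper: turn each entry into a (metric_name, params) pair."""
    normalized: List[Tuple[str, Dict[str, Any]]] = []
    for cfg in metric_configs:
        if isinstance(cfg, str):
            normalized.append((cfg, {}))  # bare name: default parameters
        elif isinstance(cfg, dict) and len(cfg) == 1:
            name, params = next(iter(cfg.items()))
            normalized.append((name, dict(params)))  # shallow copy of the caller's params
        else:
            # Assumption: multi-key dicts are rejected rather than expanded.
            raise ValueError(f"expected a metric name or a single-key dict, got {cfg!r}")
    return normalized


configs = normalize_metric_configs(["accuracy", {"f1_score": {"average": "macro"}}])
validate_index = 0
print(configs)                  # [('accuracy', {}), ('f1_score', {'average': 'macro'})]
print(configs[validate_index])  # ('accuracy', {})
```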

test(y_true, y_pred)

Return results of the evaluation on all the metrics in metric_configs.

Parameters

y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, -)\).
y_pred (torch.Tensor) – The predicted labels. Size \((N_{samples}, -)\).

test_add_batch(batch_y_true, batch_y_pred)

Add batch data for testing.

Parameters

batch_y_true (torch.Tensor) – The ground truth data. Size \((N_{batch}, -)\).
batch_y_pred (torch.Tensor) – The predicted data. Size \((N_{batch}, -)\).

test_epoch_res()

For all added batch data, return the results of the evaluation on all the metrics in metric_configs.

validate(y_true, y_pred)

Return the result of the evaluation on the specified validate_index-th metric.

Parameters

y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, -)\).
y_pred (torch.Tensor) – The predicted labels. Size \((N_{samples}, -)\).

validate_add_batch(batch_y_true, batch_y_pred)

Add batch data for validation.

Parameters

batch_y_true (torch.Tensor) – The ground truth data. Size \((N_{batch}, -)\).
batch_y_pred (torch.Tensor) – The predicted data. Size \((N_{batch}, -)\).

validate_epoch_res()

For all added batch data, return the result of the evaluation on the specified validate_index-th metric.

Vertex Classification Task

On Graph

class dhg.metrics.GraphVertexClassificationEvaluator(metric_configs, validate_index=0)

Bases: dhg.metrics.classification.VertexClassificationEvaluator

Metric evaluator for the vertex classification task on graph structures. The supported metrics include: accuracy, f1_score, confusion_matrix.

Parameters

metric_configs (List[Union[str, Dict[str, dict]]]) – The metric configurations. Each entry is either a metric name or a dict mapping a metric name to its parameters, e.g. {"f1_score": {"average": "macro"}}.
validate_index (int) – The specified metric index used for validation. Defaults to 0.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> evaluator = dm.GraphVertexClassificationEvaluator(
...     [
...         "accuracy",
...         {"f1_score": {"average": "macro"}},
...     ],
...     0
... )
>>> y_true = torch.tensor([0, 0, 1, 1, 2, 2])
>>> y_pred = torch.tensor([0, 2, 1, 2, 1, 2])
>>> evaluator.validate(y_true, y_pred)
0.5
>>> evaluator.test(y_true, y_pred)
{'accuracy': 0.5, 'f1_score -> macro': 0.5222222222222221}

test(y_true, y_pred)

Return results of the evaluation on all the metrics in metric_configs.

Parameters

y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, )\).
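y_pred (torch.Tensor) – The predicted labels of size \((N_{samples}, )\), or class scores of size \((N_{samples}, N_{class})\), in which case the highest-scoring class of each sample is taken as its prediction.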

validate(y_true, y_pred)

Return the result of the evaluation on the specified validate_index-th metric.

Parameters

y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, )\).
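y_pred (torch.Tensor) – The predicted labels of size \((N_{samples}, )\), or class scores of size \((N_{samples}, N_{class})\), in which case the highest-scoring class of each sample is taken as its prediction.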

On Hypergraph

class dhg.metrics.HypergraphVertexClassificationEvaluator(metric_configs, validate_index=0)

Bases: dhg.metrics.classification.VertexClassificationEvaluator

Metric evaluator for the vertex classification task on hypergraph structures. The supported metrics include: accuracy, f1_score, confusion_matrix.

Parameters

metric_configs (List[Union[str, Dict[str, dict]]]) – The metric configurations. Each entry is either a metric name or a dict mapping a metric name to its parameters, e.g. {"f1_score": {"average": "macro"}}.
validate_index (int) – The specified metric index used for validation. Defaults to 0.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> evaluator = dm.HypergraphVertexClassificationEvaluator(
...     [
...         "accuracy",
...         {"f1_score": {"average": "macro"}},
...     ],
...     0
... )
>>> y_true = torch.tensor([0, 0, 1, 1, 2, 2])
>>> y_pred = torch.tensor([0, 2, 1, 2, 1, 2])
>>> evaluator.validate(y_true, y_pred)
0.5
>>> evaluator.test(y_true, y_pred)
{'accuracy': 0.5, 'f1_score -> macro': 0.5222222222222221}

test(y_true, y_pred)

Return results of the evaluation on all the metrics in metric_configs.

Parameters

y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, )\).
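y_pred (torch.Tensor) – The predicted labels of size \((N_{samples}, )\), or class scores of size \((N_{samples}, N_{class})\), in which case the highest-scoring class of each sample is taken as its prediction.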

validate(y_true, y_pred)

Return the result of the evaluation on the specified validate_index-th metric.

Parameters

y_true (torch.LongTensor) – The ground truth labels. Size \((N_{samples}, )\).
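y_pred (torch.Tensor) – The predicted labels of size \((N_{samples}, )\), or class scores of size \((N_{samples}, N_{class})\), in which case the highest-scoring class of each sample is taken as its prediction.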

Recommender Task

On User-Item Bipartite Graph

class dhg.metrics.UserItemRecommenderEvaluator(metric_configs, validate_index=0)

Bases: dhg.metrics.base.BaseEvaluator

Metric evaluator for the recommender task on user-item bipartite graphs. The supported metrics include: precision, recall, ndcg.

Parameters

metric_configs (List[Union[str, Dict[str, dict]]]) – The metric configurations. Each entry is either a metric name or a dict mapping a metric name to its parameters, e.g. {"f1_score": {"average": "macro"}}.
validate_index (int) – The specified metric index used for validation. Defaults to 0.

Examples

>>> import torch
>>> import dhg.metrics as dm
>>> evaluator = dm.UserItemRecommenderEvaluator(
...     [
...         "precision",
...         "recall",
...         "ndcg",
...     ],
...     0
... )
>>> y_true = torch.tensor([0, 1, 0, 0, 1, 1])
>>> y_pred = torch.tensor([0.8, 0.9, 0.6, 0.7, 0.4, 0.5])
>>> evaluator.validate_add_batch(y_true, y_pred)
>>> y_true = torch.tensor([0, 1, 0, 1, 0, 1])
>>> y_pred = torch.tensor([0.8, 0.9, 0.9, 0.4, 0.4, 0.5])
>>> evaluator.validate_add_batch(y_true, y_pred)
>>> evaluator.validate_epoch_res()
0.5
>>> y_true = torch.tensor([0, 1, 1, 1, 0, 1])
>>> y_pred = torch.tensor([0.8, 0.9, 0.6, 0.7, 0.4, 0.5])
>>> evaluator.test_add_batch(y_true, y_pred)
>>> y_true = torch.tensor([1, 1, 0, 0, 1, 0])
>>> y_pred = torch.tensor([0.8, 0.9, 0.9, 0.4, 0.4, 0.5])
>>> evaluator.test_add_batch(y_true, y_pred)
>>> evaluator.test_epoch_res()
{'precision': 0.5833333432674408, 'recall': 1.0, 'ndcg': 0.8878978490829468}
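The validation result and the test precision in this example can be checked by hand. With k left at N_target = 6, precision is the share of relevant items in each list: 3/6 and 3/6 for validation, 4/6 and 3/6 for testing. The epoch results are the means, 0.5 and ≈ 0.5833. Each batch here is a 1-D tensor holding a single list, so these figures do not reveal whether dhg averages over samples or over batches. The plain-Python sketch below uses a hypothetical `EpochAccumulator` class that is not part of dhg. It averages over samples, and its `reset` method is likewise an assumption.

```python
from typing import Callable, List, Optional, Sequence

Metric = Callable[[Sequence[float], Sequence[float]], float]


def precision_at_k(y_true: Sequence[float], y_pred: Sequence[float], k: Optional[int] = None) -> float:
    """Fraction of the top-k scored items that are relevant (k defaults to the list length)."""
    n = len(y_pred)
    k = n if k is None else min(k, n)
    ranking = sorted(range(n), key=lambda i: -y_pred[i])  # stable for ties
    return sum(1 for i in ranking[:k] if y_true[i] > 0) / k


class EpochAccumulator:
    """Hypothetical helper: collect per-sample scores batch by batch, average at epoch end."""

    def __init__(self, metric: Metric) -> None:
        self.metric = metric
        self.scores: List[float] = []

    def add_batch(self, batch_y_true, batch_y_pred) -> None:
        # Assumption: a flat list is one sample; a list of lists holds one sample per row.
        if batch_y_true and not isinstance(batch_y_true[0], (list, tuple)):
            batch_y_true, batch_y_pred = [batch_y_true], [batch_y_pred]
        for row_true, row_pred in zip(batch_y_true, batch_y_pred):
            self.scores.append(self.metric(row_true, row_pred))

    def epoch_res(self) -> float:
        return sum(self.scores) / len(self.scores)

    def reset(self) -> None:
        self.scores.clear()


val_acc = EpochAccumulator(precision_at_k)
val_acc.add_batch([0, 1, 0, 0, 1, 1], [0.8, 0.9, 0.6, 0.7, 0.4, 0.5])
val_acc.add_batch([0, 1, 0, 1, 0, 1], [0.8, 0.9, 0.9, 0.4, 0.4, 0.5])
print(val_acc.epoch_res())  # 0.5

test_acc = EpochAccumulator(precision_at_k)
test_acc.add_batch([0, 1, 1, 1, 0, 1], [0.8, 0.9, 0.6, 0.7, 0.4, 0.5])
test_acc.add_batch([1, 1, 0, 0, 1, 0], [0.8, 0.9, 0.9, 0.4, 0.4, 0.5])
print(round(test_acc.epoch_res(), 4))  # 0.5833
```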

test_add_batch(batch_y_true, batch_y_pred)

Add batch data for testing.

Parameters

batch_y_true (torch.Tensor) – The ground truth data. Size \((N_{batch}, -)\).
batch_y_pred (torch.Tensor) – The predicted data. Size \((N_{batch}, -)\).

test_epoch_res()

For all added batch data, return the results of the evaluation on all the metrics in metric_configs.

validate_add_batch(batch_y_true, batch_y_pred)

Add batch data for validation.

Parameters

batch_y_true (torch.Tensor) – The ground truth data. Size \((N_{batch}, -)\).
batch_y_pred (torch.Tensor) – The predicted data. Size \((N_{batch}, -)\).

validate_epoch_res()

For all added batch data, return the result of the evaluation on the specified validate_index-th metric.