Building Evaluator

Hint

Author: Yifan Feng (丰一帆)
Proof: Xinwei Zhang

DHG currently provides evaluators for classification, recommender, and retrieval tasks. The metrics supported for each task are:

Classification -> Accuracy, F1 Score, Confusion Matrix
Recommender -> Precision, Recall, NDCG
Retrieval -> Precision, Recall, NDCG, mAP, MRR, Precision-Recall Curve

Generally speaking, the evaluation strategy can be divided into two categories:

Epoch Evaluation

For tasks such as vertex classification on a graph, the evaluation is performed on the whole graph once per epoch.

Add Batches Then Do Epoch Evaluation

For tasks such as recommender systems, one epoch consists of multiple batches. Each batch is evaluated separately, and the batch results are then aggregated into the epoch result.
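To make the difference between the two strategies concrete, here is a minimal plain-Python sketch. The EpochAccumulator class, its method names, and the sample scores are hypothetical and are not part of DHG. The sketch only illustrates that per-sample scores collected batch by batch can be aggregated into the same value as a single pass over the whole epoch.

```python
class EpochAccumulator:
    """Hypothetical helper (not a DHG class): collects per-sample scores batch by batch."""

    def __init__(self):
        self.scores = []

    def add_batch(self, batch_scores):
        # Store the per-sample scores of one batch.
        self.scores.extend(batch_scores)

    def epoch_result(self):
        # Aggregate all stored scores into one epoch-level value, then reset.
        result = sum(self.scores) / len(self.scores)
        self.scores = []
        return result


def evaluate_whole_epoch(all_scores):
    # Epoch evaluation: all samples are available at once.
    return sum(all_scores) / len(all_scores)


per_sample = [1.0, 0.0, 0.5, 1.0]      # made-up per-sample scores
whole = evaluate_whole_epoch(per_sample)

acc = EpochAccumulator()
acc.add_batch(per_sample[:2])          # first batch
acc.add_batch(per_sample[2:])          # second batch
batched = acc.epoch_result()
```

Averaging per-sample scores is only one possible aggregation, chosen here for illustration.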

Initialization

All evaluators in DHG are created with the same two parameters, as shown in the following code:

>>> import torch
>>> import dhg.metrics as dm
>>> evaluator = dm.GraphVertexClassificationEvaluator(
        metric_configs = [
            "accuracy",
            {"f1_score": {"average": "macro"}},
        ],
        validate_index = 0
    )
>>> evaluator = dm.UserItemRecommenderEvaluator(
        metric_configs = [
            {"precision": {"k": 20}},
            {"recall": {"k": 20}},
            {"ndcg": {"k": 20}},
        ],
        validate_index = 2
    )

The first parameter, metric_configs, is a list in which each entry is either a metric name (a string) or a dictionary that maps a metric name to its keyword arguments. The second parameter, validate_index, is the index in metric_configs of the metric used to validate the model: on the validation set, only the value of that metric is returned.
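The structure of metric_configs and the role of validate_index can be sketched in plain Python. The helper names parse_metric_config and validation_metric are hypothetical and do not reflect how DHG parses the configuration internally. The sketch assumes that each dictionary entry holds exactly one metric name.

```python
def parse_metric_config(cfg):
    """Split one metric_configs entry into (name, keyword arguments)."""
    if isinstance(cfg, str):
        # A bare string is a metric name with no extra arguments.
        return cfg, {}
    # Assumption: a dict entry holds exactly one metric name.
    (name, params), = cfg.items()
    return name, dict(params)


def validation_metric(metric_configs, validate_index):
    """Return the (name, params) pair that validate_index points at."""
    return parse_metric_config(metric_configs[validate_index])


metric_configs = [
    {"precision": {"k": 20}},
    {"recall": {"k": 20}},
    {"ndcg": {"k": 20}},
]
selected = validation_metric(metric_configs, 2)   # the NDCG entry
```

With validate_index = 2, the entry selected for validation is the third one in the list, here NDCG with k = 20.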

Epoch Evaluation

Currently, DHG implements two Epoch Evaluation tasks: vertex classification on a graph and vertex classification on a hypergraph. For validation and testing, call the validate(y_true, y_pred) and test(y_true, y_pred) methods directly, as follows:

Note

evaluator.validate(y_true, y_pred) returns only the value of the i-th metric, where i is specified by validate_index. evaluator.test(y_true, y_pred) returns a dictionary with the results of all metrics specified in metric_configs.

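The following example evaluates predictions for a graph with 5 vertices, each of which belongs to one of 3 classes. As the example shows, y_pred can be given either as predicted class labels or as a matrix of per-class scores.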

>>> evaluator = dm.GraphVertexClassificationEvaluator(
        metric_configs = [
            "accuracy",
            {"f1_score": {"average": "micro"}},
            {"f1_score": {"average": "macro"}},
            "confusion_matrix",
        ],
        validate_index = 0
    )
>>> y_true = torch.tensor([0, 2, 1, 0, 1])
>>> y_pred = torch.tensor([0, 1, 0, 0, 1])
>>> evaluator.validate(y_true, y_pred)
0.6000000238418579
>>> evaluator.test(y_true, y_pred)
{
    'accuracy': 0.6000000238418579,
    'f1_score -> average@micro': 0.6,
    'f1_score -> average@macro': 0.43333333333333335,
    'confusion_matrix': array([
        [2, 0, 0],
        [1, 1, 0],
        [0, 1, 0]
    ])
}
>>> y_pred = torch.tensor([[0.7, 0.1, 0.2],
                            [0.1, 0.8, 0.1],
                            [0.7, 0.1, 0.2],
                            [0.6, 0.2, 0.2],
                            [0.2, 0.7, 0.1],])
>>> evaluator.validate(y_true, y_pred)
0.6000000238418579
>>> evaluator.test(y_true, y_pred)
{
    'accuracy': 0.6000000238418579,
    'f1_score -> average@micro': 0.6,
    'f1_score -> average@macro': 0.43333333333333335,
    'confusion_matrix': array([
        [2, 0, 0],
        [1, 1, 0],
        [0, 1, 0]
    ])
}
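As a cross-check of the numbers above, the same metrics can be recomputed by hand. The following is a plain-Python sketch, not DHG's implementation. All function names are hypothetical, and it assumes that score rows are converted to labels by taking the index of the largest score.

```python
def argmax_rows(scores):
    # Convert per-class scores to predicted labels (first maximum wins on ties).
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def confusion_matrix(y_true, y_pred, n_classes):
    # Rows are true classes, columns are predicted classes.
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_per_class(cm):
    n = len(cm)
    scores = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class differs
        fn = sum(cm[c]) - tp                        # true c, predicted class differs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


y_true = [0, 2, 1, 0, 1]
y_score = [[0.7, 0.1, 0.2],
           [0.1, 0.8, 0.1],
           [0.7, 0.1, 0.2],
           [0.6, 0.2, 0.2],
           [0.2, 0.7, 0.1]]
y_pred = argmax_rows(y_score)            # [0, 1, 0, 0, 1]

cm = confusion_matrix(y_true, y_pred, n_classes=3)
acc = accuracy(y_true, y_pred)
macro_f1 = sum(f1_per_class(cm)) / 3     # unweighted mean over classes
```

The per-class F1 scores are 0.8, 0.5, and 0.0, so the macro average is about 0.4333, and 3 of the 5 predictions are correct, giving an accuracy of 0.6.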

Add Batches Then Do Epoch Evaluation

Currently, DHG implements only one Add Batches Then Do Epoch Evaluation task: recommender systems. For validation, call validate_add_batch(y_true, y_pred) once per batch to add the batch data, then call validate_epoch_res() to get the epoch result on the validation set. For testing, call test_add_batch(y_true, y_pred) once per batch, then call test_epoch_res() to get the epoch result on the testing set.

Note

evaluator.validate_epoch_res() returns only the value of the i-th metric, where i is specified by validate_index. evaluator.test_epoch_res() returns a dictionary with the results of all metrics specified in metric_configs.

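The following example uses a user-item bipartite graph with 4 users and 6 items, evaluated in batches of 2 users. Each row of batch_y_true marks the relevant items of one user, and each row of batch_y_pred holds the predicted scores for that user.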

>>> evaluator = dm.UserItemRecommenderEvaluator(
        metric_configs = [
            {"precision": {"k": 20}},
            {"recall": {"k": 20}},
            {"ndcg": {"k": 20}},
        ],
        validate_index = 2
    )
>>> batch_y_true = torch.tensor([[0, 1, 0, 1, 0, 0],
                                [0, 0, 1, 1, 0, 0]])
>>> batch_y_pred = torch.tensor([[0.7, 0.9, 0.1, 0.1, 0.2, 0.0],
                                 [0.1, 0.2, 0.5, 0.3, 0.6, 0.0]])
>>> evaluator.validate_add_batch(batch_y_true, batch_y_pred)
>>> batch_y_true = torch.tensor([[0, 1, 0, 1, 1, 0],
                                [0, 0, 1, 0, 1, 1]])
>>> batch_y_pred = torch.tensor([[0.3, 0.2, 0.1, 0.5, 0.2, 0.3],
                                 [0.3, 0.5, 0.7, 0.2, 0.1, 0.5]])
>>> evaluator.validate_add_batch(batch_y_true, batch_y_pred)
>>> evaluator.validate_epoch_res()
0.816944420337677
>>> batch_y_true = torch.tensor([[0, 1, 0, 1, 0, 0],
                                [0, 0, 1, 1, 0, 0]])
>>> batch_y_pred = torch.tensor([[0.7, 0.9, 0.1, 0.1, 0.2, 0.0],
                                 [0.1, 0.2, 0.5, 0.3, 0.6, 0.0]])
>>> evaluator.test_add_batch(batch_y_true, batch_y_pred)
>>> batch_y_true = torch.tensor([[0, 1, 0, 1, 1, 0],
                                [0, 0, 1, 0, 1, 1]])
>>> batch_y_pred = torch.tensor([[0.3, 0.2, 0.1, 0.5, 0.2, 0.3],
                                 [0.3, 0.5, 0.7, 0.2, 0.1, 0.5]])
>>> evaluator.test_add_batch(batch_y_true, batch_y_pred)
>>> evaluator.test_epoch_res()
{
    'precision -> k@20': 0.4166666716337204,
    'recall -> k@20': 1.0,
    'ndcg -> k@20': 0.816944420337677
}
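The epoch results above can be approximated by hand with a plain-Python sketch. This is not DHG's implementation, and the function names are hypothetical. It rests on three assumptions chosen because they agree with the output shown: k is clipped to the number of items (only 6 items exist although k = 20), tied scores keep their original index order, and the epoch result is the mean over all users from all batches.

```python
import math


def rank_items(scores):
    # Item indices by descending score. Python's sort is stable, so tied
    # items keep their original index order (an assumption of this sketch).
    return sorted(range(len(scores)), key=lambda i: -scores[i])


def user_metrics(y_true, y_score, k):
    # Assumes every user has at least one relevant item.
    k = min(k, len(y_score))                 # assumed: k clipped to the item count
    top = rank_items(y_score)[:k]
    hits = [y_true[i] for i in top]          # relevance of the ranked items
    n_relevant = sum(y_true)
    precision = sum(hits) / k
    recall = sum(hits) / n_relevant
    dcg = sum(h / math.log2(pos + 2) for pos, h in enumerate(hits))
    idcg = sum(1 / math.log2(pos + 2) for pos in range(min(k, n_relevant)))
    return precision, recall, dcg / idcg


batches = [
    ([[0, 1, 0, 1, 0, 0], [0, 0, 1, 1, 0, 0]],
     [[0.7, 0.9, 0.1, 0.1, 0.2, 0.0], [0.1, 0.2, 0.5, 0.3, 0.6, 0.0]]),
    ([[0, 1, 0, 1, 1, 0], [0, 0, 1, 0, 1, 1]],
     [[0.3, 0.2, 0.1, 0.5, 0.2, 0.3], [0.3, 0.5, 0.7, 0.2, 0.1, 0.5]]),
]

per_user = []
for batch_true, batch_score in batches:      # "add batch" step
    for y_true, y_score in zip(batch_true, batch_score):
        per_user.append(user_metrics(y_true, y_score, k=20))

# "epoch result" step: average each metric over all users.
precision, recall, ndcg = (sum(col) / len(per_user) for col in zip(*per_user))
```

Under these assumptions the sketch gives a precision of about 0.4167, a recall of 1.0, and an NDCG of about 0.8169. A different tie-breaking rule would change the NDCG value, because two of the four users have tied scores involving relevant items.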