Building Evaluator
Hint
Author: Yifan Feng (丰一帆)
Proof: Xinwei Zhang
Currently, DHG supports classification, recommender, and retrieval tasks. The detailed metrics are:
- Classification -> Accuracy, F1 Score, Confusion Matrix
- Retrieval -> Precision, Recall, NDCG, mAP, MRR, Precision-Recall Curve
Generally speaking, evaluation strategies fall into two categories:
- Epoch Evaluation
  As in vertex classification on a graph, the evaluation is performed on the whole graph once per epoch.
- Add Batches Then Do Epoch Evaluation
  As in recommender systems, one epoch consists of multiple batches. The evaluation is performed on each batch, and the batch results are then aggregated into the epoch result.
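The difference between the two strategies can be sketched in plain Python. This is only an illustration: `accuracy`, `BatchedAccuracy`, and its methods are hypothetical names that are not part of DHG, and a simple accuracy stands in for a real metric.

```python
from typing import List


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of positions where the prediction equals the label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


class BatchedAccuracy:
    """Hypothetical accumulator: collect batches, then report one epoch result."""

    def __init__(self) -> None:
        self.correct = 0
        self.total = 0

    def add_batch(self, y_true: List[int], y_pred: List[int]) -> None:
        # Keep running counts instead of storing every batch.
        self.correct += sum(t == p for t, p in zip(y_true, y_pred))
        self.total += len(y_true)

    def epoch_result(self) -> float:
        result = self.correct / self.total
        # Reset so the next epoch starts from an empty state (a choice of this sketch).
        self.correct, self.total = 0, 0
        return result


y_true = [0, 2, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# Strategy 1: a single call over the whole dataset.
whole = accuracy(y_true, y_pred)

# Strategy 2: feed the data batch by batch, then aggregate.
acc = BatchedAccuracy()
acc.add_batch(y_true[:2], y_pred[:2])
acc.add_batch(y_true[2:], y_pred[2:])
batched = acc.epoch_result()
```

For a metric that is a plain average over samples, both strategies give the same number; the batched form only avoids holding all predictions in memory at once.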
Initialization
All evaluators in DHG are created with the same two parameters, as in the following code:
>>> import torch
>>> import dhg.metrics as dm
>>> evaluator = dm.GraphVertexClassificationEvaluator(
        metric_configs = [
            "accuracy",
            {"f1_score": {"average": "macro"}},
        ],
        validate_index = 0
    )
>>> evaluator = dm.UserItemRecommenderEvaluator(
        metric_configs = [
            {"precision": {"k": 20}},
            {"recall": {"k": 20}},
            {"ndcg": {"k": 20}},
        ],
        validate_index = 2
    )
The first parameter, metric_configs, is a list in which each entry is either a metric name (a string) or a dictionary that maps a metric name to its keyword arguments.
The second parameter, validate_index, is the position in metric_configs of the metric used to validate the model. Only that metric is reported on the validation set.
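To make the configuration format concrete, here is a minimal sketch of how such a list could be normalized. It is not DHG's implementation: `parse_metric_configs` and `result_key` are hypothetical helpers, the `name -> key@value` label format is inferred from the result dictionaries printed in this tutorial, and the separator used for several keyword arguments is an arbitrary choice.

```python
from typing import Any, Dict, List, Tuple, Union

MetricConfig = Union[str, Dict[str, Dict[str, Any]]]


def parse_metric_configs(
    configs: List[MetricConfig],
) -> List[Tuple[str, Dict[str, Any]]]:
    """Normalize each entry to a (metric_name, keyword_arguments) pair."""
    parsed = []
    for entry in configs:
        if isinstance(entry, str):
            # A bare name means "use the metric's default arguments".
            parsed.append((entry, {}))
        else:
            # Assumption: a dict entry holds exactly one metric name.
            ((name, kwargs),) = entry.items()
            parsed.append((name, dict(kwargs)))
    return parsed


def result_key(name: str, kwargs: Dict[str, Any]) -> str:
    """Build a label such as 'f1_score -> average@macro'."""
    if not kwargs:
        return name
    # The ' | ' separator for multiple arguments is an assumption.
    args = " | ".join(f"{k}@{v}" for k, v in kwargs.items())
    return f"{name} -> {args}"


configs = ["accuracy", {"f1_score": {"average": "macro"}}, {"ndcg": {"k": 20}}]
keys = [result_key(name, kwargs) for name, kwargs in parse_metric_configs(configs)]

# validate_index selects one entry of the list by position.
validate_index = 0
validation_metric = keys[validate_index]
```

Because validate_index is a position, reordering metric_configs changes which metric drives validation.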
Epoch Evaluation
Currently, DHG implements two Epoch Evaluation tasks: vertex classification on graphs and vertex classification on hypergraphs.
For validation and testing, call the validate(y_true, y_pred) and test(y_true, y_pred) methods directly, as follows:
Note
evaluator.validate(y_true, y_pred) returns only the value of the i-th metric, where i is specified by validate_index.
evaluator.test(y_true, y_pred) returns a dictionary with the results of all metrics specified in metric_configs.
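The following example uses a graph with 5 vertices, each of which belongs to one of 3 classes. y_pred is passed first as predicted labels and then as a matrix of per-class scores; both forms give the same results.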
>>> evaluator = dm.GraphVertexClassificationEvaluator(
        metric_configs = [
            "accuracy",
            {"f1_score": {"average": "micro"}},
            {"f1_score": {"average": "macro"}},
            "confusion_matrix",
        ],
        validate_index = 0
    )
>>> y_true = torch.tensor([0, 2, 1, 0, 1])
>>> y_pred = torch.tensor([0, 1, 0, 0, 1])
>>> evaluator.validate(y_true, y_pred)
0.6000000238418579
>>> evaluator.test(y_true, y_pred)
{
    'accuracy': 0.6000000238418579,
    'f1_score -> average@micro': 0.6,
    'f1_score -> average@macro': 0.43333333333333335,
    'confusion_matrix': array([
        [2, 0, 0],
        [1, 1, 0],
        [0, 1, 0]
    ])
}
>>> y_pred = torch.tensor([[0.7, 0.1, 0.2],
                           [0.1, 0.8, 0.1],
                           [0.7, 0.1, 0.2],
                           [0.6, 0.2, 0.2],
                           [0.2, 0.7, 0.1]])
>>> evaluator.validate(y_true, y_pred)
0.6000000238418579
>>> evaluator.test(y_true, y_pred)
{
    'accuracy': 0.6000000238418579,
    'f1_score -> average@micro': 0.6,
    'f1_score -> average@macro': 0.43333333333333335,
    'confusion_matrix': array([
        [2, 0, 0],
        [1, 1, 0],
        [0, 1, 0]
    ])
}
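The figures in the example above can be checked by hand. The sketch below recomputes them in plain Python, without DHG or torch; it assumes the usual definitions (confusion-matrix rows are true classes, columns are predicted classes, and a class with no true positives contributes an F1 of 0). All variable and function names are illustrative.

```python
y_true = [0, 2, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
num_classes = 3

# Score-matrix input reduces to labels by taking the arg-max of each row.
scores = [
    [0.7, 0.1, 0.2],
    [0.1, 0.8, 0.1],
    [0.7, 0.1, 0.2],
    [0.6, 0.2, 0.2],
    [0.2, 0.7, 0.1],
]
labels_from_scores = [row.index(max(row)) for row in scores]

# Confusion matrix: rows are true classes, columns are predicted classes.
confusion = [[0] * num_classes for _ in range(num_classes)]
for t, p in zip(y_true, y_pred):
    confusion[t][p] += 1

# Accuracy is the diagonal mass divided by the number of samples.
accuracy = sum(confusion[c][c] for c in range(num_classes)) / len(y_true)


def f1_for_class(c: int) -> float:
    """Per-class F1 from the confusion matrix."""
    tp = confusion[c][c]
    if tp == 0:
        return 0.0
    predicted = sum(confusion[r][c] for r in range(num_classes))  # column sum
    actual = sum(confusion[c])  # row sum
    precision, recall = tp / predicted, tp / actual
    return 2 * precision * recall / (precision + recall)


per_class_f1 = [f1_for_class(c) for c in range(num_classes)]
macro_f1 = sum(per_class_f1) / num_classes
```

Here the per-class F1 values are 0.8, 0.5, and 0, whose mean is about 0.433. In single-label multi-class classification the micro-averaged F1 equals the accuracy, which is why both are 0.6.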
Add Batches Then Do Epoch Evaluation
Currently, DHG implements only one Add Batches Then Do Epoch Evaluation task: recommender systems.
For validation, call the validate_add_batch(y_true, y_pred) method once per batch, then call the validate_epoch_res() method to get the epoch result on the validation set.
For testing, call the test_add_batch(y_true, y_pred) method once per batch, then call the test_epoch_res() method to get the epoch result on the testing set.
Note
evaluator.validate_epoch_res() returns only the value of the i-th metric, where i is specified by validate_index.
evaluator.test_epoch_res() returns a dictionary with the results of all metrics specified in metric_configs.
The following example uses a User-Item bipartite graph with 4 users and 6 items, processed with a batch size of 2.
>>> evaluator = dm.UserItemRecommenderEvaluator(
        metric_configs = [
            {"precision": {"k": 20}},
            {"recall": {"k": 20}},
            {"ndcg": {"k": 20}},
        ],
        validate_index = 2
    )
>>> batch_y_true = torch.tensor([[0, 1, 0, 1, 0, 0],
                                 [0, 0, 1, 1, 0, 0]])
>>> batch_y_pred = torch.tensor([[0.7, 0.9, 0.1, 0.1, 0.2, 0.0],
                                 [0.1, 0.2, 0.5, 0.3, 0.6, 0.0]])
>>> evaluator.validate_add_batch(batch_y_true, batch_y_pred)
>>> batch_y_true = torch.tensor([[0, 1, 0, 1, 1, 0],
                                 [0, 0, 1, 0, 1, 1]])
>>> batch_y_pred = torch.tensor([[0.3, 0.2, 0.1, 0.5, 0.2, 0.3],
                                 [0.3, 0.5, 0.7, 0.2, 0.1, 0.5]])
>>> evaluator.validate_add_batch(batch_y_true, batch_y_pred)
>>> evaluator.validate_epoch_res()
0.816944420337677
>>> batch_y_true = torch.tensor([[0, 1, 0, 1, 0, 0],
                                 [0, 0, 1, 1, 0, 0]])
>>> batch_y_pred = torch.tensor([[0.7, 0.9, 0.1, 0.1, 0.2, 0.0],
                                 [0.1, 0.2, 0.5, 0.3, 0.6, 0.0]])
>>> evaluator.test_add_batch(batch_y_true, batch_y_pred)
>>> batch_y_true = torch.tensor([[0, 1, 0, 1, 1, 0],
                                 [0, 0, 1, 0, 1, 1]])
>>> batch_y_pred = torch.tensor([[0.3, 0.2, 0.1, 0.5, 0.2, 0.3],
                                 [0.3, 0.5, 0.7, 0.2, 0.1, 0.5]])
>>> evaluator.test_add_batch(batch_y_true, batch_y_pred)
>>> evaluator.test_epoch_res()
{
    'precision -> k@20': 0.4166666716337204,
    'recall -> k@20': 1.0,
    'ndcg -> k@20': 0.816944420337677
}
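The epoch results above can be approximated by hand with a short plain-Python sketch. It rests on several assumptions that are not stated in this tutorial and may differ from DHG's internals: k is capped at the number of items (6 here, so k = 20 ranks every item), precision divides by the number of ranked items, relevance is binary, ties in the scores keep the lower item index first, and each metric is averaged over users. The names `rank_items` and `user_metrics` are hypothetical.

```python
import math
from typing import List, Tuple


def rank_items(scores: List[float]) -> List[int]:
    """Item indices by descending score; ties keep the lower index first."""
    # sorted() is stable, so equal scores stay in index order.
    return sorted(range(len(scores)), key=lambda i: -scores[i])


def user_metrics(
    y_true: List[int], scores: List[float], k: int
) -> Tuple[float, float, float]:
    """Precision@k, recall@k, and NDCG@k for one user with binary relevance."""
    top = rank_items(scores)[:k]  # holds at most len(scores) items
    hits = [y_true[i] for i in top]
    n_relevant = sum(y_true)  # assumed to be non-zero for every user
    precision = sum(hits) / len(top)
    recall = sum(hits) / n_relevant
    # Discounted gain: a hit at 0-based position pos contributes 1 / log2(pos + 2).
    dcg = sum(h / math.log2(pos + 2) for pos, h in enumerate(hits))
    # Ideal ordering places every relevant item at the top of the ranking.
    idcg = sum(1 / math.log2(pos + 2) for pos in range(min(n_relevant, len(top))))
    return precision, recall, dcg / idcg


# The two batches from the example, concatenated into 4 users x 6 items.
y_true = [
    [0, 1, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 1, 0, 1, 1],
]
y_pred = [
    [0.7, 0.9, 0.1, 0.1, 0.2, 0.0],
    [0.1, 0.2, 0.5, 0.3, 0.6, 0.0],
    [0.3, 0.2, 0.1, 0.5, 0.2, 0.3],
    [0.3, 0.5, 0.7, 0.2, 0.1, 0.5],
]

per_user = [user_metrics(t, s, k=20) for t, s in zip(y_true, y_pred)]
# Average each metric over the users.
precision, recall, ndcg = (sum(col) / len(col) for col in zip(*per_user))
```

Under these assumptions the averages come out at roughly 0.4167, 1.0, and 0.8169. Recall is 1.0 because k exceeds the number of items, so every relevant item is retrieved; a smaller k would give a more informative ranking test.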