News20

class dhg.data.News20(data_root=None)[source]

Bases: dhg.data.base.BaseData

The 20 Newsgroups dataset is a newspaper network dataset for the vertex classification task. The vertex features are the TF-IDF representations of news messages. For more details, see the paper YOU ARE ALLSET: A MULTISET LEARNING FRAMEWORK FOR HYPERGRAPH NEURAL NETWORKS.

The content of the 20 Newsgroups dataset includes the following:

  • num_classes: The number of classes: \(4\).

  • num_vertices: The number of vertices: \(16,342\).

  • num_edges: The number of edges: \(100\).

  • dim_features: The dimension of features: \(100\).

  • features: The vertex feature matrix. torch.Tensor with size \((16,342 \times 100)\).

  • edge_list: The edge list. List with length \(100\).

  • labels: The label list. torch.LongTensor with size \((16,342, )\).
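The fields listed above should agree with one another: features has one row per vertex and dim_features columns, labels has one entry per vertex, and edge_list has one entry per edge. The following is a minimal sketch of such a consistency check. The helper name summarize and the toy mapping are hypothetical and not part of DHG. The sketch assumes the dataset object supports key lookup such as data["features"], and that each entry of edge_list is a sequence of vertex indices.

```python
def summarize(data):
    """Check that the documented fields agree and return a short summary.

    `data` is any mapping-like object exposing the keys listed above
    (assumption: a News20 instance supports data["key"] lookups).
    """
    features = data["features"]
    labels = data["labels"]
    edges = data["edge_list"]

    # One feature row and one label per vertex.
    if len(features) != data["num_vertices"]:
        raise ValueError("features row count does not match num_vertices")
    if len(labels) != data["num_vertices"]:
        raise ValueError("labels length does not match num_vertices")
    # Each feature row has dim_features entries.
    if len(features[0]) != data["dim_features"]:
        raise ValueError("feature width does not match dim_features")
    # One entry in edge_list per edge.
    if len(edges) != data["num_edges"]:
        raise ValueError("edge_list length does not match num_edges")

    sizes = [len(e) for e in edges]
    return {
        "num_vertices": data["num_vertices"],
        "num_edges": data["num_edges"],
        "min_edge_size": min(sizes),
        "max_edge_size": max(sizes),
    }


# Toy stand-in with the same keys (hypothetical data, not the real dataset).
toy = {
    "num_classes": 2,
    "num_vertices": 4,
    "num_edges": 2,
    "dim_features": 3,
    "features": [[0.0, 0.1, 0.2]] * 4,
    "edge_list": [(0, 1, 2), (2, 3)],
    "labels": [0, 1, 0, 1],
}
summary = summarize(toy)

# With DHG installed, the same check could be applied to the real dataset:
# from dhg.data import News20
# summary = summarize(News20())
```

Because the helper relies only on len() and indexing, it should work for plain lists as well as for torch.Tensor fields.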

Parameters

data_root (str, optional) – The root directory in which the data is stored. If set to None, the dataset is downloaded automatically from the server and saved to the default directory ~/.dhg/datasets/. Defaults to None.
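The fallback behaviour of data_root can be pictured with the short sketch below. The function resolve_data_root is hypothetical and only illustrates the rule stated above. It is not DHG's internal implementation, and it does not download anything.

```python
from pathlib import Path


def resolve_data_root(data_root=None):
    """Return the directory a dataset would be read from or saved to.

    Hypothetical helper: mirrors the documented rule that None falls
    back to ~/.dhg/datasets/.
    """
    if data_root is None:
        return Path.home() / ".dhg" / "datasets"
    return Path(data_root).expanduser()


default_root = resolve_data_root()
custom_root = resolve_data_root("/tmp/my_datasets")

# With DHG installed, a custom location would be passed to the class itself:
# from dhg.data import News20
# data = News20(data_root="/tmp/my_datasets")
```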