PubmedBiGraph

class dhg.data.PubmedBiGraph(data_root=None)[source]

Bases: dhg.data.base.BaseData

The PubmedBiGraph dataset is a citation network dataset for the vertex classification task. It is a synthetic bipartite graph dataset generated from a citation network (a single graph) in which documents and the citation links between them are treated as nodes and undirected edges, respectively. For more details, see the Cascade-BGNN: Toward Efficient Self-supervised Representation Learning on Large-scale Bipartite Graphs paper.

The content of the PubmedBiGraph dataset includes the following:

  • num_u_classes: The number of classes in set \(U\) : \(3\).

  • num_u_vertices: The number of vertices in set \(U\) : \(13,424\).

  • num_v_vertices: The number of vertices in set \(V\) : \(3,435\).

  • num_edges: The number of edges: \(18,782\).

  • dim_u_features: The dimension of features in set \(U\) : \(400\).

  • dim_v_features: The dimension of features in set \(V\) : \(500\).

  • u_features: The vertex feature matrix in set \(U\). torch.Tensor with size \((13,424 \times 400)\).

  • v_features: The vertex feature matrix in set \(V\) . torch.Tensor with size \((3,435 \times 500)\).

  • edge_list: The edge list. List with length \((18,782 \times 2)\).

  • u_labels: The label list in set \(U\) . torch.LongTensor with size \((13,424, )\).
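
As a minimal sketch of how the fields above fit together, the snippet below checks that a dataset object is internally consistent. It assumes the dataset supports key lookup such as data["u_labels"], and that each edge is a (u, v) pair whose first index refers to set \(U\) and whose second refers to set \(V\). The helper names load_pubmed_bigraph and check_bigraph_content are hypothetical, and the toy dictionary only stands in for the real data so the check can run without a download.

```python
def load_pubmed_bigraph(data_root=None):
    """Load the real dataset (needs the dhg package; may download on first use)."""
    from dhg.data import PubmedBiGraph

    return PubmedBiGraph(data_root=data_root)


def check_bigraph_content(data):
    """Check that the documented fields of a bipartite dataset agree with each other."""
    num_u = data["num_u_vertices"]
    num_v = data["num_v_vertices"]
    edges = data["edge_list"]

    # One label and one feature row per vertex in U; one feature row per vertex in V.
    assert len(data["u_labels"]) == num_u
    assert len(data["u_features"]) == num_u
    assert len(data["v_features"]) == num_v

    # Assumption: every edge is (u, v) with u indexing set U and v indexing set V.
    for u, v in edges:
        assert 0 <= u < num_u and 0 <= v < num_v

    # Labels must fall inside the declared number of classes.
    assert all(0 <= int(y) < data["num_u_classes"] for y in data["u_labels"])

    return {"num_u": num_u, "num_v": num_v, "num_edges": len(edges)}


# Toy stand-in with the same keys (values are made up, not taken from the dataset).
toy = {
    "num_u_classes": 2,
    "num_u_vertices": 3,
    "num_v_vertices": 2,
    "u_features": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    "v_features": [[1.0], [2.0]],
    "edge_list": [(0, 0), (1, 0), (2, 1)],
    "u_labels": [0, 1, 1],
}
summary = check_bigraph_content(toy)
```

With the dhg package installed, the same check could be applied to the real data via check_bigraph_content(load_pubmed_bigraph()).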

Parameters

data_root (str, optional) – The directory in which the data is stored. If set to None, the dataset is downloaded automatically from the server and saved to the default directory ~/.dhg/datasets/. Defaults to None.
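
The fallback behaviour of data_root can be pictured with the following sketch. It is not DHG's actual implementation; resolve_data_root is a hypothetical helper that only mirrors the rule stated above (use the given directory, otherwise ~/.dhg/datasets/), and the path /tmp/dhg_data is an arbitrary example.

```python
from pathlib import Path


def resolve_data_root(data_root=None):
    """Return the given directory, or the documented default when data_root is None."""
    if data_root is None:
        # Documented default location: ~/.dhg/datasets/
        return Path.home() / ".dhg" / "datasets"
    return Path(data_root).expanduser()


default_root = resolve_data_root()
custom_root = resolve_data_root("/tmp/dhg_data")
```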