TencentBiGraph

class dhg.data.TencentBiGraph(data_root=None)

Bases: dhg.data.base.BaseData

The TencentBiGraph dataset is a social network dataset for the vertex classification task. It is a large-scale, real-world social network represented as a bipartite graph. Vertices in set \(U\) are social network users, and vertices in set \(V\) are social communities (e.g., users who share an interest in electrical products may join the same shopping community). Both users and communities are described by dense, off-the-shelf feature vectors. An edge between the two sets indicates that the user belongs to the community. Note that this dataset provides classification labels for research purposes; in real-world applications, labeling every vertex is impractical. For more details, see the paper Cascade-BGNN: Toward Efficient Self-supervised Representation Learning on Large-scale Bipartite Graphs.
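
Because every edge joins a user in \(U\) to a community in \(V\), the edge list alone determines both which communities each user joined and which users each community contains. The sketch below illustrates this on a hypothetical toy graph with four users and two communities, using plain Python lists rather than DHG; the `memberships` helper and the toy values are illustrative assumptions, not part of the dataset or the library.

```python
# Hypothetical toy data: 4 users (set U) and 2 communities (set V).
# Each edge (u, v) means "user u belongs to community v".
num_u_vertices = 4
num_v_vertices = 2
edge_list = [(0, 0), (1, 0), (1, 1), (3, 1)]


def memberships(num_u, num_v, edges):
    """Return per-user community lists and per-community member lists."""
    user_to_comms = [[] for _ in range(num_u)]
    comm_to_users = [[] for _ in range(num_v)]
    for u, v in edges:
        user_to_comms[u].append(v)  # communities user u joined
        comm_to_users[v].append(u)  # users that belong to community v
    return user_to_comms, comm_to_users


user_to_comms, comm_to_users = memberships(num_u_vertices, num_v_vertices, edge_list)
print(user_to_comms)  # [[0], [0, 1], [], [1]]
print(comm_to_users)  # [[0, 1], [1, 3]]
```

User 2 has no edges, so it belongs to no community; such isolated users still have features and a label in a dataset like this one.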

The content of the TencentBiGraph dataset includes the following:

  • num_u_classes: The number of classes in set \(U\): \(2\).

  • num_u_vertices: The number of vertices in set \(U\): \(619,030\).

  • num_v_vertices: The number of vertices in set \(V\): \(90,044\).

  • num_edges: The number of edges: \(144,501\).

  • dim_u_features: The dimension of features in set \(U\): \(8\).

  • dim_v_features: The dimension of features in set \(V\): \(16\).

  • u_features: The vertex feature matrix in set \(U\). torch.Tensor with size \((619,030 \times 8)\).

  • v_features: The vertex feature matrix in set \(V\). torch.Tensor with size \((90,044 \times 16)\).

  • edge_list: The edge list, given as a list of \(991,713\) vertex pairs, each linking a vertex in \(U\) to a vertex in \(V\), i.e., with shape \((991,713 \times 2)\).

  • u_labels: The label list in set \(U\). torch.LongTensor with size \((619,030, )\).
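
The entries above are interdependent: the feature matrices, the label list, and the edge list are all indexed by vertex ids in \(U\) and \(V\). A minimal consistency check, assuming the content is available as a plain dict with the listed keys (nested lists standing in for torch.Tensor), might look like this; `check_content`, `_has_shape`, and the toy values are hypothetical and not part of DHG.

```python
def _has_shape(rows, n_rows, n_cols):
    """True if `rows` is a list of n_rows lists, each of length n_cols."""
    return len(rows) == n_rows and all(len(r) == n_cols for r in rows)


def check_content(content):
    """Return a list of problems found in a TencentBiGraph-style content dict."""
    num_u = content["num_u_vertices"]
    num_v = content["num_v_vertices"]
    edges = content["edge_list"]
    problems = []
    # The declared edge count should match the number of (u, v) pairs.
    if len(edges) != content["num_edges"]:
        problems.append(
            f"num_edges is {content['num_edges']} but edge_list has {len(edges)} pairs"
        )
    # Every edge must link a valid vertex in U to a valid vertex in V.
    for u, v in edges:
        if not (0 <= u < num_u and 0 <= v < num_v):
            problems.append(f"edge ({u}, {v}) is out of range")
    # Feature matrices must be (num_vertices x dim_features) for each set.
    if not _has_shape(content["u_features"], num_u, content["dim_u_features"]):
        problems.append("u_features is not num_u_vertices x dim_u_features")
    if not _has_shape(content["v_features"], num_v, content["dim_v_features"]):
        problems.append("v_features is not num_v_vertices x dim_v_features")
    # Only set U is labeled: one class id in [0, num_u_classes) per user.
    labels = content["u_labels"]
    if len(labels) != num_u or not all(0 <= y < content["num_u_classes"] for y in labels):
        problems.append("u_labels needs one label in [0, num_u_classes) per vertex in U")
    return problems


# Hypothetical toy content with the same keys and feature dimensions.
toy = {
    "num_u_classes": 2,
    "num_u_vertices": 3,
    "num_v_vertices": 2,
    "num_edges": 3,
    "dim_u_features": 8,
    "dim_v_features": 16,
    "u_features": [[0.0] * 8 for _ in range(3)],
    "v_features": [[0.0] * 16 for _ in range(2)],
    "edge_list": [(0, 0), (1, 0), (2, 1)],
    "u_labels": [0, 1, 1],
}
print(check_content(toy))  # []
```

On the real data the same checks would be applied to tensor shapes (e.g., `u_features.shape`) rather than nested lists.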

Parameters

data_root (str, optional) – The directory where the data is stored. If set to None, the data is automatically downloaded from the server and saved to the default directory ~/.dhg/datasets/. Defaults to None.
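
The data_root rule amounts to a simple path-resolution step. The following is a hypothetical sketch of the documented behavior only: `resolve_data_root` and `DEFAULT_DATA_ROOT` are illustrative names, the helper performs no download, and it is not DHG's internal code.

```python
import os

# Default location documented for data_root=None.
DEFAULT_DATA_ROOT = os.path.join("~", ".dhg", "datasets")


def resolve_data_root(data_root=None):
    """Hypothetical helper: map data_root to an absolute directory path.

    None -> the documented default ~/.dhg/datasets/ directory;
    any other string -> used as given, with "~" expanded.
    """
    if data_root is None:
        data_root = DEFAULT_DATA_ROOT
    return os.path.abspath(os.path.expanduser(data_root))


print(resolve_data_root())                 # e.g. /home/<user>/.dhg/datasets
print(resolve_data_root("/tmp/dhg_data"))  # /tmp/dhg_data
```

With the real class, passing a directory such as TencentBiGraph(data_root="/tmp/dhg_data") points DHG at data already stored there instead of the default location.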