TencentBiGraph
- class dhg.data.TencentBiGraph(data_root=None)
Bases: dhg.data.base.BaseData
The TencentBiGraph dataset is a social network dataset for the vertex classification task. It is a large-scale real-world social network represented as a bipartite graph. Nodes in set \(U\) are social network users, and nodes in set \(V\) are social communities (e.g., a subset of social network users who share the same interest in electrical products may join the same shopping community). Both users and communities are described by dense off-the-shelf feature vectors. An edge between the two sets indicates that the user belongs to the community. Note that this dataset provides classification labels for research purposes; in real-world applications, labeling every node is impractical. For more details, see the paper Cascade-BGNN: Toward Efficient Self-supervised Representation Learning on Large-scale Bipartite Graphs.
The content of the TencentBiGraph dataset includes the following:
- num_u_classes: The number of classes in set \(U\): \(2\).
- num_u_vertices: The number of vertices in set \(U\): \(619,030\).
- num_v_vertices: The number of vertices in set \(V\): \(90,044\).
- num_edges: The number of edges: \(144,501\).
- dim_u_features: The dimension of features in set \(U\): \(8\).
- dim_v_features: The dimension of features in set \(V\): \(16\).
- u_features: The vertex feature matrix in set \(U\). torch.Tensor with size \((619,030 \times 8)\).
- v_features: The vertex feature matrix in set \(V\). torch.Tensor with size \((90,044 \times 16)\).
- edge_list: The edge list. List with length \((991,713 \times 2)\).
- u_labels: The label list in set \(U\). torch.LongTensor with size \((619,030, )\).
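The fields listed above are tied together by their sizes: the feature matrices and the label list are indexed by vertex, and every edge joins one vertex in \(U\) to one vertex in \(V\). The following is a minimal sketch of those relationships on a toy stand-in, not on the real dataset. The dictionary `toy`, its values, and the helpers `check_bipartite` and `communities_per_user` are hypothetical names introduced for illustration, and the (u index, v index) ordering of each edge is an assumption.

```python
from collections import Counter

# Toy stand-in that mirrors the documented field names.
# Sizes and values are made up; plain lists replace the torch tensors.
toy = {
    "num_u_vertices": 4,
    "num_v_vertices": 3,
    "dim_u_features": 2,
    "dim_v_features": 3,
    "u_features": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]],
    "v_features": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    # Assumption: each edge is stored as (u index, v index).
    "edge_list": [(0, 0), (1, 0), (1, 2), (3, 1)],
    "u_labels": [0, 1, 1, 0],
}


def check_bipartite(data):
    """Return True if features, labels, and edges match the declared sizes."""
    n_u, n_v = data["num_u_vertices"], data["num_v_vertices"]
    features_ok = (
        len(data["u_features"]) == n_u
        and len(data["v_features"]) == n_v
        and all(len(row) == data["dim_u_features"] for row in data["u_features"])
        and all(len(row) == data["dim_v_features"] for row in data["v_features"])
    )
    labels_ok = len(data["u_labels"]) == n_u
    # Every edge must point at a valid user and a valid community.
    edges_ok = all(0 <= u < n_u and 0 <= v < n_v for u, v in data["edge_list"])
    return features_ok and labels_ok and edges_ok


def communities_per_user(data):
    """Count how many communities each user in U belongs to."""
    counts = Counter(u for u, _ in data["edge_list"])
    return [counts.get(u, 0) for u in range(data["num_u_vertices"])]


print(check_bipartite(toy))
print(communities_per_user(toy))
```

The same checks should carry over to the real fields once tensor shapes are read with `.shape` instead of `len`, but that has not been verified here.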
- Parameters
data_root (str, optional) – The directory where the data is stored. If set to None, the dataset is downloaded automatically from the server and saved to the default directory ~/.dhg/datasets/. Defaults to None.
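The fallback behavior of data_root can be pictured with a small sketch. It only illustrates the rule described above (use the given directory, otherwise the default one) and is not the library's implementation; `resolve_data_root` and `DEFAULT_ROOT` are hypothetical names, and the download step is omitted.

```python
from pathlib import Path

# Default location stated in the parameter description.
DEFAULT_ROOT = "~/.dhg/datasets/"


def resolve_data_root(data_root=None):
    """Return the directory to use: the given one, or the default if None."""
    root = DEFAULT_ROOT if data_root is None else data_root
    # expanduser() turns the leading "~" into the current user's home directory.
    return Path(root).expanduser()


print(resolve_data_root())
print(resolve_data_root("/tmp/my_datasets"))
```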