YelpRestaurant

class dhg.data.YelpRestaurant(data_root=None)

Bases: dhg.data.base.BaseData

The Yelp-Restaurant dataset is a restaurant-review network dataset for the vertex classification task. All businesses in the “restaurant” catalog are selected as nodes, and each hyperedge groups the restaurants visited by the same user. The node label is the number of stars in the average review of the restaurant, ranging from 1 to 5 stars in steps of 0.5 stars. The node features are formed from the latitude, the longitude, a one-hot encoding of the city and state, and a bag-of-words encoding of the top-1000 words in the restaurant names. For more details, see the paper You are AllSet: A Multiset Function Framework for Hypergraph Neural Networks.
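The hyperedge construction described above (one hyperedge per user, containing the restaurants that user visited) can be sketched in a few lines. This is only an illustration on toy data, not the preprocessing code used to build the dataset: the function name build_hyperedges, the (user_id, restaurant_index) input format, and the choice to keep identical hyperedges from different users are all assumptions.

```python
from collections import defaultdict


def build_hyperedges(visits):
    """Group restaurant indices by user, giving one hyperedge per user.

    `visits` is an iterable of (user_id, restaurant_index) pairs
    (a hypothetical input format chosen for this sketch).
    """
    by_user = defaultdict(set)
    for user, restaurant in visits:
        # A set ignores repeated visits to the same restaurant.
        by_user[user].add(restaurant)
    # Sort users and vertices so the result is deterministic.
    return [tuple(sorted(by_user[user])) for user in sorted(by_user)]


# Toy visit log: three users, four restaurants (indices 0..3).
toy_visits = [
    ("u1", 0), ("u1", 2),
    ("u2", 1), ("u2", 2), ("u2", 3),
    ("u3", 0), ("u3", 2),
    ("u1", 2),  # repeated visit, does not change the hyperedge
]

edge_list = build_hyperedges(toy_visits)
print(edge_list)
```

Each tuple in the result is one hyperedge over vertex indices, which matches the list-of-vertex-groups shape that the edge_list entry below describes.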

The content of the Yelp-Restaurant dataset includes the following:

  • num_classes: The number of classes: \(11\).

  • num_vertices: The number of vertices: \(50,758\).

  • num_edges: The number of edges: \(679,302\).

  • dim_features: The dimension of features: \(1,862\).

  • features: The vertex feature matrix. torch.Tensor with size \((50,758 \times 1,862)\).

  • edge_list: The edge list. List with length \(679,302\).

  • labels: The label list. torch.LongTensor with size \((50,758, )\).

  • state: The state list. torch.LongTensor with size \((50,758, )\).

  • city: The city list. torch.LongTensor with size \((50,758, )\).
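As a rough sanity check, the sizes listed above should agree with one another: the feature matrix has one row per vertex, the edge list has one entry per hyperedge, and every vertex index in a hyperedge is below num_vertices. The sketch below checks this for any mapping-like object exposing those keys. The helper name check_content and the toy stand-in dictionary are hypothetical; the stand-in uses plain lists so the example runs without DHG or PyTorch installed.

```python
def check_content(data):
    """Return a list of inconsistencies found in a dataset-like mapping."""
    problems = []
    n = data["num_vertices"]

    # One feature row per vertex, each of width dim_features.
    if len(data["features"]) != n:
        problems.append("features: row count differs from num_vertices")
    elif n > 0 and len(data["features"][0]) != data["dim_features"]:
        problems.append("features: row width differs from dim_features")

    # One entry per hyperedge.
    if len(data["edge_list"]) != data["num_edges"]:
        problems.append("edge_list: length differs from num_edges")

    # Per-vertex lists must have one entry per vertex.
    for key in ("labels", "state", "city"):
        if len(data[key]) != n:
            problems.append(key + ": length differs from num_vertices")

    # Every vertex index inside a hyperedge must be valid.
    if any(v < 0 or v >= n for edge in data["edge_list"] for v in edge):
        problems.append("edge_list: vertex index out of range")

    return problems


# Toy stand-in with the same keys as the dataset content (not real data).
toy = {
    "num_classes": 2,
    "num_vertices": 3,
    "num_edges": 2,
    "dim_features": 2,
    "features": [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]],
    "edge_list": [(0, 1), (1, 2)],
    "labels": [0, 1, 0],
    "state": [0, 0, 1],
    "city": [0, 1, 2],
}

problems = check_content(toy)
print(problems if problems else "content is consistent")
```

With DHG installed, the same helper could presumably be called as check_content(YelpRestaurant()), assuming the dataset object supports key lookup such as data["features"] for the items listed above.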

Parameters

data_root (str, optional) – The root directory in which the data is stored. If set to None, the dataset is automatically downloaded from the server and saved to the default directory ~/.dhg/datasets/. Defaults to None.