dhg.data

Base Class

class dhg.data.BaseData(name, data_root=None)[source]

The base class of all datasets. Each dataset describes its items in self._content, for example:

self._content = {
    'item': {
        'upon': [
            {'filename': 'part1.pkl', 'md5': 'xxxxx',},
            {'filename': 'part2.pkl', 'md5': 'xxxxx',},
        ],
        'loader': loader_function,
        'preprocess': [datapipe1, datapipe2],
    },
    ...
}
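The structure above can be read as a small pipeline: the files listed under 'upon' are handed to 'loader', and the loaded value is then passed through each function in 'preprocess' in order. The following is a minimal sketch of that reading, not the library's implementation. The names resolve_item, load_lines, drop_blank and to_upper are hypothetical, and it is an assumption that the loader receives a list of file paths.

```python
import os
import tempfile
from typing import Any, Dict, List


def resolve_item(spec: Dict[str, Any], data_root: str) -> Any:
    """Load one item described by a `_content`-style spec (illustrative only)."""
    # Assumption: the loader takes the paths of the files named under 'upon'.
    paths = [os.path.join(data_root, f["filename"]) for f in spec["upon"]]
    value = spec["loader"](paths)
    # Assumption: preprocess functions are applied in the order they are listed.
    for fn in spec.get("preprocess", []):
        value = fn(value)
    return value


def load_lines(paths: List[str]) -> List[str]:
    """Hypothetical loader: concatenate the lines of all files."""
    lines: List[str] = []
    for path in paths:
        with open(path, "r", encoding="utf-8") as fh:
            lines.extend(fh.read().splitlines())
    return lines


def drop_blank(lines: List[str]) -> List[str]:
    """Hypothetical preprocess step: remove empty lines."""
    return [line for line in lines if line.strip()]


def to_upper(lines: List[str]) -> List[str]:
    """Hypothetical preprocess step: upper-case every line."""
    return [line.upper() for line in lines]


# Build two small files so the sketch runs without any downloads.
demo_root = tempfile.mkdtemp()
for name, text in [("part1.txt", "a\n\nb\n"), ("part2.txt", "c\n")]:
    with open(os.path.join(demo_root, name), "w", encoding="utf-8") as fh:
        fh.write(text)

demo_spec = {
    "upon": [
        {"filename": "part1.txt", "md5": "xxxxx"},
        {"filename": "part2.txt", "md5": "xxxxx"},
    ],
    "loader": load_lines,
    "preprocess": [drop_blank, to_upper],
}

demo_result = resolve_item(demo_spec, demo_root)
print(demo_result)
```

The md5 values are left as placeholders here because this sketch does not verify the files.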
property content

Return the content of the dataset.

fetch_files(files)[source]

Download the files if they do not already exist, and check them against their MD5 checksums.

Parameters

files (List[Dict[str, str]]) – The files to download. Each element in the list is a dict with at least two keys: filename and md5. If the extra key bk_url is provided, it is used to download the file from the backup URL.
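The check implied by the filename and md5 keys can be sketched with the standard library alone. This is an illustration under stated assumptions, not the behavior of fetch_files itself: md5_of_file and files_to_fetch are hypothetical helpers, the download step (including any use of bk_url) is left out, and a file is treated as pending when it is missing or its checksum differs.

```python
import hashlib
import os
import tempfile
from typing import Dict, List


def md5_of_file(path: str, chunk_size: int = 8192) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_fetch(files: List[Dict[str, str]], root: str) -> List[str]:
    """Return the filenames that are missing or fail the MD5 check."""
    pending = []
    for entry in files:
        path = os.path.join(root, entry["filename"])
        if not os.path.exists(path) or md5_of_file(path) != entry["md5"]:
            pending.append(entry["filename"])
    return pending


# Demo: one valid file, one corrupted file, one absent file.
demo_root = tempfile.mkdtemp()
with open(os.path.join(demo_root, "part1.pkl"), "wb") as fh:
    fh.write(b"hello")
with open(os.path.join(demo_root, "part2.pkl"), "wb") as fh:
    fh.write(b"corrupted")

demo_files = [
    {"filename": "part1.pkl", "md5": hashlib.md5(b"hello").hexdigest()},
    {"filename": "part2.pkl", "md5": hashlib.md5(b"expected").hexdigest()},
    {"filename": "part3.pkl", "md5": hashlib.md5(b"absent").hexdigest()},
]

pending = files_to_fetch(demo_files, demo_root)
print(pending)
```

Reading in chunks keeps memory use flat for large dataset files.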

needs_to_load(item_name)[source]

Return whether the item named item_name needs to be loaded from files.

Parameters

item_name (str) – The name of the item in the dataset.
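One plausible reading of this check, given the _content layout of the base class, is that an item needs loading when its entry is a dict that names source files and a loader, while plain values can be returned directly. The sketch below encodes that assumption in a standalone function; it is not the library's code, and the demo keys are hypothetical.

```python
from typing import Any, Dict


def needs_to_load(content: Dict[str, Any], item_name: str) -> bool:
    """Illustrative check: does this entry describe files plus a loader?"""
    entry = content[item_name]
    # Assumption: loadable items are dicts carrying both 'upon' and 'loader'.
    return isinstance(entry, dict) and "upon" in entry and "loader" in entry


demo_content = {
    # A plain value that can be returned as-is.
    "num_classes": 7,
    # An item backed by a file and a (hypothetical) loader function.
    "features": {
        "upon": [{"filename": "features.pkl", "md5": "xxxxx"}],
        "loader": lambda paths: None,
    },
}

print(needs_to_load(demo_content, "num_classes"))
print(needs_to_load(demo_content, "features"))
```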

raw(key)[source]

Return the item key of the dataset in its raw form, without any preprocessing applied.

Graph Datasets

dhg.data.Cora

The Cora dataset is a citation network dataset for the vertex classification task.

dhg.data.Pubmed

The PubMed dataset is a citation network dataset for the vertex classification task.

dhg.data.Citeseer

The Citeseer dataset is a citation network dataset for the vertex classification task.

dhg.data.BlogCatalog

The BlogCatalog dataset is a social network dataset for the vertex classification task.

dhg.data.Flickr

The Flickr dataset is a social network dataset for the vertex classification task.

dhg.data.Github

The Github dataset is a collaboration network dataset for the vertex classification task.

dhg.data.Facebook

The Facebook dataset is a social network dataset for the vertex classification task.

Bipartite Graph Datasets

dhg.data.MovieLens1M

The MovieLens1M dataset is collected for the user-item recommendation task.

dhg.data.AmazonBook

The AmazonBook dataset is collected for the user-item recommendation task.

dhg.data.Yelp2018

The Yelp2018 dataset is collected for the user-item recommendation task.

dhg.data.Gowalla

The Gowalla dataset is collected for the user-item recommendation task.

dhg.data.TencentBiGraph

The TencentBiGraph dataset is a social network dataset for the vertex classification task.

dhg.data.CoraBiGraph

The CoraBiGraph dataset is a citation network dataset for the vertex classification task.

dhg.data.PubmedBiGraph

The PubmedBiGraph dataset is a citation network dataset for the vertex classification task.

dhg.data.CiteseerBiGraph

The CiteseerBiGraph dataset is a citation network dataset for the vertex classification task.

Hypergraph Datasets

dhg.data.Cooking200

The Cooking 200 dataset is collected from Yummly.com for the vertex classification task.

dhg.data.CoauthorshipCora

The Co-authorship Cora dataset is a citation network dataset for the vertex classification task.

dhg.data.CoauthorshipDBLP

The Co-authorship DBLP dataset is a citation network dataset for the vertex classification task.

dhg.data.CocitationCora

The Co-citation Cora dataset is a citation network dataset for the vertex classification task.

dhg.data.CocitationCiteseer

The Co-citation Citeseer dataset is a citation network dataset for the vertex classification task.

dhg.data.CocitationPubmed

The Co-citation PubMed dataset is a citation network dataset for the vertex classification task.

dhg.data.YelpRestaurant

The Yelp-Restaurant dataset is a restaurant-review network dataset for the vertex classification task.

dhg.data.WalmartTrips

The Walmart Trips dataset is a user-product network dataset for the vertex classification task.

dhg.data.HouseCommittees

The House Committees dataset is a committee network dataset for the vertex classification task.

dhg.data.News20

The 20 Newsgroups dataset is a newspaper network dataset for the vertex classification task.

dhg.data.DBLP4k

The DBLP-4k dataset is a citation network dataset for the node classification task.

dhg.data.DBLP8k

The DBLP-8k dataset is a citation network dataset for the link prediction task.

dhg.data.IMDB4k

The IMDB-4k dataset is a movie dataset for the node classification task.

dhg.data.Recipe100k

The Recipe100k dataset is a recipe-ingredient network dataset for the vertex classification task.

dhg.data.Recipe200k

The Recipe200k dataset is a recipe-ingredient network dataset for the vertex classification task.

dhg.data.Yelp3k

The Yelp3k dataset is a subset of the Yelp-Restaurant dataset for the vertex classification task.

dhg.data.Tencent2k

The Tencent2k dataset is a social network dataset for the vertex classification task.

Contributions of new datasets are welcome!