dhg.data

Base Class

class dhg.data.BaseData(name, data_root=None)[source]

The Base Class of all datasets.

self._content = {
    'item': {
        'upon': [
            {'filename': 'part1.pkl', 'md5': '', 'bk_url': None},
            {'filename': 'part2.pkl', 'md5': '', 'bk_url': None},
        ],
        'loader': loader_function,
        'preprocess': [datapipe1, datapipe2],
    },
    ...
}
property content

Return the content of the dataset.

fetch_files(files)[source]

Download the files if they do not exist locally, then check them.

Parameters

files (List[Dict[str, str]]) – The files to download. Each element in the list is a dict with at least two keys: filename and md5. If the extra key bk_url is provided, the file is downloaded from that backup URL.
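The check described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the helpers file_md5 and files_to_download are hypothetical names introduced here, the download step itself is omitted, and the sketch assumes every entry carries a non-empty md5 value to compare against.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Optional


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_download(
    files: List[Dict[str, Optional[str]]], data_root: Path
) -> List[str]:
    """Hypothetical helper: names of files that are missing or fail the MD5 check."""
    pending = []
    for entry in files:
        path = data_root / entry["filename"]
        # A file is fetched again if it is absent or its digest does not match.
        if not path.exists() or file_md5(path) != entry["md5"]:
            pending.append(entry["filename"])
    return pending
```

A caller would pass the same list of dicts that fetch_files accepts, plus the directory where the files are stored, and download only the names that come back.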

needs_to_load(item_name)[source]

Return whether the item named item_name needs to be loaded.

Parameters

item_name (str) – The name of the item in the dataset.

raw(key)[source]

Return the item stored under key in its raw form, without preprocessing.
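The relationship between the 'upon', 'loader', and 'preprocess' entries of self._content and the raw form of an item can be sketched as follows. This is an assumption about how the pieces fit together, not the library's code: load_item is a hypothetical helper, and the toy loader works on filenames rather than reading real files.

```python
from typing import Any, Dict


def load_item(content: Dict[str, Dict[str, Any]], name: str, raw: bool = False) -> Any:
    """Hypothetical helper: load one item, optionally skipping preprocessing."""
    spec = content[name]
    paths = [f["filename"] for f in spec["upon"]]
    value = spec["loader"](paths)
    if raw:
        # The raw form is the loader output with no preprocessing applied.
        return value
    # Otherwise apply each preprocessing step in the listed order.
    for step in spec.get("preprocess", []):
        value = step(value)
    return value


# Toy content: the loader measures filename lengths instead of reading files.
content = {
    "lengths": {
        "upon": [{"filename": "part1.pkl", "md5": "", "bk_url": None}],
        "loader": lambda paths: [len(p) for p in paths],
        "preprocess": [lambda xs: [x * 2 for x in xs], sum],
    }
}

raw_value = load_item(content, "lengths", raw=True)  # [9]
processed_value = load_item(content, "lengths")      # 18
```

Keeping the preprocessing steps as an ordered list of callables makes it straightforward to return either form of an item from the same specification.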

Vertex Classification Datasets

dhg.data.Cora

The Cora dataset is a citation network dataset for the vertex classification task.

dhg.data.Pubmed

The PubMed dataset is a citation network dataset for the vertex classification task.

dhg.data.Citeseer

The Citeseer dataset is a citation network dataset for the vertex classification task.

dhg.data.Cooking200

The Cooking 200 dataset is collected from Yummly.com for the vertex classification task.

User-Item Recommender Datasets

dhg.data.MovieLens1M

The MovieLens1M dataset is collected for the user-item recommendation task.

dhg.data.AmazonBook

The AmazonBook dataset is collected for the user-item recommendation task.

dhg.data.Yelp2018

The Yelp2018 dataset is collected for the user-item recommendation task.

dhg.data.Gowalla

The Gowalla dataset is collected for the user-item recommendation task.

Contributions of new datasets are welcome!