dhg.utils
Structure Helpers
- dhg.utils.remap_edge_list(e_list, bipartite_graph=False, ret_map=False)[source]
Remap the vertex markers to numbers of an ordered and continuous range.
Note
This function can support both low-order structures and high-order structures.
- Parameters
e_list (List[tuple]) – Edge list of low-order structures or high-order structures.
bipartite_graph (bool) – Whether the structure is a bipartite graph. Defaults to False.
ret_map (bool) – Whether to return the dictionary mapping raw markers to new indices. Defaults to False.
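To illustrate what remapping to an ordered, continuous range means, here is a minimal pure-Python sketch. It does not call or reproduce dhg.utils.remap_edge_list. The helper name remap_markers is hypothetical, and assigning indices in sorted marker order starting from 0 is an assumption made for illustration.

```python
from typing import Dict, Hashable, List, Tuple


def remap_markers(e_list: List[tuple]) -> Tuple[List[tuple], Dict[Hashable, int]]:
    """Hypothetical helper: map raw vertex markers to 0..n-1 in sorted order."""
    # Collect every distinct marker that appears in any edge or hyperedge.
    markers = sorted({v for e in e_list for v in e})
    # Assign consecutive indices (assumption: sorted order, starting at 0).
    raw_to_new = {m: i for i, m in enumerate(markers)}
    remapped = [tuple(raw_to_new[v] for v in e) for e in e_list]
    return remapped, raw_to_new


# Works for pairwise edges and for hyperedges of any size.
edges, mapping = remap_markers([(10, 42), (42, 7, 99)])
print(edges)    # [(1, 2), (2, 0, 3)]
print(mapping)  # {7: 0, 10: 1, 42: 2, 99: 3}
```

The returned dictionary plays the role that ret_map=True is described as serving: it lets a caller translate raw markers to the new indices.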
- dhg.utils.remap_edge_lists(*e_lists, bipartite_graph=False, ret_map=False)[source]
Remap the vertex markers to numbers of an ordered and continuous range for given multiple edge lists.
Note
This function can support both low-order structures and high-order structures.
- Parameters
e_lists (List[List[tuple]]) – The edge lists of low-order structures or high-order structures.
bipartite_graph (bool) – Whether the structure is a bipartite graph. Defaults to False.
ret_map (bool) – Whether to return the dictionary mapping raw markers to new indices. Defaults to False.
- dhg.utils.remap_adj_list(adj_list, bipartite_graph=False, ret_map=False)[source]
Remap the vertex markers to numbers of an ordered and continuous range.
Note
This function can only support low-order structures like graph, directed graph, and bipartite graph.
- Parameters
adj_list (List[List[int]]) – Adjacency list of low-order structures.
bipartite_graph (bool) – Whether the structure is a bipartite graph. Defaults to False.
ret_map (bool) – Whether to return the dictionary mapping raw markers to new indices. Defaults to False.
- dhg.utils.remap_adj_lists(*adj_lists, bipartite_graph=False, ret_map=False)[source]
Remap the vertex markers to numbers of an ordered and continuous range for given multiple adjacency lists.
Note
This function can only support low-order structures like graph, directed graph, and bipartite graph.
- Parameters
adj_lists (List[List[List[int]]]) – The adjacency lists of low-order structures.
bipartite_graph (bool) – Whether the structure is a bipartite graph. Defaults to False.
ret_map (bool) – Whether to return the dictionary mapping raw markers to new indices. Defaults to False.
- dhg.utils.edge_list_to_adj_list(e_list)[source]
Convert edge list to adjacency list for low-order structures.
Note
Adjacency list can only represent low-order structures like graph, directed graph, and bipartite graph.
- Parameters
e_list (List[Tuple[int, int]]) – Edge list.
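The printed results in the Examples on this page suggest that each adjacency-list row starts with the source vertex, followed by its neighbors. A rough sketch of the conversion under that assumption follows. The helper to_adj_list is hypothetical and is not the library's code; the row order (first appearance of each source vertex) is also an assumption.

```python
from typing import Dict, List, Tuple


def to_adj_list(e_list: List[Tuple[int, int]]) -> List[List[int]]:
    """Hypothetical helper: group (u, v) pairs into rows [u, v1, v2, ...]."""
    rows: Dict[int, List[int]] = {}
    for u, v in e_list:
        # setdefault creates the row [u] the first time u is seen.
        rows.setdefault(u, [u]).append(v)
    # Dicts preserve insertion order, so rows follow the first appearance of u.
    return list(rows.values())


adj = to_adj_list([(0, 1), (0, 2), (1, 2)])
print(adj)  # [[0, 1, 2], [1, 2]]
```

Because each row only has room for one source vertex and its direct neighbors, this layout cannot express a hyperedge joining three or more vertices, which matches the note that adjacency lists are limited to low-order structures.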
Sparse Operations
- dhg.utils.sparse_dropout(sp_mat, p, fill_value=0.0)[source]
Dropout function for sparse matrix. This function will return a new sparse matrix with the same shape as the input sparse matrix, but with some elements dropped out.
- Parameters
sp_mat (torch.Tensor) – The sparse matrix in torch.sparse_coo_tensor format.
p (float) – Probability of an element being dropped.
fill_value (float) – The fill value for dropped elements. Defaults to 0.0.
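The idea can be sketched without PyTorch by treating a sparse matrix as a list of stored (row, col, value) entries. This is only an illustration of the described behavior, not the library's implementation: the helper dropout_entries and its seed parameter are hypothetical, and leaving kept values unscaled is an assumption, since the description above does not say whether surviving elements are rescaled.

```python
import random
from typing import List, Tuple

Entry = Tuple[int, int, float]  # (row, col, value) of one stored element


def dropout_entries(entries: List[Entry], p: float, fill_value: float = 0.0,
                    seed: int = 0) -> List[Entry]:
    """Hypothetical helper: drop each stored element with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    rng = random.Random(seed)
    out = []
    for row, col, value in entries:
        # rng.random() is in [0, 1), so p=0 keeps everything and p=1 drops everything.
        dropped = rng.random() < p
        # Assumption: kept values are left as-is (no 1 / (1 - p) rescaling).
        out.append((row, col, fill_value if dropped else value))
    return out


entries = [(0, 1, 2.0), (1, 0, 3.0), (2, 2, 4.0)]
kept_all = dropout_entries(entries, p=0.0)
dropped_all = dropout_entries(entries, p=1.0)
```

Only stored elements are visited, so the index pattern and the shape stay the same; dropped positions simply carry fill_value.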
Dataset Splitting
- dhg.utils.split_by_num(num_v, v_label, train_num, val_num=None, test_num=None)[source]
Split the dataset by the number of vertices in each category, and return the masks [train_mask, test_mask] or [train_mask, val_mask, test_mask].
- Parameters
num_v (int) – The number of vertices.
v_label (Union[list, torch.Tensor, np.ndarray]) – The vertex labels.
train_num (int) – The number of vertices in the training set for each category.
val_num (Optional[int], optional) – The number of vertices in the validation set for each category. If set to None, this function returns only train_mask and test_mask. Defaults to None.
test_num (Optional[int], optional) – The number of vertices in the test set for each category. If set to None, all vertices outside the training and validation sets are used for testing. Defaults to None.
Examples
>>> import numpy as np
>>> from dhg.utils import split_by_num
>>> num_v = 100
>>> v_label = np.random.randint(0, 3, num_v) # 3 categories
>>> train_num, val_num, test_num = 10, 2, 5
>>> train_mask, val_mask, test_mask = split_by_num(num_v, v_label, train_num, val_num, test_num)
>>> train_mask.sum(), val_mask.sum(), test_mask.sum()
(tensor(30), tensor(6), tensor(15))
>>> train_mask, val_mask, test_mask = split_by_num(num_v, v_label, train_num, val_num)
>>> train_mask.sum(), val_mask.sum(), test_mask.sum()
(tensor(30), tensor(6), tensor(64))
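The per-category logic behind these counts can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the helper split_by_num_sketch and its seed parameter are hypothetical, vertices are assumed to be picked at random within each category, and masks are plain lists of booleans rather than tensors.

```python
import random
from typing import List, Optional, Tuple


def split_by_num_sketch(v_label: List[int], train_num: int, val_num: int = 0,
                        test_num: Optional[int] = None,
                        seed: int = 0) -> Tuple[List[bool], List[bool], List[bool]]:
    """Hypothetical helper: per-category split returning boolean masks."""
    rng = random.Random(seed)
    n = len(v_label)
    train, val, test = [False] * n, [False] * n, [False] * n
    for label in sorted(set(v_label)):
        idx = [i for i, lbl in enumerate(v_label) if lbl == label]
        rng.shuffle(idx)  # assumption: vertices are picked at random
        for i in idx[:train_num]:
            train[i] = True
        for i in idx[train_num:train_num + val_num]:
            val[i] = True
        rest = idx[train_num + val_num:]
        # test_num=None means "use everything that is left".
        for i in (rest if test_num is None else rest[:test_num]):
            test[i] = True
    return train, val, test


labels = [i % 3 for i in range(100)]  # 3 categories with 34, 33, 33 vertices
train, val, test = split_by_num_sketch(labels, train_num=10, val_num=2)
print(sum(train), sum(val), sum(test))  # 30 6 64
```

With 3 categories, 10 training and 2 validation vertices per category give 30 and 6 in total, and the remaining 64 vertices go to the test mask, the same totals as in the example above.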
- dhg.utils.split_by_ratio(num_v, v_label, train_ratio, val_ratio=None, test_ratio=None)[source]
Split the dataset by the ratio of vertices in each category, and return the masks [train_mask, test_mask] or [train_mask, val_mask, test_mask].
- Parameters
num_v (int) – The number of vertices.
v_label (Union[list, torch.Tensor, np.ndarray]) – The vertex labels.
train_ratio (float) – The ratio of vertices in the training set for each category.
val_ratio (Optional[float], optional) – The ratio of vertices in the validation set for each category. If set to None, this function returns only train_mask and test_mask. Defaults to None.
test_ratio (Optional[float], optional) – The ratio of vertices in the test set for each category. If set to None, all vertices outside the training and validation sets are used for testing. Defaults to None.
Examples
>>> import numpy as np
>>> from dhg.utils import split_by_ratio
>>> num_v = 100
>>> v_label = np.random.randint(0, 3, num_v) # 3 categories
>>> train_ratio, val_ratio, test_ratio = 0.6, 0.1, 0.2
>>> train_mask, val_mask, test_mask = split_by_ratio(num_v, v_label, train_ratio, val_ratio, test_ratio)
>>> train_mask.sum(), val_mask.sum(), test_mask.sum()
(tensor(59), tensor(9), tensor(18))
>>> train_mask, val_mask, test_mask = split_by_ratio(num_v, v_label, train_ratio, val_ratio)
>>> train_mask.sum(), val_mask.sum(), test_mask.sum()
(tensor(59), tensor(9), tensor(32))
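The totals in the example (59, 9, 18) fall slightly short of 60, 10, and 20. One plausible explanation is that the ratio is applied to each category separately and the result is rounded down; this is an assumption, not something the description states. The hypothetical helper per_category_counts below shows the arithmetic for one fixed label distribution.

```python
from collections import Counter
from typing import Dict, List


def per_category_counts(v_label: List[int], ratio: float) -> Dict[int, int]:
    """Hypothetical helper: vertices taken per category, rounding down."""
    # Assumption: the per-category count is floor(ratio * category_size).
    return {label: int(size * ratio) for label, size in Counter(v_label).items()}


labels = [0] * 34 + [1] * 33 + [2] * 33  # 100 vertices in 3 categories
train_counts = per_category_counts(labels, 0.6)
val_counts = per_category_counts(labels, 0.1)
print(train_counts, sum(train_counts.values()))  # {0: 20, 1: 19, 2: 19} 58
print(val_counts, sum(val_counts.values()))      # {0: 3, 1: 3, 2: 3} 9
```

Under this assumption the overall split can undershoot ratio * num_v by up to one vertex per category, and the exact totals depend on how the random labels are distributed.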
- dhg.utils.split_by_num_for_UI_bigraph(g, train_num)[source]
Split the User-Item bipartite graph by the number of items connected to each user. This function returns two adjacency lists, one for training and one for testing.
- Parameters
g (BiGraph) – The User-Item bipartite graph.
train_num (int) – The number of items for the training set for each user.
Examples
>>> import dhg
>>> from dhg.utils import edge_list_to_adj_list, split_by_num_for_UI_bigraph
>>> g = dhg.random.bigraph_Gnm(5, 8, 20)
>>> edge_list_to_adj_list(g.e[0])
[[3, 4, 0, 6, 5], [0, 5, 1, 4, 3, 6], [2, 2, 5, 1], [1, 0, 6, 5, 1, 4, 7], [4, 5, 7]]
>>> train_num = 3
>>> train_adj, test_adj = split_by_num_for_UI_bigraph(g, train_num)
>>> train_adj
[[0, 1, 3, 4], [1, 6, 0, 5], [2, 1, 2, 5], [3, 6, 4, 5], [4, 5, 7]]
>>> test_adj
[[0, 5, 6], [1, 1, 4, 7], [3, 0]]
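In the output above, users with at most train_num items (users 2 and 4) keep all of them for training and have no row in test_adj. A minimal sketch of that per-user behavior, working directly on an adjacency list rather than a BiGraph, might look as follows. The helper split_adj_by_num and its seed parameter are hypothetical, and choosing the training items at random is an assumption.

```python
import random
from typing import List, Tuple


def split_adj_by_num(adj_list: List[List[int]], train_num: int,
                     seed: int = 0) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical helper: keep up to train_num items per user for training."""
    rng = random.Random(seed)
    train_adj, test_adj = [], []
    for user, *items in adj_list:
        rng.shuffle(items)  # assumption: training items are chosen at random
        train_adj.append([user] + items[:train_num])
        # Users with no remaining items get no row in the test list.
        if len(items) > train_num:
            test_adj.append([user] + items[train_num:])
    return train_adj, test_adj


adj = [[0, 1, 3, 4, 5, 6], [1, 2, 7]]
train_adj, test_adj = split_adj_by_num(adj, train_num=3)
```

Here user 0 contributes three items to training and two to testing, while user 1 has only two items, so both stay in the training list.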
- dhg.utils.split_by_ratio_for_UI_bigraph(g, train_ratio)[source]
Split the User-Item bipartite graph by the ratio of items connected to each user. This function returns two adjacency lists, one for training and one for testing.
- Parameters
g (BiGraph) – The User-Item bipartite graph.
train_ratio (float) – The ratio of items for the training set for each user.
Examples
>>> import dhg
>>> from dhg.utils import edge_list_to_adj_list, split_by_ratio_for_UI_bigraph
>>> g = dhg.random.bigraph_Gnm(5, 8, 20)
>>> edge_list_to_adj_list(g.e[0])
[[4, 0, 6, 5, 4], [3, 4, 7, 0, 3, 6, 2], [2, 2, 5, 0, 6], [1, 0, 3, 1, 7], [0, 3, 6]]
>>> train_ratio = 0.8
>>> train_adj, test_adj = split_by_ratio_for_UI_bigraph(g, train_ratio)
>>> train_adj
[[0, 6], [1, 3, 0, 1], [2, 2, 6, 5], [3, 0, 4, 3, 6], [4, 0, 4, 6]]
>>> test_adj
[[0, 3], [1, 7], [2, 0], [3, 2, 7], [4, 5]]
Dataset Wrappers
- class dhg.utils.UserItemDataset(*args, **kwargs)[source]
Bases: torch.utils.data.Dataset
The dataset class of user-item bipartite graphs for recommendation tasks.
- Parameters
num_users (int) – The number of users.
num_items (int) – The number of items.
user_item_list (List[Tuple[int, int]]) – The list of user-item pairs.
train_user_item_list (List[Tuple[int, int]], optional) – The list of user-item pairs for training. It is only needed in the test phase, to mask items already seen in training. Defaults to None.
strict_link (bool) – Whether to iterate through all interactions in the dataset. If set to False, in the training phase the dataset keeps randomly sampling interactions until it has drawn as many as there are original interactions. Defaults to True.
phase (str) – The phase of the dataset, either "train" or "test". Defaults to "train".
- __getitem__(index)[source]
Return the item at the index. If the phase is "train", return the (User-PositiveItem-NegativeItem) triplet. If the phase is "test", return all true positive items for each user.
- Parameters
index (int) – The index of the item.
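How a (User-PositiveItem-NegativeItem) triplet might be assembled in the training phase can be sketched in plain Python. This is not the class's actual code: the helper sample_triplet is hypothetical, and drawing the negative item uniformly from the items the user has not interacted with is an assumption.

```python
import random
from typing import Dict, List, Set, Tuple


def sample_triplet(user: int, pos_item: int, num_items: int,
                   seen: Dict[int, Set[int]], rng: random.Random) -> Tuple[int, int, int]:
    """Hypothetical helper: build one (user, positive item, negative item) triplet."""
    # Rejection sampling: redraw until the item is not one the user interacted with.
    # Assumes the user has not interacted with every item.
    neg_item = rng.randrange(num_items)
    while neg_item in seen[user]:
        neg_item = rng.randrange(num_items)
    return user, pos_item, neg_item


user_item_list: List[Tuple[int, int]] = [(0, 1), (0, 2), (1, 0)]
seen: Dict[int, Set[int]] = {}
for u, i in user_item_list:
    seen.setdefault(u, set()).add(i)

rng = random.Random(0)
triplets = [sample_triplet(u, i, 4, seen, rng) for u, i in user_item_list]
```

Each triplet pairs an observed interaction with an item the user has not interacted with, which is the input shape that pairwise ranking losses for recommendation typically expect.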
- __len__()[source]
Return the length of the dataset. If the phase is "train", return the number of interactions. If the phase is "test", return the number of users.
Log Helpers
Download Helpers
- dhg.utils.download_file(url, file_path)[source]
Download a file from a url.
- Parameters
url (str) – The URL of the file.
file_path (str) – The path to the file.