dhg.utils

Structure Helpers

dhg.utils.remap_edge_list(e_list, bipartite_graph=False, ret_map=False)

Remap the vertex markers to integers in an ordered, contiguous range.

Note

This function supports both low-order and high-order structures.

Parameters

e_list (List[tuple]) – Edge list of a low-order or high-order structure.
bipartite_graph (bool) – Whether the structure is a bipartite graph. Defaults to False.
ret_map (bool) – Whether to also return the dictionary that maps each raw marker to its new index. Defaults to False.
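To make the idea of remapping concrete, here is a minimal pure-Python sketch. It is not the library's implementation: the helper name `remap_markers_sketch` is hypothetical, and the choice to assign new indices in sorted order of the raw markers is an assumption. It handles only the non-bipartite case.

```python
from typing import Dict, List, Tuple


def remap_markers_sketch(
    e_list: List[Tuple[int, ...]],
) -> Tuple[List[Tuple[int, ...]], Dict[int, int]]:
    """Map raw vertex markers to 0..n-1 (hypothetical helper, not dhg's code)."""
    # Collect every distinct marker that appears in any edge or hyperedge.
    markers = sorted({v for e in e_list for v in e})
    # Assumption: new indices follow the sorted order of the raw markers.
    v_map = {raw: new for new, raw in enumerate(markers)}
    # Rewrite each edge with the new indices, keeping the edge order.
    new_e_list = [tuple(v_map[v] for v in e) for e in e_list]
    return new_e_list, v_map


edges = [(10, 42), (42, 7, 99)]  # one pairwise edge and one hyperedge
new_edges, v_map = remap_markers_sketch(edges)
print(new_edges)  # [(1, 2), (2, 0, 3)]
print(v_map)      # {7: 0, 10: 1, 42: 2, 99: 3}
```

The returned dictionary plays the role that `ret_map=True` describes: it lets you translate results back to the raw markers.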

dhg.utils.remap_edge_lists(*e_lists, bipartite_graph=False, ret_map=False)

Remap the vertex markers of multiple edge lists to integers in an ordered, contiguous range.

Note

This function supports both low-order and high-order structures.

Parameters

e_lists (List[List[tuple]]) – The edge lists of low-order or high-order structures.
bipartite_graph (bool) – Whether the structure is a bipartite graph. Defaults to False.
ret_map (bool) – Whether to also return the dictionary that maps each raw marker to its new index. Defaults to False.

dhg.utils.remap_adj_list(adj_list, bipartite_graph=False, ret_map=False)

Remap the vertex markers to integers in an ordered, contiguous range.

Note

This function supports only low-order structures such as graphs, directed graphs, and bipartite graphs.

Parameters

adj_list (List[List[int]]) – Adjacency list of a low-order structure.
bipartite_graph (bool) – Whether the structure is a bipartite graph. Defaults to False.
ret_map (bool) – Whether to also return the dictionary that maps each raw marker to its new index. Defaults to False.

dhg.utils.remap_adj_lists(*adj_lists, bipartite_graph=False, ret_map=False)

Remap the vertex markers of multiple adjacency lists to integers in an ordered, contiguous range.

Note

This function supports only low-order structures such as graphs, directed graphs, and bipartite graphs.

Parameters

adj_lists (List[List[List[int]]]) – The adjacency lists of low-order structures.
bipartite_graph (bool) – Whether the structure is a bipartite graph. Defaults to False.
ret_map (bool) – Whether to also return the dictionary that maps each raw marker to its new index. Defaults to False.

dhg.utils.edge_list_to_adj_list(e_list)

Convert edge list to adjacency list for low-order structures.

Note

An adjacency list can only represent low-order structures such as graphs, directed graphs, and bipartite graphs.

Parameters: e_list (List[Tuple[int, int]]) – Edge list.
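The conversion can be sketched in a few lines of plain Python. This is an illustration, not the library's code: the helper name is hypothetical, and the row layout `[source, neighbor_1, neighbor_2, ...]` is an assumption based on the example outputs shown elsewhere on this page.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def edge_list_to_adj_list_sketch(e_list: List[Tuple[int, int]]) -> List[List[int]]:
    """Group edges by source vertex (hypothetical helper, not dhg's code)."""
    # Collect destination vertices per source, in first-seen order.
    neighbors: Dict[int, List[int]] = defaultdict(list)
    for src, dst in e_list:
        neighbors[src].append(dst)
    # Assumed row layout: the source vertex followed by its neighbors.
    return [[src] + dsts for src, dsts in neighbors.items()]


adj = edge_list_to_adj_list_sketch([(0, 1), (0, 2), (1, 2), (2, 0)])
print(adj)  # [[0, 1, 2], [1, 2], [2, 0]]
```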

dhg.utils.edge_list_to_adj_dict(e_list)

Convert edge list to adjacency dictionary for low-order structures.

Note

An adjacency list can only represent low-order structures such as graphs, directed graphs, and bipartite graphs.

Parameters: e_list (List[Tuple[int, int]]) – Edge list.

dhg.utils.adj_list_to_edge_list(adj_list)

Convert adjacency list to edge list for low-order structures.

Note

An adjacency list can only represent low-order structures such as graphs, directed graphs, and bipartite graphs.

Parameters: adj_list (List[List[int]]) – Adjacency list.
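The reverse direction is equally short. The sketch below assumes each adjacency row has the layout `[source, neighbor_1, neighbor_2, ...]`, which matches the example outputs on this page. The helper name is hypothetical and this is not the library's implementation.

```python
from typing import List, Tuple


def adj_list_to_edge_list_sketch(adj_list: List[List[int]]) -> List[Tuple[int, int]]:
    """Expand adjacency rows into (source, destination) pairs (hypothetical helper)."""
    # row[0] is assumed to be the source vertex; the rest are its neighbors.
    return [(row[0], dst) for row in adj_list for dst in row[1:]]


edges = adj_list_to_edge_list_sketch([[0, 1, 2], [1, 2], [2, 0]])
print(edges)  # [(0, 1), (0, 2), (1, 2), (2, 0)]
```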

Sparse Operations

dhg.utils.sparse_dropout(sp_mat, p, fill_value=0.0)

Dropout function for sparse matrix. This function will return a new sparse matrix with the same shape as the input sparse matrix, but with some elements dropped out.

Parameters

sp_mat (torch.Tensor) – The sparse matrix with format torch.sparse_coo_tensor.
p (float) – Probability of an element to be dropped.
fill_value (float) – The fill value for dropped elements. Defaults to 0.0.
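To show what dropping elements of a sparse matrix means, here is a pure-Python sketch over COO-style `(row, col, value)` triplets. It is not the library's implementation and does not use torch: the helper name and the `seed` parameter are hypothetical. The sketch does not rescale the kept values, because this page does not say whether the library does.

```python
import random
from typing import List, Tuple

Coo = List[Tuple[int, int, float]]  # (row, col, value) triplets


def coo_dropout_sketch(entries: Coo, p: float, fill_value: float = 0.0, seed: int = 0) -> Coo:
    """Replace each stored value by fill_value with probability p (hypothetical helper)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    out: Coo = []
    for row, col, val in entries:
        # rng.random() is in [0, 1), so p=0 keeps everything and p=1 drops everything.
        dropped = rng.random() < p
        # Indices are kept, so the shape and sparsity pattern do not change.
        out.append((row, col, fill_value if dropped else val))
    return out


entries = [(0, 0, 1.0), (0, 2, 2.0), (1, 1, 3.0)]
print(coo_dropout_sketch(entries, p=0.0))  # unchanged
print(coo_dropout_sketch(entries, p=1.0))  # every value replaced by 0.0
```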

Dataset Splitting

dhg.utils.split_by_num(num_v, v_label, train_num, val_num=None, test_num=None)

Split the dataset by a fixed number of vertices per category, and return the masks (train_mask, test_mask) or (train_mask, val_mask, test_mask).

Parameters

num_v (int) – The number of vertices.
v_label (Union[list, torch.Tensor, np.ndarray]) – The vertex labels.
train_num (int) – The number of vertices in the training set for each category.
val_num (Optional[int], optional) – The number of vertices in the validation set for each category. If set to None, this function will only return the masks of train_mask and test_mask. Defaults to None.
test_num (Optional[int], optional) – The number of vertices in the test set for each category. If set to None, all vertices not in the training or validation sets are used for testing. Defaults to None.

Examples

>>> import numpy as np
>>> from dhg.utils import split_by_num
>>> num_v = 100
>>> v_label = np.random.randint(0, 3, num_v) # 3 categories
>>> train_num, val_num, test_num = 10, 2, 5
>>> train_mask, val_mask, test_mask = split_by_num(num_v, v_label, train_num, val_num, test_num)
>>> train_mask.sum(), val_mask.sum(), test_mask.sum()
(tensor(30), tensor(6), tensor(15))
>>> train_mask, val_mask, test_mask = split_by_num(num_v, v_label, train_num, val_num)
>>> train_mask.sum(), val_mask.sum(), test_mask.sum()
(tensor(30), tensor(6), tensor(64))
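The per-category logic behind these numbers can be sketched without torch. The code below is a simplified illustration, not the library's implementation: the helper name and the `seed` parameter are hypothetical, masks are plain lists of booleans, and it always returns three masks (using `val_num=0` instead of `None`).

```python
import random
from collections import defaultdict
from typing import List, Optional, Sequence, Tuple


def split_by_num_sketch(
    v_label: Sequence[int],
    train_num: int,
    val_num: int = 0,
    test_num: Optional[int] = None,
    seed: int = 0,
) -> Tuple[List[bool], List[bool], List[bool]]:
    """Pick a fixed number of vertices per category (hypothetical helper)."""
    rng = random.Random(seed)
    # Group vertex indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(v_label):
        by_label[label].append(idx)

    n = len(v_label)
    train_mask, val_mask, test_mask = [False] * n, [False] * n, [False] * n
    for idxs in by_label.values():
        rng.shuffle(idxs)
        val_end = train_num + val_num
        # With test_num=None, everything left over goes to the test set.
        test_end = len(idxs) if test_num is None else val_end + test_num
        for i in idxs[:train_num]:
            train_mask[i] = True
        for i in idxs[train_num:val_end]:
            val_mask[i] = True
        for i in idxs[val_end:test_end]:
            test_mask[i] = True
    return train_mask, val_mask, test_mask


labels = [i % 3 for i in range(100)]  # 3 categories with 34, 33, and 33 vertices
train, val, test = split_by_num_sketch(labels, 10, 2, 5)
print(sum(train), sum(val), sum(test))  # 30 6 15
train, val, test = split_by_num_sketch(labels, 10, 2)
print(sum(train), sum(val), sum(test))  # 30 6 64
```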

dhg.utils.split_by_ratio(num_v, v_label, train_ratio, val_ratio=None, test_ratio=None)

Split the dataset by a ratio of vertices per category, and return the masks (train_mask, test_mask) or (train_mask, val_mask, test_mask).

Parameters

num_v (int) – The number of vertices.
v_label (Union[list, torch.Tensor, np.ndarray]) – The vertex labels.
train_ratio (float) – The ratio of vertices in the training set for each category.
val_ratio (Optional[float], optional) – The ratio of vertices in the validation set for each category. If set to None, this function will only return the masks of train_mask and test_mask. Defaults to None.
test_ratio (Optional[float], optional) – The ratio of vertices in the test set for each category. If set to None, all vertices not in the training or validation sets are used for testing. Defaults to None.

Examples

>>> import numpy as np
>>> from dhg.utils import split_by_ratio
>>> num_v = 100
>>> v_label = np.random.randint(0, 3, num_v) # 3 categories
>>> train_ratio, val_ratio, test_ratio = 0.6, 0.1, 0.2
>>> train_mask, val_mask, test_mask = split_by_ratio(num_v, v_label, train_ratio, val_ratio, test_ratio)
>>> train_mask.sum(), val_mask.sum(), test_mask.sum()
(tensor(59), tensor(9), tensor(18))
>>> train_mask, val_mask, test_mask = split_by_ratio(num_v, v_label, train_ratio, val_ratio)
>>> train_mask.sum(), val_mask.sum(), test_mask.sum()
(tensor(59), tensor(9), tensor(32))
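The totals above are 59, 9, and 18 rather than 60, 10, and 20. One plausible explanation is that the ratio is applied to each category separately and the result is truncated to an integer. That is an assumption, not something this page states. The arithmetic below uses hypothetical category sizes (34, 34, 32), chosen only because they reproduce the totals under that assumption.

```python
# Hypothetical label counts summing to 100; the real counts are random.
class_sizes = {0: 34, 1: 34, 2: 32}
ratios = {"train": 0.6, "val": 0.1, "test": 0.2}

# Assumption: each category contributes int(size * ratio) vertices.
totals = {
    name: sum(int(size * ratio) for size in class_sizes.values())
    for name, ratio in ratios.items()
}
print(totals)  # {'train': 59, 'val': 9, 'test': 18}

# With test_ratio=None, the test set is whatever remains.
remaining = sum(class_sizes.values()) - totals["train"] - totals["val"]
print(remaining)  # 32
```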

dhg.utils.split_by_num_for_UI_bigraph(g, train_num)

Split the User-Item bipartite graph by the number of items connected to each user. This function returns two adjacency lists, one for training and one for testing.

Parameters

g (BiGraph) – The User-Item bipartite graph.
train_num (int) – The number of items for the training set for each user.

Examples

>>> import dhg
>>> from dhg.utils import edge_list_to_adj_list, split_by_num_for_UI_bigraph
>>> g = dhg.random.bigraph_Gnm(5, 8, 20)
>>> edge_list_to_adj_list(g.e[0])
[[3, 4, 0, 6, 5], [0, 5, 1, 4, 3, 6], [2, 2, 5, 1], [1, 0, 6, 5, 1, 4, 7], [4, 5, 7]]
>>> train_num = 3
>>> train_adj, test_adj = split_by_num_for_UI_bigraph(g, train_num)
>>> train_adj
[[0, 1, 3, 4], [1, 6, 0, 5], [2, 1, 2, 5], [3, 6, 4, 5], [4, 5, 7]]
>>> test_adj
[[0, 5, 6], [1, 1, 4, 7], [3, 0]]
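The example output suggests the following behavior: each row starts with the user index, up to `train_num` of that user's items go to the training row, and the remaining items (if any) go to a test row. The sketch below reproduces that behavior on plain lists. It is an illustration under those assumptions, not the library's implementation, and the helper name and `seed` parameter are hypothetical.

```python
import random
from typing import List, Tuple


def split_rows_by_num_sketch(
    adj_list: List[List[int]], train_num: int, seed: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Split each user's items into train and test rows (hypothetical helper)."""
    rng = random.Random(seed)
    train_adj, test_adj = [], []
    for row in adj_list:
        user, items = row[0], list(row[1:])
        rng.shuffle(items)  # pick the training items at random
        train_adj.append([user] + items[:train_num])
        # Users with no leftover items get no test row.
        if len(items) > train_num:
            test_adj.append([user] + items[train_num:])
    return train_adj, test_adj


adj = [[0, 5, 1, 4, 3, 6], [2, 2, 5, 1]]  # user 0 has 5 items, user 2 has 3
train_adj, test_adj = split_rows_by_num_sketch(adj, train_num=3)
print([len(r) - 1 for r in train_adj])  # [3, 3]
print([r[0] for r in test_adj])         # [0]
```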

dhg.utils.split_by_ratio_for_UI_bigraph(g, train_ratio)

Split the User-Item bipartite graph by the ratio of items connected to each user. This function returns two adjacency lists, one for training and one for testing.

Parameters

g (BiGraph) – The User-Item bipartite graph.
train_ratio (float) – The ratio of items for the training set for each user.

Examples

>>> import dhg
>>> from dhg.utils import edge_list_to_adj_list, split_by_ratio_for_UI_bigraph
>>> g = dhg.random.bigraph_Gnm(5, 8, 20)
>>> edge_list_to_adj_list(g.e[0])
[[4, 0, 6, 5, 4], [3, 4, 7, 0, 3, 6, 2], [2, 2, 5, 0, 6], [1, 0, 3, 1, 7], [0, 3, 6]]
>>> train_ratio = 0.8
>>> train_adj, test_adj = split_by_ratio_for_UI_bigraph(g, train_ratio)
>>> train_adj
[[0, 6], [1, 3, 0, 1], [2, 2, 6, 5], [3, 0, 4, 3, 6], [4, 0, 4, 6]]
>>> test_adj
[[0, 3], [1, 7], [2, 0], [3, 2, 7], [4, 5]]

Dataset Wrappers

class dhg.utils.UserItemDataset(*args, **kwargs)

Bases: torch.utils.data.Dataset

The dataset class of user-item bipartite graph for recommendation task.

Parameters

num_users (int) – The number of users.
num_items (int) – The number of items.
user_item_list (List[Tuple[int, int]]) – The list of user-item pairs.
train_user_item_list (List[Tuple[int, int]], optional) – The list of user-item pairs for training. Only needed in the test phase, to mask items already seen during training. Defaults to None.
strict_link (bool) – Whether to iterate through all interactions in the dataset. If set to False, the dataset in the training phase keeps randomly sampling interactions until it has drawn as many samples as there are original interactions. Defaults to True.
phase (str) – The phase of the dataset, either "train" or "test". Defaults to "train".

__getitem__(index)

Return the item at the index. If the phase is "train", return the (User-PositiveItem-NegativeItem) triplet. If the phase is "test", return all true positive items for each user.

Parameters: index (int) – The index of the item.

__len__()

Return the length of the dataset. If the phase is "train", return the number of interactions. If the phase is "test", return the number of users.

sample_neg_item(user)

Sample a negative item for the specified user.

Parameters: user (int) – The index of the specified user.

sample_triplet()

Sample a triplet of user, positive item, and negative item from all interactions.
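The (user, positive item, negative item) triplet can be illustrated with a small standalone sketch. It uses rejection sampling for the negative item, which is a common choice but an assumption here. The function names and the `seen` dictionary are hypothetical, and this is not the class's actual code.

```python
import random
from typing import Dict, List, Set, Tuple


def sample_neg_item_sketch(
    user: int, seen: Dict[int, Set[int]], num_items: int, rng: random.Random
) -> int:
    """Draw an item the user has not interacted with (hypothetical helper)."""
    if len(seen[user]) >= num_items:
        raise ValueError("user has interacted with every item")
    # Rejection sampling: redraw until the item is unseen for this user.
    while True:
        item = rng.randrange(num_items)
        if item not in seen[user]:
            return item


def sample_triplet_sketch(
    user_item_list: List[Tuple[int, int]],
    seen: Dict[int, Set[int]],
    num_items: int,
    rng: random.Random,
) -> Tuple[int, int, int]:
    """Pick a random interaction and pair it with a negative item."""
    user, pos_item = rng.choice(user_item_list)
    return user, pos_item, sample_neg_item_sketch(user, seen, num_items, rng)


pairs = [(0, 1), (0, 3), (1, 2)]
seen: Dict[int, Set[int]] = {}
for u, i in pairs:
    seen.setdefault(u, set()).add(i)

triplet = sample_triplet_sketch(pairs, seen, num_items=5, rng=random.Random(0))
print(triplet)  # (user, positive item, negative item)
```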

Log Helpers

dhg.utils.default_log_formatter()

Create a default formatter of log messages for logging.

dhg.utils.simple_stdout2file(file_path)

This function simply wraps the sys.stdout stream, and outputs messages to the sys.stdout and a specified file, simultaneously.

Parameters: file_path (Union[str, Path]) – The path of the file to output the messages.
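Wrapping `sys.stdout` so that messages go to both the console and a file can be sketched as a small "tee" object. The class name `TeeSketch` and the append mode are assumptions for illustration; this is not the library's implementation.

```python
import sys
import tempfile
from pathlib import Path
from typing import TextIO, Union


class TeeSketch:
    """Forward writes to a stream and to a file (hypothetical helper)."""

    def __init__(self, stream: TextIO, file_path: Union[str, Path]) -> None:
        self._stream = stream
        # Assumption: append, so earlier log content is preserved.
        self._file = open(file_path, "a", encoding="utf-8")

    def write(self, message: str) -> int:
        self._stream.write(message)
        self._file.write(message)
        self.flush()
        return len(message)

    def flush(self) -> None:
        self._stream.flush()
        self._file.flush()

    def close(self) -> None:
        # Only the file is closed; the wrapped stream is left open.
        self._file.close()


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "run.log"
    original = sys.stdout
    sys.stdout = TeeSketch(original, log_path)
    try:
        print("epoch 0 done")  # goes to the console and to run.log
    finally:
        sys.stdout.close()
        sys.stdout = original  # always restore the real stdout
    logged = log_path.read_text(encoding="utf-8")
```

Restoring `sys.stdout` in a `finally` block keeps later output working even if the wrapped code raises.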

Download Helpers

dhg.utils.download_file(url, file_path)

Download a file from a url.

Parameters

url (str) – The URL of the file.
file_path (str) – The path to the file.

dhg.utils.check_file(file_path, md5)

Check if a file is valid.

Parameters

file_path (Path) – The local path of the file.
md5 (str) – The md5 of the file.

Raises

FileNotFoundError – The file was not found.
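An MD5 check of this kind can be sketched with the standard library's `hashlib`. The helper name, the chunked reading, and the boolean return value are assumptions for illustration, not the library's implementation.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_matches_sketch(file_path: Path, md5: str, chunk_size: int = 1 << 20) -> bool:
    """Compare a file's MD5 digest with an expected hex string (hypothetical helper)."""
    file_path = Path(file_path)
    if not file_path.exists():
        raise FileNotFoundError(f"{file_path} does not exist")
    digest = hashlib.md5()
    with file_path.open("rb") as fp:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == md5.lower()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.bin"
    path.write_bytes(b"hello")
    expected = hashlib.md5(b"hello").hexdigest()
    ok = md5_matches_sketch(path, expected)
    bad = md5_matches_sketch(path, "0" * 32)
print(ok, bad)  # True False
```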

dhg.utils.download_and_check(url, file_path, md5)

Download a file from a url and check its integrity.

Parameters

url (str) – The url of the file.
file_path (Path) – The path to the file.
md5 (str) – The md5 of the file.