dhg.datapipe

This module provides datapipe functions for common data-processing steps: composing pipes, transforming features, and loading data from files.

Compose Datapipes

dhg.datapipe.compose_pipes(*pipes)

Compose several datapipe functions into a single callable.

Parameters: pipes (Callable) – Datapipe functions to compose.
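
As a rough illustration of what composing datapipes means, the sketch below chains plain Python callables. The helper compose and the two toy pipes are hypothetical names, not part of dhg, and applying the pipes left to right is an assumption about the ordering rather than something this page states.

```python
from typing import Any, Callable


def compose(*pipes: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Hypothetical stand-in for a pipe composer (not dhg's implementation)."""

    def composed(x: Any) -> Any:
        # Assumption: pipes run in the order given, each fed the previous output.
        for pipe in pipes:
            x = pipe(x)
        return x

    return composed


def to_floats(rows):
    # Toy pipe: cast every entry to float.
    return [[float(v) for v in row] for row in rows]


def halve(rows):
    # Toy pipe: scale every entry by 0.5.
    return [[v * 0.5 for v in row] for row in rows]


pipeline = compose(to_floats, halve)
result = pipeline([[1, 2], [3, 4]])
print(result)
```

With the real library, the same pattern would presumably pass functions such as to_tensor and norm_ft to compose_pipes and call the returned function on the raw features.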

Transforms

dhg.datapipe.norm_ft(X, ord=None)

Normalize the input feature matrix using the norm specified by ord. See PyTorch’s torch.linalg.norm function for the supported values of ord.

Note

The input feature matrix is expected to be a 1D vector or a 2D tensor with shape (num_samples, num_features).

Parameters

X (torch.Tensor) – The input feature.
ord (Union[int, float], optional) – The order of the norm, either an int or a float. If ord is None, the 2-norm is used. Defaults to None.

Examples

>>> import dhg.datapipe as dd
>>> import torch
>>> X = torch.tensor([
            [0.1, 0.2, 0.5],
            [0.5, 0.2, 0.3],
            [0.3, 0.2, 0]
        ])
>>> dd.norm_ft(X)
tensor([[0.1826, 0.3651, 0.9129],
        [0.8111, 0.3244, 0.4867],
        [0.8321, 0.5547, 0.0000]])
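
The output above is consistent with dividing each row by its own 2-norm. The plain-Python sketch below reproduces that arithmetic; l2_normalize_rows is a hypothetical helper written for illustration, and the row-wise behaviour is inferred from the example output rather than from the library source.

```python
import math


def l2_normalize_rows(rows):
    """Divide each row by its Euclidean (2-) norm. Illustration only."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        # Assumption: an all-zero row is returned unchanged to avoid dividing by zero.
        out.append([v / norm if norm > 0 else v for v in row])
    return out


X = [[0.1, 0.2, 0.5],
     [0.5, 0.2, 0.3],
     [0.3, 0.2, 0.0]]

# Round to four decimals to compare with the tensor printout above.
normalized = [[round(v, 4) for v in row] for row in l2_normalize_rows(X)]
print(normalized)
```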

dhg.datapipe.min_max_scaler(X, ft_min, ft_max)

Normalize the input feature matrix with min-max scaling so that its values lie in the range [ft_min, ft_max].

Parameters

X (torch.Tensor) – The input feature.
ft_min (float) – The minimum value of the output feature.
ft_max (float) – The maximum value of the output feature.

Examples

>>> import dhg.datapipe as dd
>>> import torch
>>> X = torch.tensor([
            [0.1, 0.2, 0.5],
            [0.5, 0.2, 0.3],
            [0.3, 0.2, 0.0]
        ])
>>> dd.min_max_scaler(X, -1, 1)
tensor([[-0.6000, -0.2000,  1.0000],
        [ 1.0000, -0.2000,  0.2000],
        [ 0.2000, -0.2000, -1.0000]])
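
The example output matches the usual min-max formula, (x - min) / (max - min) * (ft_max - ft_min) + ft_min, applied with the minimum and maximum taken over the whole matrix. The sketch below works through that arithmetic in plain Python; min_max_scale is a hypothetical helper, and the use of a global (rather than per-column) minimum and maximum is inferred from the example, not confirmed by the library source.

```python
def min_max_scale(rows, ft_min, ft_max):
    """Rescale all entries to [ft_min, ft_max] using the global min and max."""
    flat = [v for row in rows for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    scaled = []
    for row in rows:
        if span == 0:
            # Assumption: a constant matrix maps every entry to ft_min.
            scaled.append([float(ft_min) for _ in row])
        else:
            scaled.append([(v - lo) / span * (ft_max - ft_min) + ft_min for v in row])
    return scaled


X = [[0.1, 0.2, 0.5],
     [0.5, 0.2, 0.3],
     [0.3, 0.2, 0.0]]

# Round to four decimals to compare with the tensor printout above.
scaled = [[round(v, 4) for v in row] for row in min_max_scale(X, -1, 1)]
print(scaled)
```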

dhg.datapipe.to_tensor(X)

Convert a List, numpy.ndarray, or scipy.sparse.csr_matrix to torch.Tensor. A torch.Tensor input is also accepted.

Parameters: X (Union[List, np.ndarray, torch.Tensor, scipy.sparse.csr_matrix]) – Input.

Examples

>>> import dhg.datapipe as dd
>>> X = [[0.1, 0.2, 0.5],
         [0.5, 0.2, 0.3],
         [0.3, 0.2, 0]]
>>> dd.to_tensor(X)
tensor([[0.1000, 0.2000, 0.5000],
        [0.5000, 0.2000, 0.3000],
        [0.3000, 0.2000, 0.0000]])

dhg.datapipe.to_bool_tensor(X)

Convert a List, numpy.ndarray, or torch.Tensor to torch.BoolTensor.

Parameters: X (Union[List, np.ndarray, torch.Tensor]) – Input.

Examples

>>> import dhg.datapipe as dd
>>> X = [[0.1, 0.2, 0.5],
         [0.5, 0.2, 0.3],
         [0.3, 0.2, 0]]
>>> dd.to_bool_tensor(X)
tensor([[ True,  True,  True],
        [ True,  True,  True],
        [ True,  True, False]])

dhg.datapipe.to_long_tensor(X)

Convert a List, numpy.ndarray, or torch.Tensor to torch.LongTensor.

Parameters: X (Union[List, np.ndarray, torch.Tensor]) – Input.

Examples

>>> import dhg.datapipe as dd
>>> X = [[1, 2, 5],
         [5, 2, 3],
         [3, 2, 0]]
>>> dd.to_long_tensor(X)
tensor([[1, 2, 5],
        [5, 2, 3],
        [3, 2, 0]])

Loaders

dhg.datapipe.load_from_pickle(file_path, keys=None, **kwargs)

Load data from a pickle file.

Parameters

file_path (Path) – The local path of the file.
keys (Union[str, List[str]], optional) – The keys of the data. Defaults to None.

dhg.datapipe.load_from_txt(file_path, dtype, sep=',| |\t', ignore_header=0)

Load data from a txt file.

Note

The separator is a regular expression interpreted by Python’s re module. Multiple separators can be combined with |. See re.split for details.

Parameters

file_path (Path) – The local path of the file.
dtype (Union[str, Callable]) – The data type of the values, given either as a string or as a callable.
sep (str, optional) – The separator of each line in the file. Defaults to ",| |\t".
ignore_header (int, optional) – The number of lines to ignore in the header of the file. Defaults to 0.
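
To show how a regular-expression separator and a header offset interact, the sketch below parses a few in-memory lines with re.split. The function parse_lines is a hypothetical stand-in that mirrors the documented parameters; it is not dhg's implementation, it only handles a callable dtype, and skipping blank lines is an assumption.

```python
import re

DEFAULT_SEP = ",| |\t"  # same pattern as the documented default


def parse_lines(lines, dtype, sep=DEFAULT_SEP, ignore_header=0):
    """Hypothetical parser mirroring the documented parameters; not dhg's code."""
    rows = []
    for line in lines[ignore_header:]:
        line = line.strip()
        if not line:
            continue  # Assumption: blank lines are skipped.
        # Split on any of the alternatives in sep, then cast each token.
        rows.append([dtype(tok) for tok in re.split(sep, line)])
    return rows


lines = ["src,dst weight", "0,1 3", "1\t2,5"]
parsed = parse_lines(lines, int, ignore_header=1)
print(parsed)

# Two adjacent separators (comma followed by space) yield an empty token.
tokens = re.split(DEFAULT_SEP, "1, 2")
print(tokens)
```

Because each alternative in the default pattern matches a single character, adjacent separators produce empty tokens under re.split. When a file mixes separators this way, a pattern such as "[, \t]+" splits on runs of separators instead.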

dhg.datapipe.load_from_json(file_path, **kwargs)

Load data from a json file.

Parameters: file_path (Path) – The local path of the file.