dhg.datapipe
We provide several datapipes to help with data processing.
Compose Datapipes
Transforms
- dhg.datapipe.norm_ft(X, ord=None)
Normalize the input feature matrix with the specified ord. For the meaning of ord, refer to PyTorch's torch.linalg.norm function.
Note
The input feature matrix is expected to be a 1D vector or a 2D tensor with shape (num_samples, num_features).
- Parameters
X (torch.Tensor) – The input feature.
ord (Union[int, float], optional) – The order of the norm, either an int or a float. If ord is None, the 2-norm is used. Defaults to None.
Examples
>>> import dhg.datapipe as dd
>>> import torch
>>> X = torch.tensor([
...     [0.1, 0.2, 0.5],
...     [0.5, 0.2, 0.3],
...     [0.3, 0.2, 0]
... ])
>>> dd.norm_ft(X)
tensor([[0.1826, 0.3651, 0.9129],
        [0.8111, 0.3244, 0.4867],
        [0.8321, 0.5547, 0.0000]])
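The numbers in the example above are consistent with dividing each row by its own 2-norm. The following is a minimal plain-Python sketch of that arithmetic, not the library's implementation. The helper name row_normalize is hypothetical, it only supports finite positive values of ord, and its handling of an all-zero row is a choice made for the sketch.

```python
def row_normalize(rows, ord=2.0):
    """Divide each row by its vector p-norm (hypothetical helper, not part of dhg)."""
    out = []
    for row in rows:
        # p-norm of the row: (sum |v|^p)^(1/p); ord=2 gives the Euclidean norm.
        norm = sum(abs(v) ** ord for v in row) ** (1.0 / ord)
        # Sketch-only choice: an all-zero row stays zero instead of dividing by zero.
        out.append([v / norm if norm > 0 else 0.0 for v in row])
    return out


X = [[0.1, 0.2, 0.5], [0.5, 0.2, 0.3], [0.3, 0.2, 0.0]]
for row in row_normalize(X):
    print([round(v, 4) for v in row])
```

Rounded to four decimals, the printed rows agree with the tensor shown in the example, which suggests the normalization is applied per sample (per row) rather than per feature.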
- dhg.datapipe.min_max_scaler(X, ft_min, ft_max)
Normalize the input feature matrix with min-max scaling.
- Parameters
X (torch.Tensor) – The input feature.
ft_min (float) – The minimum value of the output feature.
ft_max (float) – The maximum value of the output feature.
Examples
>>> import dhg.datapipe as dd
>>> import torch
>>> X = torch.tensor([
...     [0.1, 0.2, 0.5],
...     [0.5, 0.2, 0.3],
...     [0.3, 0.2, 0.0]
... ])
>>> dd.min_max_scaler(X, -1, 1)
tensor([[-0.6000, -0.2000,  1.0000],
        [ 1.0000, -0.2000,  0.2000],
        [ 0.2000, -0.2000, -1.0000]])
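The output above matches a rescaling that uses the minimum and maximum of the whole matrix, not of each column: 0.1 maps to -0.6 only if the range is [0, 0.5]. A minimal plain-Python sketch of that mapping follows. The helper name min_max_scale is hypothetical, and the error raised for a constant input is an assumption of the sketch, not documented behavior.

```python
def min_max_scale(rows, ft_min, ft_max):
    """Linearly map all values from [min, max] of the input to [ft_min, ft_max]."""
    flat = [v for row in rows for v in row]
    lo, hi = min(flat), max(flat)  # global extremes over the whole matrix
    span = hi - lo
    if span == 0:
        # Sketch-only choice: a constant matrix has no range to rescale.
        raise ValueError("all values are equal; cannot rescale")
    scale = (ft_max - ft_min) / span
    return [[ft_min + (v - lo) * scale for v in row] for row in rows]


X = [[0.1, 0.2, 0.5], [0.5, 0.2, 0.3], [0.3, 0.2, 0.0]]
for row in min_max_scale(X, -1, 1):
    print([round(v, 4) for v in row])
```

The smallest input (0.0) lands on ft_min and the largest (0.5) on ft_max, with every other value placed proportionally between them.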
- dhg.datapipe.to_tensor(X)
Convert List, numpy.ndarray, scipy.sparse.csr_matrix to torch.Tensor.
- Parameters
X (Union[List, np.ndarray, torch.Tensor, scipy.sparse.csr_matrix]) – Input.
Examples
>>> import dhg.datapipe as dd
>>> X = [[0.1, 0.2, 0.5], [0.5, 0.2, 0.3], [0.3, 0.2, 0]]
>>> dd.to_tensor(X)
tensor([[0.1000, 0.2000, 0.5000],
        [0.5000, 0.2000, 0.3000],
        [0.3000, 0.2000, 0.0000]])
- dhg.datapipe.to_bool_tensor(X)
Convert List, numpy.ndarray, torch.Tensor to torch.BoolTensor.
- Parameters
X (Union[List, np.ndarray, torch.Tensor]) – Input.
Examples
>>> import dhg.datapipe as dd
>>> X = [[0.1, 0.2, 0.5], [0.5, 0.2, 0.3], [0.3, 0.2, 0]]
>>> dd.to_bool_tensor(X)
tensor([[ True,  True,  True],
        [ True,  True,  True],
        [ True,  True, False]])
- dhg.datapipe.to_long_tensor(X)
Convert List, numpy.ndarray, torch.Tensor to torch.LongTensor.
- Parameters
X (Union[List, np.ndarray, torch.Tensor]) – Input.
Examples
>>> import dhg.datapipe as dd
>>> X = [[1, 2, 5], [5, 2, 3], [3, 2, 0]]
>>> dd.to_long_tensor(X)
tensor([[1, 2, 5],
        [5, 2, 3],
        [3, 2, 0]])
Loaders
- dhg.datapipe.load_from_pickle(file_path, keys=None, **kwargs)
Load data from a pickle file.
- Parameters
file_path (Path) – The local path of the file.
keys (Union[str, List[str]], optional) – The keys of the data. Defaults to None.
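The entry above does not say how keys is interpreted. One plausible reading, assumed here and not confirmed by this page, is that the pickled object is a dict and keys selects one or several of its entries. The sketch below illustrates that reading with the standard pickle module; the helper name load_pickle_with_keys and the file name demo.pkl are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickle_with_keys(file_path, keys=None):
    """Load a pickled dict and optionally select entries (assumed semantics)."""
    with open(file_path, "rb") as f:
        data = pickle.load(f)
    if keys is None:
        return data                      # no selection: return the whole object
    if isinstance(keys, str):
        return data[keys]                # single key: return that entry
    return {k: data[k] for k in keys}    # list of keys: return a sub-dict


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.pkl"
    with open(path, "wb") as f:
        pickle.dump({"features": [[0.1, 0.2]], "labels": [1]}, f)
    print(load_pickle_with_keys(path, "labels"))
    print(load_pickle_with_keys(path, ["features", "labels"]))
```

As with any use of pickle, only load files from sources you trust, since unpickling can execute arbitrary code.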
- dhg.datapipe.load_from_txt(file_path, dtype, sep=',| |\t', ignore_header=0)
Load data from a txt file.
Note
The separator is a regular expression for the re module. Multiple separators can be combined with |. For details, see re.split.
- Parameters
file_path (Path) – The local path of the file.
dtype (Union[str, Callable]) – The data type of the data, either a string or a callable.
sep (str, optional) – The separator of each line in the file. Defaults to ",| |\t".
ignore_header (int, optional) – The number of lines to skip at the top of the file. Defaults to 0.
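To show how the default separator pattern behaves, here is a minimal sketch that applies re.split to in-memory lines instead of a file. The helper name parse_lines is hypothetical, and the sketch only handles a callable dtype; how the library maps a string dtype to a conversion is not described on this page.

```python
import re

SEP = ",| |\t"  # the documented default: comma, space, or tab


def parse_lines(lines, dtype, sep=SEP, ignore_header=0):
    """Split each line on the regex `sep` and cast every field with `dtype`."""
    rows = []
    for line in lines[ignore_header:]:   # skip the first `ignore_header` lines
        fields = re.split(sep, line.strip())
        rows.append([dtype(field) for field in fields])
    return rows


lines = ["a,b,c", "1,2 3", "4\t5,6"]
print(parse_lines(lines, int, ignore_header=1))

# Each separator matches exactly one character, so a comma followed by a
# space yields an empty field between them.
print(re.split(SEP, "1, 2"))
```

Because the default pattern matches one character at a time, input such as "1, 2" produces an empty string that int or float would reject. A pattern like "[, \t]+" treats a run of separators as a single delimiter.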