Hive_ML.utilities.feature_utils module
- Hive_ML.utilities.feature_utils.data_shuffling(feature_set, label_set, seed_val, test_size=0.2, split_file=None)
Function to randomly shuffle the feature set and the corresponding label set along the subject dimension. If a split file is provided, the feature set and label set are split according to the split file.
- Parameters:
feature_set (ndarray) – Feature set to shuffle.
label_set (ndarray) – Label set to shuffle.
seed_val (int) – Seed for the random number generator.
test_size (float) – Proportion of the dataset to include in the test set. Defaults to 0.2.
- Return type:
Tuple[ndarray, ndarray, ndarray, ndarray]
- Returns:
Shuffled feature set and label set.
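As a rough illustration of the behaviour described above, and not the library's actual implementation, the following NumPy sketch shuffles features and labels along the subject axis with a seeded generator and holds out a `test_size` fraction. The helper name `shuffle_and_split` and the order of the returned arrays are assumptions; the split-file branch is omitted.

```python
import numpy as np


def shuffle_and_split(feature_set, label_set, seed_val, test_size=0.2):
    """Hypothetical helper: seeded shuffle along axis 0, then hold out a test fraction."""
    rng = np.random.default_rng(seed_val)
    # One permutation is shared by features and labels so that rows stay paired.
    order = rng.permutation(len(feature_set))
    n_test = int(round(test_size * len(feature_set)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    # Assumed return order: train features, test features, train labels, test labels.
    return (feature_set[train_idx], feature_set[test_idx],
            label_set[train_idx], label_set[test_idx])


features = np.arange(20, dtype=float).reshape(10, 2)  # 10 subjects, 2 features
labels = np.arange(10)                                 # label i belongs to subject i
x_train, x_test, y_train, y_test = shuffle_and_split(features, labels, seed_val=42)
```

Reusing the same `seed_val` reproduces the same split, which is the reason a seed is passed in explicitly.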
- Hive_ML.utilities.feature_utils.feature_normalization(x_train, x_val=None, x_test=None)
Normalize each feature to the range 0 to 1.
- Parameters:
x_train (ndarray) – Feature matrix of training set.
x_val (ndarray) – Feature matrix of validation set.
x_test (ndarray) – Feature matrix of test set. The default is None.
- Return type:
Tuple[ndarray, ndarray, ndarray]
- Returns:
Normalized feature sets based on the statistics of training features.
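A minimal sketch of min-max scaling driven by training statistics may help make the description concrete. It assumes that "statistics of training features" means the per-column minimum and maximum of `x_train`; the helper name `min_max_normalize` and the handling of constant columns are illustrative assumptions, not the library's code.

```python
import numpy as np


def min_max_normalize(x_train, x_val=None, x_test=None):
    """Hypothetical helper: scale each feature column using training min/max only."""
    col_min = x_train.min(axis=0)
    col_range = x_train.max(axis=0) - col_min
    # Assumption: constant columns are left unscaled to avoid dividing by zero.
    col_range = np.where(col_range == 0, 1.0, col_range)

    def scale(x):
        return None if x is None else (x - col_min) / col_range

    return scale(x_train), scale(x_val), scale(x_test)


x_train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
x_test = np.array([[2.5, 40.0]])
train_n, val_n, test_n = min_max_normalize(x_train, x_test=x_test)
```

Because only the training minimum and maximum are used, validation or test values can fall outside the 0 to 1 range, as the second test column does in this example.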
- Hive_ML.utilities.feature_utils.flatten_4D_features(feature_list, feature_names)
Function to flatten a 3D feature set (with size [ n_sequences x n_subjects x n_features ]) into a 2D feature set [ n_subjects x n_flatten_features ], where the features are flattened along axis 1. The total number of flattened features is equal to n_features x n_sequences.
- Parameters:
feature_list (List[List[Any]]) – 3D feature set to flatten.
feature_names (List[str]) – List of feature names.
- Return type:
Tuple[ndarray, List[str]]
- Returns:
Flattened 2D feature set and the updated feature name list (with the corresponding sequence index appended to each feature name).
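The reshaping described above can be sketched with NumPy as follows. This is an illustration under assumptions, not the library's implementation: the helper name `flatten_sequences`, the `"<feature>_<sequence index>"` naming scheme, and the sequence-major column order are all hypothetical.

```python
import numpy as np


def flatten_sequences(feature_list, feature_names):
    """Hypothetical helper: [n_sequences][n_subjects][n_features] -> [n_subjects, n_sequences * n_features]."""
    features = np.asarray(feature_list)  # shape (n_sequences, n_subjects, n_features)
    n_sequences, n_subjects, n_features = features.shape
    # Move subjects to the front, then merge the sequence and feature axes.
    flat = features.transpose(1, 0, 2).reshape(n_subjects, n_sequences * n_features)
    # Assumed naming scheme: "<feature>_<sequence index>", in sequence-major order.
    names = [f"{name}_{t}" for t in range(n_sequences) for name in feature_names]
    return flat, names


feature_list = [
    [[1, 2], [3, 4], [5, 6]],        # sequence 0: 3 subjects x 2 features
    [[10, 20], [30, 40], [50, 60]],  # sequence 1
]
flat, names = flatten_sequences(feature_list, ["a", "b"])
```

Each row of `flat` then holds one subject, with that subject's features from sequence 0 followed by those from sequence 1.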
- Hive_ML.utilities.feature_utils.get_4D_feature_stats(feature_list)
Function to aggregate the feature set (with size [ n_sequences x n_subjects x n_features ]) along the sequence dimension. Returns 4 NumPy arrays (each with size [ n_subjects x n_features ]) containing the mean, sum, standard deviation and mean delta values along the sequence dimension, where T is the number of sequences.
\[
\begin{aligned}
\text{Mean}\,F_{s,f} &= \frac{1}{T} \sum_{t=1}^{T} F_{t,s,f} \\
\text{Sum}\,F_{s,f} &= \sum_{t=1}^{T} F_{t,s,f} \\
\text{SD}\,F_{s,f} &= \sqrt{ \frac{1}{T} \sum_{t=1}^{T} \left( F_{t,s,f} - \text{Mean}\,F_{s,f} \right)^2 } \\
\text{Mean Delta}\,F_{s,f} &= \frac{1}{T} \sum_{t=1}^{T} \left| F_{t,s,f} - \text{Mean}\,F_{s,f} \right|
\end{aligned}
\]
- Parameters:
feature_list (List[List[Any]]) – 3D feature set [ n_sequences x n_subjects x n_features ].
- Return type:
Tuple[ndarray, ndarray, ndarray, ndarray]
- Returns:
Mean Sequence Array, SD Sequence Array, Sum Sequence Array, Mean Delta Array.
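The four statistics defined by the formulas above map directly onto NumPy reductions over the sequence axis. The sketch below is an assumption-based illustration (the helper name `sequence_stats` is hypothetical), using the population standard deviation to match the 1/T factor in the formula.

```python
import numpy as np


def sequence_stats(feature_list):
    """Hypothetical helper: reduce [n_sequences][n_subjects][n_features] along the sequence axis."""
    features = np.asarray(feature_list, dtype=float)   # shape (T, n_subjects, n_features)
    mean = features.mean(axis=0)
    total = features.sum(axis=0)
    sd = features.std(axis=0)                          # population SD (divides by T)
    mean_delta = np.abs(features - mean).mean(axis=0)  # mean absolute deviation from the mean
    # Order follows the documented "Returns" entry: mean, SD, sum, mean delta.
    return mean, sd, total, mean_delta


feature_list = [
    [[1.0, 2.0]],  # sequence 0: 1 subject x 2 features
    [[3.0, 6.0]],  # sequence 1
]
mean, sd, total, mean_delta = sequence_stats(feature_list)
```

Every returned array has shape [ n_subjects x n_features ], here (1, 2).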
- Hive_ML.utilities.feature_utils.get_feature_set_details(feature_set)
Function to extract details from a DataFrame, including the subject IDs, the subject labels, and a list of feature names. The 2D feature set [ (n_subjects x n_sequences) x n_features ] is converted to a list of lists, where the outer list contains one element per sequence and each inner list contains the feature list for each subject.
Example:
\[feature_{set}[sequence][subject][feature]\]
- Parameters:
feature_set (DataFrame) – Pandas DataFrame containing the feature set and the details.
- Return type:
Tuple[List[Any], List[Any], List[Any], List[Any]]
- Returns:
3D feature list [ n_sequences x n_subjects x n_features ], list of subject IDs, list of subject labels and list of feature names.
- Hive_ML.utilities.feature_utils.prepare_features(feature_set, label_set, train_index, aggregation, val_index=None, val_feature_set=None, val_label_set=None)
Function to prepare a feature set into a train/validation split, according to the provided indexes. If feature_set is 3D, performs a channel-wise (axis=1) normalization, optionally followed by a reduction (mean, std) along the same axis.
- Parameters:
feature_set (ndarray) – Original feature set to split.
label_set (ndarray) – Original label set to split.
train_index (List[int]) – Train indexes to extract the train split from the feature set. Ignored if val_label_set and val_feature_set are provided.
aggregation (str) – Aggregation type performed on the feature set. If Mean_Norm or SD_Norm, perform reduction along axis 1.
val_index (List[int]) – Validation indexes to extract the validation split from the feature set. Ignored if val_label_set and val_feature_set are provided.
val_feature_set (ndarray) – Optional validation feature set, to directly provide the validation split data.
val_label_set (ndarray) – Optional validation label set, to directly provide the validation split data.
- Return type:
Tuple[ndarray, ndarray, ndarray, ndarray]
- Returns:
Train/Validation Feature and Label Data.
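The index-based split and the optional axis-1 reduction can be sketched as below. This is a simplified illustration under several assumptions, not the library's implementation: the helper name `split_and_reduce` is hypothetical, the 3D layout is assumed to be [ n_subjects x n_sequences x n_features ], `Mean_Norm` is assumed to map to a mean and `SD_Norm` to a standard deviation, the return order is a guess, and the normalization step is left out.

```python
import numpy as np


def split_and_reduce(feature_set, label_set, train_index, val_index, aggregation):
    """Hypothetical helper: index-based train/validation split with optional axis-1 reduction."""
    x_train, x_val = feature_set[train_index], feature_set[val_index]
    y_train, y_val = label_set[train_index], label_set[val_index]
    # Assumption: only 3D inputs are reduced, and only for these two aggregation names.
    if feature_set.ndim == 3 and aggregation in ("Mean_Norm", "SD_Norm"):
        reduce = np.mean if aggregation == "Mean_Norm" else np.std
        x_train, x_val = reduce(x_train, axis=1), reduce(x_val, axis=1)
    # Assumed return order: train features, train labels, validation features, validation labels.
    return x_train, y_train, x_val, y_val


feature_set = np.arange(24, dtype=float).reshape(4, 3, 2)  # 4 subjects, 3 sequences, 2 features
label_set = np.array([0, 1, 0, 1])
x_train, y_train, x_val, y_val = split_and_reduce(
    feature_set, label_set, train_index=[0, 1, 2], val_index=[3], aggregation="Mean_Norm"
)
```

After the reduction, each split is 2D with one row per subject, so it can be passed to a classifier that expects a [ n_subjects x n_features ] matrix.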