Hive_ML.utilities.feature_utils module

Hive_ML.utilities.feature_utils.data_shuffling(feature_set, label_set, seed_val, test_size=0.2, split_file=None)

Function to randomly shuffle the feature set and the corresponding label set along the subject dimension. If a split file is provided, the feature set and label set are split according to the split file.

Parameters:

feature_set (ndarray) – Feature set to shuffle.
label_set (ndarray) – Label set to shuffle.
seed_val (int) – Seed for the random number generator.
test_size (float) – Proportion of the dataset to include in the test set. Defaults to 0.2.

Return type:

Tuple[ndarray, ndarray, ndarray, ndarray]

Returns:

Shuffled feature set and label set, split into train and test subsets.
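
As an illustration of a seeded shuffle followed by a proportional split, here is a minimal NumPy sketch. It is not the Hive_ML implementation: the helper name `shuffle_and_split`, the placement of subjects on axis 0, and the order of the returned arrays are assumptions, and the split-file branch is not covered.

```python
import numpy as np


def shuffle_and_split(feature_set, label_set, seed_val, test_size=0.2):
    """Hypothetical helper: shuffle subjects (assumed on axis 0), then split."""
    rng = np.random.default_rng(seed_val)
    n_subjects = feature_set.shape[0]
    # One permutation is shared by features and labels so rows stay paired.
    order = rng.permutation(n_subjects)
    n_test = int(round(n_subjects * test_size))
    test_idx, train_idx = order[:n_test], order[n_test:]
    # Assumed return order: train features, test features, train labels, test labels.
    return (feature_set[train_idx], feature_set[test_idx],
            label_set[train_idx], label_set[test_idx])


features = np.arange(20).reshape(10, 2)  # 10 subjects, 2 features
labels = np.arange(10)
x_train, x_test, y_train, y_test = shuffle_and_split(features, labels, seed_val=42)
```

Reusing the same `seed_val` reproduces the same split, which is the usual reason for passing a seed explicitly.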

Hive_ML.utilities.feature_utils.feature_normalization(x_train, x_val=None, x_test=None)

Normalize each feature to the range [0, 1].

Parameters:

x_train (ndarray) – Feature matrix of training set.
x_val (ndarray) – Feature matrix of validation set. The default is None.
x_test (ndarray) – Feature matrix of test set. The default is None.

Return type:

Tuple[ndarray, ndarray, ndarray]

Returns:

Normalized feature sets, based on the statistics of the training features.
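
The idea of scaling every split with statistics taken from the training set only can be sketched as a min-max normalization. This is an assumption about the method, not the library's code: the helper name `min_max_normalize` and the handling of constant features are illustrative.

```python
import numpy as np


def min_max_normalize(x_train, x_val=None, x_test=None):
    """Hypothetical helper: min-max scaling with training-set statistics."""
    f_min = x_train.min(axis=0)
    f_range = x_train.max(axis=0) - f_min
    # Assumed guard: leave constant features unscaled to avoid dividing by zero.
    f_range = np.where(f_range == 0, 1.0, f_range)

    def scale(x):
        return None if x is None else (x - f_min) / f_range

    return scale(x_train), scale(x_val), scale(x_test)


x_train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
x_test = np.array([[2.5, 40.0]])
train_n, val_n, test_n = min_max_normalize(x_train, x_test=x_test)
```

Because the minimum and range come from the training set, validation or test values outside the training range map outside [0, 1] (here the second test feature becomes 1.5).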

Hive_ML.utilities.feature_utils.flatten_4D_features(feature_list, feature_names)

Function to flatten a 3D Feature set (with size [ n_sequences x n_subjects x n_features ]) into a 2D Feature Set [ n_subjects x n_flattened_features ], where the features are flattened along axis 1. The total number of flattened features is equal to n_features x n_sequences.

Parameters:

feature_list (List[List[Any]]) – 3D Feature Set to flatten
feature_names (List[str]) – List of feature names

Return type:

Tuple[ndarray, List[str]]

Returns:

Flattened 2D Feature set and the updated Feature name list (with the corresponding sequence index appended to each feature name).
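
A minimal NumPy sketch of this flattening is shown below. The helper name `flatten_sequences`, the `name_index` naming pattern, and the column order (all features of sequence 0, then sequence 1, and so on) are assumptions; the library may order or name the columns differently.

```python
import numpy as np


def flatten_sequences(feature_list, feature_names):
    """Hypothetical helper: [n_sequences, n_subjects, n_features] -> 2D."""
    features = np.asarray(feature_list)
    n_sequences, n_subjects, n_features = features.shape
    # Move subjects to the first axis, then merge the sequence and feature axes.
    flat = features.transpose(1, 0, 2).reshape(n_subjects, n_sequences * n_features)
    # Assumed naming scheme: append the sequence index to each feature name.
    flat_names = [f"{name}_{seq}" for seq in range(n_sequences) for name in feature_names]
    return flat, flat_names


features = np.arange(12).reshape(2, 3, 2)  # 2 sequences, 3 subjects, 2 features
flat, names = flatten_sequences(features, ["a", "b"])
```

With this layout, column `seq * n_features + f` of the flattened array holds feature `f` of sequence `seq`, matching the order of the generated names.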

Hive_ML.utilities.feature_utils.get_4D_feature_stats(feature_list)

Function to aggregate the feature set (with size [ n_sequences x n_subjects x n_features ]) along the sequence dimension. Returns 4 NumPy arrays (each with size [ n_subjects x n_features ]): the mean, standard deviation, sum and mean delta values along the sequence dimension.

\[ \begin{aligned}
\text{Mean}\, F_{s,f} &= \frac{1}{T} \sum_{t=1}^{T} F_{t,s,f} \\
\text{Sum}\, F_{s,f} &= \sum_{t=1}^{T} F_{t,s,f} \\
\text{SD}\, F_{s,f} &= \sqrt{ \frac{1}{T} \sum_{t=1}^{T} \left( F_{t,s,f} - \text{Mean}\, F_{s,f} \right)^2 } \\
\text{Mean Delta}\, F_{s,f} &= \frac{1}{T} \sum_{t=1}^{T} \left| F_{t,s,f} - \text{Mean}\, F_{s,f} \right|
\end{aligned} \]

where T is the number of sequences (n_sequences).

Parameters: feature_list (List[List[Any]]) – 3D Feature Set [ n_sequences x n_subjects x n_features ].
Return type: Tuple[ndarray, ndarray, ndarray, ndarray]
Returns: Mean Sequence Array, SD Sequence Array, Sum Sequence Array, Mean Delta Array.
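
The four statistics can be sketched directly in NumPy by reducing over axis 0 (the sequence axis), taking T as the number of sequences. The helper name `sequence_stats` is hypothetical and the code follows the formulas given in this entry rather than the library source.

```python
import numpy as np


def sequence_stats(feature_list):
    """Hypothetical helper: reduce [n_sequences, n_subjects, n_features] over axis 0."""
    features = np.asarray(feature_list, dtype=float)
    mean = features.mean(axis=0)
    # np.std defaults to the population SD (divides by T), as in the formula.
    sd = features.std(axis=0)
    total = features.sum(axis=0)
    # Mean absolute deviation from the per-subject, per-feature mean.
    mean_delta = np.abs(features - mean).mean(axis=0)
    return mean, sd, total, mean_delta


# 2 sequences, 1 subject, 2 features
mean, sd, total, mean_delta = sequence_stats([[[1, 4]], [[3, 8]]])
```

Each returned array has shape [ n_subjects x n_features ], here (1, 2).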

Hive_ML.utilities.feature_utils.get_feature_set_details(feature_set)

Function to extract details from a DataFrame, including the Subject IDs, the Subject Labels, and a list of Feature names. The 2D Feature Set [ (n_subjects x n_sequences) x n_features ] is converted to a List of Lists, where the outer list contains one element per sequence, and each inner list contains one list of features per subject.

Example: a single value is accessed as feature_set[sequence][subject][feature].

Parameters: feature_set (DataFrame) – Pandas DataFrame containing the feature set and the details.
Return type: Tuple[List[Any], List[Any], List[Any], List[Any]]
Returns: 3D Feature List [ n_sequences x n_subjects x n_features ], List of Subject IDs, List of Subject Labels and List of Feature names.
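
The conversion from a long-format DataFrame to a nested list can be sketched with pandas as follows. The DataFrame layout is not described in this page, so the column names `Subject_ID`, `Subject_Label` and `Sequence_Number`, as well as the helper name `feature_set_details`, are purely hypothetical.

```python
import pandas as pd

# Hypothetical column names: the real DataFrame layout is not documented here.
ID_COL, LABEL_COL, SEQ_COL = "Subject_ID", "Subject_Label", "Sequence_Number"


def feature_set_details(feature_set):
    """Hypothetical helper: one row per (subject, sequence) -> nested lists."""
    meta_cols = (ID_COL, LABEL_COL, SEQ_COL)
    feature_names = [c for c in feature_set.columns if c not in meta_cols]
    ordered = feature_set.sort_values([SEQ_COL, ID_COL])
    # Outer list: one entry per sequence. Inner list: one feature row per subject.
    feature_list = [
        group[feature_names].to_numpy().tolist()
        for _, group in ordered.groupby(SEQ_COL, sort=True)
    ]
    # IDs and labels are read from the first sequence only (assumed identical in all).
    first = ordered[ordered[SEQ_COL] == ordered[SEQ_COL].min()]
    return feature_list, first[ID_COL].tolist(), first[LABEL_COL].tolist(), feature_names


df = pd.DataFrame({
    "Subject_ID": ["s1", "s2", "s1", "s2"],
    "Subject_Label": [0, 1, 0, 1],
    "Sequence_Number": [0, 0, 1, 1],
    "f1": [1.0, 2.0, 3.0, 4.0],
    "f2": [5.0, 6.0, 7.0, 8.0],
})
feature_list, subject_ids, subject_labels, feature_names = feature_set_details(df)
```

In this toy example, `feature_list[1][0][1]` is feature `f2` of subject `s1` in sequence 1.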

Hive_ML.utilities.feature_utils.prepare_features(feature_set, label_set, train_index, aggregation, val_index=None, val_feature_set=None, val_label_set=None)

Function to split a feature set into train and validation subsets, according to the provided indexes. If feature_set is 3D, performs a channel-wise (axis=1) normalization, optionally followed by a reduction (mean or standard deviation) along the same axis.

Parameters:

feature_set (ndarray) – Original feature set to split.
label_set (ndarray) – Original label set to split.
train_index (List[int]) – Train indexes to extract train split from the feature set. Ignored if val_label_set and val_feature_set are provided.
aggregation (str) – Aggregation type performed on the feature set. If Mean_Norm or SD_Norm, a reduction is performed along axis 1.
val_index (List[int]) – Validation indexes to extract validation split from the feature set. Ignored if val_label_set and val_feature_set are provided.
val_feature_set (ndarray) – Optional Validation Feature set, to directly provide the validation split data.
val_label_set (ndarray) – Optional Validation Label set, to directly provide the validation split data.

Return type:

Tuple[ndarray, ndarray, ndarray, ndarray]

Returns:

Train/Validation Feature and Label Data.
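
The index-based split and the optional reduction along axis 1 can be sketched as below. This is a partial illustration under stated assumptions: the helper name `split_and_reduce`, the 3D layout [ n_subjects x n_channels x n_features ], and the return order are hypothetical, and the normalization step is omitted because its exact form is not specified here.

```python
import numpy as np


def split_and_reduce(feature_set, label_set, train_index, val_index, aggregation=None):
    """Hypothetical helper: split by index, then optionally reduce along axis 1."""
    x_train, x_val = feature_set[train_index], feature_set[val_index]
    y_train, y_val = label_set[train_index], label_set[val_index]
    if feature_set.ndim == 3:
        # Assumed mapping: Mean_Norm -> mean, SD_Norm -> standard deviation.
        if aggregation == "Mean_Norm":
            x_train, x_val = x_train.mean(axis=1), x_val.mean(axis=1)
        elif aggregation == "SD_Norm":
            x_train, x_val = x_train.std(axis=1), x_val.std(axis=1)
    # Assumed return order: train features, validation features, train labels, validation labels.
    return x_train, x_val, y_train, y_val


features = np.arange(24, dtype=float).reshape(4, 2, 3)  # 4 subjects, 2 channels, 3 features
labels = np.array([0, 1, 0, 1])
x_train, x_val, y_train, y_val = split_and_reduce(
    features, labels, train_index=[0, 1, 2], val_index=[3], aggregation="Mean_Norm"
)
```

After the reduction, each split is 2D with one row per subject, so it can be passed to a standard classifier.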