Hive_ML.evaluation.model_evaluation module

Hive_ML.evaluation.model_evaluation.YB_Visualizer(clf, visualizer, x_train, y_train, x_test, y_test, kwargs)

Creates and finalizes a YellowBrick visualizer, given a classifier and the train/test features and corresponding labels used for fitting and scoring.

Parameters:
  • clf (ClassifierMixin) – Classifier used by the Visualizer.

  • visualizer (str) – Name of the visualizer to create. Must match a value in YB_VISUALIZERS.

  • x_train (ndarray) – Training feature set used to fit the classifier.

  • y_train (ndarray) – Training label set used to fit the classifier.

  • x_test (ndarray) – Test feature set used to score the classifier.

  • y_test (ndarray) – Test label set used to score the classifier.

  • kwargs (Dict) – Dictionary of keyword arguments passed to the YellowBrick visualizer.

Return type:

Visualizer

Returns:

The finalized YellowBrick visualizer.
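The description above implies a lookup by name followed by a fit, score, and finalize sequence. The following is a minimal sketch of that lifecycle under stated assumptions, not the Hive_ML implementation: DummyVisualizer, VISUALIZER_REGISTRY, and make_visualizer are hypothetical names, and the dummy class only records which methods were called instead of drawing anything.

```python
class DummyVisualizer:
    """Hypothetical stand-in that exposes a fit/score/finalize lifecycle."""

    def __init__(self, clf, **kwargs):
        self.clf = clf
        self.kwargs = kwargs
        self.calls = []  # records the order of lifecycle calls

    def fit(self, x, y):
        self.calls.append("fit")
        return self

    def score(self, x, y):
        self.calls.append("score")
        return 0.0  # placeholder value, no real metric is computed

    def finalize(self):
        self.calls.append("finalize")


# Hypothetical registry playing the role of YB_VISUALIZERS (assumption).
VISUALIZER_REGISTRY = {"DummyVisualizer": DummyVisualizer}


def make_visualizer(clf, visualizer, x_train, y_train, x_test, y_test, kwargs):
    """Look up a visualizer by name, then fit, score, and finalize it."""
    if visualizer not in VISUALIZER_REGISTRY:
        raise ValueError(f"Unknown visualizer: {visualizer!r}")
    viz = VISUALIZER_REGISTRY[visualizer](clf, **kwargs)
    viz.fit(x_train, y_train)   # fit on the training split
    viz.score(x_test, y_test)   # score on the test split
    viz.finalize()              # complete the plot without showing it
    return viz


viz = make_visualizer(
    None, "DummyVisualizer", [[0.0], [1.0]], [0, 1], [[0.5]], [1], {"title": "demo"}
)
print(viz.calls)
```

Rejecting unknown names up front mirrors the requirement that the visualizer name match a known entry; how Hive_ML itself handles an unknown name is not stated here.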

Hive_ML.evaluation.model_evaluation.evaluate_classifiers(ensemble_configuration_df, classifier_kwargs_list, train_feature_set, train_label_set, test_feature_set, test_label_set, aggregation, feature_selection, visualizers=None, output_file=None, plot_title='', random_state=None)

Evaluates the ensemble classification performance of the provided classifiers by weighting and combining the predictions of the individual classifiers. If a list of YellowBrick visualizers is provided, a single multi-plot report file is generated.

Parameters:
  • ensemble_configuration_df (DataFrame) – DataFrame containing the ensemble configuration. Each row should include Classifier, N_Features (number of features to select), and weight (weight of the classifier's prediction in the ensemble).

  • classifier_kwargs_list (List[Dict]) – List of kwargs dictionaries used to configure the classifiers.

  • train_feature_set (ndarray) – Training feature set used to fit the classifiers.

  • train_label_set (ndarray) – Training label set used to fit the classifiers.

  • test_feature_set (ndarray) – Test feature set used to evaluate the classifiers.

  • test_label_set (ndarray) – Test label set used to evaluate the classifiers.

  • feature_selection (str) – Type of feature selection to perform (SFFS or PCA).

  • aggregation (str) – Type of feature aggregation.

  • visualizers (List[Dict]) – List of YellowBrick visualizers to use when generating the report plot.

  • output_file (Union[str, PathLike]) – File path where the YellowBrick plot report is saved.

  • plot_title (str) – Title string used in the YellowBrick plots.

Return type:

Dict

Returns:

Dictionary with the ensemble classifier report (including the classification metrics).
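To illustrate what "weighting and combining" the single classifier predictions can look like, here is a minimal sketch that assumes the combination rule is a weighted average of per-classifier positive-class probabilities followed by a 0.5 threshold. The rule, the threshold, and the weighted_ensemble helper are assumptions made for illustration; the actual rule used by evaluate_classifiers may differ.

```python
def weighted_ensemble(probabilities, weights):
    """Weighted average of per-classifier probabilities, one value per sample.

    probabilities: one list of positive-class probabilities per classifier.
    weights: one weight per classifier (normalized by their sum).
    """
    total = sum(weights)
    n_samples = len(probabilities[0])
    return [
        sum(w * p[i] for w, p in zip(weights, probabilities)) / total
        for i in range(n_samples)
    ]


# Two hypothetical classifiers scored on three test samples.
probs = [
    [0.9, 0.2, 0.6],  # classifier A
    [0.7, 0.4, 0.3],  # classifier B
]
weights = [3.0, 1.0]  # plays the role of the "weight" column (assumption)

ensemble = weighted_ensemble(probs, weights)
labels = [int(p >= 0.5) for p in ensemble]
print(labels)  # [1, 0, 1]
```

With these numbers the third sample shows the effect of the weights: classifier B alone would predict the negative class, but the heavier weight on classifier A pushes the combined probability above the threshold.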

Hive_ML.evaluation.model_evaluation.select_best_classifiers(df_summary, metric, reduction, k=1)

Given a DataFrame containing validation scores for different classifiers and numbers of selected features, returns the k best combinations and their respective reduced scores (mean or median over the validation splits).

Parameters:
  • df_summary (DataFrame) – Validation DataFrame Summary.

  • metric (str) – Metric used to select the best performance.

  • reduction (str) – Reduction (mean or median) applied over the validation splits to select the best performance.

  • k (int) – Number of best combinations to select.

Return type:

Tuple[List[Tuple[str, str]], List[float]]

Returns:

Selected best combinations [(N_Features, Classifier), (N_Features, Classifier), … ] and corresponding reduced validation scores.
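The selection logic described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the Hive_ML code: it takes a plain dictionary instead of a DataFrame, assumes higher scores are better, and the k_best helper, the REDUCTIONS mapping, and the classifier names are hypothetical.

```python
from statistics import mean, median

# Supported reductions over the validation splits.
REDUCTIONS = {"mean": mean, "median": median}


def k_best(split_scores, reduction, k=1):
    """Return the k best (N_Features, Classifier) pairs and their reduced scores.

    split_scores maps each (N_Features, Classifier) pair to its list of
    per-split validation scores. Higher is assumed to be better.
    """
    reduce_fn = REDUCTIONS[reduction]
    reduced = {combo: reduce_fn(scores) for combo, scores in split_scores.items()}
    ranked = sorted(reduced.items(), key=lambda item: item[1], reverse=True)[:k]
    return [combo for combo, _ in ranked], [score for _, score in ranked]


# Illustrative validation scores over three splits.
scores = {
    ("5", "SVM"): [0.80, 0.90, 0.70],
    ("10", "SVM"): [0.85, 0.85, 0.85],
    ("5", "RandomForest"): [0.60, 0.95, 0.65],
}

best, best_scores = k_best(scores, "median", k=2)
print(best)         # [('10', 'SVM'), ('5', 'SVM')]
print(best_scores)  # [0.85, 0.8]
```

The median is less sensitive than the mean to a single unusually good or bad split, which is why the ("5", "RandomForest") entry ranks last here despite having the highest single-split score.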