Metrics

class moosefs.metrics.performance_metrics.BaseMetric(name: str, task: str)[source]

Bases: object

Base class for computing evaluation metrics.

Trains a small battery of models and aggregates per-model metric values.

__init__(name: str, task: str) → None[source]

Initialize the metric with a name and a task type.

Parameters:

name – Human-readable metric name.
task – Either “classification” or “regression”.

model_signature() → str[source]: Return a stable signature describing the internal model set.

_initialize_models() → dict[source]

Initialize task-specific models.

Returns:

Mapping from model label to estimator instance.

train_and_predict(X_train: Any, y_train: Any, X_test: Any, y_test: Any) → dict[source]

Train all models and generate predictions.

Parameters:

X_train – Training features.
y_train – Training targets.
X_test – Test features.
y_test – Test targets.

Returns:

Dict keyed by model name with predictions and optional probabilities.

compute(X_train: Any, y_train: Any, X_test: Any, y_test: Any) → float[source]: Compute the metric (implemented by subclasses).
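The flow described for train_and_predict (fit each model, then collect its predictions under the model's label) can be pictured with a small stand-alone sketch. It does not use moosefs: the two toy models, the function name, and the "predictions" / "probabilities" keys are assumptions made for illustration, and the real result dict may be laid out differently.

```python
from typing import Any, Dict, List


class MeanModel:
    """Hypothetical toy regressor: always predicts the training mean."""

    def fit(self, X: List[List[float]], y: List[float]) -> "MeanModel":
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X: List[List[float]]) -> List[float]:
        return [self.mean_ for _ in X]


class LastValueModel:
    """Hypothetical toy regressor: always predicts the last training target."""

    def fit(self, X: List[List[float]], y: List[float]) -> "LastValueModel":
        self.last_ = y[-1]
        return self

    def predict(self, X: List[List[float]]) -> List[float]:
        return [self.last_ for _ in X]


def train_and_predict_sketch(
    models: Dict[str, Any], X_train, y_train, X_test
) -> Dict[str, dict]:
    """Fit every model and store its test predictions under its label."""
    results: Dict[str, dict] = {}
    for label, model in models.items():
        model.fit(X_train, y_train)
        entry = {"predictions": model.predict(X_test)}
        # Assumed convention: probabilities are stored only when available.
        if hasattr(model, "predict_proba"):
            entry["probabilities"] = model.predict_proba(X_test)
        results[label] = entry
    return results


results = train_and_predict_sketch(
    {"mean": MeanModel(), "last": LastValueModel()},
    X_train=[[0.0], [1.0], [2.0]],
    y_train=[1.0, 2.0, 6.0],
    X_test=[[3.0], [4.0]],
)
```

A subclass's compute can then score each entry of such a dict and combine the per-model values.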

class moosefs.metrics.performance_metrics.RegressionMetric(name: str)[source]

Bases: BaseMetric

Base class for regression metrics.

__init__(name: str) → None[source]

Initialize a regression metric. Unlike BaseMetric, this signature takes no task argument.

Parameters:

name – Human-readable metric name.

compute(X_train: Any, y_train: Any, X_test: Any, y_test: Any) → float[source]: Average the metric over the internal model set.

aggregate_from_results(y_test: ndarray, results: dict) → float[source]: Aggregate metric value from cached prediction results.

_metric_func(y_true: ndarray, y_pred: ndarray) → float[source]: Metric function to be overridden by subclasses.
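As a rough illustration of how compute and aggregate_from_results relate, the sketch below averages a metric function over cached per-model predictions. It is not the moosefs implementation: the "predictions" key, the function names, and the max-absolute-error stand-in metric are all assumptions.

```python
from typing import Callable, Dict, Sequence


def max_abs_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Hypothetical stand-in for a subclass's _metric_func."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


def aggregate_from_results_sketch(
    y_test: Sequence[float],
    results: Dict[str, dict],
    metric_func: Callable[[Sequence[float], Sequence[float]], float],
) -> float:
    """Score each model's cached predictions, then take the plain mean."""
    scores = [metric_func(y_test, entry["predictions"]) for entry in results.values()]
    return sum(scores) / len(scores)


cached = {
    "model_a": {"predictions": [1.0, 2.0, 3.0]},
    "model_b": {"predictions": [2.0, 2.0, 2.0]},
}
score = aggregate_from_results_sketch([1.0, 2.0, 4.0], cached, max_abs_error)
```

Working from cached results means several metrics can be evaluated without refitting the models.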

_initialize_models() → dict

Initialize task-specific models.

Returns:

Mapping from model label to estimator instance.

model_signature() → str: Return a stable signature describing the internal model set.

train_and_predict(X_train: Any, y_train: Any, X_test: Any, y_test: Any) → dict

Train all models and generate predictions.

Parameters:

X_train – Training features.
y_train – Training targets.
X_test – Test features.
y_test – Test targets.

Returns:

Dict keyed by model name with predictions and optional probabilities.

class moosefs.metrics.performance_metrics.R2Score[source]

Bases: RegressionMetric

__init__() → None[source]

Initialize the metric. This signature takes no arguments; the name and task parameters belong to BaseMetric.

_metric_func(y_true: ndarray, y_pred: ndarray) → float[source]: Metric function for this class: the R² score of y_pred against y_true.
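For reference, the standard definition of the coefficient of determination is R² = 1 − SS_res / SS_tot. The pure-Python sketch below follows that definition; the function name is hypothetical, and raising on a constant y_true is a choice made here, not necessarily what R2Score does.

```python
from typing import Sequence


def r2_score_sketch(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Return 1 - SS_res / SS_tot for paired true and predicted values."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        # Assumption: treat a constant target as an error rather than guess a value.
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


perfect = r2_score_sketch([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])   # 1.0
baseline = r2_score_sketch([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])  # 0.0 (mean predictor)
```

Higher is better: a perfect fit scores 1.0 and predicting the mean scores 0.0.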

_initialize_models() → dict

Initialize task-specific models.

Returns:

Mapping from model label to estimator instance.

aggregate_from_results(y_test: ndarray, results: dict) → float: Aggregate metric value from cached prediction results.

compute(X_train: Any, y_train: Any, X_test: Any, y_test: Any) → float: Average the metric over the internal model set.

model_signature() → str: Return a stable signature describing the internal model set.

train_and_predict(X_train: Any, y_train: Any, X_test: Any, y_test: Any) → dict

Train all models and generate predictions.

Parameters:

X_train – Training features.
y_train – Training targets.
X_test – Test features.
y_test – Test targets.

Returns:

Dict keyed by model name with predictions and optional probabilities.

class moosefs.metrics.performance_metrics.MeanAbsoluteError[source]

Bases: RegressionMetric

__init__() → None[source]

Initialize the metric. This signature takes no arguments; the name and task parameters belong to BaseMetric.

_metric_func(y_true: ndarray, y_pred: ndarray) → float[source]: Metric function for this class: the mean absolute error of y_pred against y_true.
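Mean absolute error is the average of |y_true − y_pred| over all samples. A minimal sketch of that textbook formula follows; the function name is hypothetical and the code is not taken from moosefs.

```python
from typing import Sequence


def mean_absolute_error_sketch(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


mae = mean_absolute_error_sketch([1.0, 2.0, 4.0], [1.0, 3.0, 2.0])  # (0 + 1 + 2) / 3 = 1.0
```

Lower is better for this metric, unlike R².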

_initialize_models() → dict

Initialize task-specific models.

Returns:

Mapping from model label to estimator instance.

aggregate_from_results(y_test: ndarray, results: dict) → float: Aggregate metric value from cached prediction results.

compute(X_train: Any, y_train: Any, X_test: Any, y_test: Any) → float: Average the metric over the internal model set.

model_signature() → str: Return a stable signature describing the internal model set.

train_and_predict(X_train: Any, y_train: Any, X_test: Any, y_test: Any) → dict

Train all models and generate predictions.

Parameters:

X_train – Training features.
y_train – Training targets.
X_test – Test features.
y_test – Test targets.

Returns:

Dict keyed by model name with predictions and optional probabilities.

class moosefs.metrics.performance_metrics.MeanSquaredError[source]

Bases: RegressionMetric

__init__() → None[source]

Initialize the metric. This signature takes no arguments; the name and task parameters belong to BaseMetric.

_metric_func(y_true: ndarray, y_pred: ndarray) → float[source]: Metric function for this class: the mean squared error of y_pred against y_true.
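Mean squared error averages the squared residuals, so large errors weigh more than in the absolute-error case. The sketch below shows the textbook formula under a hypothetical function name; it is not moosefs code.

```python
from typing import Sequence


def mean_squared_error_sketch(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


mse = mean_squared_error_sketch([1.0, 2.0, 4.0], [1.0, 3.0, 2.0])  # (0 + 1 + 4) / 3
```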

_initialize_models() → dict

Initialize task-specific models.

Returns:

Mapping from model label to estimator instance.

aggregate_from_results(y_test: ndarray, results: dict) → float: Aggregate metric value from cached prediction results.

compute(X_train: Any, y_train: Any, X_test: Any, y_test: Any) → float: Average the metric over the internal model set.

model_signature() → str: Return a stable signature describing the internal model set.

train_and_predict(X_train: Any, y_train: Any, X_test: Any, y_test: Any) → dict

Train all models and generate predictions.

Parameters:

X_train – Training features.
y_train – Training targets.
X_test – Test features.
y_test – Test targets.

Returns:

Dict keyed by model name with predictions and optional probabilities.

class moosefs.metrics.performance_metrics.ClassificationMetric(name: str)[source]

Bases: BaseMetric

Base class for classification metrics.

__init__(name: str) → None[source]

Initialize a classification metric. Unlike BaseMetric, this signature takes no task argument.

Parameters:

name – Human-readable metric name.

compute(X_train: Any, y_train: Any, X_test: Any, y_test: Any) → float[source]: Average the metric over the internal model set.

aggregate_from_results(y_test: ndarray, results: dict) → float[source]: Aggregate metric value from cached prediction results.

_metric_func(y_true: ndarray, y_pred: ndarray, y_proba: ndarray | None = None) → float[source]: Metric function to be overridden by subclasses.

_initialize_models() → dict

Initialize task-specific models.

Returns:

Mapping from model label to estimator instance.

model_signature() → str: Return a stable signature describing the internal model set.

train_and_predict(X_train: Any, y_train: Any, X_test: Any, y_test: Any) → dict

Train all models and generate predictions.

Parameters:

X_train – Training features.
y_train – Training targets.
X_test – Test features.
y_test – Test targets.

Returns:

Dict keyed by model name with predictions and optional probabilities.

class moosefs.metrics.performance_metrics.LogLoss[source]

Bases: ClassificationMetric

__init__() → None[source]

Initialize the metric. This signature takes no arguments; the name and task parameters belong to BaseMetric.

_metric_func(y_true: ndarray, y_pred: ndarray, y_proba: ndarray) → float[source]: Metric function for this class: the log loss of the predicted probabilities y_proba against y_true.
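Log loss is the mean negative log-probability assigned to the true class, which is why this metric needs y_proba rather than y_pred. The sketch below assumes labels are integer column indices into each probability row and clips probabilities with a hypothetical eps; neither detail is confirmed for LogLoss.

```python
import math
from typing import Sequence


def log_loss_sketch(
    y_true: Sequence[int],
    y_proba: Sequence[Sequence[float]],
    eps: float = 1e-15,
) -> float:
    """Mean negative log-likelihood of the true class."""
    total = 0.0
    for label, row in zip(y_true, y_proba):
        # Clip so that a probability of exactly 0 or 1 cannot produce log(0).
        p = min(max(row[label], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(y_true)


uninformed = log_loss_sketch([0, 1], [[0.5, 0.5], [0.5, 0.5]])  # ln(2)
confident = log_loss_sketch([0, 1], [[0.9, 0.1], [0.2, 0.8]])
```

Lower is better: confident, correct probabilities score below the ln(2) of a coin flip.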

_initialize_models() → dict

Initialize task-specific models.

Returns:

Mapping from model label to estimator instance.

aggregate_from_results(y_test: ndarray, results: dict) → float: Aggregate metric value from cached prediction results.

compute(X_train: Any, y_train: Any, X_test: Any, y_test: Any) → float: Average the metric over the internal model set.

model_signature() → str: Return a stable signature describing the internal model set.

train_and_predict(X_train: Any, y_train: Any, X_test: Any, y_test: Any) → dict

Train all models and generate predictions.

Parameters:

X_train – Training features.
y_train – Training targets.
X_test – Test features.
y_test – Test targets.

Returns:

Dict keyed by model name with predictions and optional probabilities.

class moosefs.metrics.performance_metrics.F1Score[source]

Bases: ClassificationMetric

__init__() → None[source]

Initialize the metric. This signature takes no arguments; the name and task parameters belong to BaseMetric.

_metric_func(y_true: ndarray, y_pred: ndarray, y_proba: None = None) → float[source]: Metric function for this class: the F1 score of y_pred against y_true.
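F1 is the harmonic mean of precision and recall, equivalently 2·TP / (2·TP + FP + FN). The sketch below covers only the binary case with a hypothetical positive label of 1; the averaging mode that F1Score uses for multi-class targets is not stated here, so treat this as an illustration of the formula only.

```python
from typing import Sequence


def f1_score_sketch(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 from true-positive, false-positive and false-negative counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    # Assumption: return 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


f1 = f1_score_sketch([1, 1, 0, 0], [1, 0, 1, 0])  # tp=1, fp=1, fn=1 -> 0.5
```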

_initialize_models() → dict

Initialize task-specific models.

Returns:

Mapping from model label to estimator instance.

aggregate_from_results(y_test: ndarray, results: dict) → float: Aggregate metric value from cached prediction results.

compute(X_train: Any, y_train: Any, X_test: Any, y_test: Any) → float: Average the metric over the internal model set.

model_signature() → str: Return a stable signature describing the internal model set.

train_and_predict(X_train: Any, y_train: Any, X_test: Any, y_test: Any) → dict

Train all models and generate predictions.

Parameters:

X_train – Training features.
y_train – Training targets.
X_test – Test features.
y_test – Test targets.

Returns:

Dict keyed by model name with predictions and optional probabilities.

class moosefs.metrics.performance_metrics.Accuracy[source]

Bases: ClassificationMetric

__init__() → None[source]

Initialize the metric. This signature takes no arguments; the name and task parameters belong to BaseMetric.

_metric_func(y_true: ndarray, y_pred: ndarray, y_proba: None = None) → float[source]: Metric function for this class: the accuracy of y_pred against y_true.
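Accuracy is the fraction of predictions that match the true labels. A minimal sketch of that definition, under a hypothetical function name and independent of moosefs:

```python
from typing import Sequence


def accuracy_sketch(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of positions where the predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


acc = accuracy_sketch([1, 1, 0, 0], [1, 0, 0, 0])  # 3 of 4 correct -> 0.75
```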

_initialize_models() → dict

Initialize task-specific models.

Returns:

Mapping from model label to estimator instance.

aggregate_from_results(y_test: ndarray, results: dict) → float: Aggregate metric value from cached prediction results.

compute(X_train: Any, y_train: Any, X_test: Any, y_test: Any) → float: Average the metric over the internal model set.

model_signature() → str: Return a stable signature describing the internal model set.

train_and_predict(X_train: Any, y_train: Any, X_test: Any, y_test: Any) → dict

Train all models and generate predictions.

Parameters:

X_train – Training features.
y_train – Training targets.
X_test – Test features.
y_test – Test targets.

Returns:

Dict keyed by model name with predictions and optional probabilities.

class moosefs.metrics.performance_metrics.PrecisionScore[source]

Bases: ClassificationMetric

__init__() → None[source]

Initialize the metric. This signature takes no arguments; the name and task parameters belong to BaseMetric.

_metric_func(y_true: ndarray, y_pred: ndarray, y_proba: None = None) → float[source]: Metric function for this class: the precision of y_pred against y_true.
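Precision is TP / (TP + FP): of everything predicted positive, the share that really is positive. The sketch assumes a binary problem with a hypothetical positive label of 1; how PrecisionScore averages over classes is not stated here.

```python
from typing import Sequence


def precision_sketch(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary precision; returns 0.0 when nothing was predicted positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


precision = precision_sketch([1, 0, 0, 1], [1, 1, 1, 0])  # tp=1, fp=2
```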

_initialize_models() → dict

Initialize task-specific models.

Returns:

Mapping from model label to estimator instance.

aggregate_from_results(y_test: ndarray, results: dict) → float: Aggregate metric value from cached prediction results.

compute(X_train: Any, y_train: Any, X_test: Any, y_test: Any) → float: Average the metric over the internal model set.

model_signature() → str: Return a stable signature describing the internal model set.

train_and_predict(X_train: Any, y_train: Any, X_test: Any, y_test: Any) → dict

Train all models and generate predictions.

Parameters:

X_train – Training features.
y_train – Training targets.
X_test – Test features.
y_test – Test targets.

Returns:

Dict keyed by model name with predictions and optional probabilities.

class moosefs.metrics.performance_metrics.RecallScore[source]

Bases: ClassificationMetric

__init__() → None[source]

Initialize the metric. This signature takes no arguments; the name and task parameters belong to BaseMetric.

_metric_func(y_true: ndarray, y_pred: ndarray, y_proba: None = None) → float[source]: Metric function for this class: the recall of y_pred against y_true.
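Recall is TP / (TP + FN): of everything that really is positive, the share that was found. As with the other label-based sketches, this assumes a binary problem with a hypothetical positive label of 1 and is not taken from RecallScore.

```python
from typing import Sequence


def recall_sketch(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary recall; returns 0.0 when there are no true positives to find."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


recall = recall_sketch([1, 1, 1, 0], [1, 0, 0, 0])  # tp=1, fn=2
```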

_initialize_models() → dict

Initialize task-specific models.

Returns:

Mapping from model label to estimator instance.

aggregate_from_results(y_test: ndarray, results: dict) → float: Aggregate metric value from cached prediction results.

compute(X_train: Any, y_train: Any, X_test: Any, y_test: Any) → float: Average the metric over the internal model set.

model_signature() → str: Return a stable signature describing the internal model set.

train_and_predict(X_train: Any, y_train: Any, X_test: Any, y_test: Any) → dict

Train all models and generate predictions.

Parameters:

X_train – Training features.
y_train – Training targets.
X_test – Test features.
y_test – Test targets.

Returns:

Dict keyed by model name with predictions and optional probabilities.

moosefs.metrics.stability_metrics.compute_stability_metrics(features_list: list) → float[source]

Compute stability SH(S) across selections.

Parameters:

features_list – Selected feature names per selector.

Returns:

Stability in [0, 1].
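The exact definition of SH(S) is not given here. One common way to get a stability value in [0, 1] from several feature selections is the mean pairwise Jaccard similarity, which the sketch below uses as an assumption; compute_stability_metrics may well compute something different. The function name and the handling of fewer than two selections are also hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def pairwise_jaccard_stability(features_list: Sequence[Sequence[str]]) -> float:
    """Mean Jaccard similarity over all unordered pairs of selections."""
    sets = [set(features) for features in features_list]
    if len(sets) < 2:
        # Assumption: a single selection is trivially stable.
        return 1.0
    sims: List[float] = []
    for a, b in combinations(sets, 2):
        union = a | b
        sims.append(len(a & b) / len(union) if union else 1.0)
    return sum(sims) / len(sims)


stability = pairwise_jaccard_stability([["a", "b"], ["a", "b"], ["a", "c"]])
# Pair similarities are 1, 1/3 and 1/3, so the mean is 5/9.
```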

moosefs.metrics.stability_metrics._jaccard(a: set, b: set) → float[source]: Return the Jaccard similarity of a and b; two empty sets are treated as having similarity 1.0.
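Jaccard similarity is |a ∩ b| / |a ∪ b|, which is 0/0 when both sets are empty; the documented convention resolves that case to 1.0. A minimal sketch of that behaviour, written independently of the moosefs source under a hypothetical name:

```python
def jaccard_sketch(a: set, b: set) -> float:
    """Jaccard similarity with the both-empty case defined as 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


both_empty = jaccard_sketch(set(), set())        # 1.0 by convention
partial = jaccard_sketch({"a", "b"}, {"b", "c"})  # 1 shared of 3 total
one_empty = jaccard_sketch({"a"}, set())          # 0.0
```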

moosefs.metrics.stability_metrics.diversity_agreement(selectors: list, merged: list, alpha: float = 0.5) → float[source]

Blend diversity and agreement into a single score.

Parameters:

selectors – List of selected feature lists (one per selector).
merged – Merged/core feature names for the group.
alpha – Weight on agreement (0 → pure diversity, 1 → pure agreement).

Returns:

Score in [0, 1] (higher is better).
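The alpha parameter suggests a convex blend of the form alpha · agreement + (1 − alpha) · diversity. The sketch below illustrates such a blend, but both component definitions are assumptions: agreement is taken as the mean Jaccard similarity between each selector's features and the merged set, and diversity as one minus the mean pairwise Jaccard similarity among selectors. The actual diversity_agreement may define them differently.

```python
from itertools import combinations
from typing import Sequence


def _jaccard_sketch(a: set, b: set) -> float:
    """Jaccard similarity; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def diversity_agreement_sketch(
    selectors: Sequence[Sequence[str]],
    merged: Sequence[str],
    alpha: float = 0.5,
) -> float:
    """Blend assumed agreement and diversity terms with weight alpha."""
    if not selectors:
        raise ValueError("selectors must not be empty")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    sets = [set(s) for s in selectors]
    core = set(merged)
    # Assumed agreement: how closely each selector matches the merged set.
    agreement = sum(_jaccard_sketch(s, core) for s in sets) / len(sets)
    # Assumed diversity: one minus the average overlap between selectors.
    pairs = list(combinations(sets, 2))
    overlap = sum(_jaccard_sketch(a, b) for a, b in pairs) / len(pairs) if pairs else 1.0
    diversity = 1.0 - overlap
    return alpha * agreement + (1.0 - alpha) * diversity


selections = [["a", "b"], ["a", "c"]]
blended = diversity_agreement_sketch(selections, merged=["a"], alpha=0.5)
only_agreement = diversity_agreement_sketch(selections, merged=["a"], alpha=1.0)
only_diversity = diversity_agreement_sketch(selections, merged=["a"], alpha=0.0)
```

Because both terms lie in [0, 1] and the weights sum to one, the blended score stays in [0, 1], matching the documented range.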