StandardSemiSupervisedLearner

class pyprophet.scoring.semi_supervised.StandardSemiSupervisedLearner(inner_learner, xeval_fraction, xeval_num_iter, ss_initial_fdr, ss_iteration_fdr, parametric, pfdr, pi0_lambda, pi0_method, pi0_smooth_df, pi0_smooth_log_pi0, test, main_score_selection_report, outfile, level, ss_use_dynamic_main_score)

Bases: AbstractSemiSupervisedLearner

Implements a standard semi-supervised learning workflow.

inner_learner

The base learner used for training.

Type:

AbstractLearner

ss_initial_fdr

Initial FDR threshold for training.

Type:

float

ss_iteration_fdr

FDR threshold for iterative learning.

Type:

float

parametric

Whether to use parametric FDR estimation.

Type:

bool

pfdr

Whether to use pFDR estimation.

Type:

bool

pi0_lambda

Lambda values for pi0 estimation.

Type:

list

pi0_method

Method for pi0 estimation.

Type:

str

pi0_smooth_df

Degrees of freedom for pi0 smoothing.

Type:

int

pi0_smooth_log_pi0

Whether to log-transform pi0 values.

Type:

bool

ss_use_dynamic_main_score

Whether to dynamically select the main score.

Type:

bool
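The pi0_lambda values are the tuning points of a Storey-style estimate of pi0, the proportion of null (false) identifications. The sketch below shows only that textbook estimate, pi0(λ) = #{p > λ} / (m(1 − λ)), evaluated at each λ. It assumes the λ values are already explicit numbers in [0, 1). It does not reproduce how pi0_method combines the per-λ estimates (in Storey's qvalue approach this is a spline smoother or a bootstrap). The function name pi0_at_lambdas is hypothetical and is not part of pyprophet.

```python
import numpy as np

def pi0_at_lambdas(p_values, lambdas):
    """Hypothetical helper: Storey's pi0(lambda) for each lambda.

    pi0(lambda) = #{p > lambda} / (m * (1 - lambda)), capped at 1.
    Assumes every lambda lies in [0, 1).
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    lambdas = np.asarray(lambdas, dtype=float)
    # Count p-values above each lambda; nulls are assumed uniform on [0, 1]
    counts = np.array([(p > lam).sum() for lam in lambdas])
    pi0 = counts / (m * (1.0 - lambdas))
    # pi0 is a proportion, so it cannot exceed 1
    return np.minimum(pi0, 1.0)

p = [0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.6, 0.9]
estimates = pi0_at_lambdas(p, [0.0, 0.5])
```

With six small and two large p-values, λ = 0 gives a pi0 of 1.0 (no information) and λ = 0.5 gives 0.5, which illustrates why a grid of λ values, rather than a single one, is usually examined.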

__init__(inner_learner, xeval_fraction, xeval_num_iter, ss_initial_fdr, ss_iteration_fdr, parametric, pfdr, pi0_lambda, pi0_method, pi0_smooth_df, pi0_smooth_log_pi0, test, main_score_selection_report, outfile, level, ss_use_dynamic_main_score)

Initializes the learner. Most arguments correspond to the attributes listed above.

averaged_learner(params, **kwargs)

Creates an averaged learner from multiple parameter sets.

Parameters:
  • params (list) – List of parameter sets.

  • kwargs – Additional arguments.

Returns:

The averaged learner.

Return type:

AbstractLearner
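A common way to combine models trained on several cross-validation folds is to average their parameters. The sketch below assumes, purely for illustration, that each parameter set is a weight vector of a linear scoring model of the same length. The actual structure of params in pyprophet may differ, and average_linear_params is a hypothetical name.

```python
import numpy as np

def average_linear_params(params):
    """Hypothetical helper: element-wise mean of per-fold weight vectors.

    Assumes every entry of `params` is a 1-D weight vector of equal length.
    """
    stacked = np.vstack([np.asarray(p, dtype=float) for p in params])
    return stacked.mean(axis=0)

# Two folds, each with weights for the same two score columns
averaged = average_linear_params([[1.0, 2.0], [3.0, 4.0]])
```

Averaging keeps a single set of weights for final scoring, so every peak is scored by the same model regardless of which fold it was held out in.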

classmethod from_config(config: RunnerIOConfig, base_learner)

Creates a StandardSemiSupervisedLearner instance from a configuration object.

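Parameters:
  • config (RunnerIOConfig) – Configuration object holding the learner settings.

  • base_learner – The base learner used for training.

Returns: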

The initialized learner.

Return type:

StandardSemiSupervisedLearner
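from_config follows the familiar alternate-constructor pattern: a classmethod reads fields from a configuration object and forwards them to the ordinary constructor. The sketch below shows that pattern with a hypothetical ExampleConfig dataclass and ExampleLearner class. Only a few constructor arguments are mirrored, and nothing here describes the real fields of RunnerIOConfig.

```python
from dataclasses import dataclass

@dataclass
class ExampleConfig:
    """Hypothetical stand-in for RunnerIOConfig; only a few fields shown."""
    ss_initial_fdr: float
    ss_iteration_fdr: float
    parametric: bool

class ExampleLearner:
    """Hypothetical learner with a subset of the documented constructor arguments."""

    def __init__(self, inner_learner, ss_initial_fdr, ss_iteration_fdr, parametric):
        self.inner_learner = inner_learner
        self.ss_initial_fdr = ss_initial_fdr
        self.ss_iteration_fdr = ss_iteration_fdr
        self.parametric = parametric

    @classmethod
    def from_config(cls, config, base_learner):
        # Unpack configuration fields into the regular constructor
        return cls(
            inner_learner=base_learner,
            ss_initial_fdr=config.ss_initial_fdr,
            ss_iteration_fdr=config.ss_iteration_fdr,
            parametric=config.parametric,
        )

learner = ExampleLearner.from_config(ExampleConfig(0.15, 0.05, False), base_learner="lda")
```

Keeping the mapping in one classmethod means callers that already hold a config object never have to repeat the long positional argument list.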

get_delta_td_bt_feature_size(train, col, mapper, working_thread_number)

Calculates the difference in feature size between top decoy peaks and best target peaks.

Parameters:
  • train (Experiment) – Training data.

  • col (str) – Column used for selection.

  • mapper (dict) – Mapping of column aliases to feature names.

  • working_thread_number (int) – Number of threads to use.

Returns:

The absolute difference in feature size.

Return type:

int
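A simplified sketch of the quantity this method returns, assuming "feature size" means the number of selected peaks. It treats rank-1 peaks as the selection and ignores the FDR cutoff, mapper, and threading of the real method. The column names decoy and peak_group_rank, and the function name, are hypothetical.

```python
import pandas as pd

def delta_td_bt_feature_size(df: pd.DataFrame) -> int:
    """Hypothetical: |#top decoy peaks - #best target peaks| among rank-1 peaks.

    Assumes a boolean 'decoy' column and a 'peak_group_rank' column
    where 1 marks the best peak of each peak group.
    """
    top = df[df["peak_group_rank"] == 1]
    n_top_decoys = int(top["decoy"].sum())
    n_best_targets = int((~top["decoy"]).sum())
    return abs(n_top_decoys - n_best_targets)

df = pd.DataFrame({
    "decoy": [False, False, True, True, True, False],
    "peak_group_rank": [1, 2, 1, 1, 1, 1],
})
delta = delta_td_bt_feature_size(df)
```

Here the rank-1 peaks contain three decoys and two targets, so the difference is 1.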

iter_semi_supervised_learning(train, score_columns, working_thread_number)

Performs iterative semi-supervised learning.

Parameters:
  • train (Experiment) – Training data.

  • score_columns (list) – List of score column names.

  • working_thread_number (int) – Number of threads to use.

Returns:

Model parameters and classifier scores.

Return type:

tuple
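The general idea of iterative semi-supervised target-decoy learning can be sketched as follows. Start from a single main score, take targets that pass an FDR cutoff as positives and decoys as negatives, train a classifier, rescore everything, and repeat. This is a minimal sketch under several assumptions: a Fisher linear discriminant stands in for the inner learner, q-values use plain decoy counting with no pi0 correction or tie handling, all decoys serve as negatives, and the number of iterations is fixed. All function names are hypothetical and are not pyprophet's API.

```python
import numpy as np

def target_decoy_qvalues(scores, is_decoy):
    """Decoy-counting q-values (no pi0 correction); higher score = better."""
    scores = np.asarray(scores, dtype=float)
    is_decoy = np.asarray(is_decoy, dtype=bool)
    order = np.argsort(-scores, kind="mergesort")
    n_decoy = np.cumsum(is_decoy[order])
    n_target = np.cumsum(~is_decoy[order])
    fdr = n_decoy / np.maximum(n_target, 1)
    # q-value: minimum FDR at this threshold or any lower one
    q_sorted = np.minimum.accumulate(fdr[::-1])[::-1]
    q = np.empty_like(q_sorted)
    q[order] = q_sorted
    return q

def lda_weights(X_pos, X_neg):
    """Fisher discriminant direction separating positives from negatives."""
    within = np.cov(X_pos, rowvar=False) + np.cov(X_neg, rowvar=False)
    mean_diff = X_pos.mean(axis=0) - X_neg.mean(axis=0)
    return np.linalg.pinv(np.atleast_2d(within)) @ mean_diff

def iterate_semi_supervised(X, is_decoy, main_col, initial_fdr, iteration_fdr, n_iter=3):
    X = np.asarray(X, dtype=float)
    is_decoy = np.asarray(is_decoy, dtype=bool)
    scores = X[:, main_col]  # first round ranks peaks by the main score alone
    cutoff = initial_fdr
    w = None
    for _ in range(n_iter):
        q = target_decoy_qvalues(scores, is_decoy)
        positives = X[(~is_decoy) & (q <= cutoff)]
        negatives = X[is_decoy]
        if len(positives) < 2:
            break  # too few confident targets to estimate a covariance
        w = lda_weights(positives, negatives)
        scores = X @ w  # rescore all peaks with the new model
        cutoff = iteration_fdr  # later rounds use the iteration cutoff
    return w, scores

rng = np.random.default_rng(0)
n = 200
X = np.vstack([rng.normal(0.0, 1.0, (n, 2)), rng.normal([2.0, 1.0], 1.0, (n, 2))])
is_decoy = np.array([True] * n + [False] * n)
w, scores = iterate_semi_supervised(X, is_decoy, main_col=0, initial_fdr=0.15, iteration_fdr=0.05)
```

The two cutoffs mirror ss_initial_fdr and ss_iteration_fdr: a looser threshold bootstraps the first model from a single score, after which the learned combined score selects the training targets.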

score(df, params)

Scores the given data using the trained model.

Parameters:
  • df (pd.DataFrame) – Input data.

  • params (dict) – Model parameters.

Returns:

Classifier scores.

Return type:

np.ndarray
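For a linear model, scoring reduces to a weighted sum of the score columns. The sketch assumes, hypothetically, that params maps score-column names to weights; the real parameter format used by this method may differ, and score_linear is not a pyprophet function.

```python
import numpy as np
import pandas as pd

def score_linear(df: pd.DataFrame, params: dict) -> np.ndarray:
    """Hypothetical: params maps score-column names to linear weights."""
    cols = list(params)
    weights = np.array([params[c] for c in cols], dtype=float)
    # One classifier score per row: X @ w over the weighted columns only
    return df[cols].to_numpy(dtype=float) @ weights

df = pd.DataFrame({"a": [1.0, 2.0], "b": [0.0, 1.0]})
s = score_linear(df, {"a": 2.0, "b": -1.0})
```

Selecting columns by the keys of params keeps the feature order and the weight order aligned, even if the DataFrame holds extra columns.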

select_train_peaks(train, sel_column, cutoff_fdr, parametric, pfdr, pi0_lambda, pi0_method, pi0_smooth_df, pi0_smooth_log_pi0, mapper=None, main_score_selection_report=False, outfile=None, level=None, working_thread_number=None)

Selects the best target peaks and top decoy peaks based on FDR thresholds.

Parameters:
  • train (Experiment) – Training data.

  • sel_column (str) – Column used for selection.

  • cutoff_fdr (float) – FDR threshold for selection.

  • parametric (bool) – Whether to use parametric FDR estimation.

  • pfdr (bool) – Whether to use pFDR estimation.

  • pi0_lambda (list) – Lambda values for pi0 estimation.

  • pi0_method (str) – Method for pi0 estimation.

  • pi0_smooth_df (int) – Degrees of freedom for pi0 smoothing.

  • pi0_smooth_log_pi0 (bool) – Whether to log-transform pi0 values.

  • mapper (dict, optional) – Mapping of column aliases to feature names.

  • main_score_selection_report (bool, optional) – Whether to generate a score selection report.

  • outfile (str, optional) – Path to the output file.

  • level (str, optional) – Analysis level (e.g., peptide, protein).

  • working_thread_number (int, optional) – Number of threads to use.

Returns:

Top decoy peaks and best target peaks.

Return type:

tuple
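The core of FDR-based training-peak selection can be sketched with decoy-counting q-values: rank peaks by sel_column, estimate the FDR at each threshold as decoys over targets, and keep the targets whose q-value is at or below cutoff_fdr. This minimal sketch ignores parametric, pfdr, and all pi0 options. It assumes one top-ranked peak per group, a boolean decoy column, and that larger scores are better; select_train_peaks_sketch is a hypothetical name.

```python
import numpy as np
import pandas as pd

def select_train_peaks_sketch(df: pd.DataFrame, sel_column: str, cutoff_fdr: float):
    """Return (top_decoys, best_targets) using plain decoy-counting q-values."""
    ranked = df.sort_values(sel_column, ascending=False, kind="mergesort")
    is_decoy = ranked["decoy"].to_numpy(dtype=bool)
    n_decoy = np.cumsum(is_decoy)
    n_target = np.cumsum(~is_decoy)
    fdr = n_decoy / np.maximum(n_target, 1)
    # q-value: smallest FDR achievable at this score or any lower threshold
    q = np.minimum.accumulate(fdr[::-1])[::-1]
    top_decoys = ranked[is_decoy]
    best_targets = ranked[(~is_decoy) & (q <= cutoff_fdr)]
    return top_decoys, best_targets

peaks = pd.DataFrame({
    "decoy": [False, False, True, False, True],
    "s": [5.0, 4.0, 3.0, 2.0, 1.0],
})
td, bt = select_train_peaks_sketch(peaks, "s", cutoff_fdr=0.1)
```

At a cutoff of 0.1 only the two targets scoring above every decoy are kept; raising the cutoff to 0.4 also admits the target at score 2.0, whose q-value is 1/3.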

set_learner(model)

Sets the parameters of the inner learner.

Parameters:

model (object) – The model parameters.

start_semi_supervised_learning(train, score_columns, working_thread_number)

Starts the semi-supervised learning process.

Parameters:
  • train (Experiment) – Training data.

  • score_columns (list) – List of score column names.

  • working_thread_number (int) – Number of threads to use.

Returns:

Model parameters, classifier scores, and selected main score column.

Return type:

tuple
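Because this method also returns the selected main score column, one plausible criterion for dynamic main-score selection is to pick the candidate column that yields the most targets at a given FDR. That criterion is an assumption made for illustration; the sketch does not claim to match how pyprophet chooses the column when ss_use_dynamic_main_score is set. Function names and the decoy column are hypothetical.

```python
import numpy as np
import pandas as pd

def n_targets_at_fdr(scores, is_decoy, fdr_cutoff):
    """Number of targets with decoy-counting q-value <= fdr_cutoff."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="mergesort")
    d = np.asarray(is_decoy, dtype=bool)[order]
    fdr = np.cumsum(d) / np.maximum(np.cumsum(~d), 1)
    q = np.minimum.accumulate(fdr[::-1])[::-1]
    return int(((~d) & (q <= fdr_cutoff)).sum())

def pick_main_score(df, score_columns, fdr_cutoff):
    # Ties resolve to the first column listed in score_columns
    counts = {c: n_targets_at_fdr(df[c], df["decoy"], fdr_cutoff) for c in score_columns}
    return max(counts, key=counts.get), counts

peaks = pd.DataFrame({
    "decoy": [False, False, True, False, True],
    "s1": [5.0, 4.0, 3.0, 2.0, 1.0],
    "s2": [5.0, 4.0, 1.0, 3.0, 0.0],
})
best, counts = pick_main_score(peaks, ["s1", "s2"], fdr_cutoff=0.1)
```

Column s2 ranks all three targets above both decoys, so it passes more targets at the cutoff than s1 and is chosen.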

tune_semi_supervised_learning(train)

Tunes the semi-supervised learning model.

Parameters:

train (Experiment) – Training data.

Returns:

Model parameters and classifier scores.

Return type:

tuple