StandardSemiSupervisedLearner

class pyprophet.scoring.semi_supervised.StandardSemiSupervisedLearner(inner_learner, xeval_fraction, xeval_num_iter, ss_initial_fdr, ss_iteration_fdr, parametric, pfdr, pi0_lambda, pi0_method, pi0_smooth_df, pi0_smooth_log_pi0, test, main_score_selection_report, outfile, level, ss_use_dynamic_main_score)

Bases: AbstractSemiSupervisedLearner

Implements a standard semi-supervised learning workflow.

inner_learner

The base learner used for training.

Type: AbstractLearner

ss_initial_fdr

Initial FDR threshold for training.

Type: float

ss_iteration_fdr

FDR threshold for iterative learning.

Type: float

parametric

Whether to use parametric FDR estimation.

Type: bool

pfdr

Whether to estimate the positive FDR (pFDR) instead of the FDR.

Type: bool

pi0_lambda

Lambda values for pi0 estimation.

Type: list

pi0_method

Method for pi0 estimation (e.g., smoother or bootstrap).

Type: str

pi0_smooth_df

Degrees of freedom for pi0 smoothing.

Type: int

pi0_smooth_log_pi0

Whether to apply the smoother to log(pi0) estimates rather than to pi0 estimates.

Type: bool

ss_use_dynamic_main_score

Whether to dynamically select the main score.

Type: bool
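
The pi0_* attributes configure the estimate of pi0, the proportion of true null hypotheses among the tested peaks. As a rough illustration, the sketch below computes a Storey-style estimate at a few lambda values. The function name and the toy p-values are assumptions made for this example; smoothing and bootstrap selection across lambda values are left out, and the library's own estimator may differ in detail.

```python
def pi0_at_lambda(p_values, lam):
    """Storey-style estimate: share of p-values above lam, rescaled by 1 - lam."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    m = len(p_values)
    above = sum(1 for p in p_values if p > lam)
    # Cap at 1.0 because pi0 is a proportion.
    return min(1.0, above / (m * (1.0 - lam)))


# Toy p-values, made up for illustration only.
p_values = [0.001, 0.004, 0.01, 0.02, 0.2, 0.35, 0.5, 0.65, 0.8, 0.95]

# One estimate per lambda; a smoother or bootstrap step would normally
# combine these into a single pi0 value.
estimates = {lam: pi0_at_lambda(p_values, lam) for lam in (0.1, 0.4, 0.7)}
```

In this toy data set the p-values above 0.1 are spread evenly, so all three lambda values give roughly the same estimate.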

__init__(inner_learner, xeval_fraction, xeval_num_iter, ss_initial_fdr, ss_iteration_fdr, parametric, pfdr, pi0_lambda, pi0_method, pi0_smooth_df, pi0_smooth_log_pi0, test, main_score_selection_report, outfile, level, ss_use_dynamic_main_score)

averaged_learner(params, **kwargs)

Creates an averaged learner from multiple parameter sets.

Parameters:

params (list) – List of parameter sets.
kwargs – Additional arguments.

Returns:

The averaged learner.

Return type:

AbstractLearner
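
To make the idea of averaging concrete, here is a minimal sketch that assumes each parameter set is a flat weight vector, as it would be for a linear learner. That assumption and the helper name average_params are illustrative; the actual method delegates to the inner learner and may combine parameters differently.

```python
def average_params(param_sets):
    """Element-wise mean of equally long weight vectors."""
    if not param_sets:
        raise ValueError("need at least one parameter set")
    n = len(param_sets[0])
    if any(len(p) != n for p in param_sets):
        raise ValueError("all parameter sets must have the same length")
    # zip(*param_sets) groups the i-th weight of every set together.
    return [sum(col) / len(param_sets) for col in zip(*param_sets)]


# Two hypothetical weight vectors from separate cross-validation runs.
params = [[1.0, 2.0, -1.0], [3.0, 0.0, -1.0]]
averaged = average_params(params)
```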

classmethod from_config(config: RunnerIOConfig, base_learner)

Creates a StandardSemiSupervisedLearner instance from a configuration object.

Parameters:

config (RunnerIOConfig) – The configuration object.
base_learner (AbstractLearner) – The base learner used for training.

Returns:

The initialized learner.

Return type:

StandardSemiSupervisedLearner
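
The factory pattern behind from_config can be sketched with stand-in classes. ToyConfig, ToyLearner, and their fields are hypothetical and exist only for this example; the real RunnerIOConfig layout and the full constructor argument list are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class ToyConfig:
    # Hypothetical fields standing in for the real configuration object.
    ss_initial_fdr: float
    ss_iteration_fdr: float


class ToyLearner:
    def __init__(self, inner_learner, ss_initial_fdr, ss_iteration_fdr):
        self.inner_learner = inner_learner
        self.ss_initial_fdr = ss_initial_fdr
        self.ss_iteration_fdr = ss_iteration_fdr

    @classmethod
    def from_config(cls, config, base_learner):
        # Unpack the configuration once so callers need not repeat
        # the long constructor argument list.
        return cls(base_learner, config.ss_initial_fdr, config.ss_iteration_fdr)


learner = ToyLearner.from_config(ToyConfig(0.15, 0.05), base_learner="lda")
```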

get_delta_td_bt_feature_size(train, col, mapper, working_thread_number)

Calculates the absolute difference in feature size between the top decoy peaks and the best target peaks.

Parameters:

train (Experiment) – Training data.
col (str) – Column used for selection.
mapper (dict) – Mapping of column aliases to feature names.
working_thread_number (int) – Number of threads to use.

Returns:

The absolute difference in feature size.

Return type:

int

iter_semi_supervised_learning(train, score_columns, working_thread_number)

Performs iterative semi-supervised learning.

Parameters:

train (Experiment) – Training data.
score_columns (list) – List of score column names.
working_thread_number (int) – Number of threads to use.

Returns:

Model parameters and classifier scores.

Return type:

tuple

score(df, params)

Scores the given data using the trained model.

Parameters:

df (pd.DataFrame) – Input data.
params (dict) – Model parameters.

Returns:

Classifier scores.

Return type:

np.ndarray

select_train_peaks(train, sel_column, cutoff_fdr, parametric, pfdr, pi0_lambda, pi0_method, pi0_smooth_df, pi0_smooth_log_pi0, mapper=None, main_score_selection_report=False, outfile=None, level=None, working_thread_number=None)

Selects the best target peaks and top decoy peaks based on FDR thresholds.

Parameters:

train (Experiment) – Training data.
sel_column (str) – Column used for selection.
cutoff_fdr (float) – FDR threshold for selection.
parametric (bool) – Whether to use parametric FDR estimation.
pfdr (bool) – Whether to estimate the positive FDR (pFDR) instead of the FDR.
pi0_lambda (list) – Lambda values for pi0 estimation.
pi0_method (str) – Method for pi0 estimation (e.g., smoother or bootstrap).
pi0_smooth_df (int) – Degrees of freedom for pi0 smoothing.
pi0_smooth_log_pi0 (bool) – Whether to apply the smoother to log(pi0) estimates rather than to pi0 estimates.
mapper (dict, optional) – Mapping of column aliases to feature names.
main_score_selection_report (bool, optional) – Whether to generate a score selection report.
outfile (str, optional) – Path to the output file.
level (str, optional) – Analysis level (e.g., peptide, protein).
working_thread_number (int, optional) – Number of threads to use.

Returns:

Top decoy peaks and best target peaks.

Return type:

tuple
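
The selection step can be sketched in plain Python under simplifying assumptions: peaks are (group_id, is_decoy, score) tuples, only the highest-scoring peak per group is kept, and the FDR is approximated by the ratio of decoys to targets seen so far in the ranked list. The function name and the data layout are hypothetical, and this decoy-counting estimate stands in for the parametric and pi0-based error statistics that the real method configures.

```python
def select_train_peaks_sketch(peaks, cutoff_fdr):
    """Return (top_decoys, best_targets) from (group_id, is_decoy, score) tuples."""
    # Keep only the highest-scoring peak of each group.
    top = {}
    for group_id, is_decoy, score in peaks:
        if group_id not in top or score > top[group_id][2]:
            top[group_id] = (group_id, is_decoy, score)
    top_decoys = [p for p in top.values() if p[1]]
    top_targets = [p for p in top.values() if not p[1]]

    # Walk down the ranked list and remember the lowest score at which
    # the estimated FDR (decoys / targets so far) is still acceptable.
    ranked = sorted(top.values(), key=lambda p: p[2], reverse=True)
    n_targets = n_decoys = 0
    cutoff = float("inf")
    for _, is_decoy, score in ranked:
        if is_decoy:
            n_decoys += 1
        else:
            n_targets += 1
        if n_targets and n_decoys / n_targets <= cutoff_fdr:
            cutoff = score

    best_targets = [p for p in top_targets if p[2] >= cutoff]
    return top_decoys, best_targets


# Toy peaks, made up for illustration: "t*" are targets, "d*" are decoys.
peaks = [
    ("t1", False, 5.0), ("t1", False, 1.0),
    ("t2", False, 4.0),
    ("t3", False, 3.0),
    ("d1", True, 2.5),
    ("t4", False, 2.0),
    ("d2", True, 1.5), ("d2", True, 0.5),
    ("t5", False, 1.0),
]
top_decoys, best_targets = select_train_peaks_sketch(peaks, cutoff_fdr=0.3)
```

With a cutoff of 0.3, the toy data yields two top decoy peaks and four best target peaks; the lowest-scoring target group falls below the cutoff.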

set_learner(model)

Sets the parameters of the inner learner.

Parameters:

model (object) – The model parameters.

start_semi_supervised_learning(train, score_columns, working_thread_number)

Starts the semi-supervised learning process.

Parameters:

train (Experiment) – Training data.
score_columns (list) – List of score column names.
working_thread_number (int) – Number of threads to use.

Returns:

Model parameters, classifier scores, and selected main score column.

Return type:

tuple
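
The overall loop (an initial selection on a single main score, followed by repeated selection on the classifier score) can be sketched as follows. This is a toy under stated assumptions, not the library's implementation: the learner simply weights each feature by the difference of class means, the FDR is approximated by decoy counting, and all function names and the data are hypothetical.

```python
def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_weights(target_rows, decoy_rows):
    """Toy learner: weight each feature by the difference of class means."""
    mt, md = mean_vector(target_rows), mean_vector(decoy_rows)
    return [t - d for t, d in zip(mt, md)]


def apply_weights(rows, weights):
    return [sum(w * x for w, x in zip(weights, row)) for row in rows]


def select_targets(scores, is_decoy, cutoff_fdr):
    """Indices of targets scoring at or above the decoy-counting FDR cutoff."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_t = n_d = 0
    cutoff = float("inf")
    for i in order:
        if is_decoy[i]:
            n_d += 1
        else:
            n_t += 1
        if n_t and n_d / n_t <= cutoff_fdr:
            cutoff = scores[i]
    return [i for i in order if not is_decoy[i] and scores[i] >= cutoff]


def semi_supervised_sketch(features, is_decoy, main_col,
                           initial_fdr, iteration_fdr, n_iter):
    decoy_rows = [row for row, d in zip(features, is_decoy) if d]
    # Start: rank by one main score column at the initial FDR.
    scores = [row[main_col] for row in features]
    cutoff_fdr = initial_fdr
    weights = None
    for _ in range(n_iter + 1):
        selected = select_targets(scores, is_decoy, cutoff_fdr)
        if not selected:
            raise ValueError("no targets pass the FDR cutoff")
        weights = fit_weights([features[i] for i in selected], decoy_rows)
        # Later rounds rank by the classifier score at the iteration FDR.
        scores = apply_weights(features, weights)
        cutoff_fdr = iteration_fdr
    return weights, scores


# Toy data, made up for illustration: four targets, then three decoys.
features = [
    [3.0, 2.5], [2.8, 3.1], [2.5, 2.0], [0.4, 0.2],
    [0.5, 0.1], [0.2, 0.6], [0.1, 0.3],
]
is_decoy = [False, False, False, False, True, True, True]
weights, scores = semi_supervised_sketch(
    features, is_decoy, main_col=0,
    initial_fdr=0.1, iteration_fdr=0.1, n_iter=2,
)
```

In this toy run the same three high-scoring targets are selected in every round, so the weights stop changing after the first fit.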

tune_semi_supervised_learning(train)

Tunes the semi-supervised learning model.

Parameters:

train (Experiment) – Training data.

Returns:

Model parameters and classifier scores.

Return type:

tuple