Experiment

class pyprophet.scoring.data_handling.Experiment(df)

Bases: object

Encapsulates data operations for peak groups, decoys, and targets.

df

The underlying data.

Type: pd.DataFrame

__init__(df)

__setattr__(name, value): Implement setattr(self, name, value).

__weakref__: list of weak references to the object (if defined)

add_peak_group_rank(): Adds a peak group rank column to the data.

filter_(idx)

Filters the data based on the given index.

Parameters: idx (array-like) – Boolean index for filtering.
Returns: A new Experiment containing the filtered data.
Return type: Experiment
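
To illustrate what a boolean index does, the sketch below applies a mask to a plain pandas DataFrame rather than to an Experiment. The column names tg_id, is_decoy, and score are hypothetical and are not taken from this reference.

```python
import pandas as pd

# Hypothetical peak-group table; the column names are assumptions for illustration.
df = pd.DataFrame({
    "tg_id": ["a", "a", "b", "b"],
    "is_decoy": [False, False, True, True],
    "score": [2.5, 0.3, 1.1, -0.4],
})

# A boolean index has one entry per row; True keeps the row.
idx = df["score"] > 1.0

# Indexing with the mask returns a new frame and leaves df unchanged.
filtered = df[idx]
```

Because the result is a new object, the original data stays available for further, different filters.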

get_decoy_peaks()

Retrieves the decoy peaks.

Returns: A new Experiment containing the decoy peaks.
Return type: Experiment

get_feature_matrix(use_main_score)

Retrieves the feature matrix for scoring.

Parameters: use_main_score (bool) – Whether to include the main score.
Returns: The feature matrix.
Return type: np.ndarray
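
This reference does not say which columns make up the feature matrix. The sketch below shows one plausible shape of the operation on a plain DataFrame. The helper feature_matrix, the column name main_score, and the var_ prefix are all hypothetical.

```python
import pandas as pd

# Hypothetical table: one main score plus two sub-score columns.
df = pd.DataFrame({
    "tg_id": ["a", "b"],
    "main_score": [1.0, 2.0],
    "var_x": [0.1, 0.2],
    "var_y": [5.0, 6.0],
})

def feature_matrix(df, use_main_score):
    """Return the score columns as a 2-D NumPy array (illustrative helper)."""
    # Assumption: sub-scores are identified by a "var_" prefix.
    cols = [c for c in df.columns if c.startswith("var_")]
    if use_main_score:
        cols = ["main_score"] + cols
    return df[cols].to_numpy()

with_main = feature_matrix(df, use_main_score=True)      # 3 columns
without_main = feature_matrix(df, use_main_score=False)  # 2 columns
```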

get_target_peaks()

Retrieves the target peaks.

Returns: A new Experiment containing the target peaks.
Return type: Experiment

get_top_decoy_peaks()

Retrieves the top decoy peaks.

Returns: A new Experiment containing the top decoy peaks.
Return type: Experiment

get_top_target_peaks()

Retrieves the top target peaks.

Returns: A new Experiment containing the top target peaks.
Return type: Experiment
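
The sketch below shows how a "top target" selection can be expressed on a plain DataFrame. It assumes that "top" means the best-scoring peak within each group. That meaning, and the column names tg_id, is_decoy, score, and is_top_peak, are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical columns: group id, decoy flag, score.
df = pd.DataFrame({
    "tg_id": ["t1", "t1", "t2", "d1", "d1"],
    "is_decoy": [False, False, False, True, True],
    "score": [0.9, 1.7, 0.2, 0.5, 0.1],
})

# Flag the best-scoring peak in each group as the "top" peak (assumed meaning).
df["is_top_peak"] = df["score"] == df.groupby("tg_id")["score"].transform("max")

# Top target peaks: top-flagged rows that are not decoys.
top_targets = df[df["is_top_peak"] & ~df["is_decoy"]]

# Top decoy peaks follow the same pattern with the decoy flag kept as-is.
top_decoys = df[df["is_top_peak"] & df["is_decoy"]]
```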

get_top_test_peaks()

Retrieves the top test peaks.

Returns: A new Experiment containing the top test peaks.
Return type: Experiment

get_train_peaks()

Retrieves the training peaks.

Returns: A new Experiment containing the training peaks.
Return type: Experiment

log_summary(): Logs a summary of the input data, including the number of peak groups, group IDs, and scores.

normalize_score_by_decoys(score_col_name)

Normalizes the decoy scores to mean 0 and standard deviation 1, and scales the target scores accordingly.

Parameters: score_col_name (str) – Name of the score column to normalize.
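
The normalization described above can be sketched on a plain DataFrame: estimate the mean and standard deviation from the decoy rows only, then apply the same shift and scale to every row. Which decoy rows the method uses and which standard-deviation estimator it applies are not stated here, so using all decoys and ddof=1 are assumptions. The column names are hypothetical as well.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three decoy rows followed by two target rows.
df = pd.DataFrame({
    "is_decoy": [True, True, True, False, False],
    "score": [1.0, 2.0, 3.0, 4.0, 6.0],
})

# Estimate location and scale from the decoys only.
decoy_scores = df.loc[df["is_decoy"], "score"].to_numpy()
mu = decoy_scores.mean()
sigma = decoy_scores.std(ddof=1)  # sample standard deviation (assumed choice)

if sigma == 0:
    raise ValueError("decoy scores are constant; cannot normalize")

# Apply the decoy-derived transform to targets and decoys alike.
df["score"] = (df["score"] - mu) / sigma

normalized_decoys = df.loc[df["is_decoy"], "score"].to_numpy()
```

After the transform the decoy scores are centred on 0 with unit spread, and the target scores are expressed in decoy standard deviations.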

rank_by(score_col_name)

Ranks the data by the specified score column.

Parameters: score_col_name (str) – Name of the score column to rank by.
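
As a rough sketch of ranking by a score column, the example below ranks peaks within each group on a plain DataFrame, with rank 1 given to the highest score. Ranking per group, the descending order, and the column names tg_id and peak_group_rank are assumptions that this reference does not confirm.

```python
import pandas as pd

# Hypothetical table with two groups of peaks.
df = pd.DataFrame({
    "tg_id": ["a", "a", "a", "b", "b"],
    "score": [0.2, 1.5, 0.9, 3.0, 4.0],
})

score_col_name = "score"

# Rank within each group; the highest score gets rank 1.
# method="first" breaks ties by row order so that ranks are unique.
df["peak_group_rank"] = (
    df.groupby("tg_id")[score_col_name]
    .rank(method="first", ascending=False)
    .astype(int)
)
```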

scale_features(score_columns)

Scales the features to the [0, 1] range.

Parameters: score_columns (list) – List of columns to be scaled.
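
Scaling to the [0, 1] range is commonly done with min-max scaling, sketched below on a plain DataFrame. Whether the method uses exactly this formula is an assumption, and the column names var_a and var_b are hypothetical.

```python
import pandas as pd

# Hypothetical feature columns plus an identifier that must not be scaled.
df = pd.DataFrame({
    "tg_id": ["a", "b", "c"],
    "var_a": [2.0, 4.0, 6.0],
    "var_b": [10.0, 10.5, 12.0],
})

score_columns = ["var_a", "var_b"]

# Min-max scaling: subtract the column minimum, divide by the column range.
col_min = df[score_columns].min()
col_range = df[score_columns].max() - col_min
df[score_columns] = (df[score_columns] - col_min) / col_range
```

A column with a constant value has a range of zero and would produce NaN here, so a real implementation would need to handle that case.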

set_and_rerank(col_name, scores)

Sets a column with new scores and re-ranks the data.

Parameters:

col_name (str) – Name of the column to update.
scores (array-like) – New scores to assign.

split_for_xval(fraction, is_test)

Splits the data for cross-validation.

Parameters:

fraction (float) – Fraction of data to use for training.
is_test (bool) – Whether this is a test split.
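
A cross-validation split of this kind can be sketched as follows: shuffle the group IDs, take the requested fraction as the training set, and flag the rows accordingly. Splitting by group rather than by row, the is_train column, the rounding rule, and the fixed seed are all assumptions made for illustration. The effect of is_test is not described in this reference and is not modelled here.

```python
import numpy as np
import pandas as pd

# Hypothetical table: four groups, some with several peaks.
df = pd.DataFrame({
    "tg_id": ["a", "a", "b", "c", "c", "d"],
    "score": [1.2, 0.4, 2.1, 0.7, 0.3, 1.9],
})

fraction = 0.5
rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Shuffle the unique group IDs, then take the first `fraction` of them for training.
group_ids = df["tg_id"].drop_duplicates().to_numpy()
shuffled = rng.permutation(group_ids)
n_train = int(round(len(shuffled) * fraction))
train_ids = shuffled[:n_train]

# All peaks of a group land on the same side of the split.
df["is_train"] = df["tg_id"].isin(train_ids)
```

Splitting on group IDs keeps every peak of a group together, which avoids leaking information about a group between the training and test portions.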