Experiment

class pyprophet.scoring.data_handling.Experiment(df)[source]

Bases: object

Encapsulates data operations for peak groups, decoys, and targets.

df

The underlying data.

Type:

pd.DataFrame

__init__(df)[source]

__setattr__(name, value)[source]

Implement setattr(self, name, value).

__weakref__

List of weak references to the object (if defined).

add_peak_group_rank()[source]

Adds a peak group rank column to the data.
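
As a rough illustration of what a per-group rank looks like, the sketch below ranks rows within each group by score using plain Python. The function name `rank_within_groups`, the keys `group_id`, `score`, and `peak_group_rank`, and the convention that rank 1 is the highest score are all assumptions made for this example; they are not taken from the library.

```python
from collections import defaultdict


def rank_within_groups(rows, group_key="group_id", score_key="score"):
    """Return copies of the rows with a 1-based 'peak_group_rank' per group."""
    # Collect the row positions that belong to each group.
    by_group = defaultdict(list)
    for i, row in enumerate(rows):
        by_group[row[group_key]].append(i)

    ranked = [dict(row) for row in rows]  # leave the input untouched
    for indices in by_group.values():
        # Assumption: the highest score in a group receives rank 1.
        ordered = sorted(indices, key=lambda i: rows[i][score_key], reverse=True)
        for rank, i in enumerate(ordered, start=1):
            ranked[i]["peak_group_rank"] = rank
    return ranked


rows = [
    {"group_id": "a", "score": 0.2},
    {"group_id": "a", "score": 0.9},
    {"group_id": "b", "score": 0.5},
]
ranked = rank_within_groups(rows)
```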

filter_(idx)[source]

Filters the data with the given boolean index.

Parameters:

idx (array-like) – Boolean index for filtering.

Returns:

A new Experiment containing the filtered data.

Return type:

Experiment
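
The idea of boolean-index filtering can be sketched without the library: keep the rows whose mask entry is true and return them as a new collection. The helper `filter_rows` and the `is_decoy` key are hypothetical names used only for this illustration.

```python
from itertools import compress


def filter_rows(rows, idx):
    """Return a new list holding the rows whose mask entry is truthy."""
    if len(idx) != len(rows):
        raise ValueError("mask length must match the number of rows")
    # compress() keeps rows[i] wherever idx[i] is truthy; the input is not modified.
    return list(compress(rows, idx))


rows = [
    {"id": 1, "is_decoy": False},
    {"id": 2, "is_decoy": True},
    {"id": 3, "is_decoy": False},
]
mask = [not row["is_decoy"] for row in rows]
targets = filter_rows(rows, mask)
```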

get_decoy_peaks()[source]

Retrieves the decoy peaks.

Returns:

A new Experiment containing the decoy peaks.

Return type:

Experiment

get_feature_matrix(use_main_score)[source]

Retrieves the feature matrix for scoring.

Parameters:

use_main_score (bool) – Whether to include the main score.

Returns:

The feature matrix.

Return type:

np.ndarray
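
A minimal sketch of how a `use_main_score` flag might control which columns end up in a feature matrix. The column prefixes `main_` and `var_`, the helper name `feature_matrix`, and the nested-list return value (instead of an `np.ndarray`) are assumptions for this example only.

```python
def feature_matrix(rows, use_main_score, main_prefix="main_", var_prefix="var_"):
    """Return (column names, row-major matrix) for the selected score columns."""
    # Assumption: sub-score columns share one prefix, the main score another.
    columns = sorted(k for k in rows[0] if k.startswith(var_prefix))
    if use_main_score:
        main = sorted(k for k in rows[0] if k.startswith(main_prefix))
        columns = main + columns
    matrix = [[float(row[c]) for c in columns] for row in rows]
    return columns, matrix


rows = [
    {"main_score": 1.0, "var_a": 0.1, "var_b": 0.2, "is_decoy": False},
    {"main_score": 2.0, "var_a": 0.3, "var_b": 0.4, "is_decoy": True},
]
columns, matrix = feature_matrix(rows, use_main_score=True)
```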

get_target_peaks()[source]

Retrieves the target peaks.

Returns:

A new Experiment containing the target peaks.

Return type:

Experiment

get_top_decoy_peaks()[source]

Retrieves the top decoy peaks.

Returns:

A new Experiment containing the top decoy peaks.

Return type:

Experiment

get_top_target_peaks()[source]

Retrieves the top target peaks.

Returns:

A new Experiment containing the top target peaks.

Return type:

Experiment
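
One plausible reading of "top" peaks is the best-ranked peak group of each group, restricted to targets or decoys. The sketch below follows that reading; the keys `peak_group_rank` and `is_decoy` and the helper `top_peaks` are assumptions for illustration, not the library's definitions.

```python
def top_peaks(rows, want_decoy):
    """Return the rank-1 rows that match the requested decoy flag."""
    # Assumption: rank 1 marks the best peak group within its group.
    return [
        row for row in rows
        if row["peak_group_rank"] == 1 and row["is_decoy"] == want_decoy
    ]


rows = [
    {"id": 1, "peak_group_rank": 1, "is_decoy": False},
    {"id": 2, "peak_group_rank": 2, "is_decoy": False},
    {"id": 3, "peak_group_rank": 1, "is_decoy": True},
]
top_targets = top_peaks(rows, want_decoy=False)
top_decoys = top_peaks(rows, want_decoy=True)
```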

get_top_test_peaks()[source]

Retrieves the top test peaks.

Returns:

A new Experiment containing the top test peaks.

Return type:

Experiment

get_train_peaks()[source]

Retrieves the training peaks.

Returns:

A new Experiment containing the training peaks.

Return type:

Experiment

log_summary()[source]

Logs a summary of the input data, including the number of peak groups, group IDs, and scores.

normalize_score_by_decoys(score_col_name)[source]

Shifts and scales the given score column so that the decoy scores have mean 0 and standard deviation 1. The same transformation is applied to the target scores.

Parameters:

score_col_name (str) – Name of the score column to normalize.
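
The arithmetic behind decoy-based normalization can be sketched as a z-score whose mean and standard deviation come from the decoys only. The helper `normalize_by_decoys` is hypothetical, and the use of the population standard deviation is an assumption; the library may select the decoys or estimate the spread differently.

```python
from statistics import mean, pstdev


def normalize_by_decoys(scores, is_decoy):
    """Z-score all scores using the mean and spread of the decoy scores."""
    decoy_scores = [s for s, d in zip(scores, is_decoy) if d]
    mu = mean(decoy_scores)
    sigma = pstdev(decoy_scores)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("decoy scores have zero spread")
    # Targets and decoys go through the same shift and scale.
    return [(s - mu) / sigma for s in scores]


scores = [1.0, 3.0, 5.0, 2.0]
is_decoy = [True, True, False, False]
normalized = normalize_by_decoys(scores, is_decoy)
```

After this step the decoy scores are centered on 0 with unit spread, so a target score reads as a distance from the decoy distribution in decoy standard deviations.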

rank_by(score_col_name)[source]

Ranks the data by the specified score column.

Parameters:

score_col_name (str) – Name of the score column to rank by.

scale_features(score_columns)[source]

Scales the features to the [0, 1] range.

Parameters:

score_columns (list) – List of columns to be scaled.
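
Scaling to the [0, 1] range is commonly done with min-max scaling, sketched below in plain Python. The helper `scale_columns`, the column name `var_a`, and the choice to map a constant column to 0.0 are assumptions for this example.

```python
def scale_columns(rows, score_columns):
    """Return copies of the rows with each listed column min-max scaled."""
    scaled = [dict(row) for row in rows]  # leave the input untouched
    for col in score_columns:
        values = [row[col] for row in rows]
        lo, hi = min(values), max(values)
        span = hi - lo
        for row in scaled:
            # Assumption: a constant column (span == 0) is mapped to 0.0.
            row[col] = (row[col] - lo) / span if span else 0.0
    return scaled


rows = [{"var_a": 2.0}, {"var_a": 4.0}, {"var_a": 6.0}]
scaled = scale_columns(rows, ["var_a"])
```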

set_and_rerank(col_name, scores)[source]

Sets a column with new scores and re-ranks the data.

Parameters:
  • col_name (str) – Name of the column to update.

  • scores (array-like) – New scores to assign.

split_for_xval(fraction, is_test)[source]

Splits the data for cross-validation.

Parameters:
  • fraction (float) – Fraction of data to use for training.

  • is_test (bool) – Whether this is a test split.
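
A cross-validation split of this kind is often made at the group level, so that all peak groups of one group land on the same side. The sketch below marks a fraction of the distinct group IDs as training data. The helper `split_groups`, its `seed` argument, and the rounding rule are assumptions, and the `is_test` flag is not modeled here.

```python
import random


def split_groups(group_ids, fraction, seed=0):
    """Return one boolean per row: True if the row's group is in the training part."""
    unique_ids = sorted(set(group_ids))
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(unique_ids)
    # Assumption: round down when the fraction does not divide evenly.
    n_train = int(len(unique_ids) * fraction)
    train_ids = set(unique_ids[:n_train])
    return [g in train_ids for g in group_ids]


group_ids = ["a", "a", "b", "c", "c", "d"]
is_train = split_groups(group_ids, fraction=0.5)
```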