Semi-Supervised Scoring Documentation

This module provides the main tools for statistical scoring, error estimation, and hypothesis testing in targeted proteomics and glycoproteomics workflows. It includes submodules for semi-supervised learning, feature scaling, classifier integration, and context-specific inference.

Submodules:

  • data_handling: Utilities for handling and processing data, including feature scaling, ranking, and validation.

  • classifiers: Implements various classifiers (e.g., LDA, SVM, XGBoost) for scoring.

  • semi_supervised: Implements semi-supervised learning workflows for iterative scoring.

  • runner: Defines workflows for running PyProphet, including learning and weight application.

  • pyprophet: Core functionality for orchestrating scoring and error estimation workflows.

Dependencies:

  • numpy

  • pandas

  • scikit-learn

  • xgboost

  • loguru

  • click

scoring

This module provides the main tools for statistical scoring, error estimation, and hypothesis testing in targeted proteomics and glycoproteomics workflows.

Runner

PyProphetRunner

Base class for running PyProphet workflows.

PyProphetLearner

Implements the learning and scoring workflow for PyProphet.

PyProphetWeightApplier

Applies pre-trained weights to full or new datasets.
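
To illustrate what applying pre-trained weights can mean for a linear scorer, here is a minimal sketch. It assumes the weights are stored per feature column and that the score is a plain weighted sum. The function name `apply_linear_weights` and the feature names are hypothetical and are not part of PyProphetWeightApplier.

```python
def apply_linear_weights(rows, weights):
    """Score each row as the weighted sum of its feature values.

    `rows` is a list of dicts mapping feature name -> value.
    `weights` is a dict mapping feature name -> pre-trained weight.
    Both layouts are assumptions made for this sketch.
    """
    scores = []
    for row in rows:
        scores.append(sum(weights[name] * row[name] for name in weights))
    return scores


# Hypothetical feature names, for illustration only.
weights = {"feature_a": 2.0, "feature_b": -1.0}
new_data = [
    {"feature_a": 1.0, "feature_b": 3.0},
    {"feature_a": 0.5, "feature_b": 0.0},
]
scores = apply_linear_weights(new_data, weights)
```

Keeping the weights keyed by feature name rather than by position means the new dataset does not need its columns in the same order as the training data.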

PyProphet

PyProphet

Orchestrates the semi-supervised learning and scoring workflow.

Scorer

Handles scoring, error estimation, and hypothesis testing for experiments.
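
As a rough illustration of decoy-based error estimation, the sketch below estimates a false discovery rate at a score cutoff by simple target-decoy counting. This is an assumption about the general approach, not the Scorer's actual estimator, which may differ in how it derives and corrects the error rates. The name `estimate_fdr` is hypothetical.

```python
def estimate_fdr(target_scores, decoy_scores, cutoff):
    """Estimate FDR at `cutoff` as (#decoys >= cutoff) / (#targets >= cutoff).

    Simple target-decoy counting; the result is capped at 1.0.
    Returns 0.0 when no target passes the cutoff.
    """
    n_targets = sum(1 for s in target_scores if s >= cutoff)
    n_decoys = sum(1 for s in decoy_scores if s >= cutoff)
    if n_targets == 0:
        return 0.0
    return min(1.0, n_decoys / n_targets)


# Made-up scores, for illustration only.
targets = [5.0, 4.2, 3.9, 3.1, 1.0, 0.4]
decoys = [3.5, 1.2, 0.9, 0.3, 0.1, -0.5]
fdr = estimate_fdr(targets, decoys, cutoff=3.0)  # 1 decoy / 4 targets = 0.25
```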

Semi-Supervised

AbstractSemiSupervisedLearner

Abstract base class for semi-supervised learning workflows.

StandardSemiSupervisedLearner

Implements a standard semi-supervised learning workflow.
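
The general idea of iterative semi-supervised scoring can be sketched as follows. This is a simplified illustration, not the StandardSemiSupervisedLearner implementation: it assumes the first feature serves as the initial score, uses a decoy score quantile as the confidence cutoff, and swaps in a centroid-difference linear scorer in place of a real classifier such as LDA. All function names and parameters are hypothetical.

```python
def fit_centroid_direction(positives, negatives):
    """Return weights = mean(positives) - mean(negatives), per feature.

    A deliberately simple stand-in for a trained linear classifier.
    """
    n_features = len(positives[0])

    def column_mean(rows, j):
        return sum(r[j] for r in rows) / len(rows)

    return [
        column_mean(positives, j) - column_mean(negatives, j)
        for j in range(n_features)
    ]


def linear_scores(rows, weights):
    """Score each row as the dot product of its features with the weights."""
    return [sum(w * x for w, x in zip(weights, r)) for r in rows]


def semi_supervised_fit(targets, decoys, n_iter=5, decoy_quantile=0.9):
    """Iteratively refine linear weights from targets and decoys.

    Each round: score everything, keep the targets that score above a
    decoy-derived cutoff as confident positives, and refit the weights
    on those positives versus all decoys.
    """
    # Assumption: start from the first feature as the initial score.
    weights = [1.0] + [0.0] * (len(targets[0]) - 1)
    for _ in range(n_iter):
        target_scores = linear_scores(targets, weights)
        decoy_scores = sorted(linear_scores(decoys, weights))
        cutoff = decoy_scores[int(decoy_quantile * (len(decoy_scores) - 1))]
        confident = [r for r, s in zip(targets, target_scores) if s > cutoff]
        if not confident:
            break  # nothing passes the cutoff; keep the last weights
        weights = fit_centroid_direction(confident, decoys)
    return weights


# Made-up two-feature data, for illustration only.
targets = [[3.0, 3.0], [4.0, 4.0], [0.0, 0.0]]
decoys = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
weights = semi_supervised_fit(targets, decoys)
```

The labels of the targets are never used directly: only targets that outscore most decoys are treated as positives, which is what makes the procedure semi-supervised.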

Classifiers

AbstractLearner

Abstract base class for defining a learner interface.

LinearLearner

Implements a linear classifier for scoring.

LDALearner

Implements a Linear Discriminant Analysis (LDA) learner.

SVMLearner

Implements a linear support vector classification (SVM) learner.

XGBLearner

Implements an XGBoost-based learner for scoring.

HistGBCLearner

Implements a scikit-learn HistGradientBoostingClassifier-based learner for scoring.

Data Handling

Experiment

Encapsulates data operations for peak groups, decoys, and targets.

prepare_data_table

Prepares the input data table for scoring and analysis.

cleanup_and_check

Cleans up the input DataFrame and validates its structure.

check_for_unique_blocks

Checks if group IDs form unique blocks.
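
One way to read "unique blocks" is that all rows sharing a group ID are contiguous, so an ID never reappears after a different ID has been seen. The sketch below checks that property under this assumption. The name `ids_form_unique_blocks` is hypothetical and is not the module's function.

```python
def ids_form_unique_blocks(group_ids):
    """Return True if every distinct ID occupies one contiguous run."""
    seen = set()
    previous = object()  # sentinel that compares unequal to any real ID
    for gid in group_ids:
        if gid != previous:
            if gid in seen:
                return False  # this ID already had an earlier, closed block
            seen.add(gid)
            previous = gid
    return True


contiguous = ids_form_unique_blocks(["a", "a", "b", "b", "c"])  # True
interleaved = ids_form_unique_blocks(["a", "b", "a"])           # False
```

The check is a single pass over the IDs and does not require the input to be sorted, only grouped.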

update_chosen_main_score_in_table

Updates the main score column in the feature table.

use_metabolomics_scores

Returns a list of metabolomics-specific score columns.