Semi-Supervised Scoring Documentation
This module provides the main tools for statistical scoring, error estimation, and hypothesis testing in targeted proteomics and glycoproteomics workflows. Its submodules cover semi-supervised learning, feature scaling, classifier integration, and context-specific inference.
Submodules:
data_handling: Utilities for handling and processing data, including feature scaling, ranking, and validation.
classifiers: Implements various classifiers (e.g., LDA, SVM, XGBoost) for scoring.
semi_supervised: Implements semi-supervised learning workflows for iterative scoring.
runner: Defines workflows for running PyProphet, including learning and weight application.
pyprophet: Core functionality for orchestrating scoring and error estimation workflows.
Dependencies:
numpy
pandas
scikit-learn
xgboost
loguru
click
Runner
Base class for running PyProphet workflows.
Implements the learning and scoring workflow for PyProphet.
Applies pre-trained weights to full/new datasets.
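Applying pre-trained weights can be pictured as a weighted sum over feature columns. The following is a minimal sketch under that assumption, not the module's actual implementation; the function name `apply_weights` and the column names `var_a` and `var_b` are hypothetical.

```python
import pandas as pd


def apply_weights(table, weights):
    """Score each row as a weighted sum of the named feature columns.

    `weights` maps column names to pre-trained weights (hypothetical layout).
    """
    cols = list(weights)
    missing = [c for c in cols if c not in table.columns]
    if missing:
        raise KeyError(f"feature columns missing from table: {missing}")
    w = pd.Series(weights, index=cols, dtype=float)
    # DataFrame.dot matches the table's columns to the index of the weight Series.
    return table[cols].dot(w)


# Hypothetical feature table and weights, for illustration only.
table = pd.DataFrame({"var_a": [1.0, 2.0], "var_b": [3.0, 4.0]})
scores = apply_weights(table, {"var_a": 0.5, "var_b": -1.0})
```

Checking for missing columns up front keeps a mismatch between the trained weights and a new dataset from surfacing later as a less readable alignment error.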
PyProphet
Orchestrates the semi-supervised learning and scoring workflow.
Handles scoring, error estimation, and hypothesis testing for experiments.
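One common form of error estimation in target-decoy workflows is an empirical p-value computed from the decoy score distribution. The sketch below illustrates that general technique; it is an assumption that this resembles what the module does, and the function name `empirical_p_values` is hypothetical.

```python
import numpy as np


def empirical_p_values(target_scores, decoy_scores):
    """Empirical p-value per target: share of decoys scoring at least as high.

    A pseudocount of one is added so that no p-value is exactly zero.
    """
    decoys = np.sort(np.asarray(decoy_scores, dtype=float))
    targets = np.asarray(target_scores, dtype=float)
    # searchsorted(side="left") counts decoys strictly below each target score,
    # so the remainder is the number of decoys with score >= target score.
    n_ge = len(decoys) - np.searchsorted(decoys, targets, side="left")
    return (n_ge + 1) / (len(decoys) + 1)


p = empirical_p_values([10.0, 1.5, -1.0], [0.0, 1.0, 2.0, 3.0])
```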
Semi-Supervised
Abstract base class for semi-supervised learning workflows.
Implements a standard semi-supervised learning workflow.
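To give a sense of what an iterative semi-supervised scoring loop can look like, here is a minimal sketch. It assumes the common target-decoy scheme (train on confident targets versus all decoys, rescore everything, repeat) and simplifies the selection step to a fixed top fraction of targets rather than an error-rate cutoff. The function name and parameters are hypothetical and do not reflect the module's API.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def iterate_semi_supervised(features, is_decoy, initial_scores,
                            n_iter=3, top_fraction=0.5):
    """Iteratively rescore rows with an LDA trained on confident targets vs decoys."""
    X = np.asarray(features, dtype=float)
    is_decoy = np.asarray(is_decoy, dtype=bool)
    scores = np.asarray(initial_scores, dtype=float)
    for _ in range(n_iter):
        # Simplification: keep the best-scoring fraction of targets as positives.
        cutoff = np.quantile(scores[~is_decoy], 1.0 - top_fraction)
        confident = (~is_decoy) & (scores >= cutoff)
        train = confident | is_decoy
        labels = (~is_decoy[train]).astype(int)  # 1 = target, 0 = decoy
        lda = LinearDiscriminantAnalysis().fit(X[train], labels)
        # Rescore all rows, including the targets left out of training.
        scores = lda.decision_function(X)
    return scores
```

The initial scores would typically come from a single strong feature, which gives the first iteration a ranking to select confident targets from.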
Classifiers
Abstract base class for defining a learner interface.
Implements a linear classifier for scoring.
Implements a Linear Discriminant Analysis (LDA) learner.
Implements a Linear Support Vector Classification (SVM) learner.
Implements an XGBoost-based learner for scoring.
Implements a scikit-learn HistGradientBoostingClassifier-based learner for scoring.
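The relationship between an abstract learner interface and a concrete LDA learner can be sketched as follows. The class names, method names, and signatures here are hypothetical illustrations and are not taken from the module; only the scikit-learn calls are real.

```python
from abc import ABC, abstractmethod

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


class AbstractLearnerSketch(ABC):
    """Hypothetical learner interface: fit on decoys vs targets, then score."""

    @abstractmethod
    def learn(self, decoy_features, target_features):
        ...

    @abstractmethod
    def score(self, features):
        ...


class LDALearnerSketch(AbstractLearnerSketch):
    """Hypothetical LDA learner that keeps only the fitted linear weights."""

    def __init__(self):
        self.weights = None

    def learn(self, decoy_features, target_features):
        X = np.vstack([decoy_features, target_features])
        y = np.concatenate([np.zeros(len(decoy_features)),
                            np.ones(len(target_features))])
        lda = LinearDiscriminantAnalysis().fit(X, y)
        # For two classes, coef_ has shape (1, n_features).
        self.weights = lda.coef_.ravel()
        return self

    def score(self, features):
        # Linear score: higher values lean toward the target class.
        return np.asarray(features, dtype=float) @ self.weights
```

Storing the weights as a plain array, rather than the fitted estimator, is one way to make them easy to save and reapply to new data.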
Data Handling
Encapsulates data operations for peak groups, decoys, and targets.
Prepares the input data table for scoring and analysis.
Cleans up the input DataFrame and validates its structure.
Checks if group IDs form unique blocks.
Updates the main score column in the feature table.
Returns a list of metabolomics-specific score columns.
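The check that group IDs form unique blocks can be read as: each ID occupies one contiguous run of rows and never reappears later. A minimal sketch of that interpretation follows; the function name `ids_form_unique_blocks` is hypothetical and the module's own check may differ in details.

```python
import numpy as np


def ids_form_unique_blocks(group_ids):
    """Return True if every group ID appears in exactly one contiguous run."""
    ids = np.asarray(group_ids)
    if ids.size == 0:
        return True
    # Mark the first row and every row whose ID differs from the previous row.
    starts = np.concatenate(([True], ids[1:] != ids[:-1]))
    block_ids = ids[starts]
    # If an ID starts more than one block, it reappeared after its run ended.
    return len(np.unique(block_ids)) == len(block_ids)
```

For example, the sequence a, a, b, b, c passes, while a, b, a fails because the first ID starts a second block.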