SplitParquetReader

class pyprophet.io.scoring.split_parquet.SplitParquetReader(config: RunnerIOConfig)[source]

Bases: BaseSplitParquetReader

Class for reading and processing data from OpenSWATH results stored in a directory containing split Parquet files.

The SplitParquetReader class provides methods to read different levels of data from the split Parquet files and process them accordingly. It supports reading data for semi-supervised learning, IPF analysis, and context-level analysis.

This assumes that the input infile path is a directory containing the following files:

- precursors_features.parquet
- transition_features.parquet
- feature_alignment.parquet (optional)
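
The expected directory layout can be verified before a reader is constructed. The following is a minimal sketch using only the standard library; `check_split_parquet_dir` and the two constants are hypothetical helpers written for illustration and are not part of pyprophet. It assumes the file names listed above are matched exactly.

```python
from pathlib import Path

# File names taken from the layout described in this class's documentation.
REQUIRED_FILES = ("precursors_features.parquet", "transition_features.parquet")
OPTIONAL_FILES = ("feature_alignment.parquet",)


def check_split_parquet_dir(infile):
    """Hypothetical helper: map each expected file name found in `infile` to its path.

    Raises NotADirectoryError if `infile` is not a directory and
    FileNotFoundError if a required file is missing.
    """
    root = Path(infile)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")

    missing = [name for name in REQUIRED_FILES if not (root / name).exists()]
    if missing:
        raise FileNotFoundError(f"{root} is missing: {', '.join(missing)}")

    found = {name: root / name for name in REQUIRED_FILES}
    # Optional files are included only when present.
    for name in OPTIONAL_FILES:
        if (root / name).exists():
            found[name] = root / name
    return found
```

Calling `check_split_parquet_dir(infile)` with the same path that is passed to the reader's configuration fails early with a clear message, rather than at query time.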

infile

Input file path.

Type:

str

outfile

Output file path.

Type:

str

classifier

Classifier used for semi-supervised learning.

Type:

str

level

Level used in semi-supervised learning (e.g., ‘ms1’, ‘ms2’, ‘ms1ms2’, ‘transition’, ‘alignment’), or context level used for peptide/protein/gene inference (e.g., ‘global’, ‘experiment-wide’, ‘run-specific’).

Type:

str

glyco

Flag indicating whether analysis is glycoform-specific.

Type:

bool
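
Because the `level` attribute carries two kinds of values, it can help to tell them apart explicitly. The sketch below is an illustration only: `describe_level` and both sets are hypothetical, and the sets contain just the example values named in the `level` description above, which may not be exhaustive.

```python
# Example values quoted in the `level` attribute description; possibly incomplete.
SCORING_LEVELS = {"ms1", "ms2", "ms1ms2", "transition", "alignment"}
INFERENCE_CONTEXTS = {"global", "experiment-wide", "run-specific"}


def describe_level(level):
    """Hypothetical helper: classify a `level` string by how it is used."""
    if level in SCORING_LEVELS:
        return "scoring"
    if level in INFERENCE_CONTEXTS:
        return "inference-context"
    # Values not named in the documentation are reported rather than rejected.
    return "unknown"
```

For example, `describe_level("ms1ms2")` yields `"scoring"`, while `describe_level("run-specific")` yields `"inference-context"`.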

read()[source]

Read data from the input file based on the algorithm.

__init__(config: RunnerIOConfig)[source]

Initialize the reader with a given configuration.

Parameters:

config (BaseIOConfig) – Configuration object containing input details and module-specific parameters for reading.

read() DataFrame[source]

Reads and processes data from a DuckDB connection to generate a final feature table based on the specified level and main score.

Returns:

Final feature table with the specified main score.

Return type:

pd.DataFrame