SplitParquetReader
- class pyprophet.io.export.split_parquet.SplitParquetReader(config: ExportIOConfig)
Bases: BaseSplitParquetReader
Class for reading and processing data from a split Parquet file produced by an OpenSWATH workflow. Extended to support export functionality.
- __init__(config: ExportIOConfig)
Initialize the reader with a given configuration.
- Parameters:
config (BaseIOConfig) – Configuration object containing input details and module-specific parameters for reading.
- _add_protein_error_data(data, con) -> DataFrame
Add protein-level error rate data from split files.
- _add_transition_data(data, con) -> DataFrame
Add transition-level quantification data from split files.
- _check_alignment_file_exists() -> bool
Check if alignment parquet file exists for split parquet format.
For split parquet, the alignment file is at the parent directory level:
- infile is a directory containing *.oswpq subdirectories
- alignment file is at infile/feature_alignment.parquet
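The layout described above can be sketched with the standard library. This is a minimal illustration, not the method's actual implementation: `alignment_file_exists` is a hypothetical helper, and only the file name `feature_alignment.parquet` and its location directly under `infile` come from the description.

```python
import tempfile
from pathlib import Path


def alignment_file_exists(infile: str) -> bool:
    """Return True if infile/feature_alignment.parquet exists (hypothetical helper)."""
    return (Path(infile) / "feature_alignment.parquet").is_file()


# Demonstrate on a temporary directory standing in for the split parquet input.
with tempfile.TemporaryDirectory() as tmp:
    before = alignment_file_exists(tmp)  # no alignment file yet
    (Path(tmp) / "feature_alignment.parquet").touch()
    after = alignment_file_exists(tmp)  # alignment file now present

print(before, after)
```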
- _fetch_alignment_features(con) -> DataFrame
Fetch aligned features with good alignment scores from alignment parquet file.
This method checks for an alignment parquet file and retrieves features that have been aligned across runs and pass the alignment quality threshold. Only features whose reference feature passes the MS2 QVALUE threshold are included.
- Parameters:
con – DuckDB connection
- Returns:
DataFrame with aligned feature IDs that pass quality threshold
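The two filters described above can be illustrated in plain Python. The real method queries the alignment parquet file through a DuckDB connection; the sketch below only mirrors the filtering logic on in-memory rows. The function name, the keys `feature_id`, `reference_feature_id` and `alignment_pep`, the direction of the comparisons, and both threshold values are assumptions made for illustration.

```python
def aligned_feature_ids(alignment_rows, ms2_qvalues, max_alignment_pep, max_qvalue):
    """Keep aligned features that pass both filters (hypothetical sketch).

    alignment_rows: dicts with assumed keys feature_id, reference_feature_id,
                    alignment_pep.
    ms2_qvalues:    mapping of reference feature ID to its MS2 q-value.
    """
    kept = []
    for row in alignment_rows:
        ref_q = ms2_qvalues.get(row["reference_feature_id"])
        if ref_q is None:
            continue  # reference feature has no MS2 score, so it cannot pass
        if row["alignment_pep"] < max_alignment_pep and ref_q < max_qvalue:
            kept.append(row["feature_id"])
    return kept


rows = [
    {"feature_id": 11, "reference_feature_id": 1, "alignment_pep": 0.01},
    {"feature_id": 12, "reference_feature_id": 1, "alignment_pep": 0.90},
    {"feature_id": 21, "reference_feature_id": 2, "alignment_pep": 0.02},
]
ref_qvalues = {1: 0.001, 2: 0.20}

kept = aligned_feature_ids(rows, ref_qvalues, max_alignment_pep=0.5, max_qvalue=0.01)
print(kept)
```

Feature 12 is dropped by the alignment filter and feature 21 by the reference q-value filter, so only feature 11 remains.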
- _has_peptide_protein_global_scores() -> bool
Check if files contain peptide and protein global scores.
- _is_unscored_file() -> bool
Determine whether the files are unscored by checking for the presence of ‘SCORE_’ columns.
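A check of this kind can be sketched as a prefix test over column names. This is an assumption about the logic rather than the method's actual code: `is_unscored` is a hypothetical helper, and the column names in the example are illustrative.

```python
def is_unscored(columns) -> bool:
    """Return True when no column name starts with 'SCORE_' (hypothetical helper)."""
    return not any(name.startswith("SCORE_") for name in columns)


# Illustrative column lists, not taken from a real file.
without_scores = is_unscored(["FEATURE_ID", "VAR_XCORR_SHAPE"])
with_scores = is_unscored(["FEATURE_ID", "SCORE_MS2_SCORE", "VAR_XCORR_SHAPE"])

print(without_scores, with_scores)
```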
- _read_augmented_data(con) -> DataFrame
Read standard data augmented with IPF information from split files.
- _read_for_export_scored_report(con) -> DataFrame
Lightweight reader that returns the minimal scored-report columns from split Parquet files.
- _read_library_data(con) -> DataFrame
Read precursor data specifically for library generation. This does not include all columns of the standard output.
- _read_peptidoform_data(con) -> DataFrame
Read data with peptidoform IPF information from split files.
- _read_standard_data(con) -> DataFrame
Read standard OpenSWATH data without IPF from split files, optionally including aligned features.
- export_feature_scores(outfile: str, plot_callback)
Export feature scores from split Parquet directory for plotting.
Detects if SCORE columns exist and adjusts behavior:
- If SCORE columns exist: applies RANK==1 filtering and plots SCORE + VAR_ columns
- If SCORE columns don’t exist: plots only VAR_ columns
- Parameters:
outfile (str) – Path to the output PDF file.
plot_callback (callable) – Function to call for plotting each level’s data. Signature: plot_callback(df, outfile, level, append)
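As a rough illustration of the column selection and the callback signature, the sketch below picks the columns to plot and defines a callback that only records how it was called. `select_plot_columns`, the column names, the level names `"ms2"` and `"transition"`, and the reading of `append` as "add to an existing PDF" are all assumptions. The RANK==1 filtering is not shown.

```python
def select_plot_columns(columns):
    """Pick SCORE + VAR_ columns if any SCORE column exists, else VAR_ only."""
    has_scores = any(name.startswith("SCORE") for name in columns)
    prefixes = ("SCORE", "VAR_") if has_scores else ("VAR_",)
    return [name for name in columns if name.startswith(prefixes)]


calls = []


def plot_callback(df, outfile, level, append):
    """Stand-in callback matching the documented signature; it only logs calls."""
    calls.append((outfile, level, append))


scored_cols = select_plot_columns(["FEATURE_ID", "SCORE_MS2_SCORE", "VAR_XCORR_SHAPE"])
unscored_cols = select_plot_columns(["FEATURE_ID", "VAR_XCORR_SHAPE"])

# Hypothetical call pattern: one call per level, appending after the first.
plot_callback(None, "scores.pdf", "ms2", False)
plot_callback(None, "scores.pdf", "transition", True)

print(scored_cols, unscored_cols, calls)
```

A real callback would draw the distributions in `df` and write them to `outfile`; the recording version here just makes the call order easy to inspect.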