ParquetReader

class pyprophet.io.export.parquet.ParquetReader(config: ExportIOConfig)

Bases: BaseParquetReader

Class for reading and processing data from an OpenSWATH workflow Parquet-based file. Extends the base reader with export functionality.

__init__(config: ExportIOConfig)

Initialize the reader with a given configuration.

Parameters:

config (BaseIOConfig) – Configuration object containing the input details and the module-specific parameters used for reading.

_add_peptide_data(data, con) → DataFrame

Add peptide-level error rate data.

_add_protein_data(data, con) → DataFrame

Add protein identifier data.

_add_protein_error_data(data, con) → DataFrame

Add protein-level error rate data.

_add_transition_data(data, con) → DataFrame

Add transition-level quantification data.

_augment_data(data, con) → DataFrame

Apply common data augmentations to the base dataset.

_build_feature_vars_sql() → str

Build SQL fragment for feature variables.

_check_alignment_file_exists() → bool

Check if alignment parquet file exists.

_fetch_alignment_features(con) → DataFrame

Fetch aligned features with good alignment scores from the alignment Parquet file.

This method checks for an alignment parquet file and retrieves features that have been aligned across runs and pass the alignment quality threshold. Only features whose reference feature passes the MS2 QVALUE threshold are included.

Parameters:

con – DuckDB connection

Returns:

DataFrame with aligned feature IDs that pass quality threshold
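The two filters described above can be sketched in plain Python. This is only an illustration of the selection logic, not the reader's actual DuckDB query: the function name, the dictionary keys (`feature_id`, `reference_feature_id`, `alignment_pep`), and the assumption that a lower alignment score is better are all hypothetical.

```python
def select_aligned_feature_ids(alignment_rows, reference_qvalues,
                               max_alignment_pep, max_ms2_qvalue):
    """Return IDs of aligned features that pass both filters.

    alignment_rows: iterable of dicts with the hypothetical keys
        'feature_id', 'reference_feature_id' and 'alignment_pep'.
    reference_qvalues: dict mapping a reference feature ID to its MS2 q-value.
    """
    kept = []
    for row in alignment_rows:
        # Filter 1: the alignment itself must be good enough
        # (assumed here to mean a score at or below the cutoff).
        if row["alignment_pep"] > max_alignment_pep:
            continue
        # Filter 2: the reference feature must pass the MS2 q-value threshold.
        qvalue = reference_qvalues.get(row["reference_feature_id"])
        if qvalue is None or qvalue > max_ms2_qvalue:
            continue
        kept.append(row["feature_id"])
    return kept


# Illustrative data only; none of these values come from a real file.
rows = [
    {"feature_id": 11, "reference_feature_id": 1, "alignment_pep": 0.10},
    {"feature_id": 12, "reference_feature_id": 1, "alignment_pep": 0.90},
    {"feature_id": 13, "reference_feature_id": 2, "alignment_pep": 0.05},
]
reference_qvalues = {1: 0.01, 2: 0.20}

aligned_ids = select_aligned_feature_ids(
    rows, reference_qvalues, max_alignment_pep=0.7, max_ms2_qvalue=0.05
)
```

With this toy input, feature 12 is dropped for its poor alignment score and feature 13 because its reference feature fails the q-value cutoff.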

_get_ms1_score_info() → tuple[str, str]

Get MS1 score information if available.

_is_unscored_file() → bool

Check whether the file is unscored by looking for ‘SCORE_’ columns; a file without any such columns is treated as unscored.
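As a rough illustration of this kind of check, the sketch below tests a list of column names for the 'SCORE_' prefix. The helper name `looks_unscored` and the example column names are hypothetical; the actual method may inspect the Parquet schema differently.

```python
def looks_unscored(columns):
    """Return True when no column name starts with 'SCORE_'.

    Hypothetical helper for illustration; not part of pyprophet.
    """
    return not any(name.startswith("SCORE_") for name in columns)


# Example column names are assumptions, chosen only to show both outcomes.
unscored_columns = ["FEATURE_ID", "VAR_XCORR_SHAPE", "VAR_LOG_SN_SCORE"]
scored_columns = unscored_columns + ["SCORE_MS2_SCORE", "SCORE_MS2_Q_VALUE"]
```

Note that the match is on the prefix, so a column such as `VAR_LOG_SN_SCORE` does not count as a score column in this sketch.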

_read_augmented_data(con) → DataFrame

Read standard data augmented with IPF information.

_read_for_export_scored_report(con) → DataFrame

Lightweight reader that returns the minimal scored-report columns from a Parquet file.

_read_peptidoform_data(con) → DataFrame

Read data with peptidoform IPF information.

_read_standard_data(con) → DataFrame

Read standard OpenSWATH data without IPF, optionally including aligned features.

_read_unscored_data(con) → DataFrame

Read unscored data from Parquet files.

export_feature_scores(outfile: str, plot_callback)

Export feature scores from Parquet file for plotting.

Detects whether SCORE columns exist and adjusts behavior:

  • If SCORE columns exist: applies RANK==1 filtering and plots SCORE + VAR_ columns.

  • If SCORE columns don’t exist: plots only VAR_ columns.

Parameters:
  • outfile (str) – Path to the output PDF file.

  • plot_callback (callable) – Function to call for plotting each level’s data. Signature: plot_callback(df, outfile, level, append)
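A minimal sketch of the two pieces described above: choosing which columns to plot, and a callback with the documented signature. Everything here is an assumption made for illustration: the helper `columns_to_plot`, the `SCORE_` prefix match, the level names, and the reading of `append` as "add to an existing output file". The stand-in callback records its arguments instead of plotting, and a list of dicts stands in for the DataFrame.

```python
def columns_to_plot(columns):
    """Return (has_scores, selected_columns).

    SCORE_ and VAR_ columns are selected when SCORE_ columns exist,
    otherwise only VAR_ columns. Hypothetical helper, not pyprophet API.
    """
    score_cols = [c for c in columns if c.startswith("SCORE_")]
    var_cols = [c for c in columns if c.startswith("VAR_")]
    return bool(score_cols), score_cols + var_cols


calls = []


def recording_callback(df, outfile, level, append):
    """Stand-in with the documented signature; records calls instead of plotting."""
    calls.append(
        {"outfile": outfile, "level": level, "append": append, "n_rows": len(df)}
    )


# Level names and file name are illustrative only.
recording_callback([{"VAR_A": 1.0}], "scores.pdf", "ms2", append=False)
recording_callback(
    [{"VAR_A": 2.0}, {"VAR_A": 3.0}], "scores.pdf", "transition", append=True
)
```

A recording callback like this can be handy for checking which levels are passed to the callback before wiring in a real plotting function.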

read() → DataFrame

Main entry point for reading Parquet data.