BaseWriter

class pyprophet.io._base.BaseWriter(config: BaseIOConfig)[source]

Bases: ABC

Abstract base class for implementing writers that save results to various output formats.

__eq__(other)

Return self==value.

__hash__ = None

__init__(config: BaseIOConfig)[source]

Initialize the writer with a given configuration.

Parameters:

config (BaseIOConfig) – Configuration object containing output details.

__post_init__()[source]

Post-initialization hook that sets up variables for the IO-specific configuration.

__repr__()

Return repr(self).

__weakref__

list of weak references to the object (if defined)

_execute_copy_query(conn, query: str, path: str) → None[source]

Execute a COPY query using the configured compression settings.

_get_columns_to_keep(existing_cols: list[str], score_prefix: str) → list[str][source]

Get the columns to keep in the DataFrame by removing existing score columns. Mainly for Parquet files.

Note: this method does not remove any columns itself; it only returns the list of columns to keep, which excludes the existing score columns.
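As a rough illustration of the behaviour described above, the following sketch filters a column list by prefix. The function name `columns_to_keep`, the example column names, and the assumption that a score column is any column whose name starts with `score_prefix` are all hypothetical; the actual matching rule used by `_get_columns_to_keep` may differ.

```python
def columns_to_keep(existing_cols: list[str], score_prefix: str) -> list[str]:
    """Return the columns whose names do not start with score_prefix."""
    # Assumption: a "score column" is identified purely by its name prefix.
    return [col for col in existing_cols if not col.startswith(score_prefix)]


# Hypothetical column names, chosen only for the example.
cols = ["PRECURSOR_ID", "RT", "SCORE_MS2_SCORE", "SCORE_MS2_QVALUE"]
kept = columns_to_keep(cols, "SCORE_MS2_")
print(kept)
```

The input list is left unchanged, which mirrors the note above: the caller decides what to do with the returned list.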

_get_parquet_row_count(con, target_file: str) → int[source]

Get the row count of a Parquet file.

Parameters:
  • con – DuckDB connection object

  • target_file – Path to the Parquet file

Returns:

Row count of the Parquet file

Return type:

int

_median_median_normalize(matrix: DataFrame) → DataFrame[source]

Median-of-medians normalization.

_median_normalize(matrix: DataFrame) → DataFrame[source]

Median normalization (per sample).
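The documentation does not spell out the formulas behind these two methods. The sketch below shows one common variant of median-based normalization, under these assumptions: intensities are on a log scale, there are no missing values, and each sample is shifted so that its median equals the median of the per-sample medians. The function name `median_normalize` and the dict-of-lists layout (used here in place of a DataFrame) are hypothetical.

```python
from statistics import median


def median_normalize(matrix: dict[str, list[float]]) -> dict[str, list[float]]:
    """Shift every sample so that all samples share the same median."""
    # Per-sample medians (one value per column of the matrix).
    sample_medians = {sample: median(values) for sample, values in matrix.items()}
    # Common target: the median of the per-sample medians.
    target = median(sample_medians.values())
    return {
        sample: [x - sample_medians[sample] + target for x in values]
        for sample, values in matrix.items()
    }


# Two hypothetical runs with log-scale intensities.
normalized = median_normalize({"run1": [1.0, 2.0, 3.0], "run2": [3.0, 4.0, 5.0]})
print(normalized)
```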

_prepare_score_dataframe(df: DataFrame, level: str, prefix: str) → DataFrame[source]

Prepare the score DataFrame.

_quantile_normalize(matrix: DataFrame) → DataFrame[source]

Quantile normalization.
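For readers unfamiliar with the technique, here is a minimal sketch of textbook quantile normalization: each value is replaced by the mean, across samples, of the values that share its rank. It assumes equal-length samples with no missing values and breaks ties by position; how `_quantile_normalize` handles those cases is not stated here. The function name and the dict-of-lists layout are hypothetical.

```python
def quantile_normalize(matrix: dict[str, list[float]]) -> dict[str, list[float]]:
    """Give every sample the same value distribution (the rank-wise mean)."""
    samples = list(matrix)
    n_rows = len(matrix[samples[0]])
    # Sort each sample independently, then average across samples per rank.
    sorted_cols = [sorted(matrix[s]) for s in samples]
    rank_means = [
        sum(col[k] for col in sorted_cols) / len(samples) for k in range(n_rows)
    ]
    result = {}
    for s in samples:
        # Row indices ordered from smallest to largest value in this sample.
        order = sorted(range(n_rows), key=lambda i: matrix[s][i])
        normalized = [0.0] * n_rows
        for rank, row in enumerate(order):
            normalized[row] = rank_means[rank]
        result[s] = normalized
    return result


qn = quantile_normalize({"run1": [5.0, 2.0, 3.0], "run2": [4.0, 1.0, 6.0]})
print(qn)
```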

_save_bin_weights(weights)[source]

Save the model weights to a binary file.

Parameters:

weights – Model weights or trained object.

_save_tsv_weights(weights)[source]

Save the model weights to a TSV file, ensuring no duplicate levels.

If weights for the current level already exist, they are removed before saving the new ones.
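The replace-then-append behaviour can be sketched as follows. This is an illustration only: the column names (`level`, `feature`, `weight`), the function name `replace_level_weights`, and the use of in-memory strings instead of a file on disk are assumptions, not the layout pyprophet actually writes.

```python
import csv
import io


def replace_level_weights(
    existing_tsv: str, level: str, new_rows: list[dict[str, str]]
) -> str:
    """Drop rows for `level` from the TSV text, then append the new rows."""
    reader = csv.DictReader(io.StringIO(existing_tsv), delimiter="\t")
    # Fall back to the assumed header if the existing text is empty.
    fieldnames = reader.fieldnames or ["level", "feature", "weight"]
    # Keep only weights that belong to other levels.
    kept = [row for row in reader if row["level"] != level]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(kept + new_rows)
    return out.getvalue()


existing = "level\tfeature\tweight\nms1\tvar_a\t0.1\nms2\tvar_b\t0.2\n"
updated = replace_level_weights(
    existing, "ms2", [{"level": "ms2", "feature": "var_b", "weight": "0.9"}]
)
print(updated)
```

Filtering before appending keeps exactly one set of weights per level, however many times the same level is saved.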

_summarize_gene_level(data: DataFrame, top_n: int, consistent_top: bool) → DataFrame[source]

Summarize to gene level using top N peptides.

_summarize_peptide_level(data: DataFrame, top_n: int, consistent_top: bool) → DataFrame[source]

Summarize to peptide level using top N precursors.

_summarize_precursor_level(data: DataFrame, _top_n: int, _consistent_top: bool) → DataFrame[source]

Create the precursor-level matrix. No summarization is needed; the top peak group is selected for each precursor.

_summarize_protein_level(data: DataFrame, top_n: int, consistent_top: bool) → DataFrame[source]

Summarize to protein level using top N peptides.
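The general idea of top-N summarization can be sketched as below. Several details are assumptions rather than documented behaviour: the sketch sums the N most intense peptides per protein and run (a mean is another plausible choice), it does not model the `consistent_top` option, and the tuple-based record layout and function name are hypothetical.

```python
from collections import defaultdict


def summarize_top_n(
    records: list[tuple[str, str, str, float]], top_n: int
) -> dict[tuple[str, str], float]:
    """Sum the top_n highest peptide intensities per (protein, run)."""
    grouped: dict[tuple[str, str], list[float]] = defaultdict(list)
    for protein, run, _peptide, intensity in records:
        grouped[(protein, run)].append(intensity)
    # Assumption: aggregation is a plain sum over the top_n intensities.
    return {
        key: sum(sorted(values, reverse=True)[:top_n])
        for key, values in grouped.items()
    }


# Hypothetical records: (protein, run, peptide, intensity).
records = [
    ("P1", "run1", "pepA", 10.0),
    ("P1", "run1", "pepB", 30.0),
    ("P1", "run1", "pepC", 20.0),
    ("P2", "run1", "pepD", 5.0),
]
summary = summarize_top_n(records, top_n=2)
print(summary)
```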

_validate_row_count_after_join(con, target_file: str, key_cols: str, join_on: str, prefix: str)[source]

Validates the row count after performing a join operation on a Parquet file.

This check matters because appending scores should not change the number of rows in the input Parquet file.

Parameters:
  • con – DuckDB connection object

  • target_file – Path to the Parquet file

  • key_cols – The key columns for the join operation

  • join_on – The condition for the join operation

  • prefix – The prefix (table alias) used for the Parquet file in the query

Raises:

RowCountMismatchError – If the row count of the resulting join doesn’t match the original row count.
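The check can be illustrated with a small, self-contained sketch. To keep it runnable with the standard library alone, it uses `sqlite3` tables in place of DuckDB and Parquet files, defines a local stand-in for `RowCountMismatchError`, and uses hypothetical table and column names; the real method's query is not shown in this documentation.

```python
import sqlite3


class RowCountMismatchError(Exception):
    """Local stand-in for the exception named above."""


def validate_row_count_after_join(con, target: str, scores: str, key_col: str) -> int:
    """Raise if joining `scores` onto `target` changes the row count."""
    # Identifiers are interpolated directly, so they must come from trusted code.
    before = con.execute(f"SELECT COUNT(*) FROM {target}").fetchone()[0]
    after = con.execute(
        f"SELECT COUNT(*) FROM {target} AS p "
        f"LEFT JOIN {scores} AS s ON p.{key_col} = s.{key_col}"
    ).fetchone()[0]
    if before != after:
        raise RowCountMismatchError(
            f"join changed row count from {before} to {after}"
        )
    return after


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE features (feature_id INTEGER, rt REAL)")
con.executemany("INSERT INTO features VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
con.execute("CREATE TABLE scores (feature_id INTEGER, score REAL)")
con.executemany("INSERT INTO scores VALUES (?, ?)", [(1, 0.9), (2, 0.8)])

row_count = validate_row_count_after_join(con, "features", "scores", "feature_id")
print(row_count)
```

A duplicated key in the scores table is the typical way such a join gains rows, and it is the case this kind of check is meant to catch.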

_write_levels_context_pdf_report(data, stat_table, pi0)[source]

Write a PDF report for levels context.

_write_pdf_report(result, pi0)[source]

Write a PDF report if the scoring results contain final statistics.

clean_and_export_library(data: DataFrame) → DataFrame[source]

Clean the original DataFrame and export the library.

Parameters:

data – Input DataFrame with library data

export_quant_matrix(data: DataFrame) → DataFrame[source]

Export quantification matrix at specified level with optional normalization.

Parameters:

data – Input DataFrame with quantification data

export_results(data: DataFrame)[source]

Save the results to the output file based on the export format.

Parameters:

data – DataFrame containing the data to be exported

abstract save_results(result, pi0)[source]

Abstract method to save scoring results and statistical outputs.

Parameters:
  • result – The result object containing scoring tables.

  • pi0 – Estimated pi0 value from FDR statistics.
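Because `save_results` is abstract, a concrete writer has to implement it before it can be instantiated. The sketch below shows that pattern with a hypothetical stand-in base class (`WriterBase`) so that it runs without pyprophet installed; the config dict, the `InMemoryWriter` class, and the shape of `result` are illustrative assumptions, not part of the library.

```python
from abc import ABC, abstractmethod


class WriterBase(ABC):
    """Hypothetical stand-in for an abstract writer base class."""

    def __init__(self, config: dict):
        self.config = config

    @abstractmethod
    def save_results(self, result, pi0):
        """Persist scoring results and the estimated pi0."""


class InMemoryWriter(WriterBase):
    """Toy writer that records what it was asked to save."""

    def __init__(self, config: dict):
        super().__init__(config)
        self.saved = []

    def save_results(self, result, pi0):
        self.saved.append((result, pi0))


writer = InMemoryWriter({"outfile": "scores.tsv"})
writer.save_results({"scored_tables": []}, 0.25)
print(writer.saved)
```

Instantiating the base class directly raises `TypeError`, which is how Python's `abc` machinery enforces that subclasses provide the method.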

save_weights(weights)[source]

Abstract method to save model weights (e.g., LDA coefficients, XGBoost model).

Parameters:

weights – Model weights or trained object.