BaseWriter

class pyprophet.io._base.BaseWriter(config: BaseIOConfig)[source]

Bases: ABC

Abstract base class for implementing writers that save results to various output formats.

__eq__(other): Return self==value.

__hash__ = None

__init__(config: BaseIOConfig)[source]

Initialize the writer with a given configuration.

Parameters: config (BaseIOConfig) – Configuration object containing output details.

__post_init__()[source]: Post-initialization hook that sets up variables for the IO-specific configuration.

__repr__(): Return repr(self).

__weakref__: list of weak references to the object (if defined)

_execute_copy_query(conn, query: str, path: str) → None[source]: Execute a COPY query using the configured compression settings.

_get_columns_to_keep(existing_cols: list[str], score_prefix: str) → list[str][source]

Get the columns to keep in the DataFrame by removing existing score columns. Mainly for Parquet files.

Note: this method does not remove any columns itself. It only returns the list of columns to keep, which excludes the existing score columns.
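The filtering described above can be sketched in a few lines. This is a minimal illustration, not the pyprophet implementation: it assumes score columns are recognised by a simple `startswith` match on the prefix, and both the function name `columns_to_keep` and the column names are hypothetical.

```python
def columns_to_keep(existing_cols: list[str], score_prefix: str) -> list[str]:
    """Return the columns that do not start with score_prefix, preserving order."""
    # Assumption: a score column is any column whose name starts with the prefix.
    return [col for col in existing_cols if not col.startswith(score_prefix)]


# Hypothetical column names, for illustration only.
cols = ["PRECURSOR_ID", "FEATURE_ID", "SCORE_MS2_SCORE", "SCORE_MS2_QVALUE"]
kept = columns_to_keep(cols, "SCORE_MS2_")
print(kept)  # ['PRECURSOR_ID', 'FEATURE_ID']
```

The input list is left untouched, which mirrors the note above: the caller decides what to do with the returned list.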

_get_parquet_row_count(con, target_file: str) → int[source]

Get the row count of a Parquet file.

Parameters:

con – DuckDB connection object
target_file – Path to the Parquet file

Returns:

Row count of the Parquet file

Return type:

int

_median_median_normalize(matrix: DataFrame) → DataFrame[source]: Median of medians normalization

_median_normalize(matrix: DataFrame) → DataFrame[source]: Median normalization (per sample)
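As a rough illustration of median-based normalization, the sketch below shifts every sample so that its median lands on a common target, here the median of the per-sample medians. It is an assumption about the general technique, not a description of what `_median_normalize` or `_median_median_normalize` does internally: it uses plain dictionaries instead of a DataFrame, assumes log-scale intensities (so the correction is additive), and ignores missing values. The function name `median_normalize` is hypothetical.

```python
from statistics import median


def median_normalize(matrix: dict[str, list[float]]) -> dict[str, list[float]]:
    """Shift each sample so its median equals the median of all sample medians."""
    sample_medians = {sample: median(values) for sample, values in matrix.items()}
    target = median(sample_medians.values())
    # Additive shift: appropriate for log-transformed intensities (assumption).
    return {
        sample: [x - sample_medians[sample] + target for x in values]
        for sample, values in matrix.items()
    }


# Hypothetical log-intensities for three runs.
matrix = {
    "run_1": [10.0, 12.0, 14.0],
    "run_2": [11.0, 13.0, 15.0],
    "run_3": [13.0, 15.0, 17.0],
}
normalized = median_normalize(matrix)
print(normalized["run_3"])  # [11.0, 13.0, 15.0]
```

After the shift, all three runs share the same median (13.0), while the spread within each run is unchanged.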

_prepare_score_dataframe(df: DataFrame, level: str, prefix: str) → DataFrame[source]: Prepare the score DataFrame

_quantile_normalize(matrix: DataFrame) → DataFrame[source]: Quantile normalization
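Quantile normalization in its textbook form replaces each value with the mean of the values that share its rank across samples. The sketch below shows that idea on plain dictionaries; it assumes complete data (no missing values) and breaks ties by original position, and it may differ from how `_quantile_normalize` handles those cases. The function name `quantile_normalize` is hypothetical.

```python
def quantile_normalize(matrix: dict[str, list[float]]) -> dict[str, list[float]]:
    """Give every sample the same distribution: the mean of the sorted samples."""
    n_rows = len(next(iter(matrix.values())))
    sorted_cols = [sorted(values) for values in matrix.values()]
    # Mean of the k-th smallest value across samples, for every rank k.
    rank_means = [
        sum(col[k] for col in sorted_cols) / len(sorted_cols) for k in range(n_rows)
    ]
    result = {}
    for sample, values in matrix.items():
        # Row indices ordered from smallest to largest value in this sample.
        order = sorted(range(n_rows), key=lambda i: values[i])
        normalized = [0.0] * n_rows
        for rank, row in enumerate(order):
            normalized[row] = rank_means[rank]
        result[sample] = normalized
    return result


matrix = {"run_1": [5.0, 2.0, 3.0], "run_2": [4.0, 1.0, 6.0]}
normalized = quantile_normalize(matrix)
print(normalized)  # {'run_1': [5.5, 1.5, 3.5], 'run_2': [3.5, 1.5, 5.5]}
```

Each run keeps its own ordering of rows, but the set of values in every run becomes identical.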

_save_bin_weights(weights)[source]

Save the model weights to a binary file.

Parameters: weights – Model weights or trained object.

_save_tsv_weights(weights)[source]

Save the model weights to a TSV file, ensuring no duplicate levels.

If weights for the current level already exist, they are removed before saving the new ones.
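The replace-then-append behaviour described above can be sketched with the standard `csv` module. This is an illustration under stated assumptions: the three-column layout (`level`, `feature`, `weight`), the standalone function `save_tsv_weights`, and its `level` argument are all hypothetical, and the real method presumably takes the level and output path from the writer's configuration.

```python
import csv
import os
import tempfile


def save_tsv_weights(path: str, level: str, weights: dict[str, float]) -> None:
    """Write weights for `level`, dropping any rows already stored for that level."""
    fieldnames = ["level", "feature", "weight"]  # hypothetical layout
    rows = []
    if os.path.exists(path):
        with open(path, newline="") as fh:
            # Keep only rows that belong to other levels.
            rows = [r for r in csv.DictReader(fh, delimiter="\t") if r["level"] != level]
    rows.extend({"level": level, "feature": f, "weight": w} for f, w in weights.items())
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)


path = os.path.join(tempfile.mkdtemp(), "weights.tsv")
save_tsv_weights(path, "ms2", {"var_a": 0.5})
save_tsv_weights(path, "ms1", {"var_a": 0.1})
save_tsv_weights(path, "ms2", {"var_a": 0.9})  # replaces the earlier ms2 row

with open(path, newline="") as fh:
    saved = list(csv.DictReader(fh, delimiter="\t"))
```

After the third call the file holds one row per level, and the `ms2` row carries the most recent weight.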

_summarize_gene_level(data: DataFrame, top_n: int, consistent_top: bool) → DataFrame[source]: Summarize to gene level using top N peptides.

_summarize_peptide_level(data: DataFrame, top_n: int, consistent_top: bool) → DataFrame[source]: Summarize to peptide level using top N precursors.

_summarize_precursor_level(data: DataFrame, _top_n: int, _consistent_top: bool) → DataFrame[source]: Create the precursor-level matrix. No summarization is needed; the top peak group is selected for each precursor.

_summarize_protein_level(data: DataFrame, top_n: int, consistent_top: bool) → DataFrame[source]: Summarize to protein level using top N peptides.
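The summarization methods above share a top-N pattern: for each higher-level entity, keep the N strongest lower-level members and combine them. The sketch below illustrates that pattern only. It assumes the top members are chosen per run by intensity and combined by summing, it does not model `consistent_top`, and it uses tuples rather than a DataFrame; how pyprophet ranks and aggregates is not stated on this page. The function name `summarize_top_n` and the identifiers in the data are hypothetical.

```python
from collections import defaultdict


def summarize_top_n(rows, top_n):
    """rows: (group, member, run, intensity) tuples -> {group: {run: intensity}}."""
    by_group_run = defaultdict(list)
    for group, _member, run, intensity in rows:
        by_group_run[(group, run)].append(intensity)

    summary = defaultdict(dict)
    for (group, run), intensities in by_group_run.items():
        # Keep the top_n largest intensities and sum them (assumed aggregation).
        top = sorted(intensities, reverse=True)[:top_n]
        summary[group][run] = sum(top)
    return {group: dict(runs) for group, runs in summary.items()}


# Hypothetical protein/peptide intensities.
rows = [
    ("P1", "pepA", "run_1", 100.0),
    ("P1", "pepB", "run_1", 50.0),
    ("P1", "pepC", "run_1", 10.0),
    ("P1", "pepA", "run_2", 80.0),
    ("P1", "pepB", "run_2", 90.0),
    ("P2", "pepD", "run_1", 30.0),
]
protein_matrix = summarize_top_n(rows, top_n=2)
print(protein_matrix)
# {'P1': {'run_1': 150.0, 'run_2': 170.0}, 'P2': {'run_1': 30.0}}
```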

_validate_row_count_after_join(con, target_file: str, key_cols: str, join_on: str, prefix: str)[source]

Validates the row count after performing a join operation on a Parquet file.

This check matters because appending scores should never change the number of rows in the input Parquet file.

Parameters:

con – DuckDB connection object
target_file – Path to the Parquet file
key_cols – The key columns for the join operation
join_on – The condition for the join operation
prefix – The prefix (table alias) used for the Parquet file in the query

Raises:

RowCountMismatchError – If the row count of the resulting join doesn’t match the original row count.
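To show why this check is useful, the sketch below reproduces the failure mode it guards against: a duplicated key in the scores table makes the join fan out and return more rows than the input. It uses the standard-library `sqlite3` module and in-memory tables purely so that it runs without extra dependencies, whereas the actual method takes a DuckDB connection and a Parquet file. The function signature, the table names, and the local `RowCountMismatchError` class are stand-ins, not pyprophet's own definitions.

```python
import sqlite3


class RowCountMismatchError(Exception):
    """Local stand-in for the exception named above."""


def validate_row_count_after_join(con, target: str, scores: str, join_on: str) -> int:
    """Raise if joining `scores` onto `target` changes the number of rows."""
    # Table names and the join condition are trusted identifiers in this sketch.
    original = con.execute(f"SELECT COUNT(*) FROM {target} AS p").fetchone()[0]
    joined = con.execute(
        f"SELECT COUNT(*) FROM {target} AS p LEFT JOIN {scores} AS s ON {join_on}"
    ).fetchone()[0]
    if joined != original:
        raise RowCountMismatchError(f"join produced {joined} rows, expected {original}")
    return original


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE features (feature_id INTEGER)")
con.executemany("INSERT INTO features VALUES (?)", [(1,), (2,), (3,)])
con.execute("CREATE TABLE scores (feature_id INTEGER, score REAL)")
con.executemany("INSERT INTO scores VALUES (?, ?)", [(1, 0.9), (2, 0.8), (3, 0.7)])

join_on = "p.feature_id = s.feature_id"
row_count = validate_row_count_after_join(con, "features", "scores", join_on)

# A duplicated key makes the join return four rows instead of three.
con.execute("INSERT INTO scores VALUES (3, 0.1)")
try:
    validate_row_count_after_join(con, "features", "scores", join_on)
    mismatch_detected = False
except RowCountMismatchError:
    mismatch_detected = True
```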

_write_levels_context_pdf_report(data, stat_table, pi0)[source]: Write a PDF report for levels context.

_write_pdf_report(result, pi0)[source]: Write a PDF report if the scoring results contain final statistics.

clean_and_export_library(data: DataFrame) → DataFrame[source]

Clean the original DataFrame and export the library.

Parameters: data – Input DataFrame with library data

export_quant_matrix(data: DataFrame) → DataFrame[source]

Export quantification matrix at specified level with optional normalization.

Parameters: data – Input DataFrame with quantification data

export_results(data: DataFrame)[source]

Save the results to the output file based on the export format.

Parameters: data – DataFrame containing the data to be exported

abstract save_results(result, pi0)[source]

Abstract method to save scoring results and statistical outputs.

Parameters:

result – The result object containing scoring tables.
pi0 – Estimated pi0 value from FDR statistics.
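Because `save_results` is abstract, a concrete writer has to override it before it can be instantiated. The sketch below shows that contract with a self-contained stand-in class so that it runs without pyprophet installed; `WriterBase`, `InMemoryWriter`, and the dictionary used as a config are hypothetical, and a real subclass would derive from `BaseWriter` and receive a `BaseIOConfig`.

```python
from abc import ABC, abstractmethod


class WriterBase(ABC):
    """Stand-in for BaseWriter with only the parts needed for this illustration."""

    def __init__(self, config):
        self.config = config

    @abstractmethod
    def save_results(self, result, pi0):
        """Persist scoring results; concrete writers must implement this."""


class InMemoryWriter(WriterBase):
    """Hypothetical writer that keeps results in a list instead of a file."""

    def __init__(self, config):
        super().__init__(config)
        self.saved = []

    def save_results(self, result, pi0):
        self.saved.append((result, pi0))


# The abstract base cannot be instantiated directly.
try:
    WriterBase(config={})
    base_instantiable = True
except TypeError:
    base_instantiable = False

writer = InMemoryWriter(config={"outfile": "scores.tsv"})
writer.save_results(result={"n_rows": 3}, pi0=0.25)
```

Keeping the format-specific logic in the subclass is what lets the shared helpers on the base class stay independent of the output format.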

save_scorer(scorer)[source]

Persist a scorer object when the backend supports it.

The default implementation is a no-op.

save_weights(weights)[source]

Abstract method to save model weights (e.g., LDA coefficients, XGBoost model).

Parameters: weights – Model weights or trained object.