BaseWriter
- class pyprophet.io._base.BaseWriter(config: BaseIOConfig)[source]
Bases: ABC
Abstract base class for implementing writers that save results to various output formats.
- __eq__(other)
Return self==value.
- __hash__ = None
- __init__(config: BaseIOConfig)[source]
Initialize the writer with a given configuration.
- Parameters:
config (BaseIOConfig) – Configuration object containing output details.
- __repr__()
Return repr(self).
- __weakref__
list of weak references to the object (if defined)
- _execute_copy_query(conn, query: str, path: str) → None[source]
Execute a COPY query with the configured compression settings.
- _get_columns_to_keep(existing_cols: list[str], score_prefix: str) → list[str][source]
Get the columns to keep in the DataFrame by removing existing score columns. Mainly for Parquet files.
Note: this method does not remove any columns itself; it only returns the list of columns to keep, which excludes the existing score columns.
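The column-selection idea can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes that "existing score columns" are those whose names start with score_prefix, and the column names used are hypothetical.

```python
def get_columns_to_keep(existing_cols: list[str], score_prefix: str) -> list[str]:
    """Return the columns whose names do not start with score_prefix.

    Assumption: score columns are identified purely by a name prefix.
    The input list is left untouched; a new list is returned.
    """
    return [col for col in existing_cols if not col.startswith(score_prefix)]


# Hypothetical column names, for illustration only.
cols = ["PRECURSOR_ID", "RT", "SCORE_MS2_SCORE", "SCORE_MS2_PEP"]
kept = get_columns_to_keep(cols, "SCORE_MS2_")
print(kept)
```

The caller would then select only the returned columns before appending freshly computed scores, so stale score columns are not duplicated.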
- _get_parquet_row_count(con, target_file: str) → int[source]
Get the row count of a Parquet file.
- Parameters:
con – DuckDB connection object
target_file – Path to the Parquet file
- Returns:
Row count of the Parquet file
- Return type:
int
- _prepare_score_dataframe(df: DataFrame, level: str, prefix: str) → DataFrame[source]
Prepare the score DataFrame.
- _save_bin_weights(weights)[source]
Save the model weights to a binary file.
- Parameters:
weights – Model weights or trained object.
- _save_tsv_weights(weights)[source]
Save the model weights to a TSV file, ensuring no duplicate levels.
If weights for the current level already exist, they are removed before saving the new ones.
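The replace-then-save behaviour described above might look like the following sketch. It uses only the standard csv module; the file layout (a level column plus feature and weight columns) and the function name are assumptions made for illustration, not the actual format written by the library.

```python
import csv
import os
import tempfile


def save_tsv_weights(path, level, new_rows, fieldnames=("level", "feature", "weight")):
    """Drop any rows already stored for `level`, then write the new rows.

    Assumption: the TSV has a `level` column identifying which rows
    belong to which scoring level.
    """
    kept = []
    if os.path.exists(path):
        with open(path, newline="") as fh:
            reader = csv.DictReader(fh, delimiter="\t")
            # Keep only the rows that belong to other levels.
            kept = [row for row in reader if row["level"] != level]
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(fieldnames), delimiter="\t")
        writer.writeheader()
        writer.writerows(kept)
        writer.writerows({"level": level, **row} for row in new_rows)


# Hypothetical levels and feature names.
tmp_dir = tempfile.mkdtemp()
weights_path = os.path.join(tmp_dir, "weights.tsv")
save_tsv_weights(weights_path, "ms2", [{"feature": "var_a", "weight": 0.5}])
save_tsv_weights(weights_path, "ms1", [{"feature": "var_b", "weight": 0.1}])
save_tsv_weights(weights_path, "ms2", [{"feature": "var_a", "weight": 0.9}])

with open(weights_path, newline="") as fh:
    stored = list(csv.DictReader(fh, delimiter="\t"))
print(stored)
```

Saving the "ms2" weights a second time replaces the earlier "ms2" row instead of adding a duplicate, while the "ms1" row is preserved.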
- _summarize_gene_level(data: DataFrame, top_n: int, consistent_top: bool) → DataFrame[source]
Summarize to gene level using top N peptides.
- _summarize_peptide_level(data: DataFrame, top_n: int, consistent_top: bool) → DataFrame[source]
Summarize to peptide level using top N precursors.
- _summarize_precursor_level(data: DataFrame, _top_n: int, _consistent_top: bool) → DataFrame[source]
Create a precursor-level matrix. No summarization is needed; only the top peak group per precursor is selected.
- _summarize_protein_level(data: DataFrame, top_n: int, consistent_top: bool) → DataFrame[source]
Summarize to protein level using top N peptides.
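The general shape of a top-N summarization can be sketched in plain Python. This is an illustration under stated assumptions: the documentation does not say how the top N child intensities are combined, so the sketch sums them, ranks them per run, and does not model consistent_top. The record layout and identifiers are hypothetical.

```python
from collections import defaultdict


def summarize_top_n(records, top_n):
    """Sum the top_n highest child intensities per (run, parent) pair.

    Assumptions: each record is (run, parent, intensity); ranking is done
    independently within each run; aggregation is a plain sum.
    """
    grouped = defaultdict(list)
    for run, parent, intensity in records:
        grouped[(run, parent)].append(intensity)
    return {
        key: sum(sorted(values, reverse=True)[:top_n])
        for key, values in grouped.items()
    }


# Hypothetical peptide intensities rolled up to protein level.
records = [
    ("run1", "PROT_A", 100.0),
    ("run1", "PROT_A", 50.0),
    ("run1", "PROT_A", 10.0),
    ("run1", "PROT_B", 20.0),
    ("run2", "PROT_A", 80.0),
]
summary = summarize_top_n(records, top_n=2)
print(summary)
```

With top_n=2, the third and weakest PROT_A peptide in run1 is ignored, and a parent with fewer than two children simply uses what it has.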
- _validate_row_count_after_join(con, target_file: str, key_cols: str, join_on: str, prefix: str)[source]
Validates the row count after performing a join operation on a Parquet file.
This check matters because appending scores should not change the number of rows in the input Parquet file.
- Parameters:
con – DuckDB connection object
target_file – Path to the Parquet file
key_cols – The key columns for the join operation
join_on – The condition for the join operation
prefix – The prefix (table alias) used for the Parquet file in the query
- Raises:
RowCountMismatchError – If the row count of the resulting join doesn’t match the original row count.
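The row-count check can be illustrated with a small sketch. To stay self-contained it uses in-memory SQLite tables from the standard library as a stand-in for DuckDB reading a Parquet file; the function signature, table names, and the locally defined RowCountMismatchError are all hypothetical and differ from the method documented above.

```python
import sqlite3


class RowCountMismatchError(Exception):
    """Local stand-in for the error named in the documentation."""


def validate_row_count_after_join(con, base_table, score_table, join_col):
    """Raise if joining scores onto the base table changes its row count."""
    original = con.execute(f"SELECT COUNT(*) FROM {base_table}").fetchone()[0]
    joined = con.execute(
        f"SELECT COUNT(*) FROM {base_table} AS p "
        f"LEFT JOIN {score_table} AS s ON p.{join_col} = s.{join_col}"
    ).fetchone()[0]
    if joined != original:
        raise RowCountMismatchError(
            f"join produced {joined} rows, expected {original}"
        )
    return original


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE features (feature_id INTEGER)")
con.execute("CREATE TABLE scores (feature_id INTEGER, score REAL)")
con.executemany("INSERT INTO features VALUES (?)", [(1,), (2,), (3,)])
con.executemany("INSERT INTO scores VALUES (?, ?)", [(1, 0.9), (2, 0.4)])
ok_count = validate_row_count_after_join(con, "features", "scores", "feature_id")

# A duplicated key in the score table inflates the join result.
con.execute("INSERT INTO scores VALUES (1, 0.7)")
try:
    validate_row_count_after_join(con, "features", "scores", "feature_id")
    mismatch_detected = False
except RowCountMismatchError:
    mismatch_detected = True
print(ok_count, mismatch_detected)
```

A duplicated join key is the typical way such a mismatch arises, which is why the count is compared before and after the join.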
- _write_levels_context_pdf_report(data, stat_table, pi0)[source]
Write a PDF report for levels context.
- _write_pdf_report(result, pi0)[source]
Write a PDF report if the scoring results contain final statistics.
- clean_and_export_library(data: DataFrame) → DataFrame[source]
Clean the original DataFrame and export the library.
- Parameters:
data – Input DataFrame with library data
- export_quant_matrix(data: DataFrame) → DataFrame[source]
Export a quantification matrix at the specified level, with optional normalization.
- Parameters:
data – Input DataFrame with quantification data
- export_results(data: DataFrame)[source]
Save the results to the output file based on the export format.
- Parameters:
data – DataFrame containing the data to be exported