Command Line Interface

pyprophet is the main command-line interface for PyProphet, with subcommands for scoring, IPF, inference at different biological levels and experimental contexts, exporting, merging, and other utility functions.

pyprophet

PyProphet: Semi-supervised learning and scoring of OpenSWATH results.

Visit http://openswath.org for usage instructions and help.

pyprophet [OPTIONS] COMMAND [ARGS]...

Options

--version

Show the version and exit.

--log-level <log_level>

Set global logging level.

Options:

TRACE | DEBUG | INFO | SUCCESS | WARNING | ERROR | CRITICAL

--log-colorize, --no-log-colorize

Turn on/off colorized logging output.

--helphelp

Show advanced help with all options.

Semi-Supervised Scoring of Peak-Groups

PyProphet scores peak groups with the score subcommand. It re-implements the original mProphet algorithm, a semi-supervised machine-learning approach for scoring peak groups in SRM mass spectrometry data.

pyprophet score

Conduct semi-supervised learning and error-rate estimation for MS1, MS2 and transition-level data.

Note

When using --classifier HistGradientBoosting, the OMP_NUM_THREADS environment variable controls OpenMP thread usage to avoid CPU oversubscription. The CLI sets it automatically if it is not already specified, but for best control and performance, set it explicitly before launching pyprophet.

For example, if your machine has 20 CPU threads and you want to use 3 threads for semi-supervised learning, set OMP_NUM_THREADS to 7 (ceil(20/3)):

Example

export OMP_NUM_THREADS=7
pyprophet score --in input.osw --classifier HistGradientBoosting --threads 3

# Or skip the export and let the CLI set OMP_NUM_THREADS automatically:
pyprophet score --in input.osw --classifier HistGradientBoosting --threads 3

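Rather than hard-coding the value, you can derive it from the machine's CPU count. The sketch below computes ceil(total CPU threads / --threads) with integer arithmetic. TOTAL_CPUS and SS_THREADS are illustrative variable names; on Linux, TOTAL_CPUS could come from nproc (GNU coreutils) instead of being fixed at 20.

```shell
#!/bin/sh
# Illustrative values matching the example above: 20 CPU threads, --threads 3.
TOTAL_CPUS=20   # on Linux you could use: TOTAL_CPUS=$(nproc)
SS_THREADS=3    # value passed to pyprophet score --threads

# Integer ceiling division: ceil(a / b) == (a + b - 1) / b for positive a, b.
OMP_NUM_THREADS=$(( (TOTAL_CPUS + SS_THREADS - 1) / SS_THREADS ))
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"   # prints OMP_NUM_THREADS=7
```

Export the value in the same shell session before launching pyprophet score with --classifier HistGradientBoosting --threads "$SS_THREADS".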
pyprophet score [OPTIONS]

Options

--helphelp

Show advanced help with all options.

--in <infile>

Required PyProphet input file. Valid formats are .osw, .parquet and .tsv.

--out <outfile>

PyProphet output file. Valid formats are .osw, .parquet and .tsv. Must be the same format as input file.

--subsample_ratio <subsample_ratio>

Subsampling ratio for large data. Use a value below 1.0 to subsample precursors for semi-supervised learning; the learned weights are then applied to the full data set. When set to 1.0 (default) and the input has more than 20 runs, auto-subsampling to 1/N is applied (N = number of runs). Set to -1.0 to disable auto-subsampling and use the full data.

Default:

1.0

--classifier <classifier>

Classifier used for semi-supervised learning: “LDA”, “SVM”, “XGBoost” or “HistGradientBoosting”.

Default:

'LDA'

Options:

LDA | SVM | XGBoost | HistGradientBoosting

--apply_weights <apply_weights>

Apply PyProphet score weights file (.csv/.bin) instead of semi-supervised learning.

--xeval_num_iter <xeval_num_iter>

Number of iterations for cross-validation of semi-supervised learning step.

Default:

10

--ss_num_iter <ss_num_iter>

Number of iterations for semi-supervised learning step.

Default:

10

--ss_scale_features, --no-ss_scale_features

Scale / standardize features to unit variance before semi-supervised learning.

Default:

False

--parametric, --no-parametric

Do parametric estimation of p-values.

Default:

False

--level <level>

Data level selected for scoring: “ms1”, “ms2”, “ms1ms2”, “transition” or “alignment”. “ms1ms2” integrates both MS1- and MS2-level scores and can be used instead of “ms2”-level results.

Default:

'ms2'

Options:

ms1 | ms2 | ms1ms2 | transition | alignment

--threads <threads>

Number of threads used for semi-supervised learning. -1 means all available CPUs.

Default:

1

--profile

Enable memory allocation tracking and profiling. Requires memray to be installed.

The score command has several advanced options that can be seen using the --helphelp flag.
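A minimal scoring sketch, assuming a hypothetical merged file merged.osw that contains 40 runs. It also reproduces the documented auto-subsampling rule with awk, purely to show which ratio to expect; PyProphet computes this internally. pp is a hypothetical helper that runs pyprophet when it is on the PATH and otherwise only prints the command.

```shell
#!/bin/sh
# Hypothetical input: a merged OpenSWATH file containing 40 runs.
IN=merged.osw
N_RUNS=40

# Hypothetical dry-run helper: runs pyprophet if installed, else prints the command.
pp() { if command -v pyprophet >/dev/null 2>&1; then pyprophet "$@"; else echo "pyprophet $*"; fi; }

# Illustration of the documented rule only: with the default ratio (1.0) and
# more than 20 runs, semi-supervised learning uses 1/N_RUNS of the precursors.
AUTO_RATIO=$(awk -v n="$N_RUNS" 'BEGIN { if (n > 20) printf "%.4f", 1 / n; else printf "1.0" }')
echo "expected auto-subsampling ratio: $AUTO_RATIO"

# Combined MS1+MS2 scoring with the default LDA classifier (auto-subsampling applies).
pp score --in "$IN" --level ms1ms2

# To learn on the full data set instead, disable auto-subsampling:
# pp score --in "$IN" --level ms1ms2 --subsample_ratio -1.0
```

Passing an explicit ratio below 1.0 (for example --subsample_ratio 0.1) overrides the automatic 1/N choice.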

Inference of Peptidoforms

For PTM analyses, PyProphet provides the infer peptidoform subcommand, which performs inference of peptidoforms (IPF) for site localization of modifications in large-scale DIA experiments.

Refer to Rosenberger, G. et al. (2017) to learn more about the inference of peptidoforms workflow.

pyprophet infer peptidoform

Infer peptidoforms after scoring of MS1, MS2 and transition-level data.

pyprophet infer peptidoform [OPTIONS]

Options

--in <infile>

Required PyProphet input file. Valid formats are .osw and .parquet (produced by export parquet).

--out <outfile>

PyProphet output file. Valid formats are .osw and .parquet. Must be the same format as the input file.

--ipf_ms1_scoring, --no-ipf_ms1_scoring

Use MS1 precursor data for IPF.

Default:

True

--ipf_ms2_scoring, --no-ipf_ms2_scoring

Use MS2 precursor data for IPF.

Default:

True

--ipf_h0, --no-ipf_h0

Include the possibility that peak groups are not covered by the peptidoform space.

Default:

True

--ipf_grouped_fdr, --no-ipf_grouped_fdr

[Experimental] Compute grouped FDR instead of pooled FDR to better support data where peak groups are evaluated to originate from very heterogeneous numbers of peptidoforms.

Default:

False

--ipf_max_precursor_pep <ipf_max_precursor_pep>

Maximum PEP to consider scored precursors in IPF.

Default:

0.7

--ipf_max_peakgroup_pep <ipf_max_peakgroup_pep>

Maximum PEP to consider scored peak groups in IPF.

Default:

0.7

--ipf_max_precursor_peakgroup_pep <ipf_max_precursor_peakgroup_pep>

Maximum BHM layer 1 integrated precursor peakgroup PEP to consider in IPF.

Default:

0.4

--ipf_max_transition_pep <ipf_max_transition_pep>

Maximum PEP to consider scored transitions in IPF.

Default:

0.6

--propagate_signal_across_runs, --no-propagate_signal_across_runs

Propagate signal across runs (requires running alignment).

Default:

False

--ipf_max_alignment_pep <ipf_max_alignment_pep>

Maximum PEP to consider for good alignments.

Default:

1.0

--across_run_confidence_threshold <across_run_confidence_threshold>

Maximum PEP to consider for propagating signal across runs for aligned features.

Default:

0.5
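A minimal IPF sketch, assuming a hypothetical OSW file merged.osw. Because infer peptidoform operates on MS1-, MS2- and transition-level scores, the sketch scores each level first. pp is a hypothetical helper that runs pyprophet when it is installed and otherwise only prints the command.

```shell
#!/bin/sh
# Hypothetical input file; replace with your own OSW file.
IN=merged.osw

# Hypothetical dry-run helper: runs pyprophet if installed, else prints the command.
pp() { if command -v pyprophet >/dev/null 2>&1; then pyprophet "$@"; else echo "pyprophet $*"; fi; }

# IPF uses MS1-, MS2- and transition-level scores, so score each level first.
for level in ms1 ms2 transition; do
  pp score --in "$IN" --level "$level"
done

# Infer peptidoforms using MS1 and MS2 precursor data (both enabled by default).
pp infer peptidoform --in "$IN" --ipf_ms1_scoring --ipf_ms2_scoring
```

If MS1 data is unreliable in your experiment, --no-ipf_ms1_scoring restricts IPF to MS2 precursor data.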

For glycoform inference, you can use the infer glycoform subcommand, which is specifically designed for glycopeptide analyses.

pyprophet infer glycoform

Infer glycoforms after scoring of MS1, MS2 and transition-level data.

pyprophet infer glycoform [OPTIONS]

Options

--in <infile>

Required Input file.

--out <outfile>

Output file.

--ms1_precursor_scoring, --no-ms1_precursor_scoring

Use MS1 precursor data for glycoform inference.

Default:

True

--ms2_precursor_scoring, --no-ms2_precursor_scoring

Use MS2 precursor data for glycoform inference.

Default:

True

--grouped_fdr, --no-grouped_fdr

[Experimental] Compute grouped FDR instead of pooled FDR to better support data where peak groups are evaluated to originate from very heterogeneous numbers of glycoforms.

Default:

False

--max_precursor_pep <max_precursor_pep>

Maximum PEP to consider scored precursors.

Default:

1

--max_peakgroup_pep <max_peakgroup_pep>

Maximum PEP to consider scored peak groups.

Default:

0.7

--max_precursor_peakgroup_pep <max_precursor_peakgroup_pep>

Maximum BHM layer 1 integrated precursor peakgroup PEP to consider.

Default:

1

--max_transition_pep <max_transition_pep>

Maximum PEP to consider scored transitions.

Default:

0.6

--use_glycan_composition, --use_glycan_struct

Compute glycoform-level FDR based on glycan composition or glycan structure.

Default:

True

--ms1_mz_window <ms1_mz_window>

MS1 m/z window, in the unit set by --ms1_mz_window_unit (ppm, Da or Th).

Default:

10

--ms1_mz_window_unit <ms1_mz_window_unit>

MS1 m/z window unit.

Default:

'ppm'

Options:

ppm | Da | Th

--propagate_signal_across_runs, --no-propagate_signal_across_runs

Propagate signal across runs (requires running alignment).

Default:

False

--max_alignment_pep <max_alignment_pep>

Maximum PEP to consider for good alignments.

Default:

1.0

--across_run_confidence_threshold <across_run_confidence_threshold>

Maximum PEP to consider for propagating signal across runs for aligned features.

Default:

0.5

Refer to Yang, Y. et al. (2021) to learn more about the glycoform inference workflow.
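A minimal glycoform inference sketch, assuming a hypothetical glycoproteomics OSW file glyco.osw. It scores MS1, MS2 and transition levels first, since infer glycoform runs after scoring of those levels, and then computes glycoform-level FDR on glycan structure rather than the default composition. pp is a hypothetical helper that runs pyprophet when it is installed and otherwise only prints the command.

```shell
#!/bin/sh
# Hypothetical glycoproteomics input file.
IN=glyco.osw

# Hypothetical dry-run helper: runs pyprophet if installed, else prints the command.
pp() { if command -v pyprophet >/dev/null 2>&1; then pyprophet "$@"; else echo "pyprophet $*"; fi; }

# Score the levels that glycoform inference builds on.
for level in ms1 ms2 transition; do
  pp score --in "$IN" --level "$level"
done

# FDR on glycan structure; the MS1 window restates the defaults (10 ppm) explicitly.
pp infer glycoform --in "$IN" --use_glycan_struct --ms1_mz_window 10 --ms1_mz_window_unit ppm
```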

Peptide / Protein / Gene Inference

PyProphet provides the infer subcommands to estimate error rates at different biological levels (peptide, glycopeptide, protein and gene) and in different experimental contexts (global, experiment-wide and run-specific).

Refer to Rosenberger, G. et al. (2017) to learn more about level- and context-specific inference.

For more information about glycopeptide inference, refer to Yang, Y. et al. (2021).
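A minimal sketch that runs peptide, protein and gene inference in all three contexts, assuming a hypothetical, already scored file merged.osw whose library carries gene annotations (needed for the gene level). A separate glycopeptide call assumes a hypothetical scored glycoproteomics file glyco.osw. pp is a hypothetical helper that runs pyprophet when it is installed and otherwise only prints the command.

```shell
#!/bin/sh
# Hypothetical scored input file.
IN=merged.osw

# Hypothetical dry-run helper: runs pyprophet if installed, else prints the command.
pp() { if command -v pyprophet >/dev/null 2>&1; then pyprophet "$@"; else echo "pyprophet $*"; fi; }

# Error-rate estimation for each biological level in each experimental context.
for level in peptide protein gene; do
  for context in run-specific experiment-wide global; do
    pp infer "$level" --in "$IN" --context "$context"
  done
done

# Glycopeptide level in the global context, using kernel density estimation
# instead of the default Gaussian mixture model.
pp infer glycopeptide --in glyco.osw --context global --density_estimator kde
```

In practice you may only need the contexts you intend to report, for example global peptide and protein FDR.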

pyprophet infer peptide

Infer peptides and conduct error-rate estimation in different contexts.

pyprophet infer peptide [OPTIONS]

Options

--helphelp

Show advanced help with all options.

--in <infile>

Required PyProphet input file. Valid formats are .osw and .parquet (produced by export parquet).

--out <outfile>

PyProphet output file. Valid formats are .osw and .parquet. Must be the same format as the input file.

--context <context>

Context to estimate peptide-level FDR control.

Default:

'run-specific'

Options:

run-specific | experiment-wide | global

--parametric, --no-parametric

Do parametric estimation of p-values.

Default:

False

--color_palette <color_palette>

Color palette to use in reports.

Default:

'normal'

Options:

normal | protan | deutran | tritan

The peptide command accepts the --helphelp flag, which displays advanced options not shown here.

pyprophet infer glycopeptide

Infer glycopeptides and conduct error-rate estimation in different contexts.

pyprophet infer glycopeptide [OPTIONS]

Options

--helphelp

Show advanced help with all options.

--in <infile>

Required Input file.

--out <outfile>

Output file.

--context <context>

Context to estimate glycopeptide-level FDR control.

Default:

'run-specific'

Options:

run-specific | experiment-wide | global

--density_estimator <density_estimator>

Either kernel density estimation (“kde”) or Gaussian mixture model (“gmm”) is used for score density estimation.

Default:

'gmm'

Options:

kde | gmm

--grid_size <grid_size>

Number of d-score cutoffs to build grid coordinates for local FDR calculation.

Default:

256

--parametric, --no-parametric

Do parametric estimation of p-values.

Default:

False

pyprophet infer protein

Infer proteins and conduct error-rate estimation in different contexts.

pyprophet infer protein [OPTIONS]

Options

--helphelp

Show advanced help with all options.

--in <infile>

Required PyProphet input file. Valid formats are .osw and .parquet (produced by export parquet).

--out <outfile>

PyProphet output file. Valid formats are .osw and .parquet. Must be the same format as the input file.

--context <context>

Context to estimate protein-level FDR control.

Default:

'run-specific'

Options:

run-specific | experiment-wide | global

--parametric, --no-parametric

Do parametric estimation of p-values.

Default:

False

--color_palette <color_palette>

Color palette to use in reports.

Default:

'normal'

Options:

normal | protan | deutran | tritan

The protein command accepts the --helphelp flag, which displays advanced options not shown here.

pyprophet infer gene

Infer genes and conduct error-rate estimation in different contexts.

pyprophet infer gene [OPTIONS]

Options

--helphelp

Show advanced help with all options.

--in <infile>

Required PyProphet input file. Valid formats are .osw and .parquet (produced by export parquet).

--out <outfile>

PyProphet output file. Valid formats are .osw and .parquet. Must be the same format as the input file.

--context <context>

Context to estimate gene-level FDR control.

Default:

'run-specific'

Options:

run-specific | experiment-wide | global

--parametric, --no-parametric

Do parametric estimation of p-values.

Default:

False

--color_palette <color_palette>

Color palette to use in reports.

Default:

'normal'

Options:

normal | protan | deutran | tritan

The gene command accepts the --helphelp flag, which displays advanced options not shown here.

Exporters

PyProphet provides several export utilities for converting between OpenSWATH's SQLite-based formats (.osw / .sqMass) and the experimental parquet formats, and for exporting result tables and PDF reports.

TSV Results (Proteomics)

To export results from a post-scoring workflow (using the .osw input workflow) to a tab-separated values (TSV) file, you can use the export tsv subcommand. This is useful for exporting results in a format that can be easily read and processed by other tools or scripts.

pyprophet export tsv

Export Proteomics/Peptidoform TSV/CSV tables

pyprophet export tsv [OPTIONS]

Options

--helphelp

Show advanced help with all options.

--in <infile>

Required PyProphet input file.

--out <outfile>

Output TSV/CSV (legacy_split, legacy_merged) file.

--format <format>

Export format, either legacy_split or legacy_merged (mProphet/PyProphet).

Default:

'legacy_merged'

Options:

legacy_split | legacy_merged

--csv, --no-csv

Export CSV instead of TSV file.

Default:

False

--transition_quantification, --no-transition_quantification

[format: legacy] Report aggregated transition-level quantification.

Default:

True

--max_transition_pep <max_transition_pep>

[format: legacy] Maximum PEP to retain scored transitions for quantification (requires transition-level scoring).

Default:

0.7

--ipf <ipf>

[format: matrix/legacy] Should IPF results be reported if present? “peptidoform”: Report results on peptidoform-level, “augmented”: Augment OpenSWATH results with IPF scores, “disable”: Ignore IPF results

Default:

'peptidoform'

Options:

peptidoform | augmented | disable

--ipf_max_peptidoform_pep <ipf_max_peptidoform_pep>

[format: matrix/legacy] IPF: Filter results to maximum run-specific peptidoform-level PEP.

Default:

0.4

--max_rs_peakgroup_qvalue <max_rs_peakgroup_qvalue>

[format: matrix/legacy] Filter results to maximum run-specific peak group-level q-value.

Default:

0.05

--peptide, --no-peptide

Append peptide-level error-rate estimates if available.

Default:

True

--max_global_peptide_qvalue <max_global_peptide_qvalue>

[format: matrix/legacy] Filter results to maximum global peptide-level q-value.

Default:

0.01

--protein, --no-protein

Append protein-level error-rate estimates if available.

Default:

True

--max_global_protein_qvalue <max_global_protein_qvalue>

[format: matrix/legacy] Filter results to maximum global protein-level q-value.

Default:

0.01

--use_alignment, --no-use_alignment

Use alignment results to recover peaks with good alignment scores if alignment data is present in the input file.

Default:

True

--max_alignment_pep <max_alignment_pep>

[format: matrix/legacy] Maximum PEP to consider for good alignments when use_alignment is enabled.

Default:

0.7

--exclude-decoys, --no-exclude-decoys

Exclude decoy entries from the exported results. Use --no-exclude-decoys to retain decoys.

Default:

True
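A minimal export sketch, assuming a hypothetical scored (and inferred) file merged.osw; the output file names are also hypothetical. The first call restates the default q-value filters explicitly; the second writes CSV and keeps decoys, which can help when inspecting target-decoy separation. pp is a hypothetical helper that runs pyprophet when it is installed and otherwise only prints the command.

```shell
#!/bin/sh
# Hypothetical scored input file.
IN=merged.osw

# Hypothetical dry-run helper: runs pyprophet if installed, else prints the command.
pp() { if command -v pyprophet >/dev/null 2>&1; then pyprophet "$@"; else echo "pyprophet $*"; fi; }

# Merged legacy table with the default q-value filters written out.
pp export tsv --in "$IN" --out results.tsv --format legacy_merged \
  --max_rs_peakgroup_qvalue 0.05 \
  --max_global_peptide_qvalue 0.01 \
  --max_global_protein_qvalue 0.01

# CSV variant that retains decoy entries.
pp export tsv --in "$IN" --out results_with_decoys.csv --csv --no-exclude-decoys
```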

TSV Results (Small Molecules)

This is similar to the TSV export for proteomics, but specifically designed for small molecule data. It allows you to export results in a tab-separated values (TSV) format, which can be easily read and processed by other tools or scripts.

pyprophet export compound

Export Compound TSV/CSV tables

pyprophet export compound [OPTIONS]

Options

--helphelp

Show advanced help with all options.

--in <infile>

Required PyProphet input file.

--out <outfile>

Output TSV/CSV (matrix, legacy_merged) file.

--format <format>

Export format, either matrix or legacy_merged (PyProphet).

Default:

'legacy_merged'

Options:

matrix | legacy_merged

--csv, --no-csv

Export CSV instead of TSV file.

Default:

False

--max_rs_peakgroup_qvalue <max_rs_peakgroup_qvalue>

[format: matrix/legacy] Filter results to maximum run-specific peak group-level q-value.

Default:

0.05
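A minimal small-molecule export sketch, assuming a hypothetical scored metabolomics file metabolomics.osw and a hypothetical output name. It writes the matrix format with a stricter run-specific peak-group q-value than the 0.05 default. pp is a hypothetical helper that runs pyprophet when it is installed and otherwise only prints the command.

```shell
#!/bin/sh
# Hypothetical scored small-molecule input file.
IN=metabolomics.osw

# Hypothetical dry-run helper: runs pyprophet if installed, else prints the command.
pp() { if command -v pyprophet >/dev/null 2>&1; then pyprophet "$@"; else echo "pyprophet $*"; fi; }

# Matrix layout, keeping peak groups with run-specific q-value <= 0.01.
pp export compound --in "$IN" --out compounds.tsv --format matrix --max_rs_peakgroup_qvalue 0.01
```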

TSV Results (Glycoform)

This is similar to the TSV export for proteomics, but specifically designed for glycoform data. It allows you to export results in a tab-separated values (TSV) format, which can be easily read and processed by other tools or scripts.

pyprophet export glyco

Export Glycoform TSV/CSV tables

pyprophet export glyco [OPTIONS]

Options

--helphelp

Show advanced help with all options.

--in <infile>

Required PyProphet input file.

--out <outfile>

Output TSV/CSV (matrix, legacy_split, legacy_merged) file.

--format <format>

Export format, either matrix, legacy_split/legacy_merged (mProphet/PyProphet) format.

Default:

'legacy_split'

Options:

matrix | legacy_split | legacy_merged

--csv, --no-csv

Export CSV instead of TSV file.

Default:

False

--transition_quantification, --no-transition_quantification

[format: legacy] Report aggregated transition-level quantification.

Default:

True

--max_transition_pep <max_transition_pep>

[format: legacy] Maximum PEP to retain scored transitions for quantification (requires transition-level scoring).

Default:

0.7

--max_rs_peakgroup_qvalue <max_rs_peakgroup_qvalue>

[format: matrix/legacy] Filter results to maximum run-specific peak group-level q-value.

Default:

0.05

--glycoform_match_precursor <glycoform_match_precursor>

[format: matrix/legacy] Export glycoform results with glycan matched with precursor-level results.

Default:

'glycan_composition'

Options:

exact | glycan_composition | none

--max_glycoform_pep <max_glycoform_pep>

[format: matrix/legacy] Filter results to maximum glycoform PEP.

Default:

1

--max_glycoform_qvalue <max_glycoform_qvalue>

[format: matrix/legacy] Filter results to maximum glycoform q-value.

Default:

0.05

--glycopeptide, --no-glycopeptide

Append glycopeptide-level error-rate estimates if available.

Default:

True

--max_global_glycopeptide_qvalue <max_global_glycopeptide_qvalue>

[format: matrix/legacy] Filter results to maximum global glycopeptide-level q-value.

Default:

0.01
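A minimal glycoform export sketch, assuming a hypothetical glycoproteomics file glyco.osw on which glycoform inference has already been run; the output name is also hypothetical. It uses the default legacy_split layout but selects the exact matching mode of --glycoform_match_precursor instead of the default glycan_composition. pp is a hypothetical helper that runs pyprophet when it is installed and otherwise only prints the command.

```shell
#!/bin/sh
# Hypothetical input with glycoform inference results.
IN=glyco.osw

# Hypothetical dry-run helper: runs pyprophet if installed, else prints the command.
pp() { if command -v pyprophet >/dev/null 2>&1; then pyprophet "$@"; else echo "pyprophet $*"; fi; }

# legacy_split layout, exact glycan matching, glycoform q-value filter at 0.05.
pp export glyco --in "$IN" --out glycoforms.tsv --format legacy_split \
  --glycoform_match_precursor exact \
  --max_glycoform_qvalue 0.05
```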

TSV Quantification Matrices (Proteomics)

To export quantification matrices from a post-scoring workflow to a tab-separated values (TSV) file, you can use the export matrix subcommand. This is useful for exporting quantification data in a format that can be easily read and processed by other tools or scripts.

pyprophet export matrix

Export Proteomics/Peptidoform Quantification Matrix

pyprophet export matrix [OPTIONS]

Options

--helphelp

Show advanced help with all options.

--in <infile>

Required PyProphet input file.

--out <outfile>

Output TSV/CSV file.

--level <level>

Export quantification level, either precursor, peptide, protein, or gene.

Default:

'peptide'

Options:

precursor | peptide | protein | gene

--csv, --no-csv

Export CSV instead of TSV file.

Default:

False

--transition_quantification, --no-transition_quantification

[format: legacy] Report aggregated transition-level quantification.

Default:

True

--max_transition_pep <max_transition_pep>

[format: legacy] Maximum PEP to retain scored transitions for quantification (requires transition-level scoring).

Default:

0.7

--ipf <ipf>

[format: matrix/legacy] Should IPF results be reported if present? “peptidoform”: Report results on peptidoform-level, “augmented”: Augment OpenSWATH results with IPF scores, “disable”: Ignore IPF results

Default:

'peptidoform'

Options:

peptidoform | augmented | disable

--ipf_max_peptidoform_pep <ipf_max_peptidoform_pep>

[format: matrix/legacy] IPF: Filter results to maximum run-specific peptidoform-level PEP.

Default:

0.4

--max_rs_peakgroup_qvalue <max_rs_peakgroup_qvalue>

[format: matrix/legacy] Filter results to maximum run-specific peak group-level q-value.

Default:

0.05

--max_global_peptide_qvalue <max_global_peptide_qvalue>

[format: matrix/legacy] Filter results to maximum global peptide-level q-value.

Default:

0.01

--max_global_protein_qvalue <max_global_protein_qvalue>

[format: matrix/legacy] Filter results to maximum global protein-level q-value.

Default:

0.01

--use_alignment, --no-use_alignment

Use alignment results to recover peaks with good alignment scores if alignment data is present in the input file.

Default:

True

--max_alignment_pep <max_alignment_pep>

[format: matrix/legacy] Maximum PEP to consider for good alignments when use_alignment is enabled.

Default:

0.7

--top_n <top_n>

[format: matrix/legacy] Number of most intense features to use for summarization.

Default:

3

--consistent_top, --no-consistent_top

[format: matrix/legacy] Whether to use the same top features across all runs.

Default:

True

--normalization <normalization>

[format: matrix/legacy] Normalization method to apply to the quantification matrix.

Default:

'none'

Options:

none | median | medianmedian | quantile

--exclude-decoys, --no-exclude-decoys

Exclude decoy entries from the exported matrix. Use --no-exclude-decoys to retain decoys.

Default:

True
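A minimal quantification-matrix sketch, assuming a hypothetical scored and inferred file merged.osw and hypothetical output names. The first call summarizes the three most intense features consistently across runs and applies median normalization; the second exports an unnormalized precursor-level CSV. pp is a hypothetical helper that runs pyprophet when it is installed and otherwise only prints the command.

```shell
#!/bin/sh
# Hypothetical scored input file.
IN=merged.osw

# Hypothetical dry-run helper: runs pyprophet if installed, else prints the command.
pp() { if command -v pyprophet >/dev/null 2>&1; then pyprophet "$@"; else echo "pyprophet $*"; fi; }

# Protein-level matrix: top-3 features, same features in every run, median normalization.
pp export matrix --in "$IN" --out protein_matrix.tsv --level protein \
  --top_n 3 --consistent_top --normalization median

# Precursor-level matrix as CSV, without normalization (the default).
pp export matrix --in "$IN" --out precursor_matrix.csv --level precursor --csv
```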

Convert OSW to Parquet

To convert OpenSWATH's .osw / .sqMass files to parquet, use the export parquet subcommand. Parquet is a more efficient, space-saving storage format. The subcommand can convert an entire .osw file into a single parquet file (containing both precursor and transition data), or split it into a separate precursors_features.parquet file and a transition_features.parquet file. The output can additionally be split by run, which is useful for large datasets.

pyprophet export parquet

Export OSW or sqMass to parquet format

pyprophet export parquet [OPTIONS]

Options

--helphelp

Show advanced help with all options.

--in <infile>

Required PyProphet OSW or sqMass input file.

--out <outfile>

Output parquet file.

--pqpfile <pqpfile>

PyProphet PQP file. Only required when converting sqMass to parquet.

--transitionLevel

Whether to export transition level data as well

--onlyFeatures

Only include precursors that have a corresponding feature

--noDecoys

Do not include decoys in the exported data

--split_transition_data, --no-split_transition_data

Split transition data into a separate parquet file.

Default:

False

--split_runs, --no-split_runs

Split runs into separate parquet files/directories.

Default:

False

--compression <compression>

Compression algorithm to use for parquet file.

Default:

'zstd'

Options:

lz4 | uncompressed | snappy | gzip | lzo | brotli | zstd

--compression_level <compression_level>

Compression level to use for parquet file.

Default:

11

--include_transition_data, --no-include_transition_data

Include transition data in the exported parquet file(s). When disabled, only precursor-level data is exported.

Default:

True
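A minimal conversion sketch, assuming a hypothetical merged.osw, a hypothetical run1.sqMass with its PQP library library.pqp, and hypothetical output paths (the expected naming of split outputs is an assumption here; check --helphelp for your version). pp is a hypothetical helper that runs pyprophet when it is installed and otherwise only prints the command.

```shell
#!/bin/sh
# Hypothetical input file.
IN=merged.osw

# Hypothetical dry-run helper: runs pyprophet if installed, else prints the command.
pp() { if command -v pyprophet >/dev/null 2>&1; then pyprophet "$@"; else echo "pyprophet $*"; fi; }

# Single parquet file with precursor and transition data.
pp export parquet --in "$IN" --out merged.parquet

# Separate precursor and transition files, further split by run.
pp export parquet --in "$IN" --out merged_split --split_transition_data --split_runs

# Precursor-level data only, with snappy instead of the default zstd compression.
pp export parquet --in "$IN" --out precursors_only.parquet --no-include_transition_data --compression snappy

# sqMass input additionally requires the PQP file.
pp export parquet --in run1.sqMass --out run1_chrom.parquet --pqpfile library.pqp
```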

Export Feature Score Plots

To plot the distributions of feature scores (VAR_ columns) and, if available, PyProphet scores (SCORE columns), use the export feature-scores subcommand. It works with all file formats (OSW, Parquet and Split Parquet):

  • For unscored files: Plots only VAR_ columns (feature variables)

  • For scored files: Applies RANK==1 filtering and plots both SCORE and VAR_ columns

This is useful for investigating the distribution and quality of scores for target-decoy separation.

pyprophet export feature-scores

Export feature score plots from a PyProphet input file.

Creates plots showing the distribution of feature scores (var_* columns) at different levels (ms1, ms2, transition, alignment) colored by target/decoy status. Works with OSW, Parquet, and Split Parquet files (scored or unscored).

pyprophet export feature-scores [OPTIONS]

Options

--helphelp

Show advanced help with all options.

--in <infile>

Required PyProphet input file (OSW, Parquet, or Split Parquet directory).

--out <outfile>

Output PDF file. If not provided, the file name is generated from the input file name.
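A minimal plotting sketch, assuming a hypothetical OSW file merged.osw and a hypothetical split parquet directory merged_split. The first call relies on the automatically generated PDF name; the second sets it explicitly. pp is a hypothetical helper that runs pyprophet when it is installed and otherwise only prints the command.

```shell
#!/bin/sh
# Hypothetical inputs.
IN=merged.osw
SPLIT_DIR=merged_split

# Hypothetical dry-run helper: runs pyprophet if installed, else prints the command.
pp() { if command -v pyprophet >/dev/null 2>&1; then pyprophet "$@"; else echo "pyprophet $*"; fi; }

# Output name derived from the input file name.
pp export feature-scores --in "$IN"

# Split parquet directory with an explicit output PDF.
pp export feature-scores --in "$SPLIT_DIR" --out feature_scores.pdf
```

Running the command before and after pyprophet score is one way to compare raw VAR_ distributions with the resulting SCORE columns.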

Export Score Plots (Deprecated)

Deprecated since version 3.1: Use pyprophet export feature-scores instead.

The export score-plots command is deprecated and will be removed in a future version. It has been replaced by the more flexible export feature-scores command which works with all file formats.

pyprophet export score-plots

Export score plots (DEPRECATED - use ‘feature-scores’ instead)

This command is deprecated. Please use ‘pyprophet export feature-scores’ instead.

pyprophet export score-plots [OPTIONS]

Options

--helphelp

Show advanced help with all options.

--in <infile>

Required PyProphet OSW input file.

--glycoform, --no-glycoform

Export glycoform score plots.

Default:

False

Export Results Report

To export a PDF report of the results, use the export score-report subcommand. The report summarizes the results of your analysis, including scores, identifications and other relevant information.

pyprophet export score-report

Export report with scored results from a PyProphet input file.

pyprophet export score-report [OPTIONS]

Options

--helphelp

Show advanced help with all options.

--in <infile>

Required PyProphet OSW input file.
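A minimal report sketch, assuming a hypothetical OSW file merged.osw that has already been scored with pyprophet score. pp is a hypothetical helper that runs pyprophet when it is installed and otherwise only prints the command.

```shell
#!/bin/sh
# Hypothetical scored OSW file.
IN=merged.osw

# Hypothetical dry-run helper: runs pyprophet if installed, else prints the command.
pp() { if command -v pyprophet >/dev/null 2>&1; then pyprophet "$@"; else echo "pyprophet $*"; fi; }

# Generate the PDF report of scored results.
pp export score-report --in "$IN"
```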

Merge files

PyProphet provides a command to merge multiple files into a single file. This is useful for combining results from different analyses or runs into a single file for further processing or analysis.

Merge OSW Files

To merge multiple OSW files into a single OSW file, you can use the merge osw subcommand.

pyprophet merge osw

Merge multiple OSW files (for large experiments, it is recommended to subsample first).

pyprophet merge osw [OPTIONS] [INFILES]...

Options

--helphelp

Show advanced help with all options.

--out <outfile>

Required Merged OSW output file.

--same_run, --no-same_run

Assume input files are from same run (deletes run information).

--template <templatefile>

Required Template OSW file.

--merged_post_scored_runs

Merge OSW output files that have already been scored.

Arguments

INFILES

Optional argument(s)
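A minimal merge sketch with hypothetical per-run files run1.osw to run3.osw. The template path is also hypothetical: the option is documented as a template OSW file, and OpenSWATH tutorials commonly pass the PQP spectral library used for the runs, which is an assumption here. pp is a hypothetical helper that runs pyprophet when it is installed and otherwise only prints the command.

```shell
#!/bin/sh
# Hypothetical template file (assumption: the library the runs were searched with).
TEMPLATE=library.pqp

# Hypothetical dry-run helper: runs pyprophet if installed, else prints the command.
pp() { if command -v pyprophet >/dev/null 2>&1; then pyprophet "$@"; else echo "pyprophet $*"; fi; }

# Merge unscored per-run files into one OSW file for joint scoring.
pp merge osw --out merged.osw --template "$TEMPLATE" run1.osw run2.osw run3.osw

# Alternatively, merge runs that were already scored individually.
pp merge osw --out merged_scored.osw --template "$TEMPLATE" --merged_post_scored_runs run1.osw run2.osw run3.osw
```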

Merge Parquet Files

To merge multiple Parquet files into a single Parquet file, you can use the merge parquet subcommand.

pyprophet merge parquet

Merge multiple parquet files.

pyprophet merge parquet [OPTIONS] [INFILES]...

Options

--helphelp

Show advanced help with all options.

--out <outfile>

Required Merged parquet output file.

--merge_transitions, --no-merge_transitions

If the input is of type split_parquet / split_parquet_multi, merge the separate transition files into a single file as well.

Arguments

INFILES

Optional argument(s)
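A minimal merge sketch with hypothetical parquet inputs. The second call assumes hypothetical split parquet inputs (run1_split, run2_split) and uses --merge_transitions to combine their transition files as well; the output naming for split inputs is an assumption. pp is a hypothetical helper that runs pyprophet when it is installed and otherwise only prints the command.

```shell
#!/bin/sh
# Hypothetical dry-run helper: runs pyprophet if installed, else prints the command.
pp() { if command -v pyprophet >/dev/null 2>&1; then pyprophet "$@"; else echo "pyprophet $*"; fi; }

# Merge single-file parquet exports.
pp merge parquet --out merged.parquet run1.parquet run2.parquet

# Merge split parquet inputs, including their separate transition files.
pp merge parquet --out merged_split --merge_transitions run1_split run2_split
```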