SCORE_ALIGNMENT Integration Workflow
====================================

Overview
--------

The diagrams in this document illustrate how the SCORE_ALIGNMENT integration recovers peaks that have weak MS2 signals but good alignment scores.

High-Level Workflow
-------------------

.. code-block:: text

   ┌─────────────────────────────────────────────────────────────────────┐
   │  PyProphet Export Command                                           │
   │  pyprophet export tsv --in data.osw --out results.tsv               │
   │  (use_alignment=True by default)                                    │
   └─────────────────────────────────────────────────────────────────────┘
                                      ↓
   ┌─────────────────────────────────────────────────────────────────────┐
   │  1. Configuration Check                                             │
   │    • use_alignment = True (default)                                 │
   │    • max_alignment_pep = 0.7 (default)                              │
   └─────────────────────────────────────────────────────────────────────┘
                                      ↓
   ┌─────────────────────────────────────────────────────────────────────┐
   │  2. Auto-Detection Phase                                            │
   │                                                                     │
   │  OSW Files:                                                         │
   │    ├─ Check FEATURE_MS2_ALIGNMENT table exists?                     │
   │    └─ Check SCORE_ALIGNMENT table exists?                           │
   │                                                                     │
   │  Parquet Files:                                                     │
   │    └─ Check for {basename}_feature_alignment.parquet?               │
   │                                                                     │
   │  Split Parquet Files:                                               │
   │    └─ Check for {infile}/feature_alignment.parquet?                 │
   └──────────────────────────────────┬──────────────────────────────────┘
                                      │
                       ┌──────────────┴──────────────┐
                       │                             │
             ┌─────────▼──────────┐        ┌─────────▼──────────┐
             │ Alignment Present  │        │ Alignment Missing  │
             │ use_alignment=T    │        │ use_alignment=T    │
             └─────────┬──────────┘        └─────────┬──────────┘
                       │                             │
                       │                             ↓
                       │                 ┌────────────────────────┐
                       │                 │ Standard Export Only   │
                       │                 │ (no alignment used)    │
                       │                 └────────────────────────┘
                       ↓
   ┌─────────────────────────────────────────────────────────────────────┐
   │  3. Data Reading Phase                                              │
   │                                                                     │
   │  Step A: Fetch Base Features (MS2 QVALUE filter)                    │
   │ ┌─────────────────────────────────────────────────────────────────┐ │
   │ │  SELECT CAST(FEATURE.ID AS INTEGER) AS id,                      │ │
   │ │         ... (other columns)                                     │ │
   │ │  FROM FEATURE                                                   │ │
   │ │  JOIN SCORE_MS2                                                 │ │
   │ │  WHERE SCORE_MS2.QVALUE < max_rs_peakgroup_qvalue (e.g., 0.05)  │ │
   │ │  → Base Features (passed MS2 threshold)                         │ │
   │ │  → Mark with from_alignment=0                                   │ │
   │ │  → CAST preserves precision for large feature IDs               │ │
   │ └─────────────────────────────────────────────────────────────────┘ │
   │                                                                     │
   │  Step B: Fetch Aligned Features (Alignment PEP filter)              │
   │ ┌─────────────────────────────────────────────────────────────────┐ │
   │ │  SELECT DENSE_RANK() OVER (...) AS alignment_group_id,          │ │
   │ │         ALIGNED_FEATURE_ID AS id,                               │ │
   │ │         CAST(REFERENCE_FEATURE_ID AS INTEGER)                   │ │
   │ │           AS alignment_reference_feature_id,                    │ │
   │ │         REFERENCE_RT AS alignment_reference_rt                  │ │
   │ │  FROM FEATURE_MS2_ALIGNMENT                                     │ │
   │ │  JOIN SCORE_ALIGNMENT                                           │ │
   │ │  WHERE LABEL = 1 (target)                                       │ │
   │ │    AND SCORE_ALIGNMENT.PEP < max_alignment_pep (e.g., 0.7)      │ │
   │ │    AND REF FEATURE passes MS2 QVALUE threshold                  │ │
   │ │  → Aligned Features (good alignment scores)                     │ │
   │ │  → Includes alignment_group_id and reference info               │ │
   │ │  → CAST preserves precision for large feature IDs               │ │
   │ └─────────────────────────────────────────────────────────────────┘ │
   └─────────────────────────────────────────────────────────────────────┘
                                      ↓
   ┌─────────────────────────────────────────────────────────────────────┐
   │  4. Feature Recovery Logic                                          │
   │                                                                     │
   │    ┌─────────────────┐       ┌──────────────────┐                   │
   │    │ Base Features   │       │ Aligned Features │                   │
   │    │ (MS2 passed)    │       │ (Alignment good) │                   │
   │    │ IDs: 1,2,3,4,5  │       │ IDs: 3,4,6,7,8   │                   │
   │    └────────┬────────┘       └────────┬─────────┘                   │
   │             │                         │                             │
   │             └────────────┬────────────┘                             │
   │                          ↓                                          │
   │               ┌──────────────────────┐                              │
   │               │ Find NEW features:   │                              │
   │               │ aligned - base       │                              │
   │               │ = {6, 7, 8}          │                              │
   │               └──────────┬───────────┘                              │
   │                          ↓                                          │
   │               ┌──────────────────────────────────────┐              │
   │               │ Fetch full data for recovered        │              │
   │               │ features: 6, 7, 8                    │              │
   │               │ Mark: from_alignment=1               │              │
   │               │ Add: alignment_pep                   │              │
   │               │ Add: alignment_qvalue                │              │
   │               │ Add: alignment_group_id              │              │
   │               │ Add: alignment_reference_feature_id  │              │
   │               │ Add: alignment_reference_rt          │              │
   │               └──────────┬───────────────────────────┘              │
   │                          ↓                                          │
   │               ┌──────────────────────────────────────┐              │
   │               │ Assign alignment_group_id to         │              │
   │               │ reference features                   │              │
   │               │ (features pointed to by aligned IDs) │              │
   │               └──────────┬───────────────────────────┘              │
   │                          ↓                                          │
   │               ┌──────────────────────┐                              │
   │               │ Combine:             │                              │
   │               │ Base (1,2,3,4,5) +   │                              │
   │               │ Recovered (6,7,8)    │                              │
   │               │ = Final (1-8)        │                              │
   │               └──────────────────────┘                              │
   └─────────────────────────────────────────────────────────────────────┘
                                      ↓
   ┌─────────────────────────────────────────────────────────────────────┐
   │  5. Export Results                                                  │
   │                                                                     │
   │  Final TSV/Matrix includes:                                         │
   │    • Original features (from_alignment=0)                           │
   │    • Recovered features (from_alignment=1, with alignment scores)   │
   │    • More complete quantification with fewer missing values         │
   └─────────────────────────────────────────────────────────────────────┘

Detailed Component Workflow
---------------------------

A. Reader Classes (OSW, Parquet, Split Parquet)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: text

   ┌──────────────────────────────────────────────────────────────┐
   │  Reader.__init__()                                           │
   │                                                              │
   │  OSWReader:                                                  │
   │    N/A - checks at read time                                 │
   │                                                              │
   │  ParquetReader:                                              │
   │    self._has_alignment = _check_alignment_file_exists()      │
   │    • Checks: {basename}_feature_alignment.parquet            │
   │                                                              │
   │  SplitParquetReader:                                         │
   │    self._has_alignment = _check_alignment_file_exists()      │
   │    • Checks: {infile}/feature_alignment.parquet              │
   └──────────────────────────────────────────────────────────────┘
                                  ↓
   ┌──────────────────────────────────────────────────────────────┐
   │  Reader.read()                                               │
   │                                                              │
   │    → _read_standard_data()                                   │
   │    if config.use_alignment AND alignment_present:            │
   │        → _fetch_alignment_features()                         │
   │        → Merge with base features                            │
   └──────────────────────────────────────────────────────────────┘

B. Alignment Detection Methods
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: text

   OSW Files (.osw):
   ┌──────────────────────────────────────────────┐
   │  _check_alignment_presence(con)              │
   │                                              │
   │  return (                                    │
   │      check_sqlite_table(                     │
   │          con, "FEATURE_MS2_ALIGNMENT"        │
   │      ) and check_sqlite_table(               │
   │          con, "SCORE_ALIGNMENT"              │
   │      )                                       │
   │  )                                           │
   └──────────────────────────────────────────────┘

   Parquet Files (.parquet):
   ┌──────────────────────────────────────────────┐
   │  _check_alignment_file_exists()              │
   │                                              │
   │  if infile.endswith('.parquet'):             │
   │      base = infile[:-8]  # drop '.parquet'   │
   │      alignment_file = (                      │
   │          f"{base}_feature_alignment.parquet" │
   │      )                                       │
   │      return os.path.exists(alignment_file)   │
   │  return False                                │
   └──────────────────────────────────────────────┘

   Split Parquet Files (directory with .oswpq):
   ┌──────────────────────────────────────────────┐
   │  _check_alignment_file_exists()              │
   │                                              │
   │  if os.path.isdir(infile):                   │
   │      alignment_file = os.path.join(          │
   │          infile, "feature_alignment.parquet" │
   │      )                                       │
   │      return os.path.exists(alignment_file)   │
   │  return False                                │
   └──────────────────────────────────────────────┘

C. Feature Recovery Decision Tree
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: text

                       Start Export
                             │
                             ↓
                  ┌──────────────────────┐
                  │ use_alignment=True?  │
                  └──────────┬───────────┘
                             │
                ┌────────────┴────────────┐
                │                         │
              YES                         NO
                │                         │
                ↓                         ↓
           ┌──────────────┐        ┌──────────────┐
           │  Alignment   │        │   Standard   │
           │ data exists? │        │ Export Only  │
           └──────┬───────┘        └──────────────┘
                  │
            ┌─────┴─────┐
            │           │
          YES          NO
            │           │
            ↓           ↓
       ┌─────────┐ ┌─────────┐
       │   Use   │ │Standard │
       │Alignment│ │ Export  │
       └────┬────┘ └────┬────┘
            │           │
            └─────┬─────┘
                  ↓
           Export Results

Example Scenario
----------------

This example assumes an MS2 q-value cutoff (``max_rs_peakgroup_qvalue``) of 0.05 and the default ``max_alignment_pep`` of 0.7.

Before Alignment Integration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: text

   Run 1: Feature detected with MS2 QVALUE = 0.02  ✓ (exported)
   Run 2: Feature detected with MS2 QVALUE = 0.08  ✗ (not exported - weak signal)
   Run 3: Feature detected with MS2 QVALUE = 0.03  ✓ (exported)

   Result: Missing quantification in Run 2

After Alignment Integration
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: text

   Run 1: Feature detected with MS2 QVALUE = 0.02  ✓ (exported, from_alignment=0)
   Run 2: Feature detected with MS2 QVALUE = 0.08  ✗ (weak MS2)
          BUT: Alignment PEP = 0.4                 ✓ (good alignment!)
          → Recovered via alignment (exported, from_alignment=1)
   Run 3: Feature detected with MS2 QVALUE = 0.03  ✓ (exported, from_alignment=0)

   Result: Complete quantification across all runs

File Structure Examples
-----------------------

OSW Format
^^^^^^^^^^

.. code-block:: text

   data.osw (SQLite database)
   ├─ FEATURE_MS2_ALIGNMENT table
   └─ SCORE_ALIGNMENT table

Parquet Format
^^^^^^^^^^^^^^

.. code-block:: text

   data.parquet                    ← Main file
   data_feature_alignment.parquet  ← Alignment file

Split Parquet Format
^^^^^^^^^^^^^^^^^^^^

.. code-block:: text

   experiment/
   ├─ run1.oswpq/
   │  ├─ precursors_features.parquet
   │  └─ transition_features.parquet
   ├─ run2.oswpq/
   │  ├─ precursors_features.parquet
   │  └─ transition_features.parquet
   └─ feature_alignment.parquet   ← Alignment file (parent level)

Key Benefits
------------

1. **Increased Coverage**: Recovers peaks with weak MS2 signals but good alignment scores
2. **Better Quantification**: Fewer missing values in quantification matrices
3. **Quality Control**: Recovered features must pass the alignment PEP threshold (``max_alignment_pep``), and their reference features must pass the MS2 q-value threshold
4. **Backwards Compatible**: Thanks to auto-detection, files without alignment data are exported exactly as before
5. **Transparent**: Recovered features are marked with the ``from_alignment`` flag

Configuration Options
---------------------

.. code-block:: bash

   # Default: alignment integration enabled, with auto-detection
   pyprophet export tsv --in data.osw --out results.tsv

   # Use a stricter alignment PEP threshold (default is 0.7)
   pyprophet export tsv --in data.osw --out results.tsv \
       --max_alignment_pep 0.5

   # Explicitly disable alignment integration
   pyprophet export tsv --in data.osw --out results.tsv \
       --no-use_alignment

Output Columns
--------------

When alignment data is used, the export includes these additional columns:

- ``from_alignment``: 0 (base feature) or 1 (recovered via alignment)
- ``alignment_pep``: Alignment posterior error probability
- ``alignment_qvalue``: Alignment q-value
- ``alignment_group_id``: Group identifier linking aligned features together (also assigned to their reference features)
- ``alignment_reference_feature_id``: ID of the reference feature used for alignment
- ``alignment_reference_rt``: Retention time of the reference feature

These columns allow users to:

- Identify which features were recovered
- Assess alignment quality
- Track which features are aligned together via ``alignment_group_id``
- Find the reference feature that was used for alignment
- Filter or analyze recovered features separately if needed
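
The alignment columns make it easy to separate recovered features from base features after export. The minimal sketch below uses only the Python standard library. It assumes that the TSV contains ``from_alignment``, ``alignment_pep`` and ``alignment_group_id`` columns named as listed above, and that an empty cell means "no value". The ``id`` column, the sample rows and the helper names (``split_by_origin``, ``strict_recovered``, ``group_members``) are all hypothetical.

.. code-block:: python

   import csv
   import io
   from collections import defaultdict

   # Hypothetical export snippet: column names follow the "Output Columns"
   # list, but the rows and the ``id`` column are made up for illustration.
   EXPORT_TSV = (
       "id\tfrom_alignment\talignment_pep\talignment_group_id\n"
       "1\t0\t\t1\n"
       "6\t1\t0.4\t1\n"
       "7\t1\t0.65\t2\n"
   )


   def split_by_origin(rows):
       """Separate base features (from_alignment=0) from recovered ones (=1)."""
       base, recovered = [], []
       for row in rows:
           # int(float(...)) also accepts values written as "1.0"
           is_recovered = int(float(row["from_alignment"])) == 1
           (recovered if is_recovered else base).append(row)
       return base, recovered


   def strict_recovered(recovered, pep_cutoff):
       """Keep recovered features whose alignment PEP is below pep_cutoff."""
       return [r for r in recovered if float(r["alignment_pep"]) < pep_cutoff]


   def group_members(rows):
       """Map each alignment_group_id to the feature IDs that share it."""
       groups = defaultdict(list)
       for row in rows:
           if row["alignment_group_id"]:  # empty string means "no group"
               groups[row["alignment_group_id"]].append(row["id"])
       return dict(groups)


   rows = list(csv.DictReader(io.StringIO(EXPORT_TSV), delimiter="\t"))
   base, recovered = split_by_origin(rows)
   kept = strict_recovered(recovered, pep_cutoff=0.5)

   print(len(base), len(recovered))  # 1 2
   print([r["id"] for r in kept])    # ['6']
   print(group_members(rows))        # {'1': ['1', '6'], '2': ['7']}

Loading the export into a data-frame library would work just as well. The standard-library version simply keeps the sketch free of dependencies.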
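
Step 4 of the high-level workflow is essentially a set difference followed by a merge. The sketch below reproduces that logic with plain Python data structures, using the feature IDs from the diagram. The ``Feature`` class, the ``recover_features`` function and all alignment values are hypothetical. Also, where the real readers fetch the full feature data for recovered IDs from the input file, this sketch only builds minimal records.

.. code-block:: python

   from dataclasses import dataclass, field


   @dataclass
   class Feature:
       """Hypothetical minimal feature record."""
       id: int
       from_alignment: int = 0
       alignment: dict = field(default_factory=dict)  # alignment_* columns


   def recover_features(base, aligned):
       """Sketch of step 4: add aligned features that failed the MS2 filter.

       base:    {feature_id: Feature} that passed the MS2 q-value filter.
       aligned: {feature_id: dict} from the alignment query, each dict
                holding the alignment_* values for that aligned feature.
       """
       # Recovered IDs = aligned - base
       new_ids = sorted(set(aligned) - set(base))
       recovered = {
           fid: Feature(fid, from_alignment=1, alignment=dict(aligned[fid]))
           for fid in new_ids
       }
       # Give each reference feature the alignment_group_id of the features
       # that point to it (the first group seen wins).
       for info in aligned.values():
           ref = base.get(info["alignment_reference_feature_id"])
           if ref is not None:
               ref.alignment.setdefault(
                   "alignment_group_id", info["alignment_group_id"]
               )
       # Combine: base features first, then the recovered ones
       return {**base, **recovered}


   # IDs from the diagram; all alignment values are made up.
   base = {fid: Feature(fid) for fid in (1, 2, 3, 4, 5)}
   aligned = {
       fid: {
           "alignment_pep": 0.3,
           "alignment_qvalue": 0.01,
           "alignment_group_id": 1,
           "alignment_reference_feature_id": 1,
           "alignment_reference_rt": 1200.0,
       }
       for fid in (3, 4, 6, 7, 8)
   }
   final = recover_features(base, aligned)

   print(sorted(final))                                       # [1, 2, 3, 4, 5, 6, 7, 8]
   print([f.id for f in final.values() if f.from_alignment])  # [6, 7, 8]
   print(final[1].alignment)                                  # {'alignment_group_id': 1}

Features 3 and 4 appear in both sets, so they keep ``from_alignment=0``. Only the IDs missing from the base set are added as recovered features.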
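
The three detection rules in *B. Alignment Detection Methods* can be combined into a single helper that dispatches on the input type. This is an illustration only:

- ``has_alignment_data`` and ``sqlite_table_exists`` are hypothetical names. The readers described above use ``_check_alignment_presence`` and ``_check_alignment_file_exists``.
- The sketch opens the OSW file itself instead of reusing the reader's connection.

.. code-block:: python

   import os
   import sqlite3
   import tempfile
   from contextlib import closing


   def sqlite_table_exists(con, name):
       """Return True if a table called `name` exists in the SQLite database."""
       row = con.execute(
           "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
       ).fetchone()
       return row is not None


   def has_alignment_data(infile):
       """Hypothetical helper: does `infile` come with alignment data?"""
       if os.path.isdir(infile):
           # Split Parquet: feature_alignment.parquet sits inside the directory
           return os.path.exists(os.path.join(infile, "feature_alignment.parquet"))
       if infile.endswith(".parquet"):
           # Single Parquet: sibling file {basename}_feature_alignment.parquet
           base = infile[: -len(".parquet")]
           return os.path.exists(f"{base}_feature_alignment.parquet")
       # The exists() guard stops sqlite3.connect from creating an empty file
       if infile.endswith(".osw") and os.path.exists(infile):
           # OSW: both alignment tables must be present
           with closing(sqlite3.connect(infile)) as con:
               return sqlite_table_exists(
                   con, "FEATURE_MS2_ALIGNMENT"
               ) and sqlite_table_exists(con, "SCORE_ALIGNMENT")
       return False


   # Usage sketch with throwaway files in a temporary directory
   with tempfile.TemporaryDirectory() as tmp:
       parquet = os.path.join(tmp, "data.parquet")
       open(parquet, "w").close()
       before = has_alignment_data(parquet)  # no alignment file yet
       open(os.path.join(tmp, "data_feature_alignment.parquet"), "w").close()
       after = has_alignment_data(parquet)

       osw = os.path.join(tmp, "data.osw")
       with closing(sqlite3.connect(osw)) as con:
           con.execute("CREATE TABLE FEATURE_MS2_ALIGNMENT (ID INTEGER)")
           con.execute("CREATE TABLE SCORE_ALIGNMENT (ID INTEGER)")
           con.commit()
       osw_has_alignment = has_alignment_data(osw)

   print(before, after, osw_has_alignment)  # False True True

The directory check runs first, so a split Parquet input is never mistaken for a single Parquet file.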
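
Both queries in step 3 note that ``CAST(... AS INTEGER)`` preserves precision for large feature IDs. This document does not say where the precision would otherwise be lost. One plausible failure mode, assumed here, is an ID passing through a 64-bit float. A float64 has a 53-bit significand, so it cannot represent every integer above 2**53. The hypothetical ID below shows the effect in plain Python.

.. code-block:: python

   # A float64 has a 53-bit significand, so not every integer above 2**53
   # survives a round trip through float.
   feature_id = 2**53 + 1          # hypothetical large feature ID
   as_float = float(feature_id)    # what a float-typed ID column would hold
   round_trip = int(as_float)

   print(feature_id)               # 9007199254740993
   print(round_trip)               # 9007199254740992
   print(round_trip == feature_id) # False

Two distinct IDs can therefore collapse to the same value. That would silently break the links between aligned features and their reference features.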