SCORE_ALIGNMENT Integration Workflow

Overview

This diagram illustrates how the SCORE_ALIGNMENT integration works to recover peaks with weak MS2 signals but good alignment scores.

High-Level Workflow

┌─────────────────────────────────────────────────────────────────────┐
│                    PyProphet Export Command                          │
│  pyprophet export tsv --in data.osw --out results.tsv               │
│  (use_alignment=True by default)                                    │
└─────────────────────────────────────────────────────────────────────┘
                                  ↓
┌─────────────────────────────────────────────────────────────────────┐
│                    1. Configuration Check                            │
│  • use_alignment = True (default)                                   │
│  • max_alignment_pep = 0.7 (default)                                │
└─────────────────────────────────────────────────────────────────────┘
                                  ↓
┌─────────────────────────────────────────────────────────────────────┐
│                    2. Auto-Detection Phase                           │
│                                                                      │
│  OSW Files:                                                          │
│  ├─ Check FEATURE_MS2_ALIGNMENT table exists?                       │
│  └─ Check SCORE_ALIGNMENT table exists?                             │
│                                                                      │
│  Parquet Files:                                                      │
│  └─ Check for {basename}_feature_alignment.parquet?                 │
│                                                                      │
│  Split Parquet Files:                                                │
│  └─ Check for {infile}/feature_alignment.parquet?                   │
└─────────────────────────────────────────────────────────────────────┘
                                  ↓
                    ┌─────────────┴─────────────┐
                    │                           │
          ┌─────────▼──────────┐    ┌──────────▼─────────┐
          │  Alignment Present │    │ Alignment Missing  │
          │   use_alignment=T  │    │  use_alignment=T   │
          └─────────┬──────────┘    └──────────┬─────────┘
                    │                           │
                    │                           ↓
                    │              ┌────────────────────────┐
                    │              │ Standard Export Only   │
                    │              │ (no alignment used)    │
                    │              └────────────────────────┘
                    │
                    ↓
┌─────────────────────────────────────────────────────────────────────┐
│                    3. Data Reading Phase                             │
│                                                                      │
│  Step A: Fetch Base Features (MS2 QVALUE filter)                    │
│  ┌───────────────────────────────────────────────────────────────┐ │
│  │ SELECT CAST(FEATURE.ID AS INTEGER) AS id,                     │ │
│  │        ... (other columns)                                     │ │
│  │ FROM FEATURES                                                  │ │
│  │ WHERE SCORE_MS2.QVALUE < max_rs_peakgroup_qvalue (e.g., 0.05)│ │
│  │ → Base Features (passed MS2 threshold)                        │ │
│  │ → Mark with from_alignment=0                                  │ │
│  │ → CAST preserves precision for large feature IDs             │ │
│  └───────────────────────────────────────────────────────────────┘ │
│                                                                      │
│  Step B: Fetch Aligned Features (Alignment PEP filter)              │
│  ┌───────────────────────────────────────────────────────────────┐ │
│  │ SELECT DENSE_RANK() OVER (...) AS alignment_group_id,        │ │
│  │        ALIGNED_FEATURE_ID AS id,                              │ │
│  │        CAST(REFERENCE_FEATURE_ID AS INTEGER)                  │ │
│  │          AS alignment_reference_feature_id,                   │ │
│  │        REFERENCE_RT AS alignment_reference_rt                 │ │
│  │ FROM FEATURE_MS2_ALIGNMENT                                    │ │
│  │ JOIN SCORE_ALIGNMENT                                          │ │
│  │ WHERE LABEL = 1 (target)                                      │ │
│  │ AND SCORE_ALIGNMENT.PEP < max_alignment_pep (e.g., 0.7)      │ │
│  │ AND REF FEATURE passes MS2 QVALUE threshold                  │ │
│  │ → Aligned Features (good alignment scores)                    │ │
│  │ → Includes alignment_group_id and reference info              │ │
│  │ → CAST preserves precision for large feature IDs             │ │
│  └───────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
                                  ↓
┌─────────────────────────────────────────────────────────────────────┐
│                    4. Feature Recovery Logic                         │
│                                                                      │
│  ┌─────────────────┐        ┌──────────────────┐                   │
│  │ Base Features   │        │ Aligned Features │                   │
│  │ (MS2 passed)    │        │ (Alignment good) │                   │
│  │ IDs: 1,2,3,4,5  │        │ IDs: 3,4,6,7,8   │                   │
│  └────────┬────────┘        └────────┬─────────┘                   │
│           │                          │                              │
│           └──────────┬───────────────┘                              │
│                      ↓                                              │
│           ┌──────────────────────┐                                  │
│           │ Find NEW features:   │                                  │
│           │ aligned - base       │                                  │
│           │ = {6, 7, 8}         │                                  │
│           └──────────┬───────────┘                                  │
│                      ↓                                              │
│           ┌──────────────────────────────────────┐                  │
│           │ Fetch full data for recovered        │                  │
│           │ features: 6, 7, 8                    │                  │
│           │ Mark: from_alignment=1               │                  │
│           │ Add: alignment_pep                   │                  │
│           │ Add: alignment_qvalue                │                  │
│           │ Add: alignment_group_id              │                  │
│           │ Add: alignment_reference_feature_id  │                  │
│           │ Add: alignment_reference_rt          │                  │
│           └──────────┬───────────────────────────┘                  │
│                      ↓                                              │
│           ┌──────────────────────────────────────┐                  │
│           │ Assign alignment_group_id to         │                  │
│           │ reference features                   │                  │
│           │ (features pointed to by aligned IDs) │                  │
│           └──────────┬───────────────────────────┘                  │
│                      ↓                                              │
│           ┌──────────────────────┐                                  │
│           │ Combine:             │                                  │
│           │ Base (1,2,3,4,5) +   │                                  │
│           │ Recovered (6,7,8)    │                                  │
│           │ = Final (1-8)        │                                  │
│           └──────────────────────┘                                  │
└─────────────────────────────────────────────────────────────────────┘
                                  ↓
┌─────────────────────────────────────────────────────────────────────┐
│                    5. Export Results                                 │
│                                                                      │
│  Final TSV/Matrix includes:                                          │
│  • Original features (from_alignment=0)                              │
│  • Recovered features (from_alignment=1, with alignment scores)      │
│  • More complete quantification with fewer missing values            │
└─────────────────────────────────────────────────────────────────────┘

Detailed Component Workflow

A. Reader Classes (OSW, Parquet, Split Parquet)

┌──────────────────────────────────────────────────────────────┐
│                    Reader.__init__()                         │
│                                                              │
│  OSWReader:                                                  │
│    N/A - checks at read time                                │
│                                                              │
│  ParquetReader:                                              │
│    self._has_alignment = _check_alignment_file_exists()     │
│    • Checks: {basename}_feature_alignment.parquet           │
│                                                              │
│  SplitParquetReader:                                         │
│    self._has_alignment = _check_alignment_file_exists()     │
│    • Checks: {infile}/feature_alignment.parquet             │
└──────────────────────────────────────────────────────────────┘
                         ↓
┌──────────────────────────────────────────────────────────────┐
│                    Reader.read()                             │
│                                                              │
│  → _read_standard_data()                                     │
│    if config.use_alignment AND alignment_present:           │
│      → _fetch_alignment_features()                           │
│      → Merge with base features                              │
└──────────────────────────────────────────────────────────────┘

B. Alignment Detection Methods

OSW Files (.osw):
┌─────────────────────────────────────────┐
│  _check_alignment_presence(con)        │
│                                         │
│  return:                                │
│    check_sqlite_table(                 │
│      con, "FEATURE_MS2_ALIGNMENT"      │
│    ) AND                                │
│    check_sqlite_table(                 │
│      con, "SCORE_ALIGNMENT"            │
│    )                                    │
└─────────────────────────────────────────┘

Parquet Files (.parquet):
┌─────────────────────────────────────────┐
│  _check_alignment_file_exists()        │
│                                         │
│  if infile.endswith('.parquet'):       │
│    base = infile[:-8]                  │
│    alignment_file =                    │
│      f"{base}_feature_alignment.parquet"│
│    return os.path.exists(alignment_file)│
└─────────────────────────────────────────┘

Split Parquet Files (directory with .oswpq):
┌─────────────────────────────────────────┐
│  _check_alignment_file_exists()        │
│                                         │
│  if os.path.isdir(infile):             │
│    alignment_file = os.path.join(      │
│      infile, "feature_alignment.parquet"│
│    )                                    │
│    return os.path.exists(alignment_file)│
└─────────────────────────────────────────┘

C. Feature Recovery Decision Tree

                    Start Export
                         │
                         ↓
              ┌──────────────────────┐
              │ use_alignment=True?  │
              └──────────┬───────────┘
                         │
           ┌─────────────┴─────────────┐
           │                           │
          YES                         NO
           │                           │
           ↓                           ↓
    ┌──────────────┐          ┌──────────────┐
    │ Alignment    │          │ Standard     │
    │ data exists? │          │ Export Only  │
    └──────┬───────┘          └──────────────┘
           │
     ┌─────┴─────┐
     │           │
    YES         NO
     │           │
     ↓           ↓
┌─────────┐  ┌─────────┐
│ Use     │  │Standard │
│Alignment│  │Export   │
└─────────┘  └─────────┘
     │           │
     └─────┬─────┘
           ↓
    Export Results

Example Scenario

Before Alignment Integration

Run 1: Feature detected with MS2 QVALUE = 0.02 ✓ (exported)
Run 2: Feature detected with MS2 QVALUE = 0.08 ✗ (not exported - weak signal)
Run 3: Feature detected with MS2 QVALUE = 0.03 ✓ (exported)

Result: Missing quantification in Run 2

After Alignment Integration

Run 1: Feature detected with MS2 QVALUE = 0.02 ✓ (exported, from_alignment=0)
Run 2: Feature detected with MS2 QVALUE = 0.08 ✗ (weak MS2)
       BUT: Alignment PEP = 0.4 ✓ (good alignment!)
       → Recovered via alignment (exported, from_alignment=1)
Run 3: Feature detected with MS2 QVALUE = 0.03 ✓ (exported, from_alignment=0)

Result: Complete quantification across all runs

File Structure Examples

OSW Format

data.osw (SQLite database)
├─ FEATURE_MS2_ALIGNMENT table
└─ SCORE_ALIGNMENT table

Parquet Format

data.parquet                    ← Main file
data_feature_alignment.parquet  ← Alignment file

Split Parquet Format

experiment/
├─ run1.oswpq/
│  ├─ precursors_features.parquet
│  └─ transition_features.parquet
├─ run2.oswpq/
│  ├─ precursors_features.parquet
│  └─ transition_features.parquet
└─ feature_alignment.parquet    ← Alignment file (parent level)

Key Benefits

Increased Coverage: Recovers peaks with weak MS2 but good alignment
Better Quantification: Fewer missing values in matrices
Quality Control: Uses alignment PEP/QVALUE thresholds
Backwards Compatible: Disabled by default via auto-detection
Transparent: Features marked with from_alignment flag

Configuration Options

# Use default (enabled with auto-detection)
pyprophet export tsv --in data.osw --out results.tsv

# Customize threshold
pyprophet export tsv --in data.osw --out results.tsv \
  --max_alignment_pep 0.5

# Explicitly disable
pyprophet export tsv --in data.osw --out results.tsv \
  --no-use_alignment

Output Columns

Recovered features include additional columns:

from_alignment: 0 (base) or 1 (recovered)
alignment_pep: Alignment posterior error probability
alignment_qvalue: Alignment q-value
alignment_group_id: Group identifier linking aligned features together
alignment_reference_feature_id: ID of the reference feature used for alignment
alignment_reference_rt: Retention time of the reference feature

These allow users to:

Identify which features were recovered
Assess alignment quality
Track which features are aligned together via alignment_group_id
Find the reference feature that was used for alignment
Filter or analyze separately if needed