Nature Machine Intelligence submission · Protein–peptide recognition
ProPepX
A unified framework for interpretable bidirectional interaction-aware transfer learning in joint residue-level protein–peptide binding-site prediction
ProPepX learns protein–peptide interfaces as reciprocal recognition events, using bidirectional cross-interface attention and gated fusion to produce partner-conditioned residue-level binding-site predictions.
Transfer-learning workflow
From generic binding data to task-specific interface prediction
The proposed workflow separates representation learning from task adaptation. Stage 1 learns general protein–peptide recognition patterns from large-scale labeled binding data. Stage 2 fine-tunes the pretrained model for protein-side or peptide-side residue-level prediction.
- Pretraining on broad protein-binding and peptide-binding residue annotations.
- Fine-tuning on small, high-quality task-specific benchmarks.
- Supervised transfer learning to reduce overfitting and improve generalization.
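The two stages can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the ProPepX training code: a logistic-regression classifier stands in for the network, both datasets are synthetic, and the names `train`, `accuracy`, `w_pre`, and `w_ft`, along with the learning rates and epoch counts, are hypothetical. The only point it shows is the pattern itself: Stage 2 starts from the Stage 1 weights and takes smaller steps on a much smaller dataset.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train(w, x, y, lr, epochs):
    """Full-batch gradient descent on binary cross-entropy, starting from w."""
    w = w.copy()
    for _ in range(epochs):
        grad = x.T @ (sigmoid(x @ w) - y) / len(y)
        w -= lr * grad
    return w


def accuracy(w, x, y):
    """Fraction of residues whose thresholded prediction matches the label."""
    return float(((x @ w > 0).astype(float) == y).mean())


rng = np.random.default_rng(0)
dim = 8  # hypothetical per-residue feature size

# Stage 1 data: large, generic binding-residue annotations (synthetic stand-in).
w_generic = rng.normal(size=dim)
x_pre = rng.normal(size=(5000, dim))
y_pre = (x_pre @ w_generic > 0).astype(float)

# Stage 2 data: a small task-specific benchmark whose labels follow a related
# but slightly shifted rule (also synthetic).
w_task = w_generic + 0.3 * rng.normal(size=dim)
x_ft = rng.normal(size=(40, dim))
y_ft = (x_ft @ w_task > 0).astype(float)

# Stage 1: pretrain from scratch on the large dataset.
w_pre = train(np.zeros(dim), x_pre, y_pre, lr=0.5, epochs=200)

# Stage 2: fine-tune from the pretrained weights with a smaller learning rate.
w_ft = train(w_pre, x_ft, y_ft, lr=0.05, epochs=50)

print("pretraining accuracy:", accuracy(w_pre, x_pre, y_pre))
print("fine-tuned accuracy on task data:", accuracy(w_ft, x_ft, y_ft))
```

Starting Stage 2 from `w_pre` rather than from zeros is what makes this transfer learning; the small learning rate is one common way to keep the fine-tuned weights close to the pretrained solution when task data are scarce.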
Architecture
Partner-aware model design
ProPepX integrates pretrained protein language model embeddings, multi-scale motif extraction, transformer contextual encoding, bidirectional cross-attention, gated fusion, and residue-level classification.
PLM embeddings
Protein and peptide sequences are converted into contextual residue representations.
Intra-molecular encoders
Local sequence motifs and long-range dependencies are captured before partner exchange.
Cross-interface attention
Protein→peptide and peptide→protein attention learns residue–residue coupling.
Gated fusion
Intrinsic self features and partner-aware cross features are adaptively combined.
Residue classifier
Binding probabilities are predicted for both protein and peptide residues.
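The partner-exchange steps above can be sketched in NumPy. This is a hedged illustration rather than the ProPepX implementation: it assumes single-head scaled dot-product attention, a per-feature sigmoid gate, and a linear classifier, and it uses random arrays in place of PLM embeddings and trained weights. All function names, shapes, and the `init` helper are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cross_attention(queries, partner, w_q, w_k, w_v):
    """Single-head scaled dot-product attention from one chain onto its partner."""
    q, k, v = queries @ w_q, partner @ w_k, partner @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (n_query, n_partner)
    return attn @ v, attn


def gated_fusion(self_feats, cross_feats, w_g, b_g):
    """Per-feature sigmoid gate mixing intrinsic and partner-aware features."""
    gate = sigmoid(np.concatenate([self_feats, cross_feats], axis=-1) @ w_g + b_g)
    return gate * self_feats + (1.0 - gate) * cross_feats


rng = np.random.default_rng(0)
d = 16
protein = rng.normal(size=(50, d))  # stand-in for encoded protein residues
peptide = rng.normal(size=(10, d))  # stand-in for encoded peptide residues


def init(*shape):
    """Small random weights; a trained model would learn these."""
    return rng.normal(scale=0.1, size=shape)


# Each direction gets its own query/key/value projections.
prot_cross, attn_prot_to_pep = cross_attention(
    protein, peptide, init(d, d), init(d, d), init(d, d))
pep_cross, attn_pep_to_prot = cross_attention(
    peptide, protein, init(d, d), init(d, d), init(d, d))

# Combine intrinsic (self) and partner-aware (cross) features per residue.
prot_fused = gated_fusion(protein, prot_cross, init(2 * d, d), np.zeros(d))
pep_fused = gated_fusion(peptide, pep_cross, init(2 * d, d), np.zeros(d))

# Residue classifier: one binding probability per residue on each chain.
prot_probs = sigmoid(prot_fused @ init(d))
pep_probs = sigmoid(pep_fused @ init(d))

print(attn_prot_to_pep.shape, attn_pep_to_prot.shape)
print(prot_probs.shape, pep_probs.shape)
```

Each row of `attn_prot_to_pep` is a distribution over peptide residues for one protein residue, and `attn_pep_to_prot` is the reverse; maps of this kind are what make residue-residue coupling inspectable.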
ProPepX OVERVIEW
From Protein–Peptide Sequences to Interpretable Molecular Insight
Large-scale interaction-aware pretraining enables ProPepX to accurately identify peptide-binding residues on proteins and protein-binding residues on peptides, while transforming residue-level predictions into interpretable evidence through cross-interface attention, attribution analysis, and perturbation-based validation.
Benchmark evidence
State-of-the-Art Performance Across Interaction-Aware Transfer Learning and Zero-Shot Generalization
Performance is summarized using the Matthews correlation coefficient (MCC) and the area under the ROC curve (AUROC), two widely used metrics for residue-level binding-site prediction under severe class imbalance. Each table lists the strongest competing method and a recent baseline alongside two ProPepX variants (ESM-3 and ProtT5). In the protein-side, peptide-side, and zero-shot settings, and on Test251–LEADS in joint mode, the best ProPepX variant outperforms the listed structure-based predictors, handcrafted sequence-feature models, and modern PLM-based baselines on both metrics. The exception is peptide-side prediction in joint mode on Test167, where KGIPA scores higher on both MCC and AUROC.
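For readers unfamiliar with the two metrics, the sketch below computes both from scratch using only the standard library. It is an illustration of the standard definitions, not the evaluation code behind the tables; the function names and toy labels are assumptions. AUROC is computed as the probability that a randomly chosen binding residue receives a higher score than a randomly chosen non-binding residue, with ties counted as one half.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auroc(y_true, scores):
    """Area under the ROC curve via pairwise positive/negative comparisons."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced example: 2 binding residues out of 20.
labels = [0] * 18 + [1] * 2
all_negative = [0] * 20

accuracy = sum(t == p for t, p in zip(labels, all_negative)) / len(labels)
print("accuracy of all-negative predictor:", accuracy)
print("MCC of all-negative predictor:", mcc(labels, all_negative))
print("AUROC on a 4-residue example:", auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The all-negative predictor reaches 90% accuracy on the toy labels but an MCC of 0, which is why accuracy alone is a poor summary when binding residues are rare.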
Protein-side interaction-aware transfer learning (IATL)
Peptide-binding residues on proteins
Peptide-binding residue prediction on protein benchmarks
TS92
| Metric | Best competing | Recent baseline | ProPepX ESM-3 | ProPepX ProtT5 |
|---|---|---|---|---|
| MCC | AF-Multimer 0.605 | LMFFT# 0.386 | 0.595 | 0.609 |
| AUROC | PepNN-Struct 0.855 | LMFFT# 0.844 | 0.895 | 0.919 |
TS125
| Metric | Best competing | Recent baseline | ProPepX ESM-3 | ProPepX ProtT5 |
|---|---|---|---|---|
| MCC | AF-Multimer 0.576 | LMFFT 0.450 | 0.546 | 0.583 |
| AUROC | PepNN-Struct 0.885 | LMFFT# 0.862 | 0.919 | 0.925 |
TS251
| Metric | Best competing | Recent baseline | ProPepX ESM-3 | ProPepX ProtT5 |
|---|---|---|---|---|
| MCC | PepNN-Struct 0.566 | PepCA 0.340 | 0.627 | 0.677 |
| AUROC | PepNN-Struct 0.833 | PepCA 0.794 | 0.894 | 0.909 |
TS639
| Metric | Best competing | Recent baseline | ProPepX ESM-3 | ProPepX ProtT5 |
|---|---|---|---|---|
| MCC | AF-Multimer 0.450 | LMFFT 0.363 | 0.517 | 0.558 |
| AUROC | PepNN-Struct 0.868 | LMFFT# 0.829 | 0.902 | 0.907 |
Peptide-side IATL
Protein-binding residues on peptides
Protein-binding residue prediction on peptide benchmark TS231
TS231
| Metric | Best competing | Recent baseline | ProPepX ESM-3 | ProPepX ProtT5 |
|---|---|---|---|---|
| Avg. MCC | PepTrans 0.558 | AlphaFold3 0.514 | 0.600 | 0.633 |
| Avg. AUROC | CAMP 0.803 | PepTrans 0.776 | 0.789 | 0.807 |
Joint-mode learning
Simultaneous protein and peptide residue prediction
Simultaneous prediction on both molecular partners
Test167
| Metric | Best competing | Recent baseline | ProPepX ESM-3 | ProPepX ProtT5 |
|---|---|---|---|---|
| Protein MCC | KGIPA 0.533 | KGIPA 0.533 | 0.539 | 0.544 |
| Protein AUROC | KGIPA 0.937 | KGIPA 0.937 | 0.940 | 0.941 |
| Peptide MCC | KGIPA 0.473 | KGIPA 0.473 | 0.406 | 0.395 |
| Peptide AUROC | KGIPA 0.845 | KGIPA 0.845 | 0.804 | 0.802 |
Test251–LEADS
| Metric | Best competing | Recent baseline | ProPepX ESM-3 | ProPepX ProtT5 |
|---|---|---|---|---|
| Protein MCC | AlphaFold3 0.550 | KGIPA 0.540 | 0.581 | 0.610 |
| Protein AUROC | KGIPA 0.924 | KGIPA 0.924 | 0.958 | 0.964 |
| Peptide MCC | IIDL-PepPI 0.350 | KGIPA 0.339 | 0.497 | 0.502 |
| Peptide AUROC | KGIPA 0.761 | KGIPA 0.761 | 0.798 | 0.819 |
Zero-shot generalization
Prediction without task-specific fine-tuning
Generalization to unseen protein–peptide complexes on TS167
TS167
| Metric | Best competing | Recent baseline | ProPepX ESM-3 | ProPepX ProtT5 |
|---|---|---|---|---|
| Protein MCC | IIDL-PepPI 0.537 | IIDL-PepPI 0.537 | 0.581 | 0.605 |
| Protein AUROC | PepCA 0.891 | IIDL-PepPI 0.882 | 0.928 | 0.947 |
| Peptide MCC | IIDL-PepPI 0.505 | IIDL-PepPI 0.505 | 0.583 | 0.613 |
| Peptide AUROC | IIDL-PepPI 0.849 | IIDL-PepPI 0.849 | 0.879 | 0.871 |
Only MCC and AUROC are highlighted here to provide a concise overview of benchmark performance. Comprehensive evaluation using additional metrics, including F1 score, precision, recall, accuracy (ACC), specificity, and AUPR, is reported in the manuscript and supplementary materials, where ProPepX maintains strong and consistently competitive performance across all evaluation settings.
Interpretability
From Residue-Level Predictions to Mechanistic Biological Insight
ProPepX transforms residue-level predictions into biologically meaningful insight through cross-interface attention maps, attribution analysis, and perturbation-based validation, revealing the residue–residue interactions that govern protein–peptide recognition.
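Perturbation-based validation can be illustrated with a simple in-silico masking scan. The sketch below is an assumption about the general technique, not the protocol used in the manuscript: `toy_predict` is a hypothetical stand-in for a trained partner-conditioned model, the embeddings are random, and masking a peptide residue is approximated by zeroing its embedding. The scan records how much each protein residue's predicted binding probability changes when one peptide residue is masked.

```python
import numpy as np


def toy_predict(protein, peptide):
    """Hypothetical partner-conditioned predictor of protein-residue probabilities."""
    scores = protein @ peptide.T / np.sqrt(protein.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # stabilize the softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    context = attn @ peptide  # peptide summary seen by each protein residue
    logits = (protein * context).sum(axis=1)
    return 1.0 / (1.0 + np.exp(-logits))


def perturbation_effects(protein, peptide, predict):
    """effects[j, i]: drop in protein residue i's probability when peptide residue j is masked."""
    baseline = predict(protein, peptide)
    effects = np.zeros((peptide.shape[0], protein.shape[0]))
    for j in range(peptide.shape[0]):
        masked = peptide.copy()
        masked[j] = 0.0  # zero embedding as a crude stand-in for masking
        effects[j] = baseline - predict(protein, masked)
    return effects


rng = np.random.default_rng(0)
protein = rng.normal(size=(30, 8))  # 30 protein residues, 8 features each
peptide = rng.normal(size=(6, 8))   # 6 peptide residues

effects = perturbation_effects(protein, peptide, toy_predict)
most_influential = int(np.abs(effects).sum(axis=1).argmax())
print("effects matrix shape:", effects.shape)
print("peptide residue with the largest total effect:", most_influential)
```

If the residue pairs with the largest entries in `effects` also carry high cross-interface attention or attribution scores, the two lines of evidence support each other; disagreement suggests the attention map should not be read as a direct explanation.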
Resources
Open resources for ProPepX and interpretation
Citation
Cite ProPepX
Please cite the manuscript when using ProPepX, its model weights, figures, or code.
@article{naqvi2026propepx,
title = {ProPepX: A unified framework for interpretable bidirectional interaction-aware transfer learning in joint residue-level protein--peptide binding-site prediction},
author = {Naqvi, Syed Kumail Hussain and Sourav, Chandra and Chong, Kil To and Tayara, Hilal},
journal = {Nature Machine Intelligence},
year = {2026},
note = {Manuscript under consideration}
}