Nature Machine Intelligence submission · Protein–peptide recognition

ProPepX

A unified framework for interpretable bidirectional interaction-aware transfer learning in joint residue-level protein–peptide binding-site prediction

ProPepX learns protein–peptide interfaces as reciprocal recognition events, using bidirectional cross-interface attention and gated fusion to produce partner-conditioned residue-level binding-site predictions.

Syed Kumail Hussain Naqvi^†, Chandra Sourav^†, Kil To Chong^*, Hilal Tayara^*

^† These first authors contributed equally to this work. • ^* Corresponding authors.

Unified prediction

Simultaneously identifies peptide-binding residues on proteins and protein-binding residues on peptides within a shared interaction-aware architecture.

Interaction-aware transfer

Transfers molecular-recognition knowledge from large-scale protein–peptide interaction data to enable robust residue-level binding-site prediction in protein-side, peptide-side, joint, and zero-shot settings.

Mechanistic interpretation

Turns residue-level predictions into mechanistic insight by revealing the residue-level determinants of molecular recognition through cross-interface attention, attribution analysis, and perturbation-based validation.

Transfer-learning workflow

From generic binding data to task-specific interface prediction

The proposed workflow separates representation learning from task adaptation. Stage 1 learns general protein–peptide recognition patterns from large-scale labeled binding data. Stage 2 fine-tunes the pretrained model for protein-side or peptide-side residue-level prediction.

Pretraining on broad protein-binding and peptide-binding residue annotations.
Fine-tuning on small, high-quality task-specific benchmarks.
Supervised transfer learning to reduce overfitting and improve generalization.

Figure: Conceptual interaction-aware transfer-learning workflow for ProPepX. **Training objective.** Large-scale supervised interaction-aware pretraining followed by task-specific fine-tuning.
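The two-stage workflow described above can be illustrated with a deliberately small sketch. It is not the ProPepX training code: the logistic model, the synthetic data generator, the learning rates, and the names `generic_w` and `task_w` are all assumptions chosen only to show pretraining on a large generic set followed by fine-tuning on a small task-specific set.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_data(n, true_w, rng):
    """Synthetic residues: a feature vector and a 0/1 binding label."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in true_w]
        z = sum(w * xi for w, xi in zip(true_w, x))
        data.append((x, 1 if z > 0 else 0))
    return data

def bce(w, data):
    """Mean binary cross-entropy of a logistic model with weights w."""
    eps = 1e-9
    total = 0.0
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)

def train(w, data, lr, epochs):
    """Plain SGD on the logistic loss; returns a new weight list."""
    w = list(w)
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j, xj in enumerate(x):
                w[j] -= lr * (p - y) * xj
    return w

rng = random.Random(0)
generic_w = [1.0, -1.0, 0.5]   # hypothetical "generic recognition" rule
task_w = [1.0, -0.8, 0.7]      # hypothetical, related task-specific rule
pretrain_set = make_data(2000, generic_w, rng)  # Stage 1: large generic set
finetune_set = make_data(40, task_w, rng)       # Stage 2: small task set
test_set = make_data(500, task_w, rng)

w0 = [0.0, 0.0, 0.0]
pretrained = train(w0, pretrain_set, lr=0.1, epochs=3)
finetuned = train(pretrained, finetune_set, lr=0.02, epochs=5)  # starts from Stage 1
scratch = train(w0, finetune_set, lr=0.02, epochs=5)            # no pretraining

print("fine-tuned test loss:", bce(finetuned, test_set))
print("from-scratch test loss:", bce(scratch, test_set))
```

The only design point the sketch is meant to convey is that Stage 2 initializes from the Stage 1 weights and uses a smaller learning rate on the smaller dataset.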

Architecture

Partner-aware model design

ProPepX integrates pretrained protein language model embeddings, multi-scale motif extraction, transformer contextual encoding, bidirectional cross-attention, gated fusion, and residue-level classification.

PLM embeddings

Protein and peptide sequences are converted into contextual residue representations.

Intra-molecular encoders

Local sequence motifs and long-range dependencies are captured before partner exchange.
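As a rough illustration of multi-scale local feature extraction, the sketch below pools each residue's neighbourhood at several window widths and concatenates the results. The window sizes `(3, 5, 7)` and the use of mean pooling in place of learned convolution kernels are assumptions for illustration; the long-range transformer encoder is not shown.

```python
def window_mean(features, k):
    """Mean over a centred window of k residues, zero-padded at the ends."""
    n, d = len(features), len(features[0])
    half = k // 2
    out = []
    for i in range(n):
        acc = [0.0] * d
        for j in range(i - half, i + half + 1):
            if 0 <= j < n:  # positions outside the sequence contribute zero
                for c in range(d):
                    acc[c] += features[j][c]
        out.append([a / k for a in acc])
    return out

def multi_scale(features, kernel_sizes=(3, 5, 7)):
    """Concatenate per-residue features from every window size."""
    branches = [window_mean(features, k) for k in kernel_sizes]
    return [sum((b[i] for b in branches), []) for i in range(len(features))]

# Six residues with two-dimensional toy embeddings.
embeddings = [[1.0, 0.0] for _ in range(6)]
motif_features = multi_scale(embeddings)
print(len(motif_features), len(motif_features[0]))
```

Each residue keeps its position in the sequence, so the output can be passed to a downstream encoder without realignment.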

Cross-interface attention

Protein→peptide and peptide→protein attention learns residue–residue coupling.
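A minimal sketch of bidirectional cross-attention follows, assuming standard scaled dot-product attention with the learned query, key, and value projections omitted (identity projections). The toy vectors and the function name `cross_attention` are illustrative and are not taken from the ProPepX code.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys_values):
    """Queries come from one chain; keys and values come from the partner chain."""
    d = len(queries[0])
    weights, outputs = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys_values]
        w = softmax(scores)  # one weight per partner residue
        weights.append(w)
        outputs.append([sum(wi * kv[c] for wi, kv in zip(w, keys_values))
                        for c in range(d)])
    return outputs, weights

protein = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 residues
peptide = [[2.0, 0.0], [0.0, 2.0]]              # 2 residues

# Protein→peptide: each protein residue attends over peptide residues.
prot_ctx, prot_to_pep = cross_attention(protein, peptide)
# Peptide→protein: each peptide residue attends over protein residues.
pep_ctx, pep_to_prot = cross_attention(peptide, protein)
```

The two weight matrices have shapes (protein length × peptide length) and (peptide length × protein length); these are the kind of residue–residue maps that can later be inspected for interpretation.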

Gated fusion

Intrinsic self features and partner-aware cross features are adaptively combined.
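One common way to combine two feature streams adaptively is a sigmoid gate computed from their concatenation. The sketch below assumes that form, g = sigmoid(W[self; cross] + b) with output g·self + (1 − g)·cross; the exact gating used in ProPepX is not specified here, and `gate_w` and `gate_b` are hypothetical parameters.

```python
import math

def gated_fusion(self_feat, cross_feat, gate_w, gate_b):
    """Per-channel gate over [self; cross]; returns g*self + (1-g)*cross."""
    concat = self_feat + cross_feat
    fused = []
    for c in range(len(self_feat)):
        z = sum(w * x for w, x in zip(gate_w[c], concat)) + gate_b[c]
        g = 1.0 / (1.0 + math.exp(-z))  # gate value in (0, 1)
        fused.append(g * self_feat[c] + (1.0 - g) * cross_feat[c])
    return fused

self_feat = [1.0, 3.0]
cross_feat = [3.0, 1.0]
zero_w = [[0.0] * 4 for _ in range(2)]

# With zero weights and bias the gate is 0.5, so the output is the plain average.
balanced = gated_fusion(self_feat, cross_feat, zero_w, [0.0, 0.0])
# A large positive bias pushes the gate toward 1, favouring the self features.
self_dominant = gated_fusion(self_feat, cross_feat, zero_w, [50.0, 50.0])
```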

Residue classifier

Binding probabilities are predicted for both protein and peptide residues.
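The final step can be sketched as a linear layer followed by a sigmoid, applied independently to every residue of either chain. The weights, the bias, and the 0.5 decision threshold below are illustrative assumptions, not values from the paper.

```python
import math

def residue_probabilities(fused_features, weights, bias):
    """Map each residue's fused feature vector to a binding probability."""
    probs = []
    for feat in fused_features:
        z = sum(w * x for w, x in zip(weights, feat)) + bias
        probs.append(1.0 / (1.0 + math.exp(-z)))
    return probs

def call_binding_sites(probs, threshold=0.5):
    """Indices of residues whose probability reaches the threshold."""
    return [i for i, p in enumerate(probs) if p >= threshold]

fused = [[2.0, 0.0], [-2.0, 0.0], [0.0, 0.0]]
probs = residue_probabilities(fused, weights=[1.0, 0.0], bias=0.0)
sites = call_binding_sites(probs)
```

The same head can be run on protein residues and on peptide residues, giving one probability per residue on each side.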

ProPepX OVERVIEW

From Protein–Peptide Sequences to Interpretable Molecular Insight

Large-scale interaction-aware pretraining enables ProPepX to accurately identify peptide-binding residues on proteins and protein-binding residues on peptides, while transforming residue-level predictions into interpretable evidence through cross-interface attention, attribution analysis, and perturbation-based validation.

Benchmark evidence

State-of-the-Art Performance Across Interaction-Aware Transfer Learning and Zero-Shot Generalization

Performance is summarized using MCC and AUROC, two stringent metrics for residue-level binding-site prediction under severe class imbalance. The comparison panels report the strongest competing method and a recent baseline for each benchmark. On the protein-side, peptide-side, and zero-shot benchmarks, and on Test251–LEADS in joint mode, the ProtT5 variant of ProPepX exceeds the best competing method on every listed metric, whether that method is a structure-based predictor, a handcrafted sequence-feature model, or a modern PLM-based baseline. The exception is peptide-side prediction on Test167 in joint mode, where KGIPA scores higher on both MCC and AUROC.
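To make the two headline metrics concrete, here is a small self-contained sketch of MCC and AUROC using their standard definitions. The 20-residue toy example, with only two binding residues, is invented for illustration and shows why accuracy alone is misleading under class imbalance.

```python
import math

def mcc(y_true, y_pred):
    """Matthews correlation coefficient from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def auroc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 2 binding residues out of 20: a severely imbalanced toy chain.
y_true = [1, 1] + [0] * 18
all_negative = [0] * 20
accuracy = sum(t == p for t, p in zip(y_true, all_negative)) / len(y_true)
trivial_mcc = mcc(y_true, all_negative)  # predicting "no binding" everywhere

scores = [0.9, 0.4, 0.5] + [0.1] * 17
toy_auroc = auroc(y_true, scores)
```

Predicting "non-binding" for every residue reaches 90% accuracy on this toy chain but an MCC of 0, which is why MCC is the more informative threshold-dependent metric here.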

Task 01

Protein-side IATL

Peptide-binding residues on proteins

Protein-side interaction-aware transfer learning

Peptide-binding residue prediction on protein benchmarks

TS92

Metric	Best competing	Recent baseline	ProPepX ESM-3	ProPepX ProtT5
MCC	AF-Multimer 0.605	LMFFT# 0.386	0.595	0.609
AUC	PepNN-Struct 0.855	LMFFT# 0.844	0.895	0.919

TS125

Metric	Best competing	Recent baseline	ProPepX ESM-3	ProPepX ProtT5
MCC	AF-Multimer 0.576	LMFFT 0.450	0.546	0.583
AUC	PepNN-Struct 0.885	LMFFT# 0.862	0.919	0.925

TS251

Metric	Best competing	Recent baseline	ProPepX ESM-3	ProPepX ProtT5
MCC	PepNN-Struct 0.566	PepCA 0.340	0.627	0.677
AUC	PepNN-Struct 0.833	PepCA 0.794	0.894	0.909

TS639

Metric	Best competing	Recent baseline	ProPepX ESM-3	ProPepX ProtT5
MCC	AF-Multimer 0.450	LMFFT 0.363	0.517	0.558
AUC	PepNN-Struct 0.868	LMFFT# 0.829	0.902	0.907
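As a worked reading of the four protein-side tables, the snippet below computes the MCC margin of ProPepX ProtT5 over the best competing method on each benchmark, using only the values listed above. The dictionary layout and key names are illustrative.

```python
# MCC values copied from the TS92, TS125, TS251, and TS639 tables.
protein_side_mcc = {
    "TS92": {"best_competing": 0.605, "propepx_prott5": 0.609},
    "TS125": {"best_competing": 0.576, "propepx_prott5": 0.583},
    "TS251": {"best_competing": 0.566, "propepx_prott5": 0.677},
    "TS639": {"best_competing": 0.450, "propepx_prott5": 0.558},
}

# Margin = ProPepX ProtT5 MCC minus best competing MCC, rounded to 3 decimals.
margins = {
    name: round(v["propepx_prott5"] - v["best_competing"], 3)
    for name, v in protein_side_mcc.items()
}

for name, delta in margins.items():
    print(f"{name}: +{delta:.3f} MCC")
```

The margins are 0.004 on TS92 and 0.007 on TS125, against 0.111 on TS251 and 0.108 on TS639, so the gain over the best competing method is narrow on the first two benchmarks and much larger on the last two.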

Task 02

Peptide-side IATL

Protein-binding residues on peptides

Peptide-side interaction-aware transfer learning

Protein-binding residue prediction on peptide benchmark TS231

TS231

Metric	Best competing	Recent baseline	ProPepX ESM-3	ProPepX ProtT5
Avg. MCC	PepTrans 0.558	AlphaFold3 0.514	0.600	0.633
Avg. AUROC	CAMP 0.803	PepTrans 0.776	0.789	0.807

Task 03

Joint-mode learning

Simultaneous protein and peptide residue prediction

Joint protein–peptide learning

Simultaneous prediction on both molecular partners

Test167

Metric	Best competing	Recent baseline	ProPepX ESM-3	ProPepX ProtT5
Protein MCC	KGIPA 0.533	KGIPA 0.533	0.539	0.544
Protein AUROC	KGIPA 0.937	KGIPA 0.937	0.940	0.941
Peptide MCC	KGIPA 0.473	KGIPA 0.473	0.406	0.395
Peptide AUROC	KGIPA 0.845	KGIPA 0.845	0.804	0.802

Test251–LEADS

Metric	Best competing	Recent baseline	ProPepX ESM-3	ProPepX ProtT5
Protein MCC	AlphaFold3 0.550	KGIPA 0.540	0.581	0.610
Protein AUROC	KGIPA 0.924	KGIPA 0.924	0.958	0.964
Peptide MCC	IIDL-PepPI 0.350	KGIPA 0.339	0.497	0.502
Peptide AUROC	KGIPA 0.761	KGIPA 0.761	0.798	0.819

Task 04

Zero-shot generalization

Prediction without task-specific fine-tuning

Zero-shot evaluation

Generalization to unseen protein–peptide complexes on TS167

TS167

Metric	Best competing	Recent baseline	ProPepX ESM-3	ProPepX ProtT5
Protein MCC	IIDL-PepPI 0.537	IIDL-PepPI 0.537	0.581	0.605
Protein AUROC	PepCA 0.891	IIDL-PepPI 0.882	0.928	0.947
Peptide MCC	IIDL-PepPI 0.505	IIDL-PepPI 0.505	0.583	0.613
Peptide AUROC	IIDL-PepPI 0.849	IIDL-PepPI 0.849	0.879	0.871

Only MCC and AUROC are highlighted here to provide a concise overview of benchmark performance. Comprehensive evaluation using additional metrics, including F1 score, precision, recall, accuracy (ACC), specificity, and AUPR, is reported in the manuscript and supplementary materials, where ProPepX maintains strong and consistently competitive performance across all evaluation settings.
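For reference, the threshold-dependent metrics named above all derive from the four confusion-matrix counts. The helper below uses the standard definitions; the counts in the example are invented, and AUPR is left out because it needs ranked scores rather than counts.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Precision, recall, specificity, F1, and accuracy from raw counts."""
    def ratio(num, den):
        return num / den if den else 0.0  # define 0/0 as 0 for empty classes

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }

# Toy chain of 20 residues: 1 true positive, 1 false positive,
# 1 false negative, 17 true negatives.
metrics = confusion_metrics(tp=1, fp=1, fn=1, tn=17)
```

In this toy case accuracy is 0.9 while precision, recall, and F1 are all 0.5, which shows how the metrics can diverge when binding residues are rare.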

Figure: Bidirectional gated-weighted cross-interface attention and attribution perturbation analysis. **Mechanistic interpretation.** Bidirectional attention maps, attribution profiles, and perturbation curves show which partner residues support binding-site decisions.

Interpretability

From Residue-Level Predictions to Mechanistic Biological Insight

ProPepX transforms residue-level predictions into biologically meaningful insight through cross-interface attention maps, attribution analysis, and perturbation-based validation, revealing the residue–residue interactions that govern protein–peptide recognition.

Residue–residue cross-interface coupling maps
High-confidence attention hotspots and recognition motifs
Attribution profiles for protein and peptide binding sites
Perturbation-based validation against random deletion controls
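The idea behind perturbation-based validation against random deletion controls can be sketched with a toy additive model: delete the residues with the highest attribution, measure the drop in the model's score, and compare it with the drop from deleting the same number of randomly chosen residues. The scoring function, the attribution rule (weight times input), and the sizes `n` and `k` are assumptions for illustration, not the ProPepX procedure.

```python
import random

def binding_score(values, weights, keep):
    """Toy additive model: weighted residue signals summed over kept residues."""
    return sum(w * v for i, (w, v) in enumerate(zip(weights, values)) if i in keep)

def deletion_drop(values, weights, deleted):
    """Score lost when the given residue indices are removed."""
    everyone = set(range(len(values)))
    full = binding_score(values, weights, everyone)
    return full - binding_score(values, weights, everyone - set(deleted))

rng = random.Random(0)
n, k = 30, 5
values = [rng.random() for _ in range(n)]
weights = [rng.random() for _ in range(n)]
attribution = [w * v for w, v in zip(weights, values)]  # per-residue contribution

# Targeted deletion: remove the k residues with the highest attribution.
top_k = sorted(range(n), key=lambda i: attribution[i], reverse=True)[:k]
targeted = deletion_drop(values, weights, top_k)

# Control: remove k random residues, repeated 200 times.
random_drops = [deletion_drop(values, weights, rng.sample(range(n), k))
                for _ in range(200)]
mean_random = sum(random_drops) / len(random_drops)

print("targeted drop:", targeted)
print("mean random drop:", mean_random)
```

If the attributions are faithful, the targeted deletion should reduce the score more than the random controls do; in this additive toy model that holds by construction, whereas for a real network it is an empirical check.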

Resources

Open resources for ProPepX and interpretation

GitHub repository: Source code, training pipelines, benchmark datasets, and reproducible examples for ProPepX.
Hugging Face model weights: Pretrained and task-specific ProPepX checkpoints for protein-side, peptide-side, joint, and zero-shot prediction tasks.
Scientific visualizations: Visual resources illustrating the ProPepX architecture, transfer-learning workflow, and mechanistic interpretation framework.
Citation: BibTeX and manuscript title for academic referencing.

Citation

Cite ProPepX

Please cite the manuscript when using ProPepX, its model weights, figures, or code.

@article{naqvi2026propepx,
  title   = {ProPepX: A unified framework for interpretable bidirectional interaction-aware transfer learning in joint residue-level protein--peptide binding-site prediction},
  author  = {Naqvi, Syed Kumail Hussain and Sourav, Chandra and Chong, Kil To and Tayara, Hilal},
  journal = {Nature Machine Intelligence},
  year    = {2026},
  note    = {Manuscript under consideration}
}