Nature Machine Intelligence submission · Protein–peptide recognition

ProPepX

A unified framework for interpretable bidirectional interaction-aware transfer learning in joint residue-level protein–peptide binding-site prediction

ProPepX learns protein–peptide interfaces as reciprocal recognition events, using bidirectional cross-interface attention and gated fusion to produce partner-conditioned residue-level binding-site predictions.

These first authors contributed equally to this work.   •   * Corresponding authors.

Unified prediction. Simultaneously identifies peptide-binding residues on proteins and protein-binding residues on peptides within a shared interaction-aware architecture.
Interaction-aware transfer. Transfers molecular-recognition knowledge from large-scale protein–peptide interaction data to enable robust residue-level binding-site prediction across protein-side, peptide-side, joint, and zero-shot settings.
Mechanistic interpretation. Reveals the residue-level determinants of molecular recognition through cross-interface attention, attribution analysis, and perturbation-based validation.

Transfer-learning workflow

From generic binding data to task-specific interface prediction

The proposed workflow separates representation learning from task adaptation. Stage 1 learns general protein–peptide recognition patterns from large-scale labeled binding data. Stage 2 fine-tunes the pretrained model for protein-side or peptide-side residue-level prediction.

  • Pretraining on broad protein-binding and peptide-binding residue annotations.
  • Fine-tuning on small, high-quality task-specific benchmarks.
  • Supervised transfer learning to reduce overfitting and improve generalization.
Conceptual interaction-aware transfer-learning workflow for ProPepX
Training objective. Large-scale supervised interaction-aware pretraining followed by task-specific fine-tuning.
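
The two-stage idea can be illustrated with a deliberately small sketch. It assumes a logistic-regression stand-in for the model and synthetic data; the function names, learning rates, step counts, and data generator are all hypothetical and are not the ProPepX training setup. The point it shows is only the ordering: train on a large generic set first, then continue from those weights on a small task set with a gentler learning rate.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, data):
    """Mean logistic loss, written as log(1 + exp(-z)) for y=1 and log(1 + exp(z)) for y=0."""
    total = 0.0
    for x, y in data:
        z = sum(wi * xi for wi, xi in zip(w, x))
        total += math.log1p(math.exp(-z if y == 1 else z))
    return total / len(data)

def train(w, data, lr, steps):
    """Full-batch gradient descent on the logistic loss; returns a new weight list."""
    w = list(w)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / len(data)
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w

def make_data(rng, n, true_w):
    """Synthetic (features, label) pairs; the last feature is a constant bias term."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1), 1.0]
        y = 1 if sum(a * b for a, b in zip(true_w, x)) > 0 else 0
        data.append((x, y))
    return data

rng = random.Random(0)
generic = make_data(rng, 500, [2.0, -1.0, 0.0])  # stand-in for large-scale binding data
task = make_data(rng, 30, [2.0, -0.5, 0.2])      # stand-in for a small, related benchmark

# Stage 1: pretraining from scratch on the generic set.
w_pre = train([0.0, 0.0, 0.0], generic, lr=0.5, steps=200)
# Stage 2: fine-tuning starts from the pretrained weights, with a smaller learning rate.
w_fine = train(w_pre, task, lr=0.1, steps=50)
```

Comparing `log_loss(w_pre, task)` with `log_loss(w_fine, task)` shows how much the task-specific stage adds on top of the pretrained starting point.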

Architecture

Partner-aware model design

ProPepX integrates pretrained protein language model embeddings, multi-scale motif extraction, transformer contextual encoding, bidirectional cross-attention, gated fusion, and residue-level classification.

01

PLM embeddings

Protein and peptide sequences are converted into contextual residue representations.

02

Intra-molecular encoders

Local sequence motifs and long-range dependencies are captured before partner exchange.

03

Cross-interface attention

Protein→peptide and peptide→protein attention learns residue–residue coupling.
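
As a minimal sketch of the two attention directions, the example below assumes plain scaled dot-product attention with no learned projections and with keys reused as values. The page does not state the exact formulation used in ProPepX, so the function names and toy embeddings are illustrative only.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys):
    """Scaled dot-product attention of `queries` over `keys` (keys double as values).

    Returns (context, weights): weights[i][j] is how strongly query residue i
    attends to key residue j, and context[i] is the weighted mean of key vectors.
    """
    d = len(queries[0])
    weights, context = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        context.append([sum(w[j] * keys[j][c] for j in range(len(keys)))
                        for c in range(d)])
    return context, weights

# Hypothetical toy embeddings: 3 protein residues, 2 peptide residues, dimension 2.
protein = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
peptide = [[2.0, 0.0], [0.0, 2.0]]

prot_ctx, prot_to_pep = cross_attention(protein, peptide)  # protein -> peptide
pep_ctx, pep_to_prot = cross_attention(peptide, protein)   # peptide -> protein
```

Each row of `prot_to_pep` is a distribution over peptide residues, and each row of `pep_to_prot` is a distribution over protein residues, which is the kind of matrix a residue–residue coupling map would visualize.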

04

Gated fusion

Intrinsic self features and partner-aware cross features are adaptively combined.
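
One common way to combine two feature streams adaptively is an element-wise sigmoid gate. The sketch below assumes that form; the actual ProPepX gate may differ, and `w_self`, `w_cross`, and `bias` are hypothetical parameters that would normally be learned.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(self_feat, cross_feat, w_self, w_cross, bias):
    """Fuse one residue's self and cross features with an element-wise gate.

    gate[c]  = sigmoid(w_self[c] * self[c] + w_cross[c] * cross[c] + bias[c])
    fused[c] = gate[c] * self[c] + (1 - gate[c]) * cross[c]
    """
    fused, gates = [], []
    for s, x, ws, wc, b in zip(self_feat, cross_feat, w_self, w_cross, bias):
        g = sigmoid(ws * s + wc * x + b)
        gates.append(g)
        fused.append(g * s + (1.0 - g) * x)
    return fused, gates

self_feat = [1.0, -2.0, 0.5]   # intrinsic features of one residue (toy values)
cross_feat = [3.0, 2.0, 0.5]   # partner-aware features of the same residue
zeros = [0.0, 0.0, 0.0]

# With all gate parameters at zero the gate is 0.5 everywhere,
# so the fused vector is the plain average of the two streams.
fused, gates = gated_fusion(self_feat, cross_feat, zeros, zeros, zeros)

# A large positive bias pushes the gate toward 1, so the self features dominate.
fused_self, gates_self = gated_fusion(self_feat, cross_feat, zeros, zeros, [50.0] * 3)
```

A gate near 1 keeps the residue's own representation, while a gate near 0 defers to the partner-conditioned representation.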

05

Residue classifier

Binding probabilities are predicted for both protein and peptide residues.

ProPepX OVERVIEW

From Protein–Peptide Sequences to Interpretable Molecular Insight

Large-scale interaction-aware pretraining enables ProPepX to accurately identify peptide-binding residues on proteins and protein-binding residues on peptides, while transforming residue-level predictions into interpretable evidence through cross-interface attention, attribution analysis, and perturbation-based validation.

Benchmark evidence

State-of-the-Art Performance Across Interaction-Aware Transfer Learning and Zero-Shot Generalization

Performance is summarized using MCC and AUROC, two stringent metrics for residue-level binding-site prediction under severe class imbalance. The comparison panels present results for the strongest competing methods and recent baseline approaches. Across protein-side, peptide-side, joint prediction, and zero-shot settings, ProPepX consistently achieves state-of-the-art performance relative to structure-based predictors, handcrafted sequence-feature models, and modern PLM-based baselines.
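
For readers less familiar with these metrics, the following self-contained sketch computes both from their standard definitions on a small imbalanced toy example. The labels, scores, and 0.5 threshold are made up for illustration and are not ProPepX results.

```python
import math

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0.0 if undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def auroc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 2 binding residues out of 10 (hypothetical labels and scores).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.1, 0.3, 0.05, 0.6, 0.2, 0.1, 0.7, 0.4]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

model_mcc = mcc(y_true, y_pred)        # 0.375
model_auroc = auroc(y_true, scores)    # 0.9375
trivial_mcc = mcc(y_true, [0] * 10)    # 0.0 for an all-negative predictor
```

Both the toy predictor and the all-negative predictor reach 80% accuracy here, yet only the former has a non-zero MCC, which is why accuracy alone is a weak summary under class imbalance.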

Task 01

Protein-side IATL

Peptide-binding residues on proteins

Highlighted metric: best MCC

Task 02

Peptide-side IATL

Protein-binding residues on peptides

Highlighted metric: best average MCC

Task 03

Joint-mode learning

Simultaneous protein and peptide residue prediction

Highlighted metric: best AUROC

Task 04

Zero-shot generalization

Prediction without task-specific fine-tuning

Highlighted metric: protein-side AUROC

Only MCC and AUROC are highlighted here to provide a concise overview of benchmark performance. Comprehensive evaluation using additional metrics, including F1 score, precision, recall, accuracy (ACC), specificity, and AUPR, is reported in the manuscript and supplementary materials, where ProPepX maintains strong and consistently competitive performance across all evaluation settings.

Bidirectional gated-weighted cross-interface attention and attribution perturbation analysis
Mechanistic interpretation. Bidirectional attention maps, attribution profiles, and perturbation curves show which partner residues support binding-site decisions.

Interpretability

From Residue-Level Predictions to Mechanistic Biological Insight

ProPepX transforms residue-level predictions into biologically meaningful insight through cross-interface attention maps, attribution analysis, and perturbation-based validation, revealing the residue–residue interactions that govern protein–peptide recognition.

  • Residue–residue cross-interface coupling maps
  • High-confidence attention hotspots and recognition motifs
  • Attribution profiles for protein and peptide binding sites
  • Perturbation-based validation against random deletion controls
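
The logic of a deletion test against random controls can be sketched as follows. This is an illustrative stand-in under stated assumptions: `toy_model` is a hypothetical linear scorer rather than ProPepX, the attributions are simply set equal to its weights, and "deletion" is modelled as zeroing a residue's feature.

```python
import random

def prediction_drop(model, features, masked_idx):
    """Drop in model output after zeroing the residues listed in `masked_idx`."""
    masked = [0.0 if i in masked_idx else f for i, f in enumerate(features)]
    return model(features) - model(masked)

def deletion_test(model, features, attributions, k, n_random=200, seed=0):
    """Compare deleting the top-k attributed residues with deleting k random ones.

    Returns (targeted_drop, mean_random_drop).
    """
    rng = random.Random(seed)
    order = sorted(range(len(features)), key=lambda i: attributions[i], reverse=True)
    targeted = prediction_drop(model, features, set(order[:k]))
    random_drops = [
        prediction_drop(model, features, set(rng.sample(range(len(features)), k)))
        for _ in range(n_random)
    ]
    return targeted, sum(random_drops) / n_random

# Hypothetical stand-in for a trained predictor: a fixed linear scorer.
weights = [0.05, 0.9, 0.1, 0.8, 0.02, 0.03]

def toy_model(x):
    return sum(w * v for w, v in zip(weights, x))

features = [1.0] * 6
attributions = list(weights)  # assumption: attributions perfectly track importance

targeted, random_mean = deletion_test(toy_model, features, attributions, k=2)
```

If the attributions are faithful, removing the top-ranked residues should reduce the prediction more than removing the same number of randomly chosen residues, which is what a comparison of `targeted` with `random_mean` checks.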

Resources

Open resources for ProPepX and interpretation

Citation

Cite ProPepX

Please cite the manuscript when using ProPepX, its model weights, figures, or code.

@article{naqvi2026propepx,
  title   = {ProPepX: A unified framework for interpretable bidirectional interaction-aware transfer learning in joint residue-level protein--peptide binding-site prediction},
  author  = {Naqvi, Syed Kumail Hussain and Sourav, Chandra and Chong, Kil To and Tayara, Hilal},
  journal = {Nature Machine Intelligence},
  year    = {2026},
  note    = {Manuscript under consideration}
}