HydRA: Deep-learning models for predicting RNA-binding capacity from protein interaction association context and protein sequence

Wenhao Jin, Kristopher W. Brannan, Katannya Kapeli, Samuel S. Park, Hui Qing Tan, Maya L. Gosztyla, Mayuresh Mujumdar, Joshua Ahdout, Bryce Henroid, Katherine Rothamel, Joy S. Xiang, Limsoon Wong, Gene W. Yeo

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

RNA-binding proteins (RBPs) control RNA metabolism to orchestrate gene expression and, when dysfunctional, underlie human diseases. Proteome-wide discovery efforts predict thousands of RBP candidates, many of which lack canonical RNA-binding domains (RBDs). Here, we present a hybrid ensemble RBP classifier (HydRA), which leverages information from both intermolecular protein interactions and internal protein sequence patterns to predict RNA-binding capacity with unparalleled specificity and sensitivity using support vector machines (SVMs), convolutional neural networks (CNNs), and Transformer-based protein language models. Occlusion mapping by HydRA robustly detects known RBDs and predicts hundreds of uncharacterized RNA-binding associated domains. Enhanced CLIP (eCLIP) for HydRA-predicted RBP candidates reveals transcriptome-wide RNA targets and confirms RNA-binding activity for HydRA-predicted RNA-binding associated domains. HydRA accelerates construction of a comprehensive RBP catalog and expands the diversity of RNA-binding associated domains.

Original language: English (US)
Pages (from-to): 2595-2611.e11
Journal: Molecular Cell
Volume: 83
Issue number: 14
DOIs
State: Published - Jul 20 2023

Keywords

  • machine learning
  • protein-protein interaction network
  • RNA-binding domains
  • RNA-binding proteins
  • RNA/metabolism
  • Animals
  • Humans
  • Protein Binding
  • Binding Sites/genetics
  • Deep Learning
  • Hydra/genetics

ASJC Scopus subject areas

  • Molecular Biology
  • Cell Biology
