Skip to content

tsudalab/HDXRank

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

45 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

HDXRank DOI

HDXRank is an deep learning pipeline that applies HDX-MS (Hydrogen-Deuterium Exchange Mass Spectrometry) restraints to rank protein-protein complex predictions.

Overview

HDXRank addresses the challenge of selecting accurate protein complex models by integrating experimental HDX-MS data with graph-based deep learning. The method uses HDX restraints to evaluate how well predicted complex structures align with experimental binding interface data, providing a robust framework for complex model ranking with improved prediction accuracy.

Key Features

  • HDX-MS data integration for experimental restraints
  • Support for multiple input sources (docking predictions, AlphaFold models)
  • Flexible and extensible framework for incorporating new experimental data

Installation

HDXRank requires Python with CUDA 11.8 support. We provide both Docker and Conda installation options.

Prerequisites

  • Docker (recommended) or Conda
  • CUDA 11.8 compatible GPU (for model training/prediction)

Quick Start with Docker (Recommended)

  1. Clone the repository:
git clone https://github.com/SuperChrisW/HDXRank.git cd HDXRank
  1. Run with Docker:
docker pull superchrisw/hdxrank:latest docker run -it --rm -v $(pwd):/job/code superchrisw/hdxrank:latest /bin/bash cd /job/code python main.py --help

Alternative: Conda Installation

chmod +x ./install.sh ./install.sh conda activate HDXRank python main.py --help

Required Input Files

HDXRank requires four main types of input files:

  1. Protein Structure Files (.pdb) - Complex structure predictions to be ranked + apo structures
  2. Multiple Sequence Alignments (.hhm) - Generated using HHblits against UniRef30
  3. HDX-MS Data (.xlsx) - Experimental HDX data with specific column format
  4. Configuration File (.yaml) - Pipeline settings and parameters

Preparing MSA Files

HDXRank requires .hhm format multiple sequence alignments generated using HHblits:

Install HHblits

conda create -n hhblits -y conda activate hhblits conda install hhsuite -c conda-forge -c bioconda -y

Download UniRef30 Database

mkdir -p databases cd databases wget http://wwwuser.gwdg.de/~compbiol/uniclust/2020_06/UniRef30_2020_06_hhsuite.tar.gz tar -xvfz UniRef30_2020_06_hhsuite.tar.gz rm UniRef30_2020_06_hhsuite.tar.gz cd ..

Generate .hhm Files

bash ./scripts/hhblits.sh

This processes all .fasta files in /HDXRank/fasta_files/ and saves .hhm files to /HDXRank/hhm_files/

HDX-MS Data Format

Your Excel file should contain the following columns:

  • protein - Protein identifier
  • state - Experimental state (apo/complex)
  • start - Peptide start position
  • end - Peptide end position
  • sequence - Peptide sequence
  • log_t - Log exchange time
  • RFU - Relative fractional uptake

Usage

Configuration Setup

HDXRank uses YAML configuration files to define all pipeline parameters. See configs/config.template.yaml for a complete template.

Key Configuration Sections:

GeneralParameters: File paths and execution mode

TaskParameters: Control protein embedding and graph construction

PredictionParameters: Model prediction settings

ScorerParameters: Scoring and ranking settings

Running HDXRank

Basic Usage

python main.py --config path/to/config.yaml

Output Files

Results are saved to the specified output directory:

  • HDX_scores.csv - Ranked structures with HDXRank scores
  • predictions/ - Raw RFU predictions for each structure
  • results/scores/ - Detailed scoring analysis and plots

Example Data

Download example datasets and configurations:

# HDX-MS dataset for training/validation wget -O dataset.zip https://zenodo.org/records/15426072/files/dataset.zip?download=1 unzip dataset.zip # Example structures and configurations wget -O example.zip https://zenodo.org/records/15426072/files/example.zip?download=1 unzip example.zip rm dataset.zip example.zip

Users can repeat rigid docking by using HDock program in prog.tar.gz.

Model Training

Preparing Training Data

  1. Add new HDX-MS files to dataset/HDX_files/
  2. Update the dataset record in dataset/250110_HDXRank_dataset.xlsx
  3. Generate embeddings and graphs:
    python main.py --config ./configs/config_retrain_HDXRank.yaml

Training the Model

python ./hdxrank/HDXRank_train.py --config ./configs/config_retrain_HDXRank.yaml

Citation

If you use HDXRank in your research, please cite:

@article{Wang2025HDXRank, author = {Liyao Wang and Andrejs Tucš and Songting Ding and Koji Tsuda and Adnan Sljoka}, title = {HDXRank: A Deep Learning Framework for Ranking Protein Complex Predictions With Hydrogen–Deuterium Exchange Data}, journal = {Journal of Chemical Theory and Computation}, year = {2025}, volume = {21}, number = {14}, pages = {7173--7187}, doi = {10.1021/acs.jctc.5c00175} }

Support

For questions, bug reports, or feature requests, please open an issue on GitHub

About

applies HDX-MS restraints to rank protein-protein complex predictions

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published