Protein & Interactomic Graph Library
This package provides functionality for producing geometric representations of protein and RNA structures, and biological interaction networks. We provide compatibility with standard PyData formats, as well as graph objects designed for ease of use with popular deep learning libraries.
Graphein provides both a programmatic API and a command-line interface for constructing graphs.
Graphein configs can be specified as .yaml files to batch-process graphs from the command line:
    graphein -c config.yaml -p path/to/pdbs -o path/to/output
    from graphein.protein.config import ProteinGraphConfig
    from graphein.protein.graphs import construct_graph

    # Build a protein graph from a PDB code using the default config
    config = ProteinGraphConfig()
    g = construct_graph(config=config, pdb_code="3eiy")
    from graphein.protein.config import ProteinGraphConfig
    from graphein.protein.graphs import construct_graph
    from graphein.protein.utils import download_alphafold_structure

    # Download an AlphaFold structure by UniProt ID and build a graph from the file
    config = ProteinGraphConfig()
    fp = download_alphafold_structure("Q5VSL9", aligned_score=False)
    g = construct_graph(config=config, path=fp)
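The `path=` form of `construct_graph` shown above also lends itself to processing a whole directory of structure files from Python, loosely mirroring the command-line batch mode. The following is a minimal sketch, not part of Graphein itself: `build_graphs_from_dir` and its `build_fn` hook are hypothetical names introduced for illustration, and the Graphein call appears only in the commented usage so that the helper runs without Graphein installed.

```python
from pathlib import Path
from typing import Any, Callable, Dict


def build_graphs_from_dir(
    pdb_dir: str,
    build_fn: Callable[[str], Any],
    pattern: str = "*.pdb",
) -> Dict[str, Any]:
    """Apply build_fn to every file in pdb_dir that matches pattern.

    build_fn is a hypothetical hook: any callable that takes a file path
    and returns a graph object. Results are keyed by file stem.
    """
    graphs: Dict[str, Any] = {}
    for pdb_path in sorted(Path(pdb_dir).glob(pattern)):
        graphs[pdb_path.stem] = build_fn(str(pdb_path))
    return graphs


# Assumed usage with Graphein (requires graphein to be installed):
#
#   from graphein.protein.config import ProteinGraphConfig
#   from graphein.protein.graphs import construct_graph
#
#   config = ProteinGraphConfig()
#   graphs = build_graphs_from_dir(
#       "path/to/pdbs",
#       lambda p: construct_graph(config=config, path=p),
#   )
```

Passing the builder in as a callable keeps the directory walk independent of any particular config, so the same helper could be reused with a differently configured `construct_graph` call.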
    from graphein.protein.config import ProteinMeshConfig
    from graphein.protein.meshes import create_mesh

    # Create a protein surface mesh from a PDB code using the default mesh config
    config = ProteinMeshConfig()
    verts, faces, aux = create_mesh(pdb_code="3eiy", config=config)

Graphein can create molecular graphs from SMILES strings as well as .sdf, .mol2, and .pdb files:
    from graphein.molecule.config import MoleculeGraphConfig
    from graphein.molecule.graphs import construct_graph

    # Build a molecular graph from a SMILES string (aspirin)
    config = MoleculeGraphConfig()
    g = construct_graph(smiles="CC(=O)OC1=CC=CC=C1C(=O)O", config=config)
    from graphein.rna.graphs import construct_rna_graph

    # Build the graph from a dotbracket & optional sequence
    rna = construct_rna_graph(dotbracket='..(((((..(((...)))..)))))...',
                              sequence='UUGGAGUACACAACCUGUACACUCUUUC')
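For readers unfamiliar with dot-bracket notation: each `(` is paired with its matching `)`, and `.` marks an unpaired base. The sketch below illustrates the notation only and is not Graphein's implementation. The helper `dotbracket_to_pairs` is a hypothetical name, and it assumes a plain `(`, `)`, `.` alphabet (no pseudoknot symbols).

```python
from typing import List, Tuple


def dotbracket_to_pairs(dotbracket: str) -> List[Tuple[int, int]]:
    """Return the 0-indexed (i, j) base pairs encoded by a dot-bracket string."""
    stack: List[int] = []  # positions of '(' still waiting for a partner
    pairs: List[Tuple[int, int]] = []
    for i, symbol in enumerate(dotbracket):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"Unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif symbol != ".":
            raise ValueError(f"Unsupported symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"Unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


dotbracket = "..(((((..(((...)))..)))))..."
sequence = "UUGGAGUACACAACCUGUACACUCUUUC"
assert len(dotbracket) == len(sequence)

# Print each base pair as <nucleotide><index> - <nucleotide><index>
pairs = dotbracket_to_pairs(dotbracket)
for i, j in pairs:
    print(f"{sequence[i]}{i} - {sequence[j]}{j}")
```

Base pairs recovered this way correspond to the non-backbone connections implied by the secondary structure; how Graphein represents them as edges is best checked against its documentation.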
    from graphein.ppi.config import PPIGraphConfig
    from graphein.ppi.graphs import compute_ppi_graph
    from graphein.ppi.edges import add_string_edges, add_biogrid_edges

    # Build a protein-protein interaction graph with edges from STRING and BioGRID
    config = PPIGraphConfig()
    protein_list = ["CDC42", "CDK1", "KIF23", "PLK1", "RAC2", "RACGAP1", "RHOA", "RHOB"]

    g = compute_ppi_graph(config=config,
                          protein_list=protein_list,
                          edge_construction_funcs=[add_string_edges, add_biogrid_edges]
                          )
    from functools import partial

    from graphein.grn.config import GRNGraphConfig
    from graphein.grn.graphs import compute_grn_graph
    from graphein.grn.edges import add_regnetwork_edges, add_trrust_edges

    # Build a gene regulatory network graph with edges from TRRUST and RegNetwork
    config = GRNGraphConfig()
    gene_list = ["AATF", "MYC", "USF1", "SP1", "TP53", "DUSP1"]

    g = compute_grn_graph(
        gene_list=gene_list,
        edge_construction_funcs=[
            partial(add_trrust_edges, trrust_filtering_funcs=config.trrust_config.filtering_functions),
            partial(add_regnetwork_edges, regnetwork_filtering_funcs=config.regnetwork_config.filtering_functions),
        ],
    )

The simplest install is via pip. N.B. this does not install the ML/DL libraries, which are required for conversion to their data formats and for generating protein structure meshes with PyTorch3D. Further details are available in the installation documentation.
    pip install graphein          # For base install
    pip install graphein[extras]  # For additional featurisation dependencies
    pip install graphein[dev]     # For dev dependencies
    pip install graphein[all]     # To get the lot

However, there are a number of (optional) utilities (DSSP, PyMol, GetContacts) that are not available via PyPI:
    conda install -c salilab dssp       # Required for computing secondary structural features
    conda install -c schrodinger pymol  # Required for PyMol visualisations & mesh generation

    # GetContacts - used as an alternative way to compute intramolecular interactions
    conda install -c conda-forge vmd-python
    git clone https://github.com/getcontacts/getcontacts

    # Add folder to PATH
    echo "export PATH=\$PATH:`pwd`/getcontacts" >> ~/.bashrc
    source ~/.bashrc

To test the installation, run:

    cd getcontacts/example/5xnd
    get_dynamic_contacts.py --topology 5xnd_topology.pdb \
                            --trajectory 5xnd_trajectory.dcd \
                            --itypes hb \
                            --output 5xnd_hbonds.tsv

The dev environment includes GPU builds (CUDA 11.1) for each of the deep learning libraries integrated into Graphein.
    git clone https://www.github.com/a-r-j/graphein
    cd graphein
    conda env create -f environment-dev.yml
    pip install -e .

A lighter install can be performed with:
    git clone https://www.github.com/a-r-j/graphein
    cd graphein
    conda env create -f environment.yml
    pip install -e .

We provide two docker-compose files for local use: one for CPU (docker-compose.cpu.yml) and one for GPU (docker-compose.yml). For GPU usage, please ensure that you have the NVIDIA Container Toolkit installed. After entering the container, install the locally mounted volume (pip install -e .). This will also set up the dev environment locally.
To build (GPU) run:
    docker-compose up -d --build  # start the container
    docker-compose down           # stop the container

Please consider citing Graphein if it proves useful in your work.
    @inproceedings{jamasb2022graphein,
        title={Graphein - a Python Library for Geometric Deep Learning and Network Analysis on Biomolecular Structures and Interaction Networks},
        author={Arian Rokkum Jamasb and Ramon Vi{\~n}as Torn{\'e} and Eric J Ma and Yuanqi Du and Charles Harris and Kexin Huang and Dominic Hall and Pietro Lio and Tom Leon Blundell},
        booktitle={Advances in Neural Information Processing Systems},
        editor={Alice H. Oh and Alekh Agarwal and Danielle Belgrave and Kyunghyun Cho},
        year={2022},
        url={https://openreview.net/forum?id=9xRZlV6GfOX}
    }