Harnessing the 3D-Beacons Network: A Comprehensive Guide to Accessing and Displaying Protein Structure Data
Paulyna Magaña, Sreenath Nair, Mihaly Varadi, Sameer Velankar
3D-Beacons
FAIR data access
federated data network
macromolecular structures
programmatic access
structural bioinformatics
Abstract
Recent advancements in protein structure determination and especially in protein structure prediction techniques have led to the availability of vast amounts of macromolecular structures. However, the accessibility and integration of these structures into scientific workflows are hindered by the lack of standardization among publicly available data resources. To address this issue, we introduced the 3D-Beacons Network, a unified platform that aims to establish a standardized framework for accessing and displaying protein structure data. In this article, we highlight the importance of standardized approaches for accessing protein structure data and showcase the capabilities of 3D-Beacons. We describe four protocols for finding and accessing macromolecular structures from various specialist data resources via 3D-Beacons. First, we describe three scenarios for programmatically accessing and retrieving data using the 3D-Beacons API. Next, we show how to perform sequence-based searches to find structures from model providers. Then, we demonstrate how to search for structures and fetch them directly into a workflow using JalView. Finally, we outline the process of facilitating access to data from providers interested in contributing their structures to the 3D-Beacons Network. © 2024 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol 1: Programmatic access to the 3D-Beacons API
Basic Protocol 2: Sequence-based search using the 3D-Beacons API
Basic Protocol 3: Accessing macromolecules from 3D-Beacons with JalView
Basic Protocol 4: Enhancing data accessibility through 3D-Beacons
INTRODUCTION
Emerging from the confluence of breakthroughs in experimental methods, such as cryo-electron microscopy and the dawn of AI-based structure prediction tools exhibiting unparalleled accuracy, the sources of macromolecular structures have proliferated (Lin et al., 2023; Varadi, Anyango, et al., 2022). This growth has resulted in a dynamic landscape of available information, leading to a need for common data standards and a mechanism for unified access to the multitude of molecular structure resources.
Publicly available data resources for protein structures often provide diverse data access mechanisms, hindering seamless access and integration of the data into scientific workflows. Each resource follows its own rules for storing and presenting protein structures, resulting in a fragmented landscape that poses challenges for researchers. This fragmentation becomes particularly apparent when dealing with different types of structures, ranging from experimentally determined to computationally predicted models. For example, whereas the Protein Data Bank (Velankar et al., 2021) contains over 200,000 entries, many of which are macromolecular assemblies, the AlphaFold Protein Structure Database (AlphaFold DB) (Varadi, Anyango, et al., 2022) contains over 214 million predictions for single polypeptide chains. AlphaFill (Hekkelman et al., 2021), in turn, has expanded predicted structures by adding known ligands. Other, more specialized data resources, like isoform.io (Sommer et al., 2022) and the ABC family transporter dataset of the HegeLab (Tordai et al., 2022), provide smaller but functionally important model datasets. Additionally, the Small-Angle Scattering Biological Data Bank (Kikhney et al., 2020) and the Protein Ensemble Database (PED) (Ghafouri et al., 2024) contribute low-resolution structural envelopes and highly diverse conformational ensembles, which further adds to the complexity of effectively accessing and utilizing protein structure information.
The existence of multiple, non-standardized approaches to accessing protein structure data slows the pace of scientific advancement. With the sudden influx of hundreds of millions of new macromolecular models, it became crucial to establish a standardized framework encompassing various protein structures and providing a unified interface for their retrieval. Such a standardized approach would streamline data access, enable efficient data integration into scientific workflows, and foster collaboration across research communities. The genomics data domain already has infrastructure to tackle this problem, the ELIXIR Beacon network (Rambla et al., 2022), which not only allows FAIR (Findable, Accessible, Interoperable, and Reusable) data access but also addresses data confidentiality while handling sensitive variants data. Based loosely on the same concepts, the 3D-Beacons Network (Varadi, Nair, et al., 2022) established an open collaboration among providers of macromolecular structure models to present model coordinates and meta-information in a standardized data format from all participating data resources on a unified platform.
By highlighting the importance of a standardized approach and showcasing the capabilities of 3D-Beacons, we aim to promote the adoption of unified data access mechanisms in structural biology to improve the findability and accessibility of structure data, making it FAIRer (Wilkinson et al., 2016). Establishing a standardized framework for accessing protein structure data will enhance scientific collaborations and accelerate discoveries in areas such as protein function elucidation, drug design, and understanding of the underlying mechanisms of complex biological processes.
Through the following protocols, we present 3D-Beacons as a solution to the challenges posed by the disparate nature of publicly available protein structure datasets. By leveraging the power of 3D-Beacons, researchers gain access to a standardized and comprehensive platform encompassing a wide range of structure types, from experimentally determined to predicted models. Importantly, the network provides access not only to the model files but also to essential metadata, such as confidence metrics. Furthermore, integrating data from diverse resources, including AlphaFold DB, Protein Data Bank in Europe (PDBe) (Armstrong et al., 2020), SWISS-MODEL (Waterhouse et al., 2018), PED, and other data resources, ensures that researchers have a holistic view of protein structures and can make informed decisions in their investigations. Basic Protocol 1 shows how to access and retrieve data and summaries using the 3D-Beacons API programmatically. Basic Protocol 2 demonstrates how to find macromolecular structures using the sequence search functionality of 3D-Beacons. Basic Protocol 3 describes how to search for structures and fetch them straight into a workflow using JalView. Finally, Basic Protocol 4 highlights how to facilitate access to data from providers interested in making their structures available through the 3D-Beacons Network. Extensive documentation is available online on the 3D-Beacons repository (https://github.com/3D-Beacons). In addition, we offer a complementary resource in the form of a notebook (https://colab.research.google.com/github/3D-Beacons/3D-Beacons/blob/main/Tutorials/Harnessing_3DBeaconsAPI.ipynb) with Python scripts to navigate 3D-Beacons, featuring a similar structure and protocols as the current article.
Basic Protocol 1: PROGRAMMATIC ACCESS TO THE 3D-BEACONS API
This protocol introduces the basics of using the 3D-Beacons Hub API. The 3D-Beacons platform (https://3d-beacons.org) offers programmatic access through its REST API, enabling users to retrieve individual entries and perform database searches. The comprehensive documentation of the 3D-Beacons Hub API, available at https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/#/, follows the OpenAPI 3 specification and is presented in a Swagger representation. This documentation is a valuable resource, providing detailed information and guidelines for utilizing the 3D-Beacons API effectively, facilitating seamless integration and exploration of spatial biological data. For more information and access to the sample code, a notebook is available at https://colab.research.google.com/github/3D-Beacons/3D-Beacons/blob/main/Tutorials/Harnessing_3DBeaconsAPI.ipynb.
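As a minimal sketch of how the summary endpoint used throughout this protocol is addressed, the snippet below builds the request URL for a UniProt accession and, optionally, fetches and decodes the response using only the Python standard library. The helper names `summary_url` and `fetch_summary` are illustrative and not part of 3D-Beacons; the base URL and the `.json` suffix are taken from the sample code in this protocol.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base URL as used in the sample code of this protocol.
API_BASE = "https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api"


def summary_url(accession, base=API_BASE):
    """Return the summary URL for a UniProt accession (illustrative helper)."""
    accession = accession.strip().upper()
    if not accession:
        raise ValueError("accession must not be empty")
    return f"{base}/uniprot/summary/{quote(accession)}.json"


def fetch_summary(accession, timeout=30):
    """Download and decode the summary JSON; requires network access."""
    with urlopen(summary_url(accession), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URL; call fetch_summary("P04637") to contact the server.
    print(summary_url("P04637"))
```

Decoding the whole response with `json` is the simplest option for a single accession; the streaming `ijson` approach used in the steps below avoids holding the raw response text in memory.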
Necessary Resources
Hardware
A computer capable of running Python code and with a stable Internet connection
Software
- Python 3 (https://www.python.org/downloads/) installed
- Pip (Python package installer)
1. Open a terminal and install the necessary Python libraries:
- pip install ijson wget
2. To get all macromolecular structures for a single entity, create a new Python file, add the following sample code, and save the file.
import ijson
from urllib.request import urlopen

# UniProt accession of the protein of interest
Uniprot_ID = "P04637"
WEBSITE_API = "https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/uniprot/summary/"

# Stream the JSON response and collect every entry of the "structures" array
r = ijson.parse(urlopen(f"{WEBSITE_API}{Uniprot_ID}.json"))
structures = list(ijson.items(r, "structures.item", use_float=True))
for structure in structures:
    print(structure)
3. To filter the results by model category, create a new Python file, copy the sample code below, and save the file.
import ijson
from urllib.request import urlopen

WEBSITE_API = "https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/uniprot/summary/"
Uniprot_ID = "P04637"
# Model category to keep
model = "TEMPLATE-BASED"

r = ijson.parse(urlopen(f"{WEBSITE_API}{Uniprot_ID}.json"))
structures = list(ijson.items(r, "structures.item", use_float=True))
for structure in structures:
    model_category = structure.get("summary", {}).get("model_category")
    # Print only the entries whose category matches the filter
    if model_category == model:
        print(structure)
4. Retrieve non-PDBe models and rank them by average confidence score using the following code:
import ijson
from urllib.request import urlopen

WEBSITE_API = "https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/uniprot/summary/"
Uniprot_ID = "P04637"
# Provider to exclude from the ranking
provider_filterout = "PDBe"

r = ijson.parse(urlopen(f"{WEBSITE_API}{Uniprot_ID}.json"))
structures = list(ijson.items(r, "structures.item", use_float=True))
filtered_structures = []
for structure in structures:
    provider = structure.get("summary", {}).get("provider")
    if provider != provider_filterout:
        # Keep only models that report an average confidence score
        if structure.get("summary", {}).get("confidence_avg_local_score") is not None:
            filtered_structures.append(structure)

# Rank by average confidence score, highest first
sorted_structures = sorted(filtered_structures, key=lambda x: x["summary"]["confidence_avg_local_score"], reverse=True)
top5_structures = sorted_structures[:5]
for structure in top5_structures:
    print(structure)
5. Filter by model category, sort the results by coverage, and fetch the model with the highest coverage using the following code:
import ijson
import wget
from urllib.request import urlopen

WEBSITE_API = "https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/uniprot/summary/"
Uniprot_ID = "P04637"
model = "TEMPLATE-BASED"

response = urlopen(f"{WEBSITE_API}{Uniprot_ID}.json")
r = ijson.parse(response)
structures = list(ijson.items(r, "structures.item", use_float=True))
# Sort by coverage, highest first
structures.sort(key=lambda x: x.get("summary", {}).get("coverage", 0), reverse=True)

# After sorting, the first entry of the requested category has the highest coverage
highest_coverage_structure = None
for structure in structures:
    model_category = structure.get("summary", {}).get("model_category")
    if model_category == model:
        highest_coverage_structure = structure
        break

if highest_coverage_structure is not None:
    print(highest_coverage_structure)
    # Download the coordinate file of the selected model
    model_url = highest_coverage_structure.get("summary", {}).get("model_url")
    wget.download(model_url)
6. To filter by provider and fetch high-resolution experimental structures (resolution better than 2 Å) from the PDB, create a new Python file, copy the sample code below, and run it.
import ijson
import wget
from urllib.request import urlopen

WEBSITE_API = "https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/uniprot/summary/"
Uniprot_ID = "P04637"
provider_search = "PDBe"
# Resolution cutoff; only structures with a lower (better) value are kept
resolution_search = 2

r = ijson.parse(urlopen(f"{WEBSITE_API}{Uniprot_ID}.json"))
structures = list(ijson.items(r, "structures.item", use_float=True))
high_resolution_structures = []
for structure in structures:
    provider = structure.get("summary", {}).get("provider")
    resolution = structure.get("summary", {}).get("resolution")
    if provider == provider_search and resolution is not None and resolution < resolution_search:
        # Append the structure to the list without assigning the result back to the list
        high_resolution_structures.append(structure)

for structure in high_resolution_structures:
    model_url = structure.get("summary", {}).get("model_url")
    print("Downloading:", model_url)
    wget.download(model_url)
7. Retrieve the Ensembl summary via 3D-Beacons by creating a new Python file, copying the sample code below, and running it.
import ijson
from urllib.request import urlopen

WEBSITE_API = "https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/ensembl/summary/"
ENSEMBL_ID = "ENSG00000288864"

r = ijson.parse(urlopen(f"{WEBSITE_API}{ENSEMBL_ID}.json"))
# Iterate over the entries of the "uniprot_mappings" array
ensembls = ijson.items(r, "uniprot_mappings.item", use_float=True)
for ensembl in ensembls:
    print(ensembl)
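The filtering and ranking steps above share one pattern, which can be sketched as a single reusable function. This is only an illustration: `select_structures` is a hypothetical helper, and the `EXAMPLE` entries are hand-made stand-ins (not real API output) that reuse the `summary` field names from the sample code. Note that confidence scores reported by different providers may not be directly comparable.

```python
def select_structures(structures, provider=None, exclude_provider=None,
                      model_category=None, rank_by=None, top_n=None):
    """Filter summary entries and optionally rank them, highest value first."""
    selected = []
    for entry in structures:
        summary = entry.get("summary", {})
        if provider is not None and summary.get("provider") != provider:
            continue
        if exclude_provider is not None and summary.get("provider") == exclude_provider:
            continue
        if model_category is not None and summary.get("model_category") != model_category:
            continue
        # Entries lacking the ranking field cannot be ordered, so skip them.
        if rank_by is not None and summary.get(rank_by) is None:
            continue
        selected.append(entry)
    if rank_by is not None:
        selected.sort(key=lambda e: e["summary"][rank_by], reverse=True)
    return selected if top_n is None else selected[:top_n]


# Hand-made entries for illustration only; these are NOT real API output.
EXAMPLE = [
    {"summary": {"model_identifier": "m1", "provider": "PDBe",
                 "model_category": "EXPERIMENTALLY DETERMINED", "coverage": 0.2}},
    {"summary": {"model_identifier": "m2", "provider": "SWISS-MODEL",
                 "model_category": "TEMPLATE-BASED", "coverage": 0.6,
                 "confidence_avg_local_score": 71.0}},
    {"summary": {"model_identifier": "m3", "provider": "AlphaFold DB",
                 "model_category": "AB-INITIO", "coverage": 1.0,
                 "confidence_avg_local_score": 75.1}},
]

if __name__ == "__main__":
    ranked = select_structures(EXAMPLE, exclude_provider="PDBe",
                               rank_by="confidence_avg_local_score")
    for entry in ranked:
        print(entry["summary"]["model_identifier"])
```

With a real response, the `structures` list obtained in the steps above would be passed in place of `EXAMPLE`.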
Basic Protocol 2: SEQUENCE-BASED SEARCH USING THE 3D-BEACONS API
The 3D-Beacons Network has introduced Sequence Similarity Search functionality, which allows users to query the network using the amino acid sequence of a protein. It is important to note that the Sequence Similarity Search option only accepts standard amino acids and does not support DNA or RNA sequences. The Sequence Similarity Search option available through the network uses the Basic Local Alignment Search Tool (BLAST) (Altschul et al., 1990) to find regions of sequence similarity by aligning them with a query sequence. This alignment process allows for the statistical assessment of the degree of similarity between the query sequence and sequences in the network. By evaluating the match between the network and query sequence, valuable insights into the structure, function, and evolutionary aspects can be obtained, thus facilitating targeted and systematic exploration of protein structures.
The protocol presented below illustrates the process of performing a sequence-based query on the 3D-Beacons Network, employing the POST and GET methods. In this protocol, the POST method is used to transmit data from the client to the server, whereas the GET method is employed to obtain the results from the server.
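Because the search accepts only standard amino acids, it can help to validate a query before submitting the POST request. The sketch below is one possible check; `invalid_residues` and `build_query` are hypothetical helper names, and the request body shape (`{"sequence": ...}`) follows the sample code of this protocol.

```python
# One-letter codes of the 20 standard amino acids.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def invalid_residues(sequence):
    """Return the sorted, unique characters that are not standard amino acids."""
    cleaned = "".join(sequence.split()).upper()
    return sorted(set(cleaned) - STANDARD_AMINO_ACIDS)


def build_query(sequence):
    """Return the JSON body for the search, or raise if the sequence is invalid."""
    cleaned = "".join(sequence.split()).upper()
    if not cleaned:
        raise ValueError("sequence is empty")
    bad = invalid_residues(cleaned)
    if bad:
        raise ValueError(f"non-standard characters in sequence: {bad}")
    return {"sequence": cleaned}


if __name__ == "__main__":
    print(build_query("MNMLVINGTPRKHGRTRIAASYIAALYHTA"))
```

A character check of this kind cannot recognize every nucleotide sequence: A, C, G, and T are also valid amino acid codes, so a DNA string passes, whereas an RNA string containing U is rejected.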
Necessary Resources
- See Basic Protocol 1.
1. Open a terminal and install the necessary Python libraries:
- pip install ijson requests
2. Get all the models in the 3D-Beacons Network that align with the query by creating a new Python file, copying the sample code below, and running it.
import requests
import ijson

POST_WEBSITE = "https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/sequence/search"
GET_WEBSITE = "https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/sequence/result"
query_sequence = {"sequence": "MNMLVINGTPRKHGRTRIAASYIAALYHTA"}

# Submit the query sequence; the server replies with a job identifier
response = requests.post(POST_WEBSITE, json=query_sequence)
if response.status_code == 200:
    print("POST request successful")
    job_id = response.json()["job_id"]
else:
    print(f"POST request failed with status code {response.status_code}")
    exit()

# Request the results of the job; a 202 status means the search is still
# running and the request should be repeated later (see Table 2)
response = requests.get(f"{GET_WEBSITE}?job_id={job_id}")
if response.status_code == 200:
    for item in ijson.items(response.content, "item"):
        print(item)
else:
    print(f"GET request failed with status code {response.status_code}")
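Since a sequence search can take several minutes, a single GET request issued right after the POST will often find the job still running (status 202, see Table 2). A minimal polling sketch is shown below. The function names, the 30-s interval, and the limit of 40 attempts are arbitrary illustrative choices, not values prescribed by 3D-Beacons; the result URL and the `job_id` parameter are those of the sample code above.

```python
import time

RESULT_URL = "https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/sequence/result"


def poll_until_ready(fetch, interval=30.0, max_attempts=40, sleep=time.sleep):
    """Call fetch() until it stops returning 202, then return (status, payload).

    fetch is any callable that returns a (status_code, payload) tuple.
    """
    for attempt in range(max_attempts):
        status, payload = fetch()
        if status != 202:
            return status, payload
        # 202: the search is still running; wait before asking again.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"job still running after {max_attempts} attempts")


def make_result_fetcher(job_id, url=RESULT_URL):
    """Build a fetch() callable for a job identifier.

    requests is imported lazily so that poll_until_ready can be used
    (and tested) without it.
    """
    import requests

    def fetch():
        response = requests.get(url, params={"job_id": job_id}, timeout=60)
        payload = response.json() if response.status_code == 200 else None
        return response.status_code, payload

    return fetch
```

With a `job_id` from the POST request, `status, hits = poll_until_ready(make_result_fetcher(job_id))` would block until the server returns something other than 202 or the attempt limit is reached.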
Basic Protocol 3: ACCESSING MACROMOLECULES FROM 3D-BEACONS WITH JalView
This protocol shows how to search for 3D-Beacons models through JalView (Procter et al., 2021). JalView is a versatile and accessible program for Multiple Sequence Alignment (MSA) editing, visualization, and analysis. By integrating the 3D-Beacons Hub API with JalView, its users can explore and discover 3D models for protein alignments sourced from UniProt.
Necessary Resources
Hardware
A computer capable of supporting a web browser and an Internet connection
Software
JalView v2.11+ (https://www.jalview.org/download/) installed locally
1. Launch JalView.
2. Select sequence ID to find available structures on 3D-Beacons. To view available 3D structures for the currently selected set of sequences, open the pop-up menu of the Sequence ID panel and choose the “3D Structure Data…” option (Fig. 1).

3. Query the 3D-Beacons Network.

4. View and set filtering options for structures in 3D-Beacons using the Structure Chooser.

5. Retrieve structure by selecting the desired structure from the Structure Chooser and then pressing the “Enter” key to retrieve and open it in JalView.
Basic Protocol 4: ENHANCING DATA ACCESSIBILITY THROUGH 3D-BEACONS
This protocol will introduce the basic navigational techniques needed to browse the 3D-Beacons website. It outlines the recommended steps for pushing and making data accessible through the 3D-Beacons Client server. This protocol serves as a guide for researchers and data providers to effectively contribute their data to the 3D-Beacons ecosystem. By adhering to this protocol, users can ensure seamless integration and discoverability of their datasets within the 3D-Beacons platform. The protocol covers the processes of data preparation, metadata description, data formatting, and the actual data upload to the 3D-Beacons Client server. It also highlights the recommended practices for ensuring data accessibility, including the use of standardized file formats, providing comprehensive metadata, and complying with data sharing policies. Following this protocol will enable data providers to maximize the visibility and impact of their 3D spatial datasets within the 3D-Beacons Network, fostering collaboration and knowledge exchange in the field of spatial biology and beyond.
To successfully process a model, both a model file (in PDB, PDBx/mmCIF, or ModelCIF format) and a corresponding JSON file containing metadata that maps the model to a UniProt entry are required. It is essential to ensure that the related files have identical names, such as “HAT_1.pdb” and “HAT_1.json”. For this protocol, one model dataset is given within the repository: P38398_1jm7.1.A_1_103.pdb and P38398_1jm7.1.A_1_103.json.
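The naming requirement can be checked before processing. The following sketch lists model files without a metadata file and vice versa; `unpaired_files` is a hypothetical helper, and the assumption that model files end in `.pdb` or `.cif` is ours rather than something stated by the 3D-Beacons Client documentation.

```python
from pathlib import Path

# Assumed model file extensions (PDB and mmCIF/ModelCIF).
MODEL_SUFFIXES = {".pdb", ".cif"}


def unpaired_files(model_dirs, metadata_dir):
    """Return (models_without_json, json_without_model) as sorted base names."""
    models = set()
    for directory in model_dirs:
        directory = Path(directory)
        if not directory.is_dir():
            continue  # e.g., no mmCIF directory was created
        for path in directory.iterdir():
            if path.is_file() and path.suffix.lower() in MODEL_SUFFIXES:
                # stem drops only the final extension, so
                # "P38398_1jm7.1.A_1_103.pdb" -> "P38398_1jm7.1.A_1_103"
                models.add(path.stem)
    metadata = {p.stem for p in Path(metadata_dir).glob("*.json")}
    return sorted(models - metadata), sorted(metadata - models)


if __name__ == "__main__":
    missing_json, missing_model = unpaired_files(
        ["./data/pdb", "./data/mmcif"], "./data/metadata")
    print("Models without metadata:", missing_json)
    print("Metadata without model:", missing_model)
```

Both lists being empty indicates that every model has a matching JSON file under the directory layout used in this protocol.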
Data providers who are interested in making their macromolecule structures available through the 3D-Beacons Network should contact the consortium to have their models added to the 3D-Beacons registry. For more details and step-by-step instructions, please refer to this documentation: https://github.com/3D-Beacons/3d-beacons-registry.
Necessary Resources
Hardware
A computer capable of supporting a web browser and an Internet connection
Software
- Python 3 (https://www.python.org/downloads/)
- Docker Compose (https://docs.docker.com/compose/install/)
1. To obtain the complete infrastructure to make structural models available, clone the 3D-Beacons Client repository with the following commands and navigate to the working directory:
- git clone https://github.com/3D-Beacons/3d-beacons-client
- cd 3d-beacons-client
2. Generate the necessary directories and copy the example model and metadata files into them:
- mkdir -p ./data/{pdb,mmcif,metadata,index}
- cp tests/data/pdb/P38398_1jm7.1.A_1_103.pdb ./data/pdb/
- cp tests/data/metadata/P38398_1jm7.1.A_1_103.json ./data/metadata/
3. Set up the local environment:
- a. Copy the provided example file to the working directory.
- b. Open the file and update the variables “MONGO_PASSWORD” and “PROVIDER”.
- cp .env.example .env
- nano .env
4. Start the Docker containers:
- docker-compose up -d
5. Process the model PDB files:
- docker-compose exec cli snakemake --cores=2
6. Perform database verification:
- curl -X 'GET' 'http://localhost/uniprot/summary/P38398.json' -H 'accept: application/json'
COMMENTARY
Background Information
3D-Beacons is an open collaboration that addresses the challenges of finding, accessing, and integrating all relevant macromolecular structure models from diverse providers. By establishing a standardized framework, 3D-Beacons offers researchers a unified platform for accessing meta-information and model coordinates from experimentally determined structures, ab-initio models, template-based models, and conformational ensembles. The network links data from multiple providers (Table 1). This collaborative effort ensures that a wide range of protein structure data is available in a standardized format, facilitating seamless integration into scientific workflows.
Data provider | Model category | Number of structures |
---|---|---|
AlphaFill | Template based | 995,411 |
AlphaFold DB | Ab initio | 214,684,311 |
HegeLab | Ab initio | 18 |
isoform.io | Ab initio | 237,275 |
ModelArchive | Ab initio/template based | 616,917 |
PDBe | Experimentally determined | 217,387 |
PED | Conformation ensembles | 305 |
SASBDB | Experimentally determined | 4,073 |
SWISS-MODEL repository | Template based | 2,570,296 |
Through 3D-Beacons, researchers gain access to a comprehensive repository that combines the expertise and resources of multiple providers. For instance, experimentally determined structures offer valuable insights into the three-dimensional arrangements of proteins, and ab-initio models provide predictions based on computational algorithms. Conformational ensembles, in turn, capture the flexibility and dynamics of protein structures, enhancing our understanding of their functional properties. By incorporating data from diverse providers, 3D-Beacons offers a rich and varied collection of structure models, enabling researchers to explore different perspectives and uncover novel insights into protein structure and function.
Critical Parameters
The 3D-Beacons Network supports API endpoints keyed on the following information:
- UniProt accessions
- Protein sequences (i.e., sequences of one-letter amino acid codes)
- Ensembl identifiers (IDs start with ENS for Ensembl and then a G for gene).
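A script that accepts any of these three inputs needs to decide which endpoint to call. The sketch below shows one way to classify a query string. `classify_query` is a hypothetical helper; the UniProt pattern follows the accession format published in the UniProt help pages, and the Ensembl pattern assumes human-style gene identifiers (`ENSG` followed by 11 digits), which is our simplification.

```python
import re

# UniProt accession format (6 or 10 characters).
UNIPROT_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# Assumption: human-style Ensembl gene IDs, e.g. ENSG00000288864.
ENSEMBL_GENE_RE = re.compile(r"ENSG[0-9]{11}")
# One-letter codes of the 20 standard amino acids.
SEQUENCE_RE = re.compile(r"[ACDEFGHIKLMNPQRSTVWY]+")


def classify_query(query):
    """Return 'ensembl', 'uniprot', 'sequence', or 'unknown' for a query string."""
    text = "".join(query.split()).upper()
    if ENSEMBL_GENE_RE.fullmatch(text):
        return "ensembl"
    if UNIPROT_RE.fullmatch(text):
        return "uniprot"
    if SEQUENCE_RE.fullmatch(text):
        return "sequence"
    return "unknown"


if __name__ == "__main__":
    for query in ("P04637", "ENSG00000288864", "MNMLVINGTPRKHGRTRIAASYIAALYHTA"):
        print(query, "->", classify_query(query))
```

Identifiers are tested before sequences because both identifier formats contain digits, which can never occur in an amino acid sequence, so the three classes do not overlap.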
Troubleshooting
Table 2 displays the response codes of the API and actions that can be taken to mitigate their effects. Requests for clarifications or reporting new errors can be made by contacting pdbekb_help@ebi.ac.uk.
Problem/response code | Possible cause | Solution |
---|---|---|
202 | | Please wait until the sequence search run completes. It can take 5-10+ min. |
400 | | 1. Please check that the input UniProt accession is correct. 2. Please check your input sequence and retry the submission. 3. Please check if your job identifier is correct. |
404 | Not found - No results found for the given request | There may be no results for a specific UniProt accession or protein sequence |
500 | Internal server error | This error might be due to scheduled maintenance or, rarely, technical issues. Please try again later. If the issue persists, please email pdbekb_help@ebi.ac.uk. |
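In a script, the response codes of Table 2 can be translated into short messages before any parsing is attempted. The mapping below is a small sketch that paraphrases the table; `advice_for` is a hypothetical helper, and the entry for 200 reflects the success check used in the sample code rather than a row of the table.

```python
# Messages paraphrased from Table 2; 200 is the success case.
ADVICE = {
    200: "Success: the response body contains the results.",
    202: "Search still running: wait and request the result again.",
    400: "Check the UniProt accession, input sequence, or job identifier.",
    404: "No results found for the given request.",
    500: "Internal server error: try again later.",
}


def advice_for(status_code):
    """Return a short troubleshooting hint for an API response code."""
    return ADVICE.get(status_code, f"Unexpected status code {status_code}.")


if __name__ == "__main__":
    for code in (200, 202, 404):
        print(code, advice_for(code))
```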
Understanding Results
The 3D-Beacons Hub API responses return JSON objects that all modern programming and scripting languages can parse. Throughout the examples presented here, we demonstrate how to parse the JSON responses using Python.
To more easily understand the JSON response, we advise reviewing the 3D-Beacons API specification available at Apiary: https://3dbeacons.docs.apiary.io/#. This interactive documentation shows the latest released specification and defines every field, including their types, ranges, and examples. Previous versions of the specification are available from GitHub: https://github.com/3D-Beacons/3d-beacons-specifications.
The key information from the responses is the URLs to the model coordinate files captured in the “model_url” field. All the other fields describe the metadata associated with the model, from quality metrics to species and sequence information.
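As a brief illustration of working with the “model_url” field, the sketch below collects the coordinate file URLs from an already-parsed summary response and derives a local file name for each. `model_files` is a hypothetical helper, and the example URL is a placeholder; only the `structures`, `summary`, and `model_url` names come from the sample code in Basic Protocol 1.

```python
import posixpath
from urllib.parse import urlparse


def model_files(payload):
    """Return (model_url, local_filename) pairs from a parsed summary response."""
    pairs = []
    for entry in payload.get("structures", []):
        url = entry.get("summary", {}).get("model_url")
        if not url:
            continue  # entry without a coordinate file URL
        # Use the last path component of the URL as the file name.
        filename = posixpath.basename(urlparse(url).path)
        pairs.append((url, filename))
    return pairs


if __name__ == "__main__":
    # Placeholder payload for illustration; not real API output.
    example = {"structures": [
        {"summary": {"model_url": "https://example.org/models/model_1.cif"}},
    ]}
    for url, filename in model_files(example):
        print(url, "->", filename)
```

Each URL can then be downloaded, for example with `wget.download(url)` as in Basic Protocol 1 or with `urllib.request.urlretrieve(url, filename)` from the standard library.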
Time Considerations
The response times of the 3D-Beacons Hub API vary with the input type. Generally, API endpoints keyed on unique identifiers, such as UniProt accessions, return responses in seconds, whereas a sequence-based search can take 10 to 15 min.
Acknowledgments
The 3D-Beacons infrastructure was initially funded by the BBSRC grant BB/S020071/1, and its continued development and maintenance are funded by Wellcome Trust 223739/Z/21/Z. We also acknowledge funding from Google DeepMind, which supports the creation of training materials.
Open access funding enabled and organized by Projekt DEAL.
Author Contributions
Paulyna Magana : Software; visualization; writing—original draft; writing—review and editing. Sreenath Nair : Software; writing—review and editing. Mihaly Varadi : Project administration; Supervision; writing—original draft; writing—review and editing. Sameer Velankar : Conceptualization; funding acquisition; writing—review and editing.
Conflict of Interest
The authors declare no conflicts of interest.
Open Research
Data Availability Statement
Documentation of the 3D-Beacons Hub API is available at https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/. The specification of the data exchange format is available at https://3dbeacons.docs.apiary.io/#. The code base of the 3D-Beacons client, shown in Basic Protocol 4, is available at https://github.com/3D-Beacons/3d-beacons-client. The Jupyter notebook accompanying the protocols shown here is available at https://colab.research.google.com/github/3D-Beacons/3D-Beacons/blob/main/Tutorials/Harnessing_3DBeaconsAPI.ipynb. The MSA for use on JalView is available at https://raw.githubusercontent.com/3D-Beacons/3D-Beacons/main/Tutorials/AA_MSAkinase.fasta.
Literature Cited
- Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215(3), 403–410. https://doi.org/10.1016/S0022-2836(05)80360-2
- Armstrong, D. R., Berrisford, J. M., Conroy, M. J., Gutmanas, A., Anyango, S., Choudhary, P., Clark, A. R., Dana, J. M., Deshpande, M., Dunlop, R., Gane, P., Gáborová, R., Gupta, D., Haslam, P., Koča, J., Mak, L., Mir, S., Mukhopadhyay, A., Nadzirin, N., … Velankar, S. (2020). PDBe: Improved findability of macromolecular structure data in the PDB. Nucleic Acids Research, 48(D1), D335–D343. https://doi.org/10.1093/nar/gkz990
- Cunningham, F., Allen, J. E., Allen, J., Alvarez-Jarreta, J., Amode, M. R., Armean, I. M., Austine-Orimoloye, O., Azov, A. G., Barnes, I., Bennett, R., Berry, A., Bhai, J., Bignell, A., Billis, K., Boddu, S., Brooks, L., Charkhchi, M., Cummins, C., da Rin Fioretto, L., … Flicek, P. (2022). Ensembl 2022. Nucleic Acids Research, 50(D1), D988–D995. https://doi.org/10.1093/nar/gkab1049
- Ghafouri, H., Lazar, T., del Conte, A., Tenorio Ku, L. G., PED Consortium, Tompa, P., Tosatto, S. C. E., & Monzon, A. M. (2024). PED in 2024: Improving the community deposition of structural ensembles for intrinsically disordered proteins. Nucleic Acids Research, 52(D1), D536–D544. https://doi.org/10.1093/nar/gkad947
- Hekkelman, M. L., de Vries, I., Joosten, R. P., & Perrakis, A. (2021). AlphaFill: Enriching the AlphaFold models with ligands and co-factors (p. 2021.11.26.470110). bioRxiv. https://doi.org/10.1101/2021.11.26.470110
- Kikhney, A. G., Borges, C. R., Molodenskiy, D. S., Jeffries, C. M., & Svergun, D. I. (2020). SASBDB: Towards an automatically curated and validated repository for biological scattering data. Protein Science, 29(1), 66–75. https://doi.org/10.1002/pro.3731
- Lin, Z., Akin, H., Rao, R., Hie, B., Zhu, Z., Lu, W., Smetanin, N., Verkuil, R., Kabeli, O., Shmueli, Y., dos Santos Costa, A., Fazel-Zarandi, M., Sercu, T., Candido, S., & Rives, A. (2023). Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 379(6637), 1123–1130. https://doi.org/10.1126/science.ade2574
- Procter, J. B., Carstairs, G. M., Soares, B., Mourão, K., Ofoegbu, T. C., Barton, D., Lui, L., Menard, A., Sherstnev, N., Roldan-Martinez, D., Duce, S., Martin, D. M. A., & Barton, G. J. (2021). Alignment of biological sequences with Jalview. Methods in Molecular Biology, 2231, 203–224. https://doi.org/10.1007/978-1-0716-1036-7_13
- Rambla, J., Baudis, M., Ariosa, R., Beck, T., Fromont, L. A., Navarro, A., Paloots, R., Rueda, M., Saunders, G., Singh, B., Spalding, J. D., Törnroos, J., Vasallo, C., Veal, C. D., & Brookes, A. J. (2022). Beacon v2 and Beacon networks: A ‘lingua franca’ for federated data discovery in biomedical genomics, and beyond. Human Mutation, 43(6), 791–799. https://doi.org/10.1002/humu.24369
- Sommer, M. J., Cha, S., Varabyou, A., Rincon, N., Park, S., Minkin, I., Pertea, M., Steinegger, M., & Salzberg, S. L. (2022). Structure-guided isoform identification for the human transcriptome. eLife, 11, e82556. https://doi.org/10.7554/eLife.82556
- Tordai, H., Suhajda, E., Sillitoe, I., Nair, S., Varadi, M., & Hegedus, T. (2022). Comprehensive collection and prediction of ABC transmembrane protein structures in the AI era of structural biology. International Journal of Molecular Sciences, 23(16), 8877. https://doi.org/10.3390/ijms23168877
- Varadi, M., Anyango, S., Deshpande, M., Nair, S., Natassia, C., Yordanova, G., Yuan, D., Stroe, O., Wood, G., Laydon, A., Žídek, A., Green, T., Tunyasuvunakool, K., Petersen, S., Jumper, J., Clancy, E., Green, R., Vora, A., Lutfi, M., … Velankar, S. (2022). AlphaFold protein structure database: Massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research, 50(D1), D439–D444. https://doi.org/10.1093/nar/gkab1061
- Varadi, M., Nair, S., Sillitoe, I., Tauriello, G., Anyango, S., Bienert, S., Borges, C., Deshpande, M., Green, T., Hassabis, D., Hatos, A., Hegedus, T., Hekkelman, M. L., Joosten, R., Jumper, J., Laydon, A., Molodenskiy, D., Piovesan, D., Salladini, E., … Velankar, S. (2022). 3D-Beacons: Decreasing the gap between protein sequences and structures through a federated network of protein structure data resources. GigaScience, 11, giac118. https://doi.org/10.1093/gigascience/giac118
- Velankar, S., Burley, S. K., Kurisu, G., Hoch, J. C., & Markley, J. L. (2021). The protein data bank archive. Methods in Molecular Biology, 2305, 3–21. https://doi.org/10.1007/978-1-0716-1406-8_1
- Waterhouse, A., Bertoni, M., Bienert, S., Studer, G., Tauriello, G., Gumienny, R., Heer, F. T., de Beer, T. A. P., Rempfer, C., Bordoli, L., Lepore, R., & Schwede, T. (2018). SWISS-MODEL: Homology modelling of protein structures and complexes. Nucleic Acids Research, 46(W1), W296–W303. https://doi.org/10.1093/nar/gky427
- Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3(1), 160018. https://doi.org/10.1038/sdata.2016.18