UniProt entries describing protein sequence are regularly updated as researchers collect new sequencing evidence.
Protein structure files (modelled or experimental) often lag behind these sequence updates. Sequences within structure files might not fully represent assigned UniProt entries and the residue numbers might not match the UniProt numbering.
Mapping UniProt annotations (e.g. variants, domains, binding sites) to outdated structure files may result in errors and hinder downstream structural analysis.
3DSeq-Check checks your structure file against the given UniProt entry. It maps your structure to the current sequence of the UniProt entry and gives an overview of the mapping.
UniProt ID is the first required input to StructureCheck-UP. The UniProt ID should correspond to the entry that your structure describes. It is used to fetch the current sequence of that entry, which serves as the reference sequence in StructureCheck-UP.
Structure Source is the second required input to StructureCheck-UP. The structure source should correspond to the structure you want to check-UP. There are currently two supported ways to supply a structure:
To fetch the up-to-date reference sequence, we query the UniProt REST API for the entry with the input UniProt ID. The query sequence is extracted from the structure source by taking the residues that appear in the `ATOM` section of the structure file (restricted to the matching chain ID in the case of custom file input).
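The two extraction steps above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: it assumes the public UniProt FASTA download URL (`https://rest.uniprot.org/uniprotkb/<ID>.fasta`), a structure in legacy PDB format with fixed-width `ATOM` records, and only the 20 standard amino acids (anything else becomes `X`). The function names are hypothetical.

```python
import urllib.request

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def parse_fasta(fasta_text):
    """Join the sequence lines of a single-record FASTA string."""
    return "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if line and not line.startswith(">")
    )


def fetch_uniprot_sequence(uniprot_id):
    """Download the current sequence of a UniProt entry (needs network access)."""
    # Assumed endpoint: the FASTA download of the UniProt REST API.
    url = f"https://rest.uniprot.org/uniprotkb/{uniprot_id}.fasta"
    with urllib.request.urlopen(url) as response:
        return parse_fasta(response.read().decode("utf-8"))


def sequence_from_atom_records(pdb_text, chain_id):
    """Return (sequence, residue_numbers) for one chain of a PDB-format file."""
    sequence, numbers, seen = [], [], set()
    for line in pdb_text.splitlines():
        # Only ATOM records count; HETATM and short/malformed lines are skipped.
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        if line[21] != chain_id:  # column 22: chain identifier
            continue
        # Columns 23-26 hold the residue number, column 27 the insertion code.
        residue_key = (line[22:26], line[26])
        if residue_key in seen:  # one entry per residue, not per atom
            continue
        seen.add(residue_key)
        residue_name = line[17:20].strip()  # columns 18-20: residue name
        sequence.append(THREE_TO_ONE.get(residue_name, "X"))
        numbers.append(int(line[22:26]))
    return "".join(sequence), numbers
```

With network access, `fetch_uniprot_sequence("Q9BRI3")` would return the reference sequence, while `sequence_from_atom_records(pdb_text, "A")` returns the query sequence together with the residue numbers used in the structure file.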
The pairwise sequence alignment of the reference and query sequences is performed using the EMBOSS implementation of the Needleman-Wunsch algorithm with the default parameters.
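For readers unfamiliar with global alignment, the idea behind Needleman-Wunsch can be sketched in a few lines. Note that this is a simplified stand-in and not the EMBOSS implementation: it uses a plain match/mismatch score and a linear gap penalty (all three values are illustrative assumptions), whereas EMBOSS uses a substitution matrix and separate gap-open and gap-extend penalties.

```python
def needleman_wunsch(ref, query, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences; return (aligned_ref, aligned_query, score)."""
    n, m = len(ref), len(query)

    # score[i][j] is the best score for aligning ref[:i] with query[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if ref[i - 1] == query[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align the two residues
                score[i - 1][j] + gap,      # gap in the query
                score[i][j - 1] + gap,      # gap in the reference
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_ref, aligned_query = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if ref[i - 1] == query[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                aligned_ref.append(ref[i - 1])
                aligned_query.append(query[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_ref.append(ref[i - 1])
            aligned_query.append("-")
            i -= 1
        else:
            aligned_ref.append("-")
            aligned_query.append(query[j - 1])
            j -= 1

    return "".join(reversed(aligned_ref)), "".join(reversed(aligned_query)), score[n][m]
```

Both returned strings have the same length, with `-` marking positions where one sequence has no counterpart in the other. When several alignments share the best score, the sketch returns just one of them.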
The resulting alignment of the query sequence (originating from the input structure source) to the reference sequence (the up-to-date UniProt sequence) is presented on the results page.
The resulting mapping of residue indices from the protein structure to the indices in the UniProt sequence is also presented on the results page.
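As a rough illustration of how such an index mapping can be derived from a pairwise alignment, consider the sketch below. It assumes the alignment is given as two equal-length gapped strings (`-` for gaps) and that the structure's residue numbers are listed in the same order as the query sequence. The function name and return layout are hypothetical and do not describe the tool's internals.

```python
def map_structure_to_uniprot(aligned_ref, aligned_query, structure_numbers):
    """Map structure residue numbers to 1-based UniProt positions.

    Returns (mapping, variations, missing):
      mapping    - {structure residue number: UniProt position, or None if the
                   residue has no counterpart in the UniProt sequence}
      variations - [(UniProt position, UniProt residue, structure residue)]
      missing    - UniProt positions that are absent from the structure
    """
    mapping, variations, missing = {}, [], []
    ref_pos = 0      # 1-based position in the UniProt sequence
    query_idx = 0    # index into structure_numbers
    for ref_res, query_res in zip(aligned_ref, aligned_query):
        if ref_res != "-":
            ref_pos += 1
        if query_res == "-":
            # UniProt residue with no modelled counterpart.
            missing.append(ref_pos)
            continue
        number = structure_numbers[query_idx]
        query_idx += 1
        if ref_res == "-":
            # Structure residue that no longer exists in UniProt.
            mapping[number] = None
        else:
            mapping[number] = ref_pos
            if ref_res != query_res:
                variations.append((ref_pos, ref_res, query_res))
    return mapping, variations, missing


# Toy example: three residues were added at the N-terminus of the reference,
# and the first two modelled residues differ from the current sequence.
mapping, variations, missing = map_structure_to_uniprot(
    "AAAPRKT", "---MVKT", [1, 2, 3, 4]
)
```

In the toy example, structure residue 1 maps to UniProt position 4, positions 1 to 3 are reported as missing from the structure, and positions 4 and 5 are reported as variations. Structure files that reuse a residue number with insertion codes would need a richer key than the plain residue number used here.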
AlphaFoldDB currently (December 2024) stores a model of the UniProt entry Q9BRI3 that reflects version 1 of the Q9BRI3 sequence (changed in October 2022). Since then, an update to the UniProt entry led to a new version of the sequence. Our alignment reveals that version 2 of the Q9BRI3 sequence contains ~50 new residues 'inserted' around position 100 of the sequence that was modelled for AlphaFoldDB.
Comparing the structure deposited in AlphaFoldDB (purple, left) and the model of the up-to-date sequence (green, center and right) we see notable differences in the fold. The inserted sequence chunk is highlighted in blue in the figure on the right.
Furthermore, mapping annotations such as Zn2+ binding sites to the outdated models (highlighted below with red spheres) might result in misleading structural views.
AlphaFoldDB currently (December 2024) stores a model of the UniProt entry H0Y7S4 that reflects version 2 of the H0Y7S4 sequence (changed in January 2024). Since then, an update to the UniProt entry led to a new version (version 3) of the sequence. Our alignment reveals that version 3 of the H0Y7S4 sequence contains ~100 new residues 'inserted' at the beginning of the sequence that was modelled for AlphaFoldDB.
In addition, the first two residues (methionine and valine) modelled for AlphaFoldDB have since been updated in UniProt to proline and arginine. They are annotated as variations on our dashboard and represented with green rectangles.
Comparing the structure deposited in AlphaFoldDB (purple, left) and the model of the up-to-date sequence (green, center) we see a somewhat preserved fold with the inserted sequence chunk not aligned with the AlphaFoldDB structure (structure alignment, on the right).
Furthermore, mapping annotations such as domain annotations to the outdated models (highlighted below in orange) might result in misleading structural views, as the domains in the outdated AlphaFoldDB structure (left) are shifted relative to the actual domains in the updated model (right).
Time flies in bioinformatics - the aging of the AlphaFold DB. Manuscript in preparation