Datasets
Missense3D-PTMdb uses data from UniProt release 2026_01 and the ClinVar weekly release dated 2026-03-08.
Both PTM and variant data were procured for an initial set of 20,431 reviewed (Swiss-Prot) canonical proteins from UniProtKB. The Missense3D-PTMdb contains information for 334,255 PTMs and 11,490,257 human missense variants. There are 294,902 PTM-harbouring residues (i.e., residues with at least one type of PTM) across 16,945 proteins, and 6,985,070 variant-harbouring residues (i.e., residues with at least one variant) across 19,361 proteins. For 20,291 proteins, an AlphaFold structure model is available and displayed on the web server.
Variant data were extracted from a UniProt-procured file containing all Homo sapiens protein-altering variants, downloaded from the UniProt Variant FTP site. The file contained both UniProtKB manually reviewed natural variants and additional protein-altering variants imported from publicly available databases, including Ensembl Variation, ClinVar, 1000 Genomes, ExAC, NCI-TCGA, ESP, COSMIC, and ClinGen.
A total of 2,010,633 aggregate variant-condition accession records (RCVs) from ClinVar are included in Missense3D-PTMdb for the analysed variants. Of these accession records, 421,307 (21.0%) include an associated clinical significance and 1,063,683 (52.9%) contain disease annotations; these yield 261,434 and 723,172 missense variants with ClinVar-documented classifications and diseases, respectively.
PTM data were extracted from the UniProt entry API and the UniProt proteomics PTM API, then further enriched with high-quality PTM data, currently from Ochoa et al. 2019 and Rega et al. 2025.
These data were then enriched with conservation information from ProtVar, generated using the ScoreCons algorithm, as well as decryptM drug-PTM modulation data from ProteomicsDB.
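The ClinVar proportions quoted in this section follow directly from the reported record counts. The short Python sketch below recomputes them; the helper name `percent` and the variable names are illustrative only and are not part of the Missense3D-PTMdb pipeline.

```python
def percent(part, whole, digits=1):
    """Return part/whole as a percentage rounded to `digits` decimal places."""
    return round(100 * part / whole, digits)


# Counts as reported in the text above.
total_rcvs = 2_010_633          # ClinVar RCV records included
with_significance = 421_307     # RCVs with an associated clinical significance
with_disease = 1_063_683        # RCVs with disease annotations

sig_pct = percent(with_significance, total_rcvs)
dis_pct = percent(with_disease, total_rcvs)

print(f"Clinical significance: {sig_pct}% of RCVs")
print(f"Disease annotation:    {dis_pct}% of RCVs")
```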
PTM-Variant Sites
Among the PTMs in Missense3D-PTMdb, phosphorylation is the most prevalent, accounting for 133,386 (39.9%) of PTM records. Glycosylation, acetylation, sumoylation, and methylation are also well represented, as seen in figure a. By mapping PTM and variant data based on their sequence positions, we identified 169,443 residues across 15,291 proteins that harbour both a PTM and a missense variant. This represents 57.5% of all PTM-harbouring residues. Consistent with the overall PTM distribution, phosphorylation is the most common PTM type at sites with variants, accounting for 84,906 (44.9%) PTM–variant sites, as seen in figure b.
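The position-based mapping described above can be pictured as a join on (protein accession, residue position). The following Python sketch illustrates that idea under simple assumptions: the record layout, the function name `ptm_variant_sites`, and the accessions `ACC_A` and `ACC_B` are hypothetical toy values, not the actual Missense3D-PTMdb schema or code.

```python
from collections import defaultdict


def ptm_variant_sites(ptms, variants):
    """Find residues carrying both a PTM and a missense variant.

    ptms:     iterable of (accession, position, ptm_type)
    variants: iterable of (accession, position, amino_acid_change)
    Returns {(accession, position): (set of PTM types, set of changes)}.
    """
    ptm_index = defaultdict(set)
    for accession, position, ptm_type in ptms:
        ptm_index[(accession, position)].add(ptm_type)

    variant_index = defaultdict(set)
    for accession, position, change in variants:
        variant_index[(accession, position)].add(change)

    # A PTM-variant site is a residue present in both indexes.
    shared = ptm_index.keys() & variant_index.keys()
    return {site: (ptm_index[site], variant_index[site]) for site in shared}


# Toy records with hypothetical accessions (not real database entries).
ptms = [
    ("ACC_A", 15, "acetylation"),
    ("ACC_A", 15, "sumoylation"),   # same residue, second PTM type
    ("ACC_A", 42, "methylation"),   # no variant at this residue
    ("ACC_B", 7, "phosphorylation"),
]
variants = [
    ("ACC_A", 15, "K15R"),
    ("ACC_A", 99, "R99W"),          # no PTM at this residue
    ("ACC_B", 7, "T7M"),
    ("ACC_B", 7, "T7K"),            # same residue, second variant
]

sites = ptm_variant_sites(ptms, variants)
ptm_residues = {(acc, pos) for acc, pos, _ in ptms}
fraction = len(sites) / len(ptm_residues)
print(f"{len(sites)} of {len(ptm_residues)} PTM-harbouring residues also carry a variant")
```

Indexing by residue rather than by record means a residue with several PTM types or several variants is counted once, which matches the per-residue counts reported in the text. Counting per PTM type instead would give one entry for each (residue, PTM type) pair.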