Datasets
Both PTM and variant data were procured for an initial set of 20,421 reviewed (Swiss-Prot) canonical proteins from UniProtKB. The Missense3D-PTMdb contains information for 203,775 PTMs and 11,544,303 human missense variants. There are 199,813 PTM-harbouring residues (i.e., residues with at least one type of PTM) across 15,828 proteins, and 7,039,687 variant-harbouring residues (i.e., residues with at least one variant) across 19,264 proteins. For 20,235 proteins, an AlphaFold structure model is available and displayed on the web server.
Variant data were extracted from a UniProt-provided file containing all Homo sapiens protein-altering variants, downloaded from the UniProt Variant FTP site. The file contained both manually reviewed UniProtKB natural variants and additional protein-altering variants imported from publicly available databases, including Ensembl Variation, ClinVar, 1000 Genomes, ExAC, NCI-TCGA, ESP, COSMIC, and ClinGen.
A total of 1,061,409 aggregate variant-condition accession records (RCVs) from ClinVar are included in Missense3D-PTMdb for the analysed variants. Of these, 332,226 (31.3%) include an associated clinical significance and 713,370 (67.2%) contain disease annotations; these yield 185,730 and 438,971 missense variants with ClinVar-documented classifications and diseases, respectively.
PTM data were extracted from the UniProt entry API and the UniProt proteomics PTM API, then further enriched with high-quality PTM data, currently from Ochoa et al. 2019 and Rega et al. 2025.
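The ClinVar proportions reported above can be reproduced directly from the stated counts. The snippet below is only an arithmetic check; the helper name `share` is ours and is not part of the Missense3D-PTMdb pipeline.

```python
def share(part: int, total: int) -> float:
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100 * part / total, 1)


# Counts taken from the text above.
total_rcvs = 1_061_409            # ClinVar RCV records included
with_significance = 332_226       # RCVs with an associated clinical significance
with_disease = 713_370            # RCVs with disease annotations

pct_significance = share(with_significance, total_rcvs)
pct_disease = share(with_disease, total_rcvs)

print(pct_significance, pct_disease)
```

The two categories are not mutually exclusive, so the percentages are not expected to sum to 100.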
PTM-Variant Sites
Among the 203,775 PTMs, phosphorylation is the most prevalent, accounting for 130,269 (63.9%) of PTM records. Glycosylation, acetylation, sumoylation, and methylation are also well represented. By mapping PTM and variant data by sequence position, we identified 127,106 residues across 14,014 proteins that harbour both a PTM and a missense variant, representing 63.6% of all PTM-harbouring residues. Consistent with the overall PTM distribution, phosphorylation is the most common PTM type at these sites, accounting for 84,863 (66.8%) PTM–variant sites.
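The position-based mapping described above can be illustrated with a minimal sketch. It assumes that PTM and variant records are available as (accession, position, label) tuples on the same canonical sequence numbering; the function name, the record layout, and the toy accessions `PROT_A` and `PROT_B` are hypothetical and do not reflect the actual Missense3D-PTMdb implementation or data.

```python
from collections import defaultdict


def ptm_variant_sites(ptm_records, variant_records):
    """Map each residue carrying both a PTM and a variant to its PTM types.

    Both inputs are iterables of (accession, position, label) tuples
    (an assumed layout). A residue is keyed by (accession, position).
    """
    ptm_types = defaultdict(set)
    for accession, position, ptm_type in ptm_records:
        ptm_types[(accession, position)].add(ptm_type)

    variant_residues = {(acc, pos) for acc, pos, _ in variant_records}

    # Keep only PTM-harbouring residues that also harbour a variant.
    return {
        site: types
        for site, types in ptm_types.items()
        if site in variant_residues
    }


# Toy records (hypothetical, for illustration only).
ptms = [
    ("PROT_A", 15, "phosphorylation"),
    ("PROT_A", 120, "acetylation"),
    ("PROT_B", 7, "acetylation"),
    ("PROT_B", 7, "methylation"),
]
variants = [
    ("PROT_A", 15, "S15A"),
    ("PROT_A", 200, "G200R"),
    ("PROT_B", 7, "K7N"),
]

sites = ptm_variant_sites(ptms, variants)
n_ptm_residues = len({(acc, pos) for acc, pos, _ in ptms})
n_proteins = len({acc for acc, _ in sites})

print(len(sites), n_ptm_residues, n_proteins)
```

Keying on the residue rather than on individual records means a residue with several PTM types or several variants is counted once, which matches the residue-level counts reported in this section.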
