Ernest Awoonor-Williams*, Andrei A. Golosov, and Viktor Hornak
Accurate estimation of the pKa’s of cysteine residues in proteins could inform targeted approaches in hit discovery. The pKa of a targetable cysteine residue in a disease-related protein is an important physiochemical parameter in covalent drug discovery, as it influences the fraction of nucleophilic thiolate amenable to chemical protein modification. Traditional structure-based in silico tools are limited in their predictive accuracy of cysteine pKa’s relative to other titratable residues. Additionally, there are limited comprehensive benchmark assessments for cysteine pKa predictive tools. This raises the need for extensive assessment and evaluation of methods for cysteine pKa prediction. Here, we report the performance of several computational pKa methods, including single-structure and ensemble-based approaches, on a diverse test set of experimental cysteine pKa’s retrieved from the PKAD database. The dataset consisted of 16 wildtype and 10 mutant proteins with experimentally measured cysteine pKa values. Our results highlight that these methods are varied in their overall predictive accuracies. Among the test set of wildtype proteins evaluated, the best method (MOE) yielded a mean absolute error of 2.3 pK units, highlighting the need for improvement of existing pKa methods for accurate cysteine pKa estimation. Given the limited accuracy of these methods, further development is needed before these approaches can be routinely employed to drive design decisions in early drug discovery efforts.