Bryn Marie Reimer, Ernest Awoonor-Williams, Andrei A. Golosov, and Viktor Hornak
Journal of Chemical Information and Modeling 2025
Targeted covalent inhibition is a powerful therapeutic modality in the drug discoverer’s toolbox. Recent advances in covalent drug discovery, particularly in targeting cysteines, have led to significant breakthroughs for traditionally challenging targets such as mutant KRAS, which is implicated in diverse human cancers. However, identifying cysteines suitable for targeted covalent inhibition remains difficult, as both experimental and in silico tools have shown limited accuracy. Using the recently released CovPDB and CovBinderInPDB databases, we trained and tested interpretable machine learning (ML) models to identify cysteines that are liable to be covalently modified (i.e., “ligandable” cysteines). We explored a range of physicochemical features (pKa, solvent exposure, residue electrostatics, etc.) and protein–ligand pocket descriptors in our ML models. Our final logistic regression model achieved a median F1 score of 0.73 on held-out test sets. When tested on a small sample of holo proteins, our model also showed reasonable performance, correctly predicting the most ligandable cysteine in most cases. Taken together, these results indicate that we can accurately predict potential ligandable cysteines for targeted covalent drug discovery, privileging cysteines that are more likely to be selective rather than purely reactive. We release this tool to the scientific community as CovCysPredictor.
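For readers less familiar with the reported metric, the following minimal sketch shows how a median F1 score over several held-out test sets can be computed for a binary "ligandable" (1) versus "not ligandable" (0) classifier. It is an illustration only, not the CovCysPredictor code: the function names `f1_score` and `median_f1` and the labels in `toy_splits` are hypothetical and invented for this example.

```python
from statistics import median


def f1_score(y_true, y_pred):
    """Binary F1 = 2*TP / (2*TP + FP + FN), with 1 = ligandable cysteine."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    denom = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def median_f1(splits):
    """Median F1 across (y_true, y_pred) pairs, one pair per held-out split."""
    return median(f1_score(y_true, y_pred) for y_true, y_pred in splits)


# Made-up labels for three hypothetical held-out splits (not data from the paper).
toy_splits = [
    ([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]),  # TP=2, FP=0, FN=1
    ([1, 1, 0, 0, 1], [1, 0, 1, 0, 1]),  # TP=2, FP=1, FN=1
    ([0, 1, 1, 0, 0], [0, 1, 1, 0, 0]),  # perfect prediction
]

print(round(median_f1(toy_splits), 2))
```

Reporting the median rather than the mean makes the summary less sensitive to a single unusually easy or unusually hard split, which may be why a median is quoted above.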