Matthew E. H. White, Jesús Gil, Edward W. Tate
bioRxiv 2022.12.12.518491;
doi: https://doi.org/10.1101/2022.12.12.518491
Covalent drug discovery, in particular targeting reactive cysteines, has undergone a resurgence over the past two decades, as demonstrated by recent clinical successes of covalent inhibitors for high-priority cancer targets. Reactive cysteine profiling, pioneered by the Cravatt lab, has emerged in parallel as a powerful approach for proteome-wide on- and off-target profiling. Thus far, however, structural analysis of liganded cysteines has been restricted to experimentally determined protein structures. We combined AlphaFold-predicted amino acid side chain accessibilities for >95% of the human proteome with a meta-analysis of thirteen public cysteine profiling datasets, totalling 40,070 unique cysteine residues, revealing accessibility biases in sampled cysteines that are primarily dictated by warhead chemistry. Analysis of >3.5 million cysteine-fragment interactions further suggests that exposed cysteine residues are preferentially targeted by elaborated fragments and drug-like compounds. Finally, we propose a framework for benchmarking coverage of ligandable cysteines in future cysteine profiling approaches, considering both selectivity for high-priority residues and quantitative depth. All analyses and resources produced (freely available at www.github.com/TateLab) are readily extendable to reactive amino acids beyond cysteine and to related questions in chemical biology.
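
The idea of an accessibility bias in sampled cysteines can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes a hypothetical per-residue accessibility score in [0, 1] (higher meaning more exposed), a hypothetical exposure cutoff of 0.5, and toy residue identifiers of the form (protein, position). The function names `exposed_fraction` and `accessibility_bias` are invented for illustration.

```python
def exposed_fraction(residues, accessibility, threshold=0.5):
    """Fraction of the given residues whose score is at or above the cutoff.

    `accessibility` maps (protein_id, position) -> score; both the score
    scale and the 0.5 cutoff are assumptions made for this sketch.
    """
    scores = [accessibility[r] for r in residues if r in accessibility]
    if not scores:
        raise ValueError("no residues with an accessibility score")
    return sum(s >= threshold for s in scores) / len(scores)


def accessibility_bias(detected, accessibility, threshold=0.5):
    """Exposed fraction among detected cysteines relative to all scored cysteines.

    A value above 1 means the detected set is enriched for exposed residues;
    a value below 1 means it is enriched for buried residues.
    """
    background = exposed_fraction(accessibility.keys(), accessibility, threshold)
    if background == 0:
        raise ValueError("no exposed residues in the background set")
    sampled = exposed_fraction(detected, accessibility, threshold)
    return sampled / background


# Toy data (hypothetical): four cysteines with predicted accessibility scores.
accessibility = {
    ("P1", 10): 0.9,
    ("P1", 55): 0.1,
    ("P2", 7): 0.8,
    ("P2", 120): 0.2,
}
# Cysteines observed in one hypothetical profiling dataset.
detected = {("P1", 10), ("P2", 7), ("P2", 120)}

bias = accessibility_bias(detected, accessibility)
print(f"accessibility bias: {bias:.2f}")
```

Computing this ratio separately for each dataset, grouped by the warhead used, would be one simple way to compare how strongly different chemistries favour exposed residues.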