McCarthy, William J.; Nightingale, Luke; Biggs, George S.; Cawood, Emma E.; Dudley-Fraser, Jane; Werner, Thilo; Riziotis, Ioannis G.; Pillay, Timesh D.; Lambert, Hugues; Pogány, Peter; van der Zouwen, Antonie J.; Pettinger, Jonathan; Boulton, Simon J.; House, David; Skehel, J. Mark; Bush, Jacob T.; Rittinger, Katrin
A significant barrier in translating biological insights into therapeutic targets is the limited availability
of high-quality chemical probes for target validation. Chemoproteomic profiling of covalent small
molecules has dramatically accelerated the discovery of ligandable binding sites across the human
proteome. However, the limited specificity and selectivity of initial hits often hinder their
effectiveness in evaluating the functional consequences of ligand binding. To address this challenge,
we developed a data-driven strategy that integrates chemoproteomic profiling of enantiomerically
pure pairs of cysteine-targeting electrophilic fragments (enantiopairs) with machine learning (ML) to
design fragment libraries optimised for proteome-wide selectivity. ML-guided library evolution
produced a second-generation enantiopair library markedly enriched in selective and stereospecific
interactions relative to the first-generation library. This approach identified high-quality
enantioselective binding events with 205 cysteines, the majority not previously liganded. These
findings establish a general framework for designing covalent fragment libraries to deliver higher-quality initial hits.