Large, chemically diverse dataset of log P measurements for benchmarking studies

Martel, Sophie; Gillerat, Fabrice; Carosati, Emanuele; Maiarelli, Daniele; Tetko, Igor V.; Mannhold, Raimund; Carrupt, Pierre Alain

doi:10.1016/j.ejps.2012.10.019

Lipophilicity is a crucial parameter in drug development since it impacts both the ADME properties and the target affinity of drug candidates. In the early drug discovery stage, accurate tools for log P prediction are highly desired. Many calculation methods have been developed to aid pharmaceutical scientists in drug research; however, almost all suffer from insufficient accuracy and from variable performance across the regions of chemical space associated with new chemical entities. The low predictive power of existing software packages can be explained by the limited availability and/or variable quality of the experimental log P values in the training sets used, which stem from various protocols and poorly cover chemical space. In this study, a dataset of 1000 diverse test compounds was selected from 4.5 million; log P values of 759 purchasable compounds from this selected set (46% non-ionizable, 30% basic, 17% acidic, 0.5% zwitterionic and 6.5% ampholytes) were experimentally determined by UHPLC followed by UV detection, or MS detection when necessary. Finally, a data collection of 707 validated log P values ranging from 0.30 to 7.50 is now available for benchmarking existing approaches, and developing new ones, to predict octanol/water partition coefficients of chemical compounds.
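
As an illustration of the benchmarking use described above, the following minimal sketch compares predicted log P values against experimental ones using root-mean-square error (RMSE) and mean absolute error (MAE). The function name `benchmark_logp` and all numerical values are hypothetical and chosen for illustration only; they are not taken from the dataset, and the abstract does not prescribe any particular error metric.

```python
import math


def benchmark_logp(experimental, predicted):
    """Compare predicted log P values with experimental ones.

    Both arguments are equal-length sequences of floats, paired by compound.
    Returns the number of compounds, the RMSE and the MAE.
    """
    if len(experimental) != len(predicted) or len(experimental) == 0:
        raise ValueError("need two non-empty sequences of equal length")

    # Signed prediction error for each compound.
    errors = [p - e for e, p in zip(experimental, predicted)]
    n = len(errors)

    rmse = math.sqrt(sum(d * d for d in errors) / n)
    mae = sum(abs(d) for d in errors) / n
    return {"n": n, "rmse": rmse, "mae": mae}


# Hypothetical values, for illustration only (not from the dataset).
experimental = [1.20, 3.45, 5.10, 2.80]
predicted = [1.50, 3.05, 4.60, 2.80]

result = benchmark_logp(experimental, predicted)
print(f"n={result['n']}  RMSE={result['rmse']:.2f}  MAE={result['mae']:.2f}")
```

RMSE penalizes large individual errors more strongly than MAE, so reporting both gives a rough sense of whether a method fails on a few compounds or is uniformly imprecise.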