Publication: MultiDK: A Multiple Descriptor Multiple Kernel Approach for Molecular Discovery and Its Application to Organic Flow Battery Electrolytes
Abstract
We propose a multiple descriptor multiple kernel (MultiDK) method for efficient molecular discovery using machine learning. We show that MultiDK improves both the speed and the accuracy of molecular property prediction, and we apply it to the discovery of electrolyte molecules for aqueous redox flow batteries. Using multiple types of descriptors, rather than a single type, yields more relevant features for machine learning. Following the principle of the 'wisdom of the crowds', combining multiple descriptor types significantly boosts prediction performance. Moreover, by employing multiple kernels (more than one kernel function for a set of input descriptors), MultiDK can model irregularities in structure-property relations better than linear regression. The multiple kernels consist of the Tanimoto similarity function for a set of binary descriptors and a linear kernel for a set of non-binary descriptors. Using MultiDK, we achieve an average r² of 0.92 for solubility prediction on a set of molecules. We also extend MultiDK to predict pH-dependent solubility and apply it to estimate the solubility of quinone molecules with ionizable functional groups, which are strong candidates for flow battery electrolytes.
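The abstract pairs a Tanimoto kernel on binary descriptors with a linear kernel on non-binary descriptors. The pure-Python sketch below illustrates that pairing under stated assumptions. The abstract does not say how the two kernels are combined or which kernel learner is used, so this sketch assumes a weighted sum of the two kernels fed to kernel ridge regression. The `Molecule` container, the weights `w_bin`/`w_real`, the regularizer `lam`, and the toy data are all hypothetical illustrations, not the authors' implementation or data.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Molecule:
    bits: List[int]      # hypothetical binary descriptors (e.g. fingerprint bits)
    values: List[float]  # hypothetical non-binary (real-valued) descriptors


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto similarity of binary vectors: |a AND b| / |a OR b|."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Convention assumed here: two all-zero vectors count as identical.
    return 1.0 if either == 0 else both / either


def linear_kernel(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of the non-binary descriptor vectors."""
    return sum(x * y for x, y in zip(a, b))


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


class MultiKernelRidge:
    """Kernel ridge regression on an assumed weighted sum of two kernels."""

    def __init__(self, lam: float = 1e-3, w_bin: float = 1.0, w_real: float = 1.0):
        self.lam, self.w_bin, self.w_real = lam, w_bin, w_real

    def _k(self, a: Molecule, b: Molecule) -> float:
        # Tanimoto on the binary part plus linear on the real-valued part.
        return (self.w_bin * tanimoto(a.bits, b.bits)
                + self.w_real * linear_kernel(a.values, b.values))

    def fit(self, mols: Sequence[Molecule], y: Sequence[float]) -> "MultiKernelRidge":
        self.train = list(mols)
        gram = [[self._k(a, b) for b in self.train] for a in self.train]
        for i in range(len(gram)):
            gram[i][i] += self.lam  # ridge term: (K + lam * I) alpha = y
        self.alpha = solve(gram, list(y))
        return self

    def predict(self, mols: Sequence[Molecule]) -> List[float]:
        # f(x) = sum_i alpha_i * k(x, x_i) over the training molecules.
        return [sum(al * self._k(m, t) for al, t in zip(self.alpha, self.train))
                for m in mols]


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


# Made-up toy molecules and targets, for illustration only.
train = [
    Molecule([1, 0, 0, 1], [0.5]),
    Molecule([0, 1, 0, 1], [1.0]),
    Molecule([0, 0, 1, 1], [1.5]),
    Molecule([1, 1, 1, 0], [2.0]),
]
y = [0.2, 0.8, 1.1, 1.9]
model = MultiKernelRidge(lam=1e-6).fit(train, y)
print(r2_score(y, model.predict(train)))
```

A sum of positive semidefinite kernels is itself a valid kernel, which is why each descriptor type can be scored with its own similarity function and the results added. In practice the kernel weights and `lam` would be tuned, for example by cross-validation, and r² would be reported on held-out molecules rather than on the training set as in this toy run.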