Structure-Based Drug Design with Multi-Task Learning and Data Augmentation
Abstract
With rapid advances in machine learning methods and the availability of vast amounts of chemical data, structure-based drug design is at the dawn of a golden age. The tremendous successes of deep learning methods in natural language processing, speech recognition, and computer vision have set the expectation that these emerging technologies will successfully target undruggable proteins and novel sites of well-established pharmaceutical targets. In recent years, the scientific community has reported excellent performance of deep learning methods on various benchmarks for virtual high-throughput screening, QSAR, and ADMET tasks. Nevertheless, follow-up work often reveals that many of these methods fail to deliver prospectively the performance initially reported on retrospective benchmarks. These failures suggest that the described approaches do not generalize as well as expected and are instead overfitting to the training set or simply cheating, i.e., finding exploits in the training and test data sets that secure top performance but have little practical value (known as "Clever Hans" solutions in the machine learning community). In this work, we present a battery of benchmarks meant to detect and flag models that, despite their excellent retrospective performance, are likely to perform poorly when applied prospectively. Once the pathological properties of these models are identified, we show how they can be systematically corrected through a combination of data augmentation and multi-task learning. We use a data augmentation technique called "pose-negatives" (where poor poses are used as negative data points) and multi-task learning that biases models toward physically plausible solutions.
The methods proposed in this work are general: they apply to both grid-based and graph-based convolutional neural network models and, when paired with the presented battery of benchmarks, set new community standards for the robustness of models in prospective discovery campaigns.
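As a rough illustration of the "pose-negatives" idea, the sketch below labels docked poses so that a poor pose counts as a negative example even when the ligand itself is active. This is not the authors' implementation: the `Pose` record, its field names, the use of RMSD to a reference pose as the quality measure, and the 2.0 Å cutoff are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Pose:
    """Hypothetical record for one docked pose of a ligand."""
    ligand_id: str
    rmsd_to_reference: float  # in angstroms; assumed pose-quality measure
    ligand_is_active: bool


def label_with_pose_negatives(
    poses: List[Pose], rmsd_cutoff: float = 2.0  # assumed cutoff, not from the source
) -> List[Tuple[Pose, int]]:
    """Assign a binary training label to every pose.

    A pose is positive only if the ligand is active AND the pose is good.
    Poor poses of active ligands become negatives ("pose-negatives"),
    so a model cannot score well from ligand identity alone.
    """
    labeled = []
    for pose in poses:
        pose_is_good = pose.rmsd_to_reference <= rmsd_cutoff
        label = 1 if (pose.ligand_is_active and pose_is_good) else 0
        labeled.append((pose, label))
    return labeled


# Toy data: two poses of an active ligand and one pose of an inactive ligand.
examples = [
    Pose("lig_A", 0.8, True),   # active ligand, good pose
    Pose("lig_A", 6.5, True),   # active ligand, poor pose: pose-negative
    Pose("lig_B", 1.1, False),  # inactive ligand: negative regardless of pose
]
labels = [label for _, label in label_with_pose_negatives(examples)]
```

Under these assumptions, the second pose of `lig_A` receives a negative label although its ligand is active, which is what pushes a model to attend to the binding geometry rather than to ligand features alone.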