What is it about?
Over the past decade, the number of public chemistry and bioactivity databases has grown. Some of these contain tens of millions of molecules and millions of bioactivity data points. These sources are immensely valuable for data mining and machine learning modeling. This chapter covers databases such as BindingDB, PubChem, ChEMBL, GtoPdb, and CDD Vault. We discuss issues of data quality and how the data may be used.
Why is it important?
This chapter is important because these massive datasets are used to build models that can inform decision making. However, these models are only sporadically tested or evaluated with external datasets, so their utility is unclear. We also propose areas that could be improved, such as data curation and the correction of errors.
Read the Original
This page is a summary of: Chapter 16. Small-molecule Bioactivity Databases, January 2016, Royal Society of Chemistry, DOI: 10.1039/9781782626770-00344.