What is it about?

Over the past decade, the number of public chemistry and bioactivity databases has grown. Some of these contain tens of millions of molecules and millions of bioactivity data points, which makes them immensely valuable for data mining and machine learning modeling. This chapter covers databases such as BindingDB, PubChem, ChEMBL, GtoPdb and CDD Vault. We discuss issues of data quality and how the data may be used.


Why is it important?

This chapter is important because these massive datasets are used to build models that can support decision making. However, these models are only sporadically tested or evaluated against external datasets, so their utility is unclear. We also propose areas for improvement, such as data curation and the correction of errors.

Perspectives

Each author brought their own perspective as a database developer, curator, software developer and so on. We also provide an update on the Bayesian models created with ChEMBL.

Dr Sean Ekins
Collaborations in Chemistry

Read the Original

This page is a summary of: Chapter 16. Small-molecule Bioactivity Databases, January 2016, Royal Society of Chemistry,
DOI: 10.1039/9781782626770-00344.
