Open Source Bayesian Models. 2. Mining a “Big Dataset” To Create and Validate Models with ChEMBL

Alex M. Clark; Sean Ekins

doi:10.1021/acs.jcim.5b00144

What is it about?

We use open source fingerprints and a Bayesian algorithm to build thousands of computational models from the data in ChEMBL, a very large public dataset. We demonstrate cross validation of these models, make them openly accessible, and show how they can be imported into a mobile app and used for predictions.
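
As an illustration of how a Bayesian model can be built from fingerprints, here is a minimal sketch. The summary above does not spell out the exact estimator, so the code assumes a Laplacian-corrected naive Bayes variant, which is a common choice for sparse fingerprint data. The function names and the toy fingerprints (sets of integer bit indices) are hypothetical and are not taken from the paper or from ChEMBL.

```python
import math
from collections import Counter


def train_laplacian_bayes(fingerprints, labels):
    """Return a per-bit contribution table.

    fingerprints: list of sets of int (bit indices set in each molecule)
    labels: list of bool (True = active)
    Assumption: Laplacian-corrected estimator, log((A_i + 1) / (T_i * P + 1)),
    where A_i = actives containing bit i, T_i = molecules containing bit i,
    and P = overall fraction of actives.
    """
    base_rate = sum(labels) / len(labels)
    total_with_bit = Counter()
    active_with_bit = Counter()
    for bits, is_active in zip(fingerprints, labels):
        for bit in bits:
            total_with_bit[bit] += 1
            if is_active:
                active_with_bit[bit] += 1
    return {
        bit: math.log((active_with_bit[bit] + 1) / (count * base_rate + 1))
        for bit, count in total_with_bit.items()
    }


def predict_score(contributions, bits):
    """Sum the contributions of the bits present; unseen bits add nothing."""
    return sum(contributions.get(bit, 0.0) for bit in bits)


# Toy data (hypothetical): two actives share bit 1, two inactives share bit 4.
toy_fingerprints = [{1, 2}, {1, 3}, {4, 5}, {4, 6}]
toy_labels = [True, True, False, False]
model = train_laplacian_bayes(toy_fingerprints, toy_labels)
```

A higher score means the molecule looks more like the actives seen in training. The raw score is not a probability, which is one reason validation and calibration matter for models of this kind.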

Why is it important?

We are not aware of anyone else using ChEMBL in this way with open source technologies and making the resulting thousands of models accessible. In addition, we describe a novel algorithm for detecting the threshold between active and inactive in continuous data. Finally, we assess the effect of folding on the fingerprints.
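
To make the idea of folding concrete, here is a small sketch. It assumes that a fingerprint is a set of integer hash codes and that folding maps each hash into a fixed number of bits by masking, which is the usual approach when the folded size is a power of two. The function name and the example values are hypothetical.

```python
def fold_fingerprint(hashes, n_bits):
    """Fold a set of integer hash codes into the range [0, n_bits).

    Assumption: n_bits is a power of two, so folding is a bitwise AND
    with (n_bits - 1). Distinct hashes may collide after folding.
    """
    if n_bits <= 0 or n_bits & (n_bits - 1):
        raise ValueError("n_bits must be a positive power of two")
    mask = n_bits - 1
    return {h & mask for h in hashes}


# Hypothetical hash codes: 1 and 1025 collide when folded to 1024 bits.
folded = fold_fingerprint({1, 1025, 2048}, 1024)
```

Smaller folded sizes save memory and make models more compact, but they increase collisions between unrelated features. That trade-off is what an assessment of folding has to quantify.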

Perspectives

The paper follows up on the previous description of open source Bayesian models, adding more detail about validation and calibration techniques. It describes a method for partitioning the ChEMBL database of bioactivity data into more than 2000 datasets, and an algorithm for automatically detecting a threshold for classifying compounds as active or inactive, which Bayesian algorithms require because they work on binary labels. Each of the datasets was used for model building in order to evaluate the technique. The results are made available, along with a description of the method.
Alex Michael Clark
Molecular Materials Informatics
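
The summary does not describe how the paper's threshold-detection algorithm works, so the following is not that method. As a stand-in, the sketch uses Otsu's method, a generic technique that picks the cutoff maximizing the between-class variance of a one-dimensional set of values. The function name and the pIC50-like activity values are hypothetical.

```python
def otsu_threshold(values):
    """Pick a cutoff that best separates values into two groups.

    Stand-in heuristic (Otsu's method), NOT the algorithm from the paper:
    try every split of the sorted values and keep the one with the
    largest between-class variance w0 * w1 * (mean0 - mean1) ** 2.
    """
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    total = sum(xs)
    best_score, best_cut = -1.0, None
    left_sum = 0.0
    for i in range(1, n):
        left_sum += xs[i - 1]
        if xs[i] == xs[i - 1]:
            continue  # no cutoff can separate equal values
        w0 = i / n
        w1 = 1.0 - w0
        mean0 = left_sum / i
        mean1 = (total - left_sum) / (n - i)
        score = w0 * w1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_score = score
            best_cut = (xs[i - 1] + xs[i]) / 2  # midpoint between neighbours
    if best_cut is None:
        raise ValueError("all values are identical")
    return best_cut


# Hypothetical activity values with an obvious gap between two clusters.
activities = [4.0, 4.2, 4.5, 7.8, 8.0, 8.3]
cutoff = otsu_threshold(activities)
is_active = [value >= cutoff for value in activities]
```

Once a cutoff is chosen, each continuous measurement becomes an active or inactive label, which is the binary input a Bayesian classifier needs.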

This page is a summary of: Open Source Bayesian Models. 2. Mining a “Big Dataset” To Create and Validate Models with ChEMBL, Journal of Chemical Information and Modeling, June 2015, American Chemical Society (ACS),
DOI: 10.1021/acs.jcim.5b00144.

Resources

Accompanying Data

Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models

Mining Big datasets to create and validate machine learning models

Bigger Data to Increase Drug Discovery

Open Source Bayesian Models (X2)