What is it about?
Metabonomics has been applied to predictive modelling in diverse fields of research, ranging from toxicology and nutrition to parasitology and molecular epidemiology [1], and including disease diagnosis and therapy monitoring. Although metabonomics has developed considerably over the last decade, particularly in modelling methods, further contributions are still needed in these areas, including enhanced mathematical analysis. Currently, partial least squares regression (PLSR) and its variants are the preferred modelling approaches in metabonomics because they are flexible and accurate enough to cope with the complexity of these data, and because they handle multicollinearity well. However, PLSR typically requires a large training sample size and a large number of indicators for each latent variable, which may be a disadvantage for scarce metabonomic datasets such as those for rare diseases. It would also be of interest to reduce PLSR's training complexity, and hence the processing time, when dealing with metabonomics data such as gas chromatography/mass spectrometry (GC/MS) total ion chromatograms (TICs), which tend to be very large.
A common way to reduce the computational complexity of classification in general is to apply dimensionality reduction before classification. Dimensionality reduction techniques can be broadly divided into variable selection and transformation. Variable selection approaches can identify the significant variables but may not perform well when the data are highly correlated. Transformation-based approaches tend to combine variables without selecting a subset of significant variables. Because there are many different dimensionality reduction approaches, finding the optimum one to pair with PLSR and its variants for each metabonomics dataset is itself a complex task. Hence it would be useful to develop a simpler modelling approach that addresses these problems.
A study has shown variable ranking via correlation-based feature selection [8], which uses the magnitude of Pearson's correlation coefficient between the class values and the variable values for each feature, to be promising. In this study, we extended correlation-based feature selection to create a new automated Pearson's correlation change classification (APC3) technique, which has high computational efficiency. The aim of this study is to evaluate the performance of APC3 by comparing it with other classification algorithms, with classification algorithms combined with transformation techniques, and with classification algorithms combined with variable selection approaches, using TICs of binominal (two-class) GC/MS data.
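The correlation-based ranking that APC3 builds on can be illustrated with a minimal sketch. It is not the APC3 algorithm itself, whose details are not given on this page. It only ranks variables by the magnitude of Pearson's correlation coefficient between each variable and the class values. The function names `pearson_r` and `rank_variables`, the toy intensities and the handling of constant variables are all assumptions made for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant variable carries no class information.
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_variables(samples, labels):
    """Return (variable index, |r|) pairs, strongest correlation first."""
    n_vars = len(samples[0])
    scores = []
    for j in range(n_vars):
        column = [row[j] for row in samples]
        scores.append((j, abs(pearson_r(column, labels))))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 6 samples x 3 intensity points, two classes (0 and 1).
tic = [
    [1.0, 5.0, 3.0],
    [2.0, 4.0, 3.5],
    [1.5, 9.0, 2.5],
    [8.0, 5.5, 3.0],
    [9.0, 6.0, 3.2],
    [8.5, 4.5, 2.8],
]
labels = [0, 0, 0, 1, 1, 1]

ranking = rank_variables(tic, labels)
for index, score in ranking:
    print(f"variable {index}: |r| = {score:.3f}")
```

In this made-up example the first variable separates the two classes almost perfectly and is ranked first, while the third has the same mean in both classes and is ranked last. Each variable needs only a single pass over the samples, so this kind of ranking stays cheap even for very long chromatograms.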
Why is it important?
A fully automated and computationally efficient Pearson's correlation change classification (APC3) approach is proposed. Using only the total ion chromatogram (TIC) intensities of GC/MS data, it is shown to have overall comparable performance, with both an average accuracy and an average AUC of 0.89 ± 0.08, while being 3.9 to 7 times faster, easier to use and low in outlier susceptibility, in contrast to other dimensionality reduction and classification combinations. Because only the TIC is used, APC3 could possibly be applied to other metabonomic data such as LC/MS TICs or NMR spectra. A RapidMiner implementation is available for download at http://padel.nus.edu.sg/software/padelapc3.
Read the Original
This page is a summary of: An automated Pearson's correlation change classification (APC3) approach for GC/MS metabonomic data using total ion chromatograms (TICs), The Analyst, January 2013, Royal Society of Chemistry,
DOI: 10.1039/c3an00048f.