TY - JOUR
T1 - Visualization of statistically processed LC-MS-based metabolomics data for identifying significant features in a multiple-group comparison
AU - Pan, Yu Yi
AU - Chen, Yuan Chih
AU - Chang, William Chih Wei
AU - Ma, Mi Chia
AU - Liao, Pao Chi
N1 - Funding Information:
This work was supported by the Ministry of Science and Technology, Taiwan [grant numbers MOST108-2113-M-006-008 and MOST109-2113-M-006-015]. The authors acknowledge the mass spectrometry analysis supported by the Metabolomics Core Facility, Scientific Instrument Center at Academia Sinica, NTU Consortia of Key Technologies and NTU Instrumentation Center, and Instrument Center of National Cheng Kung University.
In choosing significant features, machine learning algorithms are used to determine the cutting point of the x-axis; this is an objective approach, and these algorithms can also incorporate domain knowledge. We discuss two types of cutting-point decisions: unsupervised methods and supervised methods. When the metabolites of interest have not been reported previously and remain unidentified, an unsupervised method can be used to decide the cutting point. When significant metabolites have already been reported in reviewed literature or reports, they can be regarded as identified metabolites, and a better cutting point can be determined by popular supervised machine learning algorithms such as logistic regression (LR) [17–19], a support vector machine (SVM) [20,21], an artificial neural network (ANN) [22–24], or a random forest (RF) [25,26]. This study evaluates the performance of the visualization plots on two real cases: one uses multiple groups at different concentrations, and the other uses multiple groups at different time points. The objective of this study is therefore to propose a new visualization plot and apply it to these two cases. The x-axis of the new plot relaxes the restriction on the number of groups by using multiple-group hypothesis tests conducted with parametric and nonparametric methods. The parametric method is based on ANOVA and Welch's ANOVA, the nonparametric method is based on the Kruskal-Wallis (KW) test, and the cut-off point of the x-axis is determined by a machine learning method.
Publisher Copyright:
© 2021 Elsevier B.V.
PY - 2021/3/15
Y1 - 2021/3/15
N2 - Analyzing and presenting data from multiple groups is much more informative than doing so for only two groups. However, common tools such as the S-plot and the volcano plot can only identify significant features between two groups and cannot be applied directly to multiple-group comparisons. This study proposed novel visualization plots that not only overcame the restrictions of these methods but also used the p values of multiple-group tests as the x-axis. The novel visualization plots included a parametric method and a nonparametric method. The parametric method combined analysis of variance (ANOVA) and Welch's ANOVA; the nonparametric method used the Kruskal-Wallis test. During the selection of significant features, machine learning algorithms were used to determine the cutting points of the x-axis. As a proof of concept, real data from experiments on 4-MeO-α-PVP metabolites and on fish spoilage metabolomics were visualized with the proposed method. The results showed that the novel visualization plots efficiently identified significant metabolites in multiple-group comparisons. In particular, the positive predictive values obtained with the nonparametric method and with cutting points determined by logistic regression were higher than those obtained when other machine learning algorithms determined the cutting points for multiple groups.
AB - Analyzing and presenting data from multiple groups is much more informative than doing so for only two groups. However, common tools such as the S-plot and the volcano plot can only identify significant features between two groups and cannot be applied directly to multiple-group comparisons. This study proposed novel visualization plots that not only overcame the restrictions of these methods but also used the p values of multiple-group tests as the x-axis. The novel visualization plots included a parametric method and a nonparametric method. The parametric method combined analysis of variance (ANOVA) and Welch's ANOVA; the nonparametric method used the Kruskal-Wallis test. During the selection of significant features, machine learning algorithms were used to determine the cutting points of the x-axis. As a proof of concept, real data from experiments on 4-MeO-α-PVP metabolites and on fish spoilage metabolomics were visualized with the proposed method. The results showed that the novel visualization plots efficiently identified significant metabolites in multiple-group comparisons. In particular, the positive predictive values obtained with the nonparametric method and with cutting points determined by logistic regression were higher than those obtained when other machine learning algorithms determined the cutting points for multiple groups.
UR - http://www.scopus.com/inward/record.url?scp=85100813176&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85100813176&partnerID=8YFLogxK
U2 - 10.1016/j.chemolab.2021.104271
DO - 10.1016/j.chemolab.2021.104271
M3 - Article
AN - SCOPUS:85100813176
SN - 0169-7439
VL - 210
JO - Chemometrics and Intelligent Laboratory Systems
JF - Chemometrics and Intelligent Laboratory Systems
M1 - 104271
ER -