25.3 Model-based integration

Model-based integration builds an intermediate model from each omic layer and then builds a final model that combines all the intermediate models. An advantage of this approach is that it can merge omic types collected from different sets of sampling units, provided the outcome of interest is the same across datasets (e.g. a specific disease). On the other hand, because the intermediate models are built independently for each omic layer, these methods may fail to capture interactions between features belonging to different omic datasets. For example, two features from different omic layers may affect the outcome only through their interaction, so that neither appears informative when evaluated on its own. Model-based integration is therefore particularly suitable when the omic datasets are extremely heterogeneous (even collected from different samples) and concatenating them or transforming them to a common intermediate form is not possible.
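The limitation regarding cross-layer interactions can be illustrated with a toy simulation. The sketch below is purely hypothetical: it assumes two binary features, one in each of two made-up omic layers, and an outcome defined as their exclusive-or (XOR), so that each feature is uninformative on its own. The function names (`make_xor_data`, `fit_lookup`, `accuracy`) and the lookup-table "model" are illustrative choices, not part of any of the methods cited in this section.

```python
import random


def make_xor_data(n, seed):
    """Simulate one binary feature per hypothetical omic layer.

    The outcome is the XOR of the two features, so it depends only on
    their interaction (an assumption made for illustration).
    """
    rng = random.Random(seed)
    layer_a = [rng.randint(0, 1) for _ in range(n)]
    layer_b = [rng.randint(0, 1) for _ in range(n)]
    outcome = [a ^ b for a, b in zip(layer_a, layer_b)]
    return layer_a, layer_b, outcome


def fit_lookup(keys, outcome):
    """Fit a trivial classifier: the majority outcome observed for each key."""
    counts = {}
    for key, y in zip(keys, outcome):
        counts.setdefault(key, [0, 0])[y] += 1
    return {key: int(c[1] > c[0]) for key, c in counts.items()}


def accuracy(model, keys, outcome):
    """Proportion of samples whose predicted outcome matches the observed one."""
    return sum(model[k] == y for k, y in zip(keys, outcome)) / len(outcome)


train_a, train_b, train_y = make_xor_data(2000, seed=1)
test_a, test_b, test_y = make_xor_data(2000, seed=2)

# Intermediate models, each built from a single layer in isolation.
model_a = fit_lookup(train_a, train_y)
model_b = fit_lookup(train_b, train_y)

# A model that sees both layers at once (as concatenation would allow).
model_joint = fit_lookup(list(zip(train_a, train_b)), train_y)

acc_a = accuracy(model_a, test_a, test_y)
acc_b = accuracy(model_b, test_b, test_y)
acc_joint = accuracy(model_joint, list(zip(test_a, test_b)), test_y)

print(f"layer A only: {acc_a:.2f}")
print(f"layer B only: {acc_b:.2f}")
print(f"both layers:  {acc_joint:.2f}")
```

In this simulation the single-layer models should perform close to chance, whereas the model that sees both features jointly recovers the outcome. Any final model built only from the two single-layer predictions would inherit their lack of signal, which is the situation described above.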

Model-based unsupervised integration methods include Formal Concept Analysis (FCA) consensus clustering [83], Bayesian consensus clustering (BCC) [84] and Perturbation Clustering for Data Integration and Disease Subtyping (PINS+) [85]. Network-based methods such as Lemon-Tree [86] and Similarity Network Fusion (SNF) [87] are also available for association analysis. Model-based supervised integration can use a variety of frameworks for model development, including majority-based voting [88], hierarchical classifiers [89], ensemble-based approaches such as XGBoost [90] and DL methods [91]. Multi-omic data integration tools such as ATHENA (Analysis Tool for Heritable and Environmental Network Associations) [92] and MOSAE (Multi-omics Supervised Autoencoder) [93] use model-based integration for disease prediction by combining a variety of modelling frameworks and algorithms.
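As a rough illustration of the supervised case, the following sketch implements majority-based voting over per-layer intermediate models. It is a minimal example under stated assumptions, not the procedure of any cited tool: the data are simulated, the layer names and their parameters in `layer_specs` are invented, and a simple nearest-centroid classifier stands in for whatever intermediate model would be used in practice. Each intermediate model is trained on its own cohort, mirroring the case in which layers were collected from different sampling units, and new individuals are assumed to have all layers measured.

```python
import random
from collections import Counter

# Hypothetical layers: name -> (number of features, class separation).
layer_specs = {
    "metagenomics": (5, 1.0),
    "transcriptomics": (8, 0.8),
    "metabolomics": (4, 1.2),
}


def simulate_features(label, n_features, shift, rng):
    """Draw Gaussian features whose mean is shifted for class 1."""
    return [rng.gauss(label * shift, 1.0) for _ in range(n_features)]


def fit_centroids(X, y):
    """Intermediate model: one mean vector (centroid) per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    """Predict the class whose centroid is closest in squared distance."""
    def sq_dist(c):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def predict_integrated(models, sample):
    """Final model: majority vote over the per-layer predictions."""
    votes = [predict_centroid(models[name], sample[name]) for name in models]
    return Counter(votes).most_common(1)[0][0]


rng = random.Random(42)

# Train one intermediate model per layer, each on a separate cohort.
models = {}
for name, (n_features, shift) in layer_specs.items():
    y = [rng.randint(0, 1) for _ in range(200)]
    X = [simulate_features(label, n_features, shift, rng) for label in y]
    models[name] = fit_centroids(X, y)

# Evaluate on new individuals for whom every layer is available.
test_labels = [rng.randint(0, 1) for _ in range(300)]
test_samples = [
    {name: simulate_features(label, nf, shift, rng)
     for name, (nf, shift) in layer_specs.items()}
    for label in test_labels
]
predictions = [predict_integrated(models, s) for s in test_samples]
integrated_accuracy = sum(
    p == t for p, t in zip(predictions, test_labels)
) / len(test_labels)
print(f"integrated accuracy: {integrated_accuracy:.2f}")
```

An odd number of layers is used here so that a binary vote cannot tie. With an even number of layers, or with intermediate models of unequal reliability, a weighted vote or a second-stage model trained on the intermediate predictions would be a reasonable alternative.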

Contents of this section were created by Iñaki Odriozola and Antton Alberdi.

References

83. Hristoskova A, Boeva V, Tsiporkova E. A formal concept analysis approach to consensus clustering of multi-experiment expression data. BMC Bioinformatics. 2014;15:151.

84. Lock EF, Dunson DB. Bayesian consensus clustering. Bioinformatics. 2013;29:2610–6.

85. Nguyen H, Shrestha S, Draghici S, Nguyen T. PINSPlus: A tool for tumor subtype discovery in integrated genomic data. Bioinformatics. 2019;35:2843–6.

86. Bonnet E, Calzone L, Michoel T. Integrative multi-omics module network inference with Lemon-Tree. PLoS Comput Biol. 2015;11:e1003983.

87. Wang B, Mezlini AM, Demir F, Fiume M, Tu Z, Brudno M, et al. Similarity network fusion for aggregating data types on a genomic scale. Nat Methods. 2014;11:333–7.

88. Drăghici S, Potter RB. Predicting HIV drug resistance with neural networks. Bioinformatics. 2003;19:98–107.

89. Bavafaye Haghighi E, Knudsen M, Elmedal Laursen B, Besenbacher S. Hierarchical classification of cancers of unknown primary using Multi-Omics data. Cancer Inform. 2019;18:1176935119872163.

90. Ma A, McDermaid A, Xu J, Chang Y, Ma Q. Integrative methods and practical challenges for Single-Cell multi-omics. Trends Biotechnol. 2020;38:1007–22.

91. Poirion OB, Chaudhary K, Huang S, Garmire LX. Multi-omics-based pan-cancer prognosis prediction using an ensemble of deep-learning and machine-learning models. medRxiv. 2020.

92. Holzinger ER, Dudek SM, Frase AT, Pendergrass SA, Ritchie MD. ATHENA: The analysis tool for heritable and environmental network associations. Bioinformatics. 2014;30:698–705.

93. Tan K, Huang W, Hu J, Dong S. A multi-omics supervised autoencoder for pan-cancer clinical outcome endpoints prediction. BMC Med Inform Decis Mak. 2020;20 Suppl 3:129.