25.3 Model-based integration
Model-based integration builds an intermediate model from each omic layer and then builds a final model that combines the intermediate models. An advantage of this approach is that it can merge omic types collected from different sets of sampling units, provided that the outcome of interest (e.g. a specific disease) is the same across datasets. On the other hand, because the intermediate models are built independently for each omic layer, these methods may fail to capture interactions between features belonging to different omic datasets. For example, if two features from different omic layers affect the outcome only through their interaction, neither will appear informative when its layer is evaluated on its own. Model-based integration is therefore particularly suitable when the omic datasets are extremely heterogeneous (even collected from different samples) and concatenating them or transforming them to a common intermediate form is not possible.
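The interaction limitation can be made concrete with a toy example. The sketch below is purely illustrative: the data are invented, each "layer" holds a single binary feature, the outcome is defined as the exclusive-or of the two features, and the helper `fit_frequency_model` is a hypothetical stand-in for any intermediate model. Under these assumptions, each per-layer model sees no signal, so a final model that averages their outputs stays at chance, whereas a model fitted on both features together recovers the outcome.

```python
from collections import defaultdict
from itertools import product


def fit_frequency_model(features, outcomes):
    """Estimate P(outcome = 1 | feature value) from simple counts."""
    counts = defaultdict(lambda: [0, 0])  # value -> [n_total, n_positive]
    for x, y in zip(features, outcomes):
        counts[x][0] += 1
        counts[x][1] += y
    return {x: pos / total for x, (total, pos) in counts.items()}


# Invented toy data: one binary feature per omic layer, and an outcome that
# depends only on the interaction (XOR) of the two features.
samples = list(product([0, 1], repeat=2)) * 25
layer_a = [a for a, _ in samples]
layer_b = [b for _, b in samples]
outcome = [a ^ b for a, b in samples]

# Intermediate models, one per layer, fitted independently.
model_a = fit_frequency_model(layer_a, outcome)
model_b = fit_frequency_model(layer_b, outcome)

# Final model: average the intermediate probabilities for each sample.
late_probs = [(model_a[a] + model_b[b]) / 2 for a, b in samples]

# For comparison: a single model fitted on both features jointly.
joint_model = fit_frequency_model(samples, outcome)
joint_probs = [joint_model[s] for s in samples]

print("per-layer model A:", model_a)
print("per-layer model B:", model_b)
print("distinct late-integration probabilities:", sorted(set(late_probs)))
print("joint model:", joint_model)
```

Real omic features are rarely this extreme, but the same effect applies in weaker form whenever part of the signal lies in cross-layer interactions.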
Model-based unsupervised integration methods include Formal Concept Analysis (FCA) consensus clustering [83], Bayesian consensus clustering (BCC) [84] and Perturbation Clustering for Data Integration and Disease Subtyping (PINS+) [85]. Network-based methods such as Lemon Tree [86] or Similarity Network Fusion (SNF) [87] are also available for association analysis. Model-based supervised integration can use a variety of frameworks for model development, including majority-based voting [88], hierarchical classifiers [89], ensemble-based approaches such as XGBoost [90], or DL methods [91]. Multi-omic data integration tools such as ATHENA (Analysis Tool for Heritable and Environmental Network Associations) [92] and MOSAE (Multi-omics Supervised Autoencoder) [93] use model-based integration for disease prediction by combining a variety of modelling frameworks and algorithms.
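As a minimal sketch of the simplest supervised scheme mentioned above, majority-based voting, the following example trains one intermediate classifier per omic layer and lets the final model take the most common predicted label. Everything here is an assumption made for illustration: the layer names, the numbers, and the use of a nearest-centroid classifier as the intermediate model are invented and are not the method of any cited tool. Note that each layer is trained on a different number of samples, which is the situation model-based integration is meant to handle.

```python
from collections import Counter


def fit_centroids(rows, labels):
    """Intermediate model: the mean feature vector of each class."""
    sums, counts = {}, Counter(labels)
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_centroid(model, row):
    """Predict the class whose centroid is closest (squared Euclidean distance)."""
    def dist(label):
        return sum((x - c) ** 2 for x, c in zip(row, model[label]))
    return min(sorted(model), key=dist)


def majority_vote(votes):
    """Final model: the most common label, with ties resolved alphabetically."""
    tally = Counter(votes)
    top = max(tally.values())
    return min(label for label, n in tally.items() if n == top)


# Hypothetical training sets: each layer comes from a different cohort,
# so sample counts and feature counts differ between layers.
training = {
    "transcriptome": ([[1.0, 0.2], [0.9, 0.1], [0.1, 0.8], [0.2, 1.0]],
                      ["case", "case", "control", "control"]),
    "methylome": ([[0.7], [0.8], [0.6], [0.2], [0.3]],
                  ["case", "case", "case", "control", "control"]),
    "metabolome": ([[5.0, 1.0, 0.5], [4.5, 1.2, 0.4], [1.0, 3.0, 2.0]],
                   ["case", "case", "control"]),
}
models = {layer: fit_centroids(X, y) for layer, (X, y) in training.items()}

# A new individual profiled on all three layers (invented values).
new_patient = {
    "transcriptome": [0.8, 0.3],
    "methylome": [0.25],
    "metabolome": [4.0, 1.5, 0.6],
}
votes = [predict_centroid(models[layer], new_patient[layer]) for layer in models]
prediction = majority_vote(votes)

print("votes per layer:", dict(zip(models, votes)))
print("majority vote:", prediction)
```

In this toy case the methylome model disagrees with the other two and is outvoted. Weighted voting or a trained meta-classifier over the intermediate predictions would be natural refinements, at the cost of needing samples profiled on all layers to fit the final model.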