25.2 Transformation-based integration

In transformation-based integration, each omic dataset is first transformed into an intermediate representation, typically a graph or a kernel matrix, and these representations are then merged before building the final model. This approach preserves the specific properties of each omic layer, provided that each layer is transformed into an appropriate intermediate representation, and a wide range of omic data can be combined as long as the datasets share a common identifier (i.e. a sample ID).
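The kernel route can be illustrated with a minimal sketch. It is not taken from any of the methods cited in this section: the toy data, the function names (`align_by_sample_id`, `rbf_kernel`, `combine_kernels`), the RBF kernel with its default `gamma`, the uniform kernel weights and the kernel ridge step used as the "final model" are all assumptions chosen for brevity.

```python
import numpy as np

def align_by_sample_id(layers):
    """Keep only samples present in every omic layer, in a common order.

    `layers` maps a layer name to (sample_ids, feature_matrix).
    """
    shared = sorted(set.intersection(*(set(ids) for ids, _ in layers.values())))
    aligned = {}
    for name, (ids, X) in layers.items():
        row = {s: i for i, s in enumerate(ids)}
        aligned[name] = np.asarray(X, dtype=float)[[row[s] for s in shared]]
    return shared, aligned

def rbf_kernel(X, gamma=None):
    """Gaussian (RBF) kernel between all pairs of rows of X."""
    if gamma is None:
        gamma = 1.0 / X.shape[1]  # assumed default: 1 / number of features
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def combine_kernels(kernels, weights=None):
    """Weighted sum of per-layer kernels (uniform weights by default)."""
    kernels = list(kernels)
    if weights is None:
        weights = np.full(len(kernels), 1.0 / len(kernels))
    return sum(w * K for w, K in zip(weights, kernels))

# Toy data: two omic layers with partly overlapping, differently ordered samples.
rng = np.random.default_rng(0)
layers = {
    "transcriptome": (["s1", "s2", "s3", "s4", "s5"], rng.normal(size=(5, 20))),
    "metabolome": (["s5", "s3", "s1", "s2", "s6"], rng.normal(size=(5, 8))),
}

# 1) Match samples by ID, 2) transform each layer into a kernel, 3) merge.
shared, aligned = align_by_sample_id(layers)
K = combine_kernels(rbf_kernel(X) for X in aligned.values())

# 4) Final model on the merged kernel: kernel ridge regression on +1/-1 labels.
y = np.array([1.0, -1.0, 1.0, -1.0])   # hypothetical labels for the shared samples
lam = 0.1                              # ridge penalty, an arbitrary choice
alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
fitted = K @ alpha
```

Because each layer is reduced to a samples-by-samples matrix, layers with very different numbers and types of features can be summed directly. Multiple kernel learning methods generally learn the kernel weights rather than fixing them as done here.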

Several methods are available for transformation-based unsupervised analysis. Regularised Multiple Kernel Learning for Locality Preserving Projections (rMKL-LPP) [71] and PAMOGK [72] are examples of kernel- and graph-based methods that can be used for clustering. Meta-analytic SVM (Meta-SVM) [73] and NEighborhood based Multi-Omics clustering (NEMO) [74] are further options.

Most methods for transformation-based supervised analysis are kernel- or graph-based algorithms [70]. Kernel-based integration approaches include Semi-Definite Programming SVM (SDP-SVM) [75], Multiple Kernel Learning with Feature Selection (FSMKL) [76], Relevance Vector Machine (RVM) [77] and AdaBoost RVM [78]. Graph-based integration approaches include graph-based semi-supervised learning [79] (classified here under supervised analyses, following Reel et al. 2021 [58]), graph sharpening [80] and composite network [81]. Graph-based analyses have the advantage of easier interpretability and lower computational requirements, whereas kernel-based methods provide higher predictive performance overall [70]. However, see Multi-Omics Graph Convolutional Networks (MOGONET) [82] for a high-performing graph-based classification method.
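The graph route can be sketched in a similar way. The example below is only an illustration, assuming samples are already matched across layers. It builds one k-nearest-neighbour graph per omic layer, averages the adjacency matrices into a single network, and spreads a few known labels over that network with a generic closed-form label propagation. It does not reproduce the specific algorithms of [79], [80] or [81]. The function names (`knn_graph`, `propagate_labels`), the toy values, `k`, `alpha` and the uniform averaging of graphs are assumptions.

```python
import numpy as np

def knn_graph(X, k=2):
    """Symmetric k-nearest-neighbour adjacency matrix from Euclidean distances."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)             # a sample is not its own neighbour
    W = np.zeros_like(d)
    nearest = np.argsort(d, axis=1)[:, :k]  # indices of the k closest samples
    rows = np.repeat(np.arange(len(X)), k)
    W[rows, nearest.ravel()] = 1.0
    return np.maximum(W, W.T)               # keep an edge if either end chose it

def propagate_labels(W, y, alpha=0.9):
    """Closed-form label propagation on a weighted graph.

    `y` holds +1 / -1 for labelled samples and 0 for unlabelled ones.
    Solves (I - alpha * S) f = y with S = D^{-1/2} W D^{-1/2}.
    """
    deg = W.sum(axis=1)
    S = W / np.sqrt(np.outer(deg, deg))
    return np.linalg.solve(np.eye(len(y)) - alpha * S, np.asarray(y, dtype=float))

# Toy data: six samples (rows already matched by sample ID) in two omic layers.
layer_a = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
layer_b = np.array([[1.0], [1.2], [0.9], [-1.0], [-1.1], [-0.8]])

# One graph per layer, merged by uniform averaging (an assumed weighting).
W = (knn_graph(layer_a) + knn_graph(layer_b)) / 2.0

# Only samples 0 and 3 are labelled; the remaining labels are inferred.
y = np.array([1, 0, 0, -1, 0, 0])
scores = propagate_labels(W, y)
predicted = np.sign(scores)
```

In this toy case both layers agree on two groups of three samples, so the first three samples receive positive scores and the last three negative ones. With real data the layers may disagree, which is why methods such as composite network estimate layer-specific weights instead of averaging uniformly.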

Contents of this section were created by Iñaki Odriozola and Antton Alberdi.

References

58. Reel PS, Reel S, Pearson E, Trucco E, Jefferson E. Using machine learning approaches for multi-omics data analysis: A review. Biotechnol Adv. 2021;49:107739.

70. Yan KK, Zhao H, Pang H. A comparison of graph- and kernel-based –omics data integration algorithms for classifying complex traits. BMC Bioinformatics. 2017;18.

71. Speicher NK, Pfeifer N. Integrating different data types by regularized unsupervised multiple kernel learning with application to cancer subtype discovery. Bioinformatics. 2015;31:i268–75.

72. Tepeli YI, Ünal AB, Akdemir FM, Tastan O. PAMOGK: A pathway graph kernel-based multiomics approach for patient clustering. Bioinformatics. 2021;36:5237–46.

73. Kim S, Jhong J-H, Lee J, Koo J-Y. Erratum to: Meta-analytic support vector machine for integrating multiple omics data. BioData Min. 2017;10:8.

74. Rappoport N, Shamir R. NEMO: Cancer subtyping by integration of partial multi-omic data. Bioinformatics. 2019;35:3348–56.

75. Lanckriet GRG, De Bie T, Cristianini N, Jordan MI, Noble WS. A statistical framework for genomic data fusion. Bioinformatics. 2004;20:2626–35.

76. Seoane JA, Day INM, Gaunt TR, Campbell C. A pathway-based data integration framework for prediction of disease progression. Bioinformatics. 2014;30:838–45.

77. Tipping ME. Sparse Bayesian learning and the relevance vector machine. 2001.

78. Wu C-C, Asgharzadeh S, Triche TJ, D’Argenio DZ. Prediction of human functional genetic networks from heterogeneous data using RVM-based ensemble learning. Bioinformatics. 2010;26:807–13.

79. Kim D, Joung J-G, Sohn K-A, Shin H, Park YR, Ritchie MD, et al. Knowledge boosting: A graph-based integration approach with multi-omics data and genomic knowledge for cancer clinical outcome prediction. J Am Med Inform Assoc. 2015;22:109–20.

80. Shin H, Hill NJ, Lisewski AM, Park J-S. Graph sharpening. Expert Syst Appl. 2010;37:7870–9.

81. Mostafavi S, Morris Q. Fast integration of heterogeneous data sources for predicting gene function with limited annotation. Bioinformatics. 2010;26:1759–65.

82. Wang T, Shao W, Huang Z, Tang H, Zhang J, Ding Z, et al. MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat Commun. 2021;12:3445.