About this guidebook
Contents
Protocols, exercises and tutorials
Data sets
About the authors
How to cite this work
Acknowledgement
I INTRODUCTION
1
Introduction to holo-omics
Why do we need holo-omics?
What is holo-omics?
1.1
Omic layers
Host genomics (HG)
Host transcriptomics (HT)
Microbial metagenomics (MG)
Microbial metatranscriptomics (MT)
Host proteomics (HP)
Microbial metaproteomics (MP)
(Meta)metabolomics (ME)
2
Study design considerations
2.1
Hologenomic complexity
2.2
Control of variables
2.2.1
Controlling host genomes
2.2.2
Controlling microbial metagenomes
2.2.3
Controlling the environment
2.3
Molecular resolution
2.3.1
Resolution of host genotypes
Resolution of microbial metagenotypes
Resolution of envirotypes
2.4
Spatiotemporal factors
Spatial factors
Temporal factors
2.5
Explanatory and response variables
Phenotype as a product of genotype, metagenotype and envirotype
Genotype expression influenced by metagenotype and envirotype
Metagenotype as a product of genotype and envirotype
II FIELD PROCEDURES
3
About fieldwork
4
Sample collection
5
Sample preservation
III LABORATORY PROCEDURES
6
About labwork
General considerations
Procedures for generating multi-omic data
7
DNA/RNA extraction
Sample preprocessing
Chemical isolation
Physicochemical isolation
Available protocols
8
Protein/metabolite extraction
9
Sequencing library preparation
Sequencing strategies and platforms
PCR-based vs. PCR-free library preparation
Indices and multiplexing
Unique molecular identifiers (UMIs)
9.1
Host genomics and microbial metagenomics
List of available protocols
9.2
Host transcriptomics
List of available protocols
9.3
Microbial metatranscriptomics
Capture-based rRNA depletion
RNase-based rRNA depletion
CRISPR/Cas9-based rRNA depletion
List of available protocols
IV BIOINFORMATIC PROCEDURES
10
About bioinformatics
10.1
Prepare your shell environment
Required software
Install conda / miniconda
Install mamba (optional)
Create a conda environment
Activate the holo-omics conda environment
Install software in conda environment
10.2
Using snakemake for workflow management
11
Sequencing data preprocessing
Preprocess the reads using fastp
Splitting host and non-host data
12
Host genomics (HG) data processing
12.1
Host reference genome
12.1.1
Genome quality
12.1.2
Genome profile analysis
12.1.3
Genome assembly using hifiasm
12.1.4
Assembly evaluation
12.1.5
Assembly scaffolding
12.1.6
Final genome evaluation
12.1.7
Reference genome annotation
12.2
Host genome resequencing
13
Microbial metagenomics (MG) data processing
13.1
Reference-based
13.2
Assembly-based
Individual assembly-based
Coassembly-based
Gene annotation
Read mapping
13.3
Genome-resolved
Binning
Bin refinement
Bin quality assessment
Bin curation
Dereplication
Taxonomic annotation
Functional annotation
Read mapping
14
Host transcriptomics (HT) data processing
14.1
Reference-based host transcriptomics (HT) data processing
Quality-filtering
Ribosomal RNA removal
Reference genome indexing
Read mapping against reference genome
14.2
Reference-free host transcriptomics (HT) data processing
15
Microbial metatranscriptomics (MT) data processing
15.1
Reference-based microbial metatranscriptomics (MT) data processing
Quality filtering
Ribosomal RNA removal
Host genome indexing
Host genome mapping
Generating and indexing the microbial genome catalogue
Mapping against the microbial genome catalogue
Calculate gene counts
15.2
Reference-free microbial metatranscriptomics (MT) data processing
16
Host proteomics (HP) data processing
17
Microbial metaproteomics (MP) data processing
V STATISTICAL PROCEDURES
18
About statistics
A step-by-step approach
18.1
Prepare your R environment
Required packages
Package installation
18.2
Create / clone a GitHub repository
Install git on your local computer
Create a GitHub repository
Create a version-control project
Set up RStudio-GitHub connection
19
Single omic analyses
20
Data transformations
20.1
Transformations to account for statistical assumptions
Transforming data to meet normality assumption
20.1.1
Transformations to account for compositional data
20.1.2
Transformations to account for scaling
21
Unsupervised exploration
21.1
Cluster analysis
21.1.1
Hierarchical clustering
21.1.2
Disjoint clustering
21.2
Dimension reduction and ordination
21.2.1
Principal Component Analysis (PCA)
21.2.2
Principal Coordinate Analysis (PCoA)
21.2.3
Non-metric Multidimensional Scaling (NMDS)
21.2.4
t-Distributed Stochastic Neighbour Embedding (t-SNE)
21.2.5
Uniform manifold approximation and projection (UMAP)
22
Supervised analysis
22.1
Regression methods
22.1.1
PERMANOVA
22.1.2
ANOSIM
22.1.3
Redundancy analysis (RDA)
22.1.4
Canonical Correspondence Analysis (CCA)
22.1.5
Generalised linear modelling (GLM)
22.1.6
Generalised linear mixed modelling (GLMM)
22.2
Classification methods
22.2.1
Random Forests (RF)
22.2.2
Support Vector Machines (SVM)
23
Multi-omic integration
24
Multi-staged omics integration
25
Meta-dimensional omics integration
25.1
Concatenation-based integration
25.2
Transformation-based integration
25.3
Model-based integration
VI RESOURCES
26
Useful links
Data access
Documentation
27
References
A Practical Guide to Holo-Omics
14.2
Reference-free host transcriptomics (HT) data processing
Contents will be added shortly.