A Practical Guide to Holo-Omics

26 Useful links

Data access

Host reference genomes

  • NCBI Genome (website)
  • Ensembl (website)
  • Vertebrate Genomes Project (website)

Metagenomic data

  • HoloFood Data Portal (website)
  • MGnify (website)
  • Earth Hologenome Initiative (website)

Documentation

Genomics

  • Data Wrangling and Processing for Genomics (website)
  • Vertebrate Genomes Project assembly pipeline tutorial (website)

Shell command line usage

  • Introduction to the Command Line for Genomics (website): general overview of basic command line usage.

R usage (General usage and programming)

  • Intro to R and RStudio for Genomics (website)
  • Efficient R programming (website): best practices for programming in R.

R usage (Graphics and visualisation)

  • Fundamentals of Data Visualization (website): guide to making visualisations that accurately reflect the data, tell a story, and look professional.
  • R Graphics Cookbook (website): a practical guide that provides more than 150 recipes to generate high-quality graphs using ggplot2.

Statistics

  • An Introduction to Statistical Learning (book): freely available book on statistical learning, covering regression and classification problems through linear modelling and machine learning.
  • High dimensional statistics with R (website): online lesson focused on analysing high-dimensional data.