Clustering procedures group features or observations into homogeneous sets by minimising within-group distances and maximising among-group distances.
Hierarchical clustering produces a nested, tree-like organisation of features or observations in which relatively similar objects are grouped together. The clustering can be performed using different linkage criteria to measure the distance between clusters (e.g., single linkage, complete linkage, average linkage and Ward’s minimum variance), and the choice of criterion affects the final outcome of the analysis.
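As a minimal sketch of how the linkage criterion is selected in R, the `method` argument of the base `hclust` function accepts the criteria listed above. Here `mydata` is a hypothetical numeric matrix with observations in rows; cutting the trees at the same number of groups illustrates that different criteria can yield different partitions.

```r
# Hierarchical clustering with different linkage criteria ('mydata' is hypothetical)
d <- dist(mydata)                              # Euclidean distance matrix
hc_single <- hclust(d, method = "single")      # single linkage
hc_ward   <- hclust(d, method = "ward.D2")     # Ward's minimum variance

# Cut each tree into 3 groups; the memberships may differ between criteria
groups_single <- cutree(hc_single, k = 3)
groups_ward   <- cutree(hc_ward, k = 3)
plot(hc_ward)                                  # dendrogram of the Ward solution
```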
A useful exploratory analysis to reveal general patterns in an omic layer can be obtained by simultaneously applying hierarchical clustering to the rows and columns of the data matrix and visualising the results as a heatmap.
# Load the dataset
data <- read.csv("mydata.csv", row.names = 1)

# Perform hierarchical clustering of rows and columns
row_clusters <- hclust(dist(data))
col_clusters <- hclust(dist(t(data)))

# Plot heatmap with row and column dendrograms
# (heatmap.2 expects dendrogram objects, not raw hclust objects)
library(gplots)
heatmap.2(as.matrix(data),
          Rowv = as.dendrogram(row_clusters),
          Colv = as.dendrogram(col_clusters),
          scale = "row", dendrogram = "both",
          key = TRUE, keysize = 1.5,
          col = redgreen(75))
Disjoint clustering techniques aim to separate the objects into individual, usually mutually exclusive and in most cases unconnected, clusters. K-means clustering is one of the most widely used algorithms: objects are assigned to k clusters using an iterative procedure that minimises the within-cluster sums of squares. Other available clustering methods include TWINSPAN, self-organising maps, DBSCAN and Dirichlet multinomial mixtures (DMM). DMMs were specifically developed to analyse MG data but can be equally useful for other sequencing-based omic datasets.
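The k-means procedure described above can be sketched with the base R `kmeans` function. Again `mydata` is a hypothetical numeric matrix; `nstart` repeats the algorithm from several random initialisations and keeps the solution with the smallest within-cluster sum of squares, which mitigates sensitivity to starting centroids.

```r
# K-means clustering sketch ('mydata' is a hypothetical numeric matrix)
set.seed(1)                          # reproducible random initialisation
km <- kmeans(mydata, centers = 3, nstart = 25)

km$cluster                           # cluster assignment for each observation
km$tot.withinss                      # total within-cluster sum of squares (minimised)
```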