✨🧬 Useful Clustering Algorithms for Bioinformaticians! 🧬✨


🧠 In the realm of bioinformatics, data comes in myriad forms. Clustering algorithms sift through mountains of data points and group them by similarity, which helps shed light on biological relationships, structures, and functions.

Here are some clustering algorithms you should know about (and use cases too! 😎):

1️⃣ CD-HIT (Cluster Database at High Identity with Tolerance):
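  📚 How it works: CD-HIT sorts sequences by length and greedily assigns each one to the first cluster whose representative it matches at or above an adjustable sequence identity threshold. A sequence that matches no representative starts a new cluster.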
  💡 Use Case: Clustering protein or nucleotide sequences to reduce redundancy and accelerate sequence searches in databases like UniProt or GenBank.
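
To make the greedy idea concrete, here is a minimal Python sketch. It is not CD-HIT's actual code: the real tool uses short-word filtering and banded alignment for speed, while this toy version scores identity with a plain longest-common-subsequence count. The names `lcs_length`, `identity`, `greedy_cluster` and the three toy sequences are all made up for illustration.

```python
def lcs_length(a: str, b: str) -> int:
    """Longest common subsequence length (a crude stand-in for an alignment)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Assumed identity measure: matched residues / length of the shorter sequence."""
    return lcs_length(a, b) / min(len(a), len(b))


def greedy_cluster(seqs: dict, threshold: float = 0.9) -> dict:
    """Longest-first greedy clustering: representative name -> member names."""
    clusters = {}
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep in clusters:
            if identity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)  # joins the first matching cluster
                break
        else:
            clusters[name] = [name]  # no match: becomes a new representative
    return clusters


# Toy protein fragments (invented for this example).
toy = {
    "seqA": "MKTAYIAKQRQISFVKSHFSRQ",
    "seqB": "MKTAYLAKQRQISFVKSHFSRQ",  # one substitution relative to seqA
    "seqC": "GSHMLEDPVDAFKQLL",
}
print(greedy_cluster(toy, threshold=0.9))
```

Raising the threshold toward 1.0 keeps more representatives. Lowering it merges more aggressively, which is the same trade-off you tune in CD-HIT itself.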

2️⃣ K-Means Clustering:
  📚 How it works: K-Means partitions data into 'k' clusters by iteratively assigning each data point to the nearest cluster centroid and updating each centroid to the mean of the data points in its cluster.
  💡 Use Case: Segmenting gene expression data to identify distinct groups of genes with similar expression patterns.
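
The assign-then-update loop fits in a few lines of plain Python. This is a rough sketch, not a production implementation: the farthest-first seeding in `farthest_first` is a deterministic shortcut chosen for this example (real libraries usually use random or k-means++ seeding), and the expression values are invented.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Assumed seeding: take the first point, then repeatedly add the point
    that is farthest from every seed chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy expression profiles: 6 genes x 3 conditions (made-up numbers).
profiles = [
    [1.0, 1.2, 0.9], [0.8, 1.1, 1.0], [1.1, 0.9, 1.2],  # low expressers
    [5.0, 5.2, 4.8], [4.9, 5.1, 5.3], [5.2, 4.7, 5.0],  # high expressers
]
labels, centroids = kmeans(profiles, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Keep in mind that 'k' is something you choose up front, and that results on real data can depend on the seeding.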
  
3️⃣ Hierarchical Clustering:
  📚 How it works: Hierarchical clustering builds a tree-like hierarchy of clusters by successively merging (agglomerative) or splitting (divisive) clusters based on their similarity.
  💡 Use Case: Unraveling phylogenetic relationships by clustering DNA sequences according to their pairwise sequence similarity.
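
Here is a small sketch of the merging (agglomerative) flavour using average linkage, the same merge rule UPGMA trees are built on. It is only an illustration under some assumptions: the sequences are already aligned and equal in length, distance is the simple fraction of mismatched positions, branch lengths are ignored, and the four sequences are invented. The function names are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_linkage_tree(seqs: dict):
    """Merge the two closest clusters until one is left.
    Returns the topology as nested tuples of sequence names."""
    names = list(seqs)
    dist = {(a, b): p_distance(seqs[a], seqs[b]) for a in names for b in names}
    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in names]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pairs = [(a, b) for a in clusters[i][1] for b in clusters[j][1]]
            d = sum(dist[p] for p in pairs) / len(pairs)  # average linkage
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Toy aligned DNA sequences (made up for this example).
aligned = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAT",
    "s3": "TTGAACCTGA",
    "s4": "TTGAACCTGG",
}
print(average_linkage_tree(aligned))  # (('s1', 's2'), ('s3', 's4'))
```

The nested tuples read like a bare-bones Newick tree: s1 and s2 pair up first, s3 and s4 pair up next, and the two pairs join at the root.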
  
4️⃣ DBSCAN (Density-Based Spatial Clustering of Applications with Noise):
  📚 How it works: DBSCAN groups together closely packed points based on density, identifying core points, border points, and noise points in the data.
  💡 Use Case: Detecting anomalous protein structures in molecular dynamics simulations.
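
A compact sketch of the core/border/noise logic is below. Treat it as a teaching version: it does a brute-force neighbour search, and the 2D points stand in for hypothetical per-frame features from a simulation (the numbers are invented). The parameter names `eps` and `min_pts` follow common DBSCAN terminology.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        # Brute-force region query (includes the point itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue  # already visited
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(nb)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:
                queue.extend(nb_j)  # j is also a core point: keep expanding
    return labels


# Toy per-frame features (invented values): two dense states plus one outlier.
frames = [
    (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1),
    (3.0, 3.0), (3.1, 3.0), (3.0, 3.1), (3.1, 3.1),
    (6.0, 0.5),
]
print(dbscan(frames, eps=0.3, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The last frame sits far from both dense groups, so it comes back labelled -1. That noise label is what makes DBSCAN handy for flagging anomalies without choosing the number of clusters in advance.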
  
5️⃣ Spectral Clustering:
  📚 How it works: Spectral clustering uses the leading eigenvectors of a graph Laplacian built from a similarity matrix to embed the data in fewer dimensions, then applies a standard clustering technique such as K-Means in that reduced space.
  💡 Use Case: Identifying functional modules in protein-protein interaction networks.
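
The simplest special case is splitting a network in two. The sketch below finds the Fiedler vector (the eigenvector of the second-smallest Laplacian eigenvalue) by shifted power iteration and splits nodes by its sign. This is a deliberately stripped-down stand-in: fuller spectral clustering takes several eigenvectors and runs something like K-Means on them, usually with a linear algebra library. The function name and the six-node toy network are assumptions made for this example.

```python
import math


def fiedler_partition(adj, n_iter=500):
    """Two-way spectral split of an undirected graph given as an adjacency matrix.
    Returns a 0/1 label per node based on the sign of the Fiedler vector."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    # Eigenvalues of L = D - A are at most 2 * max degree, so (shift*I - L)
    # is positive semidefinite and its top eigenvectors are L's bottom ones.
    shift = 2 * max(degree)
    v = [float(i) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(n_iter):
        # Remove the constant component (the trivial eigenvector of L).
        mean = sum(v) / n
        v = [x - mean for x in v]
        # Multiply by (shift*I - L) = (shift*I - D + A).
        w = [
            (shift - degree[i]) * v[i] + sum(adj[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [1 if x > 0 else 0 for x in v]


# Toy interaction network: two triangles (0-1-2 and 3-4-5) joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1

print(fiedler_partition(adj))
```

On this toy network the two triangles land in different groups, which is the kind of densely connected "module" you would look for in a protein-protein interaction graph.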

6️⃣ Self-Organizing Maps (SOM):
  📚 How it works: SOM maps high-dimensional data onto a lower-dimensional grid, preserving the topological properties of the data.
  💡 Use Case: Mapping high-dimensional gene expression data onto a 2D grid to reveal underlying structures.
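
For a feel of how the grid gets trained, here is a tiny SOM sketch in plain Python. It is illustrative only: the linear decay schedules, the Gaussian neighbourhood, the 3x3 grid size, and the function names `train_som` and `best_matching_unit` are choices made for this example, and the expression profiles are invented.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid coordinates of the node whose weight vector is closest to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def train_som(data, rows, cols, epochs=200, lr0=0.5, sigma0=1.5, seed=0):
    """Train a rows x cols map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac) + 0.01       # learning rate shrinks over time
        sigma = sigma0 * (1 - frac) + 0.3  # neighbourhood radius shrinks too
        samples = list(data)
        rng.shuffle(samples)
        for x in samples:
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * sigma ** 2))  # Gaussian neighbourhood
                # Pull the node toward the sample, scaled by grid closeness to the BMU.
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


# Toy expression profiles scaled to [0, 1] (made-up values).
profiles = [
    [0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [0.0, 0.0, 0.1],  # low group
    [1.0, 0.9, 1.0], [0.9, 1.0, 0.9], [1.0, 1.0, 0.9],  # high group
]
som = train_som(profiles, rows=3, cols=3)
for p in profiles:
    print(p, "->", best_matching_unit(som, p))
```

Profiles from the two groups should end up on different grid nodes, with similar profiles on the same or neighbouring nodes. That neighbourhood-preserving layout is what makes the 2D map readable.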
