
 Genomics_command_line_quiz1

For all projects, you may use your own Unix-based system and, where applicable, ensure that you are running the version of the software specified in the assignments. Alternatively, you may use the VMBox virtual machine environment provided with the course materials. Instructions on how to download and use the environment can be found on the course web site.


For the following questions, refer to the class workflow and use the data in the Online materials (‘gencommand_proj1_data.tar.gz’) to answer the questions. Assume you sequenced and assembled the genome of Malus domestica (apple), and performed gene annotation. You then collected samples and ran RNA-seq experiments to determine sets of genes that are expressed in the various tissues. This information was stored, respectively, in the following files: “apple.genome”, “apple.genes”, “apple.condition{A,B,C}”.


NOTE: The apple genome and the apple gene annotations for this project were extracted from the Rosaceae Genome Database (RGD). Actual data have then been modified, and hence may not directly reflect the information in the original RGD records.


1. How many chromosomes are there in the genome?

grep -c ">" apple.genome

## 3
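A FASTA header is a line that starts with `>`. Anchoring the pattern as `^>` makes the count above slightly stricter, and printing the headers shows which sequence names the later questions refer to. This is a minimal sketch. The function names `fasta_count` and `fasta_names` and the toy file are hypothetical:

```shell
# Sketch: count FASTA records and list their names.
# '^>' matches only lines that start with '>', i.e. header lines.

fasta_count() {
  grep -c '^>' "$1"
}

fasta_names() {
  # Strip the leading '>' from each header line
  grep '^>' "$1" | sed 's/^>//'
}

# Hypothetical two-record toy FASTA standing in for apple.genome
toy=$(mktemp)
trap 'rm -f "$toy"' EXIT
printf '>chrA\nACGT\nGGCC\n>chrB\nTTAA\n' > "$toy"

fasta_count "$toy"   # prints 2
fasta_names "$toy"   # prints chrA, then chrB
```

On the project data, `fasta_count apple.genome` should agree with the count of 3 above. `fasta_names apple.genome` lists the record names.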

2. How many genes?

cut -f1 apple.genes | sort -u | wc -l

##     5453

3. How many transcript variants?

cut -f2 apple.genes | sort -u | wc -l

##     5456
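Questions 2 and 3 can also be answered in a single pass without sorting, because an awk associative array remembers which IDs it has already seen. This sketch assumes, as the `cut` commands above do, that `apple.genes` is tab-separated with the gene ID in column 1 and the transcript ID in column 2. The function `count_genes_transcripts` and the toy rows are hypothetical:

```shell
# Sketch: distinct genes (column 1) and transcripts (column 2) in one pass.
# awk's associative arrays deduplicate, so the input need not be sorted.
count_genes_transcripts() {
  awk -F'\t' '
    !gene[$1]++ { ng++ }   # first time this gene ID is seen
    !tx[$2]++   { nt++ }   # first time this transcript ID is seen
    END {
      print "genes", ng + 0
      print "transcripts", nt + 0
    }
  ' "$@"
}

# Hypothetical toy rows: gene, transcript (g2 has two transcripts)
printf 'g1\tg1.1\ng2\tg2.1\ng2\tg2.2\ng3\tg3.1\n' | count_genes_transcripts
# prints: genes 3, then transcripts 4
```

On the project data this would be run as `count_genes_transcripts apple.genes`.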

4. How many genes have a single splice variant?

cut -f1 apple.genes | sort | uniq -c | awk '$1 == 1' | wc -l

##     5450

5. How many genes have 2 or more splice variants?

cut -f1 apple.genes | sort | uniq -c | awk '$1 >= 2' | wc -l

##        3
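`uniq -c` only merges adjacent identical lines, so counting splice variants with it relies on all rows for a gene being next to each other. A preceding `sort` guarantees this. The sketch below sidesteps the ordering question: it counts transcripts per gene in an awk array, then tallies genes with one variant versus two or more. It uses the same column assumptions as the `cut` commands above (tab-separated, gene in column 1, transcript in column 2). The function `variant_summary` and the toy rows are hypothetical:

```shell
# Sketch: how many genes have one splice variant vs two or more.
# Transcripts are counted per gene in an awk array, so unlike uniq -c
# the input does not need to be sorted or grouped by gene.
variant_summary() {
  awk -F'\t' '
    !seen[$1 FS $2]++ { n[$1]++ }   # count each (gene, transcript) pair once
    END {
      for (g in n) {
        if (n[g] == 1) single++
        else multi++
      }
      print "single", single + 0
      print "multiple", multi + 0
    }
  ' "$@"
}

# Hypothetical toy rows: g2 has two transcripts, g1 and g3 have one each
printf 'g1\tt1\ng2\tt2\ng2\tt3\ng3\tt4\n' | variant_summary
# prints: single 2, then multiple 1
```

`variant_summary apple.genes` would be expected to agree with the answers to questions 4 and 5.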

6. How many genes are there on the ‘+’ strand?

cut -f1,4 apple.genes | sort -u | cut -f2 | grep -c "+"

##     2662

7. How many genes are there on the ‘-’ strand?

cut -f1,4 apple.genes | sort -u | cut -f2 | grep -c -- "-"

##     2791
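The two strand counts add up to 5453 (2662 + 2791), which is the total number of genes. This suggests that no gene is annotated on both strands, because such a gene would be counted once for each strand. The sketch below makes that check explicit while also reporting genes per strand. It assumes tab-separated input with the gene in column 1 and the strand in column 4, as in the `cut -f1,4` commands above. The function `strand_check` and the toy rows are hypothetical:

```shell
# Sketch: genes per strand (column 4), plus a count of genes whose
# transcripts appear on more than one strand.
strand_check() {
  awk -F'\t' '
    !seen[$1 FS $4]++ { per_strand[$4]++; nstrands[$1]++ }
    END {
      for (s in per_strand) print "strand", s, per_strand[s]
      for (g in nstrands) { if (nstrands[g] > 1) both++ }
      print "genes_on_both_strands", both + 0
    }
  ' "$@"
}

# Hypothetical toy rows: gene, transcript, chromosome, strand
printf 'g1\tt1\tchr1\t+\ng2\tt2\tchr1\t-\ng2\tt3\tchr1\t-\ng3\tt4\tchr2\t+\n' | strand_check
# strand lines come out in arbitrary order; the last line is genes_on_both_strands 0
```

On the project data, `strand_check apple.genes` would be expected to print 0 on its last line, given that the strand counts sum to the gene total.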

8. How many genes are there on chromosome chr1?

9. How many genes are there on chromosome chr2?

10. How many genes are there on chromosome chr3?

cut -f1,3 apple.genes | sort -u | cut -f2 | sort | uniq -c

## 1624 chr1

## 2058 chr2

## 1771 chr3
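The `sort -u | cut | sort | uniq -c` pattern used here, and again for the transcript counts that follow, counts distinct values of one column within groups of another column. Below is a generic awk sketch of the same idea, parameterized by column number. The function `count_distinct` is hypothetical. Columns 1, 2 and 3 are assumed to be gene, transcript and chromosome, matching the `cut` commands in this post:

```shell
# Sketch: count distinct values of one column within groups of another,
# without sort | uniq.
#   count_distinct 1 3 apple.genes   -> genes per chromosome
#   count_distinct 2 3 apple.genes   -> transcripts per chromosome
count_distinct() {
  key=$1
  group=$2
  shift 2
  awk -F'\t' -v k="$key" -v g="$group" '
    !seen[$k FS $g]++ { n[$g]++ }    # each (key, group) pair counted once
    END { for (c in n) print n[c], c }
  ' "$@" | LC_ALL=C sort -k2,2
}

# Hypothetical toy rows: gene, transcript, chromosome
printf 'g1\tt1\tchr1\ng2\tt2\tchr1\ng2\tt3\tchr1\ng3\tt4\tchr2\n' | count_distinct 1 3
# prints: 2 chr1, then 1 chr2
```

The output has the same "count name" layout as `uniq -c`, sorted by group name.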

11. How many transcripts are there on chr1?

12. How many transcripts are there on chr2?

13. How many transcripts are there on chr3?

cut -f2,3 apple.genes | sort -u | cut -f2 | sort | uniq -c

## 1625 chr1

## 2059 chr2

## 1772 chr3
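Each chromosome has exactly one more transcript than it has genes (1625 vs 1624, 2059 vs 2058, 1772 vs 1771). The totals are 5456 transcripts, 5453 genes, and 3 genes with more than one variant. Together, these numbers are consistent with each multi-variant gene having exactly two transcripts and with one such gene on each chromosome. The sketch below reports the per-chromosome difference directly. It assumes the same columns as above (1 = gene, 2 = transcript, 3 = chromosome). The function `variant_surplus` and the toy rows are hypothetical:

```shell
# Sketch: per chromosome, print genes, transcripts, and the surplus of
# transcripts over genes (i.e. extra splice variants on that chromosome).
variant_surplus() {
  awk -F'\t' '
    !gs[$1 FS $3]++ { genes[$3]++ }
    !ts[$2 FS $3]++ { tx[$3]++ }
    END { for (c in genes) print c, genes[c], tx[c], tx[c] - genes[c] }
  ' "$@" | LC_ALL=C sort -k1,1
}

# Hypothetical toy rows: g2 on chr1 has two transcripts
printf 'g1\tt1\tchr1\ng2\tt2\tchr1\ng2\tt3\tchr1\ng3\tt4\tchr2\n' | variant_surplus
# prints: chr1 2 3 1, then chr2 1 1 0
```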

14. How many genes are in common between condition A and condition B?

cut -f1 apple.conditionA | sort -u > sortA

cut -f1 apple.conditionB | sort -u > sortB

comm -1 -2 sortA sortB | wc -l

##     2410

15. How many genes are specific to condition A?

comm -2 -3 sortA sortB | wc -l

##     1205

16. How many genes are specific to condition B?

comm -1 -3 sortA sortB | wc -l

##     1243
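Questions 14 to 16 can be answered with a single `comm` call. Without the `-1`/`-2`/`-3` suppression flags, `comm` prints three kinds of lines: lines unique to the first file with no leading tab, lines unique to the second file with one tab, and shared lines with two tabs. Counting tab-separated fields per line therefore tallies all three groups in one pass. As with the commands above, both inputs must be sorted with the collation `comm` expects. The function `venn2` and the toy lists are hypothetical:

```shell
# Sketch: sizes of "only first", "only second", and "both" for two
# sorted gene lists, from one comm call.
venn2() {
  comm "$1" "$2" | awk -F'\t' '
    { c[NF]++ }    # NF is 1, 2 or 3 depending on the comm column
    END { printf "only_first %d\nonly_second %d\nboth %d\n", c[1], c[2], c[3] }
  '
}

# Hypothetical pre-sorted toy lists standing in for sortA and sortB
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'g1\ng2\ng3\n' > "$tmp/sortA"
printf 'g2\ng3\ng4\n' > "$tmp/sortB"
venn2 "$tmp/sortA" "$tmp/sortB"
# prints: only_first 1, only_second 1, both 2
```

Applied to the `sortA` and `sortB` files built above, `venn2 sortA sortB` would be expected to reproduce the three answers (1205, 1243 and 2410).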

17. How many genes are common to all three conditions?

cut -f1 apple.conditionC | sort -u > sortC

comm -1 -2 sortA sortB > AB_common

comm -1 -2 AB_common sortC | wc -l

##     1608
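Chaining `comm` works for three conditions but needs an intermediate file for each extra condition. An alternative sketch counts how many input files each gene appears in and keeps the genes found in all of them. With this approach the files need not be sorted, and duplicate rows within a file count once. The function `common_to_all` and the toy files are hypothetical. Column 1 is assumed to hold the gene ID, as in the `cut -f1` commands above:

```shell
# Sketch: number of gene IDs (column 1) present in every file given.
common_to_all() {
  awk -F'\t' -v nfiles="$#" '
    !seen[FILENAME, $1]++ { hits[$1]++ }   # at most one hit per gene per file
    END {
      for (g in hits) { if (hits[g] == nfiles + 0) c++ }
      print c + 0
    }
  ' "$@"
}

# Hypothetical unsorted toy lists; g3 is duplicated in A on purpose
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'g3\ng1\ng2\ng3\n' > "$tmp/A"
printf 'g2\ng3\n' > "$tmp/B"
printf 'g3\ng1\n' > "$tmp/C"
common_to_all "$tmp/A" "$tmp/B" "$tmp/C"   # prints 1 (only g3)
```

On the project data this would be `common_to_all apple.conditionA apple.conditionB apple.conditionC`. Passing the same file twice would make every gene fall one hit short, because both copies share one `FILENAME`.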
