What kind of research can be done with genomic data?

Science has given us large amounts of freely accessible genomic data, such as the 1000 Genomes Project (https://www.1000genomes.org) and GenBank (https://www.ncbi.nlm.nih.gov/genbank). How can we explore this data and apply data science or machine learning to it? What are some project ideas?

My own ideas:

  • Biological data visualisation
  • Gene prediction using a hidden Markov model
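
To make the second idea a bit more concrete, here is a minimal sketch of Viterbi decoding with a two-state hidden Markov model. Everything in it is an assumption for illustration: the states ("N" for non-coding, "C" for coding), the transition and emission probabilities, and the function name `viterbi` are made up, and real gene finders use much richer models (codon structure, splice sites, trained parameters).

```python
import math

# Hypothetical two-state model: "N" = non-coding, "C" = coding.
# All probabilities below are invented for illustration, not trained.
STATES = ("N", "C")
START = {"N": 0.5, "C": 0.5}
TRANS = {
    "N": {"N": 0.9, "C": 0.1},
    "C": {"N": 0.1, "C": 0.9},
}
EMIT = {
    "N": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},  # AT-rich
    "C": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},  # GC-rich
}


def viterbi(seq):
    """Return the most likely state path for a DNA string, as a string of N/C."""
    if not seq:
        return ""
    # Log-probability of the best path ending in each state at position 0.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (
                score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            )
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    print(viterbi("ATATATATATGCGCGCGCGC"))
```

Working in log space avoids numerical underflow on long sequences. With real data the parameters would be estimated from annotated genomes rather than written by hand.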

Any more?

Topic data bioinformatics classification machine-learning

Category Data Science


  • Determine the function of genes and the elements that regulate genes throughout the genome.
  • Find variations in the DNA sequence among people and determine their significance. The most common type of genetic variation is known as a single nucleotide polymorphism or SNP (pronounced “snip”). These small differences may help predict a person’s risk of particular diseases and response to certain medications.
  • Discover the 3-dimensional structures of proteins and identify their functions.
  • Explore how DNA and proteins interact with one another and with the environment to create complex living systems.
  • Develop and apply genome-based strategies for the early detection, diagnosis, and treatment of disease.
  • Sequence the genomes of other organisms, such as the rat, cow, and chimpanzee, in order to compare similar genes between species.
  • Develop new technologies to study genes and DNA on a large scale and store genomic data efficiently.
  • Continue to explore the ethical, legal, and social issues raised by genomic research.

Source: https://ghr.nlm.nih.gov/handbook/genomicresearch?show=all


You could build models to classify genomes by population, or run unsupervised learning (clustering) to see whether the known populations are recovered from the genotypes alone. Another option is to build models that infer (impute) missing genotypes.
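
As a rough sketch of the clustering idea, the snippet below simulates a genotype matrix (0, 1 or 2 copies of the alternate allele per SNP) for two populations, projects it onto its first two principal components, and checks whether a simple 2-means clustering recovers the populations. The data is synthetic, and the sample sizes, allele frequencies and the helper `two_means` are assumptions for illustration; with real data you would load genotypes from a VCF file instead. It does not cover genotype imputation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_pop, n_snps = 50, 200

# Hypothetical allele frequencies: population B drifts away from population A.
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, n_snps), 0.05, 0.95)

# Genotype = number of alternate alleles (0, 1 or 2) per individual and SNP.
geno = np.vstack([
    rng.binomial(2, freq_a, size=(n_per_pop, n_snps)),
    rng.binomial(2, freq_b, size=(n_per_pop, n_snps)),
]).astype(float)
truth = np.repeat([0, 1], n_per_pop)

# PCA via SVD of the column-centered genotype matrix.
centered = geno - geno.mean(axis=0)
u, s, _ = np.linalg.svd(centered, full_matrices=False)
pcs = u[:, :2] * s[:2]  # coordinates on the first two principal components


def two_means(points, n_iter=20):
    """Tiny k-means with k=2, started from the extremes of the first axis."""
    centers = points[[points[:, 0].argmin(), points[:, 0].argmax()]]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = points[labels == k].mean(axis=0)
    return labels


labels = two_means(pcs)
# Cluster ids are arbitrary, so score both possible label assignments.
agreement = max(np.mean(labels == truth), np.mean(labels != truth))
print(f"fraction of individuals assigned to their true population: {agreement:.2f}")
```

The same pipeline (genotype matrix, PCA, clustering) is a common first look at population structure; for a supervised variant, the known population labels would be used to train a classifier on the principal components instead.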

For scalable DNA analysis, take a look at ADAM, a genomics toolkit built on Apache Spark.
