Question: Let's say we are given an intragenic region of DNA that we know begins in an exon,
contains exactly one 5'-splice site (this is where the 'cutting' occurs at the beginning of the intron),
and then ends in exactly one intron. We would like to determine which section of the DNA is the
exon, which section is the 5'-splice site, and which section is the intron.
Furthermore, let's say that if a nucleotide is part of an exon, then there is a 90% chance of the
next nucleotide also being part of the exon. The same holds for the 5'-splice site: if a nucleotide
is part of it, there is a 90% chance that the next nucleotide is also part of the splice site.
Also, suppose that the four bases A, C, G and T occur with equal frequency (25% each) in an exon.
In addition, a splice-site nucleotide has a 5% chance of being an A and a 95% chance of being a G.
Lastly, let's say that in an intron, A and T each occur with probability 40%, and C and G each
occur with probability 10%.
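The model described so far can be written down as three probability tables: where the sequence starts, how states follow one another, and which base each state emits. The Python sketch below is one possible encoding, not the only one. The state labels `E`, `S`, `I` and the helpers `is_distribution` and `joint_probability` are hypothetical names of our own, and it assumes that the exon and the 5'-splice site each persist with probability 0.9 while the intron, being the last region, is absorbing (probability 1.0 of staying an intron).

```python
import math

# Hypothetical state labels: E = exon, S = 5'-splice site, I = intron.
STATES = ("E", "S", "I")

# The region is known to begin in an exon.
START = {"E": 1.0, "S": 0.0, "I": 0.0}

# P(next state | current state). Assumption: exon and splice site each
# persist with probability 0.9; the intron is absorbing because the region
# ends in it and contains only one splice site.
TRANSITION = {
    "E": {"E": 0.9, "S": 0.1, "I": 0.0},
    "S": {"E": 0.0, "S": 0.9, "I": 0.1},
    "I": {"E": 0.0, "S": 0.0, "I": 1.0},
}

# P(base | state), taken from the problem statement.
EMISSION = {
    "E": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "S": {"A": 0.05, "C": 0.0, "G": 0.95, "T": 0.0},
    "I": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def is_distribution(row, tol=1e-9):
    """True if the probabilities in `row` sum to 1 (within `tol`)."""
    return math.isclose(sum(row.values()), 1.0, abs_tol=tol)


def joint_probability(seq, path):
    """P(seq, path): start probability times every transition and emission."""
    p = START[path[0]] * EMISSION[path[0]][seq[0]]
    for i in range(1, len(seq)):
        p *= TRANSITION[path[i - 1]][path[i]] * EMISSION[path[i]][seq[i]]
    return p


# Every table row must be a valid probability distribution.
assert is_distribution(START)
for state in STATES:
    assert is_distribution(TRANSITION[state]) and is_distribution(EMISSION[state])

# Score the labelling TCC | AG | CAT (exon, splice site, intron).
print(joint_probability("TCCAGCAT", "EEESSIII"))
```

For that labelling the product of start, transition and emission probabilities works out to 8.656875 × 10^-8.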
Construct a Hidden Markov Model which models this behaviour. Then, for the sequence,
TCCAGCAT, determine which regions are most likely part of an intron, an exon, and a 5'-splice site.
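One standard way to find the most likely labelling is the Viterbi algorithm, a dynamic program that keeps, for each position and state, the best-scoring path ending there. The Python sketch below assumes hypothetical state labels `E` (exon), `S` (5'-splice site) and `I` (intron), that the exon and splice site each persist with probability 0.9, and that the intron is absorbing (probability 1.0). It works in log space to avoid underflow and forces the path to end in the intron state, since the region is known to end in an intron. `viterbi`, `safe_log` and `segments` are illustrative names, not an existing library API.

```python
import math
from itertools import groupby

# Hypothetical state labels: E = exon, S = 5'-splice site, I = intron.
STATES = ("E", "S", "I")
START = {"E": 1.0, "S": 0.0, "I": 0.0}  # the region begins in an exon
# Assumption: exon and splice site persist with probability 0.9; intron is absorbing.
TRANSITION = {
    "E": {"E": 0.9, "S": 0.1, "I": 0.0},
    "S": {"E": 0.0, "S": 0.9, "I": 0.1},
    "I": {"E": 0.0, "S": 0.0, "I": 1.0},
}
EMISSION = {
    "E": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "S": {"A": 0.05, "C": 0.0, "G": 0.95, "T": 0.0},
    "I": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq, final_state="I"):
    """Most probable state path for `seq` that ends in `final_state`, and its probability."""
    # score[i][s] = best log-probability of a path emitting seq[: i + 1] and ending in s.
    score = [{s: safe_log(START[s]) + safe_log(EMISSION[s][seq[0]]) for s in STATES}]
    back = [{}]  # back[i][s] = predecessor of s on that best path
    for i in range(1, len(seq)):
        score.append({})
        back.append({})
        for s in STATES:
            prev = max(STATES, key=lambda r: score[i - 1][r] + safe_log(TRANSITION[r][s]))
            score[i][s] = (score[i - 1][prev] + safe_log(TRANSITION[prev][s])
                           + safe_log(EMISSION[s][seq[i]]))
            back[i][s] = prev
    # Trace back from the required final state.
    path = [final_state]
    for i in range(len(seq) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return "".join(path), math.exp(score[-1][final_state])


def segments(seq, path):
    """Split `seq` into runs of consecutive positions that share a state."""
    runs = groupby(zip(path, seq), key=lambda pair: pair[0])
    return ["".join(base for _, base in run) for _, run in runs]


path, prob = viterbi("TCCAGCAT")
print(path, segments("TCCAGCAT", path), prob)
```

Under these assumptions the decoded path is EEEESIII, i.e. exon TCCA, 5'-splice site G, intron CAT, with a joint probability of about 4.33 × 10^-7.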
For example, one possible answer would be TCC AG CAT (exon TCC, 5'-splice site AG, intron CAT).
Show all your work.
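Because the sequence is only eight bases long, every candidate answer in this format can also be scored directly: enumerate each way of cutting TCCAGCAT into a non-empty exon, a non-empty 5'-splice site and a non-empty intron, and multiply the transition and emission probabilities along the cut. The Python sketch below assumes hypothetical state labels `E`, `S`, `I`, that the exon and splice site each persist with probability 0.9, and that the intron is absorbing; `candidate_paths` and `joint_probability` are illustrative helper names.

```python
# Hypothetical state labels: E = exon, S = 5'-splice site, I = intron.
START = {"E": 1.0, "S": 0.0, "I": 0.0}  # the region begins in an exon
# Assumption: exon and splice site persist with probability 0.9; intron is absorbing.
TRANSITION = {
    "E": {"E": 0.9, "S": 0.1, "I": 0.0},
    "S": {"E": 0.0, "S": 0.9, "I": 0.1},
    "I": {"E": 0.0, "S": 0.0, "I": 1.0},
}
EMISSION = {
    "E": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "S": {"A": 0.05, "C": 0.0, "G": 0.95, "T": 0.0},
    "I": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def joint_probability(seq, path):
    """P(seq, path): start probability times every transition and emission."""
    p = START[path[0]] * EMISSION[path[0]][seq[0]]
    for i in range(1, len(seq)):
        p *= TRANSITION[path[i - 1]][path[i]] * EMISSION[path[i]][seq[i]]
    return p


def candidate_paths(n):
    """Yield every labelling E..E S..S I..I of length n with each block non-empty."""
    for e in range(1, n - 1):  # exon length
        for k in range(1, n - e):  # splice-site length; intron gets n - e - k >= 1
            yield "E" * e + "S" * k + "I" * (n - e - k)


seq = "TCCAGCAT"
results = sorted(
    ((joint_probability(seq, path), path) for path in candidate_paths(len(seq))),
    reverse=True,
)
for prob, path in results:
    if prob > 0:  # skip cuts whose splice site contains a C or T
        e, k = path.count("E"), path.count("S")
        print(f"{seq[:e]} | {seq[e:e + k]} | {seq[e + k:]}   P = {prob:.4e}")
```

Of the 21 possible cuts only four have non-zero probability, because the splice site can only contain A or G. The highest-scoring cut is TCCA | G | CAT (P ≈ 4.33 × 10^-7), ahead of the example cut TCC | AG | CAT (P ≈ 8.66 × 10^-8). If the splice site is instead taken to be a single nucleotide that always passes to the intron, with the intron persisting with probability 0.9, TCCA | G | CAT still scores highest.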
TutorTeddy.com & Boston Predictive Analytics
[ Email your Statistics or Math problems to email@example.com (camera phone photos are OK) ]
Boston Office (Near MIT/Kendall 'T'):
Cambridge Innovation Center,
One Broadway, 14th Floor,
Cambridge, MA 02142,
Dallas Office (Near Galleria):
15950 Dallas Parkway,
Dallas, TX 75248,