# Approaching Rosalind Problems

I’ve been working through the Rosalind problems over the last week. They are an interesting set of problems exploring the space between Biology and Computation. I decided to take notes while doing one problem. This blog post is a basic outline of my process for problem solving.

### Step 1 - Read The Problem And Background

Always read the problem. I’ve gotten a number of problems wrong in programming challenges because I didn’t read the instructions thoroughly. Rosalind provides some background info which I’ve found helpful as well. It is especially good for learning the relevant biology vocabulary. For this blog post I did the problem Mendel’s First Law.

### Step 2 - Verify That You Can Read The Data

All of the Rosalind problems supply a data file, and I’ve fallen into a pattern of starting out by reading in the file to ensure I know what’s going on. For this problem the provided sample file is `2 2 2`. Since all the entries are the same, it would be hard to verify that I was reading and interpreting the file correctly, so I used `1 2 3` as my example file instead. Then I wrote the code to put each of these values in a well-named variable.

```
dominant, hetero, recessive = File.read("iprb.txt").chomp.split(' ').map(&:to_i)

puts "Dominant #{dominant}"
puts "Heterozygous #{hetero}"
puts "Recessive #{recessive}"
```
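If you want the read step to fail loudly on a malformed file, a stricter parser is one option. This is only a sketch: `parse_counts` is a hypothetical helper name, and it takes a string rather than a file so it can be tried without `iprb.txt`.

```ruby
# Hypothetical helper (not part of the original solution): turn a line such
# as "1 2 3" into three non-negative integer counts, or raise ArgumentError.
def parse_counts(text)
  # Integer() raises ArgumentError on tokens like "x" or "1.5".
  counts = text.split.map { |token| Integer(token) }
  raise ArgumentError, "expected 3 counts, got #{counts.length}" unless counts.length == 3
  raise ArgumentError, "counts must be non-negative" if counts.any?(&:negative?)
  counts
end

dominant, hetero, recessive = parse_counts("1 2 3\n")
puts "Dominant #{dominant}"
puts "Heterozygous #{hetero}"
puts "Recessive #{recessive}"
```

With a real file you would pass `File.read("iprb.txt")` as the argument.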

### Step 3 - Ponder And Devise A Strategy

My first strategy was to calculate how many ways there were to get at least one dominant allele. Here are some basic thoughts on that: from the background given, I know that if one parent is homozygous dominant, the resulting organism must have the dominant allele. The probability that at least one parent is homozygous dominant is

``````
dominant     not-dominant    dominant
--------  + ------------- * ----------
total         total         total - 1
``````

Then I remembered that since there are fewer ways to get two recessive alleles, I should calculate that probability instead and then subtract the result from one (the complement rule of probability).
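As a quick sanity check of the complement rule, the sketch below computes the probability that at least one parent is homozygous dominant in two ways: directly, and as one minus the probability that neither parent is. The method names are hypothetical, and `Rational` is used only so the two answers can be compared exactly.

```ruby
# Illustrative helpers (names are hypothetical). "dominant" counts the
# homozygous dominant organisms; "total" is the whole population.

# Direct route: the first parent is dominant, or it is not and the second is.
def at_least_one_dominant_direct(dominant, total)
  not_dominant = total - dominant
  Rational(dominant, total) +
    Rational(not_dominant, total) * Rational(dominant, total - 1)
end

# Complement route: one minus the chance that neither parent is dominant.
def at_least_one_dominant_complement(dominant, total)
  not_dominant = total - dominant
  1 - Rational(not_dominant, total) * Rational(not_dominant - 1, total - 1)
end

puts at_least_one_dominant_direct(2, 6)      # prints 3/5
puts at_least_one_dominant_complement(2, 6)  # prints 3/5
```

Both routes agree, which is why it is safe to pick whichever side has fewer cases to enumerate.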

#### Math:

Probability of two recessive parents mating:

``````
recessive    recessive - 1
---------  * -------------
total        total - 1
``````

Probability of two heterozygous parents mating:

``````
heterozygous     heterozygous - 1
------------  * ------------------
total           total - 1
``````

Punnett square for heterozygous mating:

``````
|   | Y  | y  |
|---+----+----|
| Y | YY | Yy |
| y | Yy | yy |
``````

Only 1/4 of those offspring will be recessive, so we take the probability of two heterozygous organisms mating and multiply it by 1/4.

``````
heterozygous     heterozygous - 1      1
------------  * ------------------ * -----
total           total - 1            4
``````

Finally the probability of a heterozygous and a recessive organism mating:

``````
heterozygous     recessive        recessive     heterozygous
------------  * ------------  + ------------ * --------------
total           total - 1       total          total - 1
``````

Punnett square for heterozygous & recessive mating:

``````
|   | Y  | y  |
|---+----+----|
| y | Yy | yy |
| y | Yy | yy |
``````

In this case half the offspring have two recessive alleles so multiply the probability by 1/2.
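The fractions read off the Punnett squares can also be checked by enumeration. The sketch below is an illustration only: `recessive_fraction` is a hypothetical helper, and it assumes each parent passes on either of its two alleles with equal probability.

```ruby
# Enumerate a Punnett square: each parent contributes one of its two alleles,
# and all four pairings are equally likely.
def recessive_fraction(parent1, parent2)
  offspring = parent1.product(parent2)  # every allele pairing
  recessive = offspring.count { |pair| pair.none? { |allele| allele == "Y" } }
  Rational(recessive, offspring.length)
end

puts recessive_fraction(%w[Y y], %w[Y y])  # heterozygous x heterozygous: prints 1/4
puts recessive_fraction(%w[Y y], %w[y y])  # heterozygous x recessive:    prints 1/2
puts recessive_fraction(%w[y y], %w[y y])  # recessive x recessive:       prints 1/1
```

These are the same 1/4 and 1/2 multipliers used in the math above; two recessive parents always produce a recessive offspring, so that term needs no multiplier.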

### Step 4 - Write The Code

The first thing I realized was that we need floats, so I had to change the file-reading code slightly: I changed the `to_i` to `to_f`.

``````
dominant, hetero, recessive = File.read("iprb.txt").chomp.split(' ').map(&:to_f)
``````
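To see why the floats matter, here is a tiny illustration with made-up values: dividing two Ruby integers truncates, so every probability below one would collapse to zero.

```ruby
# Illustrative values only: 3 recessive organisms out of 6.
int_ratio = "3".to_i / "6".to_i    # Integer / Integer truncates
float_ratio = "3".to_f / "6".to_f  # Float / Float keeps the fraction

puts int_ratio    # prints 0
puts float_ratio  # prints 0.5
```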

All my calculations require the total so I do that once:

``````
total = dominant + hetero + recessive
``````

Now calculate the probability of two recessive organisms mating:

``````
r_r = (recessive / total) * ((recessive - 1) / (total - 1))
``````

Now heterozygous organisms mating:

``````
h_h = (hetero / total) * ((hetero - 1) / (total - 1))
``````

Now the hetero + recessive matings:

``````
h_r = (hetero / total) * (recessive / (total - 1)) + (recessive / total) * (hetero / (total - 1))
``````

Now I incorporate the fractions from the Punnett squares:

``````
recessive_total = r_r + h_h * 1/4 + h_r * 1/2
``````
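One detail worth knowing about `h_h * 1/4`: `*` and `/` have the same precedence in Ruby and evaluate left to right, so the expression is `(h_h * 1) / 4`, which stays a float because `h_h` is a float. Adding parentheses around `1/4` would silently change the answer. The value of `h_h` below is arbitrary, chosen only for illustration.

```ruby
h_h = 0.5  # arbitrary Float for illustration

left_to_right = h_h * 1/4    # (0.5 * 1) / 4
parenthesized = h_h * (1/4)  # 1/4 is integer division, so this is 0.5 * 0
explicit      = h_h * 0.25   # unambiguous alternative

puts left_to_right  # prints 0.125
puts parenthesized  # prints 0.0
puts explicit       # prints 0.125
```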

This is the probability of a recessive organism. The problem asked for the probability of a dominant organism, so I take `1 - recessive_total`:

``````
puts 1 - recessive_total
``````

All together:

```
dominant, hetero, recessive = File.read("iprb.txt").chomp.split(' ').map(&:to_f)

total = dominant + hetero + recessive

r_r = (recessive / total) * ((recessive - 1) / (total - 1))
h_h = (hetero / total) * ((hetero - 1) / (total - 1))
h_r = (hetero / total) * (recessive / (total - 1)) + (recessive / total) * (hetero / (total - 1))

recessive_total = r_r + h_h * 1/4 + h_r * 1/2

puts 1 - recessive_total
```

When all this code runs on the sample dataset `2 2 2`, it gives `0.78333` (after rounding), which is the expected result.
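For a check that does not depend on float rounding, the same calculation can be done with exact fractions. This is a sketch, not the code I submitted: `dominant_probability` is a hypothetical wrapper, it takes integer counts, and it assumes the population has at least two organisms.

```ruby
# Hypothetical wrapper around the same formula, using Rational for exact
# arithmetic. Assumes integer counts and a population of at least two.
def dominant_probability(dominant, hetero, recessive)
  total = dominant + hetero + recessive
  one_pair = Rational(1, total * (total - 1))  # weight of one ordered pair of parents

  r_r = recessive * (recessive - 1) * one_pair
  h_h = hetero * (hetero - 1) * one_pair
  h_r = 2 * hetero * recessive * one_pair      # both orderings of the two parents

  1 - (r_r + h_h * Rational(1, 4) + h_r * Rational(1, 2))
end

exact = dominant_probability(2, 2, 2)
puts exact                # prints 47/60
puts exact.to_f.round(5)  # prints 0.78333
```

For the sample dataset the recessive probability is 13/60, so the answer is 47/60, which matches the rounded float above.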

### Step 5 - Download The Real Data Set

Now I downloaded the real dataset, ran it through my code, and pasted the result into the text box.

### Step 6 - Celebrate

I recommend celebrating with cookies.

### Post-script

This problem can also be solved by simulating all possible matings and then calculating the percentage that have the dominant allele.

```
dominant, hetero, recessive = File.read("iprb.txt").chomp.split(' ').map(&:to_i)

# An organism is a pair of alleles; it shows the dominant trait if either is "Y".
def is_dominant?(organism)
  organism.include?("Y")
end

d_ary = Array.new(dominant)  { ["Y", "Y"] }
h_ary = Array.new(hetero)    { ["Y", "y"] }
r_ary = Array.new(recessive) { ["y", "y"] }

ary = d_ary + h_ary + r_ary

children = []

(0...ary.length).each do |i|
  (0...ary.length).each do |j|
    next if i == j  # an organism cannot mate with itself

    p1 = ary[i]
    p2 = ary[j]

    # The four equally likely children: one allele from each parent.
    children << [p1[0], p2[0]]
    children << [p1[0], p2[1]]
    children << [p1[1], p2[0]]
    children << [p1[1], p2[1]]
  end
end

d_children, r_children = children.partition { |o| is_dominant?(o) }

puts d_children.count.to_f / children.count
```
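The same enumeration can be written more compactly with `Array#permutation` and `Array#product`, which also makes it easy to compare against the formula. This is an alternative sketch under the same assumptions (every ordered pair of distinct parents is equally likely, and each parent passes on either allele with equal probability); `enumerated_dominant_fraction` is a hypothetical name.

```ruby
# Hypothetical compact variant: enumerate every ordered pair of distinct
# parents and every allele pairing, then return the exact dominant fraction.
def enumerated_dominant_fraction(dominant, hetero, recessive)
  population = [%w[Y Y]] * dominant + [%w[Y y]] * hetero + [%w[y y]] * recessive

  # permutation(2) over indices skips self-pairings but keeps both orderings.
  children = (0...population.length).to_a.permutation(2).flat_map do |i, j|
    population[i].product(population[j])
  end

  Rational(children.count { |child| child.include?("Y") }, children.length)
end

puts enumerated_dominant_fraction(2, 2, 2)  # prints 47/60
```

For the sample dataset this counts 94 dominant children out of 120, which is 47/60, or about 0.78333.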