Probability in Genetics

Probabilities are mathematical measures of likelihood. Probability refers to the chance that a specific outcome will occur in a particular situation. In other words, probability is a way of quantifying (giving a specific, numerical value to) how likely something is to happen. The empirical probability of an event is calculated by dividing the number of times the event occurs by the total number of opportunities for the event to occur. The values of probability range from 0 to 1. A probability of 1 for an event means that it is guaranteed to happen, while a probability of 0 means that it is guaranteed not to happen. For example, if a coin is tossed four times and lands heads every time (4/4), the empirical probability of heads is 1. If it lands tails every time (0/4), the empirical probability of heads is 0. If it lands heads twice and tails twice (2/4), the empirical probability of heads is 0.5.

Probabilities can be either empirical, meaning that they are calculated from real-life observations, or theoretical, meaning that they are predicted using a set of rules or assumptions. The empirical probability of an event is calculated by counting the number of times that event occurs and dividing it by the total number of times that event could have occurred. For instance, if the event you were looking for was a wrinkled pea seed, and you saw it 1850 times out of the 7324 total seeds you examined, the empirical probability of getting a wrinkled seed would be 1850/7324 = 0.253, or very close to 1 in 4 seeds.
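The wrinkled-seed calculation above can be sketched in a few lines of Python. The function name empirical_probability is made up for this illustration; the counts are the ones quoted in the text.

```python
from fractions import Fraction

def empirical_probability(event_count, total_count):
    """Observed occurrences divided by total opportunities (hypothetical helper)."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return Fraction(event_count, total_count)

# Counts quoted in the text: 1850 wrinkled seeds out of 7324 examined.
wrinkled = empirical_probability(1850, 7324)
print(round(float(wrinkled), 3))  # 0.253
```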

The theoretical probability of an event is calculated based on information about the rules and circumstances that produce the event. It reflects the number of times an event is expected to occur relative to the number of times it could possibly occur. For instance, if you had a pea plant heterozygous for a seed shape gene (Rr) and let it self-fertilize, you could use the rules of probability and your knowledge of genetics to predict that 1 out of every 4 offspring would get two recessive alleles (rr) and appear wrinkled, corresponding to a 0.25 (1/4) probability.

In general, the larger the number of data points that are used to calculate an empirical probability, such as shapes of individual pea seeds, the more closely it will approach the theoretical probability.
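One way to see this convergence is a small simulation of the Rr self-fertilization described above. This is only a sketch under simple assumptions: each parent passes on R or r with equal chance, every fertilization is independent, and the function name wrinkled_fraction and the fixed seed are choices made for this illustration.

```python
import random

def wrinkled_fraction(n_offspring, seed=0):
    """Simulate an Rr x Rr cross and return the observed fraction of rr offspring."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    wrinkled = 0
    for _ in range(n_offspring):
        from_mother = rng.choice("Rr")  # each allele is equally likely
        from_father = rng.choice("Rr")
        if from_mother == "r" and from_father == "r":
            wrinkled += 1
    return wrinkled / n_offspring

# Larger samples should tend to land closer to the theoretical value of 0.25.
for n in (10, 100, 10_000, 100_000):
    print(n, wrinkled_fraction(n))
```

Small samples can stray far from 0.25, while the largest sample should sit very close to it.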

The product rule

One probability rule that's very useful in genetics is the product rule, which states that the probability of two (or more) independent events occurring together can be calculated by multiplying the individual probabilities of the events. When a single coin is tossed repeatedly, the chance of heads occurring twice in succession is ½ x ½ = ¼. The chance of heads occurring three times in succession is (1/2)^3, or 1/8. Similarly, when two coins are tossed together, the probability of the tail-tail combination is ½ x ½ = ¼.

In general, you can think of the product rule as the “and” rule: if both event X and event Y must happen for a certain outcome to occur, and if X and Y are independent of each other (don’t affect each other’s likelihood), then you can use the product rule to calculate the probability of the outcome by multiplying the probabilities of X and Y.

We can use the product rule to predict frequencies of fertilization events. For instance, consider a cross between two heterozygous (Aa) individuals. What is the probability of getting an “aa” individual in the next generation? The only way to get an “aa” individual is if the mother contributes an “a” gamete and the father contributes an “a” gamete. Each parent has a 1/2 chance of making an “a” gamete. Thus, the chance of an “aa” offspring is: (probability of the mother contributing “a”) x (probability of the father contributing “a”) = (1/2) x (1/2) = 1/4.
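As a minimal sketch, the product rule can be written as a short Python function. The name product_rule is hypothetical, and the calculation is only valid when the events really are independent, as the text assumes.

```python
from fractions import Fraction

def product_rule(*probabilities):
    """Multiply the probabilities of independent events (the "and" rule)."""
    result = Fraction(1)
    for p in probabilities:
        result *= p
    return result

half = Fraction(1, 2)

# "a" from the mother AND "a" from the father in an Aa x Aa cross.
p_aa = product_rule(half, half)

# Three heads in succession when tossing a fair coin.
p_three_heads = product_rule(half, half, half)

print(p_aa, p_three_heads)  # 1/4 1/8
```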

The sum rule

In some genetics problems, you may need to calculate the probability that any one of several events will occur. In this case, you’ll need to apply another rule of probability, the sum rule. According to the sum rule, the probability that any of several mutually exclusive events will occur is equal to the sum of the events’ individual probabilities.

For example, if you toss a coin, the chance of getting heads is ½ and the chance of getting tails is ½. You can never get heads and tails at the same time, so these outcomes are called mutually exclusive. Thus the chance of getting either heads or tails is ½ + ½ = 2/2 = 1.

You can think of the sum rule as the “or” rule: if an outcome requires that either event X or event Y occur, and if X and Y are mutually exclusive (if only one or the other can occur in a given case), then the probability of the outcome can be calculated by adding the probabilities of X and Y.

As an example, let's use the sum rule to predict the fraction of offspring from an Aa x Aa cross that will have the dominant phenotype (AA or Aa genotype). In this cross, three events can lead to a dominant phenotype:

• Two A gametes meet (giving AA genotype), or

• An A gamete from Mom meets an a gamete from Dad (giving Aa genotype), or

• An a gamete from Mom meets an A gamete from Dad (giving Aa genotype)

In any one fertilization event, only one of these three possibilities can occur (they are mutually exclusive). Since this is an “or” situation where the events are mutually exclusive, we can apply the sum rule. Using the product rule as we did above, we can find that each individual event has a probability of ¼. So, the probability of offspring with a dominant phenotype is: (probability of A from Mom and A from Dad) + (probability of A from Mom and a from Dad) + (probability of a from Mom and A from Dad) = (1/4) + (1/4) + (1/4) = ¾.
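The same reasoning can be checked by listing every fertilization event for the Aa x Aa cross. In this sketch, the product rule gives each event's probability and the sum rule adds up the mutually exclusive events that contain at least one A allele. The variable names are chosen for this illustration only.

```python
from fractions import Fraction
from itertools import product

# Each Aa parent makes "A" or "a" gametes with probability 1/2.
gametes = {"A": Fraction(1, 2), "a": Fraction(1, 2)}

p_dominant = Fraction(0)
for (mom, p_mom), (dad, p_dad) in product(gametes.items(), repeat=2):
    p_event = p_mom * p_dad      # product rule: this gamete from Mom AND this one from Dad
    if "A" in (mom, dad):        # AA or Aa shows the dominant phenotype
        p_dominant += p_event    # sum rule: the events are mutually exclusive

print(p_dominant)  # 3/4
```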



The laws of probability can be applied to genetic mechanisms as well as to other processes in which uncertainty exists. There are similarities between tossing coins and Mendel’s pea plant experiments. Two coins, each with two faces (head and tail), that are tossed freely can land head-head, tail-tail, or as a combination of both. Similar results are expected when two plants, each carrying both a round and a wrinkled allele, are crossed.

Test of goodness of fit

Results that parallel Mendelian segregation arise from chance occurrences of sets of independent events, so they rarely match a predicted ratio exactly. Out of 100 tosses of two coins, the experiment would usually not yield exactly 25 head-head (round-round), 25 head-tail (round-wrinkled), 25 tail-head (wrinkled-round), and 25 tail-tail (wrinkled-wrinkled). It would be surprising if precisely those results were obtained very often. The ratio represents only an average of expected results when independent events occur. The observed data may deviate from the expected data, but the experimenter must know how much the observed results can differ from calculated or hypothetical figures and still be regarded as statistically close to expectations.

In evaluating the results of crosses and determining which modes of inheritance are involved, how much deviation is permissible without casting some doubt on whether the data agree with the given hypothesis? Too much deviation would surely make investigators question their hypothesis or discard it entirely. Where should the line be drawn? Unfortunately, there is no precise answer to this question. The best a geneticist can do is to determine how likely it is that the deviation of the observed results from the predicted results occurred by chance, and use statistical inference to decide whether a particular result supports a given hypothesis. These numerical data are the only means of evaluating the goodness of fit of an experimental result as compared with a particular expectation.

An important question in any genetic experiment is how to decide whether the data fit any of the Mendelian ratios we have discussed. A statistical tool that can test such ratios is the chi-square, or goodness of fit, test.
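As a rough preview, the sketch below computes the standard Pearson chi-square statistic, the sum of (observed - expected)^2 / expected over all classes, for the pea seed counts quoted earlier. Several things here are assumptions made for illustration: the round-seed count is taken to be every seed that was not wrinkled (7324 - 1850), the hypothesis tested is a 3:1 ratio, and 3.841 is the usual table value for 1 degree of freedom at the 0.05 level. The function name chi_square is hypothetical.

```python
def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

total = 7324
observed = [total - 1850, 1850]            # round (assumed), wrinkled
expected = [total * 3 / 4, total * 1 / 4]  # 3:1 ratio hypothesis

statistic = chi_square(observed, expected)

# Table value for 1 degree of freedom at the 0.05 significance level.
CRITICAL_0_05_DF1 = 3.841

print(round(statistic, 3), statistic < CRITICAL_0_05_DF1)
```

A statistic below the critical value means a deviation this small could easily arise by chance, so the data give no reason to reject the 3:1 hypothesis.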

