# Probabilities for heterozygote genetic markers in hybrids

by *Ketil Malde*; **March 10, 2016**

Nature is nothing if not flexible, and when the environment changes, plants and animals will try their best to adapt to new conditions. Migration can be one such adaptation, and one which sometimes brings previously separate populations into contact with each other. For instance, we have recently seen Antarctic minke whales migrating all the way to the North Atlantic to breed with the whales native to that region. We don't know precisely what drives this, but one likely candidate is changes in the ecosystem, perhaps caused by global warming.

In order to monitor this migration, we would like to use genetic markers to identify migrants and their offspring. Ideally, we want markers that are *fully diagnostic*, that is, they always give one value (or *allele*) in one population, and always a different value in the other. In practice, even for quite good markers there are often some occurrences of the foreign allele, and even if the marker appears fully diagnostic, we can't say for sure, since our testing will always be limited by our sampling. Usually, the best we can do is to quantify the allele frequencies as confidence intervals.

In order to classify hybrids, we need to know the probability for the different combinations of alleles (genotypes) in the various types of hybrids. To simplify the analysis, we limit our interest to the case where a migrant enters a native population and interbreeds with it, and where the offspring (referred to as the F1 generation) continues to interbreed with the native population, resulting in new generations (F2, F3, and so on) of back-crossed hybrids.

## Assuming fully diagnostic markers

As mentioned, a marker is fully diagnostic if no allele occurs in both populations. If we restrict analysis to single nucleotide polymorphisms (SNPs), where there are two possible alleles per marker, all non-hybrid individuals are homozygote.

An F1 hybrid by necessity inherits one allele from each population, and thus it is always heterozygote. An F2 back-cross inherits one allele from the native population, and one from the F1 hybrid. The probability of heterozygosity is therefore the probability of inheriting the foreign allele (*a*) from the hybrid, i.e. 0.5. Similarly, the probability of homozygosity is the probability of inheriting the native allele (*A*), also 0.5. In general, the probability of retaining the foreign allele is halved for each subsequent back-cross.

If we label the native allele *A*, and the foreign allele *a*, we can list the probabilities for the different genotypes, as seen in Table *diagnostic* below.

BC Gen | P(AA) | P(Aa) | P(aa) |
---|---|---|---|
migrant | 0 | 0 | 1 |
native | 1 | 0 | 0 |
F1 | 0 | 1 | 0 |
F2 | 0.5 | 0.5 | 0 |
F3 | 0.75 | 0.25 | 0 |
⋮ | ⋮ | ⋮ | ⋮ |
Fn | 1 − 2^{1 − n} | 2^{1 − n} | 0 |
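The halving rule behind Table *diagnostic* is easy to check numerically. The Python sketch below is only an illustration: the function name `origin_probs` is made up for this example, and it assumes the back-crossing scheme described above, where every generation after F1 is a cross between the previous hybrid and a pure native individual.

```python
from fractions import Fraction


def origin_probs(n):
    """Return (P(AA), P(Aa), P(aa)) for back-cross generation Fn, n >= 1.

    Assumes a fully diagnostic marker, and that each generation after F1
    is a cross between the previous hybrid and a pure native individual.
    """
    if n < 1:
        raise ValueError("n must be at least 1 (F1 is the first hybrid generation)")
    # 2^(1-n): probability that the foreign allele is still present in Fn
    p_het = Fraction(1, 2 ** (n - 1))
    return (1 - p_het, p_het, Fraction(0))


for n in range(1, 6):
    p_AA, p_Aa, p_aa = origin_probs(n)
    print(f"F{n}: P(AA)={float(p_AA):.4f}  P(Aa)={float(p_Aa):.4f}  P(aa)={float(p_aa):.4f}")
```

Exact fractions are used so that the values can be compared directly with the table rows without rounding concerns.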

## Arbitrary allele frequencies

Although fully diagnostic markers are the ideal case, in practice the foreign allele often occurs in the native population, and vice versa. In any case, with limited testing we cannot ascertain that the markers are fully diagnostic; at best, we can give a confidence interval for the minor allele frequency.

To address this we let *A* and *a* no longer represent the actual allele *values*, but instead the allele *origin*. In other words, *A* means an allele inherited from the native population, and *a* represents an allele with foreign origin, inherited from its migrant ancestor. We see that we can then use Table *diagnostic* to determine the probability of a back-cross having two alleles from the native population, or retaining one allele from its migrant forebear.

Population | B | b |
---|---|---|
native | p_{n} | q_{n} |
foreign | q_{f} | p_{f} |

Now, we turn to the actual allele values. Let the alleles be labelled *B* and *b*, and allele frequencies defined as in Table *freqs*.

Case | Probability | Genotype BB | Genotype Bb | Genotype bb |
---|---|---|---|---|
Two native alleles | 1 − 2^{1 − n} | p_{n}² | 2p_{n}q_{n} | q_{n}² |
One native, one foreign | 2^{1 − n} | p_{n}q_{f} | p_{n}p_{f} + q_{n}q_{f} | q_{n}p_{f} |

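Table *probs* combines the allele frequencies with Table *diagnostic* to calculate the probability of the different genotypes. Weighting the *Bb* column by the probability of each case, we see that the probability of a heterozygote (genotype *Bb*) in an Fn hybrid is:

$$P(Bb) = (1 - 2^{1-n}) \cdot 2 p_n q_n + 2^{1-n} (p_n p_f + q_n q_f) \tag{1}$$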

If we assume that both populations have the same minor allele frequency, that is, *p*_{n} = *p*_{f} = *p* and *q*_{n} = *q*_{f} = *q* = 1 − *p*, the heterozygote probability simplifies. From the relationship *p* = 1 − *q*, it follows that *p*² + *q*² = *p*² + (1 − *p*)² = 2*p*² − 2*p* + 1 = 1 − 2*pq*, and we get:

$$P(Bb) = (1 - 2^{1-n}) \cdot 2pq + 2^{1-n} (1 - 2pq) = 2pq + 2^{1-n} (1 - 4pq)$$

We observe here that if *p* = 1, the probability of a heterozygote is 2^{1 − n}, as in Table *diagnostic*, and that as *n* increases, the heterozygote probability converges to 2*pq*, the heterozygote probability in the native population. The table below gives the heterozygote probabilities for back-cross generations F1 to F10 under various minor allele frequencies *q*.

Gen | q = 0.1 | q = 0.05 | q = 0.025 | q = 0.01 | q = 0 |
---|---|---|---|---|---|
1 | 0.820 | 0.905 | 0.951 | 0.980 | 1.000 |
2 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 |
3 | 0.340 | 0.298 | 0.274 | 0.260 | 0.250 |
4 | 0.260 | 0.196 | 0.162 | 0.140 | 0.125 |
5 | 0.220 | 0.146 | 0.105 | 0.080 | 0.063 |
6 | 0.200 | 0.120 | 0.077 | 0.050 | 0.031 |
7 | 0.190 | 0.108 | 0.063 | 0.035 | 0.016 |
8 | 0.185 | 0.101 | 0.056 | 0.027 | 0.008 |
9 | 0.183 | 0.098 | 0.052 | 0.024 | 0.004 |
10 | 0.181 | 0.097 | 0.051 | 0.022 | 0.002 |
native | 0.180 | 0.095 | 0.049 | 0.020 | 0.000 |
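The table above can be recomputed in a few lines. The Python sketch below weights the heterozygote column of Table *probs* by the probability of each case; the function name `het_prob` and its argument names are chosen for this example only, and the equal-frequency assumption (*p*_{n} = *p*_{f}) is applied only in the loop that prints the table.

```python
def het_prob(n, p_n, p_f):
    """Probability that an Fn back-cross is heterozygous (Bb) at one marker.

    p_n: frequency of allele B in the native population
    p_f: frequency of allele b in the foreign population
    """
    q_n, q_f = 1.0 - p_n, 1.0 - p_f
    w = 2.0 ** (1 - n)               # P(one native, one foreign allele)
    both_native = 2.0 * p_n * q_n    # P(Bb | two native alleles)
    mixed = p_n * p_f + q_n * q_f    # P(Bb | one native, one foreign allele)
    return (1.0 - w) * both_native + w * mixed


mafs = [0.1, 0.05, 0.025, 0.01, 0.0]   # minor allele frequencies q
print("Gen     " + "  ".join(f"{q:>5}" for q in mafs))
for n in range(1, 11):
    row = [het_prob(n, 1.0 - q, 1.0 - q) for q in mafs]
    print(f"{n:<7} " + "  ".join(f"{x:.3f}" for x in row))
# The native population is the limit n -> infinity, i.e. 2pq
print("native  " + "  ".join(f"{2.0 * (1.0 - q) * q:.3f}" for q in mafs))
```

Individual entries may differ from the table in the last digit, since values such as 0.2975 sit right on a rounding boundary.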

When *q* is small, we can make the following approximation by ignoring *q*²:

$$P(Bb) \approx 2q + 2^{1-n} (1 - 4q)$$
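To get a feel for how much is lost by dropping the *q*² terms, the Python sketch below compares the exact heterozygote probability under the equal-frequency assumption, 2*pq* + 2^{1 − n}(1 − 4*pq*), with the linearised form 2*q* + 2^{1 − n}(1 − 4*q*). The function names `het_exact` and `het_approx` are invented for this illustration.

```python
def het_exact(n, q):
    """Exact P(Bb) when both populations share minor allele frequency q."""
    p = 1.0 - q
    return 2.0 * p * q + 2.0 ** (1 - n) * (1.0 - 4.0 * p * q)


def het_approx(n, q):
    """Same quantity with every q**2 term dropped."""
    return 2.0 * q + 2.0 ** (1 - n) * (1.0 - 4.0 * q)


for q in (0.1, 0.05, 0.01):
    for n in (1, 2, 3, 10):
        exact = het_exact(n, q)
        approx = het_approx(n, q)
        print(f"q={q:<5} n={n:<2} exact={exact:.4f} approx={approx:.4f} "
              f"diff={approx - exact:+.4f}")
```

Subtracting the two expressions gives an error of 2*q*²(1 − 2^{2 − n}), so the approximation is never off by more than 2*q*² in absolute value, and it happens to be exact for F2.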

## Fixed markers in the native population

For the minke whale, we tested fifty markers on about 400 specimens. There are some cases of non-zero minor allele frequency in the Antarctic population, but the common minke population (Atlantic and Pacific) appears to be entirely homozygote for these markers. One possible explanation could be that the larger Antarctic population allows the maintenance of a wider genetic diversity. Since we are primarily interested in the introgression of (foreign) Antarctic minke into (native) common minke populations, we might assume that *p*_{n} = 1 and *q*_{n} = 0. In that case, we can simplify (1) and the probability of heterozygotes becomes:

$$P(Bb) = 2^{1-n} p_f$$
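As a rough illustration of what this means for a fifty-marker panel, the Python sketch below uses the reduced form 2^{1 − n}*p*_{f} to compute the expected number of heterozygous markers per back-cross generation. The function name `het_prob_fixed_native` is made up for this example, and the value 0.95 for *p*_{f} is a hypothetical placeholder rather than a measured frequency; the sketch also assumes that all markers share the same *p*_{f}.

```python
def het_prob_fixed_native(n, p_f):
    """P(Bb) for an Fn back-cross when the native population is fixed
    for allele B (p_n = 1, q_n = 0); p_f is the frequency of allele b
    in the foreign population."""
    return 2.0 ** (1 - n) * p_f


N_MARKERS = 50   # panel size mentioned in the text
P_F = 0.95       # hypothetical value, assumed identical for all markers

for n in range(1, 7):
    p_het = het_prob_fixed_native(n, P_F)
    expected = N_MARKERS * p_het
    print(f"F{n}: P(Bb)={p_het:.4f}, "
          f"expected heterozygous markers: {expected:.1f} of {N_MARKERS}")
```

The expected count halves with each generation, which is what makes the number of heterozygous markers informative about how many back-crosses separate an individual from its migrant ancestor.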

## Previous literature

As usual, I didn't check the literature too closely before putting pen to paper. But Anderson and Thompson give an overview of various methods, and their paper is probably a good starting point. Many of the methods mentioned depend on fully diagnostic markers, and many apply only to a limited number of generations. Some methods attempt to identify hybrids without known allele frequencies in the native populations, whereas our analysis above implicitly requires these frequencies to be known. We have also used more-or-less standard classification methods (e.g., programs like Structure and Geneclass), but as I understand it, these only look at allele frequencies, and don't take into account the special distribution of genotypes (i.e., increased number of heterozygotes) that is particular to hybrids.

## Acknowledgments

Thanks to Hans J. Skaug for helping out with the math. I am still to blame for any remaining errors, of course - if you find any, I appreciate being made aware of them.
