SNP Evaluation Systems (continued)
As mentioned earlier, there is a SNP about every 700 base pairs in the Bos taurus genome, and because the genome is about 2.8 billion base pairs long, there are about 4 million SNPs. That is too many to deal with practically, so smaller samples of SNPs are used. Specific SNPs can be identified in a variety of ways, but the most practical current approach is the SNP chip, a small piece of plastic or glass carrying tens to hundreds of thousands of small dots that bind DNA. Each dot corresponds to a specific SNP and a small bit of adjacent DNA, and for a given animal, the SNP can be present in zero, one, or two copies, corresponding to having been inherited from neither, one, or both parents.
The most common SNP chip used for cattle is from the company Illumina (Illumina Inc., San Diego, CA). This chip has around 50,000 SNPs and thus is called a 50K SNP chip, but for a variety of reasons only about 40,000 of these are reasonably useful; for example, some SNPs provide redundant or ambiguous information. An attempt was made to scatter these SNPs throughout the 2.8 billion base pair genome, so if they were evenly spaced (and they are not), the 40,000 SNPs would provide a marker about every 70,000 base pairs. Although far from perfect, this number of SNPs turns out to be very useful for selection purposes in some populations of dairy cattle. Subsets of 10,000 SNPs are almost as useful as the 40,000 (VanRaden et al. 2009). Much larger and more expensive SNP chips are used for studying the genetic basis of disease in human populations (Edelson 2008), and much smaller and cheaper SNP chips are being planned for cattle (e.g., choosing the 300 most useful SNPs from the Illumina 50K chip). The current cost to researchers for one 50K SNP chip plus analysis is around US$200; smaller chips could cost as little as US$20 to US$50.
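The arithmetic behind these round figures can be checked with a short Python sketch. The constants are the approximate values quoted in the text; the function and variable names are made up for this illustration, and the spacing calculation assumes perfectly even marker placement, which the text notes is not the case.

```python
# Approximate figures quoted in the text (not exact genome statistics).
GENOME_BP = 2_800_000_000   # Bos taurus genome length in base pairs
SNP_EVERY_BP = 700          # roughly one SNP per 700 base pairs
USEFUL_CHIP_SNPS = 40_000   # useful SNPs on the 50K chip
SUBSET_SNPS = 10_000        # smaller subset mentioned in the text


def total_snps(genome_bp, snp_every_bp):
    """Estimate the number of SNPs in the genome from the average spacing."""
    return genome_bp // snp_every_bp


def mean_spacing_bp(genome_bp, n_markers):
    """Average distance between markers if they were evenly spaced."""
    return genome_bp // n_markers


print(total_snps(GENOME_BP, SNP_EVERY_BP))           # 4000000
print(mean_spacing_bp(GENOME_BP, USEFUL_CHIP_SNPS))  # 70000
print(mean_spacing_bp(GENOME_BP, SUBSET_SNPS))       # 280000
```

On the same even-spacing assumption, the 10,000-SNP subset corresponds to one marker about every 280,000 base pairs.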
Definition of Genomic Selection
Whole genome selection (or genomic selection) might be defined as using genotypes defined by a set of SNPs to select for optimal phenotypes. Considerable mathematics is involved in the process, and some of the properties of SNPs are illustrated by the 27 possible configurations of three SNPs shown in Table 1. Each SNP can be in one of three configurations in the diploid genome, designated arbitrarily with the letters A and B for SNP-1, C and D for SNP-2, and E and F for SNP-3 (for example, AA, AB, or BB for SNP-1). Three SNPs with three configurations each give 3 × 3 × 3 = 27 combinations; with thousands of SNPs, the number of combinations far exceeds billions. For this example, the optimal SNP configuration for percent milk protein is BB DD EE or BB DD EF. When SNP-3 is in the EE or EF configuration, the more B’s and D’s, the higher the milk protein; when SNP-3 is in the FF configuration, SNP-1 and SNP-2 have no effect on milk protein. Now consider productive herd life: this trait is not affected by SNP-3 but is negatively correlated with the numbers of B’s from SNP-1 and D’s from SNP-2. Equivalently, it is positively correlated with the numbers of A’s and C’s in SNP-1 and SNP-2; that is, selecting for certain SNP configurations is equivalent to selecting against others.
|SNP-1||SNP-2||SNP-3||% milk protein||Productive herd life (months)|
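The counting behind Table 1 can be sketched in Python. The genotype labels follow the letters used in the text; the function names are assumptions made for this illustration, and no trait values are included.

```python
from itertools import product

# Three possible diploid configurations per SNP, labeled as in the text.
SNP_GENOTYPES = [
    ("AA", "AB", "BB"),  # SNP-1
    ("CC", "CD", "DD"),  # SNP-2
    ("EE", "EF", "FF"),  # SNP-3
]


def enumerate_configurations(snp_genotypes):
    """List every combination of one genotype per SNP."""
    return [" ".join(combo) for combo in product(*snp_genotypes)]


def count_configurations(n_snps):
    """Number of combinations for n_snps SNPs with three genotypes each."""
    return 3 ** n_snps


configs = enumerate_configurations(SNP_GENOTYPES)
print(len(configs))                  # 27
print(configs[0], "|", configs[-1])  # AA CC EE | BB DD FF
print(count_configurations(20))      # 3486784401
```

Twenty SNPs are already enough to exceed three billion combinations, so listing combinations is not feasible for the thousands of SNPs on a chip.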
Matters become more complicated when selecting for more than one trait, as is also true in conventional animal breeding; this is often dealt with by using a selection index or composite variables such as “net merit.” In Table 1, productive herd life is negatively correlated with percent milk protein when SNP-3 is in the EE or EF configuration, but not when it is in the FF configuration. This implies that the configurations of SNP-1 and SNP-2 drive productive herd life, not the percent milk protein itself. However, these relationships are unrealistically simplified and are meant only to illustrate principles. Although it is unclear whether SNP systems will help unravel relationships such as those illustrated, the potential could be considerable.
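A selection index of the kind mentioned above can be thought of as a weighted sum of trait values. The sketch below assumes a simple linear index; the weights, bull names, and trait values are invented for illustration and are not taken from Table 1 or from any published index such as net merit.

```python
# Hypothetical economic weights per trait (illustrative only, not a real index).
WEIGHTS = {"milk_protein": 2.0, "herd_life": 1.0}

# Hypothetical genetic merit estimates in standardized units (invented values).
CANDIDATES = {
    "bull_X": {"milk_protein": 1.0, "herd_life": -0.5},
    "bull_Y": {"milk_protein": 0.5, "herd_life": 1.0},
}


def selection_index(traits, weights):
    """Weighted sum of trait values: one number summarizing several traits."""
    return sum(weights[name] * traits[name] for name in weights)


def rank_candidates(candidates, weights):
    """Return candidate names sorted from highest to lowest index value."""
    return sorted(
        candidates,
        key=lambda name: selection_index(candidates[name], weights),
        reverse=True,
    )


for name in rank_candidates(CANDIDATES, WEIGHTS):
    print(name, selection_index(CANDIDATES[name], WEIGHTS))
```

In this made-up case, bull_X is better for milk protein, but bull_Y ranks higher once herd life is included, which is the kind of trade-off an index is meant to resolve.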
Putting SNP Chips to Use
The 50K SNP chip provides useful markers for most alleles of genes affecting phenotypes of cattle. Especially important is that essentially all phenotypes, from docility to protein content of milk, can be evaluated (Lee et al. 2008). The problem then becomes obtaining accurate phenotypes from thousands of animals from which DNA can also be obtained for SNP analysis. This is difficult because phenotypes are greatly influenced by environment, so an individual animal’s phenotype can be misleading when matched to its SNP profile. Fortunately, with certain populations of dairy and beef cattle, phenotypic information has been accumulated in the form of sire proofs derived from hundreds to thousands of phenotypes of the respective sire’s offspring. Thus, the SNP profile of a bull can be correlated with phenotypic characteristics of his progeny, such as birth weight, weaning weight, milk production, and somatic cell count in milk. This results in a reasonably accurate phenotype averaged over many progeny (suitably adjusted for various factors such as overall herd performance, age, etc.).
The next step is to take the information from thousands of bulls and determine which SNP profiles correspond to which phenotypes (undesirable phenotypes are just as valuable because, as indicated earlier, one selects for desirable and against undesirable). This process involves solving thousands of simultaneous equations. What is remarkable is that phenotypes are matched with SNPs that mark desirable alleles without any need to know which alleles of which genes are actually involved (Lee et al. 2008).
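One way to picture the simultaneous equations is as a regression of sire proofs on SNP genotypes coded as 0, 1, or 2 copies. The sketch below uses ridge regression, which is a common choice but is an assumption here; the text does not say which statistical method the cited studies used. The genotypes, proofs, and function names are invented, and the proofs were built from known effects (0.5, -0.25, and 0 per copy) so the result can be checked.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def estimate_snp_effects(genotypes, proofs, shrinkage):
    """Ridge regression: solve (X'X + shrinkage * I) b = X'y on centered data."""
    n, p = len(genotypes), len(genotypes[0])
    col_means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    y_mean = sum(proofs) / n
    x = [[row[j] - col_means[j] for j in range(p)] for row in genotypes]
    y = [value - y_mean for value in proofs]
    xtx = [
        [sum(x[i][j] * x[i][k] for i in range(n)) + (shrinkage if j == k else 0.0)
         for k in range(p)]
        for j in range(p)
    ]
    xty = [sum(x[i][j] * y[i] for i in range(n)) for j in range(p)]
    return solve_linear_system(xtx, xty)


# Invented data: 8 bulls x 3 SNPs, genotypes coded as copies of one allele.
GENOTYPES = [
    [0, 0, 0], [1, 0, 2], [2, 1, 0], [0, 2, 1],
    [1, 1, 1], [2, 2, 2], [0, 1, 2], [2, 0, 1],
]
# Invented sire proofs, constructed as 10 + 0.5 * SNP1 - 0.25 * SNP2.
PROOFS = [10.0, 10.5, 10.75, 9.5, 10.25, 10.5, 9.75, 11.0]

print(estimate_snp_effects(GENOTYPES, PROOFS, 0.0))  # no shrinkage
print(estimate_snp_effects(GENOTYPES, PROOFS, 1.0))  # effects pulled toward zero
```

Note that the calculation never refers to genes or known alleles, only to genotype counts and proofs. With about 40,000 SNPs and only a few thousand bulls there are more unknowns than equations, so some form of shrinkage or prior assumption is needed for a unique solution; the toy example has more bulls than SNPs only to keep it easy to verify.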
Probably the best characterized system of selection with SNPs is that for Holstein dairy cattle in North America (VanRaden et al. 2009). This information has been provided to the public in the form of enhanced dairy bull proofs. To develop the system, information from over 5,000 bulls (and a few females) with millions of progeny was used. The rate of genetic improvement can nearly double using this technology (Hayes et al. 2009) because genetically valuable animals can be identified more accurately and at younger ages. One measure of the power of this approach is that, rather than progeny-testing 1,000 bulls per year, the same genetic progress can be made by progeny-testing about 500 bulls that have been screened from a larger population by SNP analysis. Another measure is that the information a SNP analysis adds to the pedigree analysis is equivalent to having an additional 10 to 20 daughters per bull for most traits in dairy cattle (VanRaden et al. 2009), and these daughter equivalents increase as more bulls are genotyped.