Geneticist Shamil Sunyaev on the difference between Mendelian and complex traits, linkage methods in genetics, and functionally relevant allelic variants in genes
Does a genotype always match a particular phenotype? When several genes control one trait, do they interact or influence it individually? Why do quantitative traits follow a normal distribution? These and other questions are answered by Shamil Sunyaev, Associate Professor at Harvard Medical School.
About a hundred years ago there was a fierce debate between Mendelian geneticists and a community which called itself ‘biometricians’. Mendelian geneticists suggested that genetics works according to Mendel’s laws: there are specific variants segregating in the population according to these laws, and this is all there is to genetics. Biometricians, however, suggested that this is not of any interest, that Mendelian genetics is some sort of oddity. Because if we go and analyze corn in the field or a sample of humans, for example, students in a college, or, I don’t know, cows, we have multiple traits which we can observe. We can measure height, we can measure the productivity of cows, we can measure various agriculturally important or evolutionarily important traits. And those usually don’t segregate; they don’t follow Mendelian laws.
However, it doesn’t mean that they are not genetic, because we certainly see that relatives are similar to each other. If we ask how much of the variation in a trait is familial, then, for example, for height, about 80% of the variation can be explained by the height of both parents. And the idea was that this has nothing to do with Mendelian genetics, meaning that Mendelian genetics is not really useful for agriculture and has nothing to do with evolution, because most phenotypes, most traits important for evolution, actually follow this complex trait pattern.
In 1918, there was a fundamental paper published by Fisher suggesting the following model. These complex non-Mendelian traits follow the laws of genetics, but in a different way. They are influenced by many different genes. And each locus actually follows Mendelian segregation in the population, which is now very obvious because we know about the nature of DNA variants, but in 1918 it wasn’t as clear to Fisher and the community. So each locus follows Mendelian segregation. However, the resulting trait is determined by the influence of each gene separately, so the sum of these individual influences, plus extra non-genetic influences, gives the value of the trait.
Fisher was able to explain a lot of observations. He was able to explain why many of the traits which we call quantitative follow what is known in statistics as the normal distribution, the well-known bell-shaped distribution. He was able to explain the response to artificial selection in agricultural species, and he was able to explain a lot of evolutionary observations. So his model became well accepted by the community and created a bridge between Mendelian genetics, evolution, and the agricultural genetics of complex traits. Now, of course, we are mostly interested in traits important for medicine.
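To make the link between the additive model and the bell-shaped distribution concrete, here is a minimal simulation sketch. It is only an illustration of the idea described above, not Fisher's own derivation; the function name `simulate_trait` and all parameter values (200 loci, equal effects, allele frequency 0.5, the size of the non-genetic noise) are assumptions chosen for the example.

```python
import random
import statistics

def simulate_trait(n_individuals=2000, n_loci=200, allele_freq=0.5,
                   effect=1.0, env_sd=5.0, seed=0):
    """Additive model: trait = sum of per-locus effects + non-genetic noise.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    traits = []
    for _ in range(n_individuals):
        genetic = 0.0
        for _ in range(n_loci):
            # Each locus carries 0, 1 or 2 copies of the trait-increasing allele;
            # the two copies are drawn independently.
            copies = (rng.random() < allele_freq) + (rng.random() < allele_freq)
            genetic += effect * copies
        # Non-genetic (environmental) contribution.
        traits.append(genetic + rng.gauss(0.0, env_sd))
    return traits

traits = simulate_trait()
mean = statistics.fmean(traits)
sd = statistics.stdev(traits)
# For a normal distribution about 68% of values lie within one SD of the mean.
within_one_sd = sum(abs(t - mean) <= sd for t in traits) / len(traits)
print(f"mean={mean:.1f} sd={sd:.1f} within one SD={within_one_sd:.2f}")
```

No single locus here is normally distributed (each takes only the values 0, 1 or 2), yet the sum over many loci is close to a bell curve, which is the central limit theorem at work.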
We still use this model. However, there are many unknowns.
The unknowns arise from the question of how many genes, how many individual loci, would actually impact an individual trait.
Are the variants involved in the trait very rare, each variant found once in 1000 or 2000 people? And we are not necessarily talking about populations of people; we can talk about various traits in a Drosophila or E. coli population. Do these variants interact? Is the additive model, where each of them influences the trait independently, a reasonable model, or do they not act independently, with specific nonlinear interactions between them?
All of these questions are still not answered. And they are very important in this postgenomic era for designing studies of the genetics of these complex traits. There have been several attempts to approach this genetics in different ways. First, methods used in Mendelian genetics have been applied to a number of complex traits, largely unsuccessfully. The idea of these methods, which are all linkage methods, is that you follow a specific pedigree, a specific family, and you try to find genetic variants or specific places in the genome which follow the same pattern as the trait in the family. So you try to correlate the changes in DNA with the changes in the trait.
These methods have been tremendously successful for Mendelian variation, but did not bear fruit in the analysis of complex traits. Then the community of statisticians realized that a simpler design, association, is a statistically better way to approach complex traits: you take a sample of unrelated individuals and correlate the frequency of a specific allele with the trait value.
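The association design can be sketched in a few lines. The example below is a toy simulation, not the pipeline of any real study: it regresses a trait on the allele count (0, 1 or 2) at one locus in a sample of unrelated individuals. The helper `association`, the sample size, the allele frequency and the effect size of 0.2 standard deviations are all assumptions made for illustration.

```python
import math
import random

def association(genotypes, traits):
    """Regress trait on allele count; return (slope, correlation, t statistic)."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_t = sum(traits) / n
    sxy = sum((g - mean_g) * (t - mean_t) for g, t in zip(genotypes, traits))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((t - mean_t) ** 2 for t in traits)
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    # t statistic for the null hypothesis of no association.
    t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return slope, r, t_stat

rng = random.Random(1)
n = 5000  # unrelated individuals (assumed sample size)

def draw_genotype(freq):
    # Number of copies of the allele carried by one individual.
    return (rng.random() < freq) + (rng.random() < freq)

causal = [draw_genotype(0.3) for _ in range(n)]    # locus that affects the trait
unlinked = [draw_genotype(0.3) for _ in range(n)]  # locus with no effect
trait = [0.2 * g + rng.gauss(0.0, 1.0) for g in causal]

slope_c, r_c, t_c = association(causal, trait)
slope_u, r_u, t_u = association(unlinked, trait)
print(f"causal locus:   slope={slope_c:.3f} r^2={r_c * r_c:.4f} t={t_c:.1f}")
print(f"unlinked locus: slope={slope_u:.3f} r^2={r_u * r_u:.4f} t={t_u:.1f}")
```

In this toy setup the causal locus is detected with a large test statistic even though it explains only a small percentage of the trait variance, which is why a clearly significant association can still correspond to a very small effect.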
Multiple genome-wide association studies have been done for many phenotypes, and thousands of loci have been discovered, starting in 2007–2008. However, most of those have very small effects on the trait. They carry a very small amount of risk. Does it mean that most of these complex traits are influenced by a multitude of variants of very small effect? Does it mean that most of the variants are rare in the population, so we do not have the statistical means to find these associations? Or does it mean that these variants interact with each other in a nonlinear manner, masking these associations and making them difficult to find?
There are several attempts to answer those questions. One is to study models of population genetics, trying to see what is feasible. There are several key results from these models. The question of whether alleles are common or rare, and whether they have a larger or a smaller effect, is related to natural selection. We do not assume that natural selection necessarily acts on the trait itself. For example, if there are genetic variants influencing type 2 diabetes, it doesn’t mean that selection actually cares about the diabetes risk. It may or may not. However, those variants which have effects on biological function are more likely to be under selection. Purifying (negative) selection, as we call it, prevents allelic variants from becoming common in the population. Therefore most of the alleles, most of the genetic variants influencing the trait, would remain at low frequency, and we would need different ways to analyze them. And now, with sequencing technology, we are starting to discover these low-frequency variants.
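The effect of purifying selection on allele frequency can be illustrated with a small Wright-Fisher simulation. This is a textbook-style sketch rather than one of the specific models referred to above; the function names, the population of 200 gene copies, the selection coefficient of -0.1 and the 10% frequency threshold are all assumed values.

```python
import random

def reaches_count(pop_size, s, target_count, rng):
    """Follow one new mutation among `pop_size` gene copies until it is lost
    or is present in at least `target_count` copies.

    Carriers have relative fitness 1 + s (s < 0 means deleterious).
    Returns True if the allele reached `target_count` copies.
    """
    count = 1
    while 0 < count < target_count:
        p = count / pop_size
        # Selection shifts the expected frequency before random sampling.
        p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
        # Random sampling of the next generation (genetic drift).
        count = sum(rng.random() < p_sel for _ in range(pop_size))
    return count >= target_count

def fraction_reaching(s, n_mutations=2000, pop_size=200, target_count=20, seed=3):
    rng = random.Random(seed)
    hits = sum(reaches_count(pop_size, s, target_count, rng)
               for _ in range(n_mutations))
    return hits / n_mutations

neutral = fraction_reaching(s=0.0)
deleterious = fraction_reaching(s=-0.1)
print(f"neutral mutations reaching 10% frequency:     {neutral:.3f}")
print(f"deleterious mutations reaching 10% frequency: {deleterious:.3f}")
```

Under these assumptions, far fewer deleterious mutations than neutral ones ever reach a frequency of 10%, so functionally relevant variants end up concentrated among the rare alleles.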
A separate possibility is that these alleles have very small effects, but there are many of them. There are relatively new statistical methods, developed around 2010, which, instead of finding individual genes or individual variants contributing to variation in complex traits, try to estimate the total amount of trait variation that is due to variation in DNA. They take samples of unrelated individuals and try to fit a model in which a certain amount of variation is explained by this DNA variation collectively. These models are additive; there is no assumption of interactions. It’s not an exact analogy in mathematical terms, but you can think of this model as asking how similar, on average, the phenotypes are of individuals who are more similar genetically.
These analyses indicate that the data on common genetic variation explain a large fraction of phenotypic variation. For example, 45% of the variation in human height can be explained by already collected data on common variants. So this is not theoretical modeling; this is an analysis of the data itself.
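The intuition of comparing phenotypic similarity with genetic similarity can be made concrete with Haseman-Elston regression, a simpler relative of the variance-component methods mentioned above and not necessarily the method the original studies used. In this sketch the product of two individuals' standardized phenotypes is regressed on their genome-wide genetic similarity, and the slope estimates the fraction of variance explained by the genotyped variants. The data are simulated, and every parameter (400 individuals, 100 loci, a true value of 0.5) is an assumption.

```python
import math
import random

def standardize(values):
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def simulate(n_ind=400, n_loci=100, h2=0.5, seed=2):
    """Simulate standardized genotypes and an additive trait (toy data)."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_loci):
        freq = rng.uniform(0.2, 0.8)
        counts = [(rng.random() < freq) + (rng.random() < freq)
                  for _ in range(n_ind)]
        columns.append(standardize(counts))
    genos = [list(row) for row in zip(*columns)]  # one row per individual
    beta = math.sqrt(h2 / n_loci)                 # equal small effect at every locus
    phenos = [beta * sum(row) + rng.gauss(0.0, math.sqrt(1 - h2))
              for row in genos]
    return genos, standardize(phenos)

def he_regression(genos, phenos):
    """Slope of (phenotype product) on (genetic similarity) over all pairs."""
    n, m = len(genos), len(genos[0])
    pairs = 0
    sum_a = sum_y = sum_ay = sum_aa = 0.0
    for i in range(n):
        gi = genos[i]
        for j in range(i + 1, n):
            a = sum(x * y for x, y in zip(gi, genos[j])) / m  # genetic similarity
            y = phenos[i] * phenos[j]                         # phenotypic similarity
            pairs += 1
            sum_a += a
            sum_y += y
            sum_ay += a * y
            sum_aa += a * a
    return (sum_ay - sum_a * sum_y / pairs) / (sum_aa - sum_a ** 2 / pairs)

genos, phenos = simulate()
h2_est = he_regression(genos, phenos)
print(f"estimated fraction of variance explained: {h2_est:.2f} (simulated truth: 0.50)")
```

No individual locus is tested here; the estimate comes entirely from whether pairs of individuals who happen to share more alleles also have more similar trait values.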
A separate line of thought and a separate ongoing debate concern the importance of genetic interactions and how much of the genetic variation in complex traits is due to interactions. Many authors have contributed to different sides of this debate. On the one hand, surprisingly to many, nobody has observed a statistically significant interaction between different genes in the complex trait genetics of humans, in spite of studying thousands, tens of thousands, and for some traits hundreds of thousands of individuals.
This is surprising because all of biology is about interactions. It is common to study what is now called systems biology: pathways and networks, various interactions between different gene systems. However, human genetics didn’t uncover any of that.
On the other hand, it is very possible that these interactions are weak but numerous, and such models would be in agreement with all the observations we currently have.
So we cannot exclude the possibility of these interactions in the architectures of complex traits.
So there are many unanswered questions. However, there are some comforting ideas in the genetics of complex traits. I will use human blood lipids as an example: cholesterol levels, the levels of bad or good cholesterol. In this case, we know the genes involved in Mendelian forms, in individuals with hypercholesterolemia segregating in a Mendelian fashion. Many of the same genes carry the very weak effect alleles which are found in large population studies of individuals without such a severe disease. And many of the same genes also show association with the trait in sequencing studies looking at rare coding variants.
This indicates that the view of the genetics of complex traits as Mendelian genetics at the level of individual genes is actually correct and, more importantly, that all the different genetic studies point to the same biology. From a functional perspective, however, Mendelian or larger-effect variants are usually in genes coding for proteins, while the multitude of small-effect variants is primarily in the non-coding fraction of the genome. And we think, and there are more functional data coming out about that, that they are involved in regulating genes. So this is where we stand, and we really hope that in the following 5–10 years we’ll answer many of those questions.