Chapter 8.2: Genetic Correlations (Video Transcript)
Title: Genetic Correlation and Partitioning
Presenter(s): Patrick Turley, PhD (Analytic and Translational Genetics Unit, The Broad Institute of Harvard and MIT)
Patrick Turley:
So now we’re going to talk about genetic correlation and partitioning, and we’re going to start with genetic correlation.
I’m going to talk more broadly about a term that we call genetic overlap. A lot of traits have really similar genetic architecture. That is, once we’ve found a SNP that is causal for some trait, like educational attainment, we might think that it’s more likely to also be associated with things like cognitive performance or a bunch of other traits, you know, schizophrenia. So we maybe want to get an estimate of how strong that relationship is. If we do find a SNP for one trait, how likely is it that it’s also going to be causal for the other? Just as a piece of vocab that I don’t think has come up, or that we haven’t defined specifically: the state where one SNP is associated with a variety of phenotypes is called pleiotropy. So if someone says, “Oh, there’s pleiotropy,” it means that genes do more than one thing. And because there’s pleiotropy, there’s going to be genetic overlap. So, why might we care about overlap? Well, it might help us untangle complicated causal relationships. If we see that all of the SNPs that are important for education are also important for schizophrenia, then we might say, “Oh, so those two things have a genetic relationship.” If we see that depression is highly genetically correlated with neuroticism, we’d say, “Okay, yeah, the additive genetic contribution to those two traits is similar,” and that can help us as we’re thinking about what might be causing both of them. It also can help us prioritize causal pathways. This is sort of related to the proxy phenotype method. Let’s say that we know that education and cognitive performance are highly genetically correlated, but we don’t have very large samples for cognitive performance.
Well, if we know that they’re highly related, then we could limit the space over which we’re looking at the genome by just taking SNPs that are associated with educational attainment above a certain level and looking for the association with cognitive performance there. As a result, we don’t have to do as large of a multiple testing correction, because we’re doing fewer tests. Also, if we know that things are related and we’re trying to figure out how, it’ll just point us in the right direction.
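The saving in multiple testing can be made concrete with a Bonferroni-style calculation. The sketch below is illustrative only: the figure of one million independent tests is the usual approximation behind the 5e-8 genome-wide threshold, the subset size of 74 is borrowed from the education SNPs discussed later in this session, and `bonferroni_threshold` is a hypothetical helper rather than anything shown in the lecture.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

# Assumed numbers, for illustration only.
genome_wide = bonferroni_threshold(0.05, 1_000_000)  # roughly 5e-8
proxy_subset = bonferroni_threshold(0.05, 74)        # roughly 6.8e-4

print(f"genome-wide threshold:  {genome_wide:.1e}")
print(f"proxy-subset threshold: {proxy_subset:.1e}")
```

Testing only the proxy subset loosens the per-SNP threshold by several orders of magnitude, which is where the gain in power comes from.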
Audience member: How does that differ from candidate gene?
Patrick: So I think that, at least historically, when people have said candidate gene, it was based on sort of a theoretical biological relationship. We said, “Oh yeah, this gene, we think it has to do with this,” and so we’re going to test for that. Whereas when I say proxy phenotype, the relationship that we’re using to select SNPs is less a theoretical one than an empirical one. Yeah, great.
So when we say overlap, there are sort of two ways to think about it: one’s enrichment and one’s genetic correlation. Enrichment is the idea behind the proxy phenotype method: are SNPs that are important for phenotype A also important for phenotype B? To test this, we take the SNPs with a p-value of less than p0 in the GWAS for phenotype A and test them in a GWAS for phenotype B. That can be pretty successful in seeing whether the SNPs in that subset are more likely to be significant for phenotype B. That’s sort of the very high-level question. But there are a few technical questions. First off, how would you pick the threshold? Are we just going to look at SNPs that are genome-wide significant, or should we maybe look lower in the distribution? How do you deal with LD? Let’s say you have two SNPs that are highly correlated with each other, but this SNP is really important for phenotype A and not at all important for phenotype B, and this SNP is really important for phenotype B but not for phenotype A. If we just compare GWAS summary statistics, we would say, “Oh yeah, these are both important,” and so these traits might be related. But when it comes down to the actual effect of those particular SNPs, they’re not related at all. So we should keep in mind that when we have GWAS summary statistics, you shouldn’t think SNP, you should think sort of the region around the SNP.
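The look-up step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the SNP ids and p-values are made up, `lookup_enrichment` is a hypothetical helper, and the sketch ignores LD entirely (a real analysis would first reduce the selected SNPs to approximately independent regions).

```python
def lookup_enrichment(p_a, p_b, p0, alpha=0.05):
    """Select SNPs with p < p0 for phenotype A and look them up in phenotype B.

    p_a, p_b: dicts mapping SNP id -> GWAS p-value for each phenotype.
    Returns (selected SNP ids present in both, share of them with p_B < alpha).
    """
    selected = sorted(snp for snp, p in p_a.items() if p < p0 and snp in p_b)
    if not selected:
        return selected, float("nan")
    hits = sum(1 for snp in selected if p_b[snp] < alpha)
    return selected, hits / len(selected)

# Toy summary statistics (made up for illustration).
p_a = {"rs1": 1e-9, "rs2": 3e-8, "rs3": 0.2, "rs4": 4e-10}
p_b = {"rs1": 0.01, "rs2": 0.6, "rs3": 1e-5, "rs4": 0.03}

selected, share = lookup_enrichment(p_a, p_b, p0=5e-8)
print(selected, share)
```

Changing `p0` is the threshold choice raised in the discussion: a looser threshold selects more SNPs but with weaker evidence for phenotype A.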
Audience member: So this is another generality. Is it usually the case that genes that are implicated in the same phenotype, are they on the same chromosome?
Patrick: Oh yeah, that’s a question I don’t know the answer to actually. Does anyone, is there anyone with a bio background who knows this? I was wondering this just the other day. So the question is if you have a gene that you know is important for some phenotype, are other genes that are going to be important for that phenotype maybe going to be nearby, like on the same chromosome or nearby? Yeah, either close to each other on the chromosome or just on the chromosome.
Audience member: We talk about the cis transcription regulation elements, local control of the gene expression.
Patrick: Yeah, yeah. So I guess for expression, we know that there’s some localness.
Audience member: Most of the genes that control expression are nearby, but there can be some that are far away. My understanding is that it depends on the phenotype. There are some things like immune function, where there’s a huge density of immune-related genes in an area called the major histocompatibility complex on chromosome 6. And then, as you and I were talking about the other day, Patrick, there are the nicotinic acetylcholine receptor genes, which tend to be close to one another and are associated with smoking behaviors. But for other things, like educational attainment, there tend to be hits over the entire genome. I wonder if there’s local correlation.
Patrick: I mean if it’s highly polygenic.
Audience member: I guess my question is related more directly to one biological pathway.
Patrick: Mm-hmm. Yeah, I wish I knew the answer as well, and someone maybe does, but I don’t know. And it’s relevant for a lot of the things that we do. Remember how I told you one of the assumptions in LD score regression is that these betas are independent? If genes that are important for specific things tend to be close to each other, that also means that they’re in high LD with each other, and so it might violate the assumptions of LD score regression. So, I don’t know the answer, but I wish I did.
So here’s another question. Let’s say that we have our 74 education-associated SNPs, and we look them up in a height GWAS, and we find that all of them are associated with height. Do we think that that is a signal that the two traits are strongly related? I’m going to say no. Who can give me a reason why we might not expect that? So, why wouldn’t you want to read too much into the fact that every single education SNP is associated with height as well?
Audience member: Direction effect?
Patrick: Yes, I guess the direction could be one thing, because the direction, if it’s associated, could go either way. What else might we think about? Here we’re just looking for general enrichment, we’re not thinking about direction.
Audience member: I don’t know if this is maybe missing the point, but maybe if height and educational attainment are correlated?
Patrick: Mm-hmm, yeah, I mean, if that’s the case then we actually do think they’re related.
Audience member: It just acts through the channel that taller people get better education.
Patrick: I guess, so, yes, maybe I should have picked traits that are unrelated. Let’s say these are traits that aren’t related. I guess maybe I shouldn’t make you keep guessing, but maybe it’s good that you were thinking. But height’s really polygenic, right? So why is that relevant? Suppose we had a trait that wasn’t polygenic, let’s say there was only one gene that mattered for it, and we find that that gene also is important for height. That wouldn’t tell us much, because a random SNP drawn from the genome is likely to be associated with height. So we should keep that in mind also when we’re doing these tests. We’ve got to think, what is our expected significance? Because we’re asking, “are things more significant than we expect?”, you have to ask the question: what do we expect? And saying that we expect the null may not be right.
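One way to make “what do we expect?” concrete is to compare the look-up set with randomly drawn SNP sets of the same size, so that the baseline reflects how polygenic the second trait is. The sketch below is an assumption-laden illustration, not the method used for the results in this session: the toy p-values are made up, `empirical_enrichment_p` is a hypothetical helper, and random draws here ignore LD and allele frequency, which a real analysis would have to match.

```python
import random

def empirical_enrichment_p(observed_share, p_b, set_size, alpha=0.05,
                           n_draws=2000, seed=0):
    """Compare an observed share of nominally significant SNPs with the share
    seen in randomly drawn SNP sets of the same size (an empirical null)."""
    rng = random.Random(seed)
    snps = list(p_b)
    at_least_as_extreme = 0
    for _ in range(n_draws):
        draw = rng.sample(snps, set_size)
        share = sum(1 for s in draw if p_b[s] < alpha) / set_size
        if share >= observed_share:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return (1 + at_least_as_extreme) / (1 + n_draws)

# Toy "polygenic" trait B: 30% of SNPs are nominally significant (made up).
p_b = {f"snp{i}": (0.001 if i < 300 else 0.5) for i in range(1000)}

print(empirical_enrichment_p(0.30, p_b, set_size=20))  # share equal to background
print(empirical_enrichment_p(1.00, p_b, set_size=20))  # share far above background
```

With a highly polygenic trait, a share of 30% significant SNPs is unremarkable even though it would look like strong enrichment against a strict null of 5%.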
There’s also a question of one-sided or two-sided tests, which gets to this idea: do we want to think about the sign or not? If we see a positive effect in education and we just want to see if that goes along with a positive effect in height, then that’s a one-sided test. But if we just want to ask, “do we expect a large effect in height?”, and we don’t care if it’s negative or positive, then that would be a two-sided test. So that’s just a decision that we need to make.
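The difference between the two choices shows up directly in the p-value computed from a z-score. The following is a minimal sketch that assumes the test statistic is standard normal under the null; `one_sided_p` and `two_sided_p` are hypothetical helper names.

```python
import math

def one_sided_p(z):
    """P(Z >= z) under a standard normal null: tests for a positive effect only."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def two_sided_p(z):
    """P(|Z| >= |z|) under a standard normal null: ignores the sign."""
    return math.erfc(abs(z) / math.sqrt(2))

for z in (2.0, -2.0):
    print(z, one_sided_p(z), two_sided_p(z))
```

A large negative z-score counts as strong evidence in the two-sided test but as no evidence at all in the one-sided test for a positive effect.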
Here are some examples of the kind of thing I told you about. We’re about out of time, but I will stop at the end of the overlap stuff. So we take the 74 education SNPs, and then I want to see if they are associated with the size of your thalamus in a GWAS of that trait. The 45-degree line is what we would expect to see if they were all null, and we see that it does sort of peel away from the null, so there might be a little bit of extra signal. But we can test how much enrichment there is based on how polygenic we think the trait is. We get a p-value of 0.08, so we can’t reject that the amount of inflation that we’re seeing is just due to chance. However, if we look at the education SNPs in a GWAS of cognitive performance, we start seeing it peel away pretty quickly, and our p-value there is 0.002, just based on how significant these are relative to what we’d expect. If you look at schizophrenia, it peels away even more. Schizophrenia, I think, is a funny case. If we look at cognitive performance, the sign concordance is 90%, which means that 90% of the time a SNP that increases education is also estimated to increase cognitive performance. So it is highly enriched. For schizophrenia, we see it’s highly, highly enriched. We can see just by looking at these points that there’s a lot of enrichment. However, the sign concordance is 51%, which means that a SNP that’s associated with education is likely to be important for schizophrenia, but we can’t really guess if it’s going to be protective against schizophrenia or the opposite, more likely to lead to schizophrenia. So that’s kind of a funny case. So I think that we should break for lunch, and we can talk about genetic correlation briefly after lunch.
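Sign concordance can be checked against chance with an exact binomial test, since under no relationship each SNP’s sign should match about half the time. The sketch below is illustrative only: the counts 67 and 38 out of 74 are assumptions chosen to approximate the 90% and 51% quoted above (the actual number of SNPs available in each look-up is not given), and `sign_concordance_p` is a hypothetical helper.

```python
from math import comb

def sign_concordance_p(n_concordant, n_snps):
    """One-sided exact binomial p-value: probability of at least n_concordant
    matching signs out of n_snps if each sign matched with probability 0.5."""
    tail = sum(comb(n_snps, k) for k in range(n_concordant, n_snps + 1))
    return tail / 2 ** n_snps

# Assumed counts: about 90% and about 51% of 74 SNPs.
for label, k in (("cognitive performance", 67), ("schizophrenia", 38)):
    print(f"{label}: {k}/74 concordant, p = {sign_concordance_p(k, 74):.3g}")
```

Under these assumed counts, concordance near 90% is very unlikely by chance, while concordance near 51% is indistinguishable from a coin flip, which matches the contrast drawn between cognitive performance and schizophrenia.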