Software Tutorials: MTAG (Video Transcript)
Title: Multi-trait Analysis of GWAS (MTAG)
Presenter(s): Patrick Turley, PhD (Analytic and Translational Genetics Unit, The Broad Institute of Harvard and MIT)
Patrick Turley:
We’ve been toying with what we were going to do with this afternoon lecture. We had talked about maybe doing something on replication, and we also talked about doing MTAG. We talked about doing both – I don’t have time for both, so we’re just going to talk about this multi-trait analysis of GWAS paper. It’s one part self-indulgent, because this is the thing that I’ve been working on for the past year, but I also think it’s a nice illustration of a lot of the topics that we’ve been talking about for the past several days. So, I figured we’d go through, talk about this paper, and focus on some of the things that we’ve been learning about for the past few days.
Audience comment: It’s also something that is going to become very common in the next few years.
Patrick: It’s going to become so common, yeah.
Audience comment: It’s good to know about. And also, Patrick is going to talk about things to look out for, so you know if it’s being applied well or not being applied well.
Patrick: Yep, okay, great.
Audience question: Where are the slides?
Patrick: Oh, sorry. Oh yeah, so this is going to be a really informal lecture, which means that there’s actually very little information on these slides. I’m going to show a bunch of pictures, but I’m actually going to do a lot of the work on the whiteboard. The work that I’m going to do on the whiteboard is in the paper – but not the paper that you guys have access to. We’re working on it. The reason why you don’t have access to it is because it’s not written yet. The version that we gave you was a previous iteration of this method that was based on maximum likelihood, so if you read it, then that’s what you read. We realized that it could be generalized to a GMM framework with much weaker assumptions, and so we’re going to talk about that framework today. Once that paper’s finished being updated and written, we’ll post it to bioRxiv, and then you’ll have that one. So that’s why. The plan was to have done this before the Institute, but you guys only give me like 10 minutes a day to do any of my own stuff.
The idea here is something we’ve been talking about all week. For people doing a genome-wide association study – or any study – what’s the first thing we ask them about? Sample size and power. And that’s the problem we have: a lot of times, we want to look at outcomes where we just don’t have big enough samples. If you have a really small sample, you don’t get very many hits, you get weak polygenic scores, and so on. However, in a lot of cases, there are people who are studying different but related outcomes. So, for example, what I’m going to show you today: there’s a medium-sized GWAS of depression with maybe 100 to 300 thousand individuals, and we also have a GWAS of subjective well-being – how people respond to the question “how happy are you?” Now, these two things aren’t exactly the same thing, and so you might think that it’s probably incorrect just to meta-analyze them together. But they seem like they’re probably really closely related. Raymond showed us the LD score regression results, and we see that knowing the βs for one trait – the effect sizes for one trait – is informative about the βs for another trait. So it would be nice if we could figure out a way to get the information out of the GWAS for subjective well-being and import that into a GWAS of depressive symptoms, or the other way around – just so that we can increase our power and get all the nice benefits that we get from running well-powered studies. So, what’s the right way to do this?
So, MTAG. Let me just talk about some of the hurdles that we’re going to face in trying to design a method to do this. First off, in a lot of cases, we don’t have the individual-level data. We just get people to release their GWAS summary statistics, because that’s something that they’re allowed to post without any privacy issues, but we don’t have the actual genotypes of the individuals. The method that we want to come up with is one that can use just these publicly available summary statistics. Another problem is that the summary statistics often include the same individuals. With the UK Biobank coming out, if the UK Biobank asks the question about the phenotype that you’re interested in, then it’s going to be in this GWAS and that GWAS, and so we’re going to have a lot of overlap in what’s used. And the reason this is a problem is that if we took the summary statistics for one trait that was estimated in the UK Biobank and for another trait that was estimated in the UK Biobank, they may co-vary partially because they’re genetically correlated, but they’re also going to co-vary because they came from the same sample, so the estimation error is going to co-vary. We’re going to need to be able to address that, especially because a lot of times we don’t even know how much overlap there is – we suspect that there might be some, but we’re just not sure. Another thing that would be useful: there are a bunch of methods that allow you to jointly analyze different traits, but it would be great if, when we were done, we could say, “Yeah, these estimates correspond to the trait that we’re interested in.” We don’t want to get results and say, “Well, because we jointly analyzed this with depression and subjective well-being and whatever else we threw into it, it corresponds to some weird composite phenotype that we don’t understand.” It would be nice if the results actually correspond to the trait that we’re interested in. And, just as a computational issue, it would be nice if we could actually get the estimates in finite time. So, MTAG – the reason that I’m excited about it is because it deals with all of these really practical issues that we face in running these analyses.
So, if you read the paper – which, of course, you did – you saw that we originally made an assumption based on a random effects model. I don’t think that we talked about this super carefully, but in LD score regression, as you recall, we were treating these βs as if they were random variables. That may or may not be just a little bit weird, but it seems to get us a lot of mileage, and so we just run with it. And so, we’re going to do that again.
In the original version that you’ve read, we made the assumption that the βs were normally distributed with mean 0 and variance-covariance matrix Ω. Okay, so these [annotates β and 0] are vectors, and this [points to Ω] is a matrix. Previously, when we talked about a vector of βs, we meant the effect sizes for one trait across a bunch of different SNPs – so if we had 10 million SNPs, β had 10 million elements, one for each SNP. In this case, I should actually put a j on it; that would make it more clear. Here, we’re going to assume that β_j corresponds to a single SNP but for a set of T traits. So, if we’re looking at three traits, then β_j has three elements. And we’re going to say that it has mean 0 and a variance-covariance matrix, which I’ve defined to be Ω. This is going to be related to the genetic correlation, because if the βs are correlated, then these traits will be correlated genetically.
So, I said this is the one that you read, but we’re going to relax this and instead just say that the expected value of β_j is equal to zero – these are vectors – and that the variance of β_j is equal to Ω. So, I’m not assuming anything else about the distribution; we just know what the expected value is, and we assume that this Ω exists. Note that I’ve not put a subscript j on this Ω, and this is actually going to be one of the key assumptions that allows us to get the mileage out of MTAG: that there’s a single Ω, the same for every SNP, that determines the relationship between these effect sizes. That’s big – we’re going to come back and talk about how this might be violated, and how that’s going to affect all of the results that I’ll show you.
So given this, MTAG is a method of moments estimator, and for those of you who haven’t seen method of moments, I thought we would take a minute to talk about what that is. We’ve actually seen a bunch of moment methods since we’ve been here; for example, the twin models David covered in the twin lectures – that’s a moment method. The idea is we need some function of the data whose expectation we know something about. So, for David’s twin model – one second, let me think for a minute, I’m going to write this differently – we would say that the correlation between monozygotic twins for the outcome is... I should have written this down before because I’m going to mess this up... we have some additive factor... I’m really bad at twin models, I shouldn’t have even gone here. We have some correlation, yes, A² + C², right, good enough, right? Okay, that’s right. So, this is going to be one moment condition: we know something about our data, and we want to estimate these parameters. And then the correlation for dizygotic twins was just ½A² + C², right? We can estimate this correlation from the data, and we can estimate that one from the data. And so, now we have two equations and two unknowns, and we can just solve it, right? And that’s why it’s called method of moments: because we have these two things, which we call moment conditions, and we solve them.
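[A minimal reconstruction of the whiteboard algebra here, since the system is exactly identified:]

```latex
% Twin-model moment conditions: two equations, two unknowns.
r_{MZ} = A^2 + C^2, \qquad r_{DZ} = \tfrac{1}{2}A^2 + C^2
% Exactly identified, so solve directly:
A^2 = 2\,(r_{MZ} - r_{DZ}), \qquad C^2 = 2\,r_{DZ} - r_{MZ}
```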
So this is fine, it’s perfectly identified, we have exactly the number of equations and unknowns. Let’s say that we also had, like, non-twin siblings. We could add another moment condition like this, and under some assumptions about how we’re treating siblings versus twins, like even though they’re different ages, these relationships hold, we could add a moment condition like this if we wanted. And note that this is adding more assumptions, so these moment conditions are defining the assumptions of our model, but now we have three equations but still only two unknowns. So, we could take, like, these 2 [gestures to first and second equations] and solve for A and C or we could take these two [gestures to first and third equations] and solve for A and C. And it might give us different answers, right?
So, the question is, how do we deal with that? Well, generalized method of moments deals with questions like this. I could rewrite each of these so that they’re equal to zero, all right? I’m just subtracting these terms onto this side, right? So, instead of requiring each of these to be exactly equal to 0, I can make each of them as close to 0 as possible. But then we still have a decision about which one we’re going to put the most weight on. We could put more weight on this one [gestures to second equation] and less weight on this one [gestures to third equation] if we think that this one [gestures to second equation] is maybe more reliable, but we’re making this kind of weighting decision. So for certain values of A and C, each of these is not going to be exactly 0, and so there’s some sort of error. We could square that error and minimize the sum of squared errors, or we could do a bit of weighting across the moment conditions. And that’s the general principle behind generalized method of moments: we have more moment conditions than we have parameters, and so we need some weight function; we’re still okay, but the estimate that we get is going to depend on the weight function.
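[A minimal code sketch of the over-identified case just described: three moment conditions, two unknowns, and a weight matrix W that determines the trade-off. The correlation values and the identity weights are made-up illustrations, not numbers from the talk:]

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical observed correlations for MZ twins, DZ twins, and siblings.
r_mz, r_dz, r_sib = 0.60, 0.35, 0.33

def moments(theta):
    a2, c2 = theta
    # Each condition is written so it equals 0 at the true parameters.
    return np.array([
        r_mz  - (a2 + c2),        # MZ twins share all additive variance
        r_dz  - (0.5 * a2 + c2),  # DZ twins share half, on average
        r_sib - (0.5 * a2 + c2),  # treating siblings like DZ twins adds an assumption
    ])

def gmm_objective(theta, W):
    m = moments(theta)
    return m @ W @ m  # quadratic form: weighted sum of squared moment errors

W = np.eye(3)  # identity weights: a plain sum of squares
fit = minimize(gmm_objective, x0=[0.3, 0.2], args=(W,))
print(fit.x)   # estimates of (A^2, C^2) under this choice of weights
```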
Okay, so we’re going to take this now into the MTAG model. Let’s say that I have a GWAS summary statistic for some trait – this is β̂_{j,s}, for SNP j and trait s, just one of my traits. And I want to find the best linear predictor of this GWAS coefficient conditional on the true effect for a different trait. Just using our OLS properties, this expectation is going to be equal to the covariance of β̂_{j,s} and β_{j,t} – I’m going to leave “GWAS” off everywhere, just because all of these are GWAS coefficients; if it has a hat, I mean the GWAS estimate, just to make notation easier. So this [gestures to β_{j,t}] is the true effect of the SNP on trait t, and this [gestures to β̂_{j,s}] is the estimated effect on trait s – all over the variance of β_{j,t}.
Audience question: Don’t you need a hat?
Patrick: On this one [gestures to β_{j,t}]? Yeah – no, no, I don’t need it there, because what I’m trying to do is predict the GWAS estimate as a function of the true value. In this case, I’m saying I want to regress the GWAS estimate on the true value.
The coefficient of such a regression is going to be this thing here [gestures to β̂_{j,s}]. Because this is the estimate, it’s equal to the true value plus some error, and that error is going to be independent of the actual effect size. So, the covariance of the estimate with this true value here is just going to be the covariance of the true values – I mean, I could write this as β_{j,s} plus ε, then say that ε is independent of β_{j,t}, and then erase it, but I’m just going to erase the hat to skip those steps.
And so now we have the covariance of the effects on traits s and t over the variance of the effect on trait t, and we know what those are because of this assumption that we made here. So I can just write it out: I’m going to call the (t,s)-th element of Ω ω_{t,s}, and the (t,t)-th element ω_{t,t}, so the best linear predictor is (ω_{t,s}/ω_{t,t}) β_{j,t}.
Okay, so now we have a moment condition, because I’m saying the expected value of our GWAS estimate is equal to this thing. I could subtract this over to the other side, and then we have one of these moment conditions for every trait. If we predict the GWAS estimate for trait 1 using trait t, that’s one element; if we do it for trait 2 using trait t, that’s a second element. We’re going to have a vector of moment conditions, and I’ve rewritten this in vector form here [note that slide has not changed for video – go to minute 25 for reference]. In this case, β̂_j is our vector of GWAS estimates. Instead of writing a whole vector of ω_{t,s}’s, I’ve turned this into just a vector, which I call ω_t: the vector of covariances between the effect size for trait t and everything else. This is just the same, and this is just the same [points to slides]. So now we have moment conditions – we have T of them – and we can solve them.
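[The moment conditions just described, collected in one place as they appear on the board:]

```latex
% Best linear predictor of the GWAS estimate for trait s given the
% true effect for trait t, using the elements of Omega:
E\big[\hat{\beta}_{j,s} \mid \beta_{j,t}\big]
    = \frac{\omega_{t,s}}{\omega_{t,t}}\,\beta_{j,t}
% Stacking one such condition per trait s = 1, \dots, T gives the vector form:
E\Big[\hat{\beta}_j - \frac{\omega_t}{\omega_{t,t}}\,\beta_{j,t}\Big] = 0
```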
Audience question: [Unable to hear on video].
Patrick: So, I guess I should have said this up front. What we really want to know – the estimate we want – is β_{j,t}, the effect of SNP j on whatever trait t we’re focused on. So, if we want to know the effect of SNP j on depression, then t is depression. And even though I’ve used a different subscript here, s could be equal to t, in which case it’d be, “What’s the expected value of our estimate for trait t, given the true value for trait t?” And you see how this reduces here: the covariance of the effect size with itself is just equal to the variance, so these two things cancel, and the expected value of the GWAS coefficient given the truth is just the true coefficient.
Okay, I kind of glossed over how we might get at these weights. So, if we have a vector of moment conditions, which I’ll just call m – this is my principles-of-GMM aside again – just to give you some notation: we might be interested in some parameter θ. In this case, θ is β_{j,t}. We have some parameter θ, and m(θ) is a vector-valued set of moment conditions – in this case, it’s this thing [points to upper half of slide] as a function of β_{j,t}. And then for the weight function, we have some big matrix W here. When I said we minimize a kind of weighted sum, this is actually the thing that we’re minimizing [points to equation on board]. If you’ve had a little bit of background in linear algebra, this sort of function is called a quadratic form. And even though there’s a vector on each side of these... let me try to think of the right way to explain this. Let’s assume that W was equal to I, the identity matrix – just ones down the diagonal. When you multiply this out, what you get is just the sum of the squared moment conditions, which is how I described it at first. But if you wanted to give more weight to a certain moment condition, you would change the diagonal element corresponding to that moment condition to be a little larger or smaller. You also have all these off-diagonal elements that can account for potential covariance between your moment conditions.
And so we have to pick this W, and the question is, how do we get at it? Well, by the properties of GMM, any W will, in principle, get you to the right answer. It’ll be a consistent estimator: if you have a big enough sample, you’re going to converge on the truth no matter what you choose. But maybe we want something that’s more efficient, something that’s going to get us to the right answer more quickly, and from the theory of GMM, it turns out that the efficient weight matrix is just the inverse of the variance of the moment conditions. So, if we want to figure out what the efficient GMM estimator is, we need to take the variance of this thing [points to top half of slide].
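[In symbols, the objective and the efficient weight matrix being described:]

```latex
% GMM: minimize a quadratic form in the moment conditions m(theta).
\hat{\theta} = \arg\min_{\theta}\; m(\theta)^{\top}\, W\, m(\theta)
% With W = I, this is the plain sum of squared moment conditions.
% The efficient choice is the inverse variance of the moments:
W_{\text{eff}} = \mathrm{Var}\big(m(\theta)\big)^{-1}
```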
And so, I’m not going to work through that carefully, but I’m just going to point out where the different pieces come from. These are missing inverses, by the way – this should say inverse, and this should say inverse; I just noticed that [points to bottom half of slide]. But this matrix in the middle is actually the weight matrix that you get, and you can kind of see that.
So, the variance of our estimate... oh, I never showed this last thing. I need to define some notation. The variance of β̂_j is equal to the variance of β_j plus the error ε_j. Then, assuming these things are independent [points to β_j and ε_j], this is going to be equal to the variance of β_j plus the variance of ε_j. We’ve defined this [points to β_j] to have variance Ω from here, and we’re going to define the variance-covariance of the error to be Σ_j. Now, I put a subscript j on this one [points to Σ_j], and the reason is that this is supposed to be the sampling variance. You can think of this, at baseline, as just the standard error: the standard error measures how much variance there is due to the error in your estimate, and that’s what this is supposed to do. And if, for one SNP, you have a really large sample size, but for another SNP a few cohorts got dropped and you have a smaller sample size, we’re going to allow for that by putting the subscript here – when you have a bigger sample, Σ_j is going to be smaller, to account for the more precise estimate.
Audience question: [Unable to hear on video].
Patrick: Yeah, so it depends on why there’s a difference. If the variance is different just because of a difference in estimation error due to sample size differences, then it will capture that. If the difference is because we actually think that Ω is different across SNPs, then it won’t be able to capture that. We’ve had to assume that Ω is the same for all SNPs, and we need that in order to get our estimates.
Let’s just think for a minute. We want to know the true β – the true genetic β, absent any confounds. So, I’m going to treat β throughout as the true β, not including stratification biases or anything like that, and anything that causes the GWAS estimate to deviate from the truth, I’m going to store in here [points to Σ_j]. So now Σ_j not only includes the sampling variance; it will also include stratification bias and things like that. And so, when we’re estimating, we want this β_j – I mean the true β_j, not β_j with error or biases in it. So we can take our moment conditions – I told you the efficient weight matrix is the variance of the moment conditions. The variance of β̂_j, we worked out here, is Ω plus Σ_j. And for the variance of this other term, with a little bit of work, you can get this term here. So that’s where these come from.
And so then we plug this into our objective formula here, and we do some calculus: we take the derivative with respect to β, set it to 0, and what tumbles out is this guy [points to bottom formula on slide]. This is great, because we now have a closed-form solution for the efficient GMM estimator. The only assumption that we’ve made is this one here [points to initial assumptions on slide]. So it’s clear where identification is coming from, and we can now use this and apply it to different traits.
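[A minimal sketch of what this closed-form, single-SNP solution looks like, assuming Ω and Σ_j are known (in practice they are estimated, as discussed later). It follows the GLS structure of the efficient-GMM solution described here; treat it as illustrative, not as the paper’s reference implementation:]

```python
import numpy as np

def mtag_single_snp(beta_hat_j, Omega, Sigma_j, t):
    """MTAG-style update for one SNP j and one target trait t.

    beta_hat_j: length-T vector of GWAS estimates for SNP j.
    Omega:      T x T variance-covariance of true effects (same for all SNPs).
    Sigma_j:    T x T variance-covariance of estimation error for SNP j.
    t:          index of the trait we want the estimate for.
    """
    omega_t = Omega[:, t]
    w = omega_t / Omega[t, t]  # regression of each beta_hat on beta_{j,t}
    # Residual variance of the moment conditions around w * beta_{j,t}:
    V = Omega - np.outer(omega_t, omega_t) / Omega[t, t] + Sigma_j
    # Efficient GMM here amounts to a GLS regression of beta_hat_j on w:
    num = w @ np.linalg.solve(V, beta_hat_j)
    den = w @ np.linalg.solve(V, w)
    return num / den
```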
Audience question: [Unable to hear on video].
Patrick: So you’re saying you could add additional moment, so we would have not only just one for each trait but we would have T squared of them. We could choose to not only vary s, but we could also be varying t. Is that kind of what you mean?
Audience response: [Unable to hear on video].
Patrick: Okay, well, so the problem is going to be that we need to come up with this Ω, we’re not going to just kind of make it up. We’re going to estimate it, and so we’re going to estimate it using all SNPs. And so if the Ω only holds for some subset of SNPs and we don’t know which subset that is, then I don’t know how you estimate this parameter. Like, in principle, you could say, “Let’s run MTAG and let’s restrict ourselves just to, like, coding SNPs,” and maybe the set of coding SNPs have a different Ω. You could estimate the Ω among the particular type of SNP that you’re interested in, but you’d have to make a choice before you get started that there is some class of SNPs that all have the same Ω matrix.
Audience question: [Unable to hear on video].
Patrick: Okay, I should have explained that, and I didn’t. So the question is about LD score regression – we spent a bunch of time talking about that. I’ll come back to that part after we show results. Well, so Ω, I mentioned, is what’s capturing the genetic correlation, right? The correlation of the effect sizes lives in here, just because that’s what we’ve defined it to be. Now, one of the reasons that you can’t just take the correlation of your GWAS coefficients is that the error is correlated due to overlap. By including this Σ, if there is overlap across your traits, that means this Σ matrix has nonzero entries on the off-diagonal. If they were all estimated in independent samples, then the error would be uncorrelated across traits, and this thing would be diagonal, because the error for each of your estimates is uncorrelated when they’re drawn from independent samples. But let’s say it’s all the same people – then your errors co-vary. By allowing the Σ matrix to be non-diagonal, that’s how we deal with the overlap problem.
I want to show you some quick results of what happens when you apply this, and then after we get through that, I want to talk about some of the limitations: how strong the assumptions are that we’re making and how far we can take these results.
Audience question: [Unable to hear on video].
Patrick: It feels like it’s going to be similar, but I’d have to think about it. Yeah, because seemingly unrelated regressions allows you to deal with correlation in the errors, right? So it feels like they’re related concepts, but I’m not sure.
Okay, so, for our application, we took this as an expansion of the sample from the Okbay et al. subjective well-being paper that I think you guys have already seen. And so, we were looking at depressive symptoms, neuroticism, and subjective well-being. And so, our samples all come from different places, and so one of the biggest additions relative to the Okbay paper you saw before is the inclusion of a large 23andMe sample. But you can see that we have a big sample for subjective well-being, a big sample for neuroticism. In the UK Biobank, we have measures for all three of these phenotypes. And what MTAG is going to be really helpful for is this 23andMe sample. We have this really big sample. They overlap, we presume. We have no idea how many people are in this overlapping segment, but we want to be able to combine the information across these things. And so, MTAG is going to allow us to analyze all these together despite the large amount of overlap across these summary statistics.
So, to give you a sense of how much you gain by applying MTAG, so this is a Manhattan plot of a GWAS for depressive symptoms. So, we’ve expanded our sample size a lot, and so we’ve already jumped up to 30 SNPs from – [asks audience] how many did we have before?
Audience response: Three.
Patrick: Okay, we’ve gone from 3 to about 30.
Audience comment: Two?
Patrick: Just because of our expansion of the GWAS. Once we apply MTAG, we jump up to 64, so we’re gaining twice as many hits because of this. For neuroticism, we start with 9 and jump to 37 after MTAG. For subjective well-being, we started at 15 and go to 49 – okay, I was, like, does that say 19? It says 49. I thought the gain was much bigger than 15 to 19, and it doesn’t look like 19 X’s here either.
Okay, so victory – are we good? Do you believe all of this? You shouldn’t, at least not yet. David can’t, because he’s seen everything else that I’m about to show you. So someone sees this and says, “No way, these aren’t reliable. This is your crazy new voodoo method; I don’t believe you.” So what can we do to show that maybe we’re okay?
Audience response: [Unable to hear on video].
Patrick: I’m talking about something even easier than that. Whenever we want to talk about credibility, what’s the first thing we should do? Replication! So, let’s replicate, okay?
So, let’s say that what I want to do is I take the subjective well-being, all of the top hits from subjective well-being, and I have this replication sample and I’m going to compare my hits from MTAG to my effect sizes in my replication sample. Okay, I’m just going to do a direct – I’m going to regress one on the other and see how it looks. How do you feel about that?
Audience response: [Unable to hear on video].
Patrick: Yeah, yeah, they’re non-overlapping, so I’ve done a really good job. Why don’t we want to just do a direct comparison of the results from here to just results in an independent sample?
Audience response: [Unable to hear on video].
Patrick: Yeah, winner’s curse! I taught you useful stuff. So, if we did this comparison without dealing with winner’s curse, what’s going to happen is we’re going to look at our top hits here and then look in our replication sample, and these hits are all going to be bigger here – I don’t know how much bigger; it depends on how noisy this is. And so, we might show this to someone, and they’d say, “Oh, well, MTAG’s just cheating. Look, in your replication sample, your effect sizes are a lot smaller” – because of course they’re going to be smaller. Better than arguing about it, you can just fix it so they don’t worry in the first place. So what we did is we took these hits and did a winner’s curse correction, almost exactly like the one that you’re doing in your problem set – there’s a little bit of nuance we just discussed in the problem set; you can read through that. So we’re going to shrink these estimates according to how much winner’s curse we expect, and then we’re going to regress the replication estimates onto the shrunken estimates for the genome-wide significant hits that we have here. And if we replicate, then we expect a slope of about 1.
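[A minimal sketch of one standard winner’s-curse adjustment of this flavor: for each significant hit, replace the discovery estimate by the β that maximizes the likelihood of β̂ conditional on having passed the threshold. This is not necessarily the exact variant in the problem set:]

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def winners_curse_adjust(beta_hat, se, alpha=5e-8):
    """Conditional-MLE shrinkage of a genome-wide significant estimate."""
    c = norm.isf(alpha / 2)  # z threshold, about 5.45 for 5e-8 two-sided
    def neg_cond_loglik(beta):
        # Density of beta_hat given beta, conditional on |z| > c:
        p_sig = norm.sf(c - beta / se) + norm.cdf(-c - beta / se)
        return -(norm.logpdf(beta_hat, loc=beta, scale=se) - np.log(p_sig))
    res = minimize_scalar(neg_cond_loglik,
                          bounds=(beta_hat - 10 * se, beta_hat + 10 * se),
                          method="bounded")
    return res.x

# Replication check: regress the replication estimates on these adjusted
# discovery estimates; a slope near 1 is what successful replication looks like.
```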
So what I’ve plotted here are the estimates of the slope for the genome-wide significant hits for the 3 traits, and it looks pretty good. I mean, these are kind of noisy, because our replication sample here is the Health and Retirement Study and Add Health. It’s not a really big sample, so the estimates are kind of noisy, and as a result, this regression is a little bit noisy. But in every case, we can reject 0, which means that these new hits that we’re finding are likely to be something real, and we can’t reject 1, which is probably about as good as we could expect to see for something like this.
Great, so that’s replication. Another kind of replication to think about is prediction.
Audience question: [Unable to hear on video].
Patrick: So in this case, I just used the genome-wide significant hits.
Audience response: [Unable to hear on video].
Patrick: The number of hits you’re going to get, so do you mean hits, SNP hits? What kind of hits are you asking about?
Audience response: SNP hits.
Patrick: So what we did is we took this – the HRS was in one of the discovery samples; it was in the subjective well-being discovery sample – and the first thing that we did is we removed it from discovery. Then we applied MTAG, and it gave us a set of between 40 to 60 hits, and those are the ones that we used. I don’t think that any of those hits are actually genome-wide significant in the Health and Retirement Study; I don’t even know if any of them are Bonferroni significant. But on average, the effect sizes should be the same. And so when we do this regression... I’m not sure if I’m answering your question, because I wasn’t totally certain what you meant about the replication hits.
Audience response: [Unable to hear on video].
Patrick: Yeah, so we have 60 hits that are significant in our discovery sample. We only do the regression on the 60. You’re asking if we did it with all.
Audience response: [Unable to hear on video].
Patrick: So in the replication regression, it’s just kind of one number. We’re taking the effect size estimates. Maybe I didn’t explain this very well. What we did is we ran a GWAS just in the HRS, right? Just for the 12,000, or however many people are in there, and none of those hits are significant because the HRS is way too small.
Audience response: [Unable to hear on video].
Patrick: Yeah, yeah, so in expectation, these two estimates should be the same, assuming that MTAG is working. But our estimates in the replication sample are super noisy, which is why we’re going to test all of the genome-wide significant hits at once – each of them individually is very noisy, but if we put them all in a regression, we’re hoping to say: on average, do our hits replicate?
Audience response: [Unable to hear on video].
Patrick: Yeah, so if we had a bigger replication sample, we could even do a SNP by SNP.
Audience response: [Unable to hear on video].
Patrick: Yes, that’s true. And in fact, we kind of want to do this. So, UK Biobank is going to be available in, like, a couple of weeks or however soon, and with the additional 300,000 from the UK Biobank, in principle, we could do a SNP by SNP replication of all of our hits. So, and we intend to – I think it would be interesting.
So, I don’t know if that’s actually what you meant, but what I thought you meant at one point was this: the regression checks how confident we are in the set of hits, but we might be interested in how confident we are in general, across the whole genome – are we getting an increase in performance throughout the whole genome? We can check that with prediction. So, what we’re going to do in this analysis is take the GWAS summary statistics and create a polygenic score – like you’re going to learn all about next week – just based on the estimates from the GWAS summary statistics. And then I’m going to do the same thing with the MTAG summary statistics, make a polygenic score based on those, and then go into our replication sample and see how well the GWAS predictor does relative to the MTAG predictor.
And so, that’s what we see here. None of these traits is incredibly heritable, so you notice that, in levels, none of these scores is incredibly predictive – they all predict between 1% and 2% of the variance. But what we can see is that in all three cases, when we go from the GWAS predictor to the MTAG predictor, we’re seeing pretty substantial gains – about 25% increases in the predictive power of these scores. Now, it’s a little deceiving when you look at these standard errors, because you’d say, “Oh, but those aren’t significant differences,” right? The thing is, these estimators are highly correlated, so if we actually test the difference between the levels of those bars, we have statistically significant increases in all three cases. And so, this is helping us be even more confident: we are getting additional information, and our predictor is getting better because we’ve applied MTAG. It helps us think that it’s even more credible.
Another claim that I made was that these summary stats could just be some weird Frankenstein GWAS that’s contaminated by all of these different traits. I’m going to claim that, actually, the summary stats from MTAG are giving you trait-specific estimates – so the depression summary stats are for depression and not for neuroticism. In order to test that, I’m going to take the score that we made for depression, the score that we made for neuroticism, and the score we made for subjective well-being, and I’m going to try to predict depression with each of those scores, and do the same thing for each of the other traits. In all three cases, the score for the own trait predicts better than the scores for the other two. You can see depression’s the best here: the depression score. In the middle panel, the neuroticism score is the best. And for subjective well-being, we see that the subjective well-being score is the best one. And again, if we look at how significant these differences are: in five out of six cases, the own-trait score is more predictive than the other-trait score. So even this claim, that we’re getting phenotype-specific information by combining these traits, seems to be valid.
Okay, so then we say, “Well, what can we learn? Is there anything that we can...?” Oh, go ahead.
Audience response: [Unable to hear on video].
Patrick: Yeah, sorry. So, this is how much better the depression score is than the neuroticism score, and this is how much better the depression score is than the subjective well-being score. A positive number means that the own-trait score does better than the other-trait score. So, yeah – the depression score does this much worse than the neuroticism score when you want to predict neuroticism.
So we might be interested in what we can learn about pathways and biology, and it looks like we get a lot of information there as well. These are results from a DEPICT analysis – James did all of these for us – and so we’re going to use DEPICT and ask, “What are the tissues that seem to be implicated by the GWAS summary statistics?” And you see there’s maybe a little bit of enrichment here in the nervous system, which seems reasonable: depression, nervous system. When we apply MTAG, we see enormous enrichment in the nervous system, plus a little bit in neural stem cells and the retina – but the retina is just neurons, so this is totally reasonable. We’ve added additional information, and we’re seeing enrichment kind of where we expect to, and we could then dig into what those tissues are if we want to try to better understand what depression is.
You can do the same thing looking at gene sets, and these are the lists of the gene sets identified by DEPICT. You can’t really read this, but the gene sets that we’re finding for the depression results are things related to fear response or things related to the synapse. I think that James was telling me that one of these gene sets is a mouse phenotype that measures how quickly they give up when you put them in mazes or something like that. James can correct me if I’m wrong, but I like that story.
Audience response: [Unable to hear on video].
Patrick: Because they’re so sad. Yep, but, again, it seems like we’re trying to study depression and it seems consistent with kind of the bio stuff that we already know, and so this is kind of exciting, interesting things.
And so, I’ve shown you these results here, and I can also give you a preview. Cognitive performance, I think, is something that a lot of us are very interested in. It’s really hard to study because it’s only available in very small sample sizes, right? So, if we took the largest GWAS of cognitive performance and used the summary statistics to make a score, the predictive power of that score is 1% or something along those lines – Robbie King knows this better than I do, so he can tell me if I’m wrong. If, instead of taking the cognitive performance score, you actually used our education analysis, created an education polygenic score, and tried to predict cognitive performance – that might be reasonable, because these two things are related, and we have a much larger sample for education, so we might think we can do better. And we do: we get about 6% predictive power if we try to predict cognitive performance using a score designed to predict educational attainment. But if we MTAG the education results with the cognitive performance results and then create a cognitive performance MTAG score, the predictive power jumps to 8.5%. I’m really excited, because before, these cognitive performance scores were just not predictive enough to be useful, and suddenly, using methods like this – I mean, 8.5% is enough to start doing some really interesting polygenic score work.
This is a QQ plot, and you can also look at the number of hits that you get. Cognitive performance, I think, in the largest, most recent study had, like, 30 hits or so, something on that order. When we MTAG it, we get 192 genome-wide significant hits for cognitive performance. So, this is exciting, exciting times. This is why Dan is like, “Oh, maybe we’ll start seeing it.” I’m excited. I hope that everyone in the world gets excited about it.
There’s a couple problems, though. So, let’s go back to some of the assumptions that we’ve had to make along the way. So, the first thing is not really an assumption. It’s just a practical problem. Now, this whole time, I’ve just been using this Ω and this Σ, and I just said, “Yeah, we know what those are. It’s great, right?” We don’t actually know what they are. We have to estimate them. And because we have to estimate them, it’s going to add additional noise into our estimator.
As an aside, let’s chat briefly about how we’re going to estimate them. First, let’s talk about Σ. I claimed before that this is capturing both sampling variation and biases due to things like stratification. So, we’ve seen something today that measures the variance due to sampling variation and biases – who can remember what that is? We’ve only talked about one thing today before this. In LD score regression, we had the slope, and we were focusing on the heritability estimates, but the intercept – when Raymond put this on the board... I’m running out of space, I’m going to erase this because who cares? We saw that the expected chi-squared statistic was equal to a term proportional to the LD score, plus 1 plus Na. So this term [gestures to “1 + Na”] is what we’re getting when we estimate the intercept. The 1 is what we expect if there were just sampling variation, and Raymond claimed that this a here was variation due to biases – and I believe him; Raymond’s a smart guy. So, to get the diagonal elements of Σ, we’re going to use the intercepts from these univariate LD score regressions, okay? And then I claimed earlier that the off-diagonal elements capture things related to sample overlap or correlation of biases, and that also sounds like something Raymond was talking about earlier today: the expected value of z₁z₂ is equal to a term proportional to the LD score, plus an intercept term related to sample overlap – and it turns out that this is exactly the term that we want for the off-diagonal elements of Σ. So we’re able to use LD score regression to get the estimates of this Σ. And remember that Σ is what allows us to use summary statistics with overlap, so it’s a really important term. We’re really building hard off of LD score regression and taking advantage of some of the results that we talked about this morning.
And then to get Ω, what we do is look at the actual variance of the estimates: we can take all of our β̂s and estimate the variance-covariance matrix of the estimates, and we know that this variance-covariance matrix is equal to Ω plus Σ. We have an estimate of Σ, so we just subtract it out. So when we want an estimate of Ω, we get it directly by subtracting the Σ that we estimated from the variance-covariance matrix of the β̂s. There’s a method of moments interpretation for that too, but I think it’s pretty easy to see why this works, okay? So, I said we estimate these, and they’re noisy estimates. But when we plug them into MTAG, we don’t do anything to account for that noise, which might make us worry that our standard errors from MTAG are going to be too small, because those standard errors only capture variation due to sampling in the β̂s – the GWAS estimates – and not sampling variation in these variance-covariance matrices.
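[A minimal sketch of the two estimation steps just described, assuming the LD score regression intercepts are already in hand and on the same scale as the β̂s (the rescaling details are glossed over; the variable names are placeholders):]

```python
import numpy as np

def estimate_sigma_omega(beta_hats, ldsc_intercepts):
    """beta_hats: (n_snps, T) matrix of GWAS estimates across T traits.
    ldsc_intercepts: (T, T) matrix of LD score regression intercepts --
    univariate intercepts on the diagonal, cross-trait intercepts off it
    (the off-diagonal terms are what absorb unknown sample overlap)."""
    Sigma = ldsc_intercepts
    # Var(beta_hat) = Omega + Sigma, so subtract the estimated Sigma from
    # the empirical variance-covariance matrix of the estimates:
    V_hat = np.cov(beta_hats, rowvar=False)
    Omega = V_hat - Sigma
    return Sigma, Omega
```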
This is going to become especially a problem if you want to apply MTAG to lots of traits, right? Because the more traits you add, then the number of elements in these matrices is going up by the square. And so, as you add more traits, there’s more noise and there’s more noise that we’re not accounting for, and so we might think that MTAG is going to do worse and worse and worse as you add more traits. And that’s actually true.
And so, what we’ve done here is a set of simulations where we generate summary statistics – and because we’ve generated these pretend summary statistics, we know what the Ω and the Σ are. So we run MTAG using the true values of Ω and Σ, then run MTAG using the estimated values of Ω and Σ, and we look at how much inflation there is in the test statistics. For just one trait, there’s not really much inflation; in fact, even when you have 2 or 3 traits, it looks like the inflation is not that large. Now, the mean chi-squared is a measure of the power of the GWAS: if you take all the chi-squared statistics from your GWAS and take the average of them, we use that as a measure of how powerful the GWAS is. To give you a sense of scale, 1.4 is what we see in our three traits. Depression, which has a heritability of 4% or 5% in our data and a sample size of about 300,000 – that’s about 1.4. Neuroticism is more heritable – I think it has a heritability of about 8% – but it’s available in only 150,000, so that’s also about 1.4. This number scales with sample size, so a GWAS with a mean chi-squared of 1.1 is effectively 4x smaller than a GWAS with a mean chi-squared of 1.4, and going from 1.4 to 2 is like 2.5x larger. And so you can see that if you have a low-powered GWAS and you combine a bunch of low-powered GWASs together in this way, as you get up to like 20 traits, the inflation goes up about 3%. That’s maybe not a ton, but maybe it’s enough to worry about. However, when you have medium to high-powered GWAS summary statistics, the problem that I was telling you may exist doesn’t seem to be a problem.
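[The arithmetic behind those comparisons: for a fixed trait, (mean χ² − 1) scales roughly linearly with sample size, so ratios of (mean χ² − 1) compare effective GWAS sizes:]

```python
def effective_n_ratio(chi2_a, chi2_b):
    # (mean chi2 - 1) is roughly proportional to N for a fixed trait.
    return (chi2_a - 1.0) / (chi2_b - 1.0)

print(effective_n_ratio(1.4, 1.1))  # 4.0 -> a 1.1 GWAS is ~4x smaller than a 1.4 one
print(effective_n_ratio(2.0, 1.4))  # 2.5 -> going from 1.4 to 2.0 is ~2.5x larger
```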
Audience question: [Unable to hear on video].
Patrick: Okay, yes, so when I say “traits,” I mean GWAS, and that’s an interesting question, because there’s nothing that says a trait has to be a different thing. We could, for example, use MTAG as a form of meta-analysis. Let’s say we have six cohorts, and those cohorts maybe come from overlapping samples, so we can’t just meta-analyze them because there’s overlap across the cohorts. However, MTAG knows how to deal with the potential overlap between your cohorts, so instead of doing a standard inverse-variance-weighted or sample-size-weighted meta-analysis, you could just pass them into MTAG, and MTAG should fix those problems – except for this thing here.
Right, so in educational attainment, EA2, how many cohorts did we have? I see, like, 70?
Audience response: [Unable to hear on video].
Patrick: Yeah, so 64. So we’re way out here – I’m leaving the camera, and I’m back. We’re three times further over there for something like EA2. So, if we had meta-analyzed EA2 with MTAG – each of these cohorts has a really low mean chi-squared, probably like 1.01 or something on that order – the bias goes up through the ceiling for something like that. So you need to be a little bit careful in applying it in those settings, but if you have moderately powered cohorts and you’re looking at the same trait, then MTAG in principle could work.
Audience question: [Unable to hear on video].
Patrick: So actually, this is only for one trait at a time. Everything I’ve shown you is under a model where we only care about a single trait, but we have information on these other traits. I mean, the MTAG estimator has these whole matrices in it, and so I don’t think you can get it out. I mean, maybe you can; I don’t see how one would.
Audience question: You have two proxies for some underlying trait, and you have different samples where you have estimated the effect of some variable on those proxies. You basically do this to get a better estimate. Seems like, say you need a sample of 100,000 to have power – couldn’t one adapt these methods to a sample of 10,000 with 10 different traits that proxy for the same thing, and basically get kind of the same result? It seems like if you do this correctly, you can basically stop worrying about all this and look at proxies within the same sample. You can basically just use one of these samples.
Patrick: So the question is, so let’s say you have thirty traits and they’re in three groups, right? And so, you could just first meta-analyze each of these three groups and then do MTAG on each of these three groups. Is that kind of what you’re saying or no?
Audience response: No. Here you are basically working at the summary stats level, because you don’t have access. Let’s say you have Add Health, which only has 10,000 people, but let’s say you have 50 proxies for educational attainment. It seems like each proxy is the same underlying trait plus noise, right? You could rerun everything for each individual and then basically cluster at the individual level to take care of standard errors, but you should be getting 10x the number of observations. I’m sure you could play with the math – that’s why I was asking about the Ω. Here, for the Ω, you are using the β_js – all the betas in each cohort – and correlating them to find the Ω. But let’s say you have only one sample.
Patrick: Yeah, you could. So, if you just did a GWAS, so let’s say we just had UK Biobank data, right? And so, we do a GWAS of… so the UK Biobank has data on depressive symptoms, neuroticism, and subjective well-being, so we could do a GWAS on all three of those traits and then pass those straight into MTAG. And the overlap is going to get addressed by this thing. So given that we have the individual-level data, there may be more efficient things that could be done and methods like that exist.
Audience response: The simple thing is you can calculate what Ω should be.
Patrick: How? If you just have the individual-level data, why can you get Ω any better? Because you’re still just estimating the SNP effects.
Audience response: [Unable to hear on video].
Patrick: Oh, Σ. So, Σ here we wouldn’t have to estimate. Well, we’d still need to estimate it a little bit. It’s not quite as bad, because if you have perfect overlap, then Σ turns out to just be approximately the variance-covariance matrix of the phenotypes themselves.
Audience response: [Unable to hear on video].
Patrick: This was designed to deal with the very specific data constraints that we face, which are the ones I showed you before. We don’t have the individual-level data in the first place.
Audience question: So suppose you really had like 20 phenotypes. It seems like you maybe could do better and get less inflation if you either make your Z-scores and then take an equally weighted sum or maybe you do the simple regression predictor from the other 19 things. And then you just have 2 things and you kind of impose that. It would be interesting to see those simulations.
Patrick: Yeah, so there are things you could do if you’re willing to make assumptions. For example, if you wanted to assume that your traits are perfectly correlated, or that we know stuff about the traits – if we’re doing meta-analysis and we really believe that we have the exact same trait in every case, then we can impose that assumption. If it is the same trait and the only reason the estimates differ is noise, then the Ω should correspond to perfectly correlated traits, and we can impose that assumption directly here, and that should give us a little bit more precision.
Audience response: I’m saying that might be a robust way to do things when the assumption is wrong.
Patrick: Yeah, so we actually explored that a little bit in the paper – what happens when you just make these assumptions and fix these parameters? In general, we do a little bit better when we allow them to be free, but it really depends on how close you are. So, if the genetic correlation is 0.95 and you fix it to 1, you’re not going to gain a lot by allowing it to be a free parameter.
Audience response: I was talking about the case where you have a huge number of phenotypes.
Patrick: But even then. So, if the truth is close to whatever assumption that you’re making, then you probably are fine just fixing it to that assumption and you probably gain a little bit, in fact, since there’s less noise. But if it’s not quite there, then you know how much you gain depends on how good your assumptions are.
Audience question: [Unable to hear on video].
Patrick: But that thing you’re describing can also be described in an exactly equivalent way as putting structure on the parameters. So, in practice, you could just meta-analyze the things you think are similar and then put that in; it’s equivalent to doing MTAG with some structure on these parameters.
So, that’s just one problem. The other problem is actually a pretty major theoretical problem, and it’s this constant-Ω assumption. By the constant-Ω assumption, I mean that there’s just one class of SNPs, and the relationship between the effect sizes is the same no matter where you are in the genome. So now imagine I have two phenotypes that are pretty highly genetically correlated in general, but I have a SNP where, for the trait I’m interested in, there’s actually no effect – the β is equal to zero for that one – while for another trait that I’m including, it’s actually really important: super strongly significant, with a big, large effect size. What’s going to happen in MTAG with something like that? We’ve assumed there’s this strong relationship between the βs throughout, but it just so happens that this SNP is null for the trait you want while the other trait is strongly significant. Well, MTAG is using information from the other traits to get a better guess of what’s happening for the one that we want, so it’s like: “Wow, this one is really strong, you’re so confident that this is real. And then you have this one here – I don’t really know, it’s kind of low. But this one’s for sure, it’s great, right?” And so MTAG is going to look at that and say, “Well, then this one’s probably great too, because your genetic correlation is like 0.8.” It’s going to pull that estimate away from 0 – we’re going to be biased away from 0 in a case like that. It may even pull it far enough that it becomes genome-wide significant when it has no business being genome-wide significant. And that would be an inflation of the false discovery rate, because we’re going to find more of these SNPs that are actually null for the trait we want but, because they’re causal for a different one, get pulled into the significant category.
Audience question: [Unable to hear on video].
Patrick: So, the inflation I was talking about before arises because, when we calculate our standard errors, we assume that Ω and Σ are known; because they’re actually estimated, we probably should add a little bit more to the standard errors to account for that uncertainty, and that’s why we’re getting inflation there. Whereas in this case, even if we knew exactly what Ω was – say we know there’s a genetic correlation of 0.9 – we would still have some of these SNPs being pulled away from 0, just because there are SNPs of the type I described: actually null for the trait you want to know about, but not null for the one that you’re just using to boost power.
I’m trying to think of what a good example of such a SNP would be. So like, let’s say, okay, who can think of an example? Raymond?
Audience response: And I mean, for both smoking and alcohol dependence, there are loci of large effects that correspond specifically to the metabolism of the substance. So, in alcohol, there’s a couple of mutations that make consuming alcohol extraordinarily unpleasant because of how you metabolize the alcohol and, thus, people with those mutations just basically never drink. In smoking, it’s a mutation in the cholinergic receptor that makes the nicotine very enjoyable. So you can imagine that while there is general overlap between ADHD and substance use phenotypes, that the effects of these substance-specific variants are indeed specific to the substance use and have nothing to do with ADHD. And so, we have exactly this kind of problem of a very large effect in one of those substance use phenotypes, but it is generally correlated with ADHD but we’re fairly confident that it’s basically null.
Patrick: So that’d be a problem; that’d be problematic. We would see something like, “Oh, we’ve done MTAG, and we see that the nicotine receptor that’s really strong in smoking all of a sudden is associated with ADHD,” and that’s silly. I mean, maybe it’s real, but it seems a little odd, right? So, given the story I told you, it should be pretty clear that this is going to be a bigger problem when the trait that we’re interested in is not very well-powered and the trait that we’re adding is super highly powered. Remember the story: if we have a really large sample for the other trait, we become really confident about that SNP – we’re totally sure we’ve estimated it with near-perfect precision – while the estimate for our trait is really noisy. And so MTAG is like, “Oh, well, there’s a lot of noise here, but you’re so confident over there.” It’s going to have a lot of pull when the trait that we want is weak and the trait that we’re adding is very strong.
So, I just showed you some results that were exactly like that, right? I showed you what happens when you MTAG cognitive performance, which is a really weakly powered GWAS, with educational attainment, which is really strongly powered. And so, even though I just showed, “Oh, there’s 200 hits,” I think that you should be very, very cautious about assuming those hits are real, because this is exactly the type of circumstance I’m about to show you.
So, let me show you a picture to kind of highlight this. The big issue is that we don’t know which SNPs these are, right? There may be SNPs of the type that are going to break MTAG, but we just don’t know which SNPs those are. And we don’t even know what the joint architecture is – whether there are any SNPs like that at all. And so, what I’ve done here is I’ve calculated the maximum false discovery rate under some assumptions about a class of genetic architectures. On the x-axis here, I have the genetic correlation between the two traits, and on the y-axis, I have the maximum false discovery rate you can get given that correlation. For each one of these lines, for the trait that we’re interested in, I fix the mean chi-squared to be 1.1 – so this is kind of the low-power setting – and then I allow the mean chi-squared of trait 2 to get larger and larger and larger. Let me actually highlight something at this point here: if the genetic correlation is exactly 1, there can be no SNPs like the ones I told you about. Why not? Why will there be no SNPs like the ones that I was explaining that break us? It’s not the p-values that are relevant – if the traits are estimated in different samples, the chi-squareds might not even be similar. What’s relevant is the βs, right?
Audience response: [Unable to hear on video].
Patrick: If they’re perfectly correlated, an rβ of 1, that means that one is just a constant multiple of the other. And so, if a SNP is causal for one, it has to be causal for the other; if β is 0 for one, it has to be 0 for the other. And so, the false discovery rate tends to be limited when the genetic correlation is pretty high, but it’s in kind of this medium range where it can get really bad. As one example, you can see false discovery rates up to nearly 20% – so, in the worst-case scenario, up to 20% of the things that you find could just not be real. So, this is just a word of caution if you’re combining.
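To spell that step out (my notation, not from the slides): perfect genetic correlation forces the two traits’ effects to be proportional,

$$ r_\beta = 1 \;\Longrightarrow\; \beta_{2,j} = c\,\beta_{1,j} \ \text{for every SNP } j, \quad c \neq 0, $$

so $\beta_{1,j} = 0 \iff \beta_{2,j} = 0$: no SNP can be null for the trait you want while being causal for the helper trait.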
Oh, go ahead.
Audience question: Patrick, would this be that you take the SNPs that cross 5×10⁻⁸ and ask which of those are false discoveries?
Patrick: Yeah, that’s what I meant. So, I say of the genome-wide significant hits, how many of them actually have an effect of exactly 0?
Audience question: Could this be a problem in the following scenario? Let’s take IQ and educational attainment. So, there’s something that was previously not significant for IQ, but you throw them together in MTAG and it’s significant because of educational attainment. Maybe it’s false, maybe it’s not – we’re trying to figure out how big a problem that is. It seems that MTAG has the potential to identify completely novel SNPs, since you’re combining the power from both of these things – could this be a problem for those, too? Do you get what I’m saying? If you put IQ and educational attainment together, there could be some set of SNPs that wasn’t significant for either of those things on its own, but when you put them together, MTAG might discover completely new SNPs.
Patrick: We had 30 hits for depression in the GWAS, and then we have 60 in MTAG, so 30 of those, in principle, are new. So, is that not what you mean, or what do you mean exactly?
Audience response: Gotcha. But were any of those 30 hits previous hits in neuroticism or subjective well-being?
Patrick: Oh, no, not really, because I mean, those are all really low powered. I don’t think we’ve actually looked at that, but I just don’t think so. There were only nine neuroticism hits, for example.
Audience response: My question is: are we worried about false discovery in this novel set, like the 30 you were talking about? Or are we really only worried about previously significant educational attainment SNPs? Or previously significant depression SNPs dragging up educational attainment or neuroticism?
Patrick: So, in this calculation, we’re talking about the whole set, so we’re not talking about which additional ones have been added. We’re talking about false discovery in the complete set that MTAG finds. That’s an interesting point, though. I’d have to think about it some more, whether there’s some nuance in talking about which SNPs were added – because it could even be that we lose a couple after applying MTAG as well.
Audience question: [Unable to hear on video].
Patrick: Sorry, I’m a little confused by what you mean.
Audience response: [Unable to hear on video].
Patrick: Well, I mean, no, I think it’s kind of related. But if you think about it, if there are any SNPs that are these “MTAG killers” – where it’s a zero effect for this guy and a non-zero effect here – putting any weight on this guy will drag the estimate in the wrong direction.
Audience response: [Unable to hear on video].
Patrick: Is this one the null SNP, or the real hit?
Audience response: [Unable to hear on video].
Patrick: So you have a big sample for the true one.
Audience response: [Unable to hear on video].
Patrick: And I think that’s what’s reflected here, right? So, this corresponds to the big sample, and as you’re adding more information, you’re saying, “Oh, I have more information in the non-null hit, and so I’m going to shove that information into the trait that I’m interested in.” And so, that’s why the false discovery rate goes up as you add more information here: because you’re saying, “I’m more and more confident about this guy, and because I’m making assumptions about this relationship, I’m allowed to take some of this information that I keep adding and push it into this estimate.”
Audience response: It is feeling a little bit like this method of moments procedure is good for creating compelling genes to explore, but if you really want to carry over hits from educational attainment to cognitive, maybe you want to do the Bayesian imputation of the spike-and-slab kind.
Patrick: So, MTAG originally was a spike-and-slab model with three phenotypes. I think it took like three weeks to run.
Audience response: [Unable to hear on video].
Patrick: It’s on the docket. And so yes, this is going to be really good in cases where you don’t think that there are a lot of these really awful SNPs. But just to give you a sense – again, this is the worst-case scenario, alright? If you have a low-powered trait and a high-powered trait, and you want to know what the effect sizes are for the low-powered trait, then this is a measure of how bad things can get. And again, this is the worst case. It might be that, given the true joint architecture, the true false discovery rate is down here, but because we don’t know what the joint architecture is, I plot this to say, “This is a scenario that maybe could kill you.” In principle, if you think you have a guess of how many of these killer SNPs there are, you could impose that and actually calculate what the false discovery rate is in that setting, and maybe you’d find that the false discovery rate is actually down here. And then you just need to take that information into account: say, “I have a hundred hits, my false discovery rate is 9%; here are some hits, I think about 9% of them are probably not real, but 91% of them are fine.”
Audience question: [Unable to hear on video].
Patrick: Yes, so I’m going to get there on the next slide.
So, I was trying to show this because it’s the worst-case scenario. So let’s say that instead, we’re combining comparably powered GWASs for those traits. In this case, I plotted max FDR again, and you’re combining two traits that both have a mean chi-squared of 1.1 – so this is actually the same line – but if both of them have a mean chi-squared of about 1.4, or both have 2.0, then the maximum false discovery rate is actually quite low. In the results I showed you for depression, all of our traits are at about 1.4, and they have a genetic correlation of about 0.7, and so, you know, it looks like we’re kind of living right there.
And so, here’s the intuition for why, even if there are really bad sets of SNPs, we’re okay. Before, I told you the story where we have this estimate, which is actually 0, but it’s a very noisy 0, and we have this other one that’s really strongly powered and non-zero; because there’s so much uncertainty here, a lot of information is getting pushed in. This guy’s like, “I don’t know, so I’m just going to use this,” right? But if we actually had a really powerful estimate here, then even if we had an enormous, super-powerful estimate over there, and that guy says, “Hey, I think that we’re probably good, right?”, this guy’s like, “No, because I’ve been estimated in a large sample, and so even though I know that you think I’m really big because of Ω, I don’t think I’m very big,” and so this one doesn’t budge as much. And as a result, it leads to false discovery rates that tend to be a lot lower.
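Extending the earlier toy sketch (again, not MTAG itself, and with made-up numbers): if you shrink the standard error of the direct, truly null estimate, the same precision-weighted combination barely moves it, which is exactly this intuition:

```python
# Toy follow-up: same precision-weighted combination as before,
# varying the power of the direct estimate for the truly null SNP.
import numpy as np

rng = np.random.default_rng(1)
rho, b2, se2 = 0.8, 0.10, 0.005  # helper trait: big, precisely estimated effect

for se1 in (0.05, 0.001):        # low-powered vs. very well-powered target GWAS
    b1 = rng.normal(0.0, se1)    # direct estimate of the null SNP
    w1, w2 = 1 / se1**2, 1 / se2**2
    combined = (w1 * b1 + w2 * rho * b2) / (w1 + w2)
    print(f"se1={se1}: direct {b1:+.4f} -> combined {combined:+.4f}")
```

With se1 = 0.05 the combined estimate sits near rho*b2; with se1 = 0.001 the direct estimate dominates and stays near 0.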
Audience question: [Unable to hear on video].
Patrick: It has to do with some of the assumptions that we’re making. We’ve assumed that at least 10% of SNPs are causal in calculating this max false discovery rate. You can get some really funny things happening if you assume that, like, 99.5% of SNPs are null for all traits and then there’s a really high correlation in a small set of them – things just get really wonky. But we think it’s likely that at least 10% of SNPs are non-null for any trait that we’re interested in. You can actually estimate this: in depression, we think that about 60% are non-null. So the peak here has to do with that boundary being really binding there but not quite as binding in this area.
But yeah, if you have kind of high-powered things and you’re combining them, then maybe you’re okay. The thing to really, really be nervous about is settings where you have something that’s low-powered and you’re combining it with something that’s very high-powered. And something to be kind of worried about is if you have a whole lot of low-powered things that you’re trying to combine. I think that’s my last slide. So, like I said, we’re excited about this. It seems to work really well in practice. I think, like Miles was saying, we have some theoretical results which suggest that even if you have these killer SNPs, predictive power should always get better if you do MTAG. And so, if you just want to make a polygenic score, then I think MTAG is pretty much always a good choice. If you want to find novel GWAS hits, then you have to worry about this, depending on the setting. So, it really depends on what question you want to ask and what your data look like, and you should think about the assumptions that we’re making – and this is true about all of these methods. We’ve been focusing so hard this week on what the assumptions of each model are, because depending on where you are, an assumption may be more or less reasonable. And depending on what you want to do with the results, violations of those assumptions may matter more or less.
Audience question: [Unable to hear on video].
Patrick: I mean, except for this problem, right? So that would be the only reason why maybe you wouldn’t just do every single trait you can get your paws on. But if you have a bunch of things that you think are moderately powered and you just want to make a polygenic score, then I don’t know why you wouldn’t do that.
Audience question: [Unable to hear on video].
Patrick: As long as you live more over here – if each of your GWASs has a mean chi-squared of 1.4-ish, then combining all of these things together is fine. I mean, even if you’re at 1.1, you’d expect to see a 1% inflation with 10 traits. Maybe you’re okay with that. And this is the mean chi-squared: each SNP has its own chi-squared statistic, and we take the mean of all of them.
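As a concrete sketch, computing that mean chi-squared from a set of summary statistics might look like this (the file name and the BETA/SE column names are my assumptions, not anything fixed by MTAG):

```python
# Minimal sketch: mean chi-squared of a GWAS from summary statistics.
import pandas as pd

sumstats = pd.read_csv("gwas_sumstats.txt", sep="\t")  # hypothetical file

z = sumstats["BETA"] / sumstats["SE"]  # per-SNP Z-score
chi2 = z ** 2                          # per-SNP chi-squared statistic

# ~1.0 for a trait with no signal; 1.4-ish is the "decently powered"
# range discussed here.
print("mean chi-squared:", chi2.mean())
```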
Audience question: [Unable to hear on video].
Patrick: Yeah, I guess you could. You could run a simulation like this with whatever your mean chi-squared is and your number of traits. That’s an interesting idea. So yes, I guess you could actually figure out how much inflation there is and then account for that.
Audience question: [Unable to hear on video].
Patrick: Yeah, we don’t know that, but it’s something we could look into. It’d be interesting.
Audience question: [Unable to hear on video].
Patrick: I wrote it and then I erased it to say, “I’m going to” – wait, what did I erase? Did I actually write the original and then erase it?
Audience response: [Unable to hear on video].
Patrick: Okay, so the question is what exactly then?
Audience response: [Unable to hear on video].
Patrick: So this was the thing. Before, we assumed that the βs were normally distributed, right? But the βs maybe aren’t normally distributed – in fact, they probably aren’t – and so then people worry about what’s going to happen under violations of that assumption. And this is what we did in the original paper: we went crazy, we tried it with a t-distribution and with a spike-and-slab, and we looked at what the consequences of violating this distributional assumption were for MTAG. But we found in general that it didn’t seem to matter, and we realized that the reason it didn’t matter is because our assumption was too strong. In the original paper, our estimating equation was this thing, and it’s the exact same thing now – the equation didn’t change. And so, the reason we weakened the assumption is that now we can say it doesn’t matter what the distribution of β is: as long as this holds, everything about MTAG holds. And then people don’t need to panic as much about, “Oh, whoa, my effects have thick tails?” It’s okay – is Ω constant? That’s the question.
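For what it’s worth, my reading of the weakened condition being pointed at on the whiteboard – treat the exact form as an assumption on my part – is a second-moment condition rather than full normality:

$$ \mathbb{E}\!\left[\beta_j \beta_j'\right] = \Omega \quad \text{for every SNP } j, $$

that is, the cross-trait covariance matrix of the true effects is the same everywhere in the genome (the constant-Ω assumption from earlier), with no requirement that the βs themselves be Gaussian.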
Audience question: [Unable to hear on video].
Patrick: Yeah, that’s true. The standard errors that we use hold exactly if the effects are normally distributed. It turns out they hold asymptotically in general, but the asymptotics are a little funny here, because the unit of observation is the number of traits, and we don’t send that to infinity. But you can also show that even if the distributional assumption is violated, the MTAG standard errors are conservative on average. And so, the standard errors should be okay for the most part – except for the SNPs that are null for one trait and non-null for the other; you can show that they’re anti-conservative for those.
Audience question: [Unable to hear on video].
Patrick: That is a great question. So, the MTAG standard errors are... well, I’ll just point to it here, it’s easier. The MTAG standard error is 1 over this thing here – taking the inverse, right?
Audience response: [Unable to hear on video].
Patrick: And so... oh yeah, that’s another thing – I wrote these last night at like midnight. This should be a minus. So, if you actually use the standard errors that are in the paper – you know what Ω is and what your Σ is, maybe, or you can get a guess at them for your power calculation – then you can calculate what the standard error should be. You can then say, “Okay, what do I think my effect sizes are?” And once you know what your standard errors are, power calculations are the same as in a GWAS. In GWAS, you would say that the sampling variance is 1 over n; in this case, it’s this expression, and you take the square root of it. Yeah, that’s a good question. I like it when people ask about power. Okay, I think that’s it.
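For readers of the transcript, here is my reconstruction of the whiteboard quantity, with the minus-sign fix he mentions – the exact form is my assumption based on the published description of MTAG, not something visible here. For trait t at SNP j, writing ω_t for the t-th column of Ω and ω_tt for its t-th diagonal element,

$$ \mathrm{SE}\big(\hat\beta_{\mathrm{MTAG},jt}\big) = \left[ \left(\frac{\omega_t}{\omega_{tt}}\right)' \left( \Omega - \frac{\omega_t \omega_t'}{\omega_{tt}} + \Sigma_j \right)^{-1} \left(\frac{\omega_t}{\omega_{tt}}\right) \right]^{-1/2}. $$

And a short sketch of the power calculation that follows from it, with made-up values of Ω and Σ:

```python
# Hedged sketch of the power calculation described above: compute an
# MTAG-style standard error for trait 1 from assumed Omega and Sigma,
# then do a standard two-sided GWAS power calculation.
import numpy as np
from scipy.stats import norm

Omega = np.array([[2.0e-5, 1.4e-5],
                  [1.4e-5, 2.0e-5]])       # assumed per-SNP effect covariance
Sigma = np.diag([1 / 50_000, 1 / 200_000])  # approx. 1/N sampling variances

t = 0                                 # trait of interest
w = Omega[:, t] / Omega[t, t]         # regressor omega_t / omega_tt
V = Omega - np.outer(Omega[:, t], Omega[:, t]) / Omega[t, t] + Sigma
se_mtag = (w @ np.linalg.solve(V, w)) ** -0.5

beta = 0.015                          # hypothesized true effect size
z_crit = norm.isf(5e-8 / 2)           # genome-wide significance threshold
power = norm.sf(z_crit - beta / se_mtag) + norm.cdf(-z_crit - beta / se_mtag)
print(f"MTAG SE: {se_mtag:.2e}, power at beta={beta}: {power:.2f}")
```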