Chapter 10.6: GDPR for Dummies: A Survival Guide for Genetics Research (Video Transcript)

Title: GDPR for dummies: A survival guide for genetics research

Presenter(s): Heidi Beate Bentzen, LLM (Centre for Medical Ethics, University of Oslo)

Heidi Beate Bentzen:

My name is Heidi Bentzen. I’m a researcher at the University of Oslo. My background is in law, and I will give you a GDPR for Dummies survival guide for genetics research.

Before the GDPR, the European Union had a personal data directive. A directive left each member state free to decide how to implement the rules, and this left us with widely varying interpretations across member states. To resolve this issue, the EU decided, “We’ll put a regulation in place instead,” and that is the GDPR, the General Data Protection Regulation. The GDPR harmonized data protection law across the EU member states, and it also specifically regulates genetic data.

The purpose of the GDPR is to manage data flow in a responsible, uniform, and predictable manner. Its objectives are to protect the fundamental rights and freedoms of natural persons, in particular their right to data protection, but also to ensure that the free movement of personal data within the European Union is not restricted or prohibited. So the rules in the GDPR are strict, but for a very good reason: you can be confident that data is processed just as securely at a hospital in Spain as at a private company in Finland, and that the individual’s fundamental rights are respected not only in Sweden but also in Greece. The GDPR is not about restricting data processing; it is about managing data processing responsibly.

One of the fundamental objectives of the European Union is to strengthen its scientific and technological basis, and this also entails the free circulation of scientific knowledge. The EU has therefore declared that there should be as few legal obstacles to research collaboration as possible, and this, of course, necessitates the movement of personal data. At the same time, the Charter of Fundamental Rights of the European Union guarantees the protection of personal data, so the GDPR tries to balance these two objectives. The GDPR is therefore built as a tool to facilitate research while protecting the individual research participant. It’s a template for collaboration in a knowledge society.

We see this very clearly in the text of the GDPR itself, which speaks very highly and favorably about scientific research and contains a myriad of exceptions to the main rules when the processing is for scientific research purposes. So does the GDPR apply to the data processing you are doing in the European Economic Area? The tentative answer is: most probably. But it applies only to data, not to samples. Blood samples are not considered personal data, but the labeling on them and the data obtained from them through laboratory analysis are. If the data is anonymous, it is exempt from the GDPR, but to determine whether it is anonymous, you need to apply an anonymization test, which means looking at the factual circumstances. You need to consider all the means reasonably likely to be used to try to identify someone, and you need to consider this dynamically: if you want to process the data for 10 years, you also need to ask what identification would cost then, and which technology we might have available nine years from now to potentially identify this person. Looking at the re-identification literature, we know that, in genetics research at least, we usually cannot promise research participants that they will not be identified. In scientific research, data is typically processed pseudonymously, and pseudonymized data is considered personal data subject to the GDPR.
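As an editorial illustration (not the presenter’s method), the Python sketch below shows what pseudonymization can look like and why pseudonymized data remains personal data. It assumes a keyed hash (HMAC-SHA256 from the standard library) as the pseudonymization technique; the key handling, the field names, and the identifier format are hypothetical. Because whoever holds the key can recompute the pseudonym from a known identifier and relink the record, the output is pseudonymous, not anonymous.

```python
import hashlib
import hmac
import secrets

# Hypothetical setup: the controller keeps this key separately from the
# research dataset (e.g. in a restricted key store). Generated per run here.
PSEUDONYMISATION_KEY = secrets.token_bytes(32)


def pseudonymise(participant_id: str, key: bytes = PSEUDONYMISATION_KEY) -> str:
    """Replace a direct identifier with a keyed pseudonym (HMAC-SHA256, hex)."""
    return hmac.new(key, participant_id.encode("utf-8"), hashlib.sha256).hexdigest()


def to_research_record(record: dict, key: bytes = PSEUDONYMISATION_KEY) -> dict:
    """Drop the direct identifier; keep only the pseudonym plus study data."""
    return {
        "pseudonym": pseudonymise(record["participant_id"], key),
        # Hypothetical field: genetic data itself may still identify the person.
        "genotypes": record["genotypes"],
    }


def relink(research_record: dict, candidate_ids: list[str],
           key: bytes = PSEUDONYMISATION_KEY):
    """Whoever holds the key can re-identify by recomputing pseudonyms."""
    for candidate in candidate_ids:
        if hmac.compare_digest(pseudonymise(candidate, key),
                               research_record["pseudonym"]):
            return candidate
    return None


if __name__ == "__main__":
    clinical = {"participant_id": "P-0001", "genotypes": {"variant_a": "A/G"}}
    shared = to_research_record(clinical)
    print(shared)
    print(relink(shared, ["P-0002", "P-0001"]))  # -> P-0001: not anonymous
```

A keyed hash is used here rather than a plain hash because a plain SHA-256 of a short participant ID could be reversed by anyone simply by hashing every possible ID; with a key, relinking is limited to whoever holds it, which is exactly why the data stays within the GDPR.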

Now, how do you ensure that your data processing is compliant? Let me help you with a GDPR cheat sheet. I’m going to run you through the seven principles the GDPR is built on. All the other rules you see in the GDPR are essentially specifications of these seven principles, so if you know these, you’re already well on your way to being compliant.

The first principle is lawfulness, fairness, and transparency: personal data shall be processed lawfully, fairly, and in a transparent manner in relation to the data subject. Let’s look at lawfulness first. One thing we should do here is keep different types of consent apart. We often talk about informed consent in research. Informed consent is a term used in the ethics literature, and it typically refers to a research ethics instrument: consent to research participation. What we are talking about in the GDPR is something a little different. We are talking about the need for a lawful basis for data processing, and that can be, but does not need to be, consent. For example, if you’re running a clinical trial, there will be laws mandating you to process certain data for safety purposes. The lawful basis for that portion of the data processing should be law, not consent. So here you see the difference. You can, however, combine these two types of consent in one document, but they should be clearly separated within it.

Genetics research has a particular challenge here because it is so dynamic. That’s not really just a problem, it’s very, very nice, but it does make it challenging to keep consents valid, because the information you provided a couple of years ago might be very out of date by now. So there are currently several initiatives looking into alternative forms of consent, and I’m just mentioning this to you. One option is dynamic consent, which is electronic, allows for continuous information updates, and can make re-consent processes very easy. In genetics, it has the additional benefit that if you make a finding you’re unsure whether to report back, you can ask the research participants in advance: if we make certain classes of findings, would you like to know or not? Then you will know how they feel about it.

As you saw, transparency was another part of the first principle, and the principle of transparency entails that personal data shall be processed in a transparent manner in relation to the research participant. Let me give you an example. AI tools that assist with medical diagnosis can be remarkably efficient. One example is facial image analysis for the diagnosis of genetic disorders, which is so efficient that it can reveal information that experienced health personnel cannot pick up, and it can even detect carrier status. In this study, for instance, it had a 99% top-10 diagnostic accuracy. The tool mentioned here is a 10-layer deep convolutional neural network. Neural networks are, as you all know, notorious for being difficult to explain. However, we found that fulfilling the transparency requirement is actually not as hard as many would have you believe. Here it was accomplished through a three-layered explanation: first, a very simple explanation of pattern recognition; second, a more detailed explanation with photos and examples of the transmitted data, as you see on the slide; and finally, for those particularly interested, a link to the Nature Medicine paper explaining the algorithm and, before that article was published, to the preprint.

Fairness is the last element of the first principle. AI, for instance, can be both sexist and racist, and the principle of fairness entails that data processing should be fair in relation to the data subject. This means, for instance, that one should strive to avoid bias and disclose any potential bias in the data. The article I mentioned on the previous slide shows an example of how this was done.

The second principle of the GDPR you need to know about is purpose limitation. Purpose limitation consists of two dimensions: one is specification and the other is compatibility. Specification means that personal data shall be collected for specified, explicit, and legitimate purposes, and compatibility means that the data shall not be further processed in a manner that is incompatible with those purposes.

Here you see one of the places where scientific research really enjoys a privileged position. For scientific research, you still need to specify the purpose and make sure it’s explicit and legitimate, but further processing for scientific research purposes is not considered incompatible with the original purpose, so it is allowed even where it would otherwise fail the compatibility requirement.

There are other sides to the purpose limitation principle you need to be aware of, though, particularly in genetics research. You are often building large genetic databases, and those are tremendously useful for medical research, but they also have enormous potential for misuse. It’s crucial to understand the impact of scandals, and this places a responsibility on you to think about how to protect the database from unintended third-party access. This task goes well beyond information security, and I’ll illustrate the types of issues we typically face here with an example or two. A few weeks ago, for instance, the Norwegian Supreme Court denied law enforcement access to a scientific research biobank. A baby had died, allegedly of formic acid poisoning, and the father is a suspect in the case. A tissue sample from the research biobank is necessary evidence, and without it, the police will not be able to prove their case. The Supreme Court stated that granting access to the biobank could decrease trust in scientific research and associated biobanks, and it denied access.

However, look across the border to Sweden. There was a case in which the Swedish Minister for Foreign Affairs, Anna Lindh, was murdered, and the biobank acceded to the police request without a court order and handed over samples from a named suspect, who turned out to be the actual killer. But what were the consequences? For the Swedish newborn screening biobank, which was the one used in this case, 2,000 people immediately withdrew their consent when they heard about this, and an additional 50 people did so each month. This shows that if you start allowing others to use the same data for other purposes, you can actually destroy the main purpose of the data collection, and this is why I’m telling you to watch out for this. We also see such requests in many other contexts, for instance, disputes related to biological kinship, mass casualty events, immigration issues, and repatriation issues.

The third principle of the GDPR is data minimization. Personal data must be adequate, relevant, and limited to what is necessary in relation to the purposes for which they are processed, and that last part is the difficult one for genetics research. It means you need to make sure that the purpose of the processing aligns with the tests you are running and the data you are generating and processing. If you’re only looking at one gene, you need to be able to explain why you’re doing whole genome sequencing. Why do we have this rule at all? I mean, we’re looking into big data analysis, and the more data, the better, right?

Well, it’s the Orwellian argument: indiscriminate data retention is not considered acceptable. The data processing needs to be proportionate, and this is, for instance, why we don’t have universal forensic databases. This has its basis in human rights law and therefore applies not just in the EU but more generally. We saw that, for instance, when Kuwait moved to establish a universal forensic database and was stopped by the country’s Constitutional Court. But this also explains why third parties put so much pressure on our databases in the healthcare and research sector. They simply want access because they’re not in a position to establish similarly good databases themselves.
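To make the one-gene example concrete, here is an editorial Python sketch of one possible minimization measure: keeping only the variant calls that fall inside the region of the gene under study before anything is stored for the project. The `Variant` type, the region tuple, and its coordinates are hypothetical and chosen only for illustration. Filtering after whole genome sequencing does not change the fact that the full data was generated, so whether a given study design is necessary and proportionate remains a legal assessment rather than a technical one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int  # 1-based genomic position
    ref: str
    alt: str


# Hypothetical target region (chromosome, start, end), inclusive; illustrative
# coordinates only, not taken from any real gene annotation.
TARGET_REGION = ("chr7", 1_000_000, 1_200_000)


def minimise(variants: list[Variant],
             region: tuple[str, int, int] = TARGET_REGION) -> list[Variant]:
    """Keep only variants inside the region the study purpose actually covers."""
    chrom, start, end = region
    return [v for v in variants if v.chrom == chrom and start <= v.pos <= end]


if __name__ == "__main__":
    calls = [
        Variant("chr7", 1_050_000, "A", "G"),   # inside the target region: kept
        Variant("chr7", 2_500_000, "C", "T"),   # same chromosome, outside: dropped
        Variant("chr12", 1_100_000, "G", "A"),  # other chromosome: dropped
    ]
    print(minimise(calls))
```

Tying the region to the documented study purpose, rather than hard-coding it deep in a pipeline, makes it easier to explain afterwards why exactly this data was retained.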

The fourth principle of the GDPR is that personal data shall be accurate and up-to-date, and I don’t think I need to explain this much more because as researchers, you very much appreciate this aspect.

The fifth principle of the GDPR is integrity and confidentiality, and a lot of this relates to information security. Personal data shall be processed in a manner that ensures appropriate security, including protection against unauthorized or unlawful processing and against accidental loss, destruction, or damage. If you look at the specific rules of the GDPR, you will also see that in some instances you may need to, or want to, conduct a data protection impact assessment. That is where you assess the risks to the research participants and how you can mitigate those risks.

The sixth principle is storage limitation. This means that personal data shall be kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed. But personal data may be stored for longer periods if it’s only processed for scientific research purposes. So again, a very positive exception in the scientific research field. This does, of course, necessitate that you have appropriate technical and organizational measures in place, but still, you see that data retention can be lawful outside the regular bounds if the processing is for scientific research.

The last main principle of the GDPR is accountability. This means that the controller, the one who determines the purposes and means of the processing, shall be responsible for, and be able to demonstrate, compliance with all the previous principles. I can also remind you that if you look back at this list and think about transparency and accountability, together those comprise the two elements of trustworthiness. So you see that what the EU is aiming for is simply trustworthy data processing. They want data to flow; they just want it to be managed responsibly and in a trustworthy manner towards the research participants.

As you’ve seen, then, the GDPR is built to function as an instrument for facilitating responsible scientific research, and it functions as such both within the EU and in collaborations with countries that the European Commission has decided offer an adequate level of data protection, for instance, Japan. When Japan and the EU mutually decided that they offer adequate levels of data protection, this created the world’s largest area of safe data flows. Other countries that, like Japan, have an adequacy decision from the European Commission, so that data can flow freely between them and the EU, are listed at the top of this slide: Andorra, Argentina, Canada (commercial organizations), the Faroe Islands, Guernsey, Israel, the Isle of Man, Japan, Jersey, New Zealand, Switzerland, and Uruguay. Adequacy talks are also ongoing with South Korea.

If I switch away now from the GDPR and ask how we actually got the GDPR at all: well, it has a mother, and the mother is the Council of Europe Convention 108, which now also exists in a modernized version. It is the only internationally legally binding instrument in this field open to any country, and 55 countries worldwide are currently bound by the convention. They are all the countries you see on this slide: the 47 member states of the Council of Europe and, in addition, Argentina, Cape Verde, Mauritius, Mexico, Morocco, Senegal, Tunisia, and Uruguay. So in all of these 55 countries, you will also see elements of the GDPR in the national data protection legislation, because that legislation and the GDPR build on the same convention, Council of Europe Convention 108.

And finally, apart from human rights law and Council of Europe Convention 108, which any country worldwide can accede to, most legislation related to processing data for research is regional. However, together with colleagues from across the globe, we have identified the functions that governance of genomic data should fulfill, as a basis for the design, implementation, and evaluation of governance frameworks. We acknowledge that different governance functions may be in tension with each other, for instance, access to data versus oversight and restrictions to ensure appropriate data uses. We used the governance frameworks of six large-scale international genomic research projects from across the globe, in Africa, Asia, the US, and Europe, to illustrate governance choices, their approaches to important trade-offs, and how these are reflected in their governance functions. This work may help show the core global elements to consider in genomics research. Thank you so much for your attention, and please get in touch if you would like me to answer any questions at any point. Thank you.