TRANSCRIPT

Eric Jorgenson:
Hello and welcome everyone. My name is Eric Jorgenson, and I'm a senior director of statistical genetics at the Regeneron Genetics Center. This is "The Nucleus" by Regeneron, a podcast series where we discuss science that fascinates and intrigues us.
Today's guest will be Marylyn Ritchie, Vice Dean of Artificial Intelligence and Computing in the Perelman School of Medicine at the University of Pennsylvania.
Welcome to the show, Marylyn.
Marylyn Ritchie:
Thank you for having me.
Eric Jorgenson:
Marylyn, we've known each other for a long time. I can tell you, you're one of my oldest friends in the science world. But for the listeners who don't know you, can you tell us a little bit about yourself, maybe giving us your journey from junior scientist to vice dean?
Marylyn Ritchie:
Yeah, absolutely. Yeah, and it's great to see you as always, like a family reunion coming to this meeting.
I was a biology major in undergrad, I'll start there, at the University of Pittsburgh. I went to Vanderbilt University for my PhD in Statistical Genetics. At the time they didn't even have a human genetics program, but I was doing research with Jason Moore and Jonathan Haines, who were at Vanderbilt at the time. I helped them piece together courses that ended up being human genetics. As you know, I then finished my PhD at Vanderbilt and went home to Pittsburgh for Christmas break. Came back in January and started as a tenure track assistant professor.
I was at Vanderbilt for another seven and a half years, got tenure there, and then moved to Penn State. I describe this often as my Goldilocks journey because Vanderbilt was an academic medical center, and I didn't realize at the time how important it was, for the type of work that we do, to have the health system and the school of medicine and the college of science all on one campus. And so, at Penn State, I was in a college of science, but I really missed that medical clinical side. I started a partnership with Geisinger Health System, and I was jointly appointed with Penn State and Geisinger for a few years. And then I shifted more to Geisinger, which was the clinical but not the science. It was like too hot, too cold. Ended up moving to Penn in 2017. Again, I'm back in an academic medical center. It's just right, has the right pieces that I need.
Through that journey though, I did do several leadership training type programs and realized that I have a lot of interest in leadership. Over time also realized I was interested in administration, which is not something I would've ever predicted. But as AI really emerged over the last few years, the school pulled together faculty who work in AI. I've worked in that space since my PhD. My PhD project was on neural networks, and so I keep joking I've been doing AI since the late 1900s.
Eric Jorgenson:
It was just called something else back then.
Marylyn Ritchie:
Yeah, exactly.
But there was a lot of conversation about what should we be doing in this space and how do we make an impact. I was just, I think, right place, right time. I was already doing leadership as an institute director, director of the Institute for Biomedical Informatics. And so, I knew the leadership really well, had the domain expertise. I was working really well with collaborators in the school of engineering and arts and sciences and provost office.
And so, it was just like I said, right place, right time. The dean decided to create this new role as vice dean of AI, and I was really excited to take that on.
Eric Jorgenson:
Great, thanks.
In addition to being vice dean, you're also the co-director of the Penn Medicine BioBank, and Regeneron Genetics Center has a longstanding collaboration with the BioBank. Wondering if you could tell our listeners a little bit about what motivated the founding of that biobank in 2013 and how it's changed over time.
Marylyn Ritchie:
I wasn't at Penn yet, but I am pretty familiar with why they started it.
But I think the original motivation was really that creating research cohorts is really expensive and really time-consuming. When you have researchers across an institution that study a lot of disease areas, the idea of let's create a kidney cohort, now let's create a cardiovascular cohort, now let's create a diabetes cohort and psychiatric traits, it's just so unwieldy. And so, by creating a resource like the Penn Medicine BioBank, it enabled the whole community to pour into one resource, to be able to recruit individuals from all of these disease areas as well as individuals who were healthy when they were recruited. You'll have data on them pre-disease, and many of them go on to develop disease, and so it's that development-of-the-disease data that's in the health record, and then you can follow them once they're diagnosed.
And so, I think really the motivation was to build a resource that aggregates the community, allows more cost-effectiveness of recruitment into research programs. By partnering the electronic health record data with biospecimens, where you can do genetics, genomics, and other types of omics, it allows for a really robust resource for precision medicine research.
Eric Jorgenson:
So, one of the really interesting things about the Penn BioBank is that it's very diverse: over 30% representation from non-European ancestry. How do you think about that impacting your research and other research done as part of this project?
Marylyn Ritchie:
I think it's one of the great strengths of the Penn Medicine BioBank. On the one hand, I think part of the reason we've been able to recruit in that way is twofold. One, it's a good reflection of our patient population. Penn has six hospitals in the region and many more outpatient clinics. The patient population is roughly 30% non-European ancestry, and so it's great to see that our research participants reflect the patient community that we see. That is one of the challenges, I think, in other biobanks, it's hard for them to recruit from communities that they just don't treat. And so that's part of it.
The other part is that our research team is really diverse, and I think it's really important for research participants to see people who look like them, who they can feel like they're in community with when they're being recruited and participating in the research. And so, I think that's part of why we've been able to recruit a diverse population.
We're trying to do the research to identify risks of disease and better treatments for disease for the patients that we see. And so fortunately, our participant population is similar enough that hopefully we will be able to identify genetic variation or social determinants of health or environmental exposures that are important for those diverse populations. Because as we know in our field, we see a variety of genetic variation, gene-environment interactions, gene-social determinants of health interactions. In order to model those types of things, we have to have the participants and the data from those participants to do so.
Eric Jorgenson:
You brought up genetic variation, and I wanted to delve into that a little bit more. You've worked on genome-wide association studies for a long time, and now with this collaboration between the Regeneron Genetics Center and the Penn Medicine BioBank, we've exome sequenced over 44,000 individuals. I was wondering, this enables exome-wide association studies, which are a little bit different. They focus on rare variation instead of common variation.
What has been your experience with that? How do you see ExWAS? What does it produce that GWAS does not?
Marylyn Ritchie:
Yeah, I think there's great parts about exome-wide data, and then there are a lot of challenges, as you know.
The great thing about it is that we're able to pick up variants that we weren't able to ever see before. I think our ability to identify new genetic variation has really increased since we have these sequencing technologies exome-wide and even genome-wide. That has enabled, I think, discovery of some new genetic variation that is important for disease risk and that consequently ends up being a good target for therapeutics. That's been a great advance.
But with that, I think the challenge is that because the variation is so rare, sample sizes are small. When you're looking at diverse populations, which variants they carry varies from population to population. And so even in 40,000 or 60,000 people, you still may only have a handful that carry a particular variant. Our statistical genetics methods just weren't created for that. And so, I think that our community is lagging behind on creating the new statistical tools.
We need to get more creative, think differently about how we analyze the data. I think what we're seeing is a lot more aggregation of variants within a gene or within a gene region or within a pathway or within a protein complex, things like that. But I feel like there's a lot more work to be done in that space.
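One way to picture the aggregation idea mentioned above is a simple gene-level "burden" collapse. The following is only a minimal sketch under stated assumptions: the genotype matrix, case labels, and function names are made up for illustration and do not come from the Penn Medicine BioBank or any specific published method. It collapses three rare variants in one hypothetical gene into a single carrier indicator and compares a one-sided Fisher's exact test on the collapsed indicator against the same test on one variant alone.

```python
from math import comb

def gene_burden(genotypes, variant_idx):
    """Collapse variants: 1 if a person carries any alternate allele at the chosen variants."""
    return [int(any(row[j] > 0 for j in variant_idx)) for row in genotypes]

def two_by_two(carrier, is_case):
    """Counts: (case carriers, case non-carriers, control carriers, control non-carriers)."""
    a = sum(1 for c, y in zip(carrier, is_case) if c and y)
    b = sum(1 for c, y in zip(carrier, is_case) if not c and y)
    c_ = sum(1 for c, y in zip(carrier, is_case) if c and not y)
    d = sum(1 for c, y in zip(carrier, is_case) if not c and not y)
    return a, b, c_, d

def fisher_one_sided(a, b, c, d):
    """P(at least `a` carriers among cases) under the hypergeometric null."""
    n_cases, n_carriers, n = a + b, a + c, a + b + c + d
    upper = min(n_cases, n_carriers)
    total = sum(comb(n_carriers, k) * comb(n - n_carriers, n_cases - k)
                for k in range(a, upper + 1))
    return total / comb(n, n_cases)

# Toy data (hypothetical): 8 people x 3 rare variants in one gene,
# each variant carried by exactly one person.
genotypes = [
    [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0],   # cases
    [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0],   # controls
]
is_case = [1, 1, 1, 1, 0, 0, 0, 0]

burden = gene_burden(genotypes, [0, 1, 2])
p_burden = fisher_one_sided(*two_by_two(burden, is_case))

single = gene_burden(genotypes, [0])
p_single = fisher_one_sided(*two_by_two(single, is_case))

print(burden, round(p_burden, 4), p_single)
```

In this toy example each variant on its own is seen in a single person, so a per-variant test has almost nothing to work with, while the collapsed indicator pools three carriers. Real analyses would also need covariates, variant filtering, and weighting, none of which are shown here.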
Eric Jorgenson:
Part of that work, you've mentioned scale. How do you see the Penn BioBank fitting in terms of collaboration across multiple biobanks, for example?
Marylyn Ritchie:
Yeah, so we've done a lot of collaborations with other biobanks, and I think that does enable us to pick up more people with those variants. We've been increasing our recruitment quite a bit over the last couple of years. We switched from in-person consenting, which takes a lot of person power, it's all the conversations and interactions, to electronic consenting. That has increased. We're now at about 290,000 participants that have signed the consent to participate.
And then the flip of that is getting the biospecimen. When you do in-person consenting, you can get the biospecimen right then. When you do electronic consenting, you have to wait until they're getting a blood draw on campus. But we've now built the informatics to either do a blood draw order so that when they come in for a clinic draw, they add an extra tube, or we get the residual blood. Whenever they come in for a blood draw, they take what they need for the clinic, and then the rest of the blood typically gets thrown away. And so, we also see this as a really green initiative. This is biosample that was going to be thrown away. We can actually convert that into research specimens for people who've already signed the consent.
And so that's enabled us to get on the order of several thousand samples a month. And so, it lags behind the consent, but now it's scaling and it'll be larger.
Eric Jorgenson:
After we, as a field, have done all of these analyses, where do you see genetics going? How do you think it might help patients in the future?
Marylyn Ritchie:
Well, I guess two areas.
One is predicting an individual's risk of disease. I think there are some people who say they don't want to know, but I think by and large, most people would want to know what their risks are. Especially if there are lifestyle changes they could make or if there are gene therapies, which are emerging. I feel like every month we see new gene therapy trials out there. If you knew you were at risk and there was something you could do about it, I think most people would want to. I think as we identify more of the genetic risk factors for disease, we'll develop more diagnostic and predictive models that would be useful for patients.
And then I think the other side is we'll better understand what the underlying mechanisms of disease are, which will enable us to identify new therapeutics, so whether it's gene therapy or CAR-T or other types of small molecules. I know a huge thing happening in AI right now is how to use all the data we have with AI to identify new types of molecules that might become therapeutics. That's great if we can create a new molecule, but we need to know what do we target with said molecule. I think that the genetic data that we're generating is going to enable us to find those.
Eric Jorgenson:
Sure.
How do you see AI impacting the field of genetic research?
Marylyn Ritchie:
In most AI models, most of the methods, they require data of some truth to learn on. I think especially in genetics, other than the large-effect variants, we don't have a lot of complex models of truth for AI to learn. I mean, I think we all mostly believe that biology is complicated. It's probably not a single variant. It's a combination of that variant in the context of the other variants in that pathway or in that regulatory region or whatever. We don't have a lot of examples for the AI to learn.
That said, I just have seen a couple papers lately where people are taking genome sequence data; instead of analyzing it as numbers and turning it into numbers, they're turning it into images and then analyzing the images the same way that they analyze a radiology image or a digital pathology image. They're actually finding patterns that our statistical methods have never seen.
And so, I am curious to see if that turns into something useful for our field. I am still a little skeptical and cautiously optimistic, but it never would've occurred to me to take our sequence data and turn it into an image.
Where I do think we're going to see AI used a lot though is on the phenotype side. I think AI is going to enable us to go through this health record data much more robustly than we are today with just creating these if-then-else algorithms: if they have these codes or are on these meds, they have a phenotype. I think we'll be able to analyze the data much more broadly with AI and get richer, more homogeneous phenotype definitions.
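For readers unfamiliar with the if-then-else phenotype algorithms described above, here is a minimal sketch of what such a rule can look like. Everything in it is an assumption for illustration: the code and medication names are placeholders, and the thresholds are invented rather than taken from any validated phenotype definition.

```python
def has_phenotype(patient, dx_codes, med_names, min_code_count=2):
    """Rule-based phenotype call from diagnosis codes and medications.

    Hypothetical rule: call the phenotype if the record has at least
    `min_code_count` qualifying diagnosis codes, or at least one
    qualifying code plus a qualifying medication.
    """
    code_hits = sum(1 for code in patient["codes"] if code in dx_codes)
    on_med = any(med in med_names for med in patient["meds"])
    if code_hits >= min_code_count:
        return True
    elif code_hits >= 1 and on_med:
        return True
    else:
        return False

# Placeholder vocabularies (not real billing codes or drug names).
DX_CODES = {"DX_001", "DX_002"}
MED_NAMES = {"MED_A"}

patients = {
    "p1": {"codes": ["DX_001", "DX_001", "DX_999"], "meds": []},
    "p2": {"codes": ["DX_002"], "meds": ["MED_A"]},
    "p3": {"codes": ["DX_002"], "meds": ["MED_Z"]},
    "p4": {"codes": [], "meds": ["MED_A"]},
}

calls = {pid: has_phenotype(rec, DX_CODES, MED_NAMES) for pid, rec in patients.items()}
print(calls)
```

A rule like this is easy to audit but rigid: patient p3, with one code and no qualifying medication, is simply labeled as not having the phenotype, whatever else is in the record.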
Eric Jorgenson:
Interesting.
What do you hear from participants about their attitudes toward genomic research and hesitancy or motivations for participating?
Marylyn Ritchie:
I think for some, the hesitancy is largely around lack of trust. Some people worry: if I give you access to my data or give you my DNA, what's going to happen to me? What if insurance companies get access to it? What if you share it with the wrong people? They just are worried about the privacy of their data.
The other though is some are concerned about the actual logistics of giving a blood sample. I recently had a friend say, "Oh, I saw the consent for the biobank. I actually said no." I was like, "Why are you telling me this? That's the study that I lead." He said, "Because it said that I might need a needle stick." I was like, "What?" We're trying to get the specimens off of either a clinical blood draw or a residual sample so that you wouldn't need an extra needle stick.
But I think the consent form in the detail says it's possible that you might come in for a research blood draw. That's largely so that we have consent. If we do need to call someone in, if they're in a high priority population, we could invite them to come in for a research blood draw. We needed to consent them appropriately.
The reason people want to do it is largely they're hoping that we can identify cures for disease and new drugs to treat their diseases. They don't even necessarily, I think, worry about themselves, but they think about their families and their communities. Often it takes a long time, as you know, from a discovery until we actually have a new therapeutic. It just takes a lot of time, and so I think a lot of people who currently have disease don't necessarily believe that the treatments will come out for them. But they think it'll come out for other people with that condition. I think because so many conditions are inherited in families, they do worry about their children and their grandchildren. I think a lot of people do it to try to protect others and provide scientists the ability to find those new therapeutics and cures for disease.
Eric Jorgenson:
That's great.
What are the things that have come out of the biobank that you're most excited about?
Marylyn Ritchie:
Let's see.
We had a study that we did in adult-onset hearing loss that I thought was really exciting. This was a study led by a former MD-PhD student, Joe Park.
In that study, one of the associations that he found was with tinnitus, which is the ringing in the ears that is often associated with adult-onset hearing loss. For the gene that he found, we reached out to a hearing loss researcher at Penn, and he was able to make a mouse model in the homolog of that gene. Those mice had hearing loss, and so it turned into additional studies. But it was really exciting that this brute force, large-scale analysis found something that at least in the mouse caused hearing loss. And so, I think he's been continuing to pursue that.
Another finding, which we actually just got accepted this year, is from a current graduate student, David Tseng. This is in collaboration with Dan Rader, so Dan and I co-mentor David. He was doing an association analysis looking specifically at variants that are more common in populations that have high genetic similarity with African reference populations. Those same variants have no or very low frequency in individuals who have high genetic similarity with European reference populations. And so, these are variants that are more frequent in populations of African descent or African ancestry. And so, he did an association analysis, and there are variants in APOL3, which is a gene that is related to and close to APOL1, which is a known gene for kidney disease. But he found this APOL3 also has an association with kidney disease. Using statistical modeling, it's an independent signal from the APOL1 signal.
And so, these are variants that are almost nonexistent in populations with high genetic similarity with the European populations. And so, this variation seems really unique to these African reference populations and associated with kidney disease. And so now he's trying to figure out, what does that relationship mean? I think one of our collaborators at Penn is trying to figure out what is the relationship with APOL1 and APOL3. What do you do with that? But that's the type of thing that you need a diverse resource, and then you need a lot of data at scale to be able to find.
I'm really excited to see what other findings like that come out as we continue to mine through all the data.
Eric Jorgenson:
From the Regeneron Genetics Center perspective, we are always excited to work with different biobanks, especially diverse ones, and bringing together as many different sets of data as possible.
Marylyn Ritchie:
I think broadly, industry-academic partnerships are so important to move our fields forward. I mean, we just can't do these studies in isolation. I think partnerships like this are just really important.
We've done some studies together aggregating the data from across all the biobanks, as we talked about before, with the exome-wide data and rare variants. That creates tremendous power.
I think the other thing is we have diverse groups of scientists at both organizations, and many of us know each other because our field, while it sometimes feels big, it's not that big.
Eric Jorgenson:
In fact, I hired one of your PhD students.
Marylyn Ritchie:
That's right, yeah. I think we've known each other since we were PhD students.
Eric Jorgenson:
I think that's right, yeah.
Marylyn Ritchie:
But we have different expertise in different things that we get excited about and different things that we focus on, and so that's also been really great, too. Oh, we're trying to do this thing. Oh, we have a method that we just worked on for that. Oh, we've been struggling with this. Oh, we just figured that out.
And so those types of interactions and collaborations are really fun. I do think they're enabling us to move the field forward a lot faster than either of our groups could do alone.
Eric Jorgenson:
Thanks for joining us, Marylyn. That was great.
Marylyn Ritchie:
Thank you for having me.
Eric Jorgenson:
Our guest today has been Marylyn Ritchie, Vice Dean of Artificial Intelligence and Computing in the Perelman School of Medicine at the University of Pennsylvania. I'm Eric Jorgenson. See you next time on "The Nucleus" by Regeneron.