In this episode, Dr. Bryce Meredig and Prof. Persson discuss:
- The founding of the Materials Project and how it has grown into a widely used global open-source platform
- Recruiting and supporting a multidisciplinary group that touches materials science, chemistry, high performance computing, and scalable web development
- The motivation for making all Materials Project data, applications, and algorithms open-source
- Success stories from the scientific community’s use of the Materials Project
- How researchers can best integrate computational methods with machine learning, lab-based synthesis and characterization, and commercial R&D
“We’re building a community. In the end, the goal [of Materials Project] is to make our data and algorithms available to the public so we can accelerate materials design and accelerate solutions to some of our societal problems in renewable energy.”— Dr. Kristin Persson
Prof. Kristin Persson is the Director of the Materials Project, a Staff Scientist at Lawrence Berkeley National Lab, and an Associate Professor in the Department of Materials Science and Engineering at UC-Berkeley. Known as a pioneer of materials genomics, Kristin co-founded the Materials Project in 2011 with Gerbrand Ceder at MIT. The Materials Project is now a multi-institution, multi-national effort to compute the properties of all known materials and to provide the data, analysis algorithms, and computational materials applications free of charge to the scientific community. The Materials Project aims to accelerate innovation in materials research, and has led to the discovery of new battery materials, transparent conducting oxides, and thermoelectric materials. Prof. Persson is the recipient of the Knut and Alice Wallenberg Early Career Award for Women in Science, the 2013 LBNL Director’s Award for Exceptional Scientific Achievement, and the TMS 2017 Early Career Faculty Fellow Award. She was also a 2018 Kavli Fellow.
Bryce Meredig: Welcome to DataLab, a materials informatics podcast with Bryce Meredig, Chief Science Officer at Citrine Informatics. Our guest today is Professor Kristin Persson, who is the Director of The Materials Project, a Staff Scientist at Lawrence Berkeley National Lab, and an Associate Professor in the Department of Materials Science and Engineering at UC Berkeley.
Bryce Meredig: Known as a pioneer of materials genomics, Kristin co-founded The Materials Project in 2011 with Gerbrand Ceder at MIT. The Materials Project is now a multi-institution, multi-national effort to compute the properties of all known materials, and to provide the data, analysis algorithms, and computational materials applications, free of charge to the scientific community.
Bryce Meredig: The Materials Project aims to accelerate innovation in materials research and has led to the discovery of new battery materials, transparent conducting oxides, and thermoelectric materials. The Materials Project currently has over 75,000 users worldwide. Kristin is the recipient of The Knut and Alice Wallenberg Early Career Award for Women in Science, the 2013 LBNL Director’s Award for Exceptional Scientific Achievement, and the TMS 2017 Early Career Faculty Fellow Award. She’s also a 2018 Kavli Fellow. Kristin, welcome to the podcast.
Kristin Persson: Thanks so much Bryce, it’s a pleasure to be here.
Bryce Meredig: We traditionally start these podcasts out with a fun fact about our guest. Yours has to do with your name. You told us that your first and your middle name are complete contradictions with one another. Could you share a little more about that?
Kristin Persson: Sure. Well I was first going to have you try to pronounce my second name so I sent it to your team. Would you venture a guess on how you pronounce it?
Bryce Meredig: I will give it a shot, my guess would be As-laug.
Kristin Persson: Not bad. So my second name is Icelandic and it’s pronounced Aus-lug. And the contradiction you were mentioning is the fact that my first name is Kristin, which means the Christian woman, and my second name Aslaug literally comes from the Asa Gods. So it’s As for Asa Gods and then Laug for being favored by, so being favored by the Gods. So I’m a heathen in my second name and a Christian in my first name.
Bryce Meredig: Sounds like you have all your bases covered.
Kristin Persson: Yeah, I think my parents were hedging their bets, yes.
Bryce Meredig: It’s a real pleasure to have you join us here today, and of course one of the main topics that we’d like to chat about is your work on The Materials Project. Could you take us back to the time when the Materials Project was getting off the ground, back in 2011 and even before? What was the founding story of this by now very well known, globally impactful scientific effort?
Kristin Persson: Sure. So I met my team somewhere around 2005. We were approached by a company, Duracell, that was then owned by Procter & Gamble. They had heard that nowadays you can do computations to actually calculate new materials and figure out what other potential materials could be out there for specific applications. Of course, I was working with Gerbrand Ceder, who was known for his battery work, so they were interested to find out if there was a way that we could calculate all the possibilities of all alkaline cathode materials that were out there. They make these alkaline batteries, which are non-rechargeable, and we got our project off the ground. In the beginning it was just one year long and it was a million dollars, but it was the very first time, at least as far as I know, that anyone ran thousands of calculations using DFT, assembled them in a database, and screened for multiple properties, which in this case were associated with figures of merit for alkaline battery materials, and sort of made an iterative loop of it and learned from the data. We learned that certain elements definitely shouldn’t be part of it if you want your material to not dissolve at pH 15, which is the electrolyte you’re dealing with in these batteries.
Kristin Persson: And in the end, after screening more than a hundred thousand materials, we sort of delivered: a list of about two hundred materials that we thought inherently had a better chance of beating the current cathode in alkaline batteries, which is manganese dioxide. But this was very much an iterative process of working with the team at Duracell as well and getting their feedback on our crazy ideas.
Kristin Persson: So that’s where it really started, and from my perspective, I had a couple of times before in my career thought, “I’m not sure I’m going to continue in academia,” but this project in particular got me extremely excited because it made a connection with industry. It made me feel like I was doing something that could end up in a real product, and it used my expertise in density functional theory. So I loved it and I loved working in a team. I don’t remember exactly how many of us were there, but it was sort of like five or six of us, and some were part-time. The whole idea was that people were doing different things and we were all working together, and there were a lot of what I would call jam sessions where we figured out cool ways of describing materials properties. It was really very exciting.
Kristin Persson: So after that project, we felt that we actually were able to do this. We were able to come up with new compounds that excited the team at Duracell and got them going in other directions. At MIT we kept on, and mostly Ceder kept on, pursuing this same idea of using a database approach, a data-driven approach, for coming up with new materials, specifically then for lithium-ion batteries.
Kristin Persson: In 2008, I moved to LBNL and I brought this excitement about what you could do today with first-principles calculations, which I think a few groups in the world were looking at, but it was still fairly new. Most people looked at it as sort of a niche field: yeah, you can explain stuff, but you can’t really drive innovation with it. And I suppose it was my luck that my first postdoc was really not very interested in science; he was much more interested in programming. So I thought, “Okay, let’s just do this at LBNL and see what we can do.” So he started building a database here, and then we hung out in the cafeteria talking to people in the high performance computing center here at NERSC and the computing research division, CRD, and slowly this little team of people started coalescing and saying, “Well, this is really cool, you guys should think about taking this to the next step.”
Kristin Persson: We had had conversations about how this data that we’re sitting on, that we’re computing over and over again and filling out in different property spaces, really shouldn’t be used just for batteries. It should be used for all kinds of materials design problems, because there is so much rich materials insight buried in it in all these different ways. For example, formation energy can of course give you voltage, but it also gives you stability, and eventually, if you build it the right way, it tells you something about moisture stability and so on. So there is a lot of stuff you can get from just one kind of calculation.
Kristin Persson: So that’s when I think the idea took place that we should build this into a public database and not just use it ourselves.
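[Editor’s note: As a rough illustration of the remark that formation energies give you voltage, the sketch below computes an average lithium intercalation voltage from three total energies. It is a minimal example under stated assumptions, not the Materials Project’s own code: the function name `average_voltage` and the energy values are hypothetical, and energies are assumed to be in eV per formula unit with one electron transferred per lithium.]

```python
def average_voltage(e_lithiated, e_delithiated, e_li_metal, n_li):
    """Average intercalation voltage in volts from total energies in eV.

    V = -(E_lithiated - E_delithiated - n_li * E_Li_metal) / n_li

    Energies are per formula unit; n_li is the number of Li atoms inserted
    per formula unit. With one electron per Li, eV per Li maps to volts.
    """
    if n_li <= 0:
        raise ValueError("n_li must be positive")
    reaction_energy = e_lithiated - e_delithiated - n_li * e_li_metal
    return -reaction_energy / n_li


# Hypothetical energies (eV per formula unit), chosen only for illustration.
voltage = average_voltage(
    e_lithiated=-10.0, e_delithiated=-4.0, e_li_metal=-2.0, n_li=1
)
print(f"{voltage:.2f} V")  # prints "4.00 V"
```

[The same stored energies could in principle feed a stability check, which is the reuse of one calculation for several properties described above.]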
Bryce Meredig: It’s really interesting to hear the inside story that doesn’t always make it onto the website about how you all got started. How has the level of industry involvement continued over time?
Kristin Persson: Duracell really got it started to some degree, right? And then we got the center grant from DOE in 2012, I think that was, and that made us more official. At LBNL, it was really … I raised a little bit of money through an LDRD, and we co-founded it between MIT and LBNL. I think the reason that we successfully were able to argue that this should really be at LBNL and not stay at MIT was because we had so much multi-divisional and multi-disciplinary expertise in programming, high performance computing, software engineering, and materials science that all came together here. We could hire staff that could take care of all the maintenance that needs to be done when you bring this to the next professional level.
Bryce Meredig: Well I think you hit on an interesting point there actually with the diversity of skill sets that you need in the Materials Project to be successful.
Kristin Persson: Mm-hmm (affirmative)
Bryce Meredig: So you’re not just doing science, you’re not just doing materials research, you also have a very sophisticated high performance computing infrastructure. You also have a web infrastructure, a web interface, and apps that users can use. How have you been successful in sort of assembling all this under one roof which is highly unusual for a research team?
Kristin Persson: Yeah, and I think LBNL was crucial in order to make that happen. Once I raised enough money here at LBNL, I could hire staff. And those staff people tend to come either from very heavy programming backgrounds, where they don’t necessarily have to be materials scientists, or they are people with sort of a dual interest in science and web development. I have to say that I’ve been extremely fortunate that I’ve been able to attract people with those skill sets. There are not that many of them, so I’ve indeed been very fortunate. You have to have them.
Bryce Meredig: How do you find those kinds of people? What’s your secret?
Kristin Persson: Well, I suppose having a network around here. So again, having people in NERSC, in CRD, and telling them, “Hey, I really need somebody like this.” And then they tell you, “Well, there’s this website where we advertise for our web developers,” or “Here’s a list of people that came to our department a while ago.” So I think it’s a lot of looking in the right direction, so not just blindly advertising but finding people that know these communities better. That’s where we have the best luck. Also, just spreading the word around here. There are a lot of people that go through science but then find out after a while, “Maybe I love science, but I don’t want to be exactly a scientist. I wanna support science in one way or the other.” Especially if you have a strong programming interest, then that’s the perfect person for becoming staff and working on my team.
Bryce Meredig: You of course have a few different affiliations. You have the Lawrence Berkeley affiliation and also UC Berkeley. How do you coordinate the activities in your research group across things that are maybe better suited for the university environment, having for example a PhD student work on them, versus things that are better suited for the national lab environment?
Kristin Persson: Right, yeah, so my group sits together; we all have space up at the lab. I think that’s actually really important because The Materials Project sort of empowers everybody in my group to do really cool stuff. But you’re right that the students need something to build a thesis around. So their projects tend to be driven, to some degree, by where the funding comes from. You asked me about industry funding before. Some of it is industry funded, and then it’s very specific, right? It’s for a particular application. Some of it comes from NSF or from DOE, but there tends to be an energy hook somewhere. For example, my group has projects on anything from recyclable polymers to transparent conductors to magnetocaloric materials, batteries, anodes, electrolytes.
Kristin Persson: It’s kinda all over the place, but most people in my group, from students to staff and postdocs, use the Materials Project, either its data, algorithms, or programming interface, in one way or the other. So it becomes a common ground, sort of the spider in the net, that empowers all of my group members. So yes, for a student you recruit with a sense of “this is your thesis, you’re going to explore this space, you’re gonna understand the class of materials and maybe hopefully also design one that is better.”
Kristin Persson: For postdocs it’s similar, but it tends to also leverage their existing expertise. I hire postdocs both on campus and at LBNL; it all depends on where the money comes from, whether I raise it through NSF or DOE or through a company. And then the staff people are either staff on a track to stay as an engineer or somebody who is supporting the infrastructure, or staff who want to eventually move on to become their own PI and build their own group.
Kristin Persson: There are two kinds of staff career tracks with LBNL. I’ve mentored both, and I have both in my group currently.
Bryce Meredig: Now as we look back over the history of the Materials Project, you’ve made this tremendous investment in building infrastructure and in creating a very useful data resource. And then you made the perhaps unconventional decision to actually give it away. So what was the motivation for doing that versus keeping it internal to your group?
Kristin Persson: You mean the fact that all the codes are open source, except for the …
Bryce Meredig: The codes, the data, the content that you make openly available.
Kristin Persson: Well, I think the data was actually how it started, right? I would like to hope that everybody who goes into science has some sort of passion for helping society, and it felt to us that we couldn’t possibly leverage all the information that was in there even if we spent two hundred years digging through it. So it was more like, this has to be used by other people as well. However, when you put data out like that, you also get a lot more nervous about making sure that it’s not garbage in there. It has to be more vetted.
Bryce Meredig: Yeah, of course.
Kristin Persson: Of course there are still mistakes and bugs in there, but before we made it public we built in a lot more checks and balances on how to look over the data, to make sure that the bad ideas that would occasionally fall out of DFT for various reasons don’t make it to the public, because that does create a lot of problems.
Kristin Persson: So the data was sort of easy; that’s how it started. The codes took a little bit more of a thoughtful process. I would say that in the beginning, there were some that said, “Well, we’ve put a lot of effort into this. There was a lot of coding, years of experience and years of debugging and making it robust enough that we could use it. We’re just going to give that away for free?” That was a little harder to motivate, but I think in the end what brought people to, in my opinion, the right conclusion was that we’re building a community. In the end the goal is really about making these algorithms and data available to the public in such a way that we can accelerate materials design and accelerate solutions to some of our societal problems in renewable energy.
Kristin Persson: Once you see that, if you make your code open source, you’re going to get hundreds of developers eventually around the world that will help you build that code, that will find bugs for you, that will fix them for you for absolutely no money. Eventually, maybe one day, even if it is not your group or your students that come up with something great, they will have used that code and you will feel some pride that you helped that discovery to come along. I think it was that vision that did it.
Bryce Meredig: Well, you have, as you mentioned, users and collaborators around the world who are building on the tools that you’ve created and the data sets you’ve created. Do you have an example of an interesting story, or someone that reached out to you that you weren’t expecting and told you about how they were using your work?
Kristin Persson: Yeah, well, the most recent one that I can think about, and I don’t know if you noticed [unclear], but somewhere around November or December of last year, we were having connection issues because there were so many data downloads being done. And there still are, but we had to spend some serious time basically mirroring the site and making sure that more data could be downloaded without interruptions. I think somewhere around six hundred thousand data items are downloaded each day through the API, and it keeps on climbing. We sort of hit the bottleneck around November, to the point where I think at some point the website literally crashed because of all this pressure for data.
Kristin Persson: I got an email from Australia, from some people working on water treatment and waste management. They said, “I don’t know what’s happening. We love your Pourbaix diagram app, and we use it every day, and we haven’t been able to use it for three days and it’s really hurting us. Can you please tell us what we can do to fix this?” Because they of course had no idea that this had to do with connectivity issues. So that was very heartwarming, and I wrote back to them saying, “Don’t worry, we’ll get it up again very soon.”
Bryce Meredig: Well, you’ve highlighted there one of the challenges in creating this infrastructure: you do the science, you publish the data, but then as demand grows you can face difficult software engineering and connectivity challenges that you would have no way of anticipating when you started out.
Kristin Persson: We constantly have to update, maintain, and change our infrastructure to meet that demand, as well as to meet the number of different properties that we’re continuously calculating. The whole infrastructure is not a done deal, ever. It keeps on growing, it’s changing. It gets better. We break it. We fix it. I never thought I’d be leading a software center, but I suppose I am.
Bryce Meredig: That raises an interesting question, which is where do you see the Materials Project going in the future? What would you like to see your group and the consortium as a whole accomplish over the next few years?
Kristin Persson: My overarching goal is really to empower the world, and all the materials designers and the engineers and people who work with materials in any way or form, to be able to use the DFT data more effectively, and to understand its limitations enough to use it. Most people are comfortable using experimental data because, you know, let’s be honest, most materials engineers or people who work with materials come from an experimental background, but they really don’t know how to use the DFT.
Kristin Persson: My goal is to democratize what we’re doing. You shouldn’t have to hire your own computational scientist in order to interpret a phase diagram and use it for the advancement of future engineered materials in the world. That’s the goal for the Materials Project.
Kristin Persson: In terms of my group, I’m very fortunate to be attracting really bright and really enthusiastic and passionate students and postdocs and staff from all over the world. I think what unites us all is that we want to make a difference. We want to make renewable energies cheaper, better, more stable, so that we don’t have to rely on fossil fuels and we can have a better world. Basically, make a better world.
Bryce Meredig: To that point, in the intro we mentioned some of the materials advances that have come out of your group. What’s one that you’re particularly proud of, that you would like to share?
Kristin Persson: I think you know that a couple of years ago we released a data set of piezoelectric tensors, about a thousand of them, I think. We weren’t really working on polar materials at all, but we just happened to say, “Okay, there’s a way of doing this, so we are going to write the workflow around it and release the data.” So we did that, and I don’t know if the community noticed, but at some point we started looking through the data set ourselves. We were particularly interested in looking for a material that didn’t contain lead, because PZT is the most common piezoelectric material being used, but Europe just banned lead, basically, in industries, so we’re going to have to find a replacement pretty quickly.
Kristin Persson: There are a couple of candidates out there, but most of them are alloy systems, so we were just interested to see whether we could find something that’s crystalline and ordered and doesn’t contain lead, with a high piezoelectric response. We found an interesting material, strontium hafnium oxide, that had a very high predicted response, but the material had never been made. It was one of those hypothetical materials that we get more and more of in The Materials Project. We cover most of the ordered-compound ICSD world, but there are an increasing number of compounds that come out of different design exercises that we import, and they’re theoretical materials.
Kristin Persson: This material was competing with no less than four other materials of the same composition in that phase space that had been made and had a lower formation energy, so they would be more favorable to make. What I like about this story is that we had some fearless collaborators at NREL that took on the challenge of trying to make this material, for which there really wasn’t a guarantee that it could be made. It took them two years, working with their characterization team at SLAC, to make the material in thin film form and to actually show beyond a doubt that it was exactly that [unclear] strontium hafnium oxide and not any of the other four that were on the phase diagram. Indeed, it had a high piezoelectric response, it was actually ferroelectric as well, and it had a high breakdown strength.
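[Editor’s note: The competition between phases of the same composition described here can be sketched as a simple comparison of formation energies. This is a minimal illustration, not the analysis the team actually ran: the function name, the polymorph labels, and the energy values are all hypothetical, and a full stability analysis would use a convex hull over all compositions rather than only same-composition phases.]

```python
def energy_above_ground_state(formation_energies):
    """Return each polymorph's energy above the lowest-energy polymorph.

    `formation_energies` maps a polymorph label to its formation energy in
    eV/atom. All entries are assumed to share the same composition.
    """
    if not formation_energies:
        raise ValueError("need at least one polymorph")
    lowest = min(formation_energies.values())
    return {name: e - lowest for name, e in formation_energies.items()}


# Hypothetical formation energies (eV/atom), chosen only for illustration.
polymorphs = {
    "known_phase_1": -3.75,
    "known_phase_2": -3.74,
    "known_phase_3": -3.73,
    "known_phase_4": -3.72,
    "predicted_polar_phase": -3.70,
}

gaps = energy_above_ground_state(polymorphs)
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name}: {gap * 1000:.0f} meV/atom above the ground state")
```

[A small positive gap like the hypothetical one here means the target phase is metastable: the known phases are thermodynamically favored, which is why making it was not guaranteed.]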
Kristin Persson: So that’s one of the newest ones out of the team. There are people in my group working on auxetic materials, with some nice predictions on that. On novel magnesium electrolytes, we’ve designed novel salts that hopefully will enable magnesium rechargeable batteries with wider electrochemical windows. We worked with John Gregoire at Caltech and Jeff Neaton here at UC Berkeley and LBNL on water splitting oxygen evolution photocatalysts. So it’s really very broad … and I suppose that excites me as well. I get to learn about a lot of different applications and cool ways of using computations to drive materials design.
Bryce Meredig: I can imagine that this set of success stories that have been coming out of your group, the Materials Project, and the high-throughput DFT community as a whole have helped, for example, industry to understand that computational methods can have a real impact on materials development. Have you found that to be true as these capabilities become more and more … let’s say proven?
Kristin Persson: Yes, to some degree. I will say yes, they understand that there’s great promise, but I think also, and realistically so, there’s a big gap between a new material with some promise and actually commercializing it. There’s a whole set of issues around processing conditions and scalability. Can you make it at scale with the performance intact? Is it going to be cost effective at that point? Do you have enough of the different elements that go into it? Are there temperature sensitivities under the operating conditions that you’re looking at? So I think what industry understands is that even if we have new compounds, the road to success from there on is long, even if you have something new that’s on paper, and even if when synthesized in the lab it looks better. I know you know that timeline really well, but it takes on average eighteen years. That’s actually not from the computational prediction; that’s from the day it’s made in the lab and tested and works in the lab environment.
Kristin Persson: Yes, I think they think it’s exciting, but I also think they realize that there needs to be more development at this sort of low TRL level, if that makes sense, before they’re ready to really jump on it.
Bryce Meredig: What are some developments along those lines that you think will help push us in the right direction, in terms of being able to computationally predict and understand more of that development cycle?
Kristin Persson: Yeah, it’s difficult, right? Because some of those things are really hard to have an impact on computationally. Processing is very, very difficult to say anything intelligent about. Cost is the same, right? It all depends on how you make it. For some materials you could actually set a lower bar for cost, saying that you can’t include iridium in something that’s going to be thrown away. But you can also include dirt cheap elements, and if you have to make a particular nanoparticle, it becomes really expensive anyway. So I think what really works is something along the lines of how we worked with Duracell on that first project, and I am working with some other companies this way, where you work hand in hand with the experimental team and the team that actually works all the way towards the devices. Because they have insights that may not always be codified and may not always be easy to generalize, but when they see a compound or they see an idea, they go, “Yeah, that’s gonna work,” or they go, “No, that’s not gonna work.” You have to challenge each other, you have to have a bit of a tug of war and say, “Well, why is that?” And they explain why, and sometimes we think, “Yeah, that’s a really good point,” and they challenge us and say, “Well, how about if you do it this way instead?”
Kristin Persson: It’s really working together rather than sort of delivering a packaged product, which may be rejected because we haven’t thought about all the things that go into actually making something work.
Bryce Meredig: Yeah, I think that’s a great point and we find that the same is true surely on the machine learning side as well where there’s essential domain knowledge that’s very difficult to represent in a way that computational methods can understand so if you put these computational methods together with domain experts that really know what they are doing and can supply that additional knowledge, the combination can get us to places that we couldn’t arrive at before.
Kristin Persson: Yeah, I completely agree. That’s exactly it, and sometimes you can develop a more codified screening method from it. I still remember the first time we talked to Duracell and they said, “Wow, all of these things are gonna dissolve in the electrolyte,” and we had absolutely no way at that time to screen for stability in aqueous media. But they challenged us to come up with a way of doing it, so that’s when we wrote the code behind the Pourbaix diagrams that is now on The Materials Project and being useful to people in Australia and other places. It does work, right? It is thermodynamic, so you can still have passivation layers and you can still have ways of looking at materials in a more complex way, but it does tell you what the thermodynamic driving force for dissolution is. Once it gets high enough, you’re not gonna survive in certain conditions. So there are ways that these conversations can also lead to a more accelerated way of screening materials.
Bryce Meredig: As we look towards the future of high-throughput computational materials science, high-throughput DFT, machine learning, how do you see these capabilities fitting together over the next few years?
Kristin Persson: Well, they go hand in hand, right? Without the data we couldn’t do machine learning, so I think there is a reason we are seeing this enormous interest and excitement about machine learning in materials science: now suddenly we do have some data. That has been a lot of work, with people collecting the data and making it available, but it is at least now available. So the future of that is, you know, I think it’s sort of the happy wild west right now with machine learning. Everybody is trying all kinds of different things, and we don’t really have a robust benchmark to evaluate against. I think what we are gonna see in the future is more of “Well, you have to beat this machine learning model” if you’re gonna claim that you’ve done something better. We also have to realize what the pitfalls are. We need a lot more negative results out there, which computations can help with, which is great.
Bryce Meredig: Yup, that’s right.
Kristin Persson: So I think we’re gonna sort of weather the wild west of everybody doing all kinds of crazy stuff and hopefully move towards something where we have more understanding of what goes into machine learning, or what needs to be there in order to be robust and not just predict something that is either incredibly anticipated or that you could have done in another way that may have made more sense. I think we also have to publish them a lot more. Sometimes it’s very difficult in papers to understand what people actually did, and if they don’t publish the code, it’s impossible to evaluate. So I think a little bit more rigor to go hand in hand with the wild west that we’re having would be good and desirable. But I think it’s really exciting and it’s really cool that we actually have enough data now that we can start looking at trends in the aggregate and trade-offs, and maybe also discover physical relationships that we weren’t able to before we had the data to support it.
Bryce Meredig: Well, on that note, I wanted also to ask for listeners who are interested in finding out more about your work, what’s the best way to do that or to get in touch with you?
Kristin Persson: Email for sure, I’m incredibly delocalized. I have two offices, one at UC Berkeley and one at LBNL. I run up and down all the time and then I travel and I teach so email is pretty much the only constant that I look at almost every single, well I look at every day. Of course I do.
Bryce Meredig: Great, well thank you so much for taking the time to join us on DataLab.
Kristin Persson: It was my pleasure Bryce. Thanks.
Bryce Meredig: Thanks for listening. Please subscribe and rate our podcast on iTunes, Stitcher, or wherever you listen to podcasts. Listen to past episodes, learn more about guests, and submit questions and guest suggestions at citrine.io/podcasts.