John Kitchin: Using Machine Learning to Improve Molecular Simulations


My students and I are working on modeling
materials using molecular simulations. And the kind of simulation we use is based
on quantum chemistry, and these calculations are very accurate, but they’re very expensive. Some calculations may take weeks to finish,
and that limits the number of atoms in a system that we could try to simulate. So, practically, that means we can only look
at simulations with, say, a few hundred atoms for a few femtoseconds (a femtosecond is 10 to the minus 15
seconds), so very short times using these methods. So what we’re trying to do is use machine
learning methods that we can build from the quantum chemical simulations that will allow
us to run much faster calculations on bigger systems to model more complex materials for
longer times so that we can design better materials for engineering applications. So our work will mostly have an impact on how
we can design new materials or understand limitations of existing materials. Being able to do that requires us to model
more complicated materials over longer time scales, and that’s not possible with existing
methods today with the right level of accuracy. And our methods will allow us to do that for
the time scales that are required and with the accuracy that’s required to design new materials and
understand how to make better ones for the future. One of the biggest challenges in machine learning
is getting enough data for the machines to learn from. People don’t think about this very often,
but if you think about your education, it took you 12 years to become, you know,
a high school graduate, another four years of college, and if you went to graduate school,
typically another four years beyond that. So you’re looking at a 20-year education of
a person. Today, we’re very impatient and think we should
be able to train a computer in a couple of days. But we need to find a way to gather, effectively,
20 years of data to train a machine on, and once we do that, then we’ll be able to have
them do very sophisticated things. That’s one of the biggest challenges: how
do we find the data that’s required to train a machine to answer these questions? And the reason we’re so excited about
molecular simulation is that once we know what data we need, we can actually calculate
this data using computational simulations that are based on physics. So, we’ll see an ongoing effect of our work
over the next couple of decades, I think. We already can do things now in a month that
used to take us a year, and that’s an enormous acceleration, and we will see continued acceleration
of what we are able to do. What we’ll see in the future is computers
that have been trained to solve engineering problems that are very complex, and that will
relieve us of having to do that by hand ourselves. So, in our view, these are not things that
would be used to replace scientists and engineers, but augment them and make them much more powerful
than they were in the beginning. So, I think we’re seeing a totally new age
of science and engineering where machine learning and computing are able to augment our intellectual
abilities much in the same way the steam engine augmented our mechanical abilities at the
beginning of the century. That led to a huge industrial revolution,
and today we have computing and machine learning that will lead to a huge intellectual revolution
in science and engineering.

2 Replies to “John Kitchin: Using Machine Learning to Improve Molecular Simulations”

  1. Hello,

    What specific method of machine learning have you used? Deep learning with feed-forward networks? And how have you dealt with the fact that you might need many inputs, depending on how many atoms or molecules you want to calculate?

  2. "20 years of education vs 2 days for a computer" makes a good sound bite. First off, not all 20 years are spent learning; subtract sleeping, eating, and shitting, and you are left with 10 years at most. Second, the computer is just trained to optimize one specific set of functions.
