“My dream is that I will drive myself out of business as a scientist and will get computers to do science by themselves”

Professor Ilya Nemenman on machine learning, the laws of biology, and the quest for a ‘robot-scientist’

- talks | December 7, 2015

Ilya Nemenman is Associate Professor of Physics and Biology and Head of the Theoretical Biophysics Lab at Emory University. Serious Science asked Prof. Nemenman to speak on the use of artificial intelligence tools in biological and physical research.

What is your laboratory working on?

I am a theoretical physicist and I work in biology, so it is a lab on the interface of the two. And specifically what I am interested in are two questions on this interface of biology and physics. One of them is: how is the information processed in biological systems of different scales: from single cells to neural circuits in the brain, and even to big populations of animals. The reason why I am interested in this is that information is a fundamental physical quantity, it’s as physical as energy or mass. The same general physical laws that describe information processing, information transmission, computation and all that in cells are applicable to the brain or to a population of organisms. Thus one can have a general understanding of this phenomenon that is incredibly widespread, incredibly important across the entire living world. For instance, we are talking right now, and we are exchanging information. When you walk, your body collects information about its own position in the world, processes it, and sends this information to the muscles in one way or another to respond. So in many respects life is information processing, and that’s what we study – not a specific biological system, but the general laws of information processing across biology. This is one aspect of what we do.

The other aspect is a bit more esoteric and a bit more, maybe, dreamy. We look for laws of biology. You are familiar, of course, with the fundamental laws of physics: Einstein’s gravity, Schrödinger’s equation and things like that. But, in fact, most physical laws are not fundamental like the ones that I have mentioned. Most of the laws that we study in high school or college would be what we call ‘phenomenological laws’, coarse-grained laws. An example is Hooke’s law – how the force is related to the extension of a spring – or Ohm’s law, or the ideal gas law. These laws are phenomenological equations that describe the relation between multiple different physical observables, and it’s not immediately obvious how they are related to the fundamental laws of quantum mechanics somewhere deep underneath. In biology it is unclear if there are fundamental laws, but we would hope that there are enough of these phenomenological coarse-grained laws. So we are trying to find these laws using multiple different approaches that have been reasonably successful in physics, and which we are now applying to biology.

So that is a rough description of what we do: we study information processing and we try to find phenomenological laws that describe biology, and specifically information processing in biology.

As far as I understood, among the tools that you use in your research are machine learning and artificial intelligence in general. Could you please tell us more about how you use these tools and what they are applied for?

There are many different answers. The first thing that I am sure you realize is that machine learning itself grew originally out of statistical physics – this is where some of the most original and important contributions, like the Hopfield network in computational neuroscience, were developed. We still do some things similar to those – we try to, let’s say, think about the theory of machine learning processes and of learning processes more generally. This is one aspect of our work.

Another aspect is basically being users. We take some of the algorithms that people have developed in the other branches of machine learning and statistical inference more generally, and we apply them to learn something about biology. For example, we looked at sequencing data where a friend of ours has produced hundreds of thousands of sequences of a certain piece of DNA in the bacterium E. coli, which controls the activity of what is known as the lac operon – the machinery that allows E. coli to metabolize the sugar lactose. This was a lot of sequencing data, and one hopes to find interesting correlation patterns in them: if a certain feature is present, then something else happens in these data. For example, maybe if there is a particular nucleotide at a certain position in the sequence, then the cell produces more of the enzyme that metabolizes the sugar. Much of machine learning is about finding such patterns, categorizing such patterns. So we used existing machine learning algorithms to analyze these data and to find interesting patterns in genetic sequences. Another example of using existing machine learning methods is our recent paper, where we used Bayesian learning methods to search for patterns expressed as differential equations, as dynamics. We were trying to find a set of differential equations that could explain the data that were observed, and we used machine learning methods to do the searching for us.

So the first aspect of our use of machine learning is using statistical physics tools to develop new methods, and the second one is being users. Now the third way in which we use machine learning methods is to build models of how the brain works. For example, with our collaborator Sam Sober here at Emory, we study how a songbird – a Bengalese finch – learns its songs. These birds are not born knowing their songs; they learn them from their fathers. And it is a very complicated process: how does the brain learn and update with time which command to send to the muscles to produce the specific song that it wants to achieve? We are trying to model this process just like I would model a machine learning process. This helps us understand what learning-related computations happen inside the brain of this bird when it’s trying to figure out how to move its vocal muscles. In other words, we take the knowledge that people have produced in the field of machine learning and then try to view biological systems as learning systems.

This completes the loop: we develop new methods, we use the methods, and then we take the methods and ask: does it look like biological systems actually use these methods or something similar to them when they learn?

I found that you did some research on teaching neural networks to segment the visual field, if I understand correctly?

Yes, this was a minor project that we did in the lab. Let’s put it in some context. Right now scientists know quite a bit about how the visual system in primates like us works. In fact, the algorithms that Google and Facebook and other big software companies use to tag images, to ask you, “This image has your friend here, is it true? Yes or no? Is his name such and such? Yes or no?” are based on ideas that neuroscientists extracted from the workings of real brains.

The main idea is that one has a layered neural network where the first layer measures pixels, the black-and-white or colored brightness in the outside world. The next layer looks at these pixels and tries to build edges out of them, because we know that the visual world is made of objects, which have edges. And then the next layer starts to look at larger-scale features like junctions between edges or maybe textures or something like that. After a few more layers, the represented features become more and more complex, and eventually, in the last layers, you would have information about the existence of actual objects inside the visual scene. Within this general paradigm, we tried to think about how a neural network would know that two different edges belong to the same object, that they should be part of the same object when it is later asked what that object is.
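To make the first two layers of this picture concrete, here is a minimal sketch in Python. It is an illustration only: the 8×8 test image and the function name `edge_layer` are invented for this example, and the hand-written finite differences stand in for the learned filters that real vision networks use.

```python
import numpy as np

def edge_layer(pixels):
    """Second 'layer': respond to brightness changes between neighboring pixels."""
    # Absolute differences between horizontal and vertical neighbors act as
    # crude edge detectors (an assumption of this sketch, not a learned filter).
    dx = np.abs(np.diff(pixels, axis=1))[:-1, :]   # shape (H-1, W-1)
    dy = np.abs(np.diff(pixels, axis=0))[:, :-1]   # shape (H-1, W-1)
    return np.maximum(dx, dy)

# First 'layer': raw pixel brightness. A bright square on a dark background.
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0

edges = edge_layer(image)
print(edges)
```

The output is nonzero only along the outline of the square: the inside and the background, where brightness is constant, produce no response. Later layers would then have to decide which of these edge responses belong together.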

This question was first addressed in the early 1990s, and Steve Zucker at Yale contributed the most. He said that edges that belong to the same object are not independent of each other because the contours of objects are continuous. While it can be that two edges that belong to the same object do not quite fall on one continuous smooth line, this would not be very common because objects in the real world are smooth more often than not. We should be able to express these ideas mathematically to figure out which edges should be what’s called “bound together” to form a long-range contour.

This broad question was solved, but it’s actually not very often used in modern machine vision algorithms. We had some very specific additional questions in this context. For example, if I were to build a computer vision system based on the idea that if edges change smoothly, then they belong to one large contour, would this system be comparable to a human in deciding whether two edges should or should not be bound together in one large contour? The answer is ‘yes’: these algorithms are comparable to a human, they have about the same accuracy and they need about the same time as a human would need to make a decision whether two edges form a continuous line or not. They make, roughly speaking, the same errors: if a human makes an error, then this algorithm will also make an error. This means that probably a human does a very similar computation to what these algorithms do.

Later on, we did another paper based on the same ideas. We asked, “Can we simplify this model? Can we achieve, say, 90% of what the best model does, but with a model that is only 10% as complex?” It turns out that one can do it, and the model does not need to resolve individual neurons. If one just looks at the average activity of many neurons in a certain area, then that might still be sufficient to perform this type of computation with good accuracy.

Was it implemented only on a computer or did you try to incorporate this into a robot and see how it would behave?

No, we didn’t implement it in a robot – this is for engineers to do. What we usually do in our studies is compare the performance of our model with the performance of a human participant. We give an image to our program and then we show the same image to a human, and we ask questions, like “Does this image have an object in it or does it not? How many objects does it have? Or does this image have more objects than that other image? Would you be able to classify this picture very quickly, maybe in a hundred milliseconds, in terms of its content? For example, is this a picture of the morning rush hour traffic? Or is this a picture of a classroom with students and teachers?” These are questions for both a human subject and a computer code.
We believe that we have understood something about how the human vision works when our computer code performs about the same way as a human performs and makes the same errors. Not just the same number of errors, but the same type of errors. Thus we develop machine vision algorithms that are valuable not for their intrinsic machine vision power, but because they tell us something about how vision works in a human.

Switching to another aspect of your research: you already touched upon it, but could you please tell us more about this idea of a robot scientist and specifically about your recent development in this field?

Let’s start from afar: how does science proceed? My favorite scientist, Richard Feynman, gave a beautiful collection of lectures, the ‘Messenger lectures’ as they were called, at Cornell for a general audience. In the last lecture he talked about how we find new laws of nature. His thought was that we start by guessing the new law: we look at data and guess that maybe there is a certain equation that describes the data. And then we try to use this guess and make predictions. Something like: if this law describes the system and we were to influence the system in a certain way, such as apply a perturbation or knock out a certain gene, then according to the law that we have just guessed the system would respond in a certain way. Then we do an experiment and verify whether our predictions agree with the experiment. If we guessed correctly, if the predictions and the experiments agreed, then we do another experiment. If we guessed incorrectly, if the predictions didn’t agree with the experiment, then we go and start guessing again.

There have been some attempts where people have tried to use the power of modern computers to automate this guessing. Instead of a human staring at data and trying to write down a set of equations that describe them, we would let a computer try to produce a new law. And then we would do experiments and verify these laws. I think some of the more influential papers came from Hod Lipson, who is currently at Columbia. On the experimental side, some of the influential papers came from John Wikswo at Vanderbilt. However, the approach hasn’t been very successful so far. The reason is that the search space in which the computer is supposed to guess a new law is too large; there are very many different equations that one can write down. And if one starts guessing naively (one equation: what does it predict? another equation: what does it predict?), then it will take the age of the universe or more to guess the right laws.

And did you find a solution to this problem?

Yes. We said that maybe the approach of using computers to speed up the guessing part of the discovery of physical laws could be modified a bit. Maybe one does not need to search through the entire huge space of all possible laws, all possible relations between the data. Instead, maybe it’s sufficient to look at relations that are only approximately true. And while the number of all possible relations in the data is astronomically large, a set of very complex mathematical equations may in a certain way be similar to a much simpler set of equations. A set of simple equations that can approximate with high accuracy most relations among data may not be that large. Then if you focus the computer just on such approximate searching, if you do not insist on finding an exact law of nature, but you are quite satisfied with a law that is approximately correct, then this search part becomes manageable.

And how do you make a computer guess the laws?

Well, we don’t actually make it guess. Feynman described it as guessing, but we have found a very deliberate procedure of searching through possible approximate laws. To explain this, let’s first define the problem carefully. We are talking about a problem in which we get a time series to analyze and model. This could be a time series of the temperature outside our window, or it could be a time series of positions of a planet at different moments when it is moving around the Sun, or it could be a time series of how strongly a certain molecular receptor is phosphorylated (chemically modified) as a function of time. Our task is to guess the law describing this time series, which means guessing differential equations whose solutions match the time series.

What are the two problems that would make it hard to guess right? The first may be that the law that describes the system is very complex – it’s not linear, it’s not quadratic, it’s some arbitrary function that explains these dynamics. And there are a lot of arbitrary functions to choose from. We can sort of order them: we can say that we are going to first look at linear functions, then at functions that are quadratic (a parabola), then at functions that have a third-order polynomial in them, and so on. And we know that by adding more of these polynomial terms, powers of the variables that the time series measure, we would be able to approximate functions of more-or-less any complexity.
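The claim that adding polynomial terms approximates ever more complex functions can be checked numerically. The sketch below is an illustration only: the target function sin(3x) and the chosen degrees are arbitrary assumptions made for this example, not something taken from the research discussed here.

```python
import numpy as np
from numpy.polynomial import polynomial as P

x = np.linspace(-1.0, 1.0, 200)
target = np.sin(3.0 * x)  # hypothetical stand-in for an "arbitrary" nonlinear function

def rms_error(degree):
    """Least-squares polynomial fit of the given degree; returns the RMS misfit."""
    coeffs = P.polyfit(x, target, degree)   # coefficients, lowest power first
    approx = P.polyval(x, coeffs)
    return float(np.sqrt(np.mean((approx - target) ** 2)))

# Walk up the ordered list: linear, cubic, fifth order, seventh order.
errors = {degree: rms_error(degree) for degree in (1, 3, 5, 7)}
for degree, err in errors.items():
    print(f"degree {degree}: RMS error {err:.2e}")
```

Each added pair of terms shrinks the misfit, which is what makes the ordering by polynomial degree a sensible way to walk through candidate functions.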

The other problem is that when you want to model a time series – let’s say, the temperature outside of your window – the temperature is not the only thing that matters. What the temperature is here depends on what the temperature was a hundred kilometers away some time in the past, and on the wind that brings the faraway weather here. These are hidden, unmeasured variables: the data you have do not measure everything that needs to be measured to figure out the equations that describe them.
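A toy example shows why an unmeasured variable is a problem. The sketch below uses a harmonic oscillator as an assumed stand-in for real data: only the position `x` is treated as measured, while the velocity `v` plays the role of the hidden variable. The system and all names are chosen purely for illustration.

```python
import numpy as np

t = np.linspace(0.0, 20.0, 2001)
x = np.cos(t)             # the measured variable
v = -np.sin(t)            # the hidden variable (not available in a real data set)
dxdt = np.gradient(x, t)  # rate of change estimated from the measured series

def rms_residual(features):
    """Fit dx/dt as a linear combination of the features plus a constant."""
    A = np.column_stack(features + [np.ones_like(t)])
    coef, *_ = np.linalg.lstsq(A, dxdt, rcond=None)
    return float(np.sqrt(np.mean((A @ coef - dxdt) ** 2)))

err_observed_only = rms_residual([x])   # model: dx/dt = a*x + b
err_with_hidden = rms_residual([x, v])  # model: dx/dt = a*x + c*v + b

print(f"measured variable only: {err_observed_only:.3f}")
print(f"with hidden variable:   {err_with_hidden:.5f}")
```

With the measured variable alone, the same value of `x` occurs with different rates of change, so no equation in `x` alone can fit. Once the hidden variable is included, the fit becomes nearly exact.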

It turns out that a big problem of trying to find equations that describe time series is that one needs to do both of these two things at the same time: to guess or systematically explore all sorts of weird nonlinear complexities in the dynamics, and also to figure out if the data miss some important variables, and, if yes, then how many and which ones. To see how this can be a problem, imagine that you are trying to find something and you are moving along a long corridor with doors leading to rooms from it. The first room is labeled ‘linear functions’, the second is labeled ‘quadratic functions’, the third – ‘cubic functions’. You stick your nose into a door and ask: does linear work? – not quite; does quadratic work? – no, it doesn’t; does cubic work? – yes, it does, and so you stop your search there. But if you need to explore these various complexities while at the same time adding new variables, then you don’t have the one-dimensional corridor along which you move. Instead, you now have to search in a plane: should you add one nonlinearity and two hidden variables, or should you add two hidden variables and seven nonlinearities? – or any other combination of those. This search space of all possibilities becomes too large, and that’s where the guessing comes from – guessing is your best strategy since systematic search in this very large space of all possible equations is not practical.

We managed, roughly speaking, to order the search process of how many nonlinearities one adds for how many hidden variables. We give a certain prescription that allows you to go back to searching a long corridor for a room, but now the rooms have both nonlinear complexities and hidden variables in them. We don’t have to guess anymore and can stick our nose into doors along this one-dimensional corridor and rather easily find approximate equations we are looking for. So in our case the computer actually doesn’t guess, but it methodically searches through possibilities in this ordered set of approximate differential equations, and it’s not a very hard search.
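The one-dimensional "corridor" search can be sketched in a few lines of Python. This is a simplified illustration, not the procedure developed in the lab: the rooms here are ordered by polynomial degree only, with no hidden variables, and the logistic-growth time series, the tolerance, and the name `search_corridor` are all assumptions made for the example.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Synthetic time series: logistic growth, which obeys dx/dt = x - x^2.
t = np.linspace(0.0, 10.0, 1001)
x = 1.0 / (1.0 + 19.0 * np.exp(-t))
dxdt = np.gradient(x, t, edge_order=2)   # derivative estimated from the data

def fit_room(degree):
    """Fit dx/dt as a polynomial in x of the given degree; return misfit and coefficients."""
    coeffs = P.polyfit(x, dxdt, degree)  # lowest power first
    pred = P.polyval(x, coeffs)
    return float(np.sqrt(np.mean((pred - dxdt) ** 2))), coeffs

def search_corridor(max_degree=5, tol=1e-3):
    """Open the doors in order and stop at the first room that fits well enough."""
    for degree in range(1, max_degree + 1):
        err, coeffs = fit_room(degree)
        print(f"degree {degree}: RMS misfit {err:.2e}")
        if err < tol:
            return degree, coeffs
    return None, None

degree, coeffs = search_corridor()
print("selected degree:", degree)
print("coefficients (constant, x, x^2, ...):", np.round(coeffs, 3))
```

The linear room fails and the quadratic room fits, so the search stops there and recovers coefficients close to those of the equation that generated the data. The difficulty described above is keeping the search this simple once hidden variables also have to be added.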

And these equations, do you pick them yourself?

What we pick ourselves is the whole set of all possible differential equations. They have a certain form – we don’t allow every equation to be considered. And that set is our input to the computer system. And the beautiful part is that, as I said before, even with that limited set of equations one can still approximate with very high accuracy very complex time series. So the equations are limited, but they are nonetheless very powerful. And because this set is so limited in its structure, a computer can methodically search through it.

What is the future of the field? What will we be able to achieve with our current technologies or with future technologies?

It’s really hard to say – we all have dreams, and it’s very hard to guess if they will come true. My dream is that I will drive myself out of business as a scientist and will get computers to do science by themselves. I don’t think that this is going to happen any time soon, but this is the dream. More concretely, I can tell you what we are trying to do, and what I think is rather realistic.

Let’s look at type II diabetes. It’s a big problem: a huge fraction of the population has diabetes, and no two cases are exactly the same. There has been a lot of work done that showed that the dynamics of insulin secretion, and before that, the dynamics of calcium activity, which is a precursor of insulin secretion, vary across cells and across individuals. Different cases are described more-or-less by the same equations, but these equations have somewhat different parameters – maybe someone is a bit older, someone has a bit more of certain cells or certain enzymes than somebody else. So time series from different individual cells will be described by equations with the same structure, but with different parameters in that same structure. Thus if one creates a drug or a treatment protocol that is tested on an average human, it’s not going to work well on every individual human. At the same time, if we were to try to infer the model, the mathematical equations and their parameters, that describes an individual human in the traditional paper-and-pencil way, we wouldn’t succeed, since there are more people in the world who have diabetes than there are scientists. Thus one of the things that we are working on, in collaboration with my long-term collaborator and friend Andre Levchenko at Yale, is a new project where we use automatic “robot-scientist” techniques, like the one we have developed, to start with a general, averaged model, and then refine it to derive equations that describe individual subjects. We would then hope to make and test predictions about how a specific subject is going to respond to a specific treatment. And I think that this is feasible.
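The idea of "same structure, different parameters" can be illustrated with a deliberately simple sketch. Everything in it is invented for the example: the exponential-decay model dx/dt = -k·x, the three subject labels, and their rate constants are noise-free synthetic placeholders and have nothing to do with real insulin or calcium data.

```python
import numpy as np

t = np.linspace(0.0, 5.0, 50)

# Invented per-subject rate constants; every subject obeys dx/dt = -k * x.
true_rates = {"subject_a": 0.4, "subject_b": 0.9, "subject_c": 1.5}

def simulate(k, x0=10.0):
    """Noise-free solution x(t) = x0 * exp(-k t) of the shared model."""
    return x0 * np.exp(-k * t)

def fit_rate(series):
    """log x = log x0 - k t, so the slope of a straight-line fit is -k."""
    slope, _intercept = np.polyfit(t, np.log(series), 1)
    return float(-slope)

fitted = {name: fit_rate(simulate(k)) for name, k in true_rates.items()}
average_k = float(np.mean(list(fitted.values())))

for name, k in fitted.items():
    print(f"{name}: fitted k = {k:.3f}")
print(f"average-subject k = {average_k:.3f}")
```

The equation has the same form for every subject, but the averaged rate matches only one of the three reasonably well. Refining the shared model for each individual is what recovers the right parameter in every case.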

We may not be able to use automated modeling ideas to guess totally new laws of nature of the same fundamental importance as Newton’s laws or Einstein’s gravity or something like that. But I think what we will be able to do is to take a general paradigm, a general mathematical relation among physiological quantities in a patient, and then to refine the general model to the specific patient, to understand why this patient responds to the drug and why the other one doesn’t, and how we should change the treatment protocol to elicit the response we want. In other words, I think these methods may help in personalized medicine. This is one of the directions in which we are pushing these ideas.
