Free Energy Principle

Neuroscientist Karl Friston on the Markov blanket, Bayesian model evidence, and different global brain theories

videos | December 15, 2016

This video is part of the project British Scientists, produced in collaboration between Serious Science and the British Council.

The free energy principle originally emerged from systems neuroscience as a principled way of understanding what the brain does and how it does it. Subsequently, the principle proved to be so simple and powerful that it has been applied in a variety of contexts. So one could almost regard the free energy principle as an organizing principle for any system that shows the characteristics of life.

The reason I start like that is that there are two roads to explaining or understanding the free energy principle. You can start from the perspective of people like Helmholtz in the 19th century, trying to understand unconscious inference in the brain, and build a story through analysis by synthesis in psychology, through to current and exciting developments in machine learning, things like Geoffrey Hinton’s Helmholtz machine, and then how that has become contextualized in the enactivist or embodied cognition context. Generalizing these notions, you end up with the free energy principle. Or you can start from the top and just ask very simple questions about what it is to be alive. If you are alive and you exist, what sorts of behaviors must you show? And in fact, if you answer those questions, you end up with exactly the same answers that you would have gotten had you followed the historical route.

For brevity, I’ll take the high road. I’ll go from the minimalist assumption that things exist and then try to unpack that and show how one can get to notions of the brain as an inference engine, sometimes called the Bayesian brain hypothesis. The brain is one of the best examples of an organ that is actively constructing explanations through its own sampling of the world. This enactive perspective is very important because not only does the brain have to explain all the sensory input, but it also has to choose which sensory input to sample. It is in charge of gathering information and evidence for its own predictions and its own beliefs about the world. But I’ve jumped ahead, so now I have to explain to you why it is that any system that exists will behave as if it has a model of the world and is trying to gather evidence for its own model of the world.

So, the story starts just by acknowledging that if you want to talk about something, there has to be a separation between the thing you are talking about and everything else. If there were no boundaries, there would be nothing, because there would be no distinction between the thing and not that thing. Statistically speaking, that distinction or boundary is called a Markov blanket. It’s just a mathematical way of separating the states of some abstract system (organism, culture, life, cell, brain) into states that are internal to the boundary, which are owned by that system, and states that are outside the boundary, which are external to the system. So, it could be a cell and its milieu; it could be a phenotype; it could be me and my environment. At any scale, there has to be this division. Now, the very existence of that separation, that Markov blanket, in conjunction with the assumption that the system exists over time, tells you something quite profound about the behavior of the internal states and the states that constitute the Markov blanket.

This is a bit abstract, but it is actually quite simple. The Markov blanket has two bits to it. There are the sensory states, which are defined by the fact that they don’t influence the external states but do influence the internal states. So sensory information, for example, would be mediated by sensory states as it gets from the outside world into my internal world, my brain. And there are active states that go in the other direction. They influence external states but are not influenced by external states; they depend upon the internal states. If I take myself as a model of my world, my active states would be how I am currently moving, whereas my sensory states would be the activities of my photoreceptors, all the sensory organs, and the sensory epithelia I have at my disposal.
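The partition just described can be written down as a small dependency graph. The following is only a minimal sketch under stated assumptions: the four state names and the `INFLUENCES` table are hypothetical labels chosen for illustration, and the check captures just the one property the talk relies on, namely that internal and external states never influence each other directly.

```python
# Hypothetical four-way partition of states; names are illustrative only.
# Each key lists the kinds of state it directly influences.
INFLUENCES = {
    "external": {"sensory"},   # the world drives the sensory states
    "sensory": {"internal"},   # sensory states drive the internal states
    "internal": {"active"},    # internal states drive the active states
    "active": {"external"},    # active states drive the world
}


def is_markov_blanket(influences):
    """Return True if internal and external states never influence each other directly."""
    no_direct_in = "internal" not in influences["external"]
    no_direct_out = "external" not in influences["internal"]
    return no_direct_in and no_direct_out


def blanket_states(influences):
    """Return the states that sit between the internal and the external states."""
    return sorted(
        name
        for name, targets in influences.items()
        if name not in ("internal", "external")
        and targets & {"internal", "external"}
    )
```

With this table, `is_markov_blanket(INFLUENCES)` holds and `blanket_states(INFLUENCES)` returns the active and sensory states. Adding a direct edge from external to internal states would break the separation.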

Let’s put that Markov blanket aside for one moment and just think about what it means for a system to exist over periods of time. What it means is that the system is effectively resisting dispersion by random fluctuations. Perhaps the simplest example is this: if I placed a drop of ink in a cup of water, then almost immediately it would start to disperse as random fluctuations scatter all the molecules around. I would not call that drop of ink a living drop of ink, because it has dispersed. If, however, I placed a drop of ink in some water and then, to your amazement, you saw it gather itself up, then relax a bit, and gather itself up again, as if it were breathing or time were reversed, you would say there’s something very peculiar about that drop of ink. It’s almost as if it were living, and you would quickly become convinced it was alive. The only reason you would endow it with the property of self-organized life, biotic self-organization, is that it’s not dispersing. And the only reason it’s not dispersing is that all of its internal states, and the Markov blanket that separates it from the rest of the water, are moving toward the center of the drop. The flow of the molecules of the system is exactly countering the random forces that are trying to disperse it throughout the water. Now that flow can, mathematically, be shown to be simply moving uphill on the probability distribution of where the ink molecules should be. And that probability distribution is mathematically the same as something called Bayesian model evidence.
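The ink-drop argument can be illustrated with a toy simulation. This is a sketch under loud assumptions, not the derivation referred to in the talk: the world is one-dimensional, the density "where the ink molecules should be" is taken to be a Gaussian centred on the origin, and the function name `simulate` and all parameter values are hypothetical.

```python
import random
import statistics


def simulate(n_particles=1000, n_steps=400, dt=0.01, noise=1.0, pull=0.0, seed=0):
    """Euler-Maruyama simulation of 1-D particles that all start at the origin.

    Every step adds a random fluctuation plus a flow of -pull * x. That flow
    equals noise times the gradient of the log of a Gaussian density with
    variance noise / pull, so it moves each particle uphill on that density.
    Returns the variance of the final particle positions.
    """
    rng = random.Random(seed)
    step_sd = (2.0 * noise * dt) ** 0.5  # size of one random kick
    positions = [0.0] * n_particles
    for _ in range(n_steps):
        positions = [
            x - pull * x * dt + rng.gauss(0.0, step_sd) for x in positions
        ]
    return statistics.pvariance(positions)


dispersing = simulate(pull=0.0)  # ordinary ink: no countering flow
persisting = simulate(pull=5.0)  # "living" ink: flow counters the fluctuations
```

Without the flow, the variance of the drop keeps growing with time. With the flow, it settles near `noise / pull`, so the drop holds together even though the same random fluctuations act on it.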

I don’t have time to go into it, but it is a beautiful observation that the defining dynamic of any system that does not dissipate over time is that, on average, its states will flow so as to maximize Bayesian model evidence. That means that if a system exists, it will appear to maximize Bayesian model evidence; it will appear to be a little Bayesian engine. It will appear as if it has a model of its world. Why? Well, let’s now go back to the Markov blanket, which comprises the active and sensory states, and the internal states that are encompassed by the Markov blanket. The rule which says that all of the states must maximize model evidence, also known as marginal likelihood, can be restated in terms of free energy: the negative logarithm of the evidence is upper bounded by free energy, hence the free energy principle. All of those states have to maximize marginal likelihood or minimize free energy, including action. That means action, sensation, and the internal states are all doing the same thing. Which means that we can understand the internal states, say of the brain, as modeling the world, because they are maximizing the Bayesian model evidence for a model of the world, or me. At the same time, my action is also trying to maximize the evidence for my model of the world. So, put very simply, almost by definition, I am in the game of garnering information that maximizes the evidence for my own existence, and that’s basically the free energy principle. It’s a corollary of being a system that doesn’t dissipate: such a system has to behave as if it is actively soliciting information from the environment and modeling that information, as a model of the environment, to maximize the evidence for its own existence. And that’s where we started, with the long history running from Helmholtz’s notion of unconscious inference right through to modern-day machine learning formulations, for example, the Helmholtz machine of Geoffrey Hinton and Peter Dayan.
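The bound mentioned here can be written out using the standard variational identity. The notation is an assumption added for illustration and does not appear in the talk: o stands for sensory observations, s for the external states that cause them, m for the model, and q(s) for the probability density encoded by the internal states.

```latex
F[q, o]
  = \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s \mid m)\big]
  = D_{\mathrm{KL}}\big[\, q(s) \,\|\, p(s \mid o, m) \,\big] - \ln p(o \mid m)
  \;\geq\; -\ln p(o \mid m)
```

Because the Kullback-Leibler divergence is never negative, the free energy F is an upper bound on the negative log evidence. Minimizing F therefore both pushes up a lower bound on the log evidence and pulls q(s) toward the posterior over external states.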

That can be unpacked at many different levels, and it has provided a very useful framework within which to understand how the free energy principle is complied with by the biology, the anatomy, and the physiology of the brain. What it tells you is that the anatomy of any system has to contain within it a model of the environment in which that system is immersed. This means that if we live in a world that has some deep hierarchical structure, in which there is action at a distance (for example, the color of objects around me is determined by the incident light as it comes almost instantaneously to my eye, or a falling body is caused by gravity), then my brain must recapitulate that causal structure, and of course, it does.

The very fact that we have nerve cells with long, slender connections linking each other at a distance speaks exactly to the causal architecture of the world that we inhabit, which has this action at a distance and this sparse connectivity. Furthermore, the hierarchical structure of the world is recapitulated in the neuronal structures that constitute the hierarchies of the connectome, or the hierarchical disposition of functionally specialized brain areas.

You can go further. If the brain is truly a statistical model of the world it inhabits, can we understand some fundamentals of brain organization, such as the distinction between the what and where streams in the brain? A very powerful observation, a principle of functional specialization, is that a ventral stream of brain areas, roughly down here, is concerned with what, and a more dorsal stream is concerned with where. That may be a simple reflection of the fact that we live in a universe where different things can be in different positions, so that we can statistically separate the “whatness” from the “whereness”. If we lived in a universe where whenever something moved, it also changed its nature, we couldn’t do that. So, just by looking at the brain, I can tell you the sort of universe that you inhabit under the free energy principle, under the assumption that your brain has become a model of the environment that it inhabits.

The free energy principle has been quite useful from my perspective and that of my colleagues, largely because it shows the connections between previous theories. There are many global brain theories that have been brought to bear: for example, the principle of minimum redundancy and maximum efficiency, and notions of the brain extracting as much information as it can from the environment.

There are other theories that speak to how we select and value certain behaviors. It’s useful to see how all of these become special cases of a variational principle, one of which is the free energy principle. This means that you can now talk to different disciplines and see how one particular construct, or one piece of theoretical or empirical evidence, speaks to another theoretical construct, and essentially see how they are approaching the same problem from different perspectives. Because you’ve got a principled framework, it also allows you to make very particular hypotheses about the process theories that would conform to the principle.

So, all I’ve said so far is that, in principle, every internal state, every action that I make, and every sensation that I gather should be in the service of minimizing variational free energy or maximizing the marginal likelihood. How? How does a brain do that? Well, if you know what the objective function is, if you know what the imperatives are, you can then cast it in terms of processes. For example, I can say: “Well, this minimization of variational free energy, or maximization of Bayesian model evidence, is a hill climbing or gradient descent algorithm.” So, I can now write down a differential equation where every neuronal state and physiological variable in the brain becomes describable as a differential equation given other states in the brain. And if that equation is true, then I can go and map the variables to physiological processes.
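As a concrete illustration of casting the principle as a differential equation, here is a minimal sketch for the simplest possible case. It is a toy example under stated assumptions, not the process theory referred to in the talk: one hidden cause with a Gaussian prior, one noisy Gaussian observation, and an internal state `mu` that encodes a point estimate. All function and parameter names are hypothetical.

```python
def free_energy_gradient(mu, obs, prior_mean, obs_var, prior_var):
    """Gradient with respect to mu of the toy free energy
    F(mu) = (obs - mu)**2 / (2 * obs_var) + (mu - prior_mean)**2 / (2 * prior_var).
    """
    return -(obs - mu) / obs_var + (mu - prior_mean) / prior_var


def infer(obs, prior_mean=0.0, obs_var=1.0, prior_var=1.0, dt=0.05, n_steps=2000):
    """Integrate d(mu)/dt = -dF/d(mu) with Euler steps, starting at the prior mean."""
    mu = prior_mean
    for _ in range(n_steps):
        mu -= dt * free_energy_gradient(mu, obs, prior_mean, obs_var, prior_var)
    return mu


def exact_posterior_mean(obs, prior_mean=0.0, obs_var=1.0, prior_var=1.0):
    """Closed-form Bayesian posterior mean for the same Gaussian model."""
    precision = 1.0 / obs_var + 1.0 / prior_var
    return (obs / obs_var + prior_mean / prior_var) / precision
```

In this toy case the state that descends the free energy gradient settles on the same value that exact Bayesian inference gives, so `infer(2.0)` and `exact_posterior_mean(2.0)` agree. The two terms of the gradient are precision-weighted prediction errors, which is the kind of quantity one could then try to map onto physiological variables.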

And if one plays that game, one can go an enormous way in starting to understand not just anatomy but also physiology. One can also generate questions, because there are alternative processes that conform to the same principle. So does the brain use sampling techniques to maximize model evidence, or does it use hill climbing optimization schemes and variational schemes? You start to generate a whole raft of testable hypotheses pertaining to the process theory that are all consistent with the overarching principle.