The Human Visual Search Engine

Psychologist Jeremy M. Wolfe on the searches we perform every day, airport screening, and search strategies

videos | February 5, 2015

What types of searches are humans capable of? How can society make search tasks more difficult? Professor of Ophthalmology and Radiology at Harvard Medical School Jeremy M. Wolfe explains how we go about our searches.

You are searching all of the time. You spend lots of time searching for your socks, the subway stop, or any number of things that you are looking for in the visual world. How do you do that?

Well, suppose you are looking for your cat. If your cat is sitting in the middle of a big white carpet, this is terribly easy. You immediately know where that cat is. If you were looking for your cat, on the other hand, amongst many other cats it would be much more difficult. What’s the difference between those two kinds of searches?

Anne Treisman, back in the 1980s, suggested that there are really two different sorts of visual search, which she called parallel searches and serial searches. In a parallel search, it doesn’t matter what else is in the scene you are looking at: your attention is immediately grabbed by the target you are looking for. The cat in the middle of the white carpet is one example, but you wouldn’t use cats in a laboratory, so the lab version might be a search for a red dot in a field full of green dots. You can imagine that it would not matter how many green dots there were. The red dot would simply be immediately visible to you. That’s what Treisman called a parallel search.

On the other hand, there are other situations. Let’s imagine looking for one letter in a screen full of other letters. Maybe you are looking for the letter T among a bunch of Ls or other random letters. Now the T, the target letter, won’t just jump out at you. You’ll have to search from letter to letter until you find it. The more items there are on the screen, the longer it will take you to find the T. The data show that the time it takes to find the letter you are looking for increases linearly with the number of letters on the screen.
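The contrast between the two kinds of search can be sketched in a few lines of Python. This is only a toy illustration, not a model from the talk: the function names, the 400 ms baseline, and the 40 ms per inspected item are all assumed values chosen for the example. The serial search checks letters in random order and stops at the T, so on average it inspects about half the display, and its mean time grows linearly with the number of letters. The parallel search is modeled as a constant.

```python
import random


def serial_search_steps(set_size, rng):
    """Count letters inspected, in random order, until the single T is found."""
    items = ["T"] + ["L"] * (set_size - 1)
    rng.shuffle(items)
    for steps, item in enumerate(items, start=1):
        if item == "T":
            return steps
    raise ValueError("display contains no target")


def mean_serial_time_ms(set_size, ms_per_item=40.0, base_ms=400.0,
                        trials=2000, seed=0):
    """Average simulated search time; all timing constants are illustrative."""
    rng = random.Random(seed)
    total_steps = sum(serial_search_steps(set_size, rng) for _ in range(trials))
    return base_ms + ms_per_item * total_steps / trials


def parallel_time_ms(set_size, base_ms=400.0):
    """Pop-out search: the assumed time does not depend on set size."""
    return base_ms


for n in (10, 20, 40):
    print(n, round(mean_serial_time_ms(n)), parallel_time_ms(n))
```

Doubling the number of letters roughly doubles the set-size-dependent part of the serial time, while the parallel time stays flat.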

Treisman had this idea that there were two kinds of searches: parallel and serial. Now imagine a slightly different situation. Suppose you are looking for a red T, and the other letters on the screen are red and green. Now you can imagine that you’ll still have to search, but you won’t search through all the letters. If it’s a red T you want, you will search specifically through the red letters on the screen. You won’t search through the green letters at all. When your attention is restricted to some subset of the items on the screen, that’s what you would call a guided search, or at least that’s what I called it when I began to develop the guided search model back in the late 1980s.
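The idea of restricting attention to a subset can be sketched as a filter applied before a serial scan. The sketch below is a deliberate simplification and not the guided search model itself: it assumes guidance by color is perfect, and the names Item, make_display, and items_inspected are hypothetical, introduced only for this example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    letter: str
    color: str


def make_display(n_items, rng):
    """One red T among red and green L distractors, in random order."""
    items = [Item("T", "red")]
    for _ in range(n_items - 1):
        items.append(Item("L", rng.choice(["red", "green"])))
    rng.shuffle(items)
    return items


def items_inspected(display, target, guide_by_color):
    """Scan candidates one by one and count how many are inspected.

    With guidance on, only items sharing the target's color are candidates
    (assumed perfect guidance; real guidance is noisier).
    """
    candidates = [it for it in display
                  if not guide_by_color or it.color == target.color]
    for count, item in enumerate(candidates, start=1):
        if item == target:
            return count
    return len(candidates)


rng = random.Random(1)
display = make_display(40, rng)
target = Item("T", "red")
print("unguided:", items_inspected(display, target, guide_by_color=False))
print("guided:  ", items_inspected(display, target, guide_by_color=True))
```

With guidance switched on, the green letters never enter the scan, so the count of inspected items can only stay the same or go down.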

Those searches go pretty fast. Remember, we were talking about looking for a T among other letters. If those letters are big enough that you don’t need to fixate on each one in turn, then you can process about 20 to 30 letters every second. If you have to move your eyes to look at each letter, then you can only do 3 or 4 a second. So with big items you can really move pretty quickly in your searching.
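To get a feel for what those rates mean, here is a small worked calculation. The display of 60 letters is a made-up example, and the calculation assumes every letter is inspected once, which is a worst case rather than a typical search.

```python
def search_time_range_s(n_items, rate_low, rate_high):
    """Return (fastest, slowest) time in seconds to inspect n_items once each."""
    return n_items / rate_high, n_items / rate_low


# Hypothetical display of 60 letters.
without_eye_movements = search_time_range_s(60, 20, 30)  # (2.0, 3.0) seconds
with_eye_movements = search_time_range_s(60, 3, 4)       # (15.0, 20.0) seconds

print(without_eye_movements, with_eye_movements)
```

Under these assumptions, having to fixate each letter makes the same search several times slower.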

This human search engine has parallel search tasks and serial search tasks, but they lie on a continuum, with guided search tasks in the middle. There turns out to be only a very limited set of things that will guide your attention in a guided search. Think about Google. If you do a search in Google, you can type anything you want into that search box, and Google will do something with it. That is not true of the human search engine. For the human search engine, there are about one to two dozen fundamental features that will guide your attention around the field. Those include things like color and orientation, so you can easily find the vertical thing among horizontal things. You can find the big thing among small things. Then there are a bunch of things that are more subtle than that, like lighting direction. If you are looking for the one object that is lit from below among items that are lit from above, that turns out to be easy.

There are lots of things that don’t work. Suppose you’re looking for one particular face: you have a crowd of faces and you’re looking for your mother. You might think, “My attention will be guided to my mother the way it would be guided to that cat in the middle of a white carpet.” This is just not true. It turns out you have to search from face to face in order to decide which face belongs to your mother.

Now, that might be a guided search, because if your mother has white hair, you won’t spend a great deal of time searching through faces that are topped by black hair. But you won’t be able to look at a crowd of people and immediately know that your mother is there. The search vocabulary is limited to these few basic attributes, and even within those attributes, your search is limited again by what you can say about each one of these features.

I said you could easily find the one big item, and you can easily find the one small item. It turns out you can’t easily find the one medium-sized item. You can tell that an item is medium-sized once you pay attention to it, but to find medium you’ll have to search around. Similarly, you can find the red item or the green item, but you can’t say, “I’d like to find exactly a 602-nanometer red light among these other red lights.” Even though perceptually you could tell the difference, it will not guide your attention.

You’re also limited, it turns out, probably to just one attribute per object. So you can look for the object that’s red, but it turns out to be much more difficult to look for the item that is red and green if there are other items in the world that are red or green. What’s kind of amazing is that even though you have all these very severe limits on what you can search for, under normal circumstances out in the world most of your searches are very easy. This very limited vocabulary is enough to help you find the cat or find your mother. The cases where it’s not easy are sufficiently notable that we have expressions for them, like finding a needle in a haystack, which isn’t going to be a very guided search, and that is the sort of thing that’s going to be difficult. Most of the time, search tasks are quite easy.

Now it gets more difficult when our civilization invents search tasks that we were not really built to do. For example, when you go to the airport, the person at the checkpoint is looking for guns, bombs, and knives in your carry-on luggage. This is a very difficult search task. Or a radiologist is looking for cancer in an X-ray of a breast or of the lungs. These are difficult searches. People become experts at doing these tasks, but interestingly they become experts at using the same search engine that everybody else does. If you go to medical school and become a radiologist who is marvelous at finding lung cancer, you have not grown a new search engine. You have simply learned how to use the basic human search engine in an effective way for that task.

So, even when someone has become an expert, let’s say as a radiologist or airport screener, the experts are still not performing at the rate that we would like them to perform. For example, in North America, a really good radiologist screening for breast cancer will miss about 20% to 30% of the cancers that are presented to her. Obviously we would like to miss far fewer of them. One of the ways to try to do that is to get the computer and the radiologist to collaborate. Computers can actually be trained to do rather effective visual search too, but it’s far from perfect. A good computer is about as good as a good radiologist. You would think that a good computer plus a good radiologist would be better than either of them, but so far that promise has not really been delivered. Computers plus radiologists, computers plus airport screeners, any of these situations where an imperfect computer and an imperfect human are working together: it turns out it’s a little better if they work together, but it’s not as good as it should be.

We think that many of the reasons for this have to do with human psychology, not with the computer. The problems have to do with the willingness of the radiologist or the airport screener to trust the information the computer is giving them. An interesting area for future work will be to see how we can get computer-aided detection systems to work more effectively with the experts we have doing these important search tasks.

Professor of Ophthalmology and Radiology at Harvard Medical School, head of Brigham and Women's Hospital Visual Attention Lab