We see with our brains. Then we check with our eyes.
Our retina takes in light, varying by brightness and color. It transmits information along the optic nerve to the primary visual cortex. There, specialized cells activate on outlines and contours in various orientations (horizontal, vertical, oblique). This part of the brain separates objects from backgrounds.
Along the pathway from there to the inferior temporal cortex, face-contours go one way, object-contours go another. Here and in higher-level processing, meaning and categories are assigned to images. Then we perceive.
All of this is affected by memories of things we’ve seen before. Visible edges are supplemented by inferred ones. Depth is judged by remembered sizes, among other clues; binocular vision is only useful close-up. What we think we’re looking at determines where our eyes move in their saccades, and this determines what we get a clear view of. Vision depends on context and history.
This highly inexpert summary comes from listening to The Age of Insight, by Eric Kandel, neuroscientist. (Audible does not provide a PDF of diagrams, grr.)
Andy Clark goes further in Surfing Uncertainty. At every level, from the retinal nerve cells on up, signals from the outside are compared to expectations. Only surprises are transmitted up the hierarchy. Our vision starts with guesses, which are broken down into what we expect to see at smaller and smaller scales, and at each scale these guesses are tested against the incoming light signals.
This makes sense to me. When I hear stuff like “the retinal ganglion cells get the light signals and assemble them into colors and position, and then the primary visual cortex deduces edges and contours, and then the inferior temporal cortex recognizes objects and faces” I think: gah, that sounds like so much work.
Why would we do that work? I know very well that I see a sky and trees and billboards and road. Why would I ask my eyes to process the incoming data? If my retinal cells don’t see blue (or gray or white) in the top part of the visual field, then I want to notice that. Otherwise, geez, take a breather. Read the billboards, they’re all different.
One day while carpooling to work, in the passenger seat, I played a game. I looked out the window and tried to see what was there. Not what my brain is trained to see, the buildings and billboards placed there by humans for humans to look at. I noticed some wild growth, some derelict corners and alleys, and many cell phone towers. Each time, I tried not to judge (categorize, evaluate) what I saw, but keep seeing.
It was exhausting! By the time I got to work, my brain was done. I didn’t get any useful code written that day. This is not what my eyes are doing most of the time.
In video transmission, we send deltas, not pixels. And we can use all kinds of protocols to describe common deltas, expected changes, to reduce bandwidth use. Our brains do that, too.
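Here is a toy sketch of the deltas idea in Python. It assumes a "frame" is just a list of labels, and the function names encode_delta and apply_delta are made up for illustration. Real video codecs do far more than this (motion prediction, compression of the differences themselves), so treat it as a picture of the principle, not of any actual protocol.

```python
def encode_delta(previous, current):
    """Return only the positions that changed, as {index: new_value}.

    Assumes both frames have the same length.
    """
    return {
        i: new
        for i, (old, new) in enumerate(zip(previous, current))
        if old != new
    }


def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame plus the changes."""
    frame = list(previous)
    for i, value in delta.items():
        frame[i] = value
    return frame


frame1 = ["sky", "sky", "tree", "road"]
frame2 = ["sky", "bird", "tree", "road"]

delta = encode_delta(frame1, frame2)   # only the bird is sent
rebuilt = apply_delta(frame1, delta)   # the receiver fills in the rest
```

Four values describe the frame, but only one crosses the wire. The receiver already knows the rest.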
The hierarchy of vision communicates in both directions. Expectations down, surprises up. At every level, an interplay between meaning and incoming signals. Hypothesis, test. Result, new hypothesis, test. It’s a duck, OK yeah. It’s a rabbit, OK yeah.
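That loop can be sketched in code, very loosely. Everything here is an assumption for illustration: the two hypotheses, the three regions of the visual field, and the rule for switching guesses. It is not a model from Kandel or Clark, just the shape of "expectations down, surprises up."

```python
# Hypothetical hypotheses: what each guess expects in each region.
HYPOTHESES = {
    "open road": {"top": "blue", "middle": "trees", "bottom": "road"},
    "tunnel": {"top": "dark", "middle": "dark", "bottom": "road"},
}


def surprises(expected, observed):
    """Lower level: pass up only the regions that differ from the expectation."""
    return {
        region: seen
        for region, seen in observed.items()
        if expected.get(region) != seen
    }


def perceive(observed, guess):
    """Higher level: test the current guess against the signal.

    No surprises: keep the guess. Otherwise switch to whichever
    hypothesis is least surprised, and return what still doesn't fit.
    """
    errors = surprises(HYPOTHESES[guess], observed)
    if not errors:
        return guess, errors
    best = min(
        HYPOTHESES,
        key=lambda name: len(surprises(HYPOTHESES[name], observed)),
    )
    return best, surprises(HYPOTHESES[best], observed)


as_expected = {"top": "blue", "middle": "trees", "bottom": "road"}
gray_sky = {"top": "gray", "middle": "trees", "bottom": "road"}
into_tunnel = {"top": "dark", "middle": "dark", "bottom": "road"}

calm = perceive(as_expected, "open road")      # nothing to report
noticed = perceive(gray_sky, "open road")      # same guess, one surprise
switched = perceive(into_tunnel, "open road")  # new hypothesis fits
```

When the scene matches the guess, nothing travels upward. A gray sky keeps the guess but reports the one region that doesn't fit. A dark top and middle flip the hypothesis entirely.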
Thinking about vision this way gives me new appreciation for how our past experience changes what we see. It also gives me new ways of thinking about hierarchies: the helpful ones pass information in both directions.
We see with our brains and our eyes and many nerve cells in between, working together in both directions. I wonder if we can work this well together in our organizations.