Robot brains? Can't make 'em, can't sell 'em
Why dopes still beat boffins
At every level, even specialists lack conceptual clarity. Let's look at a few examples taken from current academic debates.
We lack a common mathematical language for generic sensory input - tactile, video, rangefinder - which could represent any kind of signal or mixed-up combination of signals. Vectors? Correlations? Templates?
Imagine this example. If one were to plot every picture from a live video-feed as a single "point" in a high-dimensional space, a day's worth of images would be like a galaxy of stars. But what shape would that galaxy have: a blob, a disk, a set of blobs, several parallel threads, donuts or pretzels? At this point, scientists don't even know the structure of real-world data, much less the best ways to infer those structures from incomplete inputs, or to represent them compactly.
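To make the galaxy metaphor concrete, here is a minimal Python sketch - the camera resolution and frame counts are invented for illustration - showing how each frame of video becomes one "star" in a space with thousands of dimensions:

import numpy as np

# Hypothetical camera: 120x160 greyscale at 10 frames per second.
HEIGHT, WIDTH, FPS = 120, 160, 10
frames_per_day = FPS * 60 * 60 * 24        # roughly 864,000 "stars" per day

def frame_to_point(frame):
    # Flatten one image into a single point in a 19,200-dimensional space.
    return frame.reshape(-1).astype(np.float64)

# Stand-in for a real feed: a small batch of random frames.
video = (np.random.rand(1000, HEIGHT, WIDTH) * 255).astype(np.uint8)
galaxy = np.stack([frame_to_point(f) for f in video])
print(frames_per_day, galaxy.shape)        # 864000 (1000, 19200)

Plotting that 19,200-dimensional cloud, let alone naming its shape, is exactly what nobody knows how to do in general.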
And once we do know what kind of galaxies we're looking for, how should we measure the similarity or difference between two example signals, or two patterns? Is this "metric" squared-error, bit-wise, or probabilistic?
Well, in real galaxies, you measure the distance between stars by the usual Pythagorean formula. But in comparing binary numbers, one typically counts the number of differing bits (which is like leaving out Pythagoras' square root). If the stars represented probabilities, the comparisons would involve division rather than subtraction, and would probably contain logarithms. Choose the wrong formula, and the algorithm will learn useless features of the input noise, or will be unable to detect the right patterns.
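For the curious, here is a minimal sketch of those three kinds of comparison; the signals are invented, and the "probabilistic" measure shown is the Kullback-Leibler divergence, one common choice among several:

import numpy as np

def squared_error(a, b):
    # The "Pythagorean" comparison, with the square root left out.
    return np.sum((a - b) ** 2)

def hamming(a, b):
    # The bit-wise comparison: count the bits that differ.
    return np.sum(a != b)

def kl_divergence(p, q):
    # A probabilistic comparison: division and logarithms, not subtraction.
    return np.sum(p * np.log(p / q))

p = np.array([0.2, 0.5, 0.3])              # two nearby probability patterns
q = np.array([0.25, 0.45, 0.3])
bits_a = np.array([1, 0, 1, 1, 0])
bits_b = np.array([1, 1, 1, 0, 0])

print(squared_error(p, q))                 # 0.005
print(hamming(bits_a, bits_b))             # 2
print(kl_divergence(p, q))                 # about 0.008

Each formula ranks "similar" signals differently, which is why picking the wrong one can bury the very pattern you wanted.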
There's more: the stars in our video-feed galaxy are strung together in time, in sequence, like pearls on a string. But we don't know what kind of (generic) patterns to look for among those stars - linear correlations, data-point clusters, discrete sequences, trends?
Perhaps every time one image ("star") appears, a specific different one follows, like a black car moving from left to right in a picture. Or maybe one of two different ones follows, as if the car might be moving right or left. But if the car were a different colour, or smaller (two very different images!), would we still be able to use what we learned about large black moving cars? Or would we need to learn the laws of motion afresh for every possible set of pixels?
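One crude way to chase the "this image follows that one" kind of pattern is simply to count transitions between labelled images - a first-order Markov model, sketched below. Note the cheat: the labels are invented by hand, and deciding how raw pixels should be turned into such labels is precisely the unsolved part.

from collections import Counter, defaultdict

# Hypothetical labels standing in for whole images ("stars").
sequence = ["car_left", "car_centre", "car_right",
            "car_left", "car_centre", "car_right",
            "car_left", "car_centre", "car_left"]

# Count which image tends to follow which.
transitions = defaultdict(Counter)
for current, nxt in zip(sequence, sequence[1:]):
    transitions[current][nxt] += 1

# Predict the most frequent successor of each image.
for label, followers in transitions.items():
    prediction, count = followers.most_common(1)[0]
    print(f"after {label}: expect {prediction} ({count}/{sum(followers.values())})")

And because the model only knows whole labelled images, a smaller or differently coloured car would be a brand-new label, and everything "learned" about the black one would be useless.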
The problems don't end there. We don't know how to learn from mistakes in pattern-detection, to incorporate errors on-the-fly. Nor do we know how to assemble small pattern-detection modules into usefully large systems. Then there's the question of how to construct or evaluate plans of action or even simple combinations of movements for the robot.
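For the learning-from-mistakes part, the textbook answer in the very simplest linear case is an error-driven update such as the least-mean-squares rule sketched here, with invented numbers throughout; what nobody knows is how to make this kind of on-the-fly correction work for the rich, structured patterns described above.

import numpy as np

rng = np.random.default_rng(0)
true_weights = np.array([0.5, -1.0, 2.0])   # an invented "world" the learner must track
weights = np.zeros(3)
learning_rate = 0.05

for step in range(2000):
    x = rng.normal(size=3)                  # incoming signal
    target = true_weights @ x               # what actually happened
    prediction = weights @ x                # what the learner expected
    error = target - prediction             # the mistake...
    weights += learning_rate * error * x    # ...folded back in immediately

print(np.round(weights, 2))                 # ends up near [0.5, -1.0, 2.0]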
Academics are also riven by the basic question of whether self-learning systems should ignore surprising input or actively seek it out. Should the robot be as stable as possible, or as hyper-sensitive as possible?
If signal-processing boffins can't even agree on basic issues like these, how is Joe Tinkerer to create an autonomous robot himself? Must he still specify exactly how many pixels to count in detecting a wall, or how many degrees to rotate each wheel? Even elementary motion-detection - "Am I going right or left?" - is way beyond the software or mathematical prowess of most homebrew roboticists.
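Even the crudest version of that right-or-left question takes real signal-processing code. A minimal sketch, assuming two consecutive greyscale frames from the robot's camera: shift the first frame one pixel each way and see which shift better explains the second. Real cameras, with lighting changes, rotation and the robot's own wobble, are far less obliging.

import numpy as np

def motion_direction(prev_frame, next_frame):
    # Crude detector: which one-pixel shift of prev_frame best matches next_frame?
    err_right = np.sum((np.roll(prev_frame, 1, axis=1) - next_frame) ** 2)
    err_left = np.sum((np.roll(prev_frame, -1, axis=1) - next_frame) ** 2)
    return "right" if err_right < err_left else "left"

# Toy test: a bright vertical bar drifting rightwards across a dark frame.
prev_frame = np.zeros((8, 8)); prev_frame[:, 3] = 1.0
next_frame = np.zeros((8, 8)); next_frame[:, 4] = 1.0
print(motion_direction(prev_frame, next_frame))    # "right"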