Get a grip, literally: Clumsy robots can't nab humans' jobs just yet

Amazon challenge winners roll out neural net for droids that need to grab stationary stuff


Artificially intelligent software can drive robots to perform the most menial tasks, such as reaching out and gripping objects.

However, there's one thing they can't, er, grasp easily: dealing with things that move unexpectedly, which right now rules them out of a lot of real-world labor. It does, though, make them good for picking up stuff that typically stays still, such as clothes on the floor or boxes in a warehouse.

It’s a surprisingly difficult skill to master, according to a paper to be presented at the Robotics: Science and Systems conference starting on Tuesday at Carnegie Mellon University in the US.

"We have been able to program robots, in very controlled environments, to pick up very specific items," Jürgen Leitner, a postdoctoral research fellow at the the Queensland University of Technology in Australia, said on Monday.

"However, one of the key shortcomings of current robotic grasping systems is the inability to quickly adapt to change, such as when an object gets moved."

Leitner and his colleagues built a neural network to help robots grasp objects that they haven’t been explicitly trained on. If they are to be useful in the real world, bots will encounter new objects and need to adapt to different environments.

"The world is not predictable – things change and move and get mixed up and, often, that happens without warning – so robots need to be able to adapt and work in very unstructured environments if we want them to be effective," Leitner added.

"For example, in the Amazon Picking Challenge, which our team won in 2017, our robot CartMan would look into a bin of objects, make a decision on where the best place was to grasp an object and then blindly go in to try to pick it up."

The neural net, a Generative Grasping Convolutional Neural Network or GG-CNN, differs from most traditional convolutional neural networks (CNNs) used in robotics. To grab a thing, the system has to work out how to position the robot's fingers around it. It does this using pixel-wise analysis of the input image, rather than the more traditional approach of sliding a window across the image or drawing a bounding box around the edges of the thing.


The GG-CNN model requires fewer parameters than most CNNs, so it's much faster to execute: it takes about 19 milliseconds to run on a desktop computer equipped with a 3.6GHz Intel Core i7-7700 CPU and an Nvidia GeForce GTX 1070 graphics card.

"The Generative Grasping Convolutional Neural Network approach works by predicting the quality and pose of a two-fingered grasp at every pixel," said Douglas Morrison, first author of the paper and a PhD researcher at the Queensland uni.

"By mapping what is in front of it using a depth image in a single pass, the robot doesn't need to sample many different possible grasps before making a decision, avoiding long computing times."

It was trained by inspecting 885 images of real objects that have been labelled with positive points and negative points, which are where the robot should and shouldn’t aim to grip. Using a Kinova Mico 6DOF robot fitted with a Kinova KG-2 two-fingered gripper, it managed an 83 per cent grasping success rate on a series of eight 3D-printed objects with odd shapes, and 88 per cent on a range of 12 household items such as a screwdriver, teddy bear, and mug.
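The labelling scheme described above can be pictured as turning annotated grasp points into per-pixel training targets. The following is a hypothetical sketch of that conversion, not the authors' actual pipeline:

```python
import numpy as np

def label_map(shape, positives, negatives):
    """Build a per-pixel training target from annotated grasp points.

    positives and negatives are lists of (x, y) pixels where the robot
    should or should not aim to grip; all other pixels stay unlabelled (0).
    """
    target = np.zeros(shape, dtype=np.float32)
    for x, y in positives:
        target[y, x] = 1.0   # good place to grip
    for x, y in negatives:
        target[y, x] = -1.0  # bad place to grip
    return target

t = label_map((4, 4), positives=[(1, 2)], negatives=[(3, 0)])
```

A network trained against such targets learns to score every pixel of a new image, which is what lets it generalize to objects it was never explicitly shown.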

“This has benefits for industry – from warehouses for online shopping and sorting, through to fruit picking. It could also be applied in the home, as more intelligent robots are developed to not just vacuum or mop a floor, but also to pick items up and put them away," Leitner concluded. ®


Biting the hand that feeds IT © 1998–2022