OpenAI – Learning Dexterous In-Hand Manipulation


Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. This work is about OpenAI’s new technique that teaches a robot arm to dexterously manipulate a block to a target state. And in this project, they did one of my favorite things, which is, first, training an AI within a simulation, and then, deploying it into the real world.

And in the best case scenario, this knowledge from the simulation will actually generalize to the real world. However, while we are in the simulation, we can break free from the limitations of worldly things, such as hardware, movement speed, or even time itself.

So how is that possible? The number of experiments we can run in a simulation is bounded not by our time, which is scarce, but by how powerful our hardware is, which is abundant and improving at a nearly exponential pace.

And this is the reason why OpenAI’s and DeepMind’s AIs were able to train for 200 years’ worth of games before first playing a human pro player. This sounds great, but a simulation is always more crude than the real world, so do we know for sure that we created something that will indeed be useful in the real world, and not just in the simulation?

Let’s try an analogy. Think of the machine as a student, and the simulation would be its textbook that it learns from.

If the textbook contains only a few trivial problems to learn from, when the day of the exam comes, if the exam is any good, the student will fail. The exam is the equivalent of deploying the machine into the real world, and apparently, the real world is a damn good exam.

So how can we prepare a student to do well on this exam? Well, we have to provide them with a textbook that contains not only a lot of problems, but also a diverse set of challenges as well. This is what machine learning researchers call domain randomization.
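To make domain randomization a bit more concrete, here is a minimal Python sketch. It is not OpenAI’s code: the names `WorldParams` and `sample_world` and all of the numeric ranges are made up for illustration. The only idea it shows is that every training episode gets its own randomly drawn virtual world.

```python
import random
from dataclasses import dataclass


@dataclass
class WorldParams:
    """Hypothetical parameters of one randomized virtual world."""
    hand_speed_scale: float   # multiplier on how fast the hand moves
    cube_mass_kg: float       # weight of the cube
    cube_color_rgb: tuple     # color of the cube, each channel in [0, 1]


def sample_world(rng):
    """Draw one virtual world; the ranges are illustrative assumptions."""
    return WorldParams(
        hand_speed_scale=rng.uniform(0.5, 1.5),
        cube_mass_kg=rng.uniform(0.03, 0.3),
        cube_color_rgb=(rng.random(), rng.random(), rng.random()),
    )


# One freshly randomized world per training episode.
rng = random.Random(0)
worlds = [sample_world(rng) for _ in range(5)]
for world in worlds:
    print(world)
```

A real system would hand each sampled world to a physics simulator and a renderer before running an episode in it; that part is left out here.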

This means that we teach the AI program in different virtual worlds, and in each one of them, we change parameters like how fast the hand is, what color and weight the cube is, and more. This is a proper textbook, which means that after this kind of training, this AI can deal with new and unexpected situations.

The knowledge that it has obtained is so general that we can change even the geometry of the target object and the machine will still be able to manipulate it correctly. Outstanding. To implement this idea, scientists at OpenAI trained not one agent, but a selection of agents in these randomized environments.

The first main component of this system is a pose estimator. This module looks at the cube from three angles and predicts the position and orientation of the block, and is implemented through a convolutional neural network.

The advantage of this is that we can generate a near-infinite amount of training data ourselves. You can see here that when the AI looks at real images, it is only a few degrees worse than in the simulation when estimating angles, which is the excellent textbook at work.
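For a feel of what "a few degrees" of orientation error means, here is a small Python sketch of one common way to measure it: the smallest rotation angle between a true and a predicted orientation, both written as unit quaternions. The video does not say how OpenAI computed its numbers, so the function names and the 30 and 33 degree poses below are assumptions for illustration only.

```python
import math


def rotation_angle_deg(q1, q2):
    """Smallest rotation angle, in degrees, between two unit quaternions (w, x, y, z)."""
    # q and -q describe the same rotation, so take the absolute value of the dot product.
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return math.degrees(2.0 * math.acos(dot))


def quat_about_z(angle_deg):
    """Unit quaternion for a rotation of angle_deg around the z axis."""
    half = math.radians(angle_deg) / 2.0
    return (math.cos(half), 0.0, 0.0, math.sin(half))


# Hypothetical example: the cube is really turned by 30 degrees,
# while the pose estimator predicts 33 degrees.
true_pose = quat_about_z(30.0)
predicted_pose = quat_about_z(33.0)
error_deg = rotation_angle_deg(true_pose, predicted_pose)
print(f"orientation error: {error_deg:.2f} degrees")
```

Averaging this error over many images, once for rendered images and once for real camera images, would give the kind of comparison described above.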

I would not be surprised if this accuracy exceeds the capabilities of an ordinary human, given that it can perform this many times within a second. Then, the next part is choosing what the next action should be.

Of course, we seek to rotate this cube in a way that brings us closer to our objective. This is done by a reinforcement learning technique, which uses modules similar to those in OpenAI’s previous algorithm that learned to play Dota 2 really well. Another testament to how general these learning algorithms are.

I also recommend checking out OpenAI’s video on this work in the video description. Now, I always read in the comments here on YouTube that many of you are longing for more. 5 minute papers, 10 minute papers, 2 hour papers were among the requests I heard from you before.

And of course, I am also longing for more as I have quite a few questions that keep me up at night. Is it possible for us to ever come up with a superintelligent AI? If yes, how? What types of these AIs could exist? Should we be worried?

If you are also looking for some answers, we are now trying out a sponsorship with Audible, and I have a great recommendation for you, which is none other than the book Superintelligence by Nick Bostrom. It addresses all of these questions really well, and if you sign up under the link below in the video description, you will get this book free of charge.

Whenever you have to do some work around the house, commute to school or work, just pop in a pair of headphones and listen for free. Some more AI for you while doing something tedious. That’s as good as it gets.

If you feel that the start of the book is a little slow for you, make sure to jump to the chapter by the name “Is the default outcome doom”. But buckle up, because there are going to be fireworks from that point in the book.

We thank Audible for supporting this video, and send a big thank you to all of you who sign up and support the series. Thanks for watching and for your generous support, and I’ll see you next time!