Backchannel has a fascinating profile of DeepMind founder Demis Hassabis which, although an interesting read in itself, links to a brief, barely mentioned study that may herald a quiet revolution in artificial intelligence.
The paper (available online as a PDF) is entitled “Playing Atari with Deep Reinforcement Learning” and describes an AI system which, without any prior training, learned to play a series of Atari 2600 games to the point of outperforming humans.
The key here is ‘without any prior training’ as the system was not ‘told’ anything about the games. It worked out how to play them, and how to win them, entirely on its own.
The system was created with a combination of a reinforcement learning system and a deep learning network.
Reinforcement learning is based on the psychological theory of operant conditioning, where we learn, through reward and punishment, which behaviours help us achieve certain goals.
One difficulty is that in video games, the reward (points) may only be distantly related to individual actions, because strategy is not something that can be boiled down to ‘do this action again to win’. Mathematically, there is a lot of noise in the link between an action and its eventual outcome.
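One standard way of handling this delayed-reward problem (a sketch of the general idea, not of anything specific in the paper) is to spread a late reward backwards over the actions that preceded it, discounting it the further back you go:

```python
# Toy illustration of the credit-assignment problem: the score arrives
# many steps after the actions that earned it, so each earlier step is
# credited with a discounted share of the future reward.
def discounted_returns(rewards, gamma=0.99):
    """For each time step, the discounted sum of all future rewards."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return list(reversed(returns))

# A sparse reward signal: nothing happens until the final step,
# yet every step receives some (discounted) credit.
print(discounted_returns([0, 0, 0, 0, 1]))
```

The discount factor gamma controls how far credit reaches back: close to 1 and early actions get nearly full credit for late points; close to 0 and only the last action counts.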
Traditionally this has been solved by programming the structure of the game into the AI agent. Non-player characters in video games act as effective opponents because they include lots of hard-coded rules about what different aspects of the game symbolise, and what good strategy involves.
But this is a hack that doesn’t generalise. Genuine AI would work out what to do, in any given environment, by itself.
To help achieve this, the DeepMind Atari AI uses deep learning, a hierarchical neural network that is good at generating its own structure from unstructured data. In this case, the data was just what was on the screen.
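The core idea can be sketched in a few lines: a network maps raw screen pixels straight to a score per possible action, with no hand-coded game knowledge in between. This is only an illustrative toy (random weights, made-up layer sizes), not the paper’s architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: an 84x84 pixel frame, a small hidden layer,
# and one output per joystick action.
n_pixels, n_hidden, n_actions = 84 * 84, 32, 4
W1 = rng.normal(0, 0.01, (n_hidden, n_pixels))
W2 = rng.normal(0, 0.01, (n_actions, n_hidden))

def action_values(screen):
    """Forward pass: pixels -> hidden features -> one value per action."""
    h = np.maximum(0, W1 @ screen.ravel())   # ReLU hidden layer
    return W2 @ h

screen = rng.random((84, 84))                # stand-in for an Atari frame
print(action_values(screen).shape)           # one value per possible action
```

Training adjusts the weights so that these action values become accurate predictions of future reward; the ‘structure’ of the game ends up encoded in the learned weights rather than in programmer-written rules.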
To combine ‘learning effective action’ and ‘understanding the environment’ the research team plumbed together deep learning and reinforcement learning with an algorithm called Q-learning that is specialised for ‘model-free’ or unstructured learning.
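Q-learning itself is simple enough to show in miniature. Here is a minimal tabular sketch on an invented toy problem (a five-state corridor, not anything from the paper): the agent learns, purely from trial, error and reward, that moving right is the winning strategy.

```python
import random

# Toy corridor: states 0..4, action 1 moves right, action 0 moves left;
# reaching state 4 yields a reward of 1 and ends the episode.
N_STATES, ACTIONS = 5, [0, 1]
alpha, gamma, epsilon = 0.5, 0.9, 0.3

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    next_state = max(0, min(N_STATES - 1, state + (1 if action else -1)))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

random.seed(0)
for _ in range(300):                          # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit current estimates, sometimes explore
        a = random.choice(ACTIONS) if random.random() < epsilon \
            else max(ACTIONS, key=lambda a: Q[(s, a)])
        s2, r, done = step(s, a)
        # the Q-learning update: nudge Q(s,a) toward r + gamma * max_a' Q(s',a')
        target = r + gamma * max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

# The learned greedy policy: move right from every non-terminal state.
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)])
```

‘Model-free’ means exactly what this shows: the `step` function is a black box to the learner, which never builds a map of the corridor; it only updates its estimates of how good each action is in each state. DeepMind’s move was to replace the lookup table `Q` with a deep network reading the screen.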
So far, we have performed experiments on seven popular ATARI games – Beam Rider, Breakout, Enduro, Pong, Q*bert, Seaquest, Space Invaders. We use the same network architecture, learning algorithm and hyperparameters settings across all seven games, showing that our approach is robust enough to work on a variety of games without incorporating game-specific information…
Finally, we show that our method achieves better performance than an expert human player on Breakout, Enduro and Pong and it achieves close to human performance on Beam Rider.
The team note that the system wasn’t so good at Q*bert, Seaquest and Space Invaders, and it wasn’t asked to battle the real Ko-Dan Empire after playing Starfighter, but it’s still incredibly impressive.
It’s an AI that worked out its environment, its actions, and what it needs to do to ‘survive’, without any prior information.
Granted, the environment is an Atari 2600, but the AI is a surprisingly simple system that ends up, in some instances, outperforming humans from a standing start.
Essentially, the future of humanity now rests on whether the next system is given a gun or a dildo to play with.
Link to Backchannel profile of Demis Hassabis.
PDF of the paper “Playing Atari with Deep Reinforcement Learning”
2 thoughts on “pwned by a self-learning AI”
Hmm. Am I odd in finding psychological and AI theories that have words like “deep” in their names suspicious? What would articles on “deep learning” look like if “deep” were replaced with, oh, say, ASDFG, and the authors had to describe “ASDFG learning” in such a way as to persuade us that they had a theory that actually had some explanatory value? ASDFG learning allows this program to learn new structures by X, Y, and Z, and this is new because Q. At which point we’d be able to ask, what is it about ASDFG learning that allows it to learn?
FWIW, AI has already been there. Drew McDermott’s paper “Artificial Intelligence Meets Natural Stupidity” explains why using words with meanings in AI theories is a bad idea, and Doug Lenat’s “Why Eurisko Appears to Learn” noticed that his program’s a priori data structures were isomorphic to the mathematics it appeared to learn.
Just published in Nature: