The dangers of not aligning artificial intelligence with human values - Bark Sedov

In artificial intelligence (AI), the “alignment problem” refers to the challenges posed by machines simply not sharing our values. When it comes to value, at a basic level, machines don’t really get much more sophisticated than understanding that 1 is different from 0.

As a society, we are now at a point where we are starting to let machines make decisions for us. So how can we expect them to understand, for example, that they should do this in a way that does not involve prejudice against people of a particular race, gender, or sexuality? Or that the pursuit of speed, efficiency, or profit must be done in a manner that respects the supreme sanctity of human life?

Theoretically, if you tell a self-driving car to navigate from point A to point B, it could just make its way to its destination, regardless of the cars, pedestrians, or buildings it destroys along the way.

As Oxford philosopher Nick Bostrom outlined, if you tell an intelligent machine to make paperclips, it could ultimately destroy the entire world in its search for raw materials that can be turned into paperclips. The point is that, unless specifically taught otherwise, it simply has no idea of the value of human life or materials, or that some things are too valuable to be turned into paperclips.

This forms the basis of Brian Christian’s latest book, The Alignment Problem: How AI Learns Human Values. It is his third book on AI, following The Most Human Human and Algorithms to Live By. I have always found Christian’s writing enjoyable to read but also very insightful, as he does not bog the reader down with computer code or math. But that certainly doesn’t mean his work is easy or non-intellectual in any way.

Rather, his focus is on the societal, philosophical, and psychological implications of our ever-growing ability to create thinking, learning machines. If anything, this is the aspect of AI where we need our best thinkers to focus their efforts. After all, the technology is already here – and it will only get better. Far less certain is whether society itself is mature enough, and has sufficient safeguards in place, to make the most of the amazing opportunities it offers – while preventing the serious problems it could bring from becoming a reality.

I recently sat down with Christian to discuss some of the issues. Specifically, Christian’s work looks at the encroachment of computer-based decision-making into areas such as healthcare, criminal justice and lending, where there is a clear potential for them to cause problems that could end up affecting people’s lives in very real ways.

“There’s this fundamental problem … which has a history that goes back to the 1960s and to MIT cyberneticist Norbert Wiener comparing these systems to the story of the Sorcerer’s Apprentice,” Christian tells me.

Most readers will be familiar with the Disney cartoon in which Mickey Mouse tries to spare himself his master’s chores by using a magic spell to give a broomstick intelligence and autonomy. The story serves as a good example of the dangers of these qualities when they are not accompanied by human values such as common sense and good judgment.

“Wiener argued that this was not the stuff of fairy tales. That’s the kind of thing that’s in store for us as we develop these systems that are sufficiently general and powerful… I think we’re in a moment in the real world where we’re filling the world with these broomsticks, and this is going to be a real problem.”

One incident Christian uses to illustrate how this misalignment can play out in the real world is the first recorded pedestrian death in a collision with an autonomous car. This was the death of Elaine Herzberg in Arizona, USA in 2018.

When the National Transportation Safety Board investigated what caused the collision between the Uber test vehicle and Herzberg, who was pushing a bike across a street, it found that the AI controlling the car was unaware of the concept of jaywalking. It was completely unprepared to deal with a person who was in the middle of the road where they shouldn’t have been.

In addition, the system was trained to rigidly segment objects on the road into a number of categories – such as other cars, trucks, cyclists and pedestrians. A person pushing a bicycle did not fit into any of these categories and did not behave in a way that would be expected of any of them.

“It’s a useful way to think about how systems can go wrong in the real world,” says Christian. “It’s a function of two things – the first is the quality of the training data. Does the data basically reflect reality? And it turns out, no – there’s this key concept called jaywalking that wasn’t there.”

The second factor is our own ability to mathematically define what a system like an autonomous car should do when it encounters a problem that requires a response.

“In the real world, it doesn’t matter if it’s a cyclist or a pedestrian because you want to avoid them either way. It’s an example of how a fairly intuitive system design can go wrong.”
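The rigid-category failure described above can be sketched in a few lines. This is a purely illustrative toy (not Uber’s actual software): every object is forced into exactly one category per frame, and the expected behaviour comes from that label alone, so a person pushing a bicycle gets a different prediction every time the label flips. All names, features, and rules are invented for illustration.

```python
# Toy sketch of a rigid per-frame classifier: one label per object per frame,
# with behaviour predicted from the label alone.
EXPECTED_BEHAVIOUR = {
    "vehicle": "follows the traffic lane",
    "cyclist": "travels along the road edge",
    "pedestrian": "crosses at a crosswalk",
}

def classify(frame_features):
    """Force the object into exactly one category for this frame."""
    if frame_features["bicycle_shape"] and not frame_features["pedalling"]:
        # A person walking a bike fits neither class well; small per-frame
        # changes in appearance flip the label back and forth.
        return "cyclist" if frame_features["wheels_prominent"] else "pedestrian"
    return "vehicle"

def predict_behaviour(frames):
    # Each re-classification swaps in a different behaviour model, so the
    # planner never accumulates a stable prediction for the same person.
    return [EXPECTED_BEHAVIOUR[classify(f)] for f in frames]

frames = [
    {"bicycle_shape": True, "pedalling": False, "wheels_prominent": True},
    {"bicycle_shape": True, "pedalling": False, "wheels_prominent": False},
]
print(predict_behaviour(frames))
# Two consecutive frames of the same person yield two different predictions.
```

The point of the sketch is Christian’s: the categories, not the sensors, are what fail – the taxonomy baked into the system does not reflect reality.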

Christian’s book explores these issues further as they relate to many of the different paradigms currently popular in the field of machine learning, such as unsupervised learning, reinforcement learning, and imitation learning. It turns out that each of them presents its own challenges when it comes to aligning the values and behaviors of machines with those of the humans who use them to solve problems.

Sometimes problems arise precisely because machine learning tries to replicate human learning. This can happen when errors put the AI into situations or behaviors that a human would never encounter in real life. With no reference point to fall back on, the machine is then likely to keep making more and more mistakes, in a series of ‘cascading errors’.
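A minimal numerical sketch of these ‘cascading errors’, assuming a one-dimensional lane-keeping toy problem of my own invention: the learned policy behaves sensibly inside the region covered by human demonstrations and arbitrarily outside it, so a small deviation either corrects itself or compounds step after step.

```python
# Toy 'cascading errors' demo: a policy learned by imitation only covers
# positions near the lane centre (|pos| <= 1); outside that region its
# behaviour is unconstrained by the training data. All numbers illustrative.

def imitation_policy(position):
    """Steering correction learned from human demonstrations."""
    if abs(position) <= 1.0:
        return -0.5 * position   # demonstrated region: steer back to centre
    return 0.3 * position        # never demonstrated: arbitrary and wrong

def rollout(start, steps=10):
    pos, trace = start, []
    for _ in range(steps):
        pos += imitation_policy(pos)
        trace.append(pos)
    return trace

inside = rollout(0.8)    # starts in the demonstrated region: converges to 0
outside = rollout(1.2)   # starts just outside: each error feeds the next
```

Starting inside the demonstrated region, the deviation shrinks by half each step; starting just outside it, the same policy drives the agent further and further off course, which is the cascade.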

In reinforcement learning – which involves training machines to maximize the rewards they receive for making the right decisions – machines can quickly learn to “game” the system, leading to outcomes unrelated to the ones you actually want. Here, Christian uses the example of Google X boss Astro Teller’s attempt to encourage soccer-playing robots to win games. Teller devised a system that rewarded the robots every time they took possession of the ball – an action that, at first glance, seems conducive to winning a match. However, the machines quickly learned to simply approach the ball and touch it repeatedly. Since this meant they effectively took possession over and over again, they were rewarded many times over – although it was of little use when it came to winning the game!
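The possession-reward story can be sketched as a toy reward function (the rules and numbers here are invented for illustration, not Teller’s actual setup): a policy that repeatedly touches and releases the ball out-scores one that takes possession once and dribbles toward the goal, even though only the latter helps win.

```python
# Toy reward-gaming demo: +1 every time the robot (re)gains possession.
def reward_from_possessions(actions):
    reward, has_ball = 0, False
    for a in actions:
        if a == "touch" and not has_ball:
            reward += 1          # proxy reward: gained possession
            has_ball = True
        elif a == "release":
            has_ball = False
        # "dribble" keeps possession and advances play: no new reward

    return reward

exploit = ["touch", "release"] * 5       # vibrate next to the ball
intended = ["touch"] + ["dribble"] * 9   # take it once, advance toward goal

print(reward_from_possessions(exploit))   # 5
print(reward_from_possessions(intended))  # 1
```

The proxy (possession events) diverges from the true objective (winning), so the reward-maximizing behavior is exactly the useless one – the essence of the misalignment Christian describes.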

Christian’s book is packed with other examples of this alignment problem – as well as a thorough exploration of where we stand in solving the problem. It also clearly shows how many of the concerns of the earliest pioneers in the field of AI and ML have yet to be resolved, and touches on intriguing topics such as attempts to infuse machines with other qualities of human intelligence, such as curiosity.

You can watch my full conversation with Brian Christian, author of The Alignment Problem: How AI Learns Human Values, on my YouTube channel.
