Shahar Avin on AI Governance
Shahar Avin is a senior researcher at the Centre for the Study of Existential Risk in Cambridge. In a past life, he was a Google engineer, though right now he spends most of his time thinking about how to prevent the risks that could arise if companies like Google end up deploying powerful AI systems, including by organizing AI governance role-playing workshops.
In this episode, we talk about a broad variety of topics, including how the lessons from running AI governance workshops could apply to governing transformative AI, AI strategy, AI governance, and trustworthy AI development, and we end up answering some Twitter questions.
(Our conversation is about two hours long. The outline below lists the sub-topics we cover.)
- Intelligence Rising
- Transformative AI
- Measuring Transformative AI By The Scale Of Its Impact
- Comprehensive AI Services
- Automating CEOs Through AI Services
- Towards A “Tech Company Singularity”
- Predicting AI Is Like Predicting The Industrial Revolution
- 50% Chance Of Human-brain Performance By 2038
- AI Alignment Is About Steering Powerful Systems Towards Valuable Worlds
- You Should Still Worry About Less Agential Systems
- AI Strategy
- AI Strategy Needs To Be Tested In The Real World To Not Become Theoretical Physics
- Playing War Games For Real-time Partial-information Adversarial Thinking
- Towards World Leaders Playing The Game Because It’s Useful
- Open Game, Cybersecurity, Government Spending, Hard And Soft Power
- How Cybersecurity, Hard-power Or Soft-power Could Lead To A Strategic Advantage
- Cybersecurity In A World Of Advanced AI Systems
- Allocating AI Talent For Positive R&D ROI
- Players Learn To Cooperate And Defect
- Can You Actually Tax Tech Companies?
- The Emergence Of Bilateral Agreements And Technology Bans
- AI Labs Might Not Be Showing All Of Their Cards
- Why Publish AI Research
- Should You Expect Actors To Build Safety Features Before Crunch Time
- AI Governance
- Toward Trustworthy AI Development
- Concrete Mechanisms To Tell Apart Who We Should Trust With Building Advanced AI Systems
- Increasing Privacy To Build Trust
- Sensitizing To Privacy Through Federated Learning
- How To Motivate AI Regulations
- How Governments Could Start Caring About AI Risk
- Attempts To Regulate Autonomous Weapons Have Not Resulted In A Ban
- We Should Start By Convincing The Department Of Defense
- Medical Device Regulations May Be A Good Model For Audits
- Regulation Should Be Flexible To Future Developments
- Alignment Red Tape And Misalignment Fines
- Red Teaming AI Systems
- Red Teaming May Not Extend To Advanced AI Systems
- What Climate Change Teaches Us About AI Strategy
- Twitter Questions
- Final Thoughts
Michaël: I remember seeing you for the first time at an EA Global conference, back in 2018. You were organizing a live role-play AI strategy workshop. There were 50 people in a huge room, split into many groups, and we had to make an insane number of critical AI decisions in the span of two hours. Yesterday, I had the honor of participating again in one of these workshops, which are now apparently called Intelligence Rising. This time, the game had gone through a ton of iterations and lasted a total of three hours. Thanks, Shahar, for coming on the show. It’s a pleasure to have you.
Shahar: Thanks for having me. I look forward to the conversation.
Why “Intelligence Rising”?
Michaël: Your workshop is the best way to think about all the things. So let’s start with that. Why is it called Intelligence Rising?
Shahar: We wanted something about intelligence because it’s a story of how ever more powerful artificial intelligence systems shape the world around us and give very powerful actors, whether they are companies or nations, tough decisions to make. And if the decisions don’t go well, things go very badly for all of us. So we had to have intelligence in there. We also wanted something about the rising tensions and the increasingly powerful systems in the background. We spent a lot of time trying to find a better name but failed, so we are stuck with Intelligence Rising for now.
What Role-playing Powerful Actors Teaches Us About Governing AI
Michaël: When did you start developing it? I remember playing it for the first time around 2018. Is that about when you started developing it?
Shahar: That’s about right. I’d have to go back to my emails, but I think around 2017 I was visiting the Future of Humanity Institute in Oxford for a different reason, and I was pulled into a side room where a couple of researchers were playing with the idea of using role play to play through a particular future scenario. I really enjoyed it. I got to be Demis Hassabis, the CEO of DeepMind, for an hour and a half. It was a terrifying experience. I failed very badly, and I walked away thinking, there is something here. I spent some time playing around with the initial version, which was just a bunch of people in a living room pretending to be very powerful actors. We had some very crazy stories come out of that.
Michaël: Some people were actual crazy actors if there was Demis Hassabis in the room.
Shahar: That is true. And then we got to the point where it felt like there was something there. I applied for a small amount of funding from the Long-Term Future Fund, which was successful, and I used that money to pull together a little team of volunteers and design the rules and mechanics. Basically, five years later, we have what you saw yesterday.
Michaël: One of the actors in this role-play game is not DeepMind but Alphabet. Can you tell us a little bit more about the other actors in the game?
Shahar: Of course. One thing we did is that you no longer play a person. You play an organization, which is pretty weird when you’re doing role play. But it’s similar to what you would get in a Model UN setting, right? You come and you negotiate as the Iranian delegation or the American delegation. So it’s similar to that, except that we have both states and companies. For states, we have China and the United States, two countries that pretty much everyone agrees are at the forefront of the race for very advanced artificial intelligence in terms of budget, talented individuals, large data centers, and big companies that can do things. We also wanted to put in companies that are at the forefront of developing this technology, because we were seeing the vast majority of advances coming from private industry, not from government labs and not from academia. So we have one tech giant from each of these countries: Alphabet for the US and Tencent for China.
Measuring Transformative AI By The Scale Of Its Impact
Michaël: “Advanced AI technologies” is a very normal phrase that people from outside the AI community can understand. But in the game, other terms come up, such as Transformative AI. I don’t know if AGI is there, but on this podcast we talk a lot about AGI. Can you give us a sense of the key concepts in this game and the technologies that are being developed?
Shahar: Happy to. There’s a really nice paper by Jess Whittlestone and Ross Gruetzemacher that explores the idea of Transformative AI and radically Transformative AI and how we might break apart some of these notions. The idea is that you can measure how transformative a technology is by the scale of its impact. If you think about nuclear weapons, they radically changed warfare, but they didn’t change the way people go shopping. If you think about electricity or agriculture, they changed pretty much everything in our lives. And we can imagine even bigger changes than the ones brought about by the industrial revolution or by agriculture.
Shahar: These are widely different scales of transformation. The game is aimed at exploring two particularly promising avenues within artificial intelligence. One is Language Modeling, the ability to have models understand human language and reason through language; the other is Reinforcement Learning, the ability for models to make long-term plans in response to rewards and autonomously come up with agential behavior. Through both of these, we explore how versions much more advanced than what we have today could lead to radical transformations, up to the end of the technology tree, where we have visions of AGI on one side and non-agential but still radically transformative technologies on the other.
Comprehensive AI Services
Michaël: I think the other one is called Comprehensive AI Systems.
Shahar: Comprehensive AI Services is a particular term used by Eric Drexler in a report called Reframing Superintelligence. Nick Bostrom published Superintelligence, a very influential book, in 2014, and Eric disagreed with a lot of its framing. In particular, he doesn’t like us thinking about the future as “the AI”: this one monolithic, agentic, god-like thing that takes control of the entire future, and if things go well, gives us utopia, and if things go badly, turns everyone into paperclips. Instead, he would like us to think about lots and lots of AI services doing things in the economy, maybe up to the point where any economically meaningful task can be done by an AI system, but where the highest level of control is still exercised by humans. Maybe humans assisted by very powerful systems, but it’s all largely done with systems that are relatively low on agenthood or agentic planning.
Michaël: So it’s a lot of narrow AIs coordinating together to automate industry.
Shahar: Lots of narrow AI, where the design of how to break a task into lots of modules has been done by humans, or maybe by humans assisted by even more narrow AI systems, to create something that’s very powerful and stable. These narrow AI systems are not colluding with each other. They’re passing information in a way that was designed by humans to keep them safe and controllable.
Michaël: Right. So they’re just communicating with an API for relevant information, not… They’re not single agents coordinating to escape. And the other scenario is basically an agent capable of planning that would have emerged from pushing RL well beyond today’s limits.
Shahar: That’s right.
Michaël: Which scenario do you think is the most likely?
Shahar: The thing is that they’re not mutually exclusive. You can have a world in which most of the AI that’s deployed is services, maybe very advanced services, while some labs are still building very advanced Reinforcement Learning agents. There are very good first-principles arguments for why taking humans out of the loop, or having models that are very good at long-term planning and agentic action and reasoning, gives you a benefit: in economic settings, in cases where time is very valuable, or in places where humans haven’t picked up useful concepts for thinking about the domain.
Shahar: On the other hand, I think there is also a growing consensus that these systems are particularly dangerous. And then there is the question of whether we would be able to coordinate around the danger and move towards not building those systems. I’m much more excited about people who are trying to evaluate and quantify to what extent a system is agentic, is planning, or is situationally aware. Because if we have those benchmarks, then we could say we don’t want systems that score very high on them. We could talk later about how you might go about doing it.
Automating CEOs Through AI Services
Michaël: What about the economic incentives to build general systems? For instance, if I’m a CEO and I want to know what’s the best plan for the next six months. I want to have access to something that’s able to do planning for me to provide a better economic value. If I’m the CEO of Microsoft, I might be interested in having a better-than-human decision-maker and planner.
Shahar: As I said, these are very good first-principles arguments. But then you also have to think, “What does the CEO of Microsoft actually do on a day-to-day basis?” And the answer is lots of meetings in which plans are presented with as much context as possible to drive a go/no-go decision, or a decision about how much budget to put into something. The plans are very responsive to the world. They’re very responsive to new information coming in: the regulatory environment is changing, technology is changing, teams come up with new products, and you’re just trying to make very rapid decisions about what to push forward and what to slow down, all with some strategic understanding of what’s happening. There could be many services that distill the information more accurately, help communication work better, give an overview of what’s happening, and maybe do some reasoning about how plans could work, all of which would make the job much easier and more impactful without completely replacing the CEO with a black-box model.
Michaël: Is the idea that, in a Comprehensive AI Services scenario, CEOs will not be completely replaced, but will just have to dedicate less time to some things, because those tasks will be mostly automated? Will there still be a human in the loop?
Shahar: Yeah. The Comprehensive AI Services vision is that there is still a human in the loop, but the human is significantly supported by very advanced systems. I think it particularly carves out space for people like CEOs and high-powered elected officials to maintain control of the hierarchies below them, while making everything in the chain better so that they have better options available to them. If you have services for design, services for data collection, services for technology development, then there is a much bigger scope of activities they could do, and it would also bring them much closer to optimal decision-making. You can have a service that points out biases in your reasoning without delegating the reasoning to an agent.
Towards A “Tech Company Singularity”
Michaël: If most of our economy is automated by AI, we could expect an increase in GDP growth. And that brings us to the concept of Transformative AI. This is something that might happen before we get general intelligence or Comprehensive AI Services. Do you want to give a definition of Transformative AI? I know people disagree about this precise vision and there are different ways of phrasing it, so maybe use the one you think is most useful today.
Shahar: I actually really like this concept from Andrew Critch, the idea of a “Tech Company Singularity”. Before we get the full technological singularity, where we have these agents optimizing the future for us, we may briefly go through a period where a very large tech company, with access to very advanced capabilities, data, and skilled personnel, and with a big footprint on the world, can pick an arbitrary part of the economy and, in the space of months or a few years, become a dominant monopolistic actor within that industry, if it so desires. So Microsoft, at some point, decides, “Oh, we’re gonna go into agriculture.” And then within a year, the vast majority of food produced is an output of Microsoft and its AI-assisted agricultural services.
Predicting AI Is Like Predicting The Industrial Revolution
Shahar: This could happen because AI would enable you to make better decisions and open up new options that didn’t exist before, especially if your AI is working within R&D. I think the way to think this through is to go industry by industry and ask how much innovation and progress we have seen in that industry over the last 10, 20, 50, 100 years, and what it would look like to see that amount of progress again in the space of a year, or in the space of a few months. And then imagine that continuing, up to the point where we either hit diminishing returns in reality itself or not. That is radically transformative. It’s very hard to think through because it’s similar to trying to imagine, prior to the industrial revolution, what the post-industrial world would look like. It’s an exciting exercise, but you should also assume that you’re going to get it very wrong.
Michaël: What was the main difference before and after the industrial revolution? I guess there were two industrial revolutions, right? Are we talking about both of them?
Shahar: You could pick either of them, but the introduction of trains completely changed people’s perception of space and time. What is near? What is far? You, like all the generations before you, mostly grew up and died in the same village, only very rich people had access to things like horses, and horses are capped at a certain speed. That was the speed at which you could move around; no one could move faster than that. Or think about engines. Before, you had windmills and some innovation in agricultural technology, but the kind of boost they would give you is nothing compared to what you have in a world with factories. The rise of cities, the rise of services, the rise of factories, the idea that the material welfare in your house is going to radically change over the space of a generation or two.
Shahar: None of these were quite imaginable concepts if you lived before the industrial revolution. I’m trying to think what it would mean for the kind of imaginary furniture that people have access to to radically change every few months, maybe every few days, right? We have all of the stuff that humans have pulled together on the internet, and we keep generating more and more of it. Now try to imagine a world where this is a tiny fraction of all of the content that humans have access to, and all of that content is designed to be stimulating and empowering and creative and so on.
Michaël: I think we’re already pretty close to it with DALL-E 2. Personally, I use this tech for hours a day, and I just find it so compelling to get beautiful images. For me, it’s like a new tool, right? And maybe in a few months we get something like GPT-4 in the API and people get to play with it. We are obviously still not at Transformative AI, but we could get new APIs or new products every few weeks. We already had a bunch of breakthroughs happening every day, or five times a week, in April, May and June. I think we’re getting to the point where we have new stuff to keep us entertained every day. But yeah, it is probably very different to have actual AI shaping the economy.
50% Chance Of Human-brain Performance By 2038
Michaël: To think about those kinds of things, people estimate the equivalent number of parameters in an adult human brain. Then they try to think about how many parameters you would need to simulate a human brain, and how much compute you would need to train such a model optimally. And at some point they think, “Oh yeah, this is pretty close to what evolution has done with us, or to how many neurons the human brain has.” Do you mostly agree with those models?
Shahar: I mostly defer to those models, because the people who have been building them know a lot more about machine learning engineering, computer science, and the cognitive and biological angles they’re looking at. I think the way they reason through this is pretty reasonable. They’re pretty clear about their assumptions, and if those assumptions are wrong, we could get very different timelines. Some of the more influential and rigorous people who’ve done studies in this space are talking about 2038 as maybe the year when we have a 50-50 chance of getting human-brain-level performance out of our systems.
Shahar: In 2038, my daughter will be 18. So that’s pretty hard to square or live with. But that doesn’t mean it’s not going to happen. Just because my default is to imagine my daughter growing up in a world where the hardest thing I will have to deal with is that she’s on TikTok doesn’t mean that will in fact be the case. What she has to cope with will probably be way crazier than what I had to deal with growing up with the internet and social media, which my parents didn’t have to deal with.
Michaël: Yeah. Now kids have to deal with Instagram, TikTok, Snapchat, and there’s a new one every three years. In a decade, it might be one every week, and you’ll have to find a new app to ban every night. One consideration that comes up with those Transformative AI systems is that humans could be out of the loop and not understand what’s going on. And if the systems end up being agentic, they might make decisions that humans don’t really want. There might not be any CEO in the loop, just some AGI.
AI Alignment Is About Steering Powerful Systems Towards Valuable Worlds
Michaël: And that brings us to the concept of building safe AGI, or safe AI in general, and also aligning an AI. What do we mean by AI alignment? I think in your game, you mostly talk about safe AGI because it’s maybe an easier concept. In the other scenario you mentioned, Comprehensive AI Services, at what point does the AI go rogue? At what point does the AI have a negative impact on the world? Why do we need to make it safe?
Shahar: I really like the idea of alignment, which is basically that we build infrastructures, organizations, and ultimately individual very powerful systems that are aiming to steer the world towards the kind of world that we would find valuable. And not valuable because of some story that’s being told to us, or because of electrodes in our brains, but valuable in the way we genuinely find things valuable, right? That we would choose to live in those worlds without manipulation, just from being involved in them. That’s a very hard target for a bunch of reasons, and people can go and look up why AI alignment is hard. But this particularly applies to systems that are doing a lot of the steering of the future through their processing.
You Should Still Worry About Less Agential Systems
Shahar: If you instead have narrow, less agential systems, you should still worry, because it might be the case that systems are locally pointing at things that are more or less aligned with human values, and where the future goes is just the sum total of all of these vectors, all of these small systems pointing in various directions. If all of them are pointing in a bad direction for humanity, for example because they’re pointing towards doing well on things that are easy to measure rather than on things that genuinely capture what we like, then even though we don’t have giant agential systems, we could still end up in a pretty bad place. Or they could end up being very manipulative, even in a very small, non-agential way. There are a lot of socio-economic or technical reasons why these systems could end up being deceptive or manipulative, even if all of them just operate in a very local way and none of them is optimizing the future.
Shahar: Another thing to think about, if you have this kind of multipolar world with many different systems, even narrow ones, is whether they play nice with each other, right? They’re all trying to steer the world towards a particular goal in their own domain. So one failure mode is that they’re all pointing in a way that’s bad for humans, and another is that they’re all pointing in different ways and causing a lot of instability. Which is why talking about safety in the Comprehensive AI Services context has a lot to do with the stability of the system: preventing the system from collapsing in on itself.
Michaël: So even if the system doesn’t have a common goal, there’s some kind of direction that emerges from all those systems working together.
Shahar: Yeah. So one failure mode is that there is an overall emergent direction that is bad for us. Another is that there is no emergent direction, but the systems are in fact conflicting with and undermining each other. One system is optimizing for one proxy; it generates an externality, not fully captured by its designers, that gets picked up by another system that has a bad proxy for it and then tries to do something about it.
Michaël: Something like a Mesa-optimizer?
Shahar: This is less like a Mesa-optimizer. With Mesa-optimizers, you think you’re starting with a narrow system, but the narrow system, as part of solving its problem, creates a subagent that is very powerful and misaligned. This is more like what we have today in the way we try to govern global affairs. Ultimately, governments, companies, and international organizations are all human creations, aimed at capturing what we value and bringing us a better future. Part of the problem is that they’re not very aligned, either with what we want or with what is good for us, and they’re also in conflict with each other. So there are a bunch of equilibria between these organizations that are bad for us because we haven’t designed them well enough. You can imagine a similar failure mode, though much more amplified in its consequences for everyday life, if you’ve automated a lot of these processes.
Michaël: I think we can go with two different versions of alignment. One is the Paul Christiano definition of having systems do what we want. And I think in your game, there is Coherent Extrapolated Volition. You’re laughing because you think it’s silly.
Shahar: I don’t think it’s silly. I think that is probably the place in our game mechanics where we have been least cautious, where there’s the largest jump: in one turn you could realize that Coherent Extrapolated Volition might be a thing you want to do, and two years later you’ve just solved it and it’s fine. I don’t think it’s going to be like that. I wish it were like that, that we could in fact find a technical or socio-technical way of having machines that want what would be good for all of humanity if humanity had a long time to reflect on what it actually wanted.
AI Strategy Needs To Be Tested In The Real World To Not Become Theoretical Physics
Michaël: I guess most of your time as a researcher is spent considering all those different scenarios and thinking about what the best strategy would be, sometimes as a game developer or a player, but sometimes even as a policymaker. What’s the best strategy to optimize for the good futures where your daughter lives for more than 18 years? And this is like the whole field of-
Shahar: That would be nice.
Michaël: … AI strategy, I believe. Do you have a partial definition of AI strategy instead of just trying to optimize for good futures?
Shahar: Ultimately, what you want out of it is to find pathways to good futures. But I think AI strategy also says something about how you go about it. It’s about figuring out who the actors are, who actually gets to make the decisions that bring about futures, and understanding the relative power between them, what motivates them, what incentives they have, what limits they have on their actions, and what information they have. You also want to understand crucial strategic parameters: what are the things driving the story? Advances in how much computing power is available for models are a very big part of what’s driving the story, and have been for a long time. If Moore’s law had not been a thing, if we had fifties-type computers today, our world would be very different in a very meaningful, deep way.
Shahar: How important data is, how important synthetic data is, how we train up people to operate in this space, what external regulatory parameters matter. Figuring out what matters more and what matters less, and which actors matter more and which matter less. This is not a normative judgment. It’s just trying to imagine, as you play through all of these futures, who has the biggest levers to pull, what those levers look like, under what conditions we could get to pull them, and whether they would work.
Michaël: It’s somewhat like a physicist trying to figure out the key parameters of a system: if I move this one, what happens? Can I hold this one fixed, move that one, and get the same result? You try to see what the key considerations or key dynamics are, and then maybe ask the UK government or the US government, “Hey, can you please reduce the compute of AI companies, because we think it is an important parameter for the future.”
Shahar: Yes. And of course that’s a great example, because it shows exactly where this analogy, and this way of thinking, fails. Physicists get to isolate things in a lab, explore them very carefully, and then put them in a box that still isolates the effect, so that they reliably get the output they want.
Michaël: Sometimes we get a coronavirus outbreak that comes from a lab as well.
Shahar: Most likely from an animal. But yeah, hypothetically it could also have come from a lab. But that’s a good example. When you move from inanimate things to animate things, the amount of chaos increases and the number of ways things can fail unpredictably increases. When you go from just biological things to agential, reasoning things, whether humans or advanced AI systems, your ability to do this isolation of causes and consequences becomes much more limited.
Playing War Games For Real-time Partial-information Adversarial Thinking
Michaël: And in your game, there are actual humans playing. So instead of a simulation, a game where you play against a computer, it’s a role-play game. I’d expect it to translate better to reality than a simulation, the same way that in robotics, things transfer better when you have an actual robot grabbing things than when you have something in simulation. I think that’s a good transition to focus on your game. Why do you think Intelligence Rising helps us think about AI strategy?
Shahar: There is something special about war games and what they are good for, which is specifically thinking about strategies in an adversarial context. When one group wants one thing and another group wants something different, they come up with plans that undermine the other team’s plans, and then the other team comes up with mitigations, and then the first team comes up with mitigations for those. And that adversarial tension between the two teams can rapidly escalate to very creative solutions.
Shahar: It’s the iteration on the iteration on the iteration on the original naive plan that is quite hard to do as a researcher just sitting in a room trying to think through all of these potential iterations. So it’s very generative in this way, even if you then need to take the outputs from this simulation and ask, “Okay, how much of this was realistic? What levers did they reach for in an attempt to deal with the strategic situation?”
Shahar: Now we have to go and do a bunch of research about whether that would be possible. Does it fit within the current incentive structures, legal frameworks, and so on? So from a research perspective, it’s a really interesting way of drawing on this adversarial creative process. It is also a way of drawing on the more human element.
Shahar: There is something that we don’t do very well, because we condense something like two decades into the space of three or four hours. So you don’t fully get the experience of what it would be like to be in that position: being a person, having partial information, and having all of the biases that a human would. Some war games get much closer to that, and I think we did it a little bit yesterday. We ended a bit early, so I told you, “Okay, stop thinking strategically. Instead, just go and embody a group of people having to make a very difficult decision. You have 10 minutes.”
Shahar: And that is a very useful research tool, because people are aware of the game theory, but they’re also trying to inhabit the emotional and personality aspects of these decisions that we know from history matter a lot. If you look at history, it’s not a chain of game-theoretically optimal decisions all the way. There are a whole bunch of humans just being humans. So it’s reasonable to expect that the future will also involve a bunch of humans being humans, even with large strategic decisions about where AI should go.
Towards World Leaders Playing The Game Because It’s Useful
Michaël: When we play the game, we don't actually try to reach a Nash equilibrium. We just go for how to win. And then we see the other guy winning and it's like, "Oh, I don't want this guy to win, so I'm just going to attack him, or spy on the US government or something." So I think it captures a little bit of human bias, but maybe not everything. What is the audience for this game? I would expect politicians or people studying international relations to be the key audience, to have them think about how to regulate AI. Or maybe you want people in the AI alignment space to think more about comprehensive AI services. What would the ideal audience be in the future?
Shahar: Those are great guesses. Our ideal audience is probably the actors depicted in the game: the key decision makers in China, the US, the leading tech labs, the leading AGI labs. Because then it switches from being a research tool to much more of a preparatory simulation tool. It's like, "Here are some really big decisions. If you get them wrong in real life, it'll go quite badly for us. Please play through those decisions in a safe environment until you learn to do the cooperate-cooperate."
Shahar: We are quite far away from that. Partly because to do this well, you need to have a lot of trust in the product, and you need a lot of verisimilitude in order to present things that matter to these decision makers. So we're building our way up. Most of our success so far has been with people at the undergrad and postgrad level. Either they have a technical background, and we want to let them explore more of the range, get them to think about safety and the importance of geopolitics, so that maybe they shift a little bit of attention into that space. Or they are in the governance space, and we want to sensitize them to the idea that there are issues at stake in AI governance that go far beyond bias and privacy. Not that these things don't matter, but maybe in five or 10 years' time, we may say, "Oh, to go back to the days when our biggest problems were bias and privacy."
Michaël: So the middle ground between bias and privacy on one side and general agents on the other, something that could be transformative and important. Autonomous weapons, for instance.
Shahar: For example, yeah. So we try to get them, in the space of four hours, to live through humanity mostly worrying about bias, privacy, maybe online persuasion, then having to think about autonomous weapons systems, large-scale unemployment, very powerful tech companies, and then also about radically transformative AI. If this goes badly, we don't have humanity anymore. But if this goes well, we are in an age of radical abundance. And then they go back and are like, "Yeah, previously 100% of my time was spent on the privacy and bias issues. Maybe I'll pay a little bit more attention to what's coming down the pipeline."
Michaël: I'm not sure governments are aware of all the radical transformations that could happen in the next 20 years. I know Putin said something like, "The country that has AI wins the world in the end." I know Obama in 2016 had some meetings about general intelligence, but I'm not sure; for them, it's probably something very far away. I'm not sure when they will actually be interested in this technology and be like, "Oh yeah, this could be transformative, we need to invest a little bit more in that tech."
Shahar: I wouldn't necessarily go for the actual president at this point, partly because they have to think about all of the things that matter, and it's down to luck whether they pay attention to the right thing. What you want to do is find the senior advisors whose job it is to keep track of this particular domain, so it would be the Office of Science and Technology Policy, or some other branch of government that is tasked with thinking about technology regulation, futures and foresight, and get those concepts in there in a way that makes sense. That connects them to the research, to the concerns, and to advisors who can help them navigate the space. And increasingly, it also lets them co-create governance solutions. ⬆
Open Game, Cybersecurity, Government Spending, Hard And Soft Power
Michaël: In your game, we're mostly trying to pretend we're Demis Hassabis in the room. What actions can you take? Is it something where you have a few actions that you pick from cards, or is it more of a roleplay game where you can say, "Oh, we can invent crazy actions"?
Shahar: We've deliberately gone for an open game. As I said, the initial version of this was just a bunch of friends in the living room coming up with stories, right? There were no constraints whatsoever on what you could do. We just wanted to make a story that responds to whatever people said before. It is almost like a group creative fiction session. And then it was like, "Okay, but we need some controls on this." So I introduced some dice: "Okay, you try to do this thing. Let's roll the dice and see how well it goes." So you have a range from "Yes, this goes very well," to "Yes, but there's a complication," to "No, this doesn't happen, but maybe you get something slightly different," to "No, and it goes much worse than you planned." This is just a very simple mechanic from game design.
Michaël: And this is something where a game designer like yourself can calibrate if people are doing things that are unrealistic, and you're like, "Okay, you want to do this, but please roll this 20-sided die." And you need to get at least 15 or something to succeed.
Shahar: First of all, I would say I'm not a game designer, but this is the role of a game master, the facilitator of the session. My job there is to make sure that people stay on futures that are interesting but still plausible, and I can use the dice in this way. In the game design, we have moved much more to a half-open, half-closed game. As you saw in the game yesterday, we have a technology tree. Your interaction with the technology tree is fully codified: you can't come up with crazy new ideas for what AI could do. What AI could do is predetermined by the game in terms of cards, and you just unlock those cards through predetermined actions, which are about allocating talent and rolling dice.
Shahar: We did that because we wanted to keep the AI technologies relatively fixed and explore the policy responses of the actors to these advancing AI technologies. We kept the other part, the policy part, completely free. But even though it's completely free, we still wanted something that brings people to a common, repeatable, legible story. So we have a bunch of parameters, or characteristics is the way to think about them: some attributes for the players and some attributes for the rest of the world. Those numbers go up and down, and the way they go up and down is through a conversation, a negotiation between me and the players: they say they want to do something crazy, and I say, "Oh, it sounds like you want these numbers to go up and those numbers to go down. Roll this number of dice to see if you succeed."
Michaël: So those parameters are cybersecurity, budget, hard power, soft power?
Shahar: Yeah. We have five, and you can see this is a very naive, big-picture view of what matters. You have budget, and budget is basically your free-flowing ability to invest more in other domains that matter, or to bribe everyone in order to control the future. Then we have AI talent, which is a proxy for the inputs that drive innovation in AI. We really care a lot about how fast things go, and we imagine those are the actors driving the progress. So we need some way of measuring how fast things go, who gets to go first, who gets to go second, and where they allocate their resources. This is what determines whether we get capabilities first or safety first.
Shahar: So it was really important to have that AI inputs category. The other three are powers, because we wanted to see how having some amount, or quite a lot, of power in the world would translate into actions that shape AI futures. We broke this down into hard power, which is your ability to shape how atoms are organized in the world. Historically, and especially in war games or in international relations, you would think about it as military power. But we also wanted to introduce the ability to do other very large things in the world that have to do with how atoms are arranged.
Shahar: Soft power is a giant category that captures basically anything to do with human beliefs, norms and institutions, all the way up to legislation, international agreements, treaties and tax regimes, or it could be things that are much smaller, like how well people think of you: do they like your brand? And the last one was cyber power, which is your ability to control digital bits: whether they're zero or one, whether you can read someone else's information when they don't want you to. Maybe you can sabotage their project. Maybe you can prevent others from doing the same to you. So we break the world into atoms, beliefs and bits, and try to translate every action into operating mainly in one of those domains.
Michaël: And to build AGI, you need to have very good ideas, right? You need very good innovation, a bunch of research. There are different win conditions. One is that you build safe AGI first, and the other one, which is interesting as well, is when one of your attributes is much higher than the other players'. So you become the leading cybersecurity country.⬆
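The four-tier resolution described above can be sketched in a few lines. This is a hypothetical encoding, not the game's actual rule: it borrows the 20-sided die and a target like 15 from Michaël's example, and the width of each outcome band (5 points either side of the target) is an assumption chosen for illustration.

```python
import random
from typing import Optional

# The four outcome tiers described above, ordered from worst to best.
OUTCOMES = (
    "No, and it goes much worse than planned",
    "No, but something slightly different happens",
    "Yes, but there is a complication",
    "Yes, and it goes very well",
)


def resolve(target: int, roll: Optional[int] = None,
            rng: Optional[random.Random] = None) -> str:
    """Map one d20 roll against a GM-chosen target onto the four tiers.

    The d20, the target, and the 5-point band widths are illustrative
    assumptions, not the game's published rules.
    """
    if roll is None:
        roll = (rng or random.Random()).randint(1, 20)  # inclusive 1..20
    if not 1 <= roll <= 20:
        raise ValueError("a d20 roll must be between 1 and 20")
    margin = roll - target
    if margin >= 5:
        return OUTCOMES[3]  # comfortably above the target
    if margin >= 0:
        return OUTCOMES[2]  # just made it
    if margin >= -5:
        return OUTCOMES[1]  # narrow miss
    return OUTCOMES[0]      # bad miss


# A seeded generator makes a session's rolls reproducible.
print(resolve(15, rng=random.Random(42)))
```

The game master still decides the target for each free-form action; the function only fixes how a roll maps onto the "yes, and / yes, but / no, but / no, and" ladder.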
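The five attributes can be sketched as a small record that the game master adjusts after each negotiated outcome. The field names mirror the categories described above; the `apply` method, the floor at zero, and the example numbers are assumptions for illustration, not the game's published rules.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class ActorStats:
    """One actor's attributes (hypothetical encoding of the five categories)."""
    budget: int       # spare money to reinvest in other lanes
    ai_talent: int    # proxy for the inputs that drive AI innovation
    hard_power: int   # atoms: military and other large physical capabilities
    soft_power: int   # beliefs: norms, institutions, treaties, brand
    cyber_power: int  # bits: reading, protecting, or sabotaging digital systems

    def apply(self, **deltas: int) -> "ActorStats":
        """Return new stats after a negotiated outcome; values floor at 0 (assumption)."""
        known = {f.name for f in fields(self)}
        unknown = set(deltas) - known
        if unknown:
            raise ValueError(f"unknown attributes: {sorted(unknown)}")
        updates = {name: max(0, getattr(self, name) + d) for name, d in deltas.items()}
        return replace(self, **updates)  # original record is left unchanged


# Example: hiring cybersecurity experts for 2 budget and +1 cyber power (made-up numbers).
us = ActorStats(budget=3, ai_talent=4, hard_power=5, soft_power=4, cyber_power=4)
print(us.apply(budget=-2, cyber_power=1))
```

Keeping the record immutable means each turn's state can be logged as-is, which fits the research use of the game.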
How Cybersecurity, Hard-power Or Soft-power Could Lead To A Strategic Advantage
Shahar: Yeah. We picked up this idea that's been floating around of a decisive strategic advantage, right? Getting to the point where you as an actor are so far ahead that you're not just a hegemon, you effectively get to call the shots for everyone. We made it deliberately hard to get there, but you could also leverage progress in AI, even before AGI, to try and boost those capabilities. So you can imagine a hard-power decisive advantage. You could find that you have managed to leverage AI either directly, by applying AI to military power, or indirectly: AI generated lots of money for you and you reinvested that money in military capabilities to get to the point where you are basically unopposed. Maybe you find a way to use AI to undermine other people's nuclear deterrence.
Shahar: At that point, you could effectively take over the world militarily if you wanted to. And probably you would leverage that into effective political control of the world, or enough of it that things go your way. You could imagine something similar in the soft power domain. If people just have so much more favorable views of you as an actor, as opposed to all of the other actors who are also vying for attention and allegiance, then you get mass defection. People basically say, "We want this actor to take political control over all these other actors because we believe in their way of doing things."
Shahar: And with extreme control in the cyber domain, you could imagine getting into everyone else's systems and shutting them down or stealing all their secrets. Getting so far ahead of everyone else in any of these domains probably requires radically transformative technology, compared to the world we have now. And this is an outcome that we want both to explore and to incentivize the players to explore as part of the game.
Michaël: The military power, the hard power, is the most convincing to me. I can see a world where you have a bunch of autonomous weapons and you can basically do whatever you want. The soft power, I don't fully buy. Even if the US has huge soft power... Okay, if basically every country accepts everything, then okay, you win. But is it just being the political leader that everyone respects?
Shahar: It's very hard to imagine exactly what the scenario would look like, but you can imagine you become an extremely trusted actor, and people resonate with the values it represents. It looks very competent. Everyone, from their own position, whether an organization or an individual, sees themselves as part of this empire or this world arrangement. Then you start suggesting major governance reforms within your own country, and those reforms get replicated in other countries because they just look like best practices. Then you start suggesting changes to the global system, maybe a new version of the UN. Everyone agrees that your judgment about the UN's failings is a pretty accurate description of where it is in fact failing, and your proposal for a system to replace it is just a really good one. And slowly, slowly you get to the point where the way political power is distributed and organized around the world shifts so that you are effectively in control. ⬆
Cybersecurity In A World Of Advanced AI Systems
Michaël: What about cybersecurity? If people are careful about their AI systems or their AI research, they could just keep all their code on some servers that are always offline. In my opinion, even if you have very good cyber, it is not a given that you can just hack into every company and get their code, right? Because there's the stuff people keep secret and the stuff they put in public. Can a very powerful actor just hack into DeepMind and steal their latest research? How feasible is it?
Shahar: Again, with all of these things, it's about how far ahead you are compared to your adversaries. We already know that making computer systems secure is very hard. You need to get the system itself secure, with no bugs, because bugs can represent vulnerabilities through which it can be hacked. Some of them may allow remote hacking, but some of them require direct access to the systems. And this gets mixed up with the fact that, ultimately, these computer systems operate on physical substrates. There is radiation that you might be able to pick up, and other signals like temperature, and so on. This whole exploration of side channels is a fascinating domain.
Shahar: In particular, if you're building very large, complex, opaque systems, from a systems-engineering or systems-security perspective you're significantly increasing the ways things can go wrong, because you haven't engineered every little part of the thing to be 100% safe, and provably and verifiably secure. And even provably and verifiably secure stuff can fail because you've made some bad assumptions about the hardware.
Shahar: So I can imagine a world where your ability to model the other actor's system, maybe all the way down to the hardware and the weird effects that happen at the molecular and atomic level of the chips, coupled with your much better understanding of how high-level intelligent capabilities cash out in terms of floating-point operations, reveals gaps that you can exploit as an adversary.
Shahar: But yeah, it requires you to be much farther ahead than the runner-up. For an average person on the street, if the NSA really wants to get into their phone, they probably can. And the gap I'm imagining between the average person on the street and the NSA is much smaller than the gap I'm imagining between the NSA and an actor that has access to very advanced AI capabilities, one that is able to really dig down into all of the vulnerabilities of those computer systems.
Michaël: This makes me think about the book Superintelligence, in which Bostrom imagines having a seed AI, something very smart in a box, and then you try to think about different limits on input and output that you should give the system so that you're sure it stays in the box. And if the system understands physics better than you, then it can exploit the electromagnetic signals around it, or heat. I could imagine that in that world, if you're very advanced in terms of AI or other tech, you could go through different hacking channels than just pure remote access. Makes a lot of sense.⬆
Allocating AI Talent For Positive R&D ROI
Michaël: How do you get to those advanced systems? Do you need to take your budget and invest in the tech tree? Do you need to go to your AI researchers and say, "Hey, please do more research, produce one paper now"?
Shahar: One thing that all actors do all the time is allocate their AI inputs into various projects. We call it AI talent, but it represents a more complex thing than that. Those projects could either be basic research and development (do a thing, publish a paper, advance the cutting edge of what we can do with AI) or they could be products and capabilities. Products are ways of converting these advances into money, and capabilities are ways of converting these advances into one of the powers. So you can imagine that if I have a very powerful language model, my ability to spin up narratives that make me look like a more aligned, competent actor and make other actors look bad increases.
Shahar: So there is a card called "automated impersonation propaganda" that increases your soft power by one. And as the technology advances, you get cards that are more powerful and operate in different domains. Separately from that, you can also take free-form actions: "I invest my money and hire a bunch of top-notch cybersecurity experts," or "I buy this small company that has this expertise." The kind of things that you see in the world. But the game is very much designed such that if you really want to get the advantage, you need to lean into the advanced AI technologies.
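The allocation step can be sketched as a small function that splits an actor's talent tokens across the three project types named above. This is a hypothetical model, not the game's rules: the `Allocation` type, the function name, and the one-to-one conversion of tokens into research progress, budget, or power are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    research: int      # advance the tech tree or publish
    products: int      # convert advances into money (budget)
    capabilities: int  # convert advances into one of the powers


def allocate(talent: int, plan: Allocation) -> dict[str, int]:
    """Check a plan against available talent and return per-lane gains.

    Hypothetical rule: each token yields exactly one unit in its lane,
    and unallocated tokens sit idle for the turn.
    """
    parts = (plan.research, plan.products, plan.capabilities)
    if min(parts) < 0:
        raise ValueError("allocations must be non-negative")
    total = sum(parts)
    if total > talent:
        raise ValueError(f"allocated {total} tokens but only {talent} available")
    return {
        "research_progress": plan.research,
        "budget": plan.products,
        "power": plan.capabilities,
        "idle": talent - total,
    }


print(allocate(12, Allocation(research=6, products=3, capabilities=2)))
```

In the real game the conversion rates depend on which cards are unlocked; a fuller sketch would look them up per card rather than assume one-to-one.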
Michaël: And to get this AI technology, you put those researchers into a lab and you say like, “Now you need to do some research.” And they can either do public research that attracts more people, and then you can just have a compounding interest in how many researchers you get. The more researchers, the more public research, and then the more visibility you have, or you can just go full on trying to get AGI in stealth mode.
Shahar: That's right. We very deliberately built a mechanic where there are positive returns to investment in AI research and development. So basically, the more you invest into AI, the more your AI inputs increase, either because you are seen as a leading lab and everyone wants to come and work with you, or because you have found ways to reapply that AI to the productivity of your researchers. You've managed to automate some parts of the research process, you're able to design more efficient compute and so on. And this goes back to the idea of Intelligence Rising, right? As you increase your intelligence, your ability to further increase your intelligence increases. It's interesting, because there's a counter-story you can tell about diminishing returns. This game very much explores the future in which things do speed up and accelerate.
Michaël: The only thing I find unrealistic is that you can just allocate researchers to do research, so you could be broke, with no budget at all, and still produce a lot of research. I would find a scenario where you need both money and talent to do research more realistic, because today you need millions of dollars to train GPT-3.
Shahar: This is getting into the nitty-gritty of the game a little bit, but I guess the best way to think about the budget power in the game's statistics is: how much extra do I have? It is not that holding your powers and maintaining your position of power is free; it's that this is part of your annual budget. What we keep in the budget column is how much extra profit, extra revenue, extra money sloshing around the bank I have, because I developed this amazing product or because I found a way to increase my market share, and how I can then reinvest this extra into one of my other lanes. So yes, it very much assumes that if you have 12 AI talent tokens, somehow in the background you found a way to pay for them. And the budget column is how much you have on top of that to further invest in this, or to invest in some other power.⬆
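The positive-returns mechanic can be illustrated with a toy simulation. Under the assumption (ours, not the game's published rule) that each unit of talent put into research adds a fixed fraction `feedback` of a unit of new talent next turn, talent grows geometrically at a rate of `feedback * research_share` per turn. Setting `research_share` to zero gives the no-feedback baseline. The diminishing-returns counter-story would need a feedback term that shrinks as talent grows, which this sketch does not model.

```python
def simulate_talent(start: float, research_share: float,
                    feedback: float, turns: int) -> list[float]:
    """Track AI talent over turns when research feeds back into talent.

    Hypothetical rule: talent_next = talent + feedback * research_share * talent,
    i.e. compound growth at rate feedback * research_share per turn.
    """
    if not 0.0 <= research_share <= 1.0:
        raise ValueError("research_share must be between 0 and 1")
    history = [start]
    talent = start
    for _ in range(turns):
        talent += feedback * research_share * talent  # returns scale with current talent
        history.append(talent)
    return history


print(simulate_talent(10, research_share=0.5, feedback=0.2, turns=3))
print(simulate_talent(10, research_share=0.0, feedback=0.2, turns=3))
```

With these made-up numbers, the first run grows roughly 10, 11, 12.1, 13.31, while the second stays flat at 10, which is the "going up the exponential" versus standing still contrast in miniature.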
Players Learn To Cooperate And Defect
Michaël: So it's like the profit Google makes that they can invest to build a better brand or attract more talent with a nice pitch or a nice video. What is the interaction like between companies like Google and the other actors we mentioned, China and the US? In your game, can the US come to Google and say, "Please, can you give us all your talent?" or "Please, can you give us all the research you have?"
Shahar: Not only can they, many players in the game have tried to do that. We very much encourage teams to talk to each other, explore deals, share information, make threats. Initially, that was all it was. It's part of the way the game is played that you have breaks between the turns, and in those breaks you discuss within your team what you want to do, but you can also discuss with other teams and coordinate plans. And so you might get some joint actions.
Shahar: Over time, we saw relationships between actors that are not just about sharing information or enacting plans, but that really start changing the world. An example would be a state nationalizing a company, or a state banning a company from operating in a particular way, or a state creating a public-private partnership with a company, or two companies deciding to create a joint venture where the joint venture has to do safety research.
Shahar: So as game master, and as someone who developed the mechanics behind the game, I had to come up with rules for how to change the statistics in the world based on this. We actually have some house rules now for what happens in a nationalization, and it's very different if it's forced nationalization or cooperative nationalization. What happens if you create a public-private partnership? For all of this, we have rules. I don't tell the players those rules because I don't want to steer them towards nationalization. But if they come up with it as a free-form action, I know how to resolve it mechanically.
Michaël: Because at the end of the day, you just want to generate good statistics for your research. You just don’t want people to play the game randomly.
Shahar: I want people to think hard about what they want to happen, but I also want them to come up with strategies that I haven’t thought of in advance. The more information we give them about the action space, the less creative they get to be.⬆
Can You Actually Tax Tech Companies?
Michaël: I think the action people think about first is taxing tech companies. Yesterday, we had someone who thought about what a Republican senator or president would do, and he was like, "Oh, I'm the president. I'm going to tax Alphabet." Do you think this is a reasonable scenario? How much does Amazon pay in taxes to the US government? Is it even legally possible to increase that by an insane amount when we get to transformative AI?
Shahar: Taxation of multinational companies, and in particular tech multinationals, is a very complex issue on which I am not an expert. A lot of it comes down to questions of jurisdiction and closing loopholes. Do you tax at the place where someone uses the technology and pays for it, or at the place where the technology was developed? Can you move the IP for the development to a tax-efficient jurisdiction? It's actually a really big international story. There has been an attempt in recent years, somewhat successful, to create a framework for global taxation that sets a global minimum for taxation, I think 15% for corporate taxes. I think it was spearheaded by the OECD, but there were many other participants. And of course there are the smaller countries for whom being a tax haven is a big part of the economy, so they need to be brought into this with some creative ideas. It's a fascinating space. But whether you could, in the space of two years, realize that a particular actor is becoming very powerful and then slap a tax on them seems much less realistic.⬆
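As a rough illustration of how a minimum rate works, here is a simplified sketch. It assumes a single jurisdiction and a flat 15% floor, and it ignores the carve-outs, exclusions and jurisdictional blending of the actual OECD framework; the function name and parameters are hypothetical. The top-up is simply the gap between what the minimum rate would collect on profits and the tax actually paid.

```python
def top_up_tax(profit: float, tax_paid: float, minimum_rate: float = 0.15) -> float:
    """Extra tax needed to bring an effective rate up to a global minimum.

    Simplified, hypothetical model: no carve-outs, no cross-jurisdiction
    blending, and no top-up on zero or negative profit.
    """
    if profit <= 0:
        return 0.0
    return max(0.0, minimum_rate * profit - tax_paid)


# A company booking 100 of profit and paying 5 (a 5% effective rate)
# would owe about 10 more; one already paying 20 owes nothing extra.
print(top_up_tax(100.0, 5.0))
print(top_up_tax(100.0, 20.0))
```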
The Emergence Of Bilateral Agreements And Technology Bans
Michaël: Are there elections or treaties between countries in the game, or other political events that happen over the years as well?
Shahar: This didn't happen in your game, but in at least some of our games, especially if something really bad happens halfway through the game, the states suddenly become aware of it and want to do something about it. They might start negotiating a treaty to control the technology or keep the worst versions of it from reaching the market. We've seen games where the China team and the US team explored a ban on autonomous weapon systems. We've seen games where these two actors explored bans on the highest level of technology, the transformative ones, until it is publicly known that enough safety research has been done. You can argue about how realistic that is, but it's really interesting for us to see players explore those spaces, and then see whether, having negotiated these treaties, they defect from them. ⬆
AI Labs Might Not Be Showing All Of Their Cards
Michaël: Another thing I find interesting is that you can spy on companies and countries with cyberattacks or intelligence agencies and gather more information. You can also go full stealth mode and try to build AGI while other people think you're not ahead of the curve. And I think that's one thing I find unrealistic, because today a lot of the AI research is open source or on arXiv, right?
Michaël: Maybe the most state-of-the-art AI research is now starting to go private. But I would expect it to be hard to go fully private for three years, because for those kinds of advances, you need to build on top of other public research. So I guess there's a limit on how fast you can go on your own with 100 or 300 researchers.
Shahar: I am not sure. For one thing, I don't know how much of the most cutting-edge research today is public. I would not be confident that it is. It is very easy to look at all of the stuff that is public, see a lot of it, and infer from the fact that you're seeing a lot of public research that all research must therefore be public. I don't think that is a correct inference to make.
Michaël: We have some evidence that at least some research at Google Brain or DeepMind was not made public right away in the past two years, at least for the Chinchilla paper. I heard from different sources that they didn't find the Chinchilla result and publish it the next month; they probably waited between six months and a year. At least that's what I got from my conversations with Ethan Caballero and other people on the podcast. That's one paper, but there may be other papers at Google Brain where they replicated the capabilities of GPT-3, and maybe pushed them forward, earlier than we know. The biggest example so far has been PaLM, and PaLM came roughly two years after GPT-3. So I guess they're not showing all their cards. Yeah, that's my guess.
Shahar: Yeah, that would also be my guess: they're not always showing all of their cards. It's not always a calculated decision, but there is a calculated decision to be made: if I have a result, do I publish or not? What goes into the calculation is, first, the benefits of publishing: it increases your brand, it attracts more talent, it shows that you are at the cutting edge, and it allows others to build on your result, so you get to benefit from building on top of their results. Then you have the cost: as long as you keep a result to yourself, no one else knows it and you can keep doing the research, and publishing gives that up. Or you might think that there are harms, so you hold the thing back until you find mitigations for the harms before you publish. That's a very common practice in cybersecurity research, for example.⬆
Why Publish AI Research
Shahar: And of course, these are not fully rational agents. It could just be a researcher who is really proud of their results and wants them to be public, and who might even go against what the institution is telling them to do, because they want individual credit or they just really believe in democratizing research. But all of that goes into the machinery of whether a particular paper gets published right now or not. You could have a hundred papers, but not all papers are created equal. Some of them could be way more meaningful than others. Say you're sitting on top of a really meaningful paper, like maybe the SayCan paper, that just opened up a whole new domain. I think that paper was published pretty much immediately after it was discovered.
Shahar: But you could imagine a world in which it wasn't published, and maybe it takes a while for others to develop it, and in the meantime you're building on top of it. Especially, I think, as you get to technologies that have very clear, very large impacts on how economic power or other kinds of power is distributed in the world, the incentive to just sit on it and see how you can maximize its value increases. So we might be moving toward a bit more secrecy.
Michaël: And at some point you might also want to use those powerful technologies to generate more revenue. If you have access to something like DALL-E 2 or GPT-4 (we don't really know if GPT-4 is coming this year, or even coming at all), imagine you have a very good API and people use it a lot; then you can use the money. Otherwise you just sit on something and use it for more research, but you don't get more money from it. I guess one other way startups operate is by showing off very cool technology and then telling people, "Hey, we are very close to getting AGI, you could invest in us and get some capped profits when we build AGI." I feel like those are two different strategies.
Shahar: Certainly, if you are a small actor that has made a breakthrough, there is a lot of incentive for you to show what you have, at least to some very powerful actors, whether they are investors or other companies, so that you can access their resources. It's not clear that you'd make the thing public necessarily. If you're a large actor that is in a position to exploit the technology through an API or by improving internal processes, then the incentive for you to make things public decreases significantly.⬆
Should You Expect Actors To Build Safety Features Before Crunch Time
Michaël: Private research on Coherent Extrapolated Volition, anomaly detection, or other more long-term AI alignment research is not incentivized by economic products; it is incentivized by wanting systems to do what we want in the long term. So I think in the game you can build either of them, and it's just for winning at the end. You're planning how to build safe AI at the end, so you need to do it instrumentally so that you're not dead. Do you believe companies are strategically developing safety measures because they think this way, or because they want to look good? Is there something in the dynamics of the game that makes it harder to build safe technology?
Shahar: We certainly make it the case that safety-oriented technologies give you less benefit. They don't bring you closer to winning through a decisive strategic advantage, and they unlock fewer products and capabilities. So in that respect, they're less appealing from an "I want to win the game" perspective. But we make it pretty clear, maybe more clear than the world is aware, that if you get radically transformative technologies and they are not safe, things will go pretty badly for everyone. You can't win with an unsafe, misaligned, radically transformative technology. Everyone just loses. So there's something quite interesting that happened in the game yesterday: Alphabet was doing all of this secret research, trying to get to AGI or Comprehensive AI Services.
Michaël: I don’t know who was playing Alphabet, but they did a great job.
Shahar: They did an amazing job. By the way, that game ended with the world going to ruin. But they did secret safety research. And if I try to put myself in the shoes of those players, why would they do that? I think one of the reasons is that there are certain types of safety or alignment research that you can only do when you have access to very advanced models with advanced capabilities. And so by revealing that you have solved this problem, you are also revealing that you have these very advanced capabilities in your back pocket, and that might not be something you want to do. ⬆
Why Tech Companies And Governments Will Be The Decisive Players
Michaël: What I find interesting about the dynamics between tech companies and governments is that governments are further behind in terms of tech. They start with a different level of AI talent, which is kind of where we are now. So Alphabet or, I think the other company is Tencent, are the ones pushing the field forward and doing dangerous things. And then the governments are like, “Oh no, we need to regulate this.”
Shahar: That is very much by design. One fundamental underlying mechanic of the game is positive returns on investment in AI: you start with some amount of talent, and the more you leverage that talent to grow your research, the more talent you get, and then you kind of ride the exponential. Another thing that was built in from the very basics of the game is that benefits accrue where you are, while harms accrue far away from you and slowly come back to you.
Shahar: We also have this mechanic of concerns, or negative impacts of technologies along the way, and they always start impacting far away, on what we call Global Stability, meaning impacts outside of the countries that host those tech companies. After the harms overflow in the rest of the world, they start impacting citizens in those countries, but they don’t affect the companies. So it is very much part of the dynamics that it is up to the states to pick up the tab and respond to harms created by technology that was created by the companies. And it’s only at the very end that the harms come all the way back to the people who are developing the technology. ⬆
Regulations Need To Happen Before The Explosion, Not After
Michaël: Do you think it’s realistic to have governments regulating AI, since regulation happens on a timescale of months or years, whereas for AI technology we were talking about weeks or days or months? In the case where it becomes transformative, we’re talking about a new breakthrough every few weeks that impacts our economy. So at the point where we reach transformative AI, and your daughter is aging, sorry, I keep bringing it up, I find it interesting, do you really think that the US government will be able to do anything about it, apart from seizing all the assets?
Shahar: Here’s an analogy: if you want to regulate an explosion, you don’t regulate it as it’s happening, you regulate it before it happens. Similarly here, if you get to the point where the technology is radically transforming your world on a month-by-month or week-by-week basis, it’s too late to do this regulation, unless the regulators are also sitting on top of very powerful AI that is helping them keep track of what’s happening. We would need different regulatory processes.
Shahar: But if you know that you’re going into a world where such an exponential explosion is coming, then all of the regulatory work needs to happen beforehand, to either delay it so that you have more time to figure out what to do about it, or steer it so that it’s more controlled, or otherwise shape how it’s going to happen. In the game, a lot of the policy space to act comes before radically disruptive technology happens; once radically transformative technology happens, the game is over. And it’s not over in the sense that we now live in utopia and there are no decisions to be made. It’s that decision making will probably no longer be presidents and CEOs thinking carefully through things and deciding what to do. We will switch to a new domain.⬆
Early Regulation Could Become Locked In
Michaël: Ideally you would want to regulate in advance, but tech is famous for being able to bypass regulation. So what are the strongest arguments for AI governance being able to accomplish anything at all?
Shahar: First of all, on “When do you regulate?” There’s a concept that I think lots of people should know about, called the Collingridge dilemma. When you want to regulate a technology, or steer a technology towards a good outcome, or any big change that is predicted in the future, if you try to do it too far in advance, you don’t have the details of what the change is going to be, and so you don’t have a good solution. If you do it too late, then the thing is pretty much locked in and you don’t have much ability to change it.
Shahar: So finding the sweet spot in the middle, where you know enough to regulate but it’s not too late to change how things are going to go, is the game of AI regulation and AI governance. You can make the game easier by putting in regulation early that can scale up or get adapted as you go along. You could have lots of people who are broadly in agreement that we need something, and put them in places of power, so that when it comes time to regulate, you have lots of allies in lots of places. You could generally teach people the fundamentals of why cooperation is good and why everyone dying is bad.
Shahar: So there are all of these things that can help you play this game, but it’s still a very hard game. Partly it’s about looking at historical examples. Our world is a very regulated world. We tend to see the failures, but we forget that none of the digital technology around us would exist without standards and interoperability. We wouldn’t be able to move around if transport was not regulated and controlled and mandated in some way. If you don’t have rules, standards, norms, treaties, laws, you just get chaos.
Shahar: And you could have a very anarchic vision where everyone does their own thing, some things just end up working, and you have only very weak norms about what we do, but that is still a form of governance. You still adopted some way of doing things, because it turned out to work well or because whoever has the most power decided that this is how things are going to go. Everything else is just iterations on: who are the powerful actors who get to shape the way things go? What is the design space for mechanisms to shape how things go? What is the coordination? So even if it’s just down to tech companies figuring out which products to have and how to put them out in the world, it is still a kind of governance. It’s just self-governance by companies.⬆
What Incentives Do Companies Have To Regulate?
Michaël: So there is a way of doing short-term or medium-term regulation that would have a very positive impact in the scenario where most of the impact comes from medium-term technology like autonomous weapons, and we still need to regulate those. And it kind of helps shape the future, because the companies are going to make the decisions in the end, but they will have some constraints from the short-term or medium-term decisions. Is that basically what you said?
Shahar: I guess the big point I was trying to make is that you always get governance, you just get different forms of governance. Even just letting companies do whatever they want is a form of governance, it’s just not a very ideal one. But I think there are also reasons for companies to realize that this is not an ideal form of governance. You see a lot of this in the social media space, where social media companies are calling for governments to come in and regulate. You can argue about whether it’s honest or not, but part of it is that if they make a decision about what content should or shouldn’t be on the platform, then they are on the hook if they get it wrong. If they are implementing something that was decided for them by a government, then they can point to the government and say, “Oh, we have to do it this way because it’s illegal not to.”
Shahar: And I think similarly for advanced AI systems, there are questions around values that companies might just not want to be on the hook for deciding themselves. There are questions about mitigating harms and risks that companies might not want to be on the hook for. And companies, particularly companies that are aware of what these very capable systems can do, might be very worried about what their competitors might be up to. So it could be in their interest to help develop standards and regulatory measures to make sure that… For example, Anthropic really cares a lot about things going well, and they might worry that Facebook AI Research might not care as much about things going well, not because they’re bad people, but because they don’t take seriously the kinds of risks that Anthropic takes very seriously. So Anthropic would have an interest in creating a regulatory environment that mandates everyone to take the risks seriously, so that it also captures Facebook.
Michaël: And sometimes it’s not only about the people there, but also about the structure, the design of the company, the short-term policy. Maybe at an even higher level of abstraction it would be like, “Oh yeah, maybe the problem is capitalism, having a for-profit startup that needs to pay dividends to investors.” And speaking of new forms of governance, this is maybe off topic for this podcast, but there’s something like DAOs in crypto, where we have different structures for who decides on a new law or a new process on the chain. ⬆
Why Shahar Is Terrified Of AI DAOs
Michaël: Is this something you think about, the different governance structures that could be most helpful for building safety? And I know OpenAI has put something in its charter saying that if a value-aligned, safety-conscious project got close to building AGI before them, they would stop competing and help that project make safe AGI instead, right?
Shahar: The assist clause, yeah. But a quick point: I’m terrified of AI DAOs. I think there’s a lot of really interesting governance innovation in the crypto and decentralized governance space. I just think that so much of our know-how about how to do good governance comes from people building interpersonal relations and trust, and scaling that in various ways to make good outcomes in the world, that going to fully anonymous, formalized on-chain voting will need to catch up a lot to all of the soft, interpersonal, not fully explicit measures that we have for governance before it can overtake them in terms of good things for the world.
Michaël: Maybe the problem is the thing we have: the brain. The problem is the meat part. Maybe we could have something that is able to understand human values or human preferences and make decisions for us, and maybe governance could rely less on emotional things, like me talking to you and smiling at you and convincing you, and more on actual Nash equilibria from DAOs or something, with people putting in their money. Don’t you think that in a world where everything goes crazy and where it’s impossible to actually communicate outside of Twitter, it would be better to have actual rules for voting that are implemented in code?
Shahar: I agree that if you took away any ability to interact in person and just gave everyone Twitter, then building a bunch of game-theoretically ideal machinery on top of it would be strictly better. But we don’t live in a world where we only have Twitter. We live in a world where we can all fly to a conference, share a meal together and talk about how to make the future go well, and people can have a conversation like…
Michaël: How many meals did Putin have with other leaders before he invaded Ukraine?
Shahar: Maybe not enough. I mean, that is not to say that sharing a meal together is enough for getting a good future; there are also the underlying mechanics, perceptions of fear and so on. But I also don’t know that we would’ve had a very good on-chain solution that would have prevented the Russian invasion. The world held a large share of Russia’s foreign reserves in foreign banks, and those were frozen the moment the invasion happened. That was not enough to deter it.
Michaël: Do you have any thoughts on the OpenAI Charter, that assist clause? I talked about this on another podcast that is not online yet, with Robert Miles I believe, where there’s nothing specific about this clause and nothing that defines exactly what we mean by a company being close to AGI. So there’s nothing that would actually trigger this clause?
Shahar: Yeah. And maybe this links a little bit to thinking about DAOs: the vast majority of legal constructs are not fully specified. There are little bits in the details that you leave to human judgment, and that is arguably a problem because it leaves you open to exploits. But it’s arguably also an asset, because it means you don’t have to know in advance what things will look like; you just trust that there will be smart, well-meaning people at the time of making the decision who can cash out what particular terms mean. Arguably, a lot of what the legal system does is arbitrate what is a reasonable interpretation of a bit of text in a particular context. ⬆
Toward Trustworthy AI Development
Concrete Mechanisms To Tell Apart Who We Should Trust With Building Advanced AI Systems
Michaël: That’s where AI could be useful: in interpreting the words of humans and trying to understand what we actually meant. That’s why I think something like Coherent Extrapolated Volition, or something that understands human values from our words, could help us with regulation. You wrote, actually, I think one or two papers about regulation in a broad sense, mostly about how to build trustworthy AI systems. Do you want to explain a little bit what you mean by trustworthy in that regard?
Shahar: Yeah. So first of all, I didn’t write it, I was a co-author on the report. I’m a huge fan of these multi-author reports, mostly because I think these problems require lots of different perspectives and domains of expertise, people who are in different organizations and have lived different walks of life, to come together and hash out a product that is meaningful, arguable and truth-seeking, but also impactful, which is something I pay attention to. So for this particular one I teamed up, yet again, with Miles Brundage, we’ve teamed up before, but also with three other co-authors to guide the process of editing this report, called “Toward Trustworthy AI Development”.
Shahar: It came out of a workshop similar to previous exercises like this, where we invited a bunch of people to San Francisco and asked them, “What concrete mechanisms could we put in place to help us gain more trust in AI developers?” And we didn’t just want to trust them; we wanted trust where trust is merited, where organizations are trustworthy. So basically, how do we get concrete mechanisms that allow us to tell apart the people we should trust with building very advanced AI systems, because we know they’re going to do things well (and what does it mean for them to do things well), from those we really shouldn’t trust, where maybe regulatory intervention is required to ban them or to increase scrutiny over them so that they can become trustworthy organizations.
Michaël: What do we mean by trustworthy here?
Shahar: Trustworthiness is really interesting. People often differentiate trust in artifacts and systems from trust in people, processes and organizations. When you think about systems, you really care about competence and reliability. Will this thing do what it is meant to do? If it fails, how badly will it fail? Will it fail in contexts that I care about? And you can either trust it or not: you trust that your car will work well and not explode. When you come to people, you still have the competence and reliability aspect, so you want people who are reliably going to fix your broken car, but you also care about their motives, what’s driving them, and whether they in fact want what is good for you. Because if that is the case, then you can delegate more decisions to them.
Shahar: In our report, we actually only focused on the capability side. For the people who make AI systems, how can we create externally available information showing that they are doing the right things when it comes to developing AI systems? That they are following best practices, that they are thinking ahead of time about harms, that they are doing everything that is reasonably possible to mitigate those harms? And how do we structure an ecosystem of information so that external actors, such as users, regulators and competitors, can use this information to decide whether or not to trust a particular developer?⬆
Increasing Privacy To Build Trust
Michaël: What’s the purpose of trusting a particular system? Is it mostly to have the system be more interpretable? To be sure that the system doesn’t use personal data? Is it because we care about the system not misbehaving in some way? What is the actual thing we need from the system in order to trust it?
Shahar: Great. Maybe I’ll go a little bit into the context. We now live in a world where many actors have put out lists of AI principles, so we have a pretty good idea of what is wanted. Those lists of AI principles are basically a recipe or an outline of “What do we think good AI looks like?” They include things like prevention of misuse and prevention of accidents, so safe and secure throughout the lifetime; privacy protecting; lack of bias, particularly bias that interacts with social biases in really bad ways; accountable, so that someone is responsible if things go wrong; transparent, so that you understand how the system makes its decisions. Some of these things are good in themselves, and some are instrumentally valuable for getting to things that are good. But we have this list of things that we want out of AI systems, and then you ask, “Yeah, but how do we actually get to the point where the system has those features, or where we can trust that the developer is doing what they need to do to get those features?”
Shahar: So a very concrete example would be that we are getting pretty good at mechanisms for privacy-preserving machine learning. We have techniques that use various kinds of encryption and algorithms to minimize the amount of information required to train a system. Or that keep the information at its source and only take a few relevant items. Or that add some noise to the data so that you can’t infer things about the individual participants. Or that encrypt the computation in some way so that you can still use the service without necessarily sharing all of the private stuff.
Shahar: And so once you have those standards, you can ask, “Is this developer who’s operating in a privacy-sensitive domain using the standard or not?” You can start by just seeing whether they say anything publicly about using that standard. Or you could have an auditor come in and check whether, in the development process, they’re using best practices for privacy-preserving machine learning.⬆
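The noise-adding technique mentioned above is usually formalized as differential privacy. As a rough illustration only (not code from the report or from any particular library), here is a minimal Python sketch of the Laplace mechanism for a counting query, assuming each person contributes at most one record, so adding or removing one person changes the count by at most 1. The function names and the toy records are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential variables with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    Assumes one record per person, so the count's sensitivity is 1 and the
    noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical medical records: (age, has_condition)
records = [(34, True), (51, False), (29, True), (62, True), (45, False)]
rng = random.Random(0)
noisy = private_count(records, lambda r: r[1], epsilon=0.5, rng=rng)
print(f"noisy count of patients with the condition: {noisy:.2f}")
```

Smaller values of `epsilon` add more noise and give stronger privacy guarantees, at the cost of accuracy in the released number.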
Sensibilizing To Privacy Through Federated Learning
Michaël: What kind of best practices or standards are we talking about here?
Shahar: Are you using federated learning? Are you using differential privacy in this context? You need a little bit of expertise to know what a good implementation of these tools looks like in different domains; there’s no one size fits all. But these are things that we can get to doing, partly because, as part of the report, there are public demonstrations of what end-to-end projects that use federated learning or differential privacy look like.
Michaël: What is federated learning for people who are not familiar?
Shahar: Federated learning is where you keep a lot of the information on the original device. So, say you want to train an algorithm on data that is collected from people’s personal mobile devices.
Shahar: One way to do it is you get all of the information from all of the mobile devices into your own datacenter and then train a system on that, and that clearly is a privacy risk, especially if the data is medical data or another kind of sensitive data. Alternatively, you can run an algorithm that looks at the data on the person’s device and only updates a few weights of the model, and those updates get sent back to a central location. So your model still learns, but you never get central access to all of the data that was used to train it.
Michaël: This makes a lot of sense to me because I’m somewhat familiar with your research, but I would expect some AI alignment researchers hearing this podcast to think, “Wait, we started by talking about AGI and now we’re talking about something close to GDPR.” How do you connect this data regulation to the long-term future of building safe transformative AI?⬆
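A toy version of this idea, loosely following the federated averaging pattern, is sketched below in Python. It is a simplified illustration under stated assumptions: a one-feature linear model, plain gradient descent on each device, and hypothetical data. Real deployments typically add further protections, such as secure aggregation, that are not shown here. Each device trains on its own data locally, and only model parameters are sent to the server to be averaged.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs that never leave the device


def local_update(w: float, b: float, data: Dataset,
                 lr: float = 0.05, steps: int = 20) -> Tuple[float, float]:
    """Run a few steps of gradient descent on one device's private data.

    Only the resulting parameters are returned to the server, never `data`.
    """
    n = len(data)
    for _ in range(steps):
        # Gradients of the mean squared error of y_hat = w * x + b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_averaging(devices: List[Dataset], rounds: int = 50) -> Tuple[float, float]:
    """Server loop: send the model out, collect local updates, average them."""
    w, b = 0.0, 0.0
    total = sum(len(d) for d in devices)
    for _ in range(rounds):
        updates = [(local_update(w, b, d), len(d)) for d in devices]
        # Weight each device's parameters by how much data it trained on.
        w = sum(uw * n for (uw, _), n in updates) / total
        b = sum(ub * n for (_, ub), n in updates) / total
    return w, b


# Hypothetical data drawn from y = 2x + 1, split across three devices.
devices = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (0.5, 2.0)],
    [(1.5, 4.0), (-1.0, -1.0), (0.2, 1.4)],
]
w, b = federated_averaging(devices)
print(f"w = {w:.3f}, b = {b:.3f}")
```

Because every toy data point lies on y = 2x + 1, the averaged model converges to w ≈ 2 and b ≈ 1, even though the server never sees a single (x, y) pair.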
How To Motivate AI Regulations
Shahar: Perfect. So you start by saying some AI systems could be risky, right? That is already a controversial claim, because most software, at least since the 90s, is not regulated, right? The potential harms that come from general software are mostly not considered the purview of government. Maybe there are some lawsuits on the margin, but largely you buy software from Microsoft or from Google and if things go badly for you, then things go badly for you. You shouldn’t have bought it in the first place; it’s the responsibility of the user. We want to get to the point where there’s agreement that when software or AI systems become sufficiently advanced, the harms are big enough that governments should start caring about it, and privacy is a way to do that today. Bias is a way to do that today.
Shahar: And then you start carving out certain domains where AI could be harmful and where governments should pay attention, and then you ask, “Okay, so how should governments pay attention?” So you build machinery. That machinery could look like mandatory audits. It could look like standards and best practices. It could look like red teaming. All of these tools basically create an interface whereby a government, a regulator, a third-party actor or an industry body demands information from an AI developer to make sure that they’re dealing with risks.
Shahar: We start with known risks, where we have a lot of evidence, because that’s what we can act on today, and then you ask, “Okay, can I create concrete measures of this more advanced, scary stuff? If I’m worried about misaligned, advanced-planning, strategically aware systems, can I create an evaluation suite? Can I create metrics that will tell me that a system has these properties or is at risk of having them? If I can do that, then, using all of this existing machinery in the world, I can get to a point where I require developers of new systems to pass those systems through this evaluation.” And if they don’t pass the evaluation, I tell them that it’s illegal to continue developing the system, or I prevent them from deploying it.
Shahar: Now, this is a naive sketch of what would work, because there are a bunch of competitive dynamics and ways around the regulation that could prevent it from happening, but that’s the initial, naive, high-level story of what we’re trying to do.⬆
How Governments Could Start Caring About AI risk
Shahar: Once you get to the point where governments care about risks from AI, then you can start changing which risks they care about, right? And so if you can start measuring risks that look more like agential systems taking over the world, then you can start having third-party scrutiny of whether new systems are scary agents that are trying to take over the world.
Michaël: So it is mostly instrumental, trying to get governments to care about long-term risk. You try to map out the current risks they care about, in terms of data, or even more targeted, how it could impact their government or the political reception of the laws they’re passing. So you start with something closer to what they care about in their mandate, and then that could lead them to care more about things that would happen in four years or in the next mandate?
Shahar: Yeah. So it’s first of all about noticing that governments now care about AI, whereas previously they did not, and they care about AI for current reasons like bias and privacy. Once they care about AI, the game is about making that future-proof. You don’t want an ossified thing that only cares about privacy, even in a world with giant drone swarms and highly manipulative chatbots. You want the regulation of today to take new risks into account, or the parts of government that created today’s regulation to be willing to create new regulation. Ideally you want to decrease the amount of time it takes to update regulation to account for new risks, and there are various institutional designs you could use to make that happen. ⬆
Attempts To Regulate Autonomous Weapons Have Not Resulted In A Ban
Michaël: How does that translate to autonomous weapons? I know there are some crazy drones capable of looking for the face of Obama, flying there and killing the person. This tech has already been around for a while. Is there any particular regulation of autonomous weapons or more advanced technology, or is this just something people talk about but don’t actually make policies about?
Shahar: So there’s a lot of that. The main forum for attempting to regulate autonomous weapons systems has been the United Nations CCW. I’m not going to go through all of the acronyms; interested people can look this up, and there are people who are more expert in this domain than I am. It has not resulted in an international treaty or ban, which is what people were pushing for, but it has certainly done a lot of work in raising awareness of this as a failure mode, and countries that don’t want a ban have responded to this pressure by building pretty advanced internal processes to say, “Well, we are going to have such drones. We are going to have such systems, but they have to be compliant with international law. They have to pass weapons reviews. We have to be able to show that they don’t take extreme risks. They are accountable to the users, et cetera.”
Shahar: So the people who are really driving the development of this stuff are states and very large militaries, and they’re pretty happy to adopt pretty intense regulation, governance, testing, verification and validation of those systems, so that they don’t get banned and also so that they don’t lose face in the international arena.⬆
We Should Start By Convincing The Department Of Defense
Michaël: Would the US really accept having all of this military development inspected by a third party that is not the US?
Shahar: Third-party inspections are not an option in this particular domain, but what you can have is a lot of detailed work and resources invested in building up technical capabilities to test the safety of such systems, to verify control, to explore how they might fail and then fix that. And then you get into the relationship between the Department of Defense and the contractors who would be building such systems.
Shahar: So the DoD is now effectively the regulator for all of these kinds of systems being developed in the US, and you might say, “I don’t trust the DoD.” But the power relations in the world are such that it’s very hard for anyone else to compel the US Department of Defense to do things that it doesn’t want to do.
Shahar: So then it’s about applying pressure within that organization, which is the effective, most available regulator for this technology, because you might say, “Well, I don’t really trust the US Department of Defense, but I trust them more than a startup that someone started in a garage to develop autonomous weapons.”⬆
Medical Device Regulations Might Be A Good Model Audits
Michaël: Outside of the US, do you think companies would accept passing these tests for their systems? If I’m a small startup, or even a medium-sized startup that has funding, would I accept a bunch of audits and comply with a bunch of rules about my… I know companies are really mad about GDPR. They think it’s complete bullshit, at least in Europe. And here I’d be asking for even more compliance, requiring systems to be interpretable. I think in Europe now they’re trying to push something where self-driving cars need to be somewhat interpretable. I’m not sure about the regulation precisely, but I feel we’re heading toward a world where we’re asking too much of systems that are getting more and more advanced and harder and harder to inspect. Do you think Europe will be able to catch up on those kinds of things and actually make self-driving cars interpretable?
Shahar: So I don’t know whether Europe would make self-driving cars interpretable, but I do think that compliance is part of the cost of doing business in a risky domain.
Shahar: If you have a medical AI startup, you get people inspecting your stuff all the time, because you have to pass through a bunch of regulations and you could get fined or go to jail if you don’t. The threat of going to jail is a very strong motivator for someone who just wants to go on building good tech for the world. I’m much more worried in that respect about the US than I am about Europe, because Europe has a regulation-heavy approach to technology, which also explains why they don’t have any very large players in the tech space.⬆
Regulation Should Be Flexible To Future Developments
Shahar: I think the name of the game is smart regulation. You don’t want regulation that is overly burdensome, but you also don’t want regulation that fails to capture the really important risks. You want governance researchers who are very tuned into the real harms that can come from systems, and the technical reasons for those harms, so they can tell that some systems are not very dangerous and can be excluded from scrutiny or given very light scrutiny, whereas other systems really are very dangerous and need very heavy tools to intervene. Think of an analogy: when you do biological research on pathogens, you have different biosafety levels. These are different levels of protection within the labs that study them.
Michaël: There are stickers saying level three, level four.
Shahar: Exactly. What are these level three and level four? They are a kind of shorthand for a whole bunch of regulations that you have to pass in order to do this kind of research, tailored to the level of risk. So the more risk you pose, the more regulation you have to pass, and that just makes sense. It’s a very reasonable way of doing things.
Michaël: It’s pretty convenient because in AI we have level four and level five, self-driving.
Shahar: True and you might even go farther than that and say in specific high harm domains or high risk domains, self-driving, medical, you could index to the kind of responsibility that say something or level of autonomy but for general systems, things GPT or DALL-E That you don’t quite know how exactly they’re going to be used. You don’t know the domain of application. You could instead index to how powerful the system is and for that, we have a batch of metrics and we can also ask, for example, how much data was used to train the system. How many parameters that the model have, how much compute was this to train it as a proxy for how powerful the system is as a trigger for more and more regulation. ⬆
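To make the idea of proxies as "a trigger for more and more regulation" concrete, here is a minimal sketch of biosafety-style tiering keyed on the three proxies Shahar lists: training compute, parameter count, and training data. Every threshold below is a hypothetical number chosen only for illustration; none comes from the conversation or from any existing regulation. The overall tier is simply the strictest tier triggered by any single proxy.

```python
# Hypothetical thresholds, for illustration only; not drawn from any actual regulation.
# Each entry maps a minimum value of a proxy to the scrutiny tier it triggers.
THRESHOLDS = {
    "training_flop": [(1e24, 2), (1e25, 3), (1e26, 4)],
    "parameters": [(1e10, 2), (1e11, 3), (1e12, 4)],
    "training_tokens": [(1e11, 2), (1e12, 3), (1e13, 4)],
}


def tier_for(proxy: str, value: float) -> int:
    """Highest tier whose threshold this proxy value meets (tier 1 if none)."""
    tier = 1
    for minimum, level in THRESHOLDS[proxy]:
        if value >= minimum:
            tier = max(tier, level)
    return tier


def scrutiny_tier(training_flop: float, parameters: float, training_tokens: float) -> int:
    """Overall tier: the strictest tier triggered by any single proxy."""
    return max(
        tier_for("training_flop", training_flop),
        tier_for("parameters", parameters),
        tier_for("training_tokens", training_tokens),
    )


print(scrutiny_tier(5e25, 7e10, 1.4e12))  # 3
```

Taking the maximum over proxies means a developer cannot dodge scrutiny by keeping one number small while growing another, which is one reason to watch several metrics at once rather than a single one.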
Alignment Red Tape And Misalignment Fines
Michaël: Do we need an alignment tax, taxing companies if they're not building aligned AI?
Shahar: I think we should have misalignment fines, in the same way that we fine companies for causing harms. It's basically a way of internalizing the externalities. If you make a system that causes harm, you should pay for it, and the way we do that is through fines. But I also think we should have alignment red tape. So the more powerful your system is, the more you should be paying the red-tape cost of proving that your system is safe, secure and aligned before you're allowed to deploy it in the world and make a profit. ⬆
Red Teaming AI Systems
Michaël: Another red thing is red teaming, which is another method to test whether a model is safe or not: you have a bunch of hackers or other people trying to push your system to find where it could go wrong.
Shahar: Yeah. So red teaming comes from the domain of either the military or cybersecurity, where you have an adversarial relationship, and you pretend to be the adversary and try to break whatever system or process you have in place. By extension, it means deliberately trying to figure out how things could go wrong with a particular system in a particular context, and once you figure out how things go wrong, you go back and redesign the system to make sure they don't go wrong that way.
Shahar: I think mandating that, and building up a community of practice that can do it for AI systems, particularly advanced AI systems, is something that we need. We're already starting to see some of it. Many companies have internal processes that check how their systems might fail. But getting to the point where we have a DEF CON for AI, and then maybe even further, to courses on how to stress-test and red-team AI systems, sharing of best practices, and building up of tooling, both software tooling and AI tooling, to help you do that kind of work: that's a world I think we need to move into if we're going to minimize the harms from these systems.
Michaël: I think there are more and more adversarial examples for self-driving cars: there are a bunch of small patches you can add to an image that make the car detect a stop sign when there's no stop sign. But with the rise of language models, people can just test models like GPT-3 and find ways in which they misbehave. With all the talk about biases, you can just red-team those kinds of models and see whether at some point they say something racist or anything else. I think it becomes increasingly easy to red-team larger models, and I see people doing this as an instrumental step toward building safe systems: they do a bunch of red teaming on biases, or small-scale alignment, and they kind of think that this methodology will scale up to building increasingly robust and aligned systems. ⬆
Red Teaming May Not Extend To Advanced AI Systems
Shahar: Some people call this kind of thing Prosaic Alignment. You might worry that when you get to very advanced systems, this method no longer works and you need different kinds of solutions, and I just don't have direct expertise here. So I want to be open to the possibility that this is the case, and that you get some false confidence from doing prosaic alignment, red teaming the systems we know of and making them more aligned.
Shahar: On the other hand, it would be silly not to do it. Why not leverage the failure modes of the systems we have today to learn how to make better systems, both today and tomorrow? At the same time, we should also be putting some of our resources into really fleshing out the argument that this will not scale, that at some point there will just be a sharp left turn and we will all be dead. If that is a very compelling argument, and you already have all of this machinery measuring the risk of particular systems, you could amplify this machinery to say, "Okay, now we need to stop." You already have the interfaces in the world between governments, regulators, auditors and developers, and if you convince everyone that further progress is unwarranted and the risks are too high, you have the interfaces between developers and the rest of the world to stop them.
Michaël: So you need some kind of way of interpreting the model and seeing where it’s getting to the point where you think it’s too dangerous.
Shahar: Yeah. You probably want to get to the point where you can say: with any further progress along this path, we can no longer give you guarantees of safe behavior. And that should probably trigger you not being allowed to run the experiment.
Michaël: I think in some sense, if we know that it's impossible to build aligned AI without a certain guarantee, we're maybe closer to something like, not exactly but approximating, climate change. I don't think anyone thinks that climate change has the potential to be good. People see it as an imminent threat and they try to slow it down. They want to make it happen later. ⬆
What Climate Change Teaches Us About AI Strategy
Michaël: So we talked about an alignment tax, and I was just curious about the carbon tax we have for climate change. This is completely off topic, but do you think climate change regulation is an example of society successfully shaping the future of technology in a way where we don't face catastrophic risk from climate change? Do you think it's good evidence of success in regulating systems?
Shahar: I think it's very mixed evidence. Lots of people will have different takes on this, and again, I'm not a climate change regulation expert. If you zoom out massively, you have the discovery that you can get a lot of energy out of burning fossil fuels, and it looks amazing, right? You have a cheap, abundant source of power that allows you to radically transform society and get all of the benefits of that. Initially it's steam, then electricity, then petrol and the combustion engine and all of this amazing stuff, and along the way you've been creating a risk that you weren't really aware of.
Shahar: The discovery of the greenhouse effect, of the role of carbon in it and of the role of humans burning it, comes long after the discovery that you can burn coal and fuel on an industrial scale to do things. And then you go into a long period in which there's denial of this risk, and a very adversarial relationship between the people who benefit from the system as it is now and the people who want to mitigate the harm.
Shahar: But there is also not a super clear alternative. It's only at the point where you have a notion of renewables, that you can have very high-energy societies with a lot of growth and material wellbeing and so on, without the impacts of burning fossil fuels, putting carbon in the atmosphere and changing the climate, that you really start to have a plan of action.
Shahar: Then talking about going to net zero becomes not just meaningful but politically viable, and then there are the nitty-gritty details of how you actually get there. You have investments in green energy and renewables, you have investments in reducing the harm, the amount of carbon emitted from burning fossil fuels, and you have coordination mechanisms that try to shape the incentives around all of that.
Shahar: I think the Paris agreement is a great achievement in that respect and it arguably was better than what the game theory would suggest.
Shahar: Carbon taxes, I think, are a very good way of connecting the high-level value, "we want less of this bad thing," to "this particular company now has a financial incentive to do the right thing." You need many mechanisms, all operating in tandem, to get us to where we want to go. I think it's only a mixed success because we lost many decades, and we may end up losing more decades, and we don't have that many decades to spend. The longer we wait, the worse the outcomes will be.
Michaël: And we might have even fewer decades for AI.
Shahar: Correct. ⬆
Can We Actually Regulate Compute?
Michaël: So maybe now moving on to concrete things we could do to slow down AI development. I have some questions from people on Twitter that I asked for a few hours ago. Do you believe regulating the amount of compute an organization can use is a viable strategy? How would you go about it, or not? Is there something similar to it that could be done?
Shahar: I’m working on one proposal in this domain. I know other people are working on proposals in this domain. I think compute is a pretty good proxy for how capable your system is. Particularly if you’re building a general system, if you’re building a foundation model and arguably later on, if you’re building something that approaches AGI.
Shahar: So the amount of compute used for a particular system could play the role that the deadliness of a pathogen plays in determining whether it's biological safety level one, two, three or four. The more compute is used for a training run, the more scrutiny there should be for that system, either during development or even before development.
Shahar: From the world we're in now, I think we just need to get governments to start measuring that: just getting the information, and thinking about what happens if an actor tries not to share the information or tries to get around sharing it. From there, once you have the information and know where the largest training runs are happening, we can start asking what kind of risk assessment or assurances you would want before letting a run go ahead, or how you could use that information to prepare for the outcomes of such models. ⬆
How Feasible Are Shutdown Switches
Shahar: I know some people are going much more extreme, in terms of, "Can we just get remote shutdown switches into all of the hardware in the world?" I'm very excited about people exploring that, because it's good to know what's technically possible, since I don't know what will be politically feasible five years from now. I don't know what will be politically feasible 10 years from now. The world has changed a lot politically in the last 10 years.
Michaël: So you would have a red button to shut down all the Google servers, and then we'd be sure there's no AGI present.
Shahar: Yeah. I don't know what gets us to a world in which the Chinese president has a red button for all Google servers and the US president has a red button for all Tencent and Baidu servers. Maybe nothing gets us to such a world, but fleshing out a technical description of precisely how this would work, if it were ever politically feasible, seems a useful thing to spend time thinking about.
Michaël: Should we force more interpretability in the training as well?
Shahar: That sounds great. Across the board, figuring out how to do interpretability that scales in the first place is great, and then mandating transparency and interpretability for the potentially biggest, scariest models also seems like a pretty good idea. ⬆
Data Is Much Harder To Regulate Than Compute
Michaël: We talked about compute; then there's the number of parameters of your model. You could think that Transformative AI will have X parameters, maybe 10^20 parameters. So we could limit the number of parameters, but then you could always come up with new architectures with fewer parameters.
Michaël: It's hard to come up with one metric that would survive all the innovation. Another one is data. I think what the Chinchilla paper showed is that you can scale models using more data and fewer parameters, and there's also another trend, which is that companies might be bottlenecked by data, because there's not that much public data available on the internet.
Michaël: So we were talking about private data versus public data. If Google or Facebook were to use all the emails from Gmail or all the pictures from Facebook to train their models, they would have access to much more private data, but maybe they're not allowed to use that data. If you train on all the emails from Gmail, you could get a language model that spits out people's private emails, and that's a real possibility.
Michaël: We might be bottlenecked on public data, and companies might not be able to train on some private data. Do you think we could then regulate the use of private data? We could say, "Oh, you can only use public data, and this way we're sure you're not going to build AGI."
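Michaël's point about Chinchilla, that the same training budget can be spent on fewer parameters and more data, can be sketched with two common rules of thumb from the scaling-law literature: training compute C ≈ 6·N·D FLOP for a model with N parameters trained on D tokens, and a compute-optimal ratio of roughly 20 tokens per parameter. Both are approximations we assume here, not exact laws. The check uses Chinchilla's headline configuration of 70 billion parameters and 1.4 trillion tokens.

```python
import math

# Rules of thumb (assumptions, not exact laws):
# training compute C ≈ 6 * N * D FLOP, and roughly 20 training tokens per parameter.
FLOP_PER_PARAM_PER_TOKEN = 6.0
TOKENS_PER_PARAM = 20.0


def training_flop(params: float, tokens: float) -> float:
    """Approximate training compute for a dense transformer."""
    return FLOP_PER_PARAM_PER_TOKEN * params * tokens


def compute_optimal_split(flop_budget: float) -> tuple[float, float]:
    """Split a compute budget into (params, tokens) assuming D = 20 * N.

    Substituting D = 20N into C = 6ND gives C = 120 N^2, so N = sqrt(C / 120).
    """
    params = math.sqrt(flop_budget / (FLOP_PER_PARAM_PER_TOKEN * TOKENS_PER_PARAM))
    return params, TOKENS_PER_PARAM * params


budget = training_flop(70e9, 1.4e12)
print(f"{budget:.2e}")  # 5.88e+23
n, d = compute_optimal_split(budget)
print(f"{n:.1e} params, {d:.1e} tokens")  # 7.0e+10 params, 1.4e+12 tokens
```

Because N and D trade off against each other for a fixed C, a cap on parameters alone leaves room to grow data and total compute, which is part of why compute tends to be discussed as the steadier proxy.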
Shahar: It's an interesting space to explore. I think data is much harder to regulate than compute, because compute is a physical object. You can quantify it. If you have one GPU sitting in front of you, you can't just conjure a second GPU next to it; you have to go back to the GPU factory. Whereas if you have a bunch of data here and you want a copy of it in a folder next to it, it's basically free.
Michaël: Now you have the cloud, right? So if OpenAI partners with Microsoft Azure, I'm sure the Azure servers are not all in one place. They could just run AGI.exe on a remote server in Taiwan.
Shahar: So if AGI.exe can run on one GPU, then we’re dead anyway. Right?
Shahar: So how large a dedicated cluster do you need for AGI.exe? It's an empirical question, right? It might also be a moving target, but it shapes the kind of interventions you can have on compute, whether through monitoring or through restricting access.
Shahar: Restricting access to data is much harder. Verifying compliance with data restrictions is much harder. There are ideas of how to do it, including ideas for getting verification of data-sharing policies, and it would be great to develop them. But the world also clearly has enough data to train human-level general intelligences, because babies are born into the world, they experience the world, and they become human-level general intelligences. Maybe it won't be large language models, and maybe it won't be in the next, I don't know, five, 10, 15 years. Maybe it'll take longer, but ultimately we can get enough data from the world to train AGIs. ⬆
How Will Humans End Disputes In Governance?
Michaël: Some people say you just train on YouTube. You predict the next frame on YouTube. There are many more videos on YouTube than there is text. In your view, how will humans end disputes in AI governance? Will we have AIs in the loop, with AIs making some decisions and humans negotiating some things as well?
Shahar: So I'm really bullish on not building agential systems until we figure out all of the other stuff, so in my ideal version, which I agree is maybe not very realistic, you just have really good sanity and perception of reality for all the actors negotiating.
Shahar: You have a pretty deep understanding of the needs, the motives and the preferences of the person you're negotiating with, and AI helps you with that, right? If I'm the president of the US and I have a deep understanding of the entire Chinese economy, I can come in and say, "Look, this is clearly a good outcome for your economy if we coordinate in this particular way. You can do trade." If you have a very good understanding of where the other party is and what they want, then by jointly leveraging technology you can make a much bigger future for both of you. Then the cooperate-cooperate option becomes bigger, and so I think there's a lot of value in just getting this information: understanding what the other person needs, and understanding solutions that can work for both of you. Drexler again has… I really like his idea. He has this notion of paretotopia, which is that instead of fighting for your share of the cake, you just make the cake much bigger, in which case the adversarial dynamic decreases significantly.
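The game-theoretic intuition behind "the cooperate-cooperate option becomes bigger" can be illustrated with a toy two-player game. This is our own sketch with hypothetical payoff numbers, not Drexler's formalization of paretotopia: defecting grabs a bigger share of a fixed cake, while mutual cooperation is multiplied by a hypothetical `cake_growth` factor. With no growth the game is a prisoner's dilemma whose only pure equilibrium is mutual defection; once the cooperative payoff exceeds the temptation to defect, mutual cooperation becomes an equilibrium as well.

```python
from itertools import product

ACTIONS = ("cooperate", "defect")


def payoffs(cake_growth: float) -> dict:
    """Symmetric 2x2 game with hypothetical numbers.

    Defecting against a cooperator grabs more of a fixed cake (5 vs 0);
    mutual cooperation earns 3, scaled by how much cooperation grows the cake.
    """
    return {
        ("cooperate", "cooperate"): (3 * cake_growth, 3 * cake_growth),
        ("cooperate", "defect"): (0, 5),
        ("defect", "cooperate"): (5, 0),
        ("defect", "defect"): (1, 1),
    }


def pure_nash_equilibria(game: dict) -> list:
    """Action pairs where neither player gains by unilaterally switching."""
    equilibria = []
    for a, b in product(ACTIONS, ACTIONS):
        ua, ub = game[(a, b)]
        a_is_best = all(ua >= game[(alt, b)][0] for alt in ACTIONS)
        b_is_best = all(ub >= game[(a, alt)][1] for alt in ACTIONS)
        if a_is_best and b_is_best:
            equilibria.append((a, b))
    return equilibria


print(pure_nash_equilibria(payoffs(1.0)))  # [('defect', 'defect')]
print(pure_nash_equilibria(payoffs(2.0)))  # [('cooperate', 'cooperate'), ('defect', 'defect')]
```

Note that mutual defection remains an equilibrium even after the cake grows, so the parties still have to coordinate on the better outcome, which is where the deep mutual understanding Shahar describes comes in.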
Michaël: So you just negotiate over the future light cone of humanity, expanding into space.
Michaël: I think that’s the best ending we could have. Thanks Shahar for coming on the show. Do you have any last plug or word for the audience about your work or whatever you think about?
Shahar: I guess if you're interested in our research convenings, it would be great to bring you in. We are very into interdisciplinary and diverse projects. Also, if you think Intelligence Rising is an interesting thing that you would like to play, reach out to the team or reach out to me.
Shahar: We are currently building a new product that is much more tailored to the needs of particular audiences: investors, tech companies, regulators, policy teams. So if you think you would want to work with us to develop such a product, please reach out.
Michaël: It's a great game; I can recommend it. Yeah. Thanks. ⬆