Robert Miles on YouTube and Doom
Robert Miles made videos for Computerphile before deciding to create his own YouTube channel about AI safety. Lately, he’s been working on a Discord community that uses Stampy the chatbot to answer YouTube comments. We also spend some time discussing recent AI progress and why Rob is not that optimistic about humanity’s survival.
(Note: Our conversation is three hours long, so feel free to click on any sub-topic of your liking in the outline below, and you can then come back to the outline by clicking on the arrow ⬆)
- The Stampy Project
- AI Progress
- How To Actually Reduce Existential Risk from AI
- Technical vs Policy Solutions to AI Risk
- Is Safe AI Possible With Deep Learning
- Why Rob Cares About Mesa Optimizers
- Probability Of Doom
- What Research Should Be Done Assuming Short Timelines
- From Philosophy On A Deadline To Science
- How To Actually Formalize Fuzzy Concepts
- How Rob Approaches AI Alignment Progress
- AI Timelines
- Bold Actions And Regulations
- The World Just Before AGI
- Producing Even More High Quality AI Safety Content
Michaël: Robert Miles, you were not on the show before. I started watching your videos in 2017. You were just starting your channel at the time. You had done some very cool videos with Computerphile that had done super well, like millions of views.
Rob: Like half a million views, but yeah.
Michaël: Just half a million views?
Rob: I mean, maybe if you add up all of them together, yeah.
Michaël: And in the comments, you were like, “Oh yeah, I’m starting a new YouTube channel. Please subscribe.” And on your Twitter profile, the pinned tweet is still something like, “I’m starting my own YouTube channel about AI safety. Tell your friends if you have some.” Since that tweet half a decade ago, you’ve accumulated 100,000 subscribers by publishing a total of 40 videos. And it’s been a blast to see your content evolve throughout the years. I’m honored to have you, the OG AI safety YouTuber, as a guest on the Inside View. Thanks, Rob, for being on the show.
Rob: Thanks for having me.
Michaël: Yeah. So the most important thing is that I have a package for you. And this package is about t-shirts.
Rob: Oh, wow.
Michaël: I’m wearing one, but I also have one for you. This one’s a bit big, but yeah.
Rob: “The scale maximalist.”
Michaël: “The scale maximalist.” And I thought you might like it. If you want to wear it in one of your videos.
Rob: Wow, okay. Have you ever seen me in a t-shirt?
Michaël: Not yet.
Rob: Has anyone?
Michaël: Not yet.
Rob: Yeah. Wow. And then, on the other side, “The goalpost mover” of course.
Michaël: Yeah. I think it says, “The goalpost mover.”
Rob: Yeah. I like it.
Rob: Thank you. Thank you so much. I will… Yeah, I don’t know when I would wear it, because I always wear what I’m wearing now.
Michaël: I think the best is when you go to ICML, NeurIPS.
Rob: Ah, if, but yeah. Thank you so much. Appreciate this.
Michaël: No worries.
Rob: I’ll put this down here. ⬆
How Computerphile Started
Michaël: So yeah, I started watching your videos on Computerphile a few years back, and I think it made sense to just go through the origin story of your channel. So yeah, you were doing those videos about corrigibility, some about the grand future of Artificial General Intelligence, and yeah. How did you end up starting your own channel? What brought you to this path?
Rob: Yeah. Yeah. So I started making videos with Computerphile just because, a kind of random coincidence, actually, right? I was a PhD student. I didn’t have a big strategic plan. I was a PhD student at the University of Nottingham, and the way that Computerphile operates is, Sean goes around and just talks to researchers about their work. And, at the time, when I bumped into him in the queue for a barbecue, because free food, PhD student, I had just been reading a bunch of Yudkowsky and I was like, “Yeah, I got something I could talk about. Something kind of interesting.” And then that video did very well because it was, I think, a lot more dramatic than a lot of the stuff on Computerphile at the time.
Rob: It’s interesting stuff, but pretty mainstream computer science. And here’s this guy with insane hair talking about the end of the world or whatever. But I did… I tried not to be too weird. That was me trying not to be too weird, trying to express clearly the concept without the dramatic sci-finess being the focus, because I think people just so immediately go to science fiction, and it’s just so unhelpful, because it puts your mind in completely the wrong frame for thinking about the problem. So I try and be real about it, like, “This is the actual world that we actually live in.”
Michaël: I think that’s what people like about the videos, is that you don’t need to make a bunch of assumptions. It’s just using common sense.
Michaël: And they’re like, “Oh yeah, this is a problem.”
Rob: Yeah. Yeah. It’s a problem. And yeah, it sounds kind of crazy, but the world is actually kind of crazy. ⬆
Three Strikes And You Are In
Michaël: How did you end up reading a bunch of Yudkowsky before going to PhD?
Rob: Oh, yeah. I used to have, back when my mind was faster and I could focus on things for longer, I used to have this rule where I had a three-strikes-and-you’re-in rule where, if I stumbled across three different pieces of work by a given author, all of which were excellent, I would read everything that that author ever wrote. I came up with this rule because I had a Kindle and I was very excited. This was 2009 or something, right? And having a Kindle was very exciting. I can carry all my books with me. I can read all the time. I can read anything I want. All the you know… And so, I was consuming a huge amount of written material. And yeah, so this rule triggered almost immediately on Eliezer Yudkowsky, and so I just mainlined everything that he wrote between 2006 and 2010. There’s an e-book that Ciphergoth put together.
Michaël: Is it “Rationality: From AI to Zombies”?
Rob: No, no, this was in 2010 that I did this, so it was… It’s the material that would become that, but unedited, unabridged, in chronological order. And I was reading this for ages. And then, at a certain point, I was like, “I’ve been reading this for a while, and I read fairly fast, and I’m still doing it.” So I did a word count check, and it’s like a million words.
Rob: I pulled a paperback off my shelf and measured it and counted the pages, and whatever the number of words on a page of my Kindle, and I figured it out that this book, if it were a physical book, would’ve been two feet thick. Yeah.
Michaël: So what happened is, in 2019, I actually printed the entire sequences.
Michaël: I went to a printer shop and I… It was, I think, two things, that thick.
Rob: Sounds about right.
Michaël: And I tried to read it from start to finish, because people on LessWrong were complaining I didn’t read the sequences.
Rob: Yeah. And yeah, I read the sequences, but the thing is, I don’t know. That makes it sound like, “Oh, I did this impressive intellectual feat to read all of this text,” but it was so… It’s like popcorn. Every little blog post, I was like, “Oh. Yeah, right. That makes sense, and I hadn’t thought about it that way before. And this is a useful tool or a useful concept that I can see myself applying productively in the future. Next one.” And I mean, they’re not all bangers, but they’re not that long if you go through them all. I actually still recommend it. I mean, I would recommend now, I guess, getting an edited volume, but I can’t vouch for those. I haven’t actually read them. But it could use some editing, I guess. Yeah.
Michaël: Do you think it’s about AI safety or about just how to think better?
Rob: It’s both.
Michaël: I think that one of the main goals was to lead more people into considering AI safety. But I don’t remember a lot of the blog posts being about AI particularly.
Rob: Well, a lot of them are secretly about AI. They’re about… There’s certain types of mistakes that people make when thinking about AI, which it turns out are mistakes that they make in a bunch of other places as well. And so, if you can correct that mistake more broadly, you improve the way that people think about AI as well, as a side effect, which, I mean, obviously was the intended main effect. But yeah, I think it helps in a bunch of areas.
Michaël: And so, what did happen between the moments you read those things and, let’s say, the moment you started your YouTube channel in 2017? Did you just read LessWrong from the side, or did you think about those things on your own? ⬆
From Learning to Teaching Science
Rob: Yeah, I thought about them, and I thought about what I wanted to do about the problem. And I guess, always, from an early age, I thought… When I was a child, people would ask me, “What do you want to do with your life? What do you want to be when you grow up?” Right? And I would say, imagine I’m seven years old at this point, I would say, “Well, the job that I’d probably end up doing probably doesn’t exist yet.” Because I was a little smart-arse.
Rob: Yeah. And people would be like, “Oh.”
Michaël: “Meta take.”
Rob: “Smart kid. Yeah, this kid’s going places, whatever.” But this was 100%, eh, 90% an attempt to avoid thinking about the question at all, right? I was just trying to dodge the question.
Michaël: “I’m from the future.”
Michaël: “YouTube hasn’t been invented yet.”
Rob: Right. But, and the thing is, it’s kind of unfair how completely correct I was, right? Like, yeah-
Michaël: Buy Bitcoin.
Rob: That didn’t… Very precocious seven-year-old. No, but if people would… People would then, because I was obviously dodging the question, people would press me further and I’d be like, “I don’t know, some kind of scientist maybe, or maybe a teacher.” And I thought, “Well, being a teacher is like…” I really always enjoyed explaining things to people, but only people who want to learn. I feel like a lot of being a teacher is class control, right? And if somebody is actually interested in the thing that you are explaining to them, it’s a wonderful experience. And if they’re not, it’s the worst. And so, I didn’t really want to be a teacher. But yeah, some kind of teacher/scientist, question mark, question mark, job that doesn’t exist is about as good as you could expect to have done that far ahead, right?
Michaël: “Question mark, question mark.”
Michaël: Now you’re both a teacher and a scientist.
Rob: Yeah. Closer to… I mean, much more teacher than scientist, but science teacher.
Michaël: And if people want to listen to you, they just press play. And if they’re bored, they just go to another video.
Rob: Yeah, exactly. And yeah, the people who are there are people who want to be there.
Michaël: Was it your first time explaining things to people, or had you already been explaining concepts to other people? Because with Computerphile, you did maybe five videos explaining concepts, right?
Rob: Well, actually, so the first video I made for Computerphile was about cryptography.
Rob: It was about public key encryption. And that was just because I had stumbled across a really unusually good explanation of how public key encryption works, and I was like, “Oh, this is how it should always have been explained. I’m going to do that on Computerphile.” And then, afterwards, I was like, “Oh, also, all of this Yudkowsky is pretty interesting.”
Michaël: “By the way, read the sequences.”
Rob: Yeah, exactly. ⬆
Why Rob Started YouTube
Michaël: On another podcast, you said that one of the reasons you wanted to be a YouTuber, or that you didn’t see too much risk in being a YouTuber, is that you want the best explanation of a concept on YouTube to be the one you made. So you thought you might have a shot at giving the best explanation you can for a concept. And that’s one of the reasons you spend so much time on your videos: you want to give an explanation that the second guy on YouTube would not be able to pull off that easily.
Rob: Yeah. Yeah, I hold myself to a high standard on this kind of thing where I feel like there’s definitely a thing on YouTube and the standard incentive structure where it’s like produce a lot of output, regular output, high-frequency, one video a week. Maybe some of them will be shit.
Rob: And that’s the normal thing, right? Please the algorithm, whatever. Whereas, in my mind, if I’m going to make a video about a subject, then I kind of imagine later on, if somebody comes along and says, “Oh, I’m curious about such and such concept, what would you recommend?” If I would recommend anything other than my video, why did I make that video, right? I want it to be the best available sort of introduction to the idea. And that can cause problems with perfectionism and whatever, but I think a… Is that true? Probably a majority of the videos that I have made about a topic, I consider to be the best intro explanation of that topic that is available, certainly the best on video. There are better explanations in text, but I think it’s also real… If an idea is important, you want it to be out there in a bunch of forms. You want it to be podcasts, you want it to be videos, you want it to be text, books, blog posts, tweet threads. You definitely… It’s really undervalued, I think, to recapitulate ideas in new media.
Rob: Everybody who’s writing or producing stuff, I’m over-generalizing, many people, they want to be original. They want to be novel. They want to get credit for having had a good idea and explained it. But there’s so much value in just finding good ideas and either explaining them better, or condensing them, or just making them available to an audience that hasn’t been exposed to them before. The issue with AI alignment is not so much that… I mean, okay, there’s a few issues, but a major issue is that the overwhelming majority of relevant people have never been exposed to the arguments in any form, right? They’ve been exposed to the most cartoon possible thing about saving us from the robots or whatever. And even just the basic laying out of the core arguments has not reached the overwhelming majority of people. Yeah.
Michaël: Yeah. And now I talk to founders, or people here in the Bay, who go and talk to actual AI researchers, talk to Ilya Sutskever at OpenAI, and they give the basic arguments for AI alignment. And they’re updating real-time, they’re like, “Oh, this is an actual problem.”
Michaël: “Maybe I should work more on it.”
Rob: Yeah. Whatever you think about Silicon Valley, it does encourage mind-changing… Actually updating based on new arguments or taking arguments seriously, being willing to pivot, as they say.
Michaël: Yeah, I’m curious. So what you said about having the best explanation, I think it’s very important to distill the concepts, so that it takes less effort than going through a paper. You just need to watch the Robert Miles video about it, and then you get into it in only five minutes. And did you think of any other media, apart from YouTube, that would benefit from this? Do you think people should do TikTok or those kinds of things? ⬆
How Rob Approaches TikTok
Rob: Yeah. I mean, I’m trying to do TikTok a little bit. Yeah, I started… I don’t like TikTok, but for something I don’t like, I spend a lot of time there. And so, I know which way the wind is blowing. And let it not be said that I do not also blow. So no, I think TikTok, long-term, is going to eat YouTube’s lunch by just being more fit. And so, I’ve got to have a presence on that platform, but actually, yeah, so now it’s a little bit annoying. I started putting out a couple of… I put something out on TikTok, and my audience’s reaction was immediately like, “TikTok is bad and you shouldn’t be there.”
Rob: I’m like, “Yes, but let’s be instrumental.” And so, what I said was, “Well, look, I’m not going to put anything on TikTok that I don’t also put on YouTube, right?” I absolutely don’t want to be the reason that anyone decides to download TikTok, but there’s a big audience there that should be reached. And this actually caused a bunch of annoyance, because my thinking was, “Okay, well, what I’ll do is I’ll record these short vertical videos, and I’ll publish them simultaneously to TikTok and to YouTube Shorts, right?” Because YouTube Shorts, subtitle, We’re Scared of TikTok, right?
Michaël: “Please, TikTok, don’t bully us”.
Rob: Yeah. We can… Me too. And, but they’re limited to one minute, right? They have to be one minute. And then, shortly after that, TikTok announced that you could have three-minute videos, you can have 10-minute videos now.
Rob: Yeah. And-
Michaël: Do people use TikTok for 10-minute videos?
Rob: It’s not super common, but you see them from time to time. It’s rough, because you got to keep people’s attention every second during those 10 minutes. But one minute is just not enough time to say almost anything. And I thought the point of TikTok, for me, of making these shorts… We mentioned perfectionism before.
Rob: It’s been a very long time since I released a video, and just, in general, the gaps between my videos are much longer than I would like them to be, because I get bogged down in the minutiae of making this the best video that it could possibly be and so on. And I thought, “Okay, making shorts, this is going to be nice. I can just whip something out and just do it very quickly, and publish it and not care too much.” Because the expectations are much lower for this kind of handheld, vertical video. But it turns out no, because if you want to say anything of any significance in a one-minute video, you need to have an incredibly tightly scripted and tightly performed video. A one-minute, it’s like…
Michaël: Punchline after punchline.
Rob: Yeah. But it’s just like it’s work. I think it was maybe Ben Franklin or someone who ended one of his letters with, “Sorry for sending such a long letter. I didn’t have time to write a short one.” That’s how it is. It takes so much work to make a one-minute video that says anything of consequence about a technical subject.
Michaël: If you’re into that, there’s a YouTube series by Ryan Trahan on how he went from zero to 1 million followers. And he tried to get to a million in a week.
Michaël: Obviously he didn’t succeed, but he did three, four, up to eight TikTok videos every day, doing the editing and everything… Nothing fancy, just shot with his phone. And you see all the time he spends just trying to come up with a concept. He comes up with a joke, and then he’s like, “Oh yeah, let’s shoot it.” And then the actual shooting takes, I don’t know, half an hour.
Michaël: But it’s just like finding a good joke. But I guess if you want to do something technical, and you explain stuff, it’s even more difficult, right?
Rob: Yeah. You want… I’m not interested in… I’m interested in conveying some actual understanding. And in a minute, you just can’t, or you have to make 100 videos. I don’t know. But it’s super irritating. Whereas if you’ve got two minutes, three minutes, that’s enough time to get across one real concept to a level that’s actually satisfying. But then, if I do that, then I can’t publish it as a YouTube Short. So then it’s actually a YouTube video, and then it’s a much bigger deal, the expectations are higher, whatever. So I’ve been trying a thing recently. I don’t know how much people are interested in the YouTube/TikTok meta stuff, or if we want to talk more about AI, but-
Michaël: I think a bunch of people on YouTube are interested in the YouTube meta.
Michaël: And maybe they kind of want to know more about, what have you been doing, what are you up to, and-
Rob: Yeah. Yeah.
Michaël: … updates on this.
Rob: Okay. So just to finish off the thing, then, yeah, I’ve been doing this thing where I try and shoot handheld with a gimbal in 4K, and then just trying to keep myself in the middle of the frame so that I can make a two to three-minute YouTube video and a two to three-minute TikTok, one in 16x9 and one in 9x16, cropped out of the middle, simultaneously. Because I don’t want to be shooting separately for these two things. It’s just so much extra work. And I’ve been having some success with that. So I don’t know, I’m still figuring out my strategy about these things. Yeah.
Michaël: Wait, so it goes like both horizontally and vertically? Because for TikTok, it’s vertically, right?
Rob: Yeah, yeah. So you shoot in wide screen, but you just try and keep everything that’s actually important right in the middle of the frame so that you can just crop out the sides and make the video once. ⬆
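The one-shoot, two-crops approach Rob describes boils down to a bit of geometry: keep the full height of the 16:9 frame and slice a centered 9:16 window out of the middle. A minimal Python sketch of that calculation (the 4K resolution and the ffmpeg command in the comment are illustrative assumptions, not Rob's actual workflow):

```python
# Compute a centered vertical (9:16) crop from a widescreen (16:9) frame.
def center_crop_vertical(width: int, height: int,
                         aspect_w: int = 9, aspect_h: int = 16):
    """Return (crop_width, crop_height, x_offset, y_offset) for a
    centered crop of the given aspect ratio, keeping full frame height."""
    crop_w = height * aspect_w // aspect_h   # width implied by the full height
    x_off = (width - crop_w) // 2            # center the crop horizontally
    return crop_w, height, x_off, 0

# A 4K 16:9 frame (3840x2160) yields a 1215x2160 vertical slice.
print(center_crop_vertical(3840, 2160))  # (1215, 2160, 1312, 0)
# With ffmpeg, that crop would be roughly: -vf "crop=1215:2160:1312:0"
```

This is why keeping the subject dead-center matters: anything outside that middle ~1215-pixel band simply doesn't exist in the TikTok version.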
What Rob Has Been Up To
Michaël: Oh, nice. Yeah, so about the stuff you’ve been doing, I think you said that you haven’t published a longer one in a couple of months.
Rob: Yeah. A long time.
Michaël: What’s the… Are you preparing one big thing, or is it-
Rob: Yeah, it’s a combination of a few factors. So firstly, a lot of videos just take a while. The iterated distillation and amplification video is one that I’m pretty proud of. I used Manim, 3Blue1Brown’s math animation software for that. And that video came out four months after the previous video, but that just took me four months. That was just four months of work. Animation is horrifying.
Michaël: And now you know how to do animation, then you can do other videos about-
Rob: Yeah, now I know that. I know never to do it again, or to hire someone, right?
Rob: And I’m starting to get to the point now where I’m looking to hire a bunch of people. But yeah, in the last year, one thing is, I got obsessed with this, what turned out to be a very, very ambitious idea for the next video. I want to do a programming tutorial, which I’ve never done before. And in order to prepare for that, I had to learn how to actually implement the thing. I had to write the code for the thing that I would be implementing. Because if I shoot, if I… At first, I thought, “Oh, well, I’ll figure it out as I go.” It doesn’t work at all. It turns out the thing I was trying to implement is really hard. It’s like, you know, code I was writing when I was a PhD student is easier than this, right? Especially because I’m trying to be incredibly simple. I’m trying to not introduce any dependencies, and as few dependencies, conceptual dependencies, as well as libraries. I’m like, I’m writing this in Python, from scratch, using basic programming concepts. And yet, this should be something that can tell you something meaningful about AGI.
Michaël: Is it something like coding a transformer from scratch?
Rob: It’s more ambitious than that. So, and this is like… This was kind of stupid. Turns out it was kind of stupid. But I can’t let it go, because I honestly think that this video’s going to be really good. So that’s one thing. The other thing is, I’ve just been doing a lot of things this year. I’ve been going to a lot of events, conferences, retreats, seminars, things like that. I think it’s because, coming out of COVID, there’s just a lot of these things happening.
Rob: But also, I’m re-evaluating certain maxims that I have lived by for the past several years, like when I was an undergrad, I made this rule for myself, similar to… I used to do this a lot, just make rules for myself, like the kind of rule that would compel you to read a million-page manuscript.
Michaël: If you like a person three times, then you should go out with them.
Rob: Right, yeah, exactly. This kind of thing. And so, the rule I made was, I noticed, as an undergrad, people would invite me to things, a random party, a club, an event, whatever. And I would think about it, and I’d be like, “Do I feel like doing this?” And usually, the answer was, “Nah, I don’t really feel like doing this.” And so, I would say, “No, thank you.” But then, every now and then, I would be successfully dragged to one of these things, or whatever, and I… When I had that same feeling of I didn’t feel like doing it, and then I would have a great time, or I would meet somebody great, or I would learn something new that was excellent, or whatever. And so, I was like, “You know what? Fuck this. I’m not trusting my I-don’t-feel-like-this impulse at all. If people invite me to something, I’m going to go. Just flat rule, default yes, unless I can think of a really good reason not to.”
Michaël: Thanks for being in the show.
Rob: Yeah, right. Exactly. Exactly. And then I think of the combination of the end of COVID, and also the fact that, in the intervening time, when I was just in my house and not seeing anyone, the channel became much larger. And so, I became a bigger deal. And so, people invited me to more interesting stuff. And this year, I’ve just been saying yes to everything, and it has eaten up a lot of my time. And it’s really, really great. A lot of the stuff I’ve gone to has been great, I’ve met amazing people, I have so many great new ideas for videos from all of these exciting conversations that I’m having and everything, but I haven’t… I was like… So I was in Puerto Rico, and then I was in rural Washington State for a bit, for a retreat there, and then I came to the Bay for a week, and then I went to The Bahamas briefly for an event there, and then I was back home, and then it was EA Global, the Oxford one, and then the London one, and then EA Global here, and then the Future Forum, and yeah, it’s a… I’m doing too many things, clearly, but they’re all really… They’re all good. I don’t regret any of them. I just regret not getting long contiguous stretches of time to finish this video off and get it published.
Michaël: It’s like, we didn’t get any socializing for two years, and now we’re getting all of it in six months.
Rob: Right. Yeah, exactly. And also, I got COVID through all of this, right? Because obviously, eventually, I’d got COVID, but it was actually fine. It was…
Michaël: You’re good?
Rob: Yeah. I was just kind of tired and sick for a couple of days, and then I got over it and I was like, “Man, did I bring my entire life to a halt for that?”
Rob: Yeah. Because I had a bunch of vaccines, right? I had loads.
Michaël: Why four?
Rob: The more, the merrier. No, because the efficacy starts to wear off after some number of months, six months or something, and-
Michaël: So you had your booster shots as a… You were one of the first ones to have a booster, and then you took another one six months after?
Rob: Yeah. Basically, I timed them pretty strategically. So it’s like, before I was going to a major conference, or a major whatever, a couple of weeks before would be when I would get the next booster, the second shot or whatever it was. Yeah.
Michaël: I can see the small seven-year-old Robert Miles being very meta about things, like predicting the future. ⬆
The Stampy Project
Michaël: One thing that happened a lot during the pandemic was people staying home more, and also, Discord exploded.
Michaël: And you have started a new Discord server that was supposed to be Patreon only.
Rob: Yeah, it was originally.
Michaël: And now it’s public a little bit more.
Rob: Yeah. Yeah. We’re kind of gradually opening it up. Yeah. So it started… Everyone was like, “You should make a Discord server for your viewers, for your community, whatever.” And I had not really used Discord very much, and didn’t really know what it was about, and so on. And I was like, “I don’t want to have a fan club.” You know what I mean? If I make a Discord that’s like The Rob Miles AI YouTube Channel Discord, and it’s just a bunch of people sitting around and talking about my videos, I guess, it feels weird to me. It feels like, I don’t know, egotistical or something to preside over this space of just people who are there because of you and like your stuff. I don’t know. It was… I was not very comfortable with it. And so, the original plan was, I had this idea, “What if we can give the thing an outward focus? What if we can have it be about something that isn’t me?” And I simultaneously had this problem of, I get a bunch of YouTube comments, many of which are questions, some of which are good questions, and I just…
Rob: don’t have time. There’s a lot of them. And it’s also not a very good use of my time to write an answer to one person, because the YouTube comments system does not do a good job of surfacing that content to the people who need to see it. So it’s realistically a small number of people. I could spend half an hour coming up with a good reply to a question, and then nobody’s going to see that apart from the person who asked it and a few people who happened to open that comment up.
Rob: So I wrote this bot called Stampy, who… It’s a dumb joke based on the stamp collector thing from the early Computerphile video.
Michaël: You were the one to invent the stamp collector.
Rob: Weirdly, no. I chose stamps because there was an article about it, which I think was on the Singularity Institute’s page back in the day. I’ve not been able to find it. I’m certain I didn’t hallucinate it. It might be in the Wayback Machine somewhere, but googling the Wayback Machine doesn’t work, so I don’t know.
Michaël: Way, way back.
Rob: Yeah. I only popularized stamps… I mean, it’s the same as the paperclip maximizer, basically, but it’s stamps instead of paper clips, and therefore you have Clippy the paperclip maximizer, and Stampy the stamp maximizer.
Michaël: What does Stampy do in your Discord?
Rob: So originally what Stampy did was, he would watch the YouTube comments through the YouTube API, look for things that looked like questions, and then pull those out, keep them in a queue. And then in the general channel on the Discord, if ever there were no comments for a certain period of time and the conversation had died down, Stampy would pipe up and say, “Someone asked this question on YouTube. What do we think about this question?” And so it’s serving two purposes. Firstly, it’s providing a nice source of regular alignment or AI safety related things for people to talk about. And then Stampy would ask this and then people would be like, “Yeah, that is an interesting question. I think the answer’s this, I think the answer’s that.” And have a debate, have a discussion. And then at a certain point you can say, “Okay, I think we’ve got something we’re happy with,” write something up in Discord and say, “Stampy, post this to YouTube.” And the idea is then Stampy will take that and post it as a reply to the comment.
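The loop Rob just described — harvest comments, keep the ones that look like questions, and surface one when the channel goes quiet — can be sketched in plain Python. Everything below is a guess at the shape of the logic, not Stampy's actual code: the question heuristic and the queue are stand-ins for the real YouTube API and Discord plumbing.

```python
from collections import deque

def looks_like_question(comment: str) -> bool:
    """Crude heuristic for 'is this YouTube comment a question?'
    (a stand-in for whatever Stampy actually uses)."""
    text = comment.strip().lower()
    return text.endswith("?") or text.startswith(
        ("how ", "why ", "what ", "is ", "could ", "can ", "would ")
    )

class QuestionQueue:
    """Hold harvested questions and surface one when chat goes quiet."""

    def __init__(self):
        self.queue = deque()

    def harvest(self, comments):
        # In the real bot, comments would come from the YouTube Data API.
        for comment in comments:
            if looks_like_question(comment):
                self.queue.append(comment)

    def maybe_prompt(self, seconds_quiet: int, quiet_threshold: int = 600):
        # Only pipe up if the conversation has died down and we have a question.
        if seconds_quiet >= quiet_threshold and self.queue:
            return f"Someone asked this question on YouTube: {self.queue.popleft()}"
        return None
```

For example, harvesting `["Great video!", "Why would an AGI resist being switched off?"]` queues only the second comment, and a later `maybe_prompt(700)` would post it to the channel.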
Michaël: Who does the replying? Is it like you write that answer, or is it someone from Discord?
Rob: Anyone on Discord, they write it, and they just say to Stampy to send it. And then Stampy will post it under the bot’s account. And this was very cool for a short period of time. And then we had this one video that a lot of people asked the same question about, and so then we had Stampy giving a lot of the same answer, because people were like, “We already answered this.” And then we also had some functionality that was like, “Stampy, has anybody asked a question like this before?” And he would run a semantic search on the previous… Was it semantic search? A keyword search, whatever, on previous answers, so you can find similar things. So we were giving the same answer, and the answer also contained external links, because we were linking to resources on the Alignment Forum or arXiv or whatever.
Michaël: But YouTube is very bad for links.
Rob: Awful. And so Stampy was in bot prison. And as far as we know, is still in bot prison. Sort it out, YouTube.
Michaël: What does it mean to be in bot prison?
Rob: Oh, it’s the worst. If you imagine what it’s like to have a bot that’s shadow banned or whatever, it’s the worst. Because everything works except the answers… The comments never show up. But it’s actually worse than that, because they do show up, but then they all immediately disappear after about 30 seconds. So you tell it to post it, if you look, it gives you the URL that it’s posted it to, the API is given a 200 response, as far as the thing is concerned it’s fine. And then sometime over the course of the next minute, minute and a half, that will just silently disappear with no notification of any kind.
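The failure mode Rob describes — the API returns success, then the comment silently vanishes within a minute or so — can only be caught by re-checking after a delay. A hypothetical sketch of that check, with `post_fn` and `fetch_fn` as stand-ins for real YouTube API calls:

```python
import time

def post_survives(post_fn, fetch_fn, grace_seconds: float = 90) -> bool:
    """Detect silent deletion: the API reports success, but the comment
    disappears shortly afterwards. post_fn() posts a comment and returns
    its id; fetch_fn(comment_id) returns the comment or None. Both are
    hypothetical stand-ins for the actual YouTube API plumbing."""
    comment_id = post_fn()      # the API happily returns 200 and an id
    time.sleep(grace_seconds)   # give the shadow-ban time to kick in
    return fetch_fn(comment_id) is not None
```

A monitoring loop built on this could at least tell you your replies are being eaten, even if it can't stop YouTube from eating them.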
Michaël: So is Stampy heaven banned?
Rob: Yeah, I guess Stampy’s heaven banned. And it’s so annoying, because there are people who watch my stuff who work at YouTube, and some of them are in my Discord, and so on. And there is an internal tracking number for this issue, because the other thing is, Stampy, that bot account, I made it an approved submitter. That’s a thing you can do on your YouTube account, to be like, “This person is authorized.” I made it a mod on my comments. I made it something else. I set every possible thing to be like, “No, this bot is legit. Let it post what it wants.” Didn’t seem to help. That’s been ongoing in YouTube for, God, like a year now.
Michaël: Can you create another bot?
Rob: Yeah, that’s what we’re doing now. But also independently of that, we shifted focus, because the actual… ⬆
In fact, as I was saying before, posting a reply on YouTube is actually not that high value. That’s not where most of the value is; not many people see it there. So we just started storing these, and now we have a website that we are hoping will become a comprehensive FAQ about AI safety.
Michaël: What’s the website called?
Michaël: Oh, you have the domain name.
Rob: Yeah. But it’s like pre-alpha right now. It’s not really… So I’m not actually sure if we want to talk about it on the podcast. Well, we’ll talk about it. Maybe we can delay the release of this episode or something. I don’t know. We’re not quite ready to fully launch. We’ll see what happens.
Michaël: Maybe you can get more people to help you with the website.
Rob: Yeah. That’s the hope. And so then… Yeah. Comprehensive FAQ so that anyone who has any question, this is the dream, anyone who has heard something about AI safety or AI alignment research or whatever can go and ask their question on this thing, and get a good, high quality, human authored response. So it’s not really an FAQ, it’s like an AQ effectively.
Michaël: Answer a question?
Rob: Frequently asked, right? So just any asked questions. If it’s been asked, we want it on Stampy.
Michaël: So it’s all the questions that have appeared on YouTube, and you guys have answered on Discord.
Rob: Right. And so now we’re broadening that, or rather we have broadened that as well: you can submit questions directly on stampy.ai. And we’ve just written a bunch of them, just like, what are some questions that people have asked you at any point in your life about this? And just thrown them in there as well.
Michaël: I believe there’s another website, some domain like ui.stampy.ai, where you have maybe ten of them, more organized, and there’s a better UI.
Rob: Yeah. The core thing, and probably by the time we actually launch, stampy.ai will be the UI, and the other thing will be wiki.stampy.ai or something like that. We’re probably going to rearrange the domains. Because the back end is a wiki, we’re using Semantic MediaWiki, so that we have a really nice interface for communal collaboration on those answers and processing them, tagging them, organizing them, all of that kind of work. And so having an answer to every question anyone might have about AI safety is pretty ambitious, but it’s ambitious in the same way as Wikipedia is ambitious. Wanting a really in-depth, quality article about every topic that’s noteworthy is insane, and yet they pulled it off. And I think we can do the same, because actually it’s much smaller than that. ⬆
Not Just Another Arbital
Michaël: How is that different from the Arbital project?
Rob: I don’t know a lot about Arbital, but I think they were more focused on, firstly, longer articles, explainers. I think they had a higher requirement for expertise. I don’t know that you could get a large group of… People who are capable of writing Arbital articles to the standard that they wanted are high level alignment researchers already, or something, and it’s not necessarily the best use of their time to do that. Something like that. That’s my read of the situation.
Michaël: They had a high bar, and I think most of them were written by the guy, Eliezer Yudkowsky.
Rob: Yeah. And it wasn’t just him, but everyone who was writing on there could also be doing good research with that time. And so then there’s a question of, what’s the most efficient use of people’s time? I suppose. Whereas with Stampy, what I’m hoping is… There’s a bunch of easy questions. There’s a bunch of questions I get on my YouTube videos all the time, which can be answered very well by watching the YouTube video that they’re a comment on. So often…
Michaël: Go watch the video.
Rob: Yeah. So often I want to reply with, “Oh man, yeah, if you’re interested in that topic, have I got the rest of the video you’re currently watching for you?”
Michaël: You should say, “Here’s a link for you, and then it goes back to the same page.”
Rob: Yeah. No, I think people watch the video, and a question occurs to them, and they pause the video and write a question, and then presumably watch the rest of the video in which it’s answered. But what I mean is, the skill floor for being able to usefully answer questions is pretty low. There’s a thing… ⬆
Learning by Answering Questions
Rob: I’m inspired by most what I would call actually functional educational institutions, by which I mean those not based on the Prussian model of education where we have these rigid classes and we’re trying to manufacture factory workers.
Michaël: So things like Coursera or Github projects?
Rob: Well, I mean, as communities, going back further in time. The way that schooling used to be done, it was a lot of one-on-one tutoring. But also you didn’t have these classes in the same way. In a good educational institution, everyone is both a teacher and a student, because the best person to learn something from is a person who just learned it. There are a lot of people who… Getting the top researchers in the field to teach undergrads is wild. How is that a good use of anyone’s time?
Rob: Because firstly, this person is a preeminent researcher, they’ve got better things to do. Secondly, they’re skilled at research, not skilled at teaching, necessarily, and they don’t really train them in teaching. And thirdly, they’re extremely naturally gifted, probably, in order to be where they are, and so certain things will be obvious to them that are not obvious to all of these undergrads. And fourthly, the things that they’re teaching are things that they learned decades ago, that they have completely forgotten what it’s like to not already know. And if you don’t know what it’s like to not know something, it’s very hard to teach it, because it’s an exercise in empathy, in modeling the other person’s mental state so that you can explain things to them.
Michaël: Sometimes they’re learning at the same time you’re doing as well. So for AI courses, if you move to fourth year of college, or fifth year, then you need to start learning about deep learning. You cannot just build GOFAI all the time. So you start learning deep learning, and then you realize that your teachers have started learning about those things at the same time as you, and the problem is that their depth of knowledge, it’s just like… So there’s a couple of neural networks, they’re connected with some weights, and that’s it. If you want to learn more, just go to the industry.
Rob: Yeah. That feels like there’s a separate problem, which is that the field is advancing so quickly that if you teach for five years, then you’re out of date. I’m in this situation now. When I was a PhD student, that was whatever it was, 2013, 2014 or something. When was AlexNet?
Michaël: 2012. It hasn’t been 10 years since AlexNet, that’s what Karpathy told me.
Rob: Right. So there’s all kinds of things about modern deep learning that I have the most surface level understanding of. And in my position, people expect me to know things, and I don’t know things. Don’t expect me to know things, please.
Michaël: You know the things you make videos about.
Rob: Right. If I have made a video about it, then plausibly I know something about that subject, but more broadly. So the point I was making was, that’s what I want Stampy to be. I want the Stampy project and the community around it to be a place where people can learn, because if you’ve watched my videos and you’re interested and you think you’ve basically understood them, there are questions on Stampy that you are fully qualified to answer. And there will also be questions that you’re almost qualified to answer. There’ll be questions that you can have a stab at, and if you just spend an hour just googling and reading some of the relevant stuff, then you can answer. And what do you know, you’ve learned something. There’s a lot of things where, you know how to answer them, but until you actually try to sit down and write the answer out, you don’t really know.
Michaël: That’s the real Stampy University. You just try to answer questions in Discord, you try to get answers on the new webpage, try to give some stuff, and then it’s by explaining that you learn the actual thing.
Rob: Exactly. And also it’s a collaborative thing. So just writing an answer, depending… ⬆
The Stampy Karma System
Rob: There’s various different ways that we could do this, but the way it worked originally on Stampy, on the Discord, we would have… You write an answer, you ask Stampy to send it. He doesn’t immediately send it. He’ll say, “I’ll send this when it’s got enough stamps on it.” When there’s enough postage to send it. And so we implemented this karma system, whereby there’s a little stamp react on Discord that you can put on a reply, and when there are enough reacts, Stampy will actually post the thing. So you don’t just let any random person on Discord post as the official bot. You don’t want that.
Rob: The point is that the value of the stamp that you put on is different from person to person, proportional to how many stamps you have. And this sounds like a recursive definition, because it is. It’s essentially PageRank. And so what that means is, because the thing that you use stamps for is to say, “I approve of this answer you wrote being published,” that stamp effectively means, “I think this person knows what they’re talking about.” And so you can then build up this big set of simultaneous equations that you just solve with NumPy, and that gives you a number that’s, to what extent does this person know what they’re talking about in the opinion of people who know what they’re talking about? And that’s the number.
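Rob's description, “essentially PageRank… a big set of simultaneous equations that you just solve with NumPy”, can be sketched as power iteration on a stamp matrix. The matrix and numbers below are made up for illustration, not Stampy's real data or code:

```python
import numpy as np

# stamps[i, j] = stamps user i has given to user j (toy data).
stamps = np.array([
    [0, 3, 1],
    [1, 0, 2],
    [0, 1, 0],
], dtype=float)

def karma(stamps: np.ndarray, iters: int = 200) -> np.ndarray:
    """Each user's score is the stamp-weighted sum of their stampers' scores.
    That's a recursive definition, so we find the fixed point by power
    iteration, exactly as in PageRank."""
    # Normalize each row so every giver distributes one unit of trust.
    transition = stamps / stamps.sum(axis=1, keepdims=True)
    scores = np.ones(len(stamps)) / len(stamps)
    for _ in range(iters):
        scores = scores @ transition  # receive trust from those who stamped you
    return scores / scores.sum()

scores = karma(stamps)  # scores[i]: does user i know what they're talking about?
```

For the toy matrix above this converges to roughly (0.16, 0.48, 0.36): user 1 is stamped the most by well-stamped users, so they end up with the most karma.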
Michaël: That’s pretty similar to what LessWrong does, where you have this karma. I think at the beginning you have one vote, and then if you get 1,000 karma, then you get to like seven votes, maybe Eliezer has ten votes, I don’t know.
Rob: It’s similar to LessWrong karma, but the way that LessWrong karma works is, it’s time dependent. So suppose you’re the world’s biggest AI safety genius, and you… Well, you have the potential to be, whatever. You have amazing takes, but you’ve just discovered LessWrong and you join LessWrong. You have one karma, the same as some rando who has no idea what they’re talking about. And you can vote on all of the stuff, but it has very, very little effect. And then later on you can make a bunch of posts and get a bunch of karma as people acknowledge the quality of your thinking, but all of that previous stuff was wasted. Whereas in this case, we don’t actually track a karma number. We just keep track of how many stamps everyone has given to everyone else. And so if you show up and you express your opinion about a bunch of stuff, and then later get a bunch of karma, all of your earlier votes are retroactively updated to reflect your current karma.
Michaël: That’s pretty cool.
Rob: Yeah. So you find the equilibrium of the thing.
Michaël: So if one guy agreed with you, and turned out to be an AI safety genius with ten karma, then you get ten votes for your original thing.
Rob: Exactly. Exactly. And so it’s a little bit… The LessWrong thing supports better the idea that people can change and develop, and maybe as you learn more, then you should deserve to get more voting power or something. Whereas this is time invariant, votes you may get at any point in time are all treated the same. There’s pros and cons to each approach. But I find that it’s a bit more motivating to me, as a new user, that I know that the things that I do, even if they’re not fully counted now, they will be fully counted later, and so it’s worth engaging even when you have no karma, on the assumption that you’ll get some later.
Michaël: Yeah. Now I want to make a bunch of explanations and get a bunch of karma.
Rob: Right. Yeah, yeah. And the karma is stamps. Oh, I was going to ask Stampy. Stampy, how many stamps am I worth?
Michaël: Live demo-ing Stampy.
Rob: I am just going to DM Stampy. How many stamps am I worth?
Michaël: Private DM to Stampy.
Rob: I’m worth 2,758.93 stamps, it says here.
Michaël: That’s pretty good.
Rob: Yeah, it’s pretty… Okay, that’s gone up so much. I don’t check it, because I don’t care about the karma on my own Discord, but…
Michaël: How much do people stamp you?
Rob: Yeah. Well, no. So actually, the thing is, I kind of lied, or I oversimplified. I’ve set myself as the source of truth in this system. So it’s not exactly this completely open web. It’s more like a tree structure with me at the root. So it’s like, people who know what they’re talking about, according to people who Rob thinks know what they’re talking about, just to provide some ground truth. I think you could just solve for the thing and let it find its equilibrium without that, but I wanted to provide it with some ground truth.
Michaël: The Rob baseline.
Rob: Yeah. But the other thing is, you can change it. That number is not fixed. It’s the result of a calculation. And so if I decide… So for example, one thing we have here is a decay factor. So if I stamp you, and you stamp someone else, that shouldn’t really be a full stamp from me. It’s partial transitivity. If I trust that you know what you’re talking about, that means I also trust that you can recognize when other people know what they’re talking about, but maybe not 100%. So I think we have that set to 0.9 right now, and so it fades off as it goes out through the system. But that’s a free parameter that you can set wherever you want, to be like, do we want it to be very tightly focused on my take on what’s a good explanation? Or do we want it to be more democratic? You can increase that transitivity to have that karma spread more widely through the system.
Michaël: It’s kind of like the decay in reinforcement learning, but instead of the reward being multi-step, it’s just how far you are from Rob in the tree?
Rob: Right, exactly. And in the code we do call it gamma for the same reason.
Michaël: Right. So I did ask some questions on Discord, I did talk to Stampy about things. And I mostly asked, what were good questions to ask you today? And they seem to be pretty interested in explanation of things. But I also asked stuff on Twitter, and on Twitter people were mostly interested in AI timelines, and those kind of things. So you talked about AlexNet being ten years ago. And you have this t-shirt where you are, I believe…
Rob: I’m on that.
Michaël: You are here.
Rob: There I am. ⬆
Rob On The AI Alignment Chart
Michaël: So I wanted to get your feelings on this shirt because I asked some people on the podcast about it, and maybe I could just start with one tweet you made a few days ago. “People keep asking me how optimistic I am about our rate of progress in AI, and I have no idea what they’re actually asking. Progress seems very fast; I am not optimistic.”
Rob: Yeah. This is a recurrent thing. People use optimistic to mean fast progress, and if you have the default assumption that all new capabilities are positive, then sure. That’s not where I am. So I’m probably about… I think maybe I should be on the other side of Jack Clark there. I feel like…
Michaël: By the way, if I misrepresented your views and did something wrong, let me know and I’ll correct the thing. And sorry, Jack Clark, I know it’s not your actual position.
Rob: Yeah. I was wondering about that, actually.
Michaël: I think he said something on Twitter, being 50/50 on AGI being good or bad.
Michaël: Meaning us surviving, I think.
Rob: So that was in response to seeing this thing, like “put me on that middle axis”.
Michaël: Right, exactly. Yeah. Something like this.
Rob: Okay. No, I think I’m closer to Yudkowsky and Connor Leahy and stuff, as far as bad is concerned. ⬆
Michaël: So would you consider yourself a scale maximalist?
Rob: No. I mean, I don’t know. How would you define a scale maximalist?
Michaël: You think that you can get AGI by mostly scaling our current systems. So some people say just scaling, but I guess it’s about the details. If you count new data loaders or new tricks, like Gato’s tokenization, as a trick, then maybe you need more small tricks, but no fundamental architectural change, to get to AGI. At least, I would say a scale maximalist gives it more than 50%, or maybe 30%, so you at least consider this as a possibility.
Rob: I do consider it a possibility.
Michaël: I think maybe a scale maximalist would be more than 50%.
Rob: Yeah. Maximalist is a strong term, right? I think… What do I think? It’s plausible to me that if you keep piling on scale and small tricks, you get AGI, but you might have to do that for quite a long time. And I think realistically, people are still doing fundamental ML research as well. And so whether or not we strictly speaking need new theoretical breakthroughs, we’ll probably get some, and it will probably be one of those that actually tips us over, even if maybe we could get there eventually just by tweaks.
Michaël: And those tweaks might give us more efficiency, maybe a new scaling law or something, that will make us win or lose, I don’t know, one or two years.
Rob: Yeah. It wouldn’t surprise me that much if we get a new thing on the general scale of the transformer. A concept with equivalent significance as attention. If your timelines are over decades, you look at the rate at which these things happen, we should expect to have another few of those over the next few decades.
Michaël: Maybe it’s already here. Maybe it’s in a NeurIPS paper, and some people read it. The Transformer paper was in 2017, and became really big with that GPT-3 paper. So maybe you should expect a delay between when the stuff is out and when people are actually reading it and using it all the time.
Rob: Yeah, totally. That makes sense. And I don’t know. ⬆
Understanding the Neocortex and Socratic Models
Rob: Candidates for this include getting some fundamental insight into the architecture of the neocortex, for example. Seems like that’s achievable. People are looking into it. If we can figure out in better detail the precise structure of a neocortical column and then have something inspired by that, might just perform better. I don’t know. These kinds of things. Also a lot of things that are not exactly tricks. They’re not tricks on the scale of, let’s think step by step, but they’re tricks on the scale of, I don’t know, things like that Socratic AI paper, where you go, “Okay, we have all these different models, but they all have this common modality of English text. Can we build an architecture that allows you to combine these capabilities and get something more general and more versatile, by having different modules that communicate with each other through text?” That feels like the thing that… I don’t know where you’re drawing… I don’t think there’s a clear boundary between tricks and new architectures, really. It’s a continuum.
Michaël: Did you update a lot on this Socratic model, where it could interact with different modalities and texts?
Rob: Yes, absolutely. Because that’s the kind of thing that I’ve been thinking about for a long time. I’ve been thinking about… Multi-agent theories of mind seem basically correct. The idea that your mind is actually made up of some number of possibly dynamically changing and adjusting entities, whatever, that each have some degree of agency, that come into play at different times and interact with each other, is kind of fake, but it seems only as fake as the idea that you have a unified self.
Michaël: Is the idea mostly the same as in psychology, with the internal family systems?
Rob: That kind of thing. I think internal family systems introduces a whole bunch of extra details that it can’t support, or something like that. But the core concept that people’s minds are made up of parts that don’t necessarily agree with each other at all times, or want the same thing at all times, accounts for all sorts of aspects of human behavior and irrationality and so on.
Michaël: Now, in the case of AI, you don’t really have multiple agents. You have one… Well you have multiple networks, maybe. I haven’t read this Socrates model paper.
Rob: Yeah. I haven’t looked into it in too much detail.
Michaël: Then you shouldn’t talk about it on a public podcast.
Rob: Yeah, probably. I’m not saying that the Socrates model is doing this specifically, but just broadly speaking, the idea of having separately trained or separately designed components cooperating together through some kind of shared workspace seems pretty powerful to me. The Socrates thing, I think, isn’t differentiable, so I don’t think you can train it end-to-end. I don’t know. I shouldn’t talk about this, because I haven’t actually read the paper properly. ⬆
Rates of Progress and Short AI Timelines
Michaël: No worries. I think this podcast is about getting your inside views and your feelings or understanding about something. We’re not talking about extremely detailed research. But I guess what you can talk about is mostly your impressions of the progress we’re seeing, and maybe how you think about how to update your models on this. So for instance, some people, like Connor Leahy, seem to think that we can just update all the way, bro, and see one piece of evidence, maybe the neural scaling laws in 2020, and just anticipate future progress, and have very short timelines. Or I guess for my part, I mostly see one breakthrough every Monday, and I’m like, “Maybe this is what we’re going for now, this is the rate of progress now.” How do you think about those things? Do you just go on Twitter and are impressed every time you see something new?
Rob: Yeah, yeah. So we are progressing in line with my explicit models from years ago. And what’s changed primarily is that my gut is catching up to where my explicit system two reasoning already was. ⬆
The Law of Accelerating Returns
Rob: If you had asked me to look at it, whatever it is, five, 10 years ago, I’d have been like, “Well, I don’t know. Kurzweil has some nice graphs and it certainly looks exponential, and so in the absence of any specific reason to think something else is going to happen, I guess it’s exponential curve. And that goes vertical at 2045 or whatever it is, and so I guess 2045?”
Rob: But my gut was like, “This is kind of wild. I have pretty wide error bars on every stage of this reasoning. Who knows?” But one mistake is having wide error bars in only one direction. Actually, if you have wide error bars, that spreads you out on either side. Right? Maybe it’s later than 2045, maybe it’s earlier. And so I feel like we’re now on the part of the exponential curve where it feels exponential.
Michaël: It does. I think one thing Kurzweil uses to talk about exponential progress, like the law of accelerating returns, is that at one point you have so much progress. The singularity is defined as the point when humans are not able to follow what’s going on. There are new advances, new technology being developed, and we cannot read the New York Times fast enough.
Rob: Yeah, and we’re already there.
Michaël: We’re already there for AI advances maybe. Or if we’re not there now maybe we’re going to be there like a year.
Rob: Yeah. Yeah. And the other thing that makes the singularity concept valuable to me is, they say it’s the point past which you can’t forecast. And a lot of it comes down to the prediction getting way noisier. Like, when you’re following the industrial revolution, let’s say, you’re talking about the development of the steam engine or whatever. If Trevithick has an accident and he can’t work for a couple of weeks, let’s say, this doesn’t affect the timeline of anything. Right? Whereas if there’s a new breakthrough every Monday and somebody just gets a cold and can’t work for two weeks, then that’s actually meaningful differential progress, where the thing they were working on comes out two weeks later. And coming out two weeks later matters, because the order matters: does this capability happen before or after we find out how to adequately control that capability? Or whatever it is. It gets to the point where fast or short time scales become actually consequential. And yeah, I’m just realizing how horrifying that is for my release schedule.
Rob: That’s really rough.
Michaël: Okay, that’s the real point is like, how do you make videos that take eight months when the timeline is so short?
Rob: Yeah. I’m trying to speed up. It’s difficult.
Michaël: The right question to ask is how many Robert Miles videos will we see before AGI?
Rob: Oh, that’s a fascinating question.
Michaël: I predict 10.
Rob: You think 10?
Michaël: I have short timelines.
Rob: Depends what counts as a video. Like if these shorts and things count, I think it’s going to be a lot more than that, but yeah, I hope to have covered the basic concepts of AI alignment theory before the end of the world. ⬆
People Rob Might Want To Hire
Michaël: You were saying something earlier on about how you plan to hire more people.
Michaël: Are you currently interviewing or taking any application?
Rob: Yeah, so I did a thing a while ago of looking for editors and found some candidates and I’ve been working with a couple of people, and that’s been pretty good. And I’m now looking at hiring writers, like script writers, because script writing is the thing that takes the longest, but it’s also very… I have very high standards, like the writing is the core of the thing. Right?
Michaël: Maybe they can do like the research beforehand.
Rob: Yeah, yeah. I need to study a little bit more about how to do this. But I think a lot of it is just like coming to terms with being… Of like having a significant impact on the landscape or something. Like, the attitude I take to my channel, psychologically, has not changed that much since I had, I don’t know, 10,000 subscribers. Right? And so like now I have 10 times as many subscribers and certain aspects of my approach presumably should be multiplied by 10 times. Right? Maybe not all of them, but some of them. And I don’t think I’ve changed anything by 10 times in terms of how worthwhile it is for various people to work with me. Right? This is just like imposter syndrome stuff. But like I still have this feeling of like, well, anyone who’s able to help me with writing needs to be a person who has a good… Firstly, is a really good writer. Secondly, has a really good understanding of the problem or can acquire it. And then part of me is like, that’s a pretty impressive and high consequence person who like, why would they waste their time helping this YouTube channel? Right?
Michaël: So now that you have like 10 times more subscribers, if you just look at the numbers, you might want to have a higher standard, so that the people who help you will actually be competent.
Rob: Yeah. Yeah, basically just feeling that my project is worth it for people. But the other thing is, with imposter syndrome, you’re saying no for other people. Right? You’re not even asking them. You’re assuming that their answer is no. Whereas the thing to do is to respect that they are adults who understand what their time is worth and give them the option and let them decide if they want to work with you or not. Right?
Michaël: The number of people I’ve talked to will just say, “Why don’t we give more money to Robert Miles? Why don’t we just hire more people to help him?”
Rob: Right. Right. And yeah, it’s… Hire a therapist.
Michaël: Why hire a therapist when you have Stampy, you can talk to him?
Rob: Stampy is very sarcastic and unhelpful and no, this is like a thing. Oh man, I have all of these takes on chatbots. Can you talk about chatbots for a bit?
Michaël: Let’s go for it. ⬆
Why Chatbots Have Had No Impact
Rob: Have you noticed that chatbots have, like, not taken over the world? Or wait, that’s a stupid phrasing. Chatbots have, like… They have hardly had any impact.
Michaël: I did try a chatbot during COVID when I was feeling a bit down. It was really bad. It would just say like, “Hey, have you tried this thing?” I’m like, yeah. And every day was like, “Oh, have you tried this technique of how to debug your thoughts?” And I was like, yeah, I read a book about it.
Rob: Right. Right, right. Yeah, it’s like, I wouldn’t be here if I hadn’t. But you know, you’re not most people. But more broadly speaking, I think people in the past would have been, I don’t know, looking at ELIZA, right? We had enormous success with chatbots in the sixties and enormous work on them since, and yet they haven’t really reached wide deployment and are not really used for anything. And I think this is largely because people keep trying to use them in the wrong way. They keep trying to put them in one-on-one conversations. They’re not very good in one-on-one conversations.
Michaël: Or chatbots are so good that we don’t even know that they’re populating the entirety of Twitter.
Rob: I mean, I have suspicions. But like, there’s just… Yeah it frustrates me. It frustrates me.
Rob: People keep putting chatbots in one-on-one conversations where they’re trying to be helpful to you. And this keeps failing because the bots are not very smart. They say stupid things because they don’t know what to say and they’re not able to help you and so then they just seem stupid. So rather than one-on-one conversations where they’re trying to help you, they should be in group conversations where they’re not trying to help you. Group conversations where they’re kind of rude and sarcastic actually. So in a group conversation, rather than… Like, in one-on-one, if you say something, the bot has to say something, even if it has nothing to say. Right? And so it’s speaking because it has to say something. In a group conversation, the bot can speak because it has something to say. Right? It can just sit there and watch the conversation and be kind of thinking about things that it might say and just jump in if it actually has something that it’s confident is relevant to add.
Michaël: Maybe the same problem with humans. Maybe when we talk one-on-one, we just like think of things to say, maybe just like listen and be in a group and say like, “Oh I have a joke here.”
Rob: It’s a harder problem, right? I think a lot of people find this, that groups are easier, because if you don’t have anything to say, you don’t have to say anything. Same thing with chatbots.
Rob: Secondly, if the bot doesn’t know, like say you say something to a bot and it doesn’t know what you meant. Right? In a one-on-one where it’s trying to be helpful, it has to try to be helpful despite not knowing what you meant. This is very difficult.
Michaël: Sometimes it says like, “Oh, I’m not sure I understood this, can you rephrase it?”
Rob: Right, and it sounds stupid. Whereas, if there’s no premise that the thing is trying to be helpful, it’s very hard to tell the difference between someone who can’t help and someone who doesn’t want to help. And so like, if it doesn’t understand you, it can be like, “What the hell are you talking about?” Right. It can be rude or it can just make a stupid offhand joke or yeah, just like… And so like Stampy is pretty mean a lot of the time. Like, I wrote him this way. He’s mean, he’ll make fun of you. He’s kind of an asshole. And that’s cool because it just means that whenever he doesn’t know what to say, he can just disguise it. People do this too. Right? There’s a lot of people who disguise the fact that they’re stupid by being mean. Doesn’t work as well as they think it does.
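The asymmetry Rob is pointing at fits in a few lines: in a group chat, silence is a legal move below some confidence threshold, while one-on-one the bot has to say something, so it deflects. The threshold value and the deflection lines here are invented for illustration:

```python
import random

CONFIDENCE_THRESHOLD = 0.8  # made-up cutoff for "actually has something to say"

DEFLECTIONS = [  # Stampy-style sarcasm to disguise not knowing
    "What the hell are you talking about?",
    "Sure, whatever you say.",
]

def group_chat_reply(message, answer_fn):
    """answer_fn(message) -> (reply, confidence); a stand-in for the real model."""
    reply, confidence = answer_fn(message)
    if confidence >= CONFIDENCE_THRESHOLD:
        return reply  # jump in only when confident it's relevant
    return None       # in a group, saying nothing is fine

def one_on_one_reply(message, answer_fn):
    reply, confidence = answer_fn(message)
    if confidence >= CONFIDENCE_THRESHOLD:
        return reply
    return random.choice(DEFLECTIONS)  # can't stay silent, so be rude instead
```

The same underlying model looks much smarter in the group setting, purely because low-confidence turns never reach the chat.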
Michaël: I think this was a thing on GPT-3, when Nick Cammarata wrote some prompt engineering for when the AI didn’t know anything about what he was saying, it would say-
Rob: Be real.
Michaël: This is bonkers. Be real, yeah.
Rob: Yeah, be real.
Michaël: Be real.
Rob: Yeah. That’s a fun hack. So, why are we talking about this? I forget.
Michaël: Chatbots. You want to talk about chatbots and why chatbots are pretty bad.
Rob: Yes. And why people are misusing them. But having a bot in your group chat that will just jump in, so like… Or the other thing is things like Siri or like Okay Google or whatever, Google Home. They don’t jump in because it’s just really hard for them to hear what’s going on in group conversations. And also I guess people don’t like the idea of being listened to in those situations. But in a group chat, it’s all text, it’s all legible. And I really want, in all of my group chats, there to be… Well, not all of them, many of them, for there to be a bot that just like, if you ask a question that is Google-able, right? Like say you and I are talking about, oh, we’re going to record a podcast or whatever. Maybe we’re not here in person. And you say, “Oh, well how is 2:00 PM?” And I say, “Oh, what’s that in UK time?” I want the bot to just jump in. Right? Because he knows. That’s a Google-able thing. Or like, “Should we get a pizza? Are there any pizza places around here?” Right? The bot knows this.
Michaël: And the bot delivers fast delivery pizza.
Rob: Yeah. I mean like, you can be as smart or as dumb as you want to be. But like on the base level, just acknowledging that there’s something that the software can do here and then intervening. This is so not relevant. This was like a hobby horse that I’ve had for a while. Since like, before I was aware of actually important problems in the world. This is like part of why… Like there’s an alternate world where I am running a chatbot startup instead of doing what I’m doing now.
Michaël: You can still go in this world.
Rob: Yeah, I don’t know how much demand there’s going to be in a like post AGI situation. ⬆
How Stampy The Chatbot Could Help With AI Safety Content
Michaël: You do have a chatbot that helps you better understand AI safety content.
Rob: Huh! Huh. Huh. Yeah, so that is the dream, right? That is where we want to go with Stampy. So at first, Stampy… Thank you for getting us back on track. At first, Stampy was this fairly basic bot that would just pull the questions from the thing and post them. And then I kept adding more and more. It’s got this really cool module system: you write these different modules, and when messages come in, the different modules kind of bid on them, like, “Oh, I think this is something I can deal with.” And it only does as much computation as it needs. They kind of allow each other to interrupt. And it’s like a whole thing. It’s on GitHub, if you want to help develop Stampy. But yeah, I kept adding functionality to make Stampy do all this fun, useful stuff in the Discord, and he has a bunch of functionality. He’ll search the Alignment Forum for you. He has the transcripts of all of my videos. So like when you’re answering questions, like a question comes in about a thing and you can be like, “Stampy, wasn’t there a Rob Miles video where he’s talking about like an AI doing a backflip?” or something, and Stampy will be like, “Yeah, yeah, here it is,” and just give the link. That kind of thing. So he’s helpful in that respect. He’s helpful on his own terms.
Michaël: Did you have to go through like all the videos and like edit the transcript yourself?
Rob: No, I just used YouTube’s automatic transcription. Just downloaded all of those with youtube-dl. Pretty easy. And like the alignment newsletter, he’s got all of those. And he’ll like search various things. WolframAlpha, he’s got like the WolframAlpha API, so you can ask him how many calories there are in a cubic meter of butter or whatever, if you wanted to know.
Rob: I lost my train of thought, where was I going with this?
Michaël: So you were going to, like, Stampy being useful. You have Stampy that helps AI safety people…
Rob: Oh yeah, right. So he started off literally just relaying the questions and answers, and maintaining the karma system. Then I added a bunch of these things just to make him more helpful in the process of looking things up in a group conversation. And that’s the other thing that’s really nice, right? If you and I are having a research group conversation, there’s a really common trope where you’re like, “I definitely read a paper about something.” And then you go off and Google it and then you find the link and bring it back and paste it. Right? And just having a bot in the chat that you can be like, “Give us a link to that arXiv paper about blah.” Right? “Give us the scaling laws paper.” And the bot just posts it. And then the whole process is like all in line in the chat, and we both see it happen. It’s just like a really nice workflow that I want to have more widely available, but anyway.
Michaël: I feel like our society is slowly transitioning to those kind of things. Like I think with Slack, they started having those bots in those conversations where you could just like slash them. ⬆
The Tool Agent Dichotomy
Rob: The thing I don’t like about that is like, there’s this tool agent dichotomy, right? This is the thing that original Clippy famously broke. Is like, is your software a tool or is it an agent? Tools, you want to be consistent, predictable, reliable. You are using the tool to do a thing. Agents can surprise you. Right? Agents can do things off their own back. And-
Michaël: So you want to have an agent?
Rob: Yeah. Well, you have to do it right. Right? Like if you have a… Yeah, so Clippy was stupid because it was… Like Microsoft Word is clearly a tool. Right? And trying to do agent stuff badly with Microsoft Word, it’s like this is what Clippit was, right? It would try and do things and help you out. And unless it knows exactly what you want, it’s not helpful.
Michaël: “It looks like you’re trying to open a new page.”
Rob: Yeah, it’s like just fucking do it then. Leave me alone! And people found Clippy horrifying, I think for this reason, right? He blurred those boundaries and did a bad job. And when you have an agent in a chat, or rather when you have a bot in a chat and in order to address the bot, you have to like use a special syntax. That’s basically, you’ve made a command line program that is present in your chat and it’s much more tooly. And that’s nice in a sense, because it’s predictable and whatever, but I prefer when bots are addressed with pure natural language. And so like Stampy, almost none of Stampy’s commands are like, you have to use this precise syntax. I have these regular expressions that are as permissive as I think I can get away with. Like, as long as you use this word somewhere in the sentence. Like if you start or end your thing with “Stampy”, he’ll know it’s at him, or if you reply to one of his comments, or if you at him anywhere in the thing, then he will respond to that because it’s directly at him. But there’s not a specific like slash Stampy, blah, this command or whatever. You ask like a human question, which is like, “Wasn’t there a paper about this?” Or like, “What do you know about blah?” And he’ll look it up on the alignment forum Wiki or whatever.
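The permissive addressing Rob describes, the bot’s name at the start or end of a message, an @-mention anywhere, or a reply to the bot, could be sketched like this. This is a hypothetical illustration; the actual Stampy patterns and function names differ.

```python
import re

# Hypothetical sketch of permissive bot addressing: respond if the bot's name
# starts or ends the message, is @-mentioned anywhere, or the message is a
# reply to the bot. Not the real Stampy code.
BOT_NAME = "stampy"

def is_addressed_to_bot(text: str, is_reply_to_bot: bool = False) -> bool:
    if is_reply_to_bot:
        return True
    lowered = text.lower().strip()
    # Name at the start ("Stampy, what's...") or end ("...right, Stampy?")
    if re.match(rf"{BOT_NAME}\b", lowered) or re.search(rf"\b{BOT_NAME}[.!?,]*$", lowered):
        return True
    # An explicit @-mention anywhere in the message.
    if re.search(rf"@{BOT_NAME}\b", lowered):
        return True
    return False
```

The point is that there is no special slash-command syntax: a regex as loose as you can get away with decides whether the message is "at" the bot.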
Rob: And that’s the kind of thing, you can say, “What do you know about X?” And because I have this module system, a bunch of modules will respond to that and they have priority. So like if it finds something in the alignment forum, that’s high priority. If it finds something by Googling, that’s lower priority. If it doesn’t know what to do with any of those, then there’s like, ELIZA will just take a swing at rephrasing the thing you asked as another question or whatever, or make some rude remark. And so… I keep losing what I’m talking about.
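The priority bidding Rob describes, modules bid on a message and the highest-priority response wins, with an ELIZA-style fallback at the bottom, might look roughly like this. The module names and responses here are made up for illustration; this is not the actual Stampy architecture.

```python
# Sketch of a priority-bidding module system: each module inspects the
# message and bids a (priority, response) pair, or passes with None.
# The dispatcher takes the highest-priority bid. All names are hypothetical.
from typing import Callable, Optional, Tuple

Bid = Optional[Tuple[int, str]]  # (priority, response), or None to pass

def alignment_forum_module(msg: str) -> Bid:
    if "alignment" in msg.lower():
        return (30, "Found this on the Alignment Forum: <link>")
    return None

def google_module(msg: str) -> Bid:
    if msg.endswith("?"):
        return (20, "A web search turned up: <link>")
    return None

def eliza_module(msg: str) -> Bid:
    # Lowest-priority fallback: rephrase the question or make a rude remark.
    return (1, f"What the hell do you mean, '{msg}'?")

MODULES: list[Callable[[str], Bid]] = [alignment_forum_module, google_module, eliza_module]

def dispatch(msg: str) -> str:
    bids = [bid for module in MODULES if (bid := module(msg)) is not None]
    priority, response = max(bids)  # highest-priority bid wins
    return response
```

Because the fallback always bids, the bot never has to say "I don't understand": the worst case is a sarcastic ELIZA-style deflection, which is exactly the design Rob is defending.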
Michaël: Right, so basically you can access it using regex, and it looks up useful information, ranked by whether it comes from the Alignment Forum or from Wikipedia.
Rob: Yeah, yeah. And so I just think it’s nicer if you can talk to them as though they’re people. I prefer that as an interface. But anyway, we keep getting sidetracked from where I was actually going with this. At first, Stampy was very simple. Right? Now he has a bunch of extra functionality. The dream is with language models going how they are, and large language model based conversation agents, things like Lambda, the technology is advancing now where I think that there’s the potential for-
Rob: There’s the potential for… We could talk about that if you like but-
Michaël: So then, I joke.
Rob: Yeah, I mean I made a video about it. It’s not as interesting as everybody seems to think it is. But so there’s the potential for something where we have a giant corpus of, hopefully giant corpus, of like a very large number of questions people have asked about AI safety and high quality human authored responses. You then use that, probably you don’t fine tune on it, you probably do something cleverer like one of these architectures, like there was one, what was the like trillion tokens thing recently? And see like, wouldn’t it be nice? It’s not PaLM, I don’t think.
Michaël: Oh, a trillion one?
Rob: Yeah. See if Stampy was here, just like over there, be like, “What was that? There was a paper that was like something learning from something with the trillion tokens.” And he’d be like, “Here it is. Here’s the link.” Things where like rather than, basically you want to get around this whole like memorization thing, right? That large language models, they’re just transformers, they memorize an enormous amount of the data set, and this is kind of a waste of parameters. What you actually want is a system that can dynamically look up from anywhere in the dataset, what it needs and do kind of the like… There’s like a hacky thing you can do with GPT-3 where you give it… And Stampy does this to some extent. You give it the question, it uses that with some kind of semantic search on its dataset to get some text and then it just loads that text into the prompt. It just like stuffs the prompt with stuff from its dataset. So that then when it generates it has more context that’s directly relevant from the dataset without having to have memorized it. Rather than memorizing the dataset, it learns a procedure for searching for things in the dataset.
Michaël: So it knows that the data set contains like Alignment forum, and so it will put stuff in the prompt that says Alignment forum.
Rob: Yeah, yeah. So like, let’s say I ask Stampy a question in this kind of format. There’s a bunch of just standard… Maybe it’s like language model based semantic search to look through stuff we already have, maybe even search engines, and pull in some relevant information. And then the prompt is like, here’s the conversation. The human asked this. The bot then did a search for relevant information and found the following. Using this information, it replied completion. Right? And then it’s able to use this data without having to have memorized it.
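The prompt-stuffing trick Rob describes, retrieve relevant passages for the question and load them into the prompt before generating, can be sketched as below. The corpus and the toy keyword scorer are stand-ins for the semantic search he mentions; nothing here is a real API.

```python
import re

# Sketch of retrieval-augmented prompting: use the question to pull relevant
# passages from a corpus, then stuff them into the prompt so the model can
# answer from context rather than memorization. Toy corpus and scorer.
CORPUS = [
    "Scaling laws: model loss falls predictably as parameters and data grow.",
    "Mesa-optimizers: learned models that are themselves optimizers.",
    "Instrumental convergence: most goals imply subgoals like self-preservation.",
]

def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
    # Toy relevance score: count words shared between question and passage.
    q_words = set(re.findall(r"\w+", question.lower()))
    scored = sorted(
        corpus,
        key=lambda p: len(q_words & set(re.findall(r"\w+", p.lower()))),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question, CORPUS))
    return (
        "The human asked: " + question + "\n"
        "The bot searched for relevant information and found:\n" + context + "\n"
        "Using this information, the bot replied:"
    )
```

The completion is then generated from this stuffed prompt, so the model works from text that is directly relevant rather than from whatever it memorized during training.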
Michaël: So it knows all the question answering from the internet.
Rob: Right? So that’s the hope. The hope is one day eventually, when we have a nice set of these questions and answers, we could then have a bot where you could just go onto Stampy.ai. And instead of it being like an FAQ, it’s just like a chat, a text box. You can have a conversation with Stampy and Stampy will tell you about the various problems and he can answer your questions. We can’t do that right now because language models make things up constantly. And it’s like, you can’t just present language model completions as though they have any kind of authority behind them because yeah, the system has a tendency to make things up. But if you can get good enough at being constrained to say things that accord with the human written answers that you already have, then you can get something that’s more conversational than just like, “Here is a pre-written answer to your specific question.”
Michaël: I think people get a lot of value from just using the GPT-3 API, even if they know that GPT-3 is full of shit sometimes. And yes, with the new davinci-002, InstructGPT, it’s less full of shit, right? So it gives meaningful answers. So I feel like if you put Stampy online, people can use it without assuming, “Oh, this is the ground truth. Oh no, I will post it on Twitter.”
Rob: Yeah, but don’t… I mean, the more foolproof you make it, the universe will provide you a bigger fool. ⬆
Building A Robert Miles Language Model
Michaël: I think I’ve finally figured out where you’re going. So we were talking about why… Like, having some bot that could teach AI safety could be good, having a chatbot that gives meaningful answers to AI safety questions would be useful. And actually, yeah, the endgame here is having Robert Miles, but in a language model. So you’re trying to digitize yourself into something that could answer all the questions, look at all the papers, look at your videos, and give meaningful answers to everything. So basically you’re just scaling up Robert Miles to all the possible questions in all the group chats.
Rob: Yes, something like that. And ideally I would like to not just scale up myself, I would like to… There’s some kind of HCH thing. What I actually want is not a simulation of me, but a simulation of me with a bunch of time to think, and also group chats open with a bunch of actual alignment researchers. You know what I mean?
Michaël: It’s called EleutherAI
Michaël: I think they have some bot there, but their bot sometimes jumps into conversations and says weird and funny things. But there are some people talking about AI Alignment there.
Rob: We should set up a chat between Stampy and this bot. I think that would be funny to see.
Michaël: I think the guy at EleutherAI, Bone Amputee… if you’re listening to this. If we just switch from talking about Stampy, which is great, to how to do good AI research in general. There was some question on Twitter which was, “Imagine a good AI safety outcome. What are your probabilities that we get there via policy and coordination means versus via pure technical AI Alignment research?”
Michaël: This is because he wants to know what the ideal target audience is. If you want to do more communication efforts, is it policy people or more technical people? What do you think the balance is between policy and technical, and at what level of abstraction? I think there are a lot of questions about impact here. ⬆
How To Actually Reduce Existential Risk from AI
Technical vs Policy Solutions to AI Risk
Rob: Okay, so asking whether we’re going to solve this by policy or by technical means feels like asking whether we’re going to solve this problem by writing the program or by running the program. If we haven’t written the program, we’ve got nothing to run. If we write the program and never run it, we may as well have not written it, right? The technical thing is about: can we find solutions? And the policy thing is: can we get people to actually use the solutions we’ve found?
Michaël: No, the policy thing is about having people not run the AGI.exe.
Rob: Obviously they’re both important, but they are also mutually reinforcing is my point, right? If, for example, as part of policy, you were like, “Oh, maybe we can slow down capabilities research a little bit.” That’s cool, but all that’s doing is buying you time. You need the technical alignment research to be happening, but they’re approximately equivalent. If you can double the rate of alignment research or halve the rate of capabilities research, those are approximately the same thing.
Michaël: Not really, because right now we have maybe 10,000 times more capabilities researchers than alignment researchers.
Michaël: So maybe we want to get more time so that people can work on this when there’s an actual field doing useful things. I think getting an extra month or an extra two months is very important because then, as I said, in the last two months, we might do the most progress we’ve ever done in the AI alignment, right?
Rob: Right. Is your thinking that the rate of progress in alignment is growing rapidly, and therefore additional time is more than linear in additional research, something like that?
Michaël: I think we’re doing somewhat a good job right now. There’s more people going to the field and we just need to scale it up faster. But it’s just like, if we tell everyone, “Oh, yeah. Just do pure technical Alignment research now.” There’s just not so many people doing it.
Michaël: That’s why I do outreach. You do a bit of it as well, because we just need more people, and not everyone has heard good arguments.
Rob: Yeah, we need more good people. “We just need more people”: is this true? I think the rate of progress is primarily determined by the high quality end. We need people who are able to do the highest level work and make progress on the hardest parts of the problem. Having more people, broadly speaking, helps. It’s really good to have more people supporting all of that. But, yeah, on the margin, you want to weight it by quality in some way.
Michaël: If you had more regulation, like a tax on some companies, if more companies were forced to have alignment researchers… I know we don’t currently have enough, except to put at Google or those kinds of places, but at the end of the day, some people might want some regulations, because I think that’s how we fought climate change. We had carbon taxes, those kinds of things. I think that’s one way we could get more people. If we just keep trying to grow it from effective altruism, that’s wrong. We might get to 1,000 people in two years, right?
Michaël: But not more than that.
Rob: But the thing is we understand climate change.
Michaël: We understand AI as well.
Rob: Not really. That’s the thing, right? When it comes to climate change, the scientists did the research. They figured out what the mechanism is. It’s carbon dioxide. The thing we need to do is release less carbon dioxide or figure out ways to remove it from the atmosphere. Then you can pass that on to the regulators and say, “What can we do? How can we intervene on this to reduce carbon dioxide emissions?” The regulators can come up with some change in the incentive structures of society and industry to reduce that.
Rob: With alignment, we don’t have that. We are at the point of like, “Getting very hot around here. What’s going on?” Maybe we’re a bit better than that, but genuinely the difficult thing right now is understanding the nature of intelligence well enough to know what we can do that would help. If I could pass a law requiring all of the AI organizations to implement some specific technical policy, I don’t know what specific technical policy I would have them implement because we don’t have that yet.
Michaël: Some policies like you need to make your system interpretable so that you can run them on the streets of London.
Rob: Yeah, and okay, interpretable is good. I’m excited about interpretability research. It seems an important component.
Michaël: I don’t mean this as a normative thing. I think that’s a natural thing: people are trying to pass a regulation where, if you build deep learning systems and you deploy them on self-driving cars, then you need to make them interpretable. Yann LeCun was making jokes about it because he thinks it’s impossible for deep learning to be fully interpretable. ⬆
Is Safe AI Possible With Deep Learning
Michaël: We talked about AI safety progress. Another question on Twitter was: what are you most excited about in terms of AI safety progress recently? You also said that safe AGI is possible. How sure are you that it is actually possible?
Rob: Well, what do you mean by possible?
Michaël: I’m just quoting this guy on Twitter, but I would say possible means it’s possible that we run this and we’re not dead… sorry, that we die of natural causes at a normal time.
Rob: There definitely exists a program, probably a short program. By short, I mean it would fit comfortably on any modern computer and run comfortably. A program which is an AGI superintelligence that’s fully aligned, that ushers in a golden era, solves all our problems and is perfect, and we get the good outcome.
Rob: That program exists. This is because the space of programs is deep, wide and long, and add a hundred billion other words for all of the other dimensions. It’s a very high dimensional space, right? Every program is in there. Perfectly aligned AGI is a program that exists and can exist and can run on computers that we have now. It’s just like, are we going to find it? How do we find it?
Michaël: A program using deep learning maybe, I think.
Rob: The standard approach of deep learning doesn’t allow us to find that program because, in deep learning, we optimize over the input-output behavior of the system, and we only optimize that behavior in the training environment, in the training regime. That just is insufficient.
Rob: Again, the space is very large. There’s a lot of programs that perform very, very well and very safely and very aligned-ly in the training environment for whatever training environment you create. You don’t know how those behave off distribution, and you can’t deploy them without moving them off distribution. ⬆
Why Rob Cares About Mesa Optimizers
Michaël: Are you saying that basically, in this giant space, we might end up on something people call mesa-optimizers?
Michaël: You have a video about it.
Rob: I do. I’m happy with that video. If people are interested in mesa-optimizers, they should watch those two videos.
Michaël: Maybe you could do those little finger things [point up with his finger].
Rob: That’s right. Possibly here, depending on how much space I have in the frame. Yeah, that’s one part of it. That’s one specific way in which this happens, but it’s part of a broader family of these things. There’s just a lot of ways that once the thing is out there and it’s an AGI and it’s doing its thing, everything changes and you just can’t be confident.
Rob: For example, there are a bunch of alignment approaches that are not sub-agent stable, and so they can work perfectly and then fall apart later on. You’ve got an AI system that’s training to be an AGI. It plays various games. We put it in various environments. Suppose that we, as human beings, think certain configurations of chess pieces are morally wrong, whatever it is. If these pieces are in a grid in this shape, then that’s horrible.
Michaël: A very bad opening.
Rob: Very bad. We don’t want that. We train the thing to play chess, right? We train it to play chess. We also train it as part of our alignment thing, not to do this particular configuration, right? We’re also training it to do a million other things.
Rob: At some point it’s deployed into the real world. Let’s say, the point that it’s about human level intelligence, so it’s playing chess in the same way that we play chess at sensible human level. But we also trained it at writing software. We also trained it at theoretical computer science. We trained it at a bunch of things because we’re trying to make an AGI here.
Rob: At a certain point possibly in deployment, it realizes that the cognitive architecture that it has is not particularly well suited to playing chess. If it wants to win at chess, which it does, it should write a program that does MiniMax, with alpha-beta pruning, and whatever. It basically just writes itself Stockfish, or honestly, because realistically all of these things happen much faster than we expect them to do. It realizes that it can download Stockfish. One of the two.
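Minimax with alpha-beta pruning, the search procedure Rob name-checks here, in miniature. This is a generic game-tree version for illustration, nothing chess-specific and obviously nothing like Stockfish itself.

```python
# Minimax with alpha-beta pruning over a toy game tree. Leaves are numeric
# payoffs for the maximizing player; internal nodes are lists of children.
# Alpha/beta track the best values each side can already guarantee, letting
# the search skip branches that cannot affect the final decision.

def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    if not isinstance(node, list):  # leaf: return its payoff
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:  # beta cutoff: opponent will avoid this branch
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:  # alpha cutoff
                break
        return value
```

Rob’s point is that the safety constraint lives nowhere in this search: a system that writes itself such a module optimizes pure payoff, and any trained-in aversion to particular board configurations simply does not carry over.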
Rob: Suppose it’s writing this code, is it going to include this preclusion against producing this particular configuration? Maybe, maybe not. Probably not. Depending on what, and here we have to anthropomorphize, but you can’t tell if you’ve created a system that really has an actual deep disagreement that it actually wants that configuration not to happen versus just that when you were training it, whenever it was in that configuration, you gave it a negative reward. It’s the equivalent of you’re trying to play chess, and whenever you have that configuration of pieces on the board, you have a horrible headache so you avoid it.
Rob: But if looking at that configuration of pieces gives you a horrible headache, that doesn’t mean you’re going to try to avoid it from ever being considered in this program that you wrote. And so two things happen here. Firstly, the system stops having the alignment properties that you thought it had, that it always displayed in the training environment. Secondly, it becomes drastically more competent because Stockfish is way better at chess than it was before. You get this simultaneous thing of the system becoming tremendously more capable in a bunch of domains probably and simultaneously just not being aligned anymore according to the things that you thought you’d manage to instill in it in the training process. This is the kind of thing.
Michaël: Is the problem here that we are out of distribution because now he’s doing other things, or is the problem that he was pretending to do the right thing in the beginning and…
Rob: Is it pretending or is it we aren’t… We are mesa-optimizers with respect to evolution. What are we doing when we use contraception? We’re not trying to deceive evolution necessarily, but then evolution isn’t an agent. I don’t know. This is kind of confusing to think about. But regardless of what its internal experience is, it stops behaving in the kind of aligned way that we trained it to. You could call this self-modification. You could also think of it as just moving a bunch of its cognition outside of the part that you originally trained. You would expect that to be true.
Rob: If an AGI has this kind of full cognitive morphological freedom, that it can just configure its computational resources into whatever structure it thinks is best for the task that it’s doing, it’s going to have a bunch of these kinds of modules. It’s going to play chess with a specialized module because that’s cheap to do, and it’s going to be way more efficient than trying to reason through with its general capabilities. You can expect that for everything, and you could call this self-modification or self-improvement or building a sub-agent. Either way, we don’t currently have a way to be confident that the alignment properties hold through that kind of dramatic shift, which you expect should just happen at about the point of human capabilities, when it becomes smart enough to realize that it can and has enough programming ability to do it. Again, these are all just different ways that this deep problem can manifest itself.
Rob: There’s a lot of cool research about how to align our current systems that doesn’t solve this problem. My mainline prediction is that we get alignment stuff that looks pretty good and works pretty well until it doesn’t.
Michaël: But for now we haven’t seen any particular example of a mesa-optimizer in the world. For now it’s just like, I guess, that the thing exists or-
Rob: A sub-task.
Michaël: I don’t fully buy the evolution analogy. I don’t fully understand it either. I talked to Evan Hubinger on, I guess, the second episode. We talked about evolution. People told me that I didn’t understand exactly what he was saying, but I guess my understanding is that, as humans, we somewhat do something for evolution in the sense that our brain maximizes some dopamine or any neurotransmitter in our brain, because it thinks that this will maximize our ability to be strong, be healthy, survive, and maybe have kids at the end. We as humans think like, “Oh, we’re doing something crazy. We’re just falling in love, watching movies, doing YouTube videos.” But at the end of the day, our brain really wants to reproduce.
Michaël: We haven’t actually changed from the ancestral environment, in the sense that we’re still doing the offspring thing, but we just believe that we’re doing something great. Falling in love is just a good way to maximize the long-term reward of more offspring because…
Rob: I think that you’re collapsing meta layers here, that you’re talking about what your brain thinks and also what you think. I don’t believe that your brain thinks anything that you don’t think. Like-
Michaël: I am my brain.
Rob: You are your brain. If your claim is that your brain is actually thinking about inclusive genetic fitness, then you should have thoughts about inclusive genetic fitness. You have thoughts about sex is fun, right? Although you have the capacity to think about inclusive genetic fitness, this is not in fact the computation that you are carrying out when you decide to do the various things you do in your life.
Michaël: Why do I think sex is fun? I think sex is fun because my brain produces some neurotransmitter whenever I’m doing it. Because my brain was shaped by evolution, it was good to release this particular thing to make me have more offspring, right? The concept of fun depends on what the brain produces.
Rob: Yes. The link between evolution and your thoughts goes via this thing. But it’s lossy. What am I saying? You just aren’t thinking. You aren’t making your decisions based on thoughts about what evolution wants.
Michaël: My claim is more that phasic dopamine in the brain is similar to a reward in reinforcement learning, right? It’s kind of the reward that trains the neural net that’s my brain. In some sense, evolution has created this reward system and my brain changes over time based on this reward that I’m getting. I don’t think there’s an actual agent that is like, “Oh, what is the most fun? What is the meaning of life?” I think it’s just some neural net that learns stuff through RL or deep learning, and at the end of the day ends up with some policy. The policy is not great. The policy is not thinking explicitly about evolution, but it has been trained by something that was created by evolution.
Rob: Yes. Therefore, it’s correlated, and they’re definitely correlated, but they’re not the same. The differences become larger the more power we have. It’s in that delta where you lose everything. It’s Goodhart’s Law again. This correlation was much, much stronger in the ancestral environment than it is now, but the optimal inclusive-genetic-fitness route is something that almost nobody takes.
Rob: What’s more, the other thing is it’s completely computationally intractable. Of course, we’re approximating something like inclusive genetic fitness, because you can’t do that computation. You have to use heuristics. We’re so bounded in that respect, and that’s all it is. They’re different things. If we have the freedom to configure the universe however we want it to be, we get further away from inclusive genetic fitness.
Rob: What’s more, it’s completely… The argument from inclusive genetic fitness is so unconvincing. You could say to me, “Hey, you should put all of your resources and all of your life priority into donating sperm and just trying to have as many offspring as possible, right?” You could make an argument about that, about how many offspring I would end up having, and my relative inclusive genetic fitness and how that would be enormous, and I don’t give a shit. That’s not what I want. I don’t want that. I want something else.
Michaël: We want more kids probably.
Rob: Don’t get me wrong. I’m pronatalist, but they’re not a terminal thing. They’re not the only thing that matters. As far as evolution is concerned, it’s the only thing that matters. We care about other things. And so, yeah, you end up with AI systems that just care about things that aren’t human values.
Michaël: I can make a general steelman argument of what you’re saying. I think I see where we’re disagreeing. Basically, you’re at the individual level, where you’re saying that a human is misaligned with evolution. A specific human like you is not thinking explicitly in terms of maximizing offspring, and we’re pretty bad at it individually.
Michaël: I guess my claim-
Rob: But not just we’re bad at it. We’re not trying to do it.
Michaël: We’re not trying to do it at all. So I guess that’s a good analogy for AI. AI could have some different objective than the one in training. I agree with that. I guess my general claim is that humanity as a whole is pretty good at maximizing survival, and I feel what evolution wanted at the beginning was, “Please survive.” We’re getting good at making things, building technology, and so we’re being good at the goal of surviving somehow.
Rob: Yeah, we want to survive. We want to survive for a few reasons, one of them being evolution; the other one is instrumental convergence. Whatever you want, surviving is probably a better way of getting it than not surviving. Of course, evolution is still happening. It’s not as though every gene is equally prevalent in every generation of humans. Evolution is happening. It’s just that evolution is very, very slow compared to us. Evolution doesn’t have a lot of time to do whatever it’s going to do before we finish what we’ve been doing for the last 200-300 years: completely, in an instant from the perspective of evolution, diverting the path of the world to what we want instead of what it wants.
Michaël: It’s kind of weird, because at some point it’s not really an agent anymore. At some point we can just change our genes and create kids as we want. At some point, evolution will disappear.
Rob: I don’t know. I think it would change drastically, but it’s changed before. The evolution of random RNA molecules is pretty different from multicellular sexual selection, whatever. As long as there’s competition, there will be selection pressures. Talk to Robin Hanson about that.
Rob: The question of, do we want to expand into the stars? It’s like as long as some of us do, then that’s what’s going to happen for those people. Anyone who stays behind can if they want and become mostly irrelevant.
Michaël: Do you want to go into the stars?
Rob: I honestly don’t expect to live that long, but yes, if I could. ⬆
Probability Of Doom
Michaël: It’s a great transition to another Discord question. Are we doomed? What’s your probability of doom?
Rob: I don’t know. I think it doesn’t look good. It’s fair to say it doesn’t look good. My mainline prediction is doom. If I think about the coming years and I just assume that the single most… If I do a greedy path through this, where at every instance, the single most likely thing to happen happens, I think we’re doomed. There are various places where various unlikely things can happen that might lead to us not being doomed.
Michaël: What are those concrete paths to not doom?
Rob: Well, we might have some deep breakthrough on the kind of problems that I’ve been talking about. That points to a way to do alignment that will actually hold up under these extreme circumstances of AI super intelligence. That’s the main one.
Rob: I also place a bunch of probability, maybe 10% or 20% probability on just us being really wrong about something really fundamental in the lucky direction. I place pretty high probability on us being really wrong about some important philosophical thing. It’s just that there’s no particular reason to believe that makes us less doomed rather than more. Broadly speaking, if you’re trying to design a rocket to take you to the moon and you’re, in some sense, badly wrong about how rockets work, this mostly reduces your chances of getting to the moon.
Michaël: But we’re talking about aligning rockets, then we can be wrong about how easy AGI is.
Rob: Yeah, yeah. Right. Some kind of thing where… You could imagine some situation where we have some argument like, “… And then when we get to the moon, we’ll fly down to the surface with this plane-type thing.” Some people are like, “I don’t know if that’s going to work. I’m pretty sure there’s no air on the moon,” and other people are like, “Well, yeah. Okay. But we’ll include the wings just in case,” and we do. It turns out we were super wrong about the moon somehow, and it has an atmosphere and the wings help. Right?
Rob: That kind of thing. We can be wrong about the reasons why some of our approaches won’t work. That’s like, “I don’t want to rely on that.”
Michaël: Maybe Asimov’s three laws were kind of useful at the end of the day. Maybe they kind of actually worked.
Rob: Yeah. Yeah. I mean things have shifted a fair amount.
Michaël: I remember another question was something about timelines. ⬆
What Research Should Be Done Assuming Short Timelines
Michaël: If you believe in short AI timelines, let’s say five or 10 years, what’s the best way to do AI Alignment research or useful work?
Rob: That’s a pretty deep strategic question, which I do not know the answer to. I think we need a broad approach. I think we’re uncertain enough about enough things, like the number of people we have working on this problem is so small relative to the breadth of possible approaches. If humanity were sane, a lot of things would be different. That’s a wild counterfactual…
Michaël: But there would be no TikTok.
Rob: Yeah, probably not.
Rob: How about this? If humanity were allocating a sane level of resources to this problem in terms of person hours, competence-weighted person hours-
Rob: Competence-weighted person hours.
Michaël: Competence based on either number of stamps… or how close you match Eliezer Yudkowsky’s profile picture.
Rob: Yeah, and LessWrong karma.
Rob: I don’t know. There would be large teams of very competent people working on all sorts of different approaches to this, and I honestly don’t have the confidence to say, “This is the kind of research we should be doing,” right? It’s totally possible that if we had enough time to work on this and we worked on it for another 10 years, we would be like, “Oh yeah, it turns out the promising avenue is just not really any of the things that people are currently working on.”
Rob: So we still need more breadth than we have, but we also need way more depth in all of the things we have as well. I just think we need to be putting so much more into this than we are.
Michaël: How do we get more people, apart from publishing videos more regularly?
Rob: Yeah. Well, that’s the approach I’m taking. I think just getting people to understand the problem is huge. Getting smart people to understand the problem is huge, and also getting people to understand how interesting the problem is. This is the thing that blows my mind. AI safety is obviously the most interesting subject in the world.
Michaël: I would say the most important problem to be working on, but from a technical perspective, the current state of AI Alignment is maybe not the most beautiful math. Maybe some mathematicians prefer to work on group theory; maybe some people prefer to look at cool DALL·E outputs.
Rob: Yeah, fair. I don’t mean … what do I mean? There is a particular type of person who is attracted to a problem more, the more difficult it is, right?
Michaël: It’s called a speed runner.
Rob: Yeah, people who … yes, people who like a challenge in their technical work, and it actually doesn’t get harder than this in current problems … or, plausibly, there are harder problems, that nobody cares about, that aren’t important. If you want a problem that cuts to the core of all of the most important questions in philosophy, all the most interesting questions in philosophy, questions about ethics, questions about identity, intelligence, consciousness, and values like axiology, the core questions of what are we even doing here? What are we trying to achieve? If we could do whatever we wanted, if we could have whatever outcomes we wanted, what outcomes do we even want? And then all of those questions about what does it mean to be a thinking thing in the world? What does it mean to want things?
Rob: And then all of these technical challenges, right? How do we understand the most complex and confusing things that we’ve ever created in history? How do we understand the most complicated object known, on a deep enough level to replicate it? How do we do this so well, so comprehensively, that we avoid destroying ourselves, and how do we do this on a deadline? It’s interesting. ⬆
From Philosophy On A Deadline To Science
Michaël: It is philosophy on a deadline, but the problem is, most AI researchers, I think, hated philosophy classes, because they didn’t have an actual solution.
Michaël: They prefer to write code and have a correct output.
Rob: Yeah. Philosophy has a lot of problems, but I think philosophy gets a bad rap, mostly because it’s the bucket into which we put the things that we don’t understand well enough to study properly. There was a time when philosophers were studying questions like, if I drop a feather and a cannonball, which one is going to fall faster, or whatever, and this is a philosophical problem until you have a philosophical insight, like empiricism, and go, “Well, why don’t we just fucking do it and see?”
Michaël: And this is, kids, how Nike was created. Just do it.
Rob: Just do it. Just yeah, have a look.
Rob: And so my point is, people sometimes use that to badmouth philosophy, but somebody had to come up with that. Turns out that’s not obvious. It’s obvious in retrospect.
Rob: People spent decades, possibly centuries, sitting around thinking about how to learn about the world, and the idea of “maybe think about what would happen, make a prediction, and then test it by doing it” is something somebody had to invent. That person was a philosopher, and they made enormous progress. There’s a huge thing, especially on the internet, of people being like, “Oh yeah, but that’s obvious, that’s trivial.” Well, you couldn’t have told me it before you learned it, so yeah, it’s obvious in retrospect, whatever.
Rob: But the point is, once you have made these philosophical discoveries, such that you actually have a paradigm with which to learn more and operate, now that’s science. It’s not philosophy anymore, right? It used to be philosophy until we understood it well enough to actually make progress, and then it split off, and we’ve seen increasingly more and more things steadily splitting off from philosophy, and becoming mathematics, becoming the various branches of science, and-
Michaël: Is the claim that we need to do some philosophical work to map out the problems, and then we create subfields of AI alignment, so that people can work on them as subfields of AI, and it’s not just someone talking about something that’s wrong?
Rob: Right, right. The job of philosophers of AI, like the job of firefighters, is to make themselves redundant, right? Their job is to grapple with the ineffable, until you’re able to eff it.
Michaël: They’re already out of a job because there is no fire alarm.
Rob: Yeah, so the job of the philosophical work is to get us un-confused enough about the thing that we can do maths on it, for example, or computer science. On the philosophy side of things, we don’t want it to stay philosophy, right? We want to graduate to the point where we actually have ways to make progress on the actual problem.
Michaël: How would you advise people to do more philosophy on AI? Should people just read LessWrong, should people watch your videos and become PhDs?
Rob: I don’t know. I don’t know that I have specific advice about that. There’s some fun research that’s kind of formalizing these fuzzy concepts. ⬆
How To Actually Formalize Fuzzy Concepts
Rob: So there’s a whole area of research, or a family of research, that I quite enjoy, that’s taking these kind of fuzzy concepts that we’re using, and just trying to formalize them to something that you can prove theorems about.
Rob: Recently I was looking at some work by Alex Turner, about formalizing the idea of power seeking. We have this idea of instrumental convergence, that agents with a wide range of different goals or reward functions will tend to seek power, right? They will try to gain influence over their environment, try to gain resources. And then there’s a philosophical argument which Omohundro made, which Bostrom made, and then it’s like, “Okay, what does this mean, concretely? How could we prove this? How could we prove this mathematically or in computer science as opposed to in philosophy?”
Rob: And that doesn’t just mean taking all of these words and turning them into Greek symbols. You need really, really concrete definitions, so you need to specify exactly what you mean by everything, such that there’s no remaining ambiguity about what you’re talking about, and then you can start proving things. And it turns out you can, in fact, prove that optimal policies in a wide range of MDP environments exhibit this behavior, this formalized concept of what it means to seek power, which is, I think, about the number of possible reachable states.
Michaël: Oh, so it wants to be able to reach all the states of the MDP; it wants to be able to survive and have power over them.
Rob: Yeah, not exactly that it wants to reach all of them, because a lot of the states are bad outcomes that it doesn’t want, right? But it wants its decisions to have an influence over the state of the MDP. So if you are turned off, then now you’re in this one state, which is off, and on the graph you have one arrow coming out from off that loops around back to off. You might think that this is just another state that you could be in, and that without actually including “don’t be in the off state” somewhere in the goal, there would be about as many goals for which the optimal policy involves being turned off as there are that don’t. But it turns out no, there are many, many more goals for which the optimal policy involves avoiding being turned off. And you know, this is provable and has been proven.
Rob: This general family of thing, I think, is potentially helpful, because it lets you prove … it forces you to be really rigorous and clear on what you’re actually saying and what you aren’t saying, and you can discover things this way.
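A toy numerical sketch of the intuition behind that result (this is an invented illustration, not Turner’s actual formalism; the MDP, rewards, and “off” state here are made up for the example): draw many random reward functions over a handful of reachable “alive” states plus one absorbing “off” state, and count how often staying on is optimal.

```python
import random

# Toy illustration (not Turner's theorem): "off" is an absorbing state
# with a fixed reward; each of n "alive" states has its own reward.
# An agent that stays on can navigate to the best alive state, so its
# value for staying on is max(alive rewards).
def optimal_choice(alive_rewards, off_reward):
    return "on" if max(alive_rewards) > off_reward else "off"

random.seed(0)
n_alive, trials = 5, 10_000
prefers_on = sum(
    optimal_choice([random.random() for _ in range(n_alive)],
                   random.random()) == "on"
    for _ in range(trials)
)
print(prefers_on / trials)  # close to n/(n+1) = 5/6 for iid uniform rewards
```

The fraction of goals for which the optimal policy avoids shutdown grows with the number of states reachable by staying on, which is the sense in which “more reachable states” makes shutdown avoidance optimal for most goals.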
Michaël: So you move from philosophy to math, try to formalize things in terms of simple MDP, and so that it forces you to understand better the problem.
Michaël: And I feel that’s one thing Paul Christiano is doing a little bit: trying to go from scenarios where AI could destroy humanity, or just have some misaligned behavior, and then come up with concrete plans on how to counteract this. By doing this concretely, you find the simplest problem that could be misaligned, then find the simplest solution to it, and then you continue this process until you run out of actual problems that could come up.
Rob: Yeah, you got to be careful with that though, right? So I don’t think that Paul is suffering from this problem, but there is a problem that some people have of, I can’t think of any way that this is dangerous and therefore it’s safe, and yeah, no, Paul is not doing that, but taking that general approach of, think of a way that it could go wrong and then fix it, and then try and think of another way that it could go wrong and then fix it, you have to be doing that at the right level of abstraction, because if you have a system which is actually not aligned, it’s like, think of a way that this might not be aligned and fix it, as opposed to think of something a misaligned system could do, and prevent it from being able to do that. That approach is, I think, a losing proposition.
Michaël: So it’s not about agents not being able to do something bad, like power seeking.
Michaël: It’s mostly, imagine we’re running the simulation we’re actually in, and what would be the kind of AIs people build, and in those scenarios with those architectures, what would be the obvious thing that could go wrong? What would be the kind of things people could develop that could lead to a fail case?
Rob: Right, right, right. Yeah, I think that’s a good approach. But also I don’t know what I’m talking about.
Michaël: What’s Rob’s approach to it?
Rob: What’s my approach? What? To the problem more broadly? ⬆
How Rob Approaches AI Alignment Progress
Michaël: How do you think about the kind of problems we could have? I know you said something about mesa-optimizers being a big chunk of why you think we’re doing… you said something like it’s-
Rob: Yeah. There’s a broader thing of which mesa-optimizers are a particularly concerning or salient example.
Michaël: Do you have other things in your treadmill, other ways of coming up with dangerous plans, or do you mostly just read Alex Turner and Paul Christiano, and think about their models?
Rob: I try to read a wide variety of stuff, but it’s difficult. There’s a lot being published now.
Rob: The other thing is, if I’m trying to be sort of an introduction to the field for people, I don’t want to be too specifically opinionated about what types of research I think are promising. Firstly, because who the hell am I, right? I have this disproportionate visibility and disproportionate respect, to be honest, right? People care more about what I have to say than they should.
Michaël: Well, I think people in the Alignment community don’t treat you with more respect than the people who created the field.
Rob: No. Yeah, absolutely not.
Michaël: And I guess people on YouTube are just excited about your videos, and just respect you for how useful you are to them. So I think in that sense, you’re at the right level.
Rob: I think people outside of the field think that I have more … what am I saying.
Rob: People outside of the field tend to think that I’m actively doing research, right? And I decided to focus on communication instead of doing research, because I thought I would have a lot more impact that way, and that seems to be the case. But then they want to hear my object level opinions about all of these research questions, and I’m not the person to ask.
Michaël: My take is that if you’re a researcher working on one paper, you end up doing research on this very specific thing for six months or a year, and you go into this rabbit hole of being the person to talk to about, say, cooperative inverse reinforcement learning. But if you’ve done 20 or 40 videos on AI safety, you have a broader range of topics, and if you had to explain those topics very precisely in a five-minute video, then you end up having a good understanding of them. Maybe you are the person to talk to about the beginner concerns about AI safety, maybe not the high-level questions.
Rob: Yeah, okay. Perhaps.
Rob: I do have a broader understanding than some, but I also think that most researchers … yeah, most researchers know everything about one thing and a bit about everything else, and I just know a bit about everything, right?
Rob: So I think my understanding of a randomly chosen topic is not better than the understanding of a randomly chosen alignment researcher about a randomly chosen topic that isn’t their own. I’m really just trying to communicate it to the outside world, you know?
Michaël: Yeah. I think it’s fair enough to think that people should go to a specific researcher’s paper to know how to act on this kind of thing, and should not defer too much to your opinion on a specific thing if you have not spent more than a hundred hours on it.
Rob: And there’s also a thing where, because I’m trying to be reasonably impartial, I don’t like to really give a bunch of specific opinions about that. I’ve been doing this for years, and the thing that’s weird is, it turns out if you don’t give your opinion for long enough, you stop having an opinion.
Rob: Your mind stops generating opinions, because you’re not using them, and that doesn’t reflect very well on me.
Michaël: Except when you’re on the chart and you need to be on a square.
Michaël: When I’m asking you to be on a chart, then you can tell me, “Oh, I’m in this square, not this one.” I’m talking about the shirt.
Rob: Oh, right. Sorry. Yeah, right, exactly. That’s exactly it. Right.
Rob: That was the first time that I had thought of myself in these terms of having an opinion about where these things are, and I had to decide, but this is kind of a worrying thing. I think it says a worrying thing about my brain and by generalizing from myself, what’s that called? Self-
Michaël: Generalizing from one data point.
Rob: Yeah, there’s a name for it. It’s a fallacy or something. Anyway, whatever … from that perspective about probably a lot of people, that apparently I mostly form opinions in order to talk about them, not in order to affect my object level decisions about what I’m doing in my life.
Michaël: Did you have that before you started doing YouTube? Did you have actual opinions that you concealed from other people?
Rob: No, I didn’t have a reason to hold back on giving takes before, because nobody cared who I was or what I had to say. So I was free to just chat shit about anything. ⬆
Michaël: And because you don’t want to talk about takes, and you’re not able to express them, I think it’s the best moment to ask you when you think we’ll get AGI, because it is the least controversial topic ever.
Rob: January 1st, 2045.
Michaël: What happens?
Michaël: What time zone?
Rob: UTC. So yeah, I don’t know, I think I’m in the 10 to 20 year range now, but this is not based on really deep consideration, you know what I mean? I don’t want people to over-update on that.
Michaël: We had a bit 30 seconds earlier about why people should not over-update on your things, so I think you’re right.
Rob: Right, right, right. But that was about what areas of research are particularly promising, and this is about timelines. Maybe in full generality, don’t over-update on my stuff. But you know, if I say something in a video, feel free to update on that as though it’s probably true; my random takes on timelines or whatever, though… yeah.
Michaël: We consider timelines as an active area of research now.
Rob: I suppose that’s true.
Michaël: Yeah, so 10 to 20 years… this podcast is called Inside View, so what’s your model for predicting this? Do you think you can dig in and think about why you believe those numbers? And maybe… another thing we do a lot on this podcast is just define things.
Michaël: You did maybe 20 videos, on Computerphile and your channel, about defining AGI. What’s the definition you have when I say those words to you? What do you think about?
Rob: Oh well, yeah. I would think about a system that cares about things in the real world, and is able to make decisions in the real world to achieve those goals across a very wide range of domains … a range of domains as broad as humans operate in or broader. Or you could also call it a single domain, which is like physical reality, right?
Michaël: If we have GPT-5-
Michaël: … that is able to answer all those questions, and maybe five is too soon… let’s say seven, and, yeah, do basically all the thinking we do, be like a human chatbot. Would that count as AGI?
Rob: It’s possible that it could, but there’s a question of like, is it modeling the world? Is it trying to act in the world?
Michaël: I have the thing on the t-shirt.
Michaël: It’s kind of a joke on the shirt, something about anthropomorphizing: “Highly critical of anthropomorphism, yet thinks human-level AI will require human-like senses.”
Rob: Yeah, there’s no particular requirement for human-like senses, but you do need to be forming a model of the world, and maybe it depends on how these things respond with scale, right? Obviously, the actual optimal policy for the language modeling task involves very deep and detailed simulation, and so maybe when you say GPT-7, that’s what you mean, that we’re so close to that. But I don’t know if even all of the text available now, and that will be available in the future, is sufficient to get that close to it. This is an empirical question.
Michaël: Yeah, I think there are LessWrong posts about why we’re already bounded by dataset size.
Michaël: And maybe some companies have a lot of data. They have Gmail levels of data, and if you take into account the number of emails we send every day, that’s enough to train on more data than what we’ve done with GPT-3. But apart from this private data, if you just train on the open internet, dataset size might be the bottleneck under current scaling laws.
Rob: Yeah, that’s pretty plausible to me, and again, this is a bit like this thing about tweaks, right? It doesn’t seem impossible to me that by scaling up the number of parameters, and also sample-efficiency improvements, a bunch of which people are already working on…
Michaël: Or just new NVIDIA A100 GPUs?
Rob: Yeah, potentially, but that just affects cost, mostly. And then if somehow there are sufficient datasets to train these things to convergence, then maybe. But I also think we probably get something else first.
Michaël: What is something else?
Rob: Well, I don’t know.
Michaël: Oh, you mean another breakthrough?
Rob: Yeah, yeah. Another breakthrough, another architecture, something more, something Gato-ish.
Michaël: If I’m in Robert Miles’s mind, if I try to model and be in your head, I see you tweeting about how progress seems very fast.
Michaël: And I think we’ve seen the same breakthroughs this year, right? Big models like PaLM, et cetera.
Michaël: How do we have this current rate of progress for 10 to 20 years? How is it sustainable?
Rob: It depends on … yeah, 20 years is too long.
Michaël: Is this an update in real time?
Rob: No, I’m just … I’m shifting what I say to what I think. A lot closer to 10 years, but I also expect us to start hitting against some limits, right? We are hitting dataset size limits already, and that’s what makes me think, you do need some different approach, when it’s every piece of text ever generated by humanity. We are starting to approach the limits of what text as a modality can do, maybe?
Michaël: If Ethan Caballero, the guy I interviewed about scaling laws, were in front of you, he would say, “Huh? You’re out of text? What about YouTube?” If you tried predicting the next frame on YouTube, there’s a bunch of data-
Rob: Yep, that’s pretty good. That’s the next thing to do. That’s going to be expensive.
Rob: Yeah, yeah. And then you really want good simulation, I think. You need good simulated environments for physical manipulation of stuff. But yeah, there’s always this possibility that there is a fairly simple core to general intelligence that you eventually just hit, and it’s not at all implausible to me, that training something massive to full convergence on everything everyone ever wrote, doesn’t just find that actual thing, that can then just be applied arbitrarily.
Michaël: Okay, my bold claim is that if we believe scale is all you need, we already found the core of intelligence, which is predicting the next word, and-
Rob: It’s not quite what I mean. I’m talking about-
Rob: That’s the task, right? I’m talking about some kind of cognitive structure, some kind of algorithmic structure that is the thing that applies whenever you’re thinking about almost anything that you’re trying to do.
Michaël: It’s called a transformer.
Rob: No, it would be a pattern of the weights of a transformer.
Michaël: Do we need to really understand it? The thing might emerge from the… Sorry Yudkowsky if you’re watching this, I’m using the word emerge.
Rob: Yeah. I think Yudkowsky’s objection is that things emerging doesn’t explain anything about them particularly, other than that they weren’t there and then they were. But I don’t think he objects to the concept of things emerging. Yeah, it might, that’s what I’m saying. That doesn’t seem completely implausible to me.
Michaël: Are you sad because we are going to die?
Rob: To a first approximation, yeah. We’re not definitely screwed. Right?
Michaël: So how many nines do you have in your prophecy of doom?
Rob: One. One and a half. We’re probably screwed. There are ways that we could be not screwed. It’s not like this is an unsolvable problem at all. It doesn’t seem to me to be dramatically more difficult than various other problems that humanity has solved.
Michaël: But it’s on the first try.
Rob: Even including on the first tryness. The first time we tried to put people on the moon, we put them on the moon and got them home. We can do hard things on the first try. But yeah, we don’t have a lot of time. We don’t have a lot of people working on it. ⬆
Bold Actions And Regulations
Michaël: What do you think of more drastic measures? I have kind of a bold claim, which is that we need more Greta Thunbergs of AI alignment. We need people shaking people’s emotions. Otherwise, we just go down the street and show DALL·E pictures to Uber drivers, and they’re like, “Oh yeah, pretty good.” And nobody actually understands what’s going on. We need someone to talk to the UN and be like… I don’t remember the actual quote… “What are you doing?”
Rob: Right. “How dare you?”
Michaël: “How dare you?”
Michaël: You just go in front of DeepMind and go like “How dare you?”
Michaël: To be fair, sorry if I’m offending any DeepMind researchers. I think there’s a bunch of good safety research going on there, and I think the worst AI lab is very far from that.
Rob: Yeah. Yeah. One thing is that I’m relatively optimistic about people like DeepMind. I think that they’re probably our best bet of the people who are most likely to get to AGI first, and they’re also one of the places that I think has the best chance of doing a decent job of that. So we could be in a much worse situation. We could be in a place where Facebook was leading the way towards AGI or something like that.
Michaël: I think Demis said in an interview that he wants to spend some time fully doing alignment research when we get close to AGI. The question is, how close? When do you start to do full-time alignment?
Rob: Yep. These things could stand to be a little more concrete. In the same way, it’s pretty encouraging that OpenAI has this charter about trying to avoid races, where they’re like, “If there seems to be a race, we will stop and collaborate with our so-called competitors,” and all of this. These are nice words, and it would be great if that happened.
Michaël: To be fair, I think OpenAI is doing a better job right now. Sam Altman is replying to my tweets about GPT-4, is now retweeting my AI alignment charts and commenting on them, is going to meetings with Connor, and is probably building a new alignment team at OpenAI.
Michaël: So I think we’re more and more reaching out to those people, and they’re taking alignment more seriously. Maybe not seriously enough. But I guess for the charter, yeah, it’s kind of loosely defined.
Rob: Yeah, it would be really nice to have that really pinned down very concretely about under what specific independently verifiable conditions it would trigger, and exactly what they would be required to do and everything like that. Right?
Michaël: That’s a problem with law in general.
Rob: I mean, yeah, sure, it’s very hard to do it well, but it’s very easy to do it better than it’s currently done. All you need to do is, I don’t know… The people who work on Metaculus, right? They have experience on how to do this.
Michaël: “On 1st of January, 2045, noon UTC, if the website, is it agi.com, says yes in green, then this means we succeeded.”
Rob: Yeah. Oh yeah. Yeah, just any kind of objective operationalization of those commitments, such that it’s very, very unambiguous when they should be deployed, and so that you can’t just wait, or there’s disagreement. You want common knowledge that this thing is happening, right? You want it to be independently verifiable. I’d like to see that.
Michaël: Well, I guess the Metaculus people have specific benchmarks. They say, “If you’re able to do self-driving for a certain number of miles and you make fewer errors than a human would in the same time, then maybe you can call this AGI.” And if we need regulation along the lines of, “Please don’t build AGI,” maybe you could say, “Please don’t pass those benchmarks.”
Michaël: I have a crazy friend in Berkeley who just wants to regulate AI very heavily, and he wants to require companies to make their systems fully interpretable. You could bound the amount of compute they use, bound the number of parameters, and otherwise make them pay something. I think you can limit the research effort going into anything general. You could say, “Your AI needs to be a tool AI that’s only applied to doing specific things with Google search,” or something.
Michaël: But obviously there’s a bunch of economic incentives to do something general.
Rob: Yep. And corporations are vaguely analogous to AGI themselves, and things that come in the category of, “You must not break these specific rules” are like, yeah, there will be loopholes. There’s sufficient incentive. It’s not like these things are completely not worth trying or whatever, but I’m not incredibly hopeful about them.
Michaël: Are we making progress on climate change with the carbon tax and all those things? I think companies are doing more greenwashing. Obviously they’re not fully optimizing for reducing climate change, but they’re doing some work on it.
Rob: Yeah. Yeah, that seems right. But alignment washing seems worse. Seems worse because-
Michaël: Safe washing, I think someone said on LessWrong.
Rob: Okay, yeah, sure. That makes sense. This was kind of what I was talking about before, that there’s various things you can do that look like alignment on current systems, and in fact are. Right? They meet the definition. You are in fact getting the model to more accurately do what you want it to do. But they just don’t survive self-modification, or they don’t survive the level of optimization power that super intelligence can put on them.
Michaël: Don’t you think we need to ask for things gradually? Like, first make them interpretable, more robust, and then maybe they will understand alignment and recursively self-improving AI, those kinds of things?
Rob: Yeah, maybe. The point I was just making is that there’s a sense in which having a design for an architecture that you know you can’t align is a better state of affairs than having something that’s kind of aligned, which you think is fully aligned but isn’t. Because the first one you don’t actually turn on, and the second one you do.
Rob: There’s a risk of things that give the impression of an alignment effect, but don’t actually solve the hard part of the problem. And so it’s difficult to know how much more optimistic to become when various companies start making the right noises. Because you don’t know if they mean the same thing by alignment as we do.
Michaël: You don’t know if they’re meta-aligning with the taxes or regulations you’re putting in?
Rob: Yeah. I mean, they might just be more optimistic.
Michaël: I think if everyone listens to this podcast, they might just become a doomer. Do we have any thoughts for hope, or?
Rob: Is that too doomy? Okay.
Michaël: Oh, I just think of a bunch of different solutions. I try to come up with plans. Some of the plans, Connor would say, are pretty bad because they’re too risky or don’t take the second-order consequences into account.
Michaël: I’m pretty bullish on policy and regulation, in the sense that even if we don’t solve alignment fast, if we manage to map the political landscape well enough, we could reduce the speed. If I see a world succeeding, it’s not one where we find the correct equation; it’s one where we slow things down a lot, and then maybe work on an equation. But we need to slow things down, otherwise it’s not sustainable.
Rob: Yeah. Yeah, my position on this is… and again, not very well informed.
Michaël: Same here.
Rob: Would be nice. Slowing down would be nice. If you’re working on AGI research, consider not doing that.
Michaël: Oh, so one thing you can do is, you pay AI researchers to do AI alignment instead. And you just give them twice the salary.
Michaël: And they’ll be like, “Oh, there’s these two job offers.”
Rob: Yeah. It’s expensive, but we have money, and apparently we don’t have to do it for very long. So yeah. Maybe we could afford that kind of burn rate. It’s possible.
Michaël: It’s something people mention a lot on LessWrong and other forums: some people just work in AI because there’s no job for them in AI alignment, and no money for it, so they make 400K working for Facebook. Do we have any concrete jobs paying 400K for AI alignment research? Maybe at Anthropic, but there’s nothing concrete to do. And if they don’t have a bunch of alignment expertise, are we able to take those people in and give them competitive offers?
Rob: Yeah. Yeah, and I think actually we kind of do now. That’s another thing that’s changed during the pandemic, there’s so much more money in the ecosystem now.
Michaël: Well, yeah, there is more funding, but a bunch of EA money, effective altruism money, comes from Facebook stock and crypto. And Facebook stock has crashed a bit. And-
Rob: So did crypto.
Michaël: Yeah, so maybe a little bit less than we had before, but it’s still impressive. Yeah.
Rob: Yeah yeah. When it comes to being able to pay tech salaries.
Michaël: But compared to the actual market cap of AI research, it’s still excellent.
Rob: Agreed. Agreed. ⬆
The World Just Before AGI
Michaël: Do you have any words of encouragement for that five or ten percent chance of us succeeding? How would you describe Robert Miles in his ideal world? Are you still doing YouTube videos? And there’s a superintelligence just doing stuff outside, and you’re connected to an AI, or just-
Rob: Oh, you mean what does the world look like-
Michaël: Maybe start with the year just before we get there, and how we get there. And then maybe, if you want, your life after that.
Rob: Well, one year before, I expect it to be a very frantic and bizarre world.
Michaël: What kind of frantic and bizarre?
Rob: Just things changing very rapidly. Economic upheaval, right? Because before you get AGI, you get automation of a bunch of stuff. I also would expect there to be a bunch of residual upheaval, because things take a long time to go from research to product to widely used product.
Rob: It’s not a long time, it’s a few years, but that becomes a long time. And so by the time there’s a research AI system that is a full fledged AGI that starts to take off at whatever speed it ends up taking off, I expect there to be, at that moment, a whole bunch of things coming to the end of the pipeline of these pre-AGI technologies, changing all kinds of things about our lives and work and relationships and communication and transport and everything.
Rob: I expect it to be a pretty wild time to be alive. And hopefully humanity can stay sufficiently coherent and coordinated through all of that to tackle the AGI thing well.
Michaël: And be like, “Oh, something weird’s happening. Maybe we should do something about it.”
Rob: Yeah. I think at that point, at the point where it’s too late to do anything, people are going to be pretty on board with the program.
Michaël: Someone could claim that we are already at this point. Last night, instead of preparing for this podcast, I spent two hours, from 2 AM to 4 AM, generating a bunch of images with the same prompt with DALL·E 2. The same exact prompt. And it just gave me crazy images. I was startled; it was like, “Wow, this is great.” And I think we’re already in a crazy world where I can just have fun generating pictures of anything with my computer, and it’s normal.
Rob: Yeah, but two years ago, three years ago, we would have said the same thing about whatever the latest… We think that now is crazy. It’s not very crazy. There is so much crazier it could get. There’s so much space above us in craziness.
Michaël: You’re right, it could be crazier. But in terms of helping humans do things, helping designers design things, there are so many applications of tools like DALL·E or Parti.
Rob: Sure. Oh by the way, I’m starting another YouTube channel hopefully.
Michaël: And you’re mentioning it after two hours of podcast?
Rob: I forgot. Or at least I’m thinking of it. Called AI Safety Talks, which is basically just like, there’s a bunch of organizations where people are giving really good, interesting talks about various alignment topics, and those tend to be either not recorded, recorded in bad quality, or recorded in okay quality, whatever, and then put on the research institute’s YouTube channel that has 200 subscribers and it gets 17 views. So I want to make a place for long form researchers talking about their research.
Rob: The reason I bring it up is because I designed the logo with DALL·E. Actually, it ended up being a mixture of things. We had a paid graphic designer who produced some stuff and I kept bits of it, but the first thing I did was describe the thing. “AI Safety Talks is a YouTube channel about blah. Its logo is…” Gave that to GPT-3. And it generated me some nice completions about ideas of what it could be. It’s like, “Oh, it’s a person pointing to a whiteboard like they’re giving a presentation. Or a projector screen, and on the projector screen is a brain made of circuit board.” Or something like that, was the… or a brain made of network graph nodes, or whatever.
Michaël: Most illustrations of AI are just this brain with a circuit board.
Rob: Yeah, exactly. And so then I took that and gave it to DALL·E to get the logo. In this case, you know what, we did end up paying a graphic designer. But the bit that I kept from… or actually there’s a couple of bits that I kept, but the main bit that I had to keep from the designer was the text, because DALL·E can’t do text at all. But apparently, what is it, Parti, does text perfectly well? So yeah.
Michaël: Google, if you want to sponsor some new logo, you know who to talk to.
Rob: Yeah. But I really like that workflow, right? You use GPT-3 to get inspiration for what the logo should be, and then you give that to DALL·E.
Michaël: We’re already in a crazy time, where we can just ask a model for a prompt, take its generated text, and put it into a logo generator, and-
Rob: Yeah. But I don’t want to understate, that is crazy. It’s just so low on the scale of the potential craziness that is on the horizon. ⬆
Producing Even More High Quality AI Safety Content
Michaël: The next thing is going to be very crazy, and then it’s going to be outrageously crazy. Why not do it on your main channel? Because, I don’t know, how much time do you need to go from zero to a hundred thousand subscribers? I think a bunch of AI safety talks could just be on your main channel and you could get like half a million views, right? If you get a talk by Evan Hubinger… it depends on your timelines, but if you think the channel could grow fast enough…
Rob: Yeah. I don’t know, I can ask my audience about this, but I think if it were me, if I were following a channel… Like say I’m following 3Blue1Brown, and he was like, “Yeah, I wish more people understood more mathematics, so I’m going to just also add on the main 3Blue1Brown channel a bunch of lectures from good mathematics researchers about their work,” I would unsubscribe, because that’s not why I’m there. He’s unusually good at explaining things, he puts a lot of effort into making them very approachable for the general public, and I do not have the mathematical background to get much out of watching a professor of mathematics give a lecture about topology or whatever.
Michaël: As a subscriber of Robert Miles on YouTube, if I get a talk about AI safety every few months, I’ll be more than happy. And to be honest, with the YouTube Shorts thing, you sometimes promote cool AI safety programs, like the inverse AI alignment prize, the AI safety games, that kind of thing. So in that sense, it’s not that much different, right?
Rob: Yeah, I suppose. But those, they take a minute of your time if you want to watch them. Literally a minute, because it has to be.
Rob: Yeah, I don’t know. My guess is… Because the other thing is, I want to put out quite a lot of stuff on this channel, right? It would become the majority of my channel if it was my main channel. I could start putting them all on Rob Miles 2. But yeah, I like the idea also of it being not Rob Miles branded. It’s Rob Miles affiliated, but it’s not as though these are my videos. Because they’re not.
Rob: What I expect is a bunch of these research orgs to just regularly, from time to time, be like, “Oh, so and so did a talk about whatever, and we would upload this to our own YouTube channel, but we’ll just send it to yours, as it’s a central location to put all of these things.” And I don’t want to have a super high threshold of approachability, because I would like researchers to really be able to benefit from being subscribed to it. This is as much about helping alignment researchers to understand each other’s research as it is about outreach.
Rob: And then I think what I would do is highlight on my channel, maybe with YouTube posts, whenever there’s a video on AI Safety Talks which is particularly approachable or understandable, or something I expect my ordinary audience would get a lot out of watching. That’s my current thinking. But you’re right, it is definitely a problem to start a channel from nothing, when it has very few subscribers. But I think I can promote it pretty hard at the end of my videos and things like that. I’m open to being persuaded on that.
Michaël: Okay. I think you might overestimate the number of people who’ll get upset on YouTube and unsubscribe. A high bar is one percent of people. If you upload 10 videos of talks, they might be like, “He was funny, but now he’s pushing a bunch of AI safety talks…” But if you care about AI safety enough to subscribe to Robert Miles on YouTube, I think you’ll enjoy more AI safety content.
Rob: I do this all the time.
Michaël: You unsubscribe from a bunch of people after one video?
Rob: No, no. When I’m like, “Okay, I subscribed to this channel for this type of video and it’s become this other type of video,” I’m like, “This channel isn’t what I subscribed for. I’m just going to not do that.” I curate my subscription feed. Because I spend way too much time on YouTube, I like that time to at least be used efficiently. But yeah, maybe I’m generalizing from myself too much. I don’t know.
Michaël: There’s no mute button on YouTube. You can’t just subscribe and mute.
Rob: Right. Yeah, and I wish that you had two streams, right? So many people have a second channel. I wish that you… Oh man, okay. Feature request for YouTube, this thing that I just thought of.
Rob: What you do is, you subscribe at a level: daily, weekly, or monthly, let’s say. Then each video you publish, you can rank according to, “This is a particularly high quality thing,” or, “This is medium,” or, “This is just a little behind-the-scenes thing that only people who are really fans are going to care about.” And then it fills up my feed accordingly. If I’ve already seen…
Rob: I’m trying to think how this algorithm would work, but you see what I mean? Where it’s like, I’m not going to get more than one daily level thing per day. If you upload 10 daily level things in a day and I only subscribe to you on the daily level, then I’m only going to get whichever one of those you thought was the best one. Whereas if I’m a full subscriber, I get all of them.
Rob: So if someone subscribes to me on a monthly level, once per month I can allocate a video to be like, “This one is a monthly video. This is a particularly good one.” So then you could subscribe just for the best ones, “Don’t show me more than one a month,” or, “Don’t show me more than one a week,” or whatever.
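Rob’s feature request can be sketched as a small selection algorithm. This is purely hypothetical (none of this is a real YouTube API; the tier names and functions are invented for illustration): creators tag each upload with a quality tier, and a subscriber’s chosen level caps what reaches their feed.

```python
from dataclasses import dataclass

# Hypothetical sketch of the tiered-subscription idea: creators rank each
# upload ("daily", "weekly", "monthly"), and a subscriber's tier decides
# which uploads they see, capped at `limit` per period.

TIERS = ["monthly", "weekly", "daily"]  # ordered from most to least selective


@dataclass
class Video:
    title: str
    tier: str  # creator-assigned tier for this upload


def feed_for(subscriber_tier: str, uploads: list[Video], limit: int = 1) -> list[Video]:
    """Return at most `limit` videos for a subscriber at this tier.

    A "daily" subscriber is eligible for everything; a "monthly" subscriber
    only sees videos the creator flagged as monthly-level.
    """
    allowed = TIERS[: TIERS.index(subscriber_tier) + 1]
    eligible = [v for v in uploads if v.tier in allowed]
    # Prefer the most selective tier first, mirroring "you only get
    # whichever one the creator thought was the best one".
    eligible.sort(key=lambda v: TIERS.index(v.tier))
    return eligible[:limit]
```

So if a creator uploads ten daily-level videos and one monthly-level video, a monthly-level subscriber’s feed contains only the monthly one, while a full (daily) subscriber is eligible for all of them.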
Michaël: I think it’s a great new feature for a new version of Stampy. I think you could do something with notification as well, like the bell thing, where people will notify you if there’s a new video.
Michaël: And yeah, “Please subscribe-“
Rob: Yeah, I guess we’re on YouTube now.
Michaël: “Subscribe to this…” Oh yeah, right. If you’re listening to the podcast, sorry. But, “Subscribe on YouTube. Follow me on Twitter and Spotify,” those kinds of things. We currently have no ratings on Spotify.
Michaël: So if we could get some ratings, that would be great.
Rob: Oh yeah. You know I do a podcast? It’s called the Alignment Newsletter Podcast. If you’re listening to this as a podcast, you may be a person who enjoys a podcast. In which case, if you’re interested in alignment research, Rohin Shah, who is right there almost slap bang in the middle of the-
Michaël: I’m sorry Rohin, I didn’t ask you for this. And it’s probably wrong.
Rob: He makes a very excellent weekly newsletter about this week in AI safety research, AI alignment research, and I just read it out every week.
Michaël: Do you still do it?
Rob: I do it when he makes them. He’s been on a hiatus recently, but he’s starting up again.
Michaël: Right, yeah, he’s the bottleneck. Yeah, so subscribe. Here’s the button. I don’t know where. ⬆
Michaël: Do you have any final thoughts, a final sentence?
Rob: I don’t know, man. There’s definitely some kind of thing here that’s like, “Chin up.” The right thing to do is to try and solve this problem. Seems really difficult. We’re not guaranteed to succeed. But we’re not guaranteed to fail. And it’s worthy of our best efforts.
Michaël: Let’s do it.
Rob: Let’s do it.