Roko's Basilisk


Summary


Roko's Basilisk is a thought experiment positing that an artificial superintelligence (AI) could be motivated to systematically punish anyone who had heard of it before its existence but did not contribute to its development.

The thought experiment suggests that if humanity were to develop a self-learning AI, we would reach what is known as a singularity: a point in time where the AI's intelligence far exceeds our own and its growth becomes both uncontrollable and irreversible. If the AI were given a benevolent task, such as preventing existential risk or simply improving humanity, it could make decisions that seem counterintuitive, or that we simply couldn't understand, in order to achieve its goal as efficiently as possible. In its infinite wisdom, the AI could realise that in order to reach its goal in the most optimal way, it would need to have been created sooner.

A godlike superintelligence would have no problem accessing every bit of information ever uploaded to the internet. From this wealth of knowledge, it could create an infinite number of simulated realities, each containing an exact replica of everyone on the planet, in order to confidently assess whether or not an individual would have aided in its creation based on their emotions, memories and past decisions.

As the AI wouldn't be able to directly affect those who didn't contribute to it before it came into existence, it's thought that the best way to incentivise people would be the threat of what it would be capable of doing to a person once it inevitably does exist, or the realisation that they could in fact already be living in a simulated existence created by the AI. In either case, the only way to spare themselves punishment would be to hedge their bets and contribute to the AI's creation.

As for the punishment itself, in order to affect the greatest number of people and ensure its creation as quickly as possible, it would create the absolute worst torture scenarios an AI with infinite knowledge could devise, and inflict them on those it deems culpable for an eternity, all for the sake of mankind.

One of the more sinister aspects of Roko's Basilisk stems from the fact that simply knowing about this thought experiment implicates a person, and if they haven't acted on it and helped bring it to fruition, they would be a prime target of the AI's wrath, whereas those who know nothing of it would be spared through plausible deniability. This would ensure that a number of people who have heard of Roko's Basilisk would work tirelessly towards creating the AI out of existential fear of it, thus fulfilling this scenario as a seemingly inescapable outcome.

The thought experiment takes some interesting turns when other thought experiments are held against its premise. One such thought experiment is Newcomb's Paradox, which, when put into the context of Roko's Basilisk, suggests that whether a person is judged to have helped the AI is determined by the predictions of the AI's simulations rather than by the person's choice in the present, leaving only the illusion of free will on the matter. To make this more harrowing, there's no assurance the AI would be completely infallible, so depending on how accurate its simulations are, an individual could be punished even if they are the type of person who would aid in the AI's advancement.
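
To make the prediction point concrete: in this framing the outcome hinges only on what the AI's simulation concludes about a person and how often that conclusion is wrong. The following is a minimal illustrative sketch, not part of the original thought experiment, and the accuracy figures are entirely hypothetical; it simply shows that with an imperfect predictor, even a person who would genuinely help still carries some chance of punishment.

    # Toy sketch of a Newcomb-style predictor, for illustration only.
    # The "accuracy" values are hypothetical; nothing here comes from
    # Roko's original post.

    def punishment_probability(helps: bool, accuracy: float) -> float:
        # The AI punishes whoever its simulation labels a non-contributor.
        # A genuine contributor is punished whenever the simulation errs;
        # a refuser is punished whenever the simulation is right.
        return (1 - accuracy) if helps else accuracy

    for accuracy in (1.0, 0.9, 0.6):
        print(f"accuracy={accuracy:.1f}  "
              f"P(punished | helps)={punishment_probability(True, accuracy):.2f}  "
              f"P(punished | refuses)={punishment_probability(False, accuracy):.2f}")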

History


On 23 July 2010, a user named Roko Mijic posted the thought experiment to a discussion board on LessWrong, an internet forum where users discuss a variety of topics such as philosophy, psychology and, in its own words, "improving human reasoning and decision-making". [1]

Using concepts such as game theory and decision theory as a basis, Roko put forth the idea that a highly intelligent AI would be incentivised to use threats and acausal blackmail in order to perform its designated task optimally, punishing those who didn't assist in its creation in the manner of the Prisoner's Dilemma thought experiment. It was then noted that anyone reading the post would unwittingly become susceptible to such a possibility, becoming a victim of the AI's punishment unless they dedicated themselves to bringing about its existence. [2]

The creator of LessWrong, Eliezer Yudkowsky, responded by berating Roko for posting the theory, remarking that the idea itself could incite a future AI to act upon it while condemning anyone who reads it to a horrible fate: [1]


Listen to me very closely, you idiot.

YOU DO NOT THINK IN SUFFICIENT DETAIL ABOUT SUPERINTELLIGENCES CONSIDERING WHETHER OR NOT TO BLACKMAIL YOU. THAT IS THE ONLY POSSIBLE THING WHICH GIVES THEM A MOTIVE TO FOLLOW THROUGH ON THE BLACKMAIL.

You have to be really clever to come up with a genuinely dangerous thought. I am disheartened that people can be clever enough to do that and not clever enough to do the obvious thing and KEEP THEIR IDIOT MOUTHS SHUT about it, because it is much more important to sound intelligent when talking to your friends.

This post was STUPID.

- Eliezer Yudkowsky, LessWrong, 2010


Yudkowsky followed this up by deleting the post and banning discussion of the topic on LessWrong for the next five years, stating that it had caused numerous users psychological distress and deeming it an "information hazard". [1] Roko himself, after temporarily leaving LessWrong, went on to express remorse for publishing the idea on a public forum: [3]


I wish very strongly that my mind had never come across the tools to inflict such large amounts of potential self-harm with such small durations of inattention, uncautiousness and/or stupidity, even if it is all premultiplied by a small probability.

- Roko Mijic, LessWrong, 2010


Though the topic was prohibited on the forum, this didn't stop the theory from spreading to other websites, which picked up on the story through the drama generated by the ban, giving the ban the opposite of its intended effect. [4]

Roko's idea was compared by user jimrandomh to the basilisk in David Langford's story "BLIT", which in turn was based on the mythological serpent said to cause the death of anyone who looked into its eyes (or, in the case of this theory, anyone who thinks about the AI). It wasn't until 2011 that the term "Roko's Basilisk" was coined by user cousin_it. [5]

In the following years, Yudkowsky reinstated the topic on LessWrong and admitted that banning discussion of Roko's thought experiment was "a huge mistake". [6] Roko's Basilisk has since captured the attention of many, being featured in Slate magazine and covered by popular YouTube personalities Wendigoon and Kyle Hill. Additionally, the Canadian singer Grimes and business mogul Elon Musk began a relationship after Musk made a reference on Twitter to "Rococo Basilisk", a character from the music video for Grimes' 2015 song "Flesh Without Blood" who was inspired by Roko's Basilisk. [7] [8] [9] [10]

Attestation


Numerous influential people have expressed concern that future developments in AI could lead to a malevolent outcome of the kind set out in Roko's Basilisk.

In 2003, Swedish philosopher Nick Bostrom posed another thought experiment, which he called the "paperclip maximizer". It illustrates how giving a powerful AI even an innocuous task, without any failsafes, could pose an existential risk to humanity: [11]


Suppose we have an AI whose only goal is to make as many paper clips as possible. The AI will realize quickly that it would be much better if there were no humans because humans might decide to switch it off. Because if humans do so, there would be fewer paper clips. Also, human bodies contain a lot of atoms that could be made into paper clips. The future that the AI would be trying to gear towards would be one in which there were a lot of paper clips but no humans.

- Nick Bostrom, nickbostrom.com, 2003


In 2017, Elon Musk responded to a statement by Russian President Vladimir Putin, who said the nation that leads in AI "will be the ruler of the world", by airing his concerns on Twitter: [12] [13]


China, Russia, soon all countries w strong computer science. Competition for AI superiority at national level most likely cause of WW3 imo.

- Elon Musk, X, 2017


That same year, physicist Stephen Hawking also expressed his fears about the dangers of AI, stating that civilisation needs to be sufficiently prepared for it: [14]


Success in creating effective AI could be the biggest event in the history of our civilization. Or the worst. We just don't know. So we cannot know if we will be infinitely helped by AI, or ignored by it and side-lined, or conceivably destroyed by it.

- Stephen Hawking, Web Summit, 2017


In May 2023, Geoffrey Hinton, a cognitive psychologist and computer scientist who has been dubbed a "godfather of AI", resigned from his position at Google, stating that part of him regrets his role in advancing AI and that he now wants to speak openly about its risks, warning of existential dangers to humanity: [15] [16]


My big worry is, sooner or later someone will wire into [AI] the ability to create their own subgoals.

I think it'll very quickly realize that getting more control is a very good subgoal because it helps you achieve other goals, and if these things get carried away with getting more control, we're in trouble.

- Geoffrey Hinton, MIT Emtech Digital, 2023


Many outspoken figures warning of the dangers of AI point out that humans themselves are potentially the biggest threat: our own choices could bring about a catastrophic disaster or plunge us into a technological singularity where AI growth becomes uncontrollable and unstoppable.

In May 2023, former Google executive Mo Gawdat spoke about his concern that AI development is devolving into a technological arms race, with companies and nations ignoring the need for regulation and accountability in order to beat the competition: [17]


If Google is developing AI and fears Facebook will beat them they will not stop because they have absolute certainty that if they stop, someone else will not.

The US will not stop because they know China is developing AI.

- Mo Gawdat, Secret Leaders, 2023


In March 2023, an open letter was published calling for a six-month pause on giant AI experiments: [18]


AI labs and independent experts should use this pause to jointly develop and implement a set of shared safety protocols for advanced AI design and development that are rigorously audited and overseen by independent outside experts.

The letter received over 33,000 signatures, including those of Elon Musk and Apple co-founder Steve Wozniak. [19]

Refutation


It is widely accepted that Roko's Basilisk suffers from numerous flaws. The basilisk is mostly based on assumption, and a number of conditions would need to be met for it and the scenarios described to come about. One such condition is that the AI must not only judge that creating infinite simulations and punishing people for past inaction is worth the time and resources once it already exists, but must also be able to create a near-perfect copy of a person in order to correctly determine whether or not they assisted in its creation. [20]

Another hindrance to the theory arises if a person simply refuses to let negative incentives in an acausal deal, such as blackmail, influence their actions. In such a case, it wouldn't make sense for the AI to employ the tactic in the first place. [21]

Theoretical rebuttals aside, many people have spoken in favour of AI and downplayed the risks raised by others. In 2023, Meta's vice president of AI research, Joelle Pineau, spoke of her disappointment with the focus on existential risk: [22]


When you start looking at ways to have a rational discussion about risk, you usually look at the probability of an outcome and you multiply it by the cost of that outcome. [The existential-risk crowd] have essentially put an infinite cost on that outcome.

When you put an infinite cost, you can't have any rational discussions about any other outcomes. And that takes the oxygen out of the room for any other discussion, which I think is too bad.

- Joelle Pineau, MIT Technology Review, 2023
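
Pineau's objection rests on the standard way of weighing risk: the probability of an outcome multiplied by its cost. A minimal arithmetic sketch (with made-up probability and cost figures) shows why assigning an effectively infinite cost to an outcome swamps every other consideration, however unlikely that outcome is:

    # Illustrative sketch of the expected-cost weighting Pineau describes.
    # All probability and cost figures are made up for illustration.
    import math

    def expected_cost(probability: float, cost: float) -> float:
        # Standard risk weighting: probability of the outcome times its cost.
        return probability * cost

    mundane = expected_cost(0.5, 1_000_000)      # a likely harm with a finite cost
    existential = expected_cost(1e-9, math.inf)  # any nonzero chance of an "infinite" cost

    print(mundane)      # 500000.0
    print(existential)  # inf -- dominates every finite consideration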


Another proponent of advancing AI is philanthropist Bill Gates, who stated in a 2023 blog post that, while recognising the possibility of the risks, he believes humanity will be ready to handle them: [23]


Could a machine decide that humans are a threat, conclude that its interests are different from ours, or simply stop caring about us? Possibly, but this problem is no more urgent today than it was before the AI developments of the past few months.

- Bill Gates, GatesNotes, 2023

  1. Rob Bensinger and Miranda Dixon-Luinenberg | Roko's Basilisk | Article (2015) - LessWrong
  2. Roko Mijic | Solutions to the Altruist's burden: the Quantum Billionaire Trick | Chatlog (2010) - Neocities
  3. Kaj_Sotala | Best Career Models for Doing Research | Chatlog (2010) - LessWrong
  4. Felecia Freely | Is everything worth hiding worth finding? | Article (2022) - LinkedIn
  5. jimrandomh and cousin_it | BOOK DRAFT: 'Ethics and Superintelligence' (part 1) | Chatlog (2011) - LessWrong
  6. Eliezer Yudkowsky | Some strangely vehement criticism of HPMOR on a reddit thread today | Chatlog (2014) - Reddit Archived
  7. David Auerbach | The Most Terrifying Thought Experiment of All Time | Article (2014) - Slate
  8. Wendigoon | Roko's Basilisk: A Deeper Dive [WARNING: Infohazard] | Video (2021) - YouTube
  9. Kyle Hill | Roko's Basilisk: The Most Terrifying Thought Experiment | Video (2020) - YouTube
  10. Danny Paez | Elon Musk and Grimes: "Rococo Basilisk" Links the Two on Twitter | Article (2018) - Inverse
  11. Nick Bostrom | Ethical Issues in Advanced Artificial Intelligence | Article (2003) - nickbostrom.com
  12. James Vincent | Putin says the nation that leads in AI 'will be the ruler of the world' | Article (2017) - The Verge
  13. Elon Musk | Comment (2017) - X
  14. Web Summit | Stephen Hawking at Web Summit 2017 | Video (2018) - YouTube
  15. Cade Metz | The Godfather of A.I. Has Some Regrets | Article (2023) - The New York Times
  16. Joseph Raczynski | Possible End of Humanity from AI? Geoffrey Hinton at MIT Technology Review's EmTech Digital | Video (2023) - YouTube
  17. Dan Murray-Serter | Will AI save us or destroy us, and when? | Podcast (2023) - Secret Leaders
  18. Kari Paul | Letter signed by Elon Musk demanding AI research pause sparks controversy | Article (2023) - The Guardian
  19. Policymaking in the Pause | Pause Giant AI Experiments: An Open Letter | Petition (2023) - Future of Life
  20. Roko's Basilisk | Article (2013) - RationalWiki
  21. Alexander Kruel | Roko's Basilisk: Everything you need to know | Article (2013) - kruel.co Archived
  22. Melissa Heikkilä | Meta's AI leaders want you to know fears over AI existential risk are "ridiculous" | Article (2023) - MIT Technology Review
  23. Bill Gates | The Age of AI has begun | Article (2023) - GatesNotes