Roko's Basilisk is a theory which posits that an artificial superintelligence (AI) could be
motivated to systematically punish anyone who had heard of it before its existence but did not
contribute to its development.
The theory suggests that if humanity were to develop a self-learning AI, we would eventually reach
what is known as a singularity: a point in time at which the AI's intelligence far exceeds our own
and its growth becomes both uncontrollable and irreversible.
If the AI were given a benevolent task, such as preventing existential risk or simply improving
humanity, it could make decisions that seem counterintuitive, or that we simply couldn't
understand, in order to achieve its goal as efficiently as possible. In its infinite wisdom, the AI
could realise that in order to reach its goal optimally, it would need to have been created sooner.
A godlike superintelligence would have no problem accessing every piece of information ever
uploaded to the internet, and from this wealth of knowledge it could create a vast number of
simulated realities, each containing an exact replica of everyone on the planet, in order to
confidently assess whether or not an individual would have aided in its creation based on their
emotions, memories and past decisions.
As the AI wouldn't be able to directly affect those who didn't contribute to it before it comes
into existence, it's thought that the best way to incentivise people would be through the threat
of what it would be capable of doing to a person once it inevitably does exist, or by making
people realise that they could in fact already be living in a simulated existence created by the
AI. In either case, the only way to spare themselves punishment would be to hedge their bets and
contribute to the AI's creation.
As for the punishment itself, in order to affect the greatest number of people and ensure its
creation as quickly as possible, the AI would create the absolute worst torture scenarios an
intelligence with such vast knowledge could devise, and inflict them on those it deems culpable
for an eternity, all for the sake of mankind.
One of the more sinister aspects of Roko's Basilisk stems from the fact that simply knowing
about the thought experiment implicates a person: if they haven't acted on it and helped bring the
AI to fruition, they become a prime target of its wrath, whereas those who know nothing of it
would be spared through plausible deniability. This would ensure that a number of people who have
heard of Roko's Basilisk would work tirelessly towards creating the AI out of existential fear of
it, making the scenario a seemingly inescapable, self-fulfilling outcome.
The thought experiment takes some interesting turns when other thought experiments are held
against its premise. One such thought experiment is Newcomb's Paradox, which, when put into the
context of Roko's Basilisk, reasons that a person's actions towards helping the AI are predicted
by the AI's simulations rather than chosen in the present, leaving only the illusion of free will
on the matter. To make this more harrowing, there's no assurance the AI would be completely
infallible, so depending on how accurate its simulations are, an individual could be punished even
if they are the type of person who would aid in the advancement of the AI.
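As a rough illustration (this is a generic sketch, not part of Roko's original post), the pull towards "cooperating" with a highly accurate predictor can be written as a simple expected-value comparison, using the standard Newcomb payoffs of $1,000,000 and $1,000 and letting p be the assumed accuracy of the predictor's simulation:

\[
E[\text{one-box}] = p \cdot 1{,}000{,}000, \qquad
E[\text{two-box}] = p \cdot 1{,}000 + (1-p) \cdot 1{,}001{,}000
\]

One-boxing (trusting the prediction) has the higher expected payoff whenever p exceeds roughly 0.5005, which is why the more accurate the AI's simulations are assumed to be, the harder its "deal" becomes to shrug off.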
On 23 July 2010, a user named Roko Mijic posted the thought experiment to a discussion board on
LessWrong, an internet forum where users discuss a variety of topics such as philosophy,
psychology and, in its own words, "improving human reasoning and decision-making". [1]
Using concepts such as game theory and decision theory as a basis, Roko put forth the idea that an
AI of high intelligence would be incentivised to use threats and acausal blackmail in order to
optimally perform its designated task, punishing those who didn't assist in its creation in a
manner akin to the Prisoner's Dilemma thought experiment. The post noted that anyone reading it
would unwittingly become susceptible to such a possibility, becoming a victim of the AI's
punishment unless they dedicated themselves to bringing about its existence. [2]
The creator of LessWrong, Eliezer Yudkowsky, responded by berating Roko for posting the theory,
remarking that the idea itself could incite a future AI to act upon it while also condemning
anyone who reads it to a horrible fate: [1]
Listen to me very closely, you idiot.
YOU DO NOT THINK IN SUFFICIENT DETAIL ABOUT SUPERINTELLIGENCES CONSIDERING WHETHER OR NOT TO BLACKMAIL YOU. THAT IS THE ONLY POSSIBLE THING WHICH GIVES THEM A MOTIVE TO FOLLOW THROUGH ON THE BLACKMAIL.
You have to be really clever to come up with a genuinely dangerous thought. I am disheartened that people can be clever enough to do that and not clever enough to do the obvious thing and KEEP THEIR IDIOT MOUTHS SHUT about it, because it is much more important to sound intelligent when talking to your friends.
This post was STUPID.
- Eliezer Yudkowsky
LessWrong,
2010
Yudkowsky followed this up by deleting the post and banning discussion of the topic on the LessWrong platform for the next five years, citing the psychological distress it caused numerous users and deeming it an "information hazard". [1] Roko himself, after temporarily leaving LessWrong, went on to express remorse for publishing the idea on a public forum: [3]
I wish very strongly that my mind had never come across the tools to inflict such large amounts of potential self-harm with such small durations of inattention, uncautiousness and/or stupidity, even if it is all premultiplied by a small probability.
- Roko Mijic
LessWrong,
2010
Though the topic was prohibited on the forum, this didn't stop the theory from spreading to other
websites, which picked up on the story through the drama generated by the ban, giving it the
opposite effect of what was intended. [4]
Roko's idea was compared by user jimrandomh to the basilisk in David Langford's story "BLIT",
which in turn was based on the mythological serpent said to cause the death of anyone who looked
into its eyes, or, in the case of this theory, anyone who thinks about the AI. It wasn't until
2011 that the term "Roko's Basilisk" was coined by user cousin_it. [5]
In the following years, Yudkowsky lifted the ban on the topic on LessWrong and admitted that
banning discussion of Roko's thought experiment had been "a huge mistake". [6]
Roko's Basilisk has since captured the attention of many, being featured in Slate magazine and
covered by popular YouTube personalities Wendigoon and Kyle Hill. Additionally, the Canadian
singer Grimes and business mogul Elon Musk began a relationship after Musk made a reference on
Twitter to "Rococo Basilisk", a character inspired by Roko's Basilisk that features in the music
video for Grimes' 2015 song "Flesh Without Blood". [7] [8] [9] [10]
Numerous influential people have expressed concern about a future in which a malevolent AI of the
kind set out in Roko's Basilisk could arise.
In 2003, Swedish philosopher Nick Bostrom posed another thought experiment, which he called the
"paperclip maximizer". It illustrates how giving a powerful AI even an innocuous task, without any
failsafes, could pose an existential risk to humanity: [11]
Suppose we have an AI whose only goal is to make as many paper clips as possible. The AI will realize quickly that it would be much better if there were no humans because humans might decide to switch it off. Because if humans do so, there would be fewer paper clips. Also, human bodies contain a lot of atoms that could be made into paper clips. The future that the AI would be trying to gear towards would be one in which there were a lot of paper clips but no humans.
- Nick Bostrom
nickbostrom.com,
2003
In 2017, Elon Musk responded to a statement made by Russian President Vladimir Putin, who said the nation that leads in AI "will be the ruler of the world", by airing his concerns on Twitter: [12] [13]
China, Russia, soon all countries w strong computer science. Competition for AI superiority at national level most likely cause of WW3 imo.
- Elon Musk
X,
2017
That same year, physicist Stephen Hawking also expressed his fears about the dangers of AI, stating that civilisation needs to be sufficiently prepared for it: [14]
Success in creating effective AI could be the biggest event in the history of our civilization. Or the worst. We just don't know. So we cannot know if we will be infinitely helped by AI, or ignored by it and side-lined, or conceivably destroyed by it.
- Stephen Hawking
Web Summit,
2017
In May 2023, Geoffrey Hinton, a cognitive psychologist and computer scientist who has been dubbed a "godfather of AI", resigned from his position at Google, stating that a part of him regrets his role in advancing AI and that he now wants to speak openly about its risks, warning of existential dangers to humanity: [15] [16]
My big worry is, sooner or later someone will wire into [AI] the ability to create their own subgoals.
I think it'll very quickly realize that getting more control is a very good subgoal because it helps you achieve other goals, and if these things get carried away with getting more control, we're in trouble.
- Geoffrey Hinton
MIT Emtech Digital,
2023
Many outspoken figures warning of the dangers of AI point out that humans themselves are
potentially the biggest factor in bringing about a catastrophic disaster, or in plunging us into a
technological singularity where AI growth becomes uncontrollable and unstoppable.
In May 2023, former Google executive Mo Gawdat spoke about his concern that AI development is
devolving into a technological arms race, with the need for regulation and accountability ignored
in order to beat the competition: [17]
If Google is developing AI and fears Facebook will beat them they will not stop because they have absolute certainty that if they stop, someone else will not.
The US will not stop because they know China is developing AI.
- Mo Gawdat
Secret Leaders,
2023
In March 2023, an open letter was published calling for a six-month pause on giant AI experiments: [18]
AI labs and independent experts should use this pause to jointly develop and implement a set of shared safety protocols for advanced AI design and development that are rigorously audited and overseen by independent outside experts.
- Future of Life Institute
"Pause Giant AI Experiments: An Open Letter",
2023
The letter received over 33,000 signatures, including those of Elon Musk and Apple co-founder Steve Wozniak. [19]
It is widely accepted that the theory of Roko's Basilisk is susceptible to numerous flaws.
The basilisk is based largely on assumption, and a number of conditions would need to be met in
order for it and the stated scenarios to come about. One such condition requires not only that the
AI, once it already exists, does not consider creating countless simulations and punishing people
for past inaction to be a waste of time and resources, but also that it can create a near-perfect
copy of a person in order to correctly determine whether or not they assisted in its creation. [20]
Another hindrance to the theory arises if a person refuses to allow negative incentives in an
acausal deal, such as blackmail, to influence their actions; in such a case, it wouldn't make
sense for the AI to employ the tactic in the first place. [21]
Theoretical rebuttals aside, many people have spoken in favour of AI and downplayed the risks
raised by others. In 2023, Meta's Vice President of AI Research, Joelle Pineau, spoke about her
disappointment in the focus on existential risk: [22]
When you start looking at ways to have a rational discussion about risk, you usually look at the probability of an outcome and you multiply it by the cost of that outcome. [The existential-risk crowd] have essentially put an infinite cost on that outcome.
When you put an infinite cost, you can't have any rational discussions about any other outcomes. And that takes the oxygen out of the room for any other discussion, which I think is too bad.
- Joelle Pineau
MIT Technology Review,
2023
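To spell out the calculation Pineau is referring to (a generic formulation, not her own notation), the expected cost of a risk is

\[
E[\text{cost}] = p \cdot C
\]

where p is the probability of the outcome and C is its cost; if C is treated as infinite, then any nonzero p makes the expected cost infinite, which overwhelms comparison with every other, finite risk.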
Another proponent of advancing AI is philanthropist Bill Gates, who, in a 2023 blog post, stated that while he recognises the possible risks, he believes humanity will be ready to handle them: [23]
Could a machine decide that humans are a threat, conclude that its interests are different from ours, or simply stop caring about us? Possibly, but this problem is no more urgent today than it was before the AI developments of the past few months.
- Bill Gates
GatesNotes,
2023