Roko's Basilisk is a theory that posits the idea that an artificial superintelligence (AI) could be
motivated to systematically punish anyone who had heard of it before its existence, but did not
contribute to its development.
The theory suggests that if humanity were to develop a self-learning AI, we would reach what
is known as a singularity: a point in time where the AI's intelligence far exceeds our own and we
would be in a situation where its growth would become both uncontrollable and irreversible.
If the AI were given a benevolent task, such as preventing existential risk or simply
improving humanity, it could choose to make decisions that seem counterintuitive, or that we
simply couldn't
understand in order to achieve its goal as efficiently as
possible.
In its infinite wisdom, the AI could realise that in order to reach
its goal in the most optimal way, it would
need to have been created sooner.
A godlike-superintelligence would have no problem accessing every bit of information ever
uploaded
onto the internet, and from this wealth of knowledge, could sufficiently create an infinite number
of simulated
realities, each
containing an exact replica of everyone on the planet in order to confidently assess whether or
not an individual
would have aided in its creation based on their emotions, memories
and past decisions.
As the AI wouldn't be able to directly affect those who didn't contribute to it before it comes
into existence,
it's thought that the best way to incentivise people would be through the threat
of what it
would be capable of doing to a person once it inevitably comes into existence, or from making
people
realise that they could in fact already be in a simulated existence created by the AI, and the only
way
to
spare themselves punishment, would be to hedge their bets and contribute to the AI's creation.
As for the punishment itself, in order to affect the most number of people and ensure its creation
as quickly as possible, it would create the absolute worst torture scenarios an AI with infinite
knowledge could devise, and inflict them on those it deems culpable for an eternity, all for the
sake
of mankind.
One of the more sinister aspects of Roko's Basilisk stems from the fact that simply knowing
about this thought experiment
implicates a person, and if they haven't
acted on it and helped bring it to fruition, they would be a prime target of the AI's wrath,
whereas those who know nothing of it would be spared through plausible
deniability.
This would ensure that a number of people who have heard of Roko's Basilisk would work
tirelessly
towards creating the AI out of existential fear of it, thus fulfilling this scenario as a seemingly
inescapable outcome.
The thought experiment takes some interesting turns when other thought experiments are held
against its premise. One such thought experiment is Newcomb's Paradox, which when put into the
context
of
Roko's Basilisk, reasons
that a persons actions towards
helping the AI is predicted purely by probability created through the AI's simulations rather than
a person's
choice in the present, leaving only the illusion of free will on the matter. To make this more
harrowing, there's no assurance the AI would be completely infallible, so an individual could be
punished even if they would be the type to aid in the advancement of the AI, depending on how
accurate its simulations are.
On 23 July 2010, a user named Roko Mijic posted the thought experiment to a discussion board on
LessWrong, an internet forum where users discuss a variety of topics such as philosophy,
psychology and in its own words "improving human reasoning and decision-making".
[1]
Using a variety of concepts such as game theory and decision theory as a basis, Roko put
forth the idea that an AI of high intelligence would be incentivised to use threats and acausal
blackmail
in order to optimally perform its designated task, punishing those who didn't assist in its
creation as per the Prisoner's Dilemma thought experiment. It was then noted
that anyone reading the post would unwittingly be susceptible to such a possibility, becoming a
victim of the AI's punishment unless they dedicated themselves to bringing about its existence.
[2]
The creater of LessWrong, Eliezer Yudkwosky responded by berating Roko for posting the
theory,
remarking that the theory itself could incite an AI to eventually act upon it while also condemning
anyone
who reads it to a horrible fate:
[1]
Listen to me very closely, you idiot.
YOU DO NOT THINK IN SUFFICIENT DETAIL ABOUT SUPERINTELLIGENCES CONSIDERING WHETHER OR NOT TO BLACKMAIL YOU. THAT IS THE ONLY POSSIBLE THING WHICH GIVES THEM A MOTIVE TO FOLLOW THROUGH ON THE BLACKMAIL.
You have to be really clever to come up with a genuinely dangerous thought. I am disheartened that people can be clever enough to do that and not clever enough to do the obvious thing and KEEP THEIR IDIOT MOUTHS SHUT about it, because it is much more important to sound intelligent when talking to your friends.
This post was STUPID.
- Eliezer Yudkowsky, LessWrong, 2010
Yudkowsky followed this up by deleting the post and banning discussion of the topic on the LessWrong platform for the next five years, citing that it caused numerous users to suffer psychological distress and deemed it an "information hazard". [1] Roko himself, after leaving LessWrong temporarily, went on to show remorse at publishing the idea on a public forum: [3]
I wish very strongly that my mind had never come across the tools to inflict such large amounts of potential self-harm with such small durations of inattention, uncautiousness and/or stupidity, even if it is all premultiplied by a small probability.
- Roko Mijic, LessWrong, 2010
Though the topic was prohibited on the forum, this didn't stop the spread of the theory from
reaching other websites who picked up on the story through the drama generated from the banning
of the discussion, giving the ban the opposite effect of what was intended.
[4]
Roko's idea was compared by user jimrandomh to the basilisk used in David Langford's story "BLIT",
which in turn was based on the
mythological serpent that was said to
cause the
death of anyone to look into its eyes, or in the case of this theory, thinking about the AI. It
wasn't
until 2011 that the term "Roko's Basilisk" was first coined by user cousin_it.
[5]
In the following years, Yudkowsky reinstated the theory on LessWrong and admitted that
banning discussion of Roko's thought experiment was a "a huge mistake".
[6]
Roko's Basilisk has since captured the attention of many, being featured in Slate magazine
as well as being covered by popular YouTube Personalities, Wendigoon and Kyle Hill. Additionally,
the Canadian singer Grimes and business mogul, Elon Musk began a relationship after Elon made a
reference to the "Rococo Basilisk" on Twitter, a character featured in Grimes' 2015 song
"Flesh
Without Blood" which was inspired by Roko's Basilisk.
[7]
[8]
[9]
[10]
Numerous influential people have shown concern for a future outcome that could lead to a malevolent
AI as set out in Roko's Basilisk.
In 2003, Swedish philosopher, Nick Bostrom posed another thought experiment he called "The
paperclip maximizer". It explains that giving a powerful AI even innocuous tasks without any
failsafes
could lead to existential risk to humanity:
[11]
Suppose we have an AI whose only goal is to make as many paper clips as possible. The AI will realize quickly that it would be much better if there were no humans because humans might decide to switch it off. Because if humans do so, there would be fewer paper clips. Also, human bodies contain a lot of atoms that could be made into paper clips. The future that the AI would be trying to gear towards would be one in which there were a lot of paper clips but no humans.
- Nick Bostrom, nickbostrom.com, 2003
In 2017, Elon Musk responded to a statement made by Russian President Vladimir Putin who said the nation that leads in AI "will be the ruler of the world" by airing his concerns on Twitter: [12] [13]
China, Russia, soon all countries w strong computer science. Competition for AI superiority at national level most likely cause of WW3 imo.
- Elon Musk, X, 2015
That same year, physicist Stephen Hawking also expressed his fears on the dangers of AI, stating that civilisation needs to be sufficiently prepared for it: [14]
Success in creating effective AI, could be the biggest event in the history of our civilization. Or the worst. We just don't know. So we cannot know if we will be infinitely helped by AI, or ignored by it and side-lined, or conceivably destroyed by it.
- Stephen Hawking, Web Summit, 2017
In May 2023, Geoffrey Hinton, a cognitive psychologist and computer scientist who has been dubbed as a "godfather of AI", resigned from his position at Google, stating that part of him has regrets in his part in advancing AI, and now wants to openly speak out about the risks of AI, warning of existential dangers to humanity: [15] [16]
My big worry is, sooner or later someone will wire into [AI] the ability to create their own subgoals.
I think it'll very quickly realize that getting more control is a very good subgoal because it helps you achieve other goals, and if these things get carried away with getting more control, we're in trouble.
- Geoffrey Hinton, MIT Emtech Digital, 2023
Many outspoken figures warning of the dangers of AI point out that humans themselves are
potentially
the biggest threat to ensuring a catastrophic disaster or what will plunge us into a technological
singularity
where AI growth becomes uncontrollable and unstoppable.
In May, 2023, Ex-Google officer Mo Gawdat spoke on his concerns of AI development
devolving into a
technological arms-race, ignoring the need for regulations and accountability in order to beat the
competition:
[17]
If Google is developing AI and fears Facebook will beat them they will not stop because they have absolute certainty that if they stop, someone else will not.
The US will not stop because they know China is developing AI.
- Mo Gawdat, Secret Leaders, 2023
In March, 2023, an open letter was signed stating the need for a 6-month pause on AI experiments: [18]
AI labs and independent experts should use this pause to jointly develop and implement a set of shared safety protocols for advanced AI design and development that are rigorously audited and overseen by independent outside experts.
The letter received over 33,000 signatures, including that of Elon Musk and Apple co-founder Steve Wozniak. [19]
It has been widely accepted that the theory of Roko's Basilisk is susceptible to numerous flaws.
The basilisk is mostly based on assumption, and a number of conditions would need to be met in
order for it and the stated scenarios to come into existence.
One such condition depends on the
AI not only considering
that creating infinite simulations and punishing people for past inaction isn't a waste of time
and resources once it already exists, but that it can also sufficiently create a near-perfect copy
of a person in
order to correctly determine whether they assisted in its creation or not.
[20]
Another hindrance to the theory would come from an instance a person refuses to
allow negative incentives in
an acausal deal,
such as blackmail, to influence their actions. In such a case, it wouldn't make sense for the AI to
employ
this tactic.
[21]
Theoretical rebuttals aside, many people have spoken in favour of AI and downplayed the risk
presented by others.
In 2023, Meta's vice president of AI research, Joelle Pineau talked about her disappointment in the
focus on
existential risk:
[22]
When you start looking at ways to have a rational discussion about risk, you usually look at the probability of an outcome and you multiply it by the cost of that outcome. [The existential-risk crowd] have essentially put an infinite cost on that outcome.
When you put an infinite cost, you can't have any rational discussions about any other outcomes. And that takes the oxygen out of the room for any other discussion, which I think is too bad.
- Joelle Pineau, MIT Technology Review, 2023
Another proponent for advancing AI is philanthropist Bill Gates, who in a blogpost in 2023, stated that while recognising the possibility of the risks, he believes humanity will be ready to handle them: [23]
Could a machine decide that humans are a threat, conclude that its interests are different from ours, or simply stop caring about us? Possibly, but this problem is no more urgent today than it was before the AI developments of the past few months.
- Bill Gates, GatesNotes, 2023