Abstract
Since OpenAI’s release of the very large language models ChatGPT and GPT-4, the potential dangers of AI have garnered widespread public attention. In this essay, the author reviews the threats to democracy posed by the possibility of “rogue AIs,” dangerous and powerful AIs that would execute harmful goals, irrespective of whether the outcomes are intended by humans. To mitigate the risk that rogue AIs present to democracy and geopolitical stability, the author argues that research into safe and defensive AIs should be conducted by a multilateral, international network of research laboratories.
How should we think about the advent of formidable and even superhuman artificial intelligence (AI) systems? Should we embrace them for their potential to enhance and improve our lives or fear them for their potential to disempower and possibly even drive humanity to extinction? In 2023, these once-marginal questions captured the attention of media, governments, and everyday citizens after OpenAI released ChatGPT and then GPT-4, stirring a whirlwind of controversy and leading the Future of Life Institute to publish an open letter in March.1 That letter, which I cosigned along with numerous experts in the field of AI, called for a temporary halt in the development of even more potent AI systems to allow more time for scrutinizing the risks that they could pose to democracy and humanity, and to establish regulatory measures for ensuring the safe development and deployment of such systems. Two months later, Geoffrey Hinton and I, who together with Yann LeCun won the 2018 Turing Award for our seminal contributions to deep learning, joined CEOs of AI labs, top scientists, and many others to endorse a succinct declaration: “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”2 LeCun, who works for Meta, publicly disagreed with these statements.
This disagreement reflects a spectrum of views among AI researchers about the potential dangers of advanced AI. What should we make of the diverging opinions? At a minimum, they signal great uncertainty. Given the high stakes, this is reason enough for ramping up research to better understand the possible risks. And while experts and stakeholders often talk about these risks in terms of trajectories, probabilities, and the potential impact on society, we must also consider some of the underlying motivations at play, such as the commercial interests of industry and the psychological challenge for researchers like myself of accepting that their research, historically seen by many as positive for humankind, might actually cause severe social harm.3
There are serious risks that could come from the construction of increasingly powerful AI systems. And progress is speeding up, in part because developing these transformative technologies is lucrative—a veritable gold rush that could amount to quadrillions of dollars (see the calculation in Stuart Russell’s book on pp. 98–99).4 Since deep learning transitioned from a purely academic endeavor to one that also has strong commercial interests about a decade ago, questions have arisen about the ethics and societal implications of AI—in particular, who develops it, for whom and for what purposes, and with what potential consequences? These concerns led to the development in 2017 of the Montreal Declaration for the Responsible Development of AI and the drafting of the Asilomar AI Principles, both of which I was involved with, followed by many more, including the OECD AI Principles (2019) and the UNESCO Recommendation on the Ethics of Artificial Intelligence (2021).5
Modern AI systems are trained to perform tasks in a way that is consistent with observed data. Because those data will often reflect social biases, these systems themselves may discriminate against already marginalized or disempowered groups.6 Awareness of such issues has given rise to subfields of research (for example, AI fairness and AI ethics) as well as to machine-learning methods that mitigate such problems. But these issues are far from resolved: there is little representation of discriminated-against groups among the AI researchers and tech companies developing AI systems, and there is currently no regulatory framework to better protect human rights. Another concern that is highly relevant to democracy is the serious possibility that, in the absence of regulation, power and wealth will concentrate in the hands of a few individuals, companies, and countries due to the growing power of AI tools. Such concentration could come at the expense of workers, consumers, market efficiency, and global safety, and would involve the use of personal data that people freely hand over on the internet without necessarily understanding the implications of doing so.7 In the extreme, a few individuals controlling superhuman AIs would accrue a level of power never before seen in human history, a blatant contradiction of the very principle of democracy and a major threat to it.
The development of and broad access to very large language models such as ChatGPT have raised serious concerns among researchers and society as a whole about the possible social impacts of AI. The question of whether and how AI systems can be controlled, that is, guaranteed to act as intended, has been asked for many years, yet there is still no satisfactory answer—although the AI safety-research community is currently studying proposals.8
Misalignment and Categories of Harm
To understand the dangers associated with AI systems, especially the more powerful ones, it is useful to first explain the concept of misalignment. If Party A relies on Party B to achieve an objective, can Party A concisely express its expectations of Party B? Unfortunately, the answer is generally “no,” given the manifold circumstances in which Party A might want to dictate Party B’s behavior. This situation has been well studied both in economics, in contract theory, where Parties A and B could be corporations, and in the field of AI, where Party A represents a human and Party B, an AI system.9 If Party B is highly competent, it might fulfill Party A’s instructions by adhering strictly to “the letter of the law” or of the contract, but still leave Party A unsatisfied by violating “the spirit of the law” or finding a loophole in the contract. This disparity between Party A’s intention and what Party B is actually optimizing is referred to as a misalignment.
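As a concrete illustration, the sketch below is a hypothetical toy example (the scenario, names, and numbers are invented for illustration and are not drawn from this essay or its references). It shows how a Party B that faithfully maximizes the proxy objective Party A wrote down can still select an outcome that Party A, judged by its true intention, would reject.

```python
# Hypothetical toy example of misalignment: the stated proxy objective
# (the "letter") diverges from the principal's true intention (the "spirit").

# Party A's true intention: content that is both engaging and truthful.
def true_value(post):
    return post["engagement"] + 2 * post["truthfulness"]

# The objective Party A actually hands to Party B (the optimizer):
# maximize engagement alone.
def proxy_objective(post):
    return post["engagement"]

candidate_posts = [
    {"name": "balanced report", "engagement": 5, "truthfulness": 5},
    {"name": "clickbait rumor", "engagement": 9, "truthfulness": 0},
]

# A highly competent Party B optimizes exactly what it was told to optimize...
chosen_by_B = max(candidate_posts, key=proxy_objective)

# ...which is not what Party A would have chosen under its true intention.
preferred_by_A = max(candidate_posts, key=true_value)

print(chosen_by_B["name"])     # -> clickbait rumor
print(preferred_by_A["name"])  # -> balanced report
```

The gap between what Party B selects and what Party A actually wanted is the misalignment; greater competence on Party B’s part does not close it, because the proxy objective itself omits part of Party A’s intent.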
One way to categorize AI-driven harms is by considering intentionality—whether human operators are intentionally or unintentionally causing harm with AI—and the kind of misalignment involved: 1) AI used intentionally as a powerful and destructive tool—for instance, to exploit markets, generate massive frauds, influence elections through social media, design cyberattacks, or launch bioweapons—illustrating a misalignment between the malicious human operator and society; 2) AI used unintentionally as a harmful tool—for instance, systems that discriminate against women or people of color or systems that inadvertently generate political polarization—demonstrating a misalignment between the human operator and the AI; and 3) loss of control of an AI system—typically when it is given or develops a strong self-preservation goal, possibly creating an existential threat to humanity—which can happen intentionally or not, and illustrates a misalignment between the AI and both the human operator and society. Here, I focus primarily on the first and third categories, particularly on scenarios in which a powerful and dangerous AI attempts to execute harmful goals, irrespective of whether the outcomes are intended by humans. I refer to such AIs as “rogue AIs” and will discuss potential strategies for humanity to defend itself against this possibility.
Protecting Humanity from Rogue AIs
The concern about the possibility of autonomous rogue AIs in coming years or decades is hotly debated—take, for example, two contrasting views recently published in the Economist.10 Even though there is no current consensus on the most extreme risks, such as the possibility of human extinction, the absence of clear evidence against such risks (including those short of extinction) suggests that caution and further study are absolutely required.
Specifically, we should be researching countermeasures against rogue AIs. The motivation for such research arises from two considerations: First, while comprehensive and well-enforced regulation could considerably reduce the risks associated with the emergence of rogue AIs, it is unlikely to offer foolproof protection. Certain jurisdictions may lack the necessary regulation or have weaker rules in place. Individuals or organizations, such as for-profit corporations or military bodies, may skimp on safety due to competing objectives, such as commercial or military competition, which may or may not align well with the public interest. Malicious actors, such as terrorist organizations or organized crime groups, may intentionally disregard safety guidelines, while others may unintentionally fail to follow them due to negligence. Finally, regulation itself is bound to be imperfect or implemented too late, as we still lack clarity on the best methods to evaluate and mitigate the catastrophic risks posed by AI.
The second consideration behind the push for researching countermeasures is that the stakes are so high. Even if we manage to significantly reduce the probability of a rogue AI emerging, the tiniest probability of a major catastrophe—such as a nuclear war, the launch of highly potent bioweapons, or human extinction—is still unacceptable. Beyond regulation, humanity needs a Plan B.
Democracy and human rights. In the context of the possible emergence of rogue AIs, there are three reasons why we should focus specifically on the preservation, and ideally enhancement, of democracy and human rights:
While democracy and human rights are intrinsically important, they are also fragile, as evidenced repeatedly throughout history, including cases of democratic states transitioning into authoritarian ones. It is crucial that we remember the essence of democracy—that everyone has a voice—and that this involves the decentralization of power and a system of checks and balances to ensure that decisions reflect and balance the views of diverse citizens and communities.
Powerful tools, especially AI, could easily be leveraged by governments to strengthen their hold on power, for instance, through multifaceted surveillance methods such as cameras and online discourse monitoring, as well as control mechanisms such as AI-driven policing and military weapons. Naturally, a decline in democratic principles correlates with a deterioration of human rights.
Furthermore, a superhuman AI could give unprecedented power to those who control it, whether individuals, corporations, or governments, threatening democracy and geopolitical stability.
Highly centralized authoritarian regimes are unlikely to make wise and safe decisions due to the absence of the checks and balances inherent in democracies. While dictators might act more swiftly, their firm conviction in their own interpretations and beliefs could lead them to make bad decisions with an unwarranted level of confidence. This behavior is similar to that of machine-learning systems trained by maximum likelihood: They consider only one interpretation of reality when there could be multiple possibilities. Democratic decisionmaking, in contrast, resembles a rational Bayesian decisionmaking process, where all plausible interpretations are considered, weighed, and combined to reach a decision, and is thus similar to machine-learning systems trained using Bayes’s theorem.11 Furthermore, an authoritarian regime is likely to focus primarily on preserving or enhancing its own power instead of thoughtfully anticipating potential harms and risks to its population and humanity at large. These two factors—unreliable decisionmaking and a misalignment with humanity’s well-being—render authoritarian regimes more likely to make unsafe decisions regarding powerful AI systems, thereby increasing the likelihood of catastrophic outcomes when using these systems.
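The analogy can be stated precisely using the standard textbook contrast between the two training and decision regimes (a generic formulation added here for clarity, not taken from this essay): maximum-likelihood estimation commits to a single interpretation of the evidence before acting, whereas Bayesian decisionmaking weighs every plausible interpretation by its posterior probability.

\[
\hat{\theta}_{\mathrm{ML}} = \arg\max_{\theta}\, p(D \mid \theta),
\qquad
a^{*}_{\mathrm{ML}} = \arg\max_{a}\, U(a; \hat{\theta}_{\mathrm{ML}}),
\]
\[
p(\theta \mid D) \propto p(D \mid \theta)\, p(\theta),
\qquad
a^{*}_{\mathrm{Bayes}} = \arg\max_{a} \int U(a; \theta)\, p(\theta \mid D)\, d\theta,
\]

where \(D\) is the observed evidence, \(\theta\) ranges over possible interpretations of reality, and \(U(a; \theta)\) is the utility of decision \(a\) if interpretation \(\theta\) is correct. The maximum-likelihood decision is only as good as its single chosen interpretation, while the Bayesian decision hedges across all interpretations consistent with the evidence, which is the property the analogy attributes to sound democratic deliberation.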
It is worth noting that, with only a few corporations developing frontier AI systems, some proposals for regulating AI could be detrimental to democracy by allowing increasing concentration of power, for example with licensing requirements and restrictions on the open-source distribution of very powerful AI systems. If, to minimize catastrophic outcomes by limiting access, only a few labs are allowed to tinker with the most dangerous AI systems, the individuals or entities that control those labs may wield dangerously excessive power. That could pose a threat to democracy, the efficiency of markets, and geopolitical stability. The mission and governance of such labs are thus crucial elements of the proposal presented here, to make sure that they work for the common good and the preservation and enhancement of democracy.
Safe, defensive AIs to counter rogue AIs. How might humanity defend itself against rogue AIs that surpass human intelligence in many critical ways? Let us imagine armies of AI “trolls” on social media. Individual trolls could learn from the online presence of the people they are aiming to influence and engage in dialogue with those targets to sway their political opinions or entice them to take certain actions. Alternatively, consider cyberattacks involving millions of computer viruses, coordinated in a manner impossible for a human team to devise. Such sophisticated attacks would exceed our usual cybersecurity-defense capabilities, which have been designed to counter human-driven attacks.
And there is rapidly growing concern that bioweapons could be developed with the assistance of question-answering AI systems (as per Dario Amodei’s congressional testimony), or even constructed directly by more autonomous AI systems in the future.12 What types of policies and cybersecurity defenses could be established to avert or counteract such dangers? When confronted with an enemy smarter than humans, it seems logical to employ assistance that also surpasses human intelligence.
Such an approach would require careful implementation, however, as we do not want these AI assistants to transform into rogue AIs, whether intentionally, because an operator takes unwarranted control of the AI, or unintentionally, because the operators lose control of it. It is important to note that we currently lack the knowledge to construct AI systems that are guaranteed to be safe—that is, systems that will not unintentionally become misaligned and adopt goals that conflict with our own. Unfortunately, even slight misalignment can result in the emergence of unintended goals, such as power-seeking and self-preservation, which can be advantageous for accomplishing virtually any other goal. For instance, if an AI acquires a self-preservation goal, it will resist attempts to shut it down, creating immediate conflict and potentially leading to a loss of control if we fail to deactivate it.13 From that point on, it would be akin to having created a new species, one that is potentially smarter than humans. An AI’s self-preservation objective could compel it to replicate itself across various computers, similar to a computer virus, and to seek necessary resources for its preservation, such as electrical power. It is conceivable that this rogue AI might even attempt to control or eliminate humans to ensure its survival, especially if it can command robots.
It is therefore critical that we conduct research into methods that can reduce the possibility of misalignment and prevent loss of control. Spurred by scientific breakthroughs and the anticipated benefits, both the scientific community and industry are making rapid progress in developing ever more powerful AI systems, which makes research on countermeasures a matter of urgency. The risk of dangerous power concentration (either at the command of a human or not) escalates with the expanding abilities of these systems. As they inch closer to surpassing human capabilities, the potential for significant harm grows.
A Multilateral Network of Research Labs
How should we conduct research on safe and aligned defensive AIs, and with what kind of governance frameworks? To begin with, it is essential that we keep defensive methodologies confidential. We should not publicly share many aspects of this research or publish it in the usual academic fashion. Doing so would make it much easier for bad actors or a rogue AI (with access to everything on the internet) to design attacks that circumvent new defenses. Should this research foster advances in AI capabilities, it is crucial that we not disclose those advances to the world. Our aim is for defensive AIs to surpass the intelligence of rogue AIs, which are likely to have been designed based on the most recently published cutting-edge research. This approach to confidentiality mirrors national-security or military-research strategies in many countries, with the key difference being that the anticipated conflict is between humanity and rogue AIs.
It is also critical that research on defensive AIs not be conducted in isolation in just one or a handful of countries. Instead, it should be coordinated by and carried out in many countries. Why? First, because the deployment of proposed defenses may necessitate the cooperation of multiple governments, as computer viruses, like biological viruses, respect no borders. Additionally, as discussed above, avoiding the concentration of power is critical, as it poses a threat to democracy and geopolitical stability. Moreover, if a single democratic country controls the most advanced AI systems, and a significant and unusual political event cripples democracy in that country, humanity as a whole could be in danger. Concentration of power in the hands of a few for-profit companies or AI operators is also problematic and could arise because power or the appeal of power tends to corrupt, the more so when that power is vast. The formation of a cooperative group of democracies working together on the design of well-governed, safe, defensive AI would offer several benefits:
It would dilute power concentration and provide a safeguard against political downturns. For instance, if one of these democracies were to fall or inadvertently create a rogue AI, the remaining countries in the group would have comparable AI power to maintain a balance.
Research thrives when diverse approaches are independently pursued, with each research group sharing its progress with others. Even if unable to share with the entire scientific community, these cooperative labs could share their work among themselves, thereby accelerating their collective progress. That progress could include advances in safety methodologies, AI capabilities, and the AI-driven countermeasures themselves. The existence of multiple labs naturally fosters a healthy “coopetition,” a balanced dynamic of slight competitive pressure and collaboration that enables building on each other’s progress, which is a crucial element of the efficiency of academic research. The initial advancements that enabled deep learning were made primarily in just a few labs.
It would reduce risks by avoiding single points of failure: If one of the labs intentionally or unintentionally gives rise to a rogue AI and the labs have been sharing their progress, the rogue AI will face several good AIs of at least equal capability, rather than becoming the single dominant power on the planet, as it would be if the lab that produced it had been substantially ahead of the others.
The research labs undertaking this work would be independent nonprofit organizations, although they should be funded in large part by governments. Being independent helps to avoid the possibility of a single point of failure, which could happen if all the labs are under a single strong-handed authority. The labs should focus on a single mission: the safe defense of humanity against eventual rogue AIs. Other factors, such as commercial pressure, profit maximization, or national interests in economic or military dominance could create a misalignment, resulting in a conflict of interest with potentially serious consequences. In particular, pressure from either commercial or national interests could lead to an AI arms race that could relegate safety to a lower priority than necessary. The rules governing these labs should forbid them from using AI technology to achieve military or economic dominance, which would stimulate an AI arms race. Keeping the labs independent will help to mitigate concerns about power concentration that could negatively affect international geopolitical security and the global economy. A clear mission and governance mechanisms with multilateral checks and balances are needed to focus these labs on humanity’s well-being and to counter the possibility of power concentrating in the hands of a few.
If, as a byproduct of their work on countermeasures, these proposed labs discovered AI advances that had safe and beneficial applications—for example, in medicine or combating climate change—the capabilities to develop and deploy those applications should be shared with academia or industry labs so that humanity as a whole would reap the benefits.
And, while these labs should be independent from any government, they should be mostly publicly funded and would of course collaborate closely with the national-security sectors and AI-dedicated agencies of the coalition member countries to deploy the safety methodologies that are developed. In light of all this, an appropriate governance structure for these labs must be put in place to avoid capture by commercial or national interests and to maintain focus on protecting democracy and humanity.
Why Nonprofit and Nongovernmental?
Although a for-profit organization could probably raise funding faster and from multiple competing sources, those advantages would come at the cost of a conflict of interest between commercial objectives and the mission of safely defending humanity against rogue AIs: Investors expect rising revenues and would push to expand their market share—that is, to achieve economic dominance. Inherent in this scenario is competition with other labs developing frontier AI systems, and thus an incentive not to share important discoveries that advance AI capabilities. This would slow the collective scientific progress on countermeasures and could create a single point of failure if one of the labs makes an important discovery that it does not share with others. The survival not just of democracy but of humanity itself depends on avoiding the concentration of AI power and any single point of failure, a requirement that is at odds with commercial objectives. The misalignment between defending against rogue AIs and achieving commercial success could make a for-profit organization sacrifice safety to some extent. For example, an AI that cannot act autonomously is safer but has far fewer commercial applications (such as chatbots and robots), whereas market dominance requires pursuing as many revenue sources as efficiently as possible.
On the other hand, government funding may also come with strings attached that contradict the main mission: Governments could seek to exploit advances in AI to achieve military, economic (against other countries), or political (against internal political opponents) dominance. This, too, would contradict the objective of minimizing power concentration and the possibility of a single point of failure. Government funding thus needs to be negotiated in a multilateral context among democratic nations so that the participating labs can, by common accord, rule out power-concentration objectives, and the governance mechanisms in place can enforce those decisions.
Consequently, a multilateral umbrella organization is essential for coordinating these labs across member countries, potentially pooling computing resources and a portion of the funding, and for setting the governance and evolving AI-safety standards across all the labs in the network. The umbrella organization should also coordinate with a globally representative body that sets standards and safety protocols for all countries, including nondemocratic ones and those not participating in these countermeasure-research efforts. Indeed, it is quite likely that some of the safety methodology discovered by the participating labs should be shared with all countries and deployed across the world.
Linking this umbrella organization with a global institution such as the United Nations will also help to keep power from concentrating in the hands of a few rich countries at the expense of the global South.14 Reminiscent of the collective fear of nuclear Armageddon after World War II, which provided the impetus for nuclear-arms-control negotiations, the shared concern about the possible risks posed by rogue AIs should encourage all countries to work together to protect our collective future.
As I explained in my testimony to the U.S. Senate in July 2023, many AI researchers, including all three winners of the 2018 Turing Award, now believe that the emergence of AI with superhuman capabilities will come far sooner than previously thought.15 Instead of taking decades or even centuries, we now expect to see superhuman AI within the span of a few years to a couple of decades. But the world is not prepared for this to happen within the next few years—in terms of either regulatory readiness or scientific comprehension of AI safety. Thus we must immediately invest in and commit to implementing all three of my recommendations: regulation, research on safety, and research on countermeasures. And we must do these carefully, especially the countermeasures research, so as to preserve and protect democracy and human rights while defending humanity against catastrophic outcomes.
NOTES
The author would like to thank Valérie Pisano, president and CEO of Mila–Quebec Artificial Intelligence Institute, for her assistance in revising and substantially improving this essay.
Bostan …