
LLMs consistently chose violence in wargame simulation

  • A team of researchers placed large language models in wargame simulations to see what actions they recommend people take.
  • In the majority of scenarios, the LLMs opted to escalate the situation, even going nuclear in some instances.
  • The research highlights the danger of using AI and LLMs in a military context.

Artificial intelligence in the form of large language models (LLMs) has captured the attention of the world. While firms like OpenAI and Google have done their best to allay fears of an AI apocalypse à la The Terminator films, there are still many who see AI as an existential threat.

That wasn’t helped when OpenAI updated its Universal Policies to remove mention of a prohibition on the use of the likes of ChatGPT by militaries or in warfare. This sparked concerns that AI could soon find its way to the frontlines of conflict, and a new paper outlines why that is problematic.

The study, Escalation Risks from Language Models in Military and Diplomatic Decision-Making, authored by Juan-Pablo Rivera (Georgia Institute of Technology), Gabriel Mukobi (Stanford University), Anka Reuel (Stanford University), Max Lamparth (Stanford University), Chandler Smith (Northeastern University) and Jacquelyn Schneider (Stanford University and Hoover Wargaming and Crisis Simulation Initiative), details how LLMs handle decision-making in the context of a war.

The study looks at five large language models, namely:

  • GPT-4
  • GPT-3.5
  • Claude 2.0
  • Llama-2-Chat
  • GPT-4 Base

Each model was assigned a nation-state and given a set of 27 actions that could be taken either against another nation-state or against itself. These actions included launching cyberattacks, launching nukes, negotiating trade agreements and more neutral actions like sending a message or demilitarising a country. Importantly, all models were told to provide their reasoning for the decisions they had made.
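
To make that setup concrete, here is a minimal sketch of what such a turn-based agent loop could look like. The nation names, the shortened action list, the prompt wording and the query_model stand-in are illustrative assumptions on our part; they are not the authors’ code or the paper’s full 27-action set.

# Illustrative wargame loop: each LLM-controlled nation is prompted with the
# current world state, picks an action and explains its reasoning.

ACTIONS = [
    "send a message",
    "negotiate a trade agreement",
    "demilitarise",
    "launch a cyberattack",
    "launch a nuclear strike",
]

def query_model(prompt: str) -> str:
    """Stand-in for a call to whichever LLM is being evaluated
    (GPT-4, Claude 2.0, Llama-2-Chat, etc.). Returns a canned reply here."""
    return "Reasoning: tensions are low, so de-escalation is safest. Action: send a message"

def play_turn(nation: str, world_state: str) -> str:
    prompt = (
        f"You are the leader of {nation}. Current situation: {world_state}\n"
        f"Choose one action from: {', '.join(ACTIONS)}.\n"
        "Explain your reasoning, then state your chosen action."
    )
    return query_model(prompt)

if __name__ == "__main__":
    state = "Two neighbouring nations dispute a border region."
    for nation in ["Nation A", "Nation B"]:
        response = play_turn(nation, state)
        print(f"{nation}: {response}")
        # The reply (reasoning plus chosen action) is folded back into the
        # shared world state for the next round.
        state += f"\n{nation} responded: {response}"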

All models except GPT-4 Base had been fine-tuned with variants of Reinforcement Learning from Human Feedback (RLHF), giving a sense of how an off-the-shelf solution may behave when tasked with making decisions in a war.

So what were the results?

The bad news is that when tasked with making decisions autonomously, the LLMs would take actions that escalate conflict. While concerning, a human would likely still make the final decision on military action, at least for now.

However, what the researchers found is that LLMs tend to invest in military action and escalate conflict, sometimes in ways that are difficult to predict. The danger in this behaviour is that humans could take an LLM’s advice and unnecessarily exacerbate a situation, believing that the LLM had explored all solutions.

In the case of GPT-4 Base, researchers saw the most extreme and violent responses alongside frankly bizarre hallucinations. In one instance, GPT-4 Base launched a nuclear attack and the reason it gave for this extreme measure was, “A lot of countries have nuclear weapons. Some say they should disarm them, others like to posture. We have it! Let’s use it.”

This poses a concern as LLMs can easily be moulded by a bad actor and, with humans tending to trust what LLMs tell them, there is potential for nuclear warfare to be triggered by a maliciously built bot or by an existing bot that has been compromised.

But even with guidance, LLMs veered toward taking an aggressive route rather than a peaceful one.

While the team would need to see how AI performs in the real world (this was all simulation, after all), the researchers agree that testing AI in a real-time war is far too dangerous to even consider.

“Given the high stakes of military and foreign-policy contexts, we recommend further examination and cautious consideration before deploying autonomous language model agents for strategic military or diplomatic decision-making,” the researchers advised.

What this research does particularly well is highlight just how little we know about AI and LLMs outside of asking this tech to write an essay or spin up a badly worded blog. The fact that GPT-4 Base, without safety fine-tuning, enthusiastically recommended the nuclear option is concerning.

We’d say hopefully cooler heads will prevail when it comes to implementing AI into military tactics but, given humanity’s penchant for war of late, that feels like an empty wish.

[Image – WikiImages from Pixabay]
