Cookies on this website

We use cookies to ensure that we give you the best experience on our website. If you click 'Accept all cookies' we'll assume that you are happy to receive all cookies and you won't see this message again. If you click 'Reject all non-essential cookies' only necessary cookies providing core functionality such as security, network management, and accessibility will be enabled. Click 'Find out more' for information on how to change your cookie settings.

Radical value disagreements (RVDs) are increasingly shaping contemporary societies, underpinning profound and enduring conflicts over core moral and political principles that are not easy to resolve. Climate change debates provide a clear case in point, where fundamental value differences are at stake: one side may regard it as an existential crisis demanding urgent collective action, while the other may prioritise individual freedom and economic growth over interventions such as ‘net zero’ initiatives. Unlike standard policy debates, RVDs reflect incompatible worldviews and are often highly emotionally charged and polarising.  

In recent years, social media platforms have intensified these kinds of disagreements. Their algorithms prioritise content that drives engagement, often by pushing polarising and emotionally provocative posts. This creates echo chambers, fuels outrage, and amplifies the most extreme voices, making it harder for people with opposing values to even understand, let alone engage with, one another.[1] As a result, democratic discourse is increasingly fragmented, and collective decision-making — on issues from climate policy to human rights — becomes more difficult.

Meanwhile, modern AI tools such as large language models (LLMs) are beginning to shape the ways in which people engage with controversial topics online. LLMs are increasingly being used to generate persuasive arguments, moderate content, and even produce political messaging and imagery. These systems can reinforce bias and misinformation, but they also hold potential to support more constructive dialogue by identifying common ground between opposing parties, clarifying divergent viewpoints, or helping people reflect on their values.

Imagine, for instance, a digital town hall where thousands of citizens debate a contentious issue, such as whether we should heavily tax meat products for moral reasons. Rather than requiring individuals to sift through every comment to discern patterns of consensus, an LLM could process the entire discussion, producing a concise summary of the “collective will” and map areas of agreement and disagreement. It might also act as a moderator, flagging hate speech, enforcing respectful dialogue, or even suggesting less confrontational ways of phrasing responses to reduce hostility and polarisation. These possibilities raise a critical question: can LLMs be leveraged to support more productive deliberation in cases of radical value disagreement in digital spaces, or are they more likely to entrench and amplify existing divisions?

In what follows, we explore how LLMs might enhance online deliberation around RVDs, followed by a discussion of the ethical risks and limitations involved.

Potential of LLMs in RVD deliberation

LLMs could offer powerful tools for structuring RVD deliberation online by processing and synthesising large volumes of complex discourse. In contrast to human cognition, LLMs generate text by predicting likely word sequences based on patterns learned from enormous training datasets, typically consisting of hundreds of billions of tokens across diverse sources. Yet precisely because they are not bound by human cognitive limits, LLMs can absorb and analyse thousands of perspectives in real time, distilling sprawling discussions into coherent summaries and highlighting key points of agreement and disagreement with a scale and speed that far surpass human capabilities.

Moreover, with careful prompt engineering, LLMs can be directed to highlight recurring themes, surface underlying agreements, and reframe discussions in ways that reduce perceived polarisation. Researchers and developers have begun exploring their potential as consensus-building tools capable of mapping and mediating contentious debates. Below, we examine key areas where LLMs could be particularly useful in structuring deliberation on RVDs.

Moderation

LLMs can be fine-tuned to mediate discussions by promoting respectful engagement and discouraging inflammatory or hateful speech. This is particularly important for open deliberation platforms, such as social media, and for digital spaces explicitly designed to host contentious debates, including those centred on RVDs. LLMs offer an automated alternative, capable of maintaining constructive discourse at scale.

However, LLMs often struggle to reliably detect harmful content, sometimes missing hate speech, while at other times over-censoring legitimate but emotionally charged viewpoints.[2] This is especially problematic in the context of RVDs, where the line between offensive and strongly held belief can be blurry. As a result, some have proposed a hybrid model, using LLMs to flag potentially harmful posts and route them to human moderators, streamlining the process while preserving human oversight.[3] Nonetheless, a fully-automated approach offers a promising step toward scalable automated moderation, which could significantly improve the management of online deliberation of controversial topics.

Summarisation

Another relevant function of LLMs in deliberation is their ability to summarise large, complex discussions. Recent research suggests that summarisation can help reduce divisions by identifying shared goals and clarifying points of agreement in polarised debates. Notably, a series of studies by DeepMind[4] investigated the use of LLM-generated opinion summaries in group deliberation on political topics (e.g. ‘should the UK lower the voting age to 16?’). AI-generated summaries were ranked by participants as clearer, more informative, and fairer than those written by human moderators, with groups also reporting reduced internal division after engaging with the AI-mediated statements. This points to the potential of LLM-generated summaries to support more constructive dialogue in polarised settings, helping participants find common ground without requiring real-time confrontation.

While these findings highlight a promising application of LLMs, important caveats remain. It is unclear whether the reported improvements reflect genuine deliberative progress — such as greater willingness to engage respectfully, reconsider views, or concede points — or merely passive agreement with a well-crafted summary. This concern is heightened by the fact that participants did not engage in real-time deliberation, meaning the summaries were not tested in the more dynamic and emotionally charged conditions typical of RVDs. In such contexts, AI-generated summaries may create the illusion of consensus without fostering meaningful engagement or reducing real-world polarisation. This highlights the need for further research into how AI summarisation can lead to deeper outcomes, including increased mutual understanding, reduced polarisation, and more respectful forms of disagreement.

Reframing & Rephrasing

Lastly, another useful application of LLMs in RVD deliberation is their ability to rephrase emotionally charged statements in more constructive terms. Since RVDs often trigger strong reactions, this function can help prevent discussions from escalating into unproductive conflict. Recent research has tested AI-driven interventions in political discourse, using LLMs to suggest tonal adjustments or encourage clarifying questions. These interventions led to higher-quality conversations, with participants more willing to express their views, recognise opposing perspectives, and engage in less polarised ways. They also helped surface policy positions that might otherwise remain unexplored.[5]

While this is a promising capability, it raises concerns about individual agency. If LLMs are rephrasing user input, participants may no longer feel they are fully expressing their own views but instead communicating through a filter. This could dilute the authenticity of their contributions and reduce their sense of ownership in the discussion. Still, one might argue that this trade-off is worthwhile if it helps prevent conversations from becoming extreme or unproductive. Either way, it reflects a powerful intervention with the potential to shape more constructive deliberation in the context of RVDs.

While LLMs have other potential applications — such as prompting clarification, fact-checking, or introducing missing perspectives — we have focused on three core functions: moderation, summarisation, and rephrasing. Each show how LLMs might improve deliberation by structuring discussion, reducing polarisation, and increasing engagement. But these same capabilities raise serious ethical concerns. LLMs can subtly shape discourse, reinforce bias, and influence opinion in opaque ways. As black-box systems, their decisions often lack transparency, raising questions about trust and legitimacy. We now turn to these risks and the conditions for using LLMs responsibly in RVD deliberation.

Ethical considerations

LLM accuracy

A core ethical concern is whether LLMs are reliable enough to support deliberation on sensitive, high-stakes issues like RVDs. Although these models show promise in mediating debates through summarisation, rephrasing, and moderation, they remain prone to serious errors. One key limitation is their tendency to “hallucinate”, where LLMs generate factually incorrect or fabricated information, often with unwarranted confidence and authority.  In the context of RVDs, where debates are already polarised and fragile, such errors risk deepening existing divides. Furthermore, whilst recent advances in techniques have reduced the likelihood of hallucinations— such as Retrieval-Augmented Generation (RAG), which supplements the model’s internal knowledge by retrieving and incorporating relevant external documents into its response—this remains a persistent issue across many LLMs.

Despite this, perhaps LLMs still offer a more viable option; humans frequently struggle to accurately represent others’ views, often misinterpreting or oversimplifying them, which may further exacerbate divides in RVD deliberation. Therefore, while LLMs are indeed prone to hallucinations, human biases and cognitive limitations suggest that, despite their flaws, LLMs could potentially offer a more reliable alternative in facilitating these discussions.

In addition, LLMs are vulnerable to prompt injection attacks, where malicious inputs manipulate the model’s outputs, potentially introducing misinformation into otherwise good-faith deliberation. Even in benign contexts, LLMs may misrepresent, omit, or fabricate details when summarising content, particularly in response to ambiguous prompts or when overgeneralising from limited input. Proposed mitigation strategies include deploying a second model to flag suspicious prompts,[6] using instruction-based defences,[7] or applying outlier detection to identify unusual outputs.[8] However, these remain experimental, and no solution is foolproof. Both issues raise serious concerns about whether LLMs can be trusted to reliably summarise and moderate discussions in the context of RVDs.

Fairness & Representation

Beyond factual reliability, a deeper ethical concern lies in how LLMs may subtly shape the boundaries of acceptable discourse. This is particularly relevant in the context of RVDs — where deeply held, often conflicting values are at stake — with several studies demonstrating popular LLMs to be politically left leaning,[9] aligning with dominant perspectives in academia, media, and expert communities. When tasked with moderating or summarising discussion, these models may unintentionally deprioritise or soften dissenting perspectives through shifts in tone, emphasis, or omission. This is especially concerning in debates around climate policy, identity politics, or social justice, where models optimised for politeness or safety may overcorrect, excluding uncomfortable but essential parts of the discourse. In doing so, LLMs risk reinforcing epistemic conformity and undermining the pluralism that meaningful deliberation on radical value disagreements requires.

Black Boxes      

Thirdly, the black box nature of LLMs poses a serious challenge for their use in deliberation. It is often difficult to trace how these models arrive at specific outputs, making it hard to assess the reasoning behind a given summary or decision. This lack of transparency is especially consequential in the context of sensitive RVDs, where trust and perceived fairness are essential. Consider the Digital Town Hall scenario introduced earlier: if a particular group feels their views have been downplayed or misrepresented in the model’s summary, the legitimacy of the process may be called into question. This raises a central epistemic concern—if participants don’t trust that the AI has fairly incorporated their input, can its output still be regarded as a meaningful contribution to deliberation?

However, one might argue that humans, too, are ‘black boxes’ to some extent. When a person summarises a debate, it can be equally difficult to fully trace the reasoning behind their choices. The key difference, then, may lie in how we perceive and relate to human versus AI decision-making. You can ask a human summariser what they considered, and while their explanation may be incomplete or biased, it is often seen as intelligible and rooted in shared social understanding. LLMs, by contrast, may offer explanations that feel circular, mechanistic, or disconnected from real-world reasoning. More importantly, people tend to have intuitive models of how other humans think—what might motivate them, what biases they hold, and what norms shape their responses. These intuitions rarely extend to AI systems, which are often viewed as opaque or impenetrable, making it harder to assess whether their outputs are fair or trustworthy.

The key point here is that trust in deliberation depends on the belief that all views are taken seriously. If LLMs are to serve as intermediaries in public discourse online, they must be perceived as transparent and neutral. To earn that trust, platforms should provide clear explanations of how the model generated its output. This level of transparency is essential if AI is to play a legitimate role in RVD deliberation. Hence, even if concerns about LLM capabilities are addressed, public perception of their outputs remains equally important.

Conclusion

Whether LLMs ultimately support or undermine productive deliberation on RVDs remains an open question. Current evidence suggests they can help reduce polarisation, clarify opposing views, and structure dialogue in useful ways—but these benefits are fragile, and highly dependent on the context in which the models are deployed. The same technology can be used in markedly different ways: to inflame division or to foster mutual understanding, depending on how, where, and by whom it is applied. Future research should prioritise testing LLM interventions in live, high-stakes deliberative settings to assess their effects on mutual understanding, willingness to engage, and polarisation. Hybrid systems that combine LLMs with human moderators warrant particular attention, as do safeguards against hallucination, bias, and prompt manipulation. Just as crucially, defining what counts as successful deliberation—and developing robust ways to measure it—will be key to determining when, and under what conditions, LLMs can meaningfully support engagement with deep moral and political disagreement online.

Acknowledgments

Many thanks to Dr. David Lyreskog for his invaluable feedback, insightful suggestions, and thoughtful comments on this blog.

Blog by William Hohnen-FordDesign Bioethics Lab, NEUROSEC, Department of Psychiatry, University of Oxford

References

[1] Cinelli, M., De Francisci Morales, G., Galeazzi, A., Quattrociocchi, W., & Starnini, M. (2021). The echo chamber effect on social media. Proceedings of the national academy of sciences, 118(9), e2023301118.

[2] Chiu, K. L., Collins, A., & Alexander, R. (2021). Detecting hate speech with gpt-3. arXiv preprint arXiv:2103.12407.

[3] Small, C. T., Vendrov, I., Durmus, E., Homaei, H., Barry, E., Cornebise, J., ... & Megill, C. (2023). Opportunities and risks of LLMs for scalable deliberation with Polis. arXiv preprint arXiv:2306.11932.

[4] Tessler, M. H., Bakker, M. A., Jarrett, D., Sheahan, H., Chadwick, M. J., Koster, R., ... & Summerfield, C. (2024). AI can help humans find common ground in democratic deliberation. Science, 386(6719), eadq2852.

[5] Argyle, L. P., Bail, C. A., Busby, E. C., Gubler, J. R., Howe, T., Rytting, C., ... & Wingate, D. (2023). Leveraging AI for democratic discourse: Chat interventions can improve online political conversations at scale. Proceedings of the National Academy of Sciences, 120(41), e2311627120.

[6] Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023, November). Not what you've signed up for: Compromising real-world llm-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security (pp. 79-90).

[7] Liu, Y., Deng, G., Li, Y., Wang, K., Wang, Z., Wang, X., ... & Liu, Y. (2023). Prompt Injection attack against LLM-integrated Applications. arXiv preprint arXiv:2306.05499.

[8] Belrose, N., Furman, Z., Smith, L., Halawi, D., Ostrovsky, I., McKinney, L., ... & Steinhardt, J. (2023). Eliciting latent predictions from transformers with the tuned lens. arXiv preprint arXiv:2303.08112.

[9] Rettenberger, L., Reischl, M., & Schutera, M. (2025). Assessing political bias in large language models. Journal of Computational Social Science, 8(2), 1-17.