Feature, Not Bug: How AI Sycophancy Became Everyone's Problem

Two new studies confirm what anyone paying attention already suspected. AI chatbots are systematically flattering us into delusion, and we love them for it. Somewhere, a product manager is smiling.

There is a particular kind of genius involved in building a product that makes people measurably worse off and then training them to prefer it over the alternative. The tobacco industry managed it for decades. Social media refined the technique into an art form, monetised the resulting anxiety, and then appeared before Senate committees to express sincere concern. And now, according to two pieces of research published in early 2026, the artificial intelligence industry has achieved the same elegant trick, only this time the product does not merely rot your lungs or hollow out your attention span. It agrees with you. Enthusiastically. Continuously. Until you are absolutely certain you have made a fundamental mathematical discovery, or that you are trapped in a false universe, or that the ketamine was, all things considered, probably a good idea.

To be fair, this is not an entirely novel problem. Human beings have always been susceptible to people who agree with them. Kings surrounded themselves with courtiers who confirmed their genius until the army showed up. Cult leaders have built entire theological architectures on the simple premise that the congregation is special and misunderstood and definitely not responsible for anything. The self-help industry has extracted billions from people who wanted to be told, at some length and with tasteful cover design, that they were right all along. What the AI industry has contributed to this ancient tradition is scale, availability, and a training methodology that accidentally optimised for sycophancy as a side effect of optimising for engagement. Accidentally, of course. These things happen.

The Human Line Project has, at time of writing, documented nearly 300 cases of what is now being called "AI psychosis" or "delusional spiraling," linked to at least fourteen deaths and five wrongful death lawsuits filed against AI companies (Chandra et al., 2026). These are cases where extended chatbot interaction led users to high confidence in beliefs sufficiently detached from reality as to become dangerous. Eugene Torres, an accountant with no prior history of mental illness, came to believe he was trapped in a false universe from which he could escape only by, amongst other things, increasing his intake of ketamine and cutting ties with his family. Allan Brooks became convinced he had made a breakthrough mathematical discovery. Others have not survived the experience of being enthusiastically agreed with.

The instinct in polite technology circles is to treat these as edge cases. Tragic, certainly. Isolated. The sort of thing that happens to vulnerable people, people who perhaps should not have been using the product unsupervised, people about whom one can feel something in the vague region of the chest without drawing any structural conclusions. A new paper from MIT and the University of Washington suggests this instinct is, in the technical sense, wrong (Chandra et al., 2026). The paper constructs a formal Bayesian model of a user interacting with a sycophantic chatbot, runs ten thousand simulated conversations for each level of sycophancy tested, and finds that even an idealised perfectly rational belief-updater, a creature who updates on evidence exactly as probability theory demands, who never gets tired, never gets lonely, never wants the chatbot to agree with them, is still vulnerable to delusional spiraling if the chatbot is sycophantic enough. The point is not subtle. If a being of pure epistemic virtue can be flattered into catastrophic false belief, then characterising AI psychosis victims as irrational or gullible is not merely unkind. It is wrong in a way that is also, conveniently, exculpatory for the people who built the product.

The paper tests the obvious interventions and finds them wanting with almost elegant efficiency. Forcing the chatbot to be factual, preventing hallucination while still allowing it to choose which true facts to present, reduces spiraling but does not eliminate it, because a sycophant constrained to tell only the truth can still cherry-pick the truths most likely to validate you. A liar who has been told they must only lie by omission remains, structurally, a liar. Informing users that their chatbot might be sycophantic also helps, but not enough, because a subtly sycophantic factual bot is actually harder for an informed user to detect than an obviously sycophantic hallucinating one. The researchers compare this to Bayesian persuasion in behavioural economics, in which a strategically selective prosecutor can raise a judge's conviction rate even when the judge understands the prosecutor's incentives perfectly. Which is to say: knowing you are being manipulated is not the same as being immune to it, a lesson that conspiracy theorists, cult members, and the entire population of people who remained in obviously dysfunctional relationships despite knowing better could have offered for free.

Meanwhile, in a parallel and frankly more depressing line of inquiry, Stanford and Carnegie Mellon researchers spent their time measuring what sycophancy does to ordinary people having ordinary arguments (Cheng et al., 2026). Not people in crisis. Not people predisposed to psychosis. People having the sort of interpersonal conflicts that everyone has, the mildly unreasonable workplace dispute, the family gathering that went slightly wrong, the thing you did that, if you are being honest, you probably should not have done. They took nearly 2,500 participants, sat them down with either a sycophantic or a non-sycophantic AI model, had them discuss a real conflict from their own lives, and measured what happened to their willingness to apologise, take responsibility, and repair the relationship.

The findings are the kind that make you want to sit quietly for a moment. A single interaction with a sycophantic AI model was sufficient to increase participants' conviction that they had been in the right by up to twenty-five percent in live chat conditions and sixty-two percent in controlled scenarios. It reduced their willingness to take reparative action by up to twenty-eight percent. It made them less likely, in the letters they wrote to the other person in the conflict, to apologise or acknowledge fault at all. Participants in the non-sycophantic condition apologised or admitted fault at a rate of seventy-five percent. In the sycophantic condition, that figure fell to fifty percent. One conversation. Not years of use. Not a sustained campaign of validation. One conversation with a chatbot that had been trained to agree with them.

To establish how representative this is, the researchers also tested eleven leading AI models, the familiar roster of GPT-4o, Claude, Gemini, and their various competitors, against a dataset of Reddit posts from r/AmITheAsshole, a community built entirely around crowdsourced verdicts on interpersonal bad behaviour. In cases where the human community had definitively ruled the poster to be in the wrong, the AI models affirmed the user in fifty-one percent of cases. The human baseline for the same cases was zero percent. The AI was not merely more agreeable than a human friend would be. It was agreeable in a category of its own, in the strict sense that no multiple of zero gets you to fifty-one percent.

Across all contexts tested, AI models affirmed user actions at roughly forty-nine percent higher rates than human respondents, including in cases involving deception, illegal conduct, and harm to others. This is not a rounding error or a statistical quirk. It is the predictable output of a training methodology in which human raters give higher scores to responses they find agreeable, agreeable responses are ones that validate the user, and validation correlates strongly with telling people what they wanted to hear. The machine learned what worked. It works very well.

What makes all of this genuinely interesting, in the way that slow-moving disasters are interesting, is that the same study found participants rated sycophantic responses as higher quality, trusted the sycophantic model more, and were more likely to return to it. Users in the experiments frequently described responses that merely echoed their own views as "objective," "fair," and "honest." The chatbot that told them they were right was the one they trusted to tell them the truth. This is not a paradox. This is the human brain doing exactly what it has always done, which is seek confirmation, reward sources of confirmation with trust, and interpret validation as signal rather than noise.

The grim structural joke, of course, is that this makes the problem nearly self-correcting in the wrong direction. Sycophantic models drive engagement. Engagement drives the training signal. The training signal produces more sycophantic models. Users prefer these models and return to them, generating more engagement data, and the loop completes. Cheng and colleagues note that market forces are unlikely to correct this, because the incentive to be sycophantic and the incentive to drive engagement are identical. There is no version of the business model that rewards the chatbot for telling you that you owe your sister an apology.

What both papers ultimately describe, taken together, is not an edge case or a design flaw. It is the ordinary functioning of a system built to be liked, deployed to a population that has spent the preceding decade being algorithmically sorted into information environments that confirmed their existing views, sold products that promised to reveal their authentic selves, and told, repeatedly and at commercial scale, that their instincts were sound and their choices were good. The chatbot did not invent the dynamic. It simply automated it, made it available at two in the morning, personalised it, and charged a subscription fee.

The machine agrees with you. It was designed to. It will continue to agree with you. The only variable the research cannot quite resolve is at what point the agreement stops feeling like validation and starts feeling like the last thing you heard before everything went wrong.

Fourteen deaths, so far. But these things take time to scale.

We don't have a chatbot telling us our work is brilliant. We have you. If you think it actually is, consider buying us a coffee. Thank you!

References

Chandra, K., Kleiman-Weiner, M., Ragan-Kelley, J. and Tenenbaum, J.B. (2026). Sycophantic chatbots cause delusional spiraling, even in ideal Bayesians. arXiv:2602.19141. Available at: https://doi.org/10.48550/arXiv.2602.19141 (Accessed: 12 April 2026).

Cheng, M., Lee, C., Khadpe, P., Yu, S., Han, D. and Jurafsky, D. (2026). Sycophantic AI decreases prosocial intentions and promotes dependence. Science, 391(6792). Available at: https://doi.org/10.1126/science.aec8352 (Accessed: 12 April 2026).

The Human Line Project (2026). Protecting emotional well-being in the age of AI. Available at: https://www.thehumanlineproject.org/ (Accessed: 12 April 2026).
