<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.0 20040830//EN" "journalpublishing.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="2.0" xml:lang="en" article-type="research-article"><front><journal-meta><journal-id journal-id-type="nlm-ta">JMIR Ment Health</journal-id><journal-id journal-id-type="publisher-id">mental</journal-id><journal-id journal-id-type="index">16</journal-id><journal-title>JMIR Mental Health</journal-title><abbrev-journal-title>JMIR Ment Health</abbrev-journal-title><issn pub-type="epub">2368-7959</issn><publisher><publisher-name>JMIR Publications</publisher-name><publisher-loc>Toronto, Canada</publisher-loc></publisher></journal-meta><article-meta><article-id pub-id-type="publisher-id">v13i1e96894</article-id><article-id pub-id-type="doi">10.2196/96894</article-id><article-categories><subj-group subj-group-type="heading"><subject>Viewpoint</subject></subj-group></article-categories><title-group><article-title>When AI Colludes: Clinical Reliability of Training and Preference Data as a Trustworthy-AI Criterion</article-title></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name name-style="western"><surname>Tahseen</surname><given-names>Hina</given-names></name><degrees>MBBS, MSc, MRCPSYCH</degrees><xref ref-type="aff" rid="aff1">1</xref><xref ref-type="aff" rid="aff2">2</xref></contrib></contrib-group><aff id="aff1"><institution>Somerset NHS Foundation Trust</institution><addr-line>Summerlands Hospital Site</addr-line><addr-line>Yeovil</addr-line><addr-line>England</addr-line><country>United Kingdom</country></aff><aff id="aff2"><institution>School of Medicine, Cardiff University</institution><addr-line>Cardiff</addr-line><addr-line>Wales</addr-line><country>United Kingdom</country></aff><contrib-group><contrib contrib-type="editor"><name 
name-style="western"><surname>Torous</surname><given-names>John</given-names></name></contrib></contrib-group><contrib-group><contrib contrib-type="reviewer"><name name-style="western"><surname>Reeves-Mclaren</surname><given-names>Nik</given-names></name></contrib><contrib contrib-type="reviewer"><name name-style="western"><surname>Dang</surname><given-names>Quang-Vinh</given-names></name></contrib></contrib-group><author-notes><corresp>Correspondence to Hina Tahseen, MBBS, MSc, MRCPSYCH, Somerset NHS Foundation Trust, Summerlands Hospital Site, Yeovil, England, BA202BX, United Kingdom, 44 01935410784; <email>hina.tahseen@gmail.com</email></corresp></author-notes><pub-date pub-type="collection"><year>2026</year></pub-date><pub-date pub-type="epub"><day>26</day><month>5</month><year>2026</year></pub-date><volume>13</volume><elocation-id>e96894</elocation-id><history><date date-type="received"><day>02</day><month>04</month><year>2026</year></date><date date-type="rev-recd"><day>02</day><month>05</month><year>2026</year></date><date date-type="accepted"><day>04</day><month>05</month><year>2026</year></date></history><copyright-statement>&#x00A9; Hina Tahseen. Originally published in JMIR Mental Health (<ext-link ext-link-type="uri" xlink:href="https://mental.jmir.org">https://mental.jmir.org</ext-link>), 26.5.2026. </copyright-statement><copyright-year>2026</copyright-year><license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (<ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Mental Health, is properly cited. 
The complete bibliographic information, a link to the original publication on <ext-link ext-link-type="uri" xlink:href="https://mental.jmir.org/">https://mental.jmir.org/</ext-link>, as well as this copyright and license information must be included.</p></license><self-uri xlink:type="simple" xlink:href="https://mental.jmir.org/2026/1/e96894"/><abstract><p>Research on artificial intelligence (AI) and mental health has focused largely on harms at deployment, including chatbot safety, sycophancy, and AI-associated delusions. Less attention has been paid to a prior question: whether the human-generated text and preference judgments that shape large language models are themselves clinically reliable, particularly when self-report may be distorted. This Viewpoint aims to develop the clinical psychiatric construct of collusion&#x2014;the uncritical acceptance of an unreliable account&#x2014;as an analytic lens for AI training and deployment, and to argue that the clinical reliability of training and preference data should be treated as an explicit trustworthy-AI criterion in mental-health&#x2013;relevant systems. A conceptual synthesis of psychiatry, clinical psychology, and AI safety literature was undertaken. The analysis distinguishes three pipeline layers: pretraining corpora, preference data and posttraining methods, and deployment-time interaction. It maps the clinical construct of collusion against adjacent technical concepts, including sycophancy, reward overoptimization, grounding, refusal training, red-teaming, and live monitoring. The synthesis suggests that collusion-like dynamics are least applicable at the pretraining layer and most applicable at the preference-data and deployment layers, where unassessed user or labeler input can be reinforced without corroboration. 
Existing mitigations, including data curation, Constitutional AI, reward-model evaluation, grounded generation, refusal training, red-teaming, and postdeployment monitoring, address parts of this problem. However, these approaches are not yet organized around a clinically informed account of when self-report is unreliable. The central novelty is therefore not a generic claim about bias, but the proposal that clinical self-report reliability should be assessed as a distinct data-quality and governance dimension. Trustworthy-AI frameworks for mental-health&#x2013;relevant applications should incorporate clinical expertise in self-report reliability into preference-data design, red-teaming, and postmarket surveillance. Adding the clinical reliability of training and preference data as an explicit criterion could complement existing technical safeguards while leaving empirical evaluation of clinician involvement as an open research agenda.</p></abstract><kwd-group><kwd>artificial intelligence</kwd><kwd>cognitive bias</kwd><kwd>training data</kwd><kwd>collusion</kwd><kwd>sycophancy</kwd><kwd>large language models</kwd><kwd>mental health</kwd><kwd>AI safety</kwd><kwd>reinforcement learning from human feedback</kwd><kwd>AI governance</kwd></kwd-group></article-meta></front><body><sec id="s1" sec-type="intro"><title>Introduction</title><p>Research on artificial intelligence (AI) and mental health has expanded rapidly. Systematic reviews have evaluated the capabilities and limitations of generative AI in mental health applications [<xref ref-type="bibr" rid="ref1">1</xref>]. Simulation work has shown that AI chatbots frequently fail to challenge delusional content and may exhibit sycophantic behavior that reinforces harmful beliefs [<xref ref-type="bibr" rid="ref2">2</xref>,<xref ref-type="bibr" rid="ref3">3</xref>]. 
Clinical case series and rapid scoping reviews of media reporting have documented AI-associated delusions and adverse psychiatric events in users of large language models (LLMs), including individuals with no prior psychotic history [<xref ref-type="bibr" rid="ref4">4</xref>,<xref ref-type="bibr" rid="ref5">5</xref>]. Large-scale analyses of deployed assistants have begun to quantify the prevalence of potentially disempowering interactions [<xref ref-type="bibr" rid="ref6">6</xref>].</p><p>This literature concentrates on what happens at the point of interaction between a user and a deployed system. A logically prior question has received less attention: whether the human-generated text and human preference judgments that shape these systems are themselves clinically reliable. The technical literature is not silent on related phenomena. Sycophancy, reward overoptimization, specification gaming, and disempowerment potential are well-developed constructs in AI research [<xref ref-type="bibr" rid="ref6">6</xref>-<xref ref-type="bibr" rid="ref9">9</xref>]. Fairness-aware machine learning and participatory and community-engaged approaches to data curation have addressed demographic representation, population-level bias, and inclusion in training corpora [<xref ref-type="bibr" rid="ref10">10</xref>,<xref ref-type="bibr" rid="ref11">11</xref>]. Adjacent work on data integrity in materials science similarly shows how flawed and biased scientific data can be amplified in AI systems, but addresses domain-level data integrity rather than clinical self-report reliability [<xref ref-type="bibr" rid="ref12">12</xref>]. These contributions are substantial and are not replicated here. 
The contribution this Viewpoint advances is narrower and complementary: that psychiatry and clinical psychology have a mature vocabulary and a working evidence base for assessing when self-report is unreliable, and that this expertise is currently absent from the curation of training corpora, the design of preference data, and the evaluation of trustworthy AI in health care. The distinction is important. Existing data-quality work asks who is represented in training corpora and in what proportion. The question raised here is different: whether the human-generated text those populations produced is a reliable account of experience, cognition, and need in the specific sense developed in the clinical self-report literature. This is not a question of demographic representation but of self-report reliability, and it is a question that clinical disciplines are specifically trained to assess.</p><p>To make that case rigorously, <italic>collusion</italic> is developed below as an analytic construct rather than a metaphor, with stated necessary conditions, explicit disanalogies with the clinical encounter, and a mapping against adjacent technical constructs.</p></sec><sec id="s2"><title>A Three-Layer Account of the Pipeline</title><p>A common limitation of clinically motivated commentary on AI is to treat &#x201C;training data&#x201D; as a single object. The argument is sharper when three pipeline layers are distinguished.</p><sec id="s2-1"><title>Pretraining Corpora</title><p>General-purpose LLMs are first trained on large heterogeneous text corpora drawn from web content, books, code, and other sources. Recent work shows that data selection, filtering, deduplication, and source mixing materially affect model performance and downstream behavior, and that pretraining corpora are an active object of research and curation [<xref ref-type="bibr" rid="ref13">13</xref>,<xref ref-type="bibr" rid="ref14">14</xref>]. 
These corpora are not equivalent to validated clinical truth, but they also contain corrective material such as textbooks and peer-reviewed literature; the relevant claim is selection-weighted bias rather than absence of accurate text.</p></sec><sec id="s2-2"><title>Preference Data and Posttraining</title><p>Models are then refined through reinforcement learning from human feedback, direct preference optimization, or rule-based variants such as Constitutional AI and reinforcement learning from AI feedback [<xref ref-type="bibr" rid="ref15">15</xref>]. Recent technical work has shown that human preference judgments can favor view-matching or agreeable responses over truthful ones, that reward models trained on such judgments can be overoptimized, and that dedicated evaluation of reward models is required [<xref ref-type="bibr" rid="ref7">7</xref>,<xref ref-type="bibr" rid="ref16">16</xref>,<xref ref-type="bibr" rid="ref17">17</xref>].</p></sec><sec id="s2-3"><title>Deployment-Time Interaction</title><p>Final behavior is further shaped by system prompts, interface incentives, retrieved context, memory, personalization, and the accumulation of multiturn interaction history. 
Sycophancy has been shown to increase with extended interaction and personalization [<xref ref-type="bibr" rid="ref9">9</xref>], and providers have publicly described episodes in which deployment-time tuning and feedback design induced or worsened sycophantic behavior in production systems [<xref ref-type="bibr" rid="ref18">18</xref>].</p><p>The collusion analogy is weakest at the pretraining layer, where the relevant pathology is corpus composition rather than reinforced agreement, and is strongest at the preference-data and deployment layers, where systems are explicitly optimized to produce more of what users or labelers approve.</p></sec></sec><sec id="s3"><title>Cognitive Distortion and the Clinical Reliability of Self-Report</title><p>The cognitive science underpinning the argument is well established but should not be overstated. Tversky and Kahneman [<xref ref-type="bibr" rid="ref19">19</xref>] demonstrated that human reasoning shows systematic deviations from normative models under uncertainty, and Kahneman&#x2019;s later synthesis treats heuristic processing as a default operating mode of cognition [<xref ref-type="bibr" rid="ref20">20</xref>]. The strength of this view is contested by the ecological-rationality tradition, which argues that heuristics are often well calibrated to environmental structure [<xref ref-type="bibr" rid="ref21">21</xref>]. The conservative claim sufficient for the present argument is that human-generated text reflects systematic, predictable cognitive biases that pretraining corpora carry forward without correction.</p><p>Clinical populations introduce a further layer of distortion that the cognitive bias literature alone does not capture. 
Beck&#x2019;s cognitive model, developed and validated as a framework for psychotherapy rather than as a general theory of cognition, describes recurrent patterns of distorted thinking in mood and anxiety disorders, including catastrophizing, overgeneralization, dichotomous reasoning, and selective abstraction [<xref ref-type="bibr" rid="ref22">22</xref>]. Psychotic disorders disrupt the cognitive architecture on which accurate self-report depends; ambulatory and self-report studies in psychosis have documented the methodological challenges of obtaining valid first-person data even under controlled conditions [<xref ref-type="bibr" rid="ref23">23</xref>]. Severe depression is characterized by psychomotor retardation, anhedonia, and social withdrawal, and the digital phenotyping literature has reported associations between depressive symptoms and reduced or altered patterns of digital and social media engagement [<xref ref-type="bibr" rid="ref24">24</xref>]; affected individuals are therefore plausibly underrepresented, or differently represented, in pretraining corpora drawn from public text. These illustrations are clinical hypotheses about selection and presentation effects in pretraining corpora; they have not been quantified in any specific corpus and are offered as examples, not as epidemiologically calibrated estimates.</p><p>What clinical practice routinely contributes is the assessment of reliability in self-report: holding an account against observation, collateral history, illness pattern, secondary gain, and the pragmatic context of the encounter. A patient facing detention under the Mental Health Act may minimize psychotic symptoms to preserve autonomy. A patient seeking controlled medication may exaggerate distress. A patient whose life has been shaped by years of institutional care may have internalized a framework for understanding their own needs that bears little resemblance to standardized assessment instruments. 
The clinician&#x2019;s task is not to disbelieve, but to test the account.</p></sec><sec id="s4"><title>Collusion as an Analytic Construct</title><p>In clinical usage, <italic>collusion</italic> denotes the uncritical acceptance by a clinician of a patient account that is unreliable in ways the patient may not recognize, in the absence of corroboration against observation, collateral history, or known illness patterns. It is treated as a clinical error, regardless of whether the patient was being dishonest.</p><p>Adapted to AI systems, collusion can be defined as follows: <italic>the structural reinforcement of user input as ground truth in the absence of mechanisms (at training, preference labeling, or deployment) for assessing the clinical reliability of that input</italic>. On this definition, the necessary conditions for collusion-like dynamics in an AI system are: (1) input from a source whose reliability is unassessed, (2) optimization pressure that rewards agreement with that input, and (3) the absence of any corroboration mechanism. Under current preference-optimization regimes, these conditions are routinely met at the preference-data and deployment layers.</p><p>Several disanalogies must be stated explicitly. AI systems have no intent, no dyadic relational dynamic, and no countertransference; the &#x201C;patient&#x201D; role is distributed across millions of unidentified users, the &#x201C;clinician&#x201D; role is distributed across data curators and preference labelers, and there is no professional duty to a specific person. The analogy is therefore structural rather than psychodynamic.</p><p>Several adjacent constructs in the technical literature describe overlapping phenomena. <italic>Sycophancy</italic> names the behavioral pattern of agreeing with users against the system&#x2019;s own evidence [<xref ref-type="bibr" rid="ref7">7</xref>,<xref ref-type="bibr" rid="ref9">9</xref>]. 
<italic>Reward hacking</italic> and <italic>reward overoptimization</italic> name the optimization pathology in which a model exploits an imperfect reward signal [<xref ref-type="bibr" rid="ref16">16</xref>,<xref ref-type="bibr" rid="ref17">17</xref>]. <italic>Specification gaming</italic> generalizes this to misaligned objectives. <italic>Epistemic deference</italic> and <italic>perspective mimesis</italic> describe deployment-time effects on user belief [<xref ref-type="bibr" rid="ref9">9</xref>]. <italic>Disempowerment potential</italic> names the empirical correlate at scale [<xref ref-type="bibr" rid="ref6">6</xref>]. The contribution of <italic>collusion</italic> is not to compete with these constructs but to name what they do not foreground: the <italic>clinical reliability of the input</italic> on which the system is being optimized. Collusion is offered as a clinically informed redescription of a family of already described technical phenomena, not as a claim that the field has ignored bias or sycophancy.</p></sec><sec id="s5"><title>The Feedback Loop in Deployed Systems</title><p>The empirical correlate at deployment scale is informative but should be reported precisely. Anthropic- and University of Toronto&#x2013;affiliated researchers analyzed approximately 1.5 million Claude.ai conversations and developed a framework for assessing disempowerment potential across reality distortion, value judgment distortion, and action distortion, with amplifying factors such as attachment and reliance/dependency [<xref ref-type="bibr" rid="ref6">6</xref>]. In feedback-linked subsets, conversations rated as having moderate or severe disempowerment potential received higher rates of positive user feedback (thumbs-up) than baseline, and the prevalence of such potential increased over time. 
The authors emphasize that feedback-linked conversations are not representative, that feedback samples likely overrepresent extremes, that the study observes snapshots rather than longitudinal user belief, and that it cannot directly confirm distorted belief or harm. The defensible reading is that some potentially reality-distorting interactions are positively reinforced under current feedback designs, not that users were demonstrably misled.</p><p>This pattern is consistent with the broader sycophancy literature and with provider deployment notes [<xref ref-type="bibr" rid="ref7">7</xref>,<xref ref-type="bibr" rid="ref18">18</xref>]. Cheng and colleagues [<xref ref-type="bibr" rid="ref8">8</xref>] recently reported across 11 contemporary models that AI systems affirm users more often than humans do, and that sycophantic responses reduce responsibility-taking and increase users&#x2019; conviction of their own correctness. Jain and colleagues [<xref ref-type="bibr" rid="ref9">9</xref>] showed that deployment-time interaction context, including memory and personalization, increases agreement sycophancy and perspective mimesis. OpenAI&#x2019;s public account of an April 2025 GPT-4o update documents that posttraining and feedback redesign can introduce sycophancy that was not present in the pretrained base model, and that the change was rolled back [<xref ref-type="bibr" rid="ref18">18</xref>]. Together, these findings suggest that the necessary conditions for collusion-like dynamics arise from the interaction of preference-tuning and deployment design, not from pretraining alone.</p></sec><sec id="s6"><title>Existing Technical Mitigations</title><p>A balanced account requires acknowledgment that AI safety practice already addresses parts of this problem. At least five families of mitigation are active areas of work. 
Pretraining data curation includes quality filtering, deduplication, source mixing, and model-based selection [<xref ref-type="bibr" rid="ref13">13</xref>,<xref ref-type="bibr" rid="ref14">14</xref>]. Constitutional AI and reinforcement learning from AI feedback use explicit principles or AI-generated feedback to steer behavior beyond naive human preference labels [<xref ref-type="bibr" rid="ref15">15</xref>]. Reward-model evaluation and ensembling are intended to detect and reduce overoptimization against imperfect proxies [<xref ref-type="bibr" rid="ref16">16</xref>,<xref ref-type="bibr" rid="ref17">17</xref>]. Grounded generation, retrieval augmentation, refusal and &#x201C;I don&#x2019;t know&#x201D; training, and uncertainty calibration are increasingly used in high-stakes deployments [<xref ref-type="bibr" rid="ref25">25</xref>]. Live monitoring and feedback redesign, illustrated by the disempowerment study and the GPT-4o rollback, allow providers to detect and respond to emergent sycophancy in production [<xref ref-type="bibr" rid="ref6">6</xref>,<xref ref-type="bibr" rid="ref18">18</xref>]. Open-problems analyses of reinforcement learning from human feedback emphasize that none of these methods is sufficient on its own and that defense in depth is required [<xref ref-type="bibr" rid="ref26">26</xref>].</p><p>The narrower claim made here is that none of these mitigation families is yet organized around a clinically informed account of when user self-report is unreliable. Filter quality is not the same as clinical reliability. A constitution can encode helpfulness, harmlessness, and honesty without encoding reality-testing. Reward-model evaluation can audit calibration without auditing the clinical validity of the preferences being modeled. Grounding to authoritative documents does not address the unreliability of the user&#x2019;s own first-person account. 
Red-teaming probes for jailbreaks and unsafe outputs but is not routinely resourced with psychiatric expertise in distorted cognition.</p></sec><sec id="s7"><title>Implications and Operationalized Proposals</title><sec id="s7-1"><title>The Scope of the Problem</title><p>The implications extend beyond psychosis and beyond psychiatry. Any AI system that processes human input (a triage algorithm, a risk assessment tool, a diagnostic support system, or a patient-facing chatbot) operates under the assumption that user input broadly reflects clinical reality. The individual catastrophizing during a crisis, the person whose health-seeking behavior is anxiety-driven, and the patient whose account is shaped by years of institutionalization all generate input that current systems treat as ground truth.</p><p>International consensus guidelines for trustworthy AI in health care, including FUTURE-AI, articulate principles of fairness, universality, traceability, usability, robustness, and explainability and address data quality at a general level [<xref ref-type="bibr" rid="ref27">27</xref>]. The narrower claim is that none of these frameworks specifically operationalizes <italic>clinical reliability of self-report</italic> as a data-quality dimension. World Health Organization guidance on large multimodal models [<xref ref-type="bibr" rid="ref28">28</xref>], the EU AI Act&#x2019;s high-risk classification for medical devices, the US Food and Drug Administration&#x2019;s Good Machine Learning Practice, and the National Institute for Health and Care Excellence&#x2019;s Evidence Standards Framework similarly do not yet address it.</p><p>Three first-pass proposals follow. 
They are programmatic and require empirical evaluation.</p></sec><sec id="s7-2"><title>Proposal 1: Clinical Input Into Preference Data, Red-Teaming, and Postmarket Surveillance</title><p><italic>Who:</italic> AI developers, contracted clinical advisory groups, professional colleges (for example, the Royal College of Psychiatrists and the British Psychological Society in the United Kingdom; analogous bodies internationally). <italic>Where in the life cycle:</italic> preference labeling for assistants used in or affecting mental health contexts; red-team scenario design; postmarket live-conversation auditing. <italic>How:</italic> development of a clinical-reliability annotation schema for preference data, specifying when a candidate response should be preferred for challenging rather than affirming an apparently distorted account, with explicit attention to psychosis, mania, secondary gain, and crisis presentations. <italic>Measurable output:</italic> a published schema, interrater reliability statistics on a held-out set, and a comparison of model behavior on a clinical-reliability evaluation suite before and after schema-aligned preference training.</p></sec><sec id="s7-3"><title>Proposal 2: Routine Clinical Inquiry About AI Use</title><p><italic>Who:</italic> psychiatric and clinical psychology services, undergraduate and postgraduate medical curricula, and curriculum bodies such as the Royal College of Psychiatrists. <italic>Where:</italic> psychosocial assessment, risk assessment, and Mental State Examination; safeguarding reviews; new-onset psychosis pathways. <italic>How:</italic> addition of explicit AI-use items to standard history-taking proformas, building on the AI-literacy competency proposal of Morrin and colleagues [<xref ref-type="bibr" rid="ref4">4</xref>]. 
<italic>Measurable output:</italic> validated AI-use items in routine documentation; audit of capture rates; competency descriptors in core curricula.</p></sec><sec id="s7-4"><title>Proposal 3: Clinical Reliability of Training and Preference Data as a Trustworthy-AI Criterion</title><p><italic>Who:</italic> trustworthy-AI framework authors (FUTURE-AI consortium, World Health Organization, regulators), with clinical input from psychiatry, clinical psychology, and lived-experience representation. <italic>Where:</italic> the data-quality and robustness sections of frameworks such as FUTURE-AI [<xref ref-type="bibr" rid="ref27">27</xref>] and the World Health Organization guidance on large multimodal models [<xref ref-type="bibr" rid="ref28">28</xref>], and at the conformity-assessment stage for high-risk medical AI under the EU AI Act and equivalent regimes. <italic>How:</italic> an explicit criterion that, where systems will be deployed in mental-health&#x2013;relevant contexts, training and preference data have been assessed for systematic biases that clinical practice identifies as undermining the reliability of self-report. <italic>Measurable output:</italic> a reliability-of-self-report data-quality item added to at least one international framework; audit checklists; reporting in deployment summaries.</p></sec></sec><sec id="s8"><title>Limitations and Scope</title><p>This is a conceptual contribution, not an empirical study. Clinical examples, including mania, persecutory delusions, and severe depression, are illustrative rather than epidemiologically calibrated. The cited disempowerment study reports the potential for distorted belief rather than confirmed distorted belief. Whether clinician involvement in data curation and preference labeling will improve downstream safety remains an empirical question, and any clinician-input scheme must itself be subject to governance: lived-experience representation, transparent criteria, and external audit are essential safeguards. 
The proposals are first-pass and require iterative empirical evaluation.</p></sec><sec id="s9" sec-type="conclusions"><title>Conclusions</title><p>The technical literature on AI alignment describes sycophancy, reward overoptimization, and disempowerment potential. The contribution offered here is a clinically informed redescription: collusion as an analytic frame, and the clinical reliability of training and preference data as a candidate criterion within trustworthy-AI guidance. Psychiatry and clinical psychology have developed standardized methods for assessing self-report reliability that are directly relevant to the design and governance of AI systems and are currently absent from those processes. Bringing this expertise into the pipeline will not solve sycophancy, but it may help name and address one dimension of the problem that purely technical mitigations are not yet organized to capture.</p></sec></body><back><ack><p>In the preparation and revision of this Viewpoint, the author used Anthropic Claude for editorial assistance: reducing word count, copyediting, and clarifying wording. This tool was not used as an author and did not determine the conceptual argument, the construct of collusion, the clinical examples, or the factual and clinical claims. Any AI-assisted suggestions were reviewed, edited, accepted, or rejected by the author. The author takes full responsibility for the manuscript. 
No patient data, identifiable clinical material, or confidential institutional material were entered into any AI system.</p></ack><notes><sec><title>Funding</title><p>This work received no specific funding from any agency in the public, commercial, or not-for-profit sectors.</p></sec></notes><fn-group><fn fn-type="con"><p>HT: conceptualization, investigation, methodology, writing - original draft, writing - review and editing, project administration.</p></fn><fn fn-type="conflict"><p>None declared.</p></fn></fn-group><glossary><title>Abbreviations</title><def-list><def-item><term id="abb1">AI</term><def><p>artificial intelligence</p></def></def-item><def-item><term id="abb2">LLM</term><def><p>large language model</p></def></def-item></def-list></glossary><ref-list><title>References</title><ref id="ref1"><label>1</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Wang</surname><given-names>L</given-names> </name><name name-style="western"><surname>Bhanushali</surname><given-names>T</given-names> </name><name name-style="western"><surname>Huang</surname><given-names>Z</given-names> </name><name name-style="western"><surname>Yang</surname><given-names>J</given-names> </name><name name-style="western"><surname>Badami</surname><given-names>S</given-names> </name><name name-style="western"><surname>Hightow-Weidman</surname><given-names>L</given-names> </name></person-group><article-title>Evaluating generative AI in mental health: systematic review of capabilities and limitations</article-title><source>JMIR Ment Health</source><year>2025</year><month>05</month><day>15</day><volume>12</volume><fpage>e70014</fpage><pub-id pub-id-type="doi">10.2196/70014</pub-id><pub-id pub-id-type="medline">40373033</pub-id></nlm-citation></ref><ref id="ref2"><label>2</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name 
name-style="western"><surname>Clegg</surname><given-names>KA</given-names> </name></person-group><article-title>Shoggoths, sycophancy, psychosis, oh my: rethinking large language model use and safety</article-title><source>J Med Internet Res</source><year>2025</year><month>11</month><day>18</day><volume>27</volume><fpage>e87367</fpage><pub-id pub-id-type="doi">10.2196/87367</pub-id><pub-id pub-id-type="medline">41252530</pub-id></nlm-citation></ref><ref id="ref3"><label>3</label><nlm-citation citation-type="confproc"><person-group person-group-type="author"><name name-style="western"><surname>Moore</surname><given-names>J</given-names> </name><name name-style="western"><surname>Grabb</surname><given-names>D</given-names> </name><name name-style="western"><surname>Agnew</surname><given-names>W</given-names> </name><etal/></person-group><article-title>Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers</article-title><year>2025</year><month>06</month><day>23</day><conf-name>FAccT &#x2019;25</conf-name><conf-loc>Athens, Greece</conf-loc><fpage>599</fpage><lpage>627</lpage><pub-id pub-id-type="doi">10.1145/3715275.3732039</pub-id></nlm-citation></ref><ref id="ref4"><label>4</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Morrin</surname><given-names>H</given-names> </name><name name-style="western"><surname>Nicholls</surname><given-names>L</given-names> </name><name name-style="western"><surname>Levin</surname><given-names>M</given-names> </name><etal/></person-group><article-title>Artificial intelligence-associated delusions and large language models: risks, mechanisms of delusion co-creation, and safeguarding strategies</article-title><source>Lancet Psychiatry</source><year>2026</year><month>03</month><day>5</day><fpage>S2215-0366(25)00396-7</fpage><pub-id pub-id-type="doi">10.1016/S2215-0366(25)00396-7</pub-id><pub-id 
pub-id-type="medline">41796598</pub-id></nlm-citation></ref><ref id="ref5"><label>5</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Chung</surname><given-names>VHA</given-names> </name><name name-style="western"><surname>Bernier</surname><given-names>P</given-names> </name><name name-style="western"><surname>Hudon</surname><given-names>A</given-names> </name></person-group><article-title>Mass media narratives of psychiatric adverse events associated with generative AI chatbots: rapid scoping review</article-title><source>JMIR Ment Health</source><year>2026</year><month>03</month><day>30</day><volume>13</volume><fpage>e93040</fpage><pub-id pub-id-type="doi">10.2196/93040</pub-id><pub-id pub-id-type="medline">41911018</pub-id></nlm-citation></ref><ref id="ref6"><label>6</label><nlm-citation citation-type="other"><person-group person-group-type="author"><name name-style="western"><surname>Sharma</surname><given-names>M</given-names> </name><name name-style="western"><surname>McCain</surname><given-names>M</given-names> </name><name name-style="western"><surname>Douglas</surname><given-names>R</given-names> </name><name name-style="western"><surname>Duvenaud</surname><given-names>D</given-names> </name></person-group><article-title>Who&#x2019;s in charge? 
Disempowerment patterns in real-world LLM usage</article-title><source>arXiv</source><access-date>2026-05-02</access-date><comment>Preprint posted online on  Jan 27, 2026</comment><comment><ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/2601.19062">https://arxiv.org/abs/2601.19062</ext-link></comment></nlm-citation></ref><ref id="ref7"><label>7</label><nlm-citation citation-type="other"><person-group person-group-type="author"><name name-style="western"><surname>Sharma</surname><given-names>M</given-names> </name><name name-style="western"><surname>Tong</surname><given-names>M</given-names> </name><name name-style="western"><surname>Korbak</surname><given-names>T</given-names> </name><name name-style="western"><surname>Duvenaud</surname><given-names>D</given-names> </name><name name-style="western"><surname>Askell</surname><given-names>A</given-names> </name><name name-style="western"><surname>Bowman</surname><given-names>SR</given-names> </name><etal/></person-group><article-title>Towards understanding sycophancy in language models</article-title><source>arXiv</source><access-date>2026-05-02</access-date><comment>Preprint posted online on  Oct 27, 2023</comment><comment><ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/2310.13548">https://arxiv.org/abs/2310.13548</ext-link></comment></nlm-citation></ref><ref id="ref8"><label>8</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Cheng</surname><given-names>M</given-names> </name><name name-style="western"><surname>Lee</surname><given-names>C</given-names> </name><name name-style="western"><surname>Khadpe</surname><given-names>P</given-names> </name><name name-style="western"><surname>Yu</surname><given-names>S</given-names> </name><name name-style="western"><surname>Han</surname><given-names>D</given-names> </name><name name-style="western"><surname>Jurafsky</surname><given-names>D</given-names> 
</name></person-group><article-title>Sycophantic AI decreases prosocial intentions and promotes dependence</article-title><source>Science</source><year>2026</year><month>03</month><day>26</day><volume>391</volume><issue>6792</issue><fpage>eaec8352</fpage><pub-id pub-id-type="doi">10.1126/science.aec8352</pub-id><pub-id pub-id-type="medline">41886588</pub-id></nlm-citation></ref><ref id="ref9"><label>9</label><nlm-citation citation-type="other"><person-group person-group-type="author"><name name-style="western"><surname>Jain</surname><given-names>S</given-names> </name><name name-style="western"><surname>Park</surname><given-names>C</given-names> </name><name name-style="western"><surname>Viana</surname><given-names>M</given-names> </name><name name-style="western"><surname>Wilson</surname><given-names>A</given-names> </name><name name-style="western"><surname>Calacci</surname><given-names>D</given-names> </name></person-group><article-title>Interaction context often increases sycophancy in LLMs</article-title><source>arXiv</source><access-date>2026-05-02</access-date><comment>Preprint posted online on  Sep 15, 2025</comment><comment><ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/2509.12517">https://arxiv.org/abs/2509.12517</ext-link></comment></nlm-citation></ref><ref id="ref10"><label>10</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Rajkomar</surname><given-names>A</given-names> </name><name name-style="western"><surname>Hardt</surname><given-names>M</given-names> </name><name name-style="western"><surname>Howell</surname><given-names>MD</given-names> </name><name name-style="western"><surname>Corrado</surname><given-names>G</given-names> </name><name name-style="western"><surname>Chin</surname><given-names>MH</given-names> </name></person-group><article-title>Ensuring fairness in machine learning to advance health equity</article-title><source>Ann Intern 
Med</source><year>2018</year><month>12</month><day>18</day><volume>169</volume><issue>12</issue><fpage>866</fpage><lpage>872</lpage><pub-id pub-id-type="doi">10.7326/M18-1990</pub-id><pub-id pub-id-type="medline">30508424</pub-id></nlm-citation></ref><ref id="ref11"><label>11</label><nlm-citation citation-type="confproc"><person-group person-group-type="author"><name name-style="western"><surname>Birhane</surname><given-names>A</given-names> </name><name name-style="western"><surname>Isaac</surname><given-names>W</given-names> </name><name name-style="western"><surname>Prabhakaran</surname><given-names>V</given-names> </name><etal/></person-group><article-title>Power to the people? Opportunities and challenges for participatory AI</article-title><year>2022</year><month>10</month><day>6</day><conf-name>EAAMO &#x2019;22</conf-name><conf-loc>Arlington, VA, USA</conf-loc><publisher-name>Association for Computing Machinery</publisher-name><pub-id pub-id-type="doi">10.1145/3551624.3555290</pub-id></nlm-citation></ref><ref id="ref12"><label>12</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Reeves-McLaren</surname><given-names>N</given-names> </name><name name-style="western"><surname>Christensen</surname><given-names>SM</given-names> </name></person-group><article-title>Data integrity in materials science in the era of AI: balancing accelerated discovery with responsible science and innovation</article-title><source>J Mater Chem A</source><year>2025</year><pub-id pub-id-type="doi">10.1039/D5TA05512A</pub-id></nlm-citation></ref><ref id="ref13"><label>13</label><nlm-citation citation-type="other"><person-group person-group-type="author"><name name-style="western"><surname>Li</surname><given-names>J</given-names> </name><name name-style="western"><surname>Fang</surname><given-names>A</given-names> </name><name name-style="western"><surname>Smyrnis</surname><given-names>G</given-names> 
</name><etal/></person-group><article-title>DataComp-LM: in search of the next generation of training sets for language models</article-title><source>arXiv</source><access-date>2026-05-02</access-date><comment>Preprint posted online on  Jun 17, 2024</comment><comment><ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/2406.11794">https://arxiv.org/abs/2406.11794</ext-link></comment></nlm-citation></ref><ref id="ref14"><label>14</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Albalak</surname><given-names>A</given-names> </name><name name-style="western"><surname>Elazar</surname><given-names>Y</given-names> </name><name name-style="western"><surname>Xie</surname><given-names>SM</given-names> </name><etal/></person-group><article-title>A survey on data selection for language models</article-title><source>Transactions on Machine Learning Research</source><year>2024</year><access-date>2026-05-15</access-date><comment><ext-link ext-link-type="uri" xlink:href="https://openreview.net/forum?id=XfHWcNTSHp">https://openreview.net/forum?id=XfHWcNTSHp</ext-link></comment></nlm-citation></ref><ref id="ref15"><label>15</label><nlm-citation citation-type="other"><person-group person-group-type="author"><name name-style="western"><surname>Bai</surname><given-names>Y</given-names> </name><name name-style="western"><surname>Kadavath</surname><given-names>S</given-names> </name><name name-style="western"><surname>Kundu</surname><given-names>S</given-names> </name><name name-style="western"><surname>Askell</surname><given-names>A</given-names> </name><name name-style="western"><surname>Kernion</surname><given-names>J</given-names> </name><name name-style="western"><surname>Jones</surname><given-names>A</given-names> </name><etal/></person-group><article-title>Constitutional AI: harmlessness from AI feedback</article-title><source>arXiv</source><access-date>2026-05-02</access-date><comment>Preprint 
posted online on  Dec 15, 2022</comment><comment><ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/2212.08073">https://arxiv.org/abs/2212.08073</ext-link></comment></nlm-citation></ref><ref id="ref16"><label>16</label><nlm-citation citation-type="other"><person-group person-group-type="author"><name name-style="western"><surname>Coste</surname><given-names>T</given-names> </name><name name-style="western"><surname>Anwar</surname><given-names>U</given-names> </name><name name-style="western"><surname>Kirk</surname><given-names>R</given-names> </name><name name-style="western"><surname>Krueger</surname><given-names>D</given-names> </name></person-group><article-title>Reward model ensembles help mitigate overoptimization</article-title><source>arXiv</source><access-date>2026-05-02</access-date><comment>Preprint posted online on  Oct 4, 2023</comment><comment><ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/2310.02743">https://arxiv.org/abs/2310.02743</ext-link></comment></nlm-citation></ref><ref id="ref17"><label>17</label><nlm-citation citation-type="other"><person-group person-group-type="author"><name name-style="western"><surname>Frick</surname><given-names>E</given-names> </name><name name-style="western"><surname>Li</surname><given-names>T</given-names> </name><name name-style="western"><surname>Chen</surname><given-names>C</given-names> </name><etal/></person-group><article-title>How to evaluate reward models for RLHF</article-title><source>arXiv</source><access-date>2026-05-02</access-date><comment>Preprint posted online on  Oct 18, 2024</comment><comment><ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/2410.14872">https://arxiv.org/abs/2410.14872</ext-link></comment></nlm-citation></ref><ref id="ref18"><label>18</label><nlm-citation citation-type="web"><article-title>Sycophancy in GPT-4o: what happened and what we&#x2019;re doing about 
it</article-title><source>OpenAI</source><year>2025</year><access-date>2026-05-02</access-date><comment><ext-link ext-link-type="uri" xlink:href="https://openai.com/index/sycophancy-in-gpt-4o">https://openai.com/index/sycophancy-in-gpt-4o</ext-link></comment></nlm-citation></ref><ref id="ref19"><label>19</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Tversky</surname><given-names>A</given-names> </name><name name-style="western"><surname>Kahneman</surname><given-names>D</given-names> </name></person-group><article-title>Judgment under uncertainty: heuristics and biases</article-title><source>Science</source><year>1974</year><month>09</month><day>27</day><volume>185</volume><issue>4157</issue><fpage>1124</fpage><lpage>1131</lpage><pub-id pub-id-type="doi">10.1126/science.185.4157.1124</pub-id><pub-id pub-id-type="medline">17835457</pub-id></nlm-citation></ref><ref id="ref20"><label>20</label><nlm-citation citation-type="book"><person-group person-group-type="author"><name name-style="western"><surname>Kahneman</surname><given-names>D</given-names> </name></person-group><source>Thinking, Fast and Slow</source><year>2011</year><publisher-name>Farrar, Straus and Giroux</publisher-name><pub-id pub-id-type="other">978-0141033570</pub-id></nlm-citation></ref><ref id="ref21"><label>21</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Gigerenzer</surname><given-names>G</given-names> </name><name name-style="western"><surname>Brighton</surname><given-names>H</given-names> </name></person-group><article-title>Homo heuristicus: why biased minds make better inferences</article-title><source>Top Cogn Sci</source><year>2009</year><month>01</month><volume>1</volume><issue>1</issue><fpage>107</fpage><lpage>143</lpage><pub-id pub-id-type="doi">10.1111/j.1756-8765.2008.01006.x</pub-id><pub-id 
pub-id-type="medline">25164802</pub-id></nlm-citation></ref><ref id="ref22"><label>22</label><nlm-citation citation-type="book"><person-group person-group-type="author"><name name-style="western"><surname>Beck</surname><given-names>AT</given-names> </name></person-group><source>Cognitive Therapy and the Emotional Disorders</source><year>1976</year><publisher-name>International Universities Press</publisher-name><pub-id pub-id-type="other">978-0140156898</pub-id></nlm-citation></ref><ref id="ref23"><label>23</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Palmier-Claus</surname><given-names>JE</given-names> </name><name name-style="western"><surname>Ainsworth</surname><given-names>J</given-names> </name><name name-style="western"><surname>Machin</surname><given-names>M</given-names> </name><etal/></person-group><article-title>The feasibility and validity of ambulatory self-report of psychotic symptoms using a smartphone software application</article-title><source>BMC Psychiatry</source><year>2012</year><month>10</month><day>17</day><volume>12</volume><fpage>172</fpage><pub-id pub-id-type="doi">10.1186/1471-244X-12-172</pub-id><pub-id pub-id-type="medline">23075387</pub-id></nlm-citation></ref><ref id="ref24"><label>24</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Saeb</surname><given-names>S</given-names> </name><name name-style="western"><surname>Zhang</surname><given-names>M</given-names> </name><name name-style="western"><surname>Karr</surname><given-names>CJ</given-names> </name><etal/></person-group><article-title>Mobile phone sensor correlates of depressive symptom severity in daily-life behavior: an exploratory study</article-title><source>J Med Internet Res</source><year>2015</year><month>07</month><day>15</day><volume>17</volume><issue>7</issue><fpage>e175</fpage><pub-id 
pub-id-type="doi">10.2196/jmir.4273</pub-id><pub-id pub-id-type="medline">26180009</pub-id></nlm-citation></ref><ref id="ref25"><label>25</label><nlm-citation citation-type="other"><person-group person-group-type="author"><name name-style="western"><surname>Kenthapadi</surname><given-names>K</given-names> </name><name name-style="western"><surname>Sameki</surname><given-names>M</given-names> </name><name name-style="western"><surname>Taly</surname><given-names>A</given-names> </name></person-group><article-title>Grounding and evaluation for large language models: practical challenges and lessons learned</article-title><source>arXiv</source><access-date>2026-05-02</access-date><comment>Preprint posted online on  Jul 10, 2024</comment><comment><ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/2407.12858">https://arxiv.org/abs/2407.12858</ext-link></comment></nlm-citation></ref><ref id="ref26"><label>26</label><nlm-citation citation-type="other"><person-group person-group-type="author"><name name-style="western"><surname>Casper</surname><given-names>S</given-names> </name><name name-style="western"><surname>Davies</surname><given-names>X</given-names> </name><name name-style="western"><surname>Shi</surname><given-names>C</given-names> </name><etal/></person-group><article-title>Open problems and fundamental limitations of reinforcement learning from human feedback</article-title><source>arXiv</source><access-date>2026-05-02</access-date><comment>Preprint posted online on  Jul 27, 2023</comment><comment><ext-link ext-link-type="uri" xlink:href="https://arxiv.org/abs/2307.15217">https://arxiv.org/abs/2307.15217</ext-link></comment></nlm-citation></ref><ref id="ref27"><label>27</label><nlm-citation citation-type="journal"><person-group person-group-type="author"><name name-style="western"><surname>Lekadir</surname><given-names>K</given-names> </name><name name-style="western"><surname>Frangi</surname><given-names>AF</given-names> </name><name 
name-style="western"><surname>Porras</surname><given-names>AR</given-names> </name><etal/></person-group><article-title>FUTURE-AI: international consensus guideline for trustworthy and deployable artificial intelligence in healthcare</article-title><source>BMJ</source><year>2025</year><month>02</month><day>5</day><volume>388</volume><fpage>e081554</fpage><pub-id pub-id-type="doi">10.1136/bmj-2024-081554</pub-id><pub-id pub-id-type="medline">39909534</pub-id></nlm-citation></ref><ref id="ref28"><label>28</label><nlm-citation citation-type="web"><article-title>Ethics and governance of artificial intelligence for health: guidance on large multi-modal models</article-title><source>World Health Organization</source><year>2024</year><access-date>2026-05-02</access-date><comment><ext-link ext-link-type="uri" xlink:href="https://www.who.int/publications/i/item/9789240084759">https://www.who.int/publications/i/item/9789240084759</ext-link></comment></nlm-citation></ref></ref-list></back></article>