GenAI's Trust Deficit: Why Frontier Models Struggle Across Institutions

Despite massive investment in AI, overestimation of LLM capabilities and persistent trust deficits continue to impede reliable institutional adoption at scale

The 2025 Stanford AI Index reported that organisational AI use rose from 55 percent in 2023 to 78 percent in 2024, and that global private investment in generative AI reached US$33.9 billion in 2024. Yet experimentation has not translated into widespread enterprise transformation. While many organisations have adopted AI through pilot projects, 50 percent of generative AI projects are abandoned. A 2025 MIT Project NANDA report found that 95 percent of enterprise GenAI pilots delivered no measurable profit-and-loss impact, suggesting that the bottleneck lies not in access to models but in integration, workflow redesign, and organisational learning. As a result, many firms have struggled to convert AI into durable, measurable value.

There is also a broader AI trust deficit. A KPMG report concluded that more than half of the 48,000 people surveyed across 47 countries do not trust AI. People are most sceptical about its societal impact, security, and safety, even as adoption continues to increase worldwide. Research also suggests that aversion to AI varies by domain and is strongest where people believe it is unnecessary. There is also significant scepticism about the ethics of AI companies and the bias present in the dominant models today. Together, these factors may also shape adoption patterns across the industry.

Large Language Models (LLMs) continue to improve, but many researchers now question whether scaling language prediction can produce the capabilities required for reliable autonomy. French-American computer scientist Yann LeCun has argued that current AI systems remain too limited for tasks such as domestic robotics or fully autonomous driving because they lack a grounded understanding of the physical world. His proposed alternative—systems based on world models, reasoning, planning, and prediction of real-world dynamics—reflects a broader concern that linguistic fluency, coding proficiency and passing general knowledge tests are not the same as intelligence. Frontier firms are also running out of quality real-world data to keep scaling pre-training.

While capabilities as measured by benchmarks might be improving, advances in benchmark performance and multimodality have not resolved the more difficult problems of reliability, explainability, and institutional trust. This can also be called “the hard problem of AI”, mirroring the hard problem of consciousness. If the hard problem of consciousness asks why subjective experience arises from physical processes, the hard problem of AI asks why intelligent-looking outputs do not yet amount to reliable intelligence capable of widespread adoption. GenAI is useful for bounded tasks and impressive in demonstrations, but the gap between experimentation and transformation has become clearer in recent months.

Limits of Deployment and Architecture

The barriers to enterprise AI adoption are often described as external implementation problems: weak or legacy infrastructure, privacy and security risks, unclear KPIs, regulatory uncertainty, and related challenges. While these remain valid concerns, this framing understates the more fundamental question of what these systems are actually capable of. Real-world implementation is also where the nature of the technology is tested. If a system requires extensive redesign of workflows, constant monitoring, and intense human supervision before it can be trusted, then these requirements constitute a core part of its real-world cost. In other words, part of the limit on adoption is inherent in the technology’s limited capabilities.

The empirical evidence points in this direction. A 2025 enterprise AI adoption report, based on 1,200 generative, agentic, and traditional AI use cases, found that only 31 percent of prioritised AI use cases had reached full production. It also found that enterprises had spent an average of US$1.3 million on AI initiatives, while only one in four initiatives achieved the expected return on investment (ROI) in terms of growth, and only half achieved efficiency gains at the expected level. This suggests that the cost-benefit equation remains uncertain, partly due to gaps in AI literacy but also due to heightened expectations regarding efficiency gains. Thus, organisations are investing heavily, but the value generated remains uneven and difficult to capture fully.

The adoption problem cannot be reduced to organisational inertia or to external obstacles alone. The gains from AI may be real, but they are often diffuse, while the costs are immediate, concrete, and institutionally risky. For most organisations, the cost of making systems reliable therefore often rises faster than the value those systems generate. This helps explain why firms such as OpenAI and Anthropic have moved towards deployment services and consulting-like enterprise models: standard models alone are not enough to drive widespread adoption. Enterprises need more than access to wrappers, open-source models, and APIs. They also require the socio-technical systems needed to build trust in the technology and manage real-world deployment problems.

Mistaking Expectations for Reality

The adoption problem is also rooted in the fundamental architecture of current GenAI systems. LLMs are not intelligent in the human sense. They are probabilistic prediction systems trained on vast corpora of text, code, images, and other data. This gives them artificial fluency, but fluency is not the same as grounded intelligence. While LLMs can produce plausible outputs, they do so without knowing whether those outputs are true or understanding their meaning. They can also assist with reasoning-like tasks through chain-of-thought approaches, but they do not understand causality, institutional responsibility, or real-world consequences.

The problem is not simply that GenAI fails, but that it can fail persuasively and often in ways that are difficult for humans to verify. For example, autonomous agents can fail spectacularly mid-way through a process, with the failure remaining invisible until the operation is completed. A recent example is the reported incident at PocketOS, a car software business. In April 2026, the PocketOS founder said that an AI coding agent using Cursor and powered by Anthropic’s Claude deleted the company’s entire production database and backups. This shows that the issue is not merely hallucination, but delegated action without reliable situational awareness or an understanding of consequences and constraints.

This also points to a deeper research gap. If the goal is reliable autonomous intelligence, then scaling language prediction and deep learning may be an incomplete path. Replicating human intelligence is incredibly difficult because it is not just a matter of prediction and pattern recognition. It is embodied, spatial, social, memory-dependent, and action-oriented. Humans build internal models of the world through experience, test expectations against reality, navigate physical and social environments, and continuously update their behaviour through feedback. Today’s systems can be useful within bounded workflows, but their reliability is likely to remain limited when they are expected to operate autonomously in open-ended, real-world environments.

However, the current AI investment cycle remains heavily concentrated around larger models, more compute, more data, and enterprise deployment layers. At the same time, a smaller set of companies and research labs is exploring world models, spatial intelligence, embodied AI, neurosymbolic systems, and related approaches. These approaches remain comparatively undercapitalised relative to the capital flowing into LLM scaling and commercial deployment. This imbalance matters because trustworthy autonomy may require systems that can reason about the world and take sensible actions, not merely generate plausible representations.

The Future of GenAI Use

The deployment plateau does not imply that generative AI will suddenly vanish from organisational life. Rather, it indicates that its most durable role may be narrower than early expectations suggested. The shift by frontier AI firms towards deployment services points to a service-led future for GenAI: one focused on helping companies function more efficiently and streamline many low-skilled or entry-level tasks. This is commercially significant, but it is not the same as the emergence of autonomous institutional intelligence based solely on current LLM-based approaches. A more likely path may involve a combination of alternative approaches tailored to different sectors.

Today’s GenAI deployments are likely to remain bounded and department-specific, optimised for specialised tasks. These tools may improve productivity and reduce operational friction, but they will still require human supervision, verification, governance, and accountability. The core limitation remains: LLMs can generate useful outputs, but they cannot yet be depended upon to act autonomously across complex, high-stakes institutional environments. The core issues of trust, explainability, and reliability must also be addressed before adoption can translate into broad-based gains.

Ishita Deshmukh is a research intern at the Observer Research Foundation.

The views expressed above belong to the author(s).

Author

Ishita Deshmukh

Ishita Deshmukh is a Research Assistant with ORF’s Centre for Security, Strategy & Technology. Her work focuses on how artificial intelligence is reshaping national security, economic ...

Expert Speak Young Voices

Published on Jun 01, 2026
