Expert Speak Young Voices
Published on Apr 21, 2025

India’s consent-driven data rules may unintentionally hinder AI by limiting access to the large datasets essential for training

AI at a Crossroads: Navigating Consent-Centric Data in India


In the realm of Artificial Intelligence (AI), data is the key driving force for training advanced models. Advanced AI systems such as Large Language Models (LLMs) thrive on large volumes of high-quality data. However, India's Digital Personal Data Protection (DPDP) Act and its rules, built on express, informed, and continuing consent, raise both ethical and practical challenges. This article examines the implications of the DPDP Act's consent-centric design for AI development, especially in sectors that depend on curated, proprietary data.

Consent-Centric Data Governance

India's DPDP Act represents a significant milestone in the country's approach to data protection. The DPDP Rules require that every data point be collected with the consent of the data principal, with publicly available data excluded only in limited instances. Unlike the European Union's (EU) General Data Protection Regulation (GDPR) and Brazil's Lei Geral de Proteção de Dados (LGPD, the General Personal Data Protection Law), this framework narrowly recognises consent as the valid basis for processing, overlooking alternative legal mechanisms, such as contractual necessity and legitimate interests, that provide flexibility under leading international data protection regimes. While the DPDP Act aims to protect individual rights by making data collection practices more transparent and accountable, it arrives at a time when AI developers increasingly need data that is not easily accessible to the public. With AI development unfolding rapidly, consent as the basis for data protection works at cross purposes with the prevailing mode of collecting data to train large AI models. Ernst & Young's (EY) reports on sectoral AI development highlight how essential high-quality, carefully curated datasets are for effective LLM training, a finding corroborated by IndiaAI's analyses of the specific challenges involved in developing generative models. The focus on explicit, granular consent poses a significant conundrum in this context: how can a consent-centric data protection framework be reconciled with the data requirements of AI innovation?


The Conundrum of Curated Data for Sector-Specific AI

The foundation of AI systems like LLMs rests entirely on their training data. In critical sectors such as healthcare, banking, and online advertising, data collection follows regulated protocols, often drawing from exclusive sources inaccessible to the general public. Within the DPDP Act framework, a consent manager is an entity registered with the Data Protection Board of India that provides a transparent, accessible, and interoperable platform through which data principals can grant, manage, review, and revoke their consent, and that serves as the primary intermediary between individuals and businesses. Account aggregators, a comparable framework in finance, illustrate how giving customers control over their data not only enhances user experience but also improves data quality; notably, both consent models are inspired by the Data Empowerment and Protection Architecture (DEPA) framework. However, this consent-based approach creates a fundamental tension in AI development: requiring case-by-case consent significantly reduces the volume of available training data, creating a complex challenge with multiple dimensions. While consent-centric frameworks aim to build trust and keep data principals in control, they also introduce new problems for AI innovation. An additional layer of complexity arises at the intersection of data protection and copyright law: recent cases, such as the ANI Media Pvt. Ltd case, highlight the legal issues that emerge when curated data protected by copyright is used to train AI models.
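The grant-manage-review-revoke role of a consent manager described above can be sketched in code. The following is a minimal, illustrative sketch only; class and method names are assumptions, not part of any official DPDP specification:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class ConsentRecord:
    purpose: str
    granted: bool
    updated_at: datetime

class ConsentManager:
    """Illustrative consent-manager interface: data principals grant,
    review, and revoke purpose-specific consent, and data users check
    permission before processing. Names here are hypothetical."""

    def __init__(self):
        # (principal_id, purpose) -> latest ConsentRecord
        self._records = {}

    def grant(self, principal_id: str, purpose: str) -> None:
        self._records[(principal_id, purpose)] = ConsentRecord(
            purpose, True, datetime.now(timezone.utc))

    def revoke(self, principal_id: str, purpose: str) -> None:
        self._records[(principal_id, purpose)] = ConsentRecord(
            purpose, False, datetime.now(timezone.utc))

    def is_permitted(self, principal_id: str, purpose: str) -> bool:
        rec = self._records.get((principal_id, purpose))
        return rec is not None and rec.granted

    def review(self, principal_id: str) -> list:
        # A data principal's view of all their consent records
        return [r for (pid, _), r in self._records.items() if pid == principal_id]
```

Because consent is keyed per purpose, a principal can, for example, permit account servicing while refusing model training, which is the granularity the DPDP framework contemplates.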


Given that LLMs need vast datasets to function, whether developers can negotiate consent for each data element, including copyrighted content, remains an open question. As the FPF event report on responsible data practices illustrates, it is debatable whether the full value of curated data can be realised under a system that requires repeated consent. Research has shown that frequent permission requests can lead to what experts call ‘consent fatigue’ or ‘consent exhaustion’: users in this state may mechanically approve data usage without fully comprehending the implications. This phenomenon ultimately undermines the quality of consent itself whilst restricting access to the high-quality datasets essential for advanced AI applications, potentially negating the very privacy benefits these frameworks are meant to secure.

Global Perspectives on Privacy and Innovation

Outside the Indian context, similar findings offer a different perspective on the balance between privacy and innovation. According to the World Economic Forum's (WEF) Redesigning Data Privacy Report 2020, applying consent models in today's data-intensive AI environment may prove difficult. Similarly, a white paper by the Stanford Institute for Human-Centered AI on rethinking privacy in the age of AI argues that privacy cannot rest on individual consent alone, as earlier frameworks presupposed. A European Parliament study likewise highlights the conflict between data protection and data utility. These international perspectives suggest that even though the DPDP Act's consent-centric approach is ethically grounded, insufficient flexibility in its implementation may hinder technological advancement.


As discussions on sovereign data strategies and AI-driven public interest technologies indicate, the way ahead may involve differentiated regulatory methods that account for sectoral dimensions and allow flexibility in measures. A more flexible framework is envisioned in best practices such as the EU AI Act, Article 10 of which outlines responsible data governance and management practices, and in alternative anonymisation techniques such as subjective anonymisation, which factors in context in addition to identifiers. These approaches highlight the importance of situating privacy protection within comprehensive risk assessment frameworks that account for vulnerabilities arising from data linkages and the risk interfaces they create.

Balancing Innovation with Ethical Imperatives

The issue, therefore, is to strike a balance between two opposing and equally important goals. On one hand, ethical and legal positions aim to protect individual privacy by ensuring that people consent knowingly and can withdraw their consent at any time. On the other hand, AI development demands large, well-organised datasets. As policy and white paper reports note, the further development of AI in India depends on the availability of structured data, accessible through specific mechanisms that must nonetheless conform to the DPDP Act. Technological solutions such as consent managers make consent management more efficient while maintaining proper records and audit trails, though they add an extra layer of compliance. Blockchain technology can also make consent records unalterable and transparent, and, combined with methods such as subjective anonymisation, data analyses can proceed while protecting individual identities. Together, these tools create a data environment that respects data principals' rights while supporting the development of AI technologies. Policy adaptations are also crucial. Standardised consent templates, which have been recommended in several international reports, can reduce consent fatigue among subjects and researchers, but they may not suffice on their own. Sector-specific exemptions and regulatory sandboxes may be needed in industries whose business depends on curated data. For instance, in the health and financial sectors, where sector-specific frameworks already apply, regulations could permit limited data sharing under conditions that protect individuals' privacy and consent while providing LLMs and other AI systems with the quality data they need.
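The tamper-evidence property that blockchain-backed consent records rely on can be illustrated without a full blockchain: a hash chain, where each consent event embeds the hash of the previous one, already makes later edits detectable. This is a simplified sketch, not a description of any deployed system:

```python
import hashlib
import json

class ConsentLedger:
    """Append-only, hash-chained log of consent events. Each entry
    stores the hash of the previous entry, so altering any past entry
    breaks verification -- the core tamper-evidence idea behind
    blockchain-based consent records. Illustrative sketch only."""

    def __init__(self):
        self.entries = []

    def append(self, principal_id: str, purpose: str, granted: bool) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        payload = {
            "principal_id": principal_id,
            "purpose": purpose,
            "granted": granted,
            "prev_hash": prev_hash,
        }
        digest = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()).hexdigest()
        entry = {**payload, "hash": digest}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; any tampered entry breaks the chain."""
        prev_hash = "0" * 64
        for e in self.entries:
            payload = {k: e[k] for k in
                       ("principal_id", "purpose", "granted", "prev_hash")}
            recomputed = hashlib.sha256(
                json.dumps(payload, sort_keys=True).encode()).hexdigest()
            if e["prev_hash"] != prev_hash or recomputed != e["hash"]:
                return False
            prev_hash = e["hash"]
        return True
```

A production system would distribute such a ledger across parties and anchor it cryptographically, but even this single-node version shows why an unalterable consent trail supports both auditability and revocation records.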


Looking ahead, an alternative to conventional consent managers is emerging: the integration of AI agents into consent management. AI-driven systems can continuously monitor compliance, dynamically adapt to evolving data landscapes, and significantly reduce the administrative burden of traditional consent management approaches. Unlike static consent managers, AI agents offer dynamic, proactive mechanisms that can enforce data protection regulations while mitigating issues like consent fatigue. By leveraging such intelligent agents, India can preserve individual privacy and autonomy while fostering a more agile and sustainable data ecosystem, one that aligns ethical imperatives with rapid advances in AI technology.
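One way such automated compliance monitoring could look in practice is a filtering step that admits only records with active, recent consent into a training batch and proactively queues expiring consents for renewal, rather than prompting users repeatedly. The function below is a hypothetical sketch; the field names, the one-year validity window, and the renewal threshold are all assumptions for illustration:

```python
from datetime import datetime, timedelta, timezone

def filter_training_batch(records, consents, max_age_days=365):
    """Hypothetical automated compliance step: keep only records whose
    principal has an active, sufficiently recent consent, and flag
    consents that are expired or nearing expiry for renewal. Batching
    renewal prompts is one way to reduce consent fatigue."""
    now = datetime.now(timezone.utc)
    allowed, renewals = [], []
    for rec in records:
        consent = consents.get(rec["principal_id"])
        if not consent or not consent["granted"]:
            continue  # no active consent: record is excluded outright
        age = now - consent["updated_at"]
        if age > timedelta(days=max_age_days):
            renewals.append(rec["principal_id"])  # expired: exclude, ask again
            continue
        allowed.append(rec)
        if age > timedelta(days=max_age_days - 30):
            renewals.append(rec["principal_id"])  # still valid, renew soon
    return allowed, renewals
```

An agentic system would run checks like this continuously against evolving consent state, rather than once at collection time, which is what distinguishes it from a static consent register.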

Conclusion

India's consent-based data protection regime, while protecting individual rights through informed consent mechanisms, may create operational challenges for AI innovation. Striking a balance between privacy protection and technological innovation will depend on identifying effective solutions, such as responsive, risk-based regulatory frameworks, including sandboxes, exemptions, and industry-led methods. These will help policymakers and industry leaders collaboratively design an ethical framework conducive to AI-driven progress, ensuring that India remains at the forefront of responsible technological evolution.


Purushraj Patnaik is a Research Intern at the Observer Research Foundation.

The views expressed above belong to the author(s).