Building Foundational Models Under IndiaAI Mission: MeitY Clarifies

The Government of India is building platforms to host datasets and will provide access to specific data sources as needed, according to Abhishek Singh, Additional Secretary at the IT Ministry. Singh outlined these plans during a recent Q&A session on the call for proposals for foundational AI models under the IndiaAI Mission.

Once these platforms are in place, various individuals and entities will be able to access the datasets. However, any entity that wants to procure proprietary datasets must include the associated costs in its proposal. These points came up in response to questions about how training data for foundational models will be sourced and who will own the resulting models. Ownership rests with the developing entity, which is free to monetize its model. The government, however, plans to use these models for public services across multiple sectors and intends to draw up licensing agreements with the developers.

The session followed Union IT Minister Ashwini Vaishnaw's call for proposals to develop foundational AI models based on Indian datasets. Singh, joined by IndiaAI Mission advisor Aakrit Vaish, answered questions about submitting proposals for foundational models. Here are the key clarifications:

Understanding Foundational Models

Vaish explained that foundational models are not limited to text: they also include models for speech, video, and other modalities. In his view, any model trained on large volumes of data for generative purposes falls into this category. The mission will also consider models built from scratch for specific vertical use cases, such as healthcare. Refining an existing model, however, does not count as foundational model development; building on top of Llama 3, for example, would be classified as application development.

Developing Models with Open-Source Architecture

On building foundational models with open-source architectures, Singh said that whether such a proposal qualifies depends largely on the degree of innovation involved. Proposals should address critical questions such as whether and how distillation will be used. Distillation is a technique for transferring knowledge from a large pre-trained model (the "teacher") into a smaller, more deployable one (the "student"). The technique has drawn scrutiny recently, notably through OpenAI's allegations that DeepSeek improperly distilled its models.
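For readers unfamiliar with distillation, the sketch below illustrates its core idea under one common formulation (an assumption, not anything prescribed by the IndiaAI Mission or used by the companies named above): the student is trained to match the teacher's output probabilities after both are softened with a temperature. The function names `softmax` and `distillation_loss` are hypothetical, and the sketch only computes the loss value for a single example.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; a higher temperature yields softer probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: List[float],
                      student_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL divergence KL(teacher || student) between softened distributions.

    Scaling by temperature**2 is a common convention that keeps the loss
    magnitude comparable across temperatures (an assumption of this sketch).
    """
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's soft predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [3.0, 1.0, 0.2]
    print(distillation_loss(teacher, [3.0, 1.0, 0.2]))  # student already matches the teacher
    print(distillation_loss(teacher, [0.2, 1.0, 3.0]))  # student disagrees, so the loss is positive
```

In real training pipelines this term is typically combined with an ordinary loss on ground-truth labels and minimized by gradient descent in a deep learning framework; the plain-Python version here only shows what is being compared.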

Singh also noted that the mission has empaneled compute infrastructure providers and that part of the compute cost will be subsidized, as detailed in the request for empanelment (RFE).

The IndiaAI Datasets Platform

Vaish likened the IndiaAI datasets platform to Hugging Face, the open-source machine learning platform used to share datasets and to build, train, and deploy AI models. India's platform, currently in beta, will host datasets from both government and private sources. It was first announced in October 2024, and its initial dataset collections cover areas such as astrophysics and biological systems.

Compliance and Proposal Requirements

The government stressed that entities must comply with all applicable laws, including the Digital Personal Data Protection (DPDP) Act, 2023. This came in response to a question about who is responsible for reviewing datasets for compliance.

Although the proposal requirements allow some flexibility, applicants should describe the problem they are solving and their approach as specifically as possible, and explain the technical design of their models concisely. The IndiaAI Mission will hold applicants accountable for meeting the milestones they specify rather than scrutinizing minor details.

Proposals must itemize costs for dataset mining and cleaning, personnel salaries, and the compute needed for training.

Focus on India-Specific Models

The government emphasized that models must reflect the Indian cultural context and, where feasible, support multiple Indian languages. This requirement is in line with the mission's broader goal of building models suited to India's diverse linguistic and cultural landscape.

This initiative underlines India's commitment to fostering indigenous AI development, which could transform numerous sectors through AI solutions tailored to the country's needs.
