Soket AI Labs and Google Cloud Unveil ‘Pragna-1B’: A Milestone in Open-Source, Multilingual AI for India
Soket AI Labs, an Indian AI research firm, has introduced ‘Pragna-1B,’ India’s first fully open-source multilingual foundation model. Built in partnership with Google Cloud, the model is intended to broaden generative AI in India by supporting Indian languages, including Hindi, Bengali, and Gujarati, alongside English.
Abhishek Upperwal, founder of Soket AI Labs, emphasized the model’s efficiency and its competitiveness on natural language processing tasks. “Leveraging Google Cloud has empowered ‘Pragna-1B’ to achieve remarkable efficiency and performance, even with fewer parameters,” said Upperwal. He added, “Designed with a deep understanding of vernacular nuances, ‘Pragna-1B’ ensures a well-balanced language representation, enabling swift and effective tokenization, thus catering to the needs of organizations aiming for optimized operations and augmented functionalities.”
‘Pragna-1B’ was developed specifically for Indian users, with an emphasis on transparency, to make it simpler for enterprises to integrate AI into their operations. The collaboration used Google Cloud’s AI infrastructure to keep the model’s development efficient and cost-effective.
Soket AI Labs and Google Cloud plan to expand their partnership. Plans include featuring Soket’s AI Developer Platform on the Google Cloud Marketplace and integrating the Pragna series of models into the Google Vertex AI model registry. The integration aims to give developers a smoother model fine-tuning experience by combining Soket’s platform with the capabilities of Vertex AI and Google’s Tensor Processing Units (TPUs).
Founded in 2019 by Abhishek Upperwal, Soket AI Labs has also created ‘Bhasha,’ a collection of high-quality datasets for training Indian language models. The collection includes ‘Bhasha-wiki,’ a compilation of 44.1 million articles translated from English Wikipedia into six Indian languages, and ‘Bhasha-wiki-indic,’ which focuses on India-relevant content.
‘Pragna-1B’ uses a decoder-only transformer architecture with 1.25 billion parameters and a context length of 2048 tokens. The model was trained on around 150 billion tokens, with an emphasis on Hindi, Bengali, and Gujarati, with the aim of delivering strong performance on Indian languages in a compact model.
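To put those figures in proportion, the short Python sketch below works out the ratio of training tokens to parameters and how many full 2048-token windows the training data would fill. The constants come from the figures reported above, and the token count is approximate ("around 150 billion"). The function names are illustrative, and splitting the data into full-context windows is an assumption made for the arithmetic, not a description of Soket's actual training pipeline.

```python
# Back-of-the-envelope arithmetic from the reported Pragna-1B figures.
# The token count is approximate, so the results are estimates only.

PARAMETERS = 1_250_000_000         # 1.25 billion parameters (reported)
TRAINING_TOKENS = 150_000_000_000  # ~150 billion training tokens (reported, approximate)
CONTEXT_LENGTH = 2048              # context length in tokens (reported)


def tokens_per_parameter(tokens: int, parameters: int) -> float:
    """Ratio of training tokens to model parameters."""
    return tokens / parameters


def full_context_windows(tokens: int, context_length: int) -> int:
    """Number of complete context-length windows the tokens would fill.

    Assumes, purely for illustration, that the data is split into
    back-to-back windows with no overlap or padding.
    """
    return tokens // context_length


if __name__ == "__main__":
    print(tokens_per_parameter(TRAINING_TOKENS, PARAMETERS))    # 120.0
    print(full_context_windows(TRAINING_TOKENS, CONTEXT_LENGTH))  # 73242187
```

On these numbers the model saw roughly 120 training tokens per parameter.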
In a LinkedIn post, Upperwal described progress on ‘Pragna-1B,’ in particular tokenizer improvements and a larger vocabulary, which now supports 200k tokens. He said that the ‘Pragna-1B’ tokenizer outperforms GPT-4o’s on languages such as Kannada, Gujarati, Tamil, and Urdu. He also indicated that Soket AI Labs intends to further improve support for Hindi and other Indian languages.
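The post's comparison method is not described here. One common way to compare tokenizers on a given language is "fertility," the average number of tokens produced per whitespace-separated word, where lower usually means more compact encoding. The Python sketch below illustrates that metric under stated assumptions: the two tokenizers are toy stand-ins (a character splitter and a word splitter), not the Pragna-1B or GPT-4o tokenizers, and all names are hypothetical.

```python
from typing import Callable, List


def fertility(tokenize: Callable[[str], List[str]], text: str) -> float:
    """Average number of tokens per whitespace-separated word."""
    words = text.split()
    if not words:
        raise ValueError("text must contain at least one word")
    return len(tokenize(text)) / len(words)


# Toy stand-ins for real tokenizers (hypothetical, for illustration only).
def char_tokenizer(text: str) -> List[str]:
    """Worst case: one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


def word_tokenizer(text: str) -> List[str]:
    """Best case: one token per word."""
    return text.split()


if __name__ == "__main__":
    sample = "નમસ્તે દુનિયા"  # a short Gujarati phrase
    for name, tok in [("char", char_tokenizer), ("word", word_tokenizer)]:
        print(name, fertility(tok, sample))
```

To compare real tokenizers, one would pass each tokenizer's encode function in place of the toy functions and average over a representative corpus for each language.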
Looking ahead, Soket AI Labs is exploring ‘Mixture of Experts’ architectures, with the aim of supporting more languages and evaluating alternative model designs for further optimization.
The launch of ‘Pragna-1B’ by Soket AI Labs, in collaboration with Google Cloud, reflects the growing importance of accessible, efficient, and inclusive AI technologies. The company expects the initiative to enable more versatile AI applications across sectors in a range of Indian languages and dialects, and to strengthen India’s position in global AI innovation.