Preserving Latvian Language in the AI and LLM Era

In today’s rapidly evolving technological landscape, artificial intelligence (AI) is becoming an integral part of our daily lives and routines. Tools like ChatGPT, built on large language models trained on immense quantities of text and other data, have become synonymous with AI. However, this technological surge largely overlooks the languages of smaller communities, including Latvian. Over 90% of the data used to train models like ChatGPT is in English, with most of the remainder in widely spoken languages such as German, French, Portuguese, Spanish, and Mandarin. This imbalance underscores the need for Latvia to develop its own national large language model to safeguard and advance the Latvian language in the era of AI.

This concern may have been raised during the recent meeting between the President of Latvia, Edgars Rinkēvičs, and OpenAI CEO Sam Altman. AI solutions, notably those based on large language models, are gaining traction through tools like ChatGPT, Microsoft Copilot, and Gemini. In the future, this technology could replace current tools such as machine translation and speech recognition, combining text and images in comprehensive models that enable innovations we can only begin to envision.

Currently, AI development is dominated by US tech giants such as Microsoft, Google, Meta, and Amazon. With vast computing resources, top talent, and financial might, these companies build high-quality solutions with English as the primary language. At the same time, they keep a close eye on global markets, including Europe, and adapt their large language models to European languages. AMD’s recent acquisition of the Finnish company Silo.ai for 665 million US dollars, with its focus on Nordic language models, exemplifies this trend.

The European Union (EU), by contrast, is taking a different approach. Instead of relying on industry behemoths, it treats the adoption of large language models as a new Industrial Revolution that will usher in advanced automation. European countries recognize this pivotal moment and have begun building supercomputers that are made accessible through various innovation programs. As a testament to this commitment, the Latvian company Tilde was among the winners of the European Commission’s Large AI Grand Challenge, earning access to LUMI, one of Europe’s most powerful supercomputers. This access will be instrumental in Tilde’s effort to create a ChatGPT-like multilingual large language model for Latvian, Lithuanian, and other less widely spoken European languages.

Training such models requires processing volumes of data that exceed the capacity of conventional European data centers. The resulting multilingual model will serve as a foundation for future national language models and AI solutions.

For Latvia, developing a national language model is essential for fostering the use of AI tools in Latvian and for competing globally. Nearly all European countries are embarking on similar initiatives. The Netherlands, for instance, has launched a national program with several million euros allocated to language model development. Poland launched a one-year project in late 2023 to build its own national model. Lithuania and Estonia are also making progress: Lithuania has completed the procurement process for its language model, and Estonia is funding the University of Tartu to identify and collect data.

Larger nations like Germany, France, and Spain have already deployed multiple versions of their national language models.

As for Latvia’s course of action, a government-led initiative is imperative. It must be backed by dedicated budget funding and by the removal of administrative hurdles, since access to data may be limited and confidentiality concerns will require data anonymization. Collaboration with academia, with data custodians such as the National Library and the National Archives, and with media and industry players like Tilde is crucial. The intricate lexical, morphological, and syntactic nature of the Latvian language calls for a specialized approach to AI development.

Beyond being a technological endeavor, developing a national language model is an imperative for cultural and linguistic preservation. At the same time, the practical benefits of AI are already evident in areas like data aggregation and text analysis, where it helps individuals and businesses work more efficiently and use resources more productively.

As a small, agile nation, Latvia can quickly adopt and integrate new technologies. By creating its own national large language model, it can secure the Latvian language’s place in the digital world of the future while reaping economic benefits and strengthening its global competitiveness. Recognizing this opportunity and taking strategic steps toward a national language model is therefore crucial for preserving the Latvian language and culture in the AI era.
