ETtech Explainer: Here are nations at the forefront of GenAI innovation

According to data science community platform Hugging Face, there has been a rapid influx of new LLMs, with hundreds announced every week. Today, its repository hosts 450,706 open-source language models and serves over 1 million model downloads a day.

China

China is in a neck-and-neck competition with its geopolitical rival, the US, in the number of models it releases. According to Beijing's ministry of science and technology, Chinese organisations released 2 LLMs in 2020, compared with 11 in the US. In 2021, the figure was 30 for each country. And in 2023, China released 28 LLMs against the US' 37.

Some of China's popular releases include DeepSeek, a 67-billion-parameter model trained on English and Chinese data. Ecommerce giant Alibaba Group Holding's research unit Damo Academy launched the Southeast Asia LLM (SeaLLM), trained on Vietnamese, Indonesian, Thai, Malay, Khmer, Lao, Tagalog, and Burmese datasets. Alibaba's small 7-billion-parameter model Qwen-VL is a multimodal tool that can comprehend images, text, and bounding boxes in prompts. Another Chinese startup, 01.AI, became a unicorn less than eight months after its inception when it released Yi-34B, a 34-billion-parameter model.

Singapore

Singapore has taken a similar initiative to China's. It anchored the SEA-LION (Southeast Asian Languages In One Network) family of LLMs, pre-trained for the Southeast Asian (SEA) region. Currently, SEA-LION comprises two small language models, with 3 billion and 7 billion parameters. Following this initiative, Singapore's telecom and media regulator IMDA, its ministry of science, and other agencies announced SGD 70 million ($52.3 million) to be invested in multimodal AI research. Singaporean startup WIZ.AI also released a 13-billion-parameter LLM tailored for Bahasa Indonesia.
Korea

Naver Corp., South Korea's online giant, debuted in the LLM space with the launch of HyperCLOVA X, a massive 204-billion-parameter LLM trained on news articles published over the past five decades and blog data accumulated over nine years. Korean startup Upstage built Solar LLM, which has lately ranked among the top entries on Hugging Face's leaderboard for open-source LLMs. KT Corp., South Korea's number two mobile carrier, has also joined hands with Thailand's Jasmine Group to jointly create an LLM for the Thai language.

UAE

The Technology Innovation Institute, an Emirati research centre in Abu Dhabi, released Falcon 180B, after unveiling Noor, the world's first Arabic model, last year. The UAE also came out with Jais, a 13-billion-parameter model trained on an Arabic and English dataset on Condor Galaxy, one of the largest cloud AI supercomputers in the world.

India

Indian tech majors and startups are also in the fray for AI innovation. Conversational AI startup Corover.ai recently launched BharatGPT, a 7-billion-parameter model trained in 14 Indian languages across text, voice, and video interactions. Sarvam AI released OpenHathi-Hi-0.1, the first open-source Hindi language model, built on Meta's Llama 2-7B model. Mobility unicorn Ola dropped the Krutrim LLM, which can comprehend 22 Indian languages and respond in 10 of them. IT services leader Tech Mahindra is also working on 'Project Indus', an LLM trained in Hindi and 37 Indic dialects.

Others

Besides these, Japan's Rakuten and France's Mistral have also forayed into the LLM race, with models trained in Japanese and European languages respectively.