FANAR

Qatar launches Fanar sovereign large language model

#Qatar #LLMs – The State of Qatar’s much-anticipated sovereign large language model has been officially launched on Day One of the World Summit AI in Doha. Called Fanar (tr. lighthouse), the LLM was developed to help close the gap between Arabic language models and global English models, providing users with unparalleled contextual comprehension, linguistic precision, and depth of knowledge. The new artificial intelligence model was developed by the Qatar Computing Research Institute (QCRI) of Hamad Bin Khalifa University (HBKU), sponsored by the Ministry of Communications and Information Technology (MCIT), in collaboration with other key stakeholders. Fanar Star 7B and Fanar Prime (an 8.87 billion parameter model) launch with six Generative AI apps, including a text and voice chatbot that can produce multimodal outputs.

SO WHAT? – Following high-profile projects in Saudi Arabia and the UAE to develop sovereign large language models, a growing number of Arab countries have announced intentions to build national models. Early sovereign models have all set goals to provide a very high level of Arabic language capability and local cultural knowledge, drawing on increasingly diverse Arabic language data resources. However, the development of Arabic-centric LLMs is a complex, expensive and time-consuming business. Qatar’s new Fanar LLM joins an elite group of Arabic models that are trained on the biggest and most diverse Arabic language data sets that can be created. We can expect relatively few of these in the short term.

  • Qatar’s sovereign large language model Fanar was launched on the opening day of the World Summit AI in Doha, together with a suite of Arabic apps. The project to develop the Arabic-centric bilingual LLM was initially announced at the Qatar Economic Forum in May. A suite of Arabic language multimodal models supports the Fanar platform.
  • Fanar LLM was developed by the Qatar Computing Research Institute (QCRI) of Hamad Bin Khalifa University (HBKU), sponsored by the Ministry of Communications and Information Technology (MCIT), in collaboration with Qatar University, Qatar National Library, the Ministry of Endowment and Islamic Affairs, Al Jazeera, and the Arab Center for Research and Policy Studies. Google Cloud was the key technology provider.
  • The Fanar services platform is powered by Fanar Star 7B, a model developed from scratch, and Fanar Prime, an 8.78 billion parameter model built on Google's Gemma 2 9B model, together with retrieval-augmented generation (RAG) modules for attribution and recency and a special RAG module for Islamic queries.
  • Fanar LLM launches with six Generative AI apps:
    • Fanar Chat – A bilingual chatbot that accepts text and voice prompts, able to handle different Arabic dialects, with multimodal outputs (text, speech, images)
    • Taleem – A teaching assistant to support educators in creating lesson material, including summaries, questions & multimedia.
    • Akhbar AI – An AI-powered newsroom assistant that helps editors create content, including news stories, interview plans and visuals.
    • Allama – A government services chatbot that leverages RAG (Retrieval-Augmented Generation), able to answer questions about government processes.
    • Talk to Your Book – An app powered by Fanar and social media platform Mastodon that allows users to talk to virtual agents that represent books.
    • News insights – An app developed by Hyperthink Technology to provide news insights and analysis.
  • The large language model was trained on 1.3 trillion tokens, of which 40% was Arabic language data, 50% was English language data and 10% was code. Developers used a corpus created specifically for the project consisting of 300+ billion words of text data. 400+ billion tokens were Arabic language.
  • Special effort was made to curate data about Qatar’s heritage, culture and traditions, including the country’s colloquial Arabic dialect and Islamic data. Fanar is therefore able to generate accurate and culturally appropriate responses, enhancing the experience of Arabic-speaking users.
  • In benchmark tests conducted by researchers, Fanar Star 7B outperformed Allam 7B and Jais 13B, but not the newer Jais 6p7b model released in September.
  • Training data was aggregated from a variety of sources including data provided by Qatar University, Qatar National Library, the Ministry of Endowment and Islamic Affairs, Al Jazeera, and the Arab Center for Research and Policy Studies.
  • Users will be able to use Fanar for Arabic-English / English-Arabic translation, summarisation and creative writing, empowering companies and institutions to effectively engage their Arabic-speaking audience.
  • It is hoped that the new bilingual model will facilitate the development of Arabic chatbots and virtual assistants for the public and private sectors that are attuned to the local language, culture and laws, and can more effectively engage with their Arabic-speaking audience.
  • QCRI is committed to continually improving the model and has built in a user-feedback system to allow users to help improve Fanar.
  • Open access to Fanar models and APIs is expected soon.
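
As a rough check on the training-data figures listed above, the reported language shares can be converted into approximate token counts. The snippet below is a back-of-the-envelope sketch, not QCRI's own accounting: it assumes the 40% / 50% / 10% shares apply to the full 1.3 trillion tokens, and the names `TOTAL_TOKENS`, `SHARES_PERCENT` and `token_breakdown` are hypothetical, chosen only for illustration.

```python
# Illustrative arithmetic only: split the reported 1.3 trillion training tokens
# by the reported shares (40% Arabic, 50% English, 10% code).
# Assumption: the shares apply to the full reported total.
TOTAL_TOKENS = 1_300_000_000_000  # 1.3 trillion, as reported

# Shares are kept as whole percentages so the integer arithmetic stays exact.
SHARES_PERCENT = {"Arabic": 40, "English": 50, "code": 10}


def token_breakdown(total, shares_percent):
    """Return the token count per category for the given percentage shares."""
    if sum(shares_percent.values()) != 100:
        raise ValueError("shares must add up to 100 percent")
    return {name: total * pct // 100 for name, pct in shares_percent.items()}


breakdown = token_breakdown(TOTAL_TOKENS, SHARES_PERCENT)
for name, count in breakdown.items():
    print(f"{name}: {count / 1e9:.0f} billion tokens")
```

On these assumptions the Arabic share works out to roughly 520 billion tokens, which is consistent with the "400+ billion" Arabic tokens cited in the list.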

ZOOM OUT – Qatar is investing billions of dollars in technology innovation and R&D both inside the country and internationally. Since the beginning of 2024, the government has announced plans to invest about $14.5 billion across a variety of sectors locally and globally, with a focus on technology. In February, Qatar announced a 10 billion euro investment in France, destined mainly for key technology sectors such as AI, life sciences, semiconductors, aerospace and energy transition. This month, a £1 billion plan to invest in UK technology sectors was announced, focused on climate tech (and including the formation of a joint Qatar-UK AI research commission). Meanwhile, Qatar plans to invest $2.5 billion in incentives to advance programmes in AI, technology and innovation at home.
