Jais

MEET JAIS: THE ARABIC LLM
01 Overview
Jais is the world’s most advanced open-source Arabic Large Language Model (LLM). Developed in the UAE, Jais is the result of a collaboration between Inception, a G42 company focused on expanding the boundaries of AI, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) -- the world’s first graduate-research university dedicated to AI -- and Cerebras Systems.
02 Parameters
Jais is a 13-billion-parameter model trained on 116 billion Arabic tokens and 279 billion English tokens.
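As a back-of-the-envelope check on these headline numbers, the combined corpus works out to 395 billion tokens, or roughly 30 tokens per parameter (for comparison, the compute-optimal guideline from the Chinchilla scaling work is about 20 tokens per parameter). A minimal sketch of the arithmetic:

```python
# Headline figures from the model description (in billions).
params_b = 13
arabic_tokens_b = 116
english_tokens_b = 279

total_tokens_b = arabic_tokens_b + english_tokens_b   # 395B tokens in total
tokens_per_param = total_tokens_b / params_b          # ~30 tokens per parameter
arabic_share = arabic_tokens_b / total_tokens_b       # Arabic is ~29% of the mix
```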
03 The LLM for the Arab world
By enabling more than 400 million Arabic speakers to benefit from the opportunities afforded by generative AI, Jais will help to accelerate innovation and global partnerships. Jais represents an important evolution and expansion of the Natural Language Processing (NLP) AI landscape in the Middle East and already outperforms existing Arabic LLMs by a sizable margin.
04 Inside the model
Jais is a transformer-based large language model that incorporates several cutting-edge features, including Attention with Linear Biases (ALiBi) position embeddings, which enable the model to extrapolate to much longer inputs than it saw during training, providing better context handling and accuracy. Other state-of-the-art techniques include the SwiGLU (Swish-Gated Linear Unit) activation and maximal update parameterization (muP) to improve the model’s training efficiency and accuracy.
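To make the two techniques concrete, here is a minimal NumPy sketch (not Jais's actual implementation) of the two ideas: ALiBi replaces learned position embeddings with a fixed, per-head linear penalty added to attention scores, so more distant tokens are attended to less; SwiGLU gates one linear projection of the input with the Swish of another. Shapes and head counts below are illustrative.

```python
import numpy as np

def alibi_slopes(n_heads: int) -> np.ndarray:
    # Geometric sequence of per-head slopes, 2^(-8/n), 2^(-16/n), ...
    # (the standard ALiBi recipe when n_heads is a power of two).
    return np.array([2.0 ** (-8.0 * (i + 1) / n_heads) for i in range(n_heads)])

def alibi_bias(seq_len: int, n_heads: int) -> np.ndarray:
    # Bias added to causal attention scores: for head h, query i, key j (j <= i),
    # the bias is -slope_h * (i - j), so nearby tokens are penalized less than
    # distant ones. Future positions (j > i) are clamped to 0 here; in a real
    # model they are removed by the causal mask anyway.
    slopes = alibi_slopes(n_heads)
    dist = np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :]  # i - j
    return -slopes[:, None, None] * np.maximum(dist, 0)  # (heads, query, key)

def swiglu(x: np.ndarray, W: np.ndarray, V: np.ndarray) -> np.ndarray:
    # SwiGLU(x) = Swish(x @ W) * (x @ V), where Swish(z) = z * sigmoid(z).
    z = x @ W
    gate = z * (1.0 / (1.0 + np.exp(-z)))  # Swish / SiLU
    return gate * (x @ V)

# Tiny demo with illustrative sizes.
bias = alibi_bias(seq_len=4, n_heads=2)
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))
W = rng.standard_normal((8, 16))
V = rng.standard_normal((8, 16))
out = swiglu(x, W, V)
```

Because the ALiBi penalty is a simple linear function of distance rather than a learned embedding, the same bias formula applies unchanged at inference-time sequence lengths longer than any seen in training, which is what enables the extrapolation mentioned above.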
05 Training Jais
Jais was trained on Condor Galaxy, a multi-exaFLOP AI supercomputer built by G42 and Cerebras. The 13-billion parameter open-source model was trained on a unique and purpose-built dataset of 116 billion Arabic tokens that was designed to capture the complexity, nuance, and richness of the Arabic language. The dataset includes 279 billion English word tokens to increase the model’s performance through cross-language transfer.
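One way to picture a bilingual corpus like this is as a weighted sampling mix over the two language pools. The sketch below assumes simple proportional sampling by token budget; the actual training schedule is not specified in this overview.

```python
import random

# Token budgets from the text, in billions. Proportional sampling is an
# assumption for illustration -- the real schedule may differ.
budgets = {"arabic": 116, "english": 279}
total = sum(budgets.values())
weights = {lang: n / total for lang, n in budgets.items()}

# Draw a mock batch schedule and measure the realized Arabic fraction.
random.seed(0)
langs = list(budgets)
sample = random.choices(langs, weights=[weights[l] for l in langs], k=10_000)
frac_arabic = sample.count("arabic") / len(sample)
```

Under this proportional assumption roughly 29% of training examples would be Arabic, with the English majority supplying the cross-language transfer described above.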
Disclaimer: By using Jais, you acknowledge and accept that, as with any large language model, it may generate incorrect, misleading and/or offensive information or content. The information is not intended as advice and should not be relied upon in any way, nor are we responsible for any of the content or consequences resulting from its use. We are continuously working to develop models with greater capabilities, and as such, welcome any feedback on the model.
Copyright Inception Institute of Artificial Intelligence Ltd.
JAIS is made available under the Apache License, Version 2.0 (the “License”). You shall not use JAIS except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, JAIS is distributed on an “AS IS” basis, without warranties or conditions of any kind, either express or implied. Please see the License for the specific language governing permissions and limitations under the License.