Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), in collaboration with Petuum and LLM360, announces the launch of K2-65B, a groundbreaking open-source 65-billion-parameter large language model (LLM). K2 sets new standards in transparency and performance for open-source artificial intelligence (AI) by providing a blueprint for documenting and studying the full lifecycle of LLMs, including all reproduction details.
K2 follows LLM360’s LLM release framework, which enables the community-led pursuit of the artificial general intelligence (AGI) vision through peer-reviewed, transparent, reproducible, and collaborative open-source research and development of LLMs. The model is available globally and freely under the Apache 2.0 license. K2-65B is the only third-party-reproducible LLM to outperform major private-sector state-of-the-art LLMs, such as Llama 2 70B.
K2 was trained on 1.4 trillion tokens using 480 A100 GPUs on NVIDIA’s DGX Cloud, consuming 35% fewer resources than Llama 2 70B and making it one of the world’s most sustainable models in its class. In certain areas of strategic interest, such as mathematical and logical reasoning, K2 is competitive with much larger models such as GPT-4.
LLMs have become one of the most important tools in natural language processing (NLP), enabling computers to understand and generate text in a way that is similar to how humans communicate. UAE companies have made important progress in this field in recent years, notably with Jais, the world’s most advanced Arabic LLM, developed in partnership by Core42, MBZUAI, and Cerebras Systems, as the nation continues to pursue a strategy to build its reputation as a key AI destination.
MBZUAI President and University Professor, Eric Xing, who supported the development of K2, as well as other large-scale AI systems under the university’s Institute of Foundation Models (IFM), said: “The launch of K2-65B demonstrates the UAE’s growing prowess in superior LLM development. The model epitomizes the importance of embracing an open and collaborative approach to create LLMs with unmatched performance and efficiency, and the power to bring transformation to all sectors and organizations.”
Trained in two stages, K2-65B underwent rigorous evaluation through 22 multidisciplinary assessments, ensuring comprehensive coverage across domains including math, coding, and medicine, among others. The new model surpassed Llama 2 70B in each of these areas.
Even in competitive arenas like the Open LLM Leaderboard, K2-65B demonstrated high-quality performance metrics. The chat model, K2-Chat, outperforms Llama 2 70B Chat in every aspect of evaluation, underscoring its capabilities in understanding and generating human-like responses across diverse scenarios.
Petuum’s Head of Engineering and the lead developer of the model, Hector Liu, said: “Providing the blueprint for reproducing K2 is paramount to advancing the world’s knowledge, research ability, and development options for large language models. Releasing a model of this size and quality, along with reproduction steps, will have reverberating positive effects for the open-source ecosystem as the community engages with the model and benefits from our learnings, similar to what we have seen with prior LLM360 models, such as Amber and CrystalCoder.”
What sets K2-65B apart from existing LLMs is its full transparency, enabled through the LLM360 Pretraining and Developer Suites. Equipped with detailed training guides, intermediate checkpoints, and evaluation results, K2-65B ensures reproducibility and auditability throughout its development process.
Using less computing power than comparable LLMs, K2-65B supports greater operational efficiency and reduced energy consumption, enabling users around the world to adhere to sustainable computing practices.
In addition, the LLM360 Research Suite offers a wealth of artifacts to study training dynamics, providing researchers and developers with invaluable resources for further exploration.
Looking ahead, the developers plan to incorporate image understanding capabilities and to continue development and evaluation initiatives aimed at further enhancing the model’s performance and versatility.