5 Best Open Source LLMs (February 2026)

Open source AI has caught up to closed-source systems. These five large language models (LLMs) deliver enterprise-grade performance without the recurring API costs or vendor lock-in. Each handles different use cases, from on-device reasoning to multilingual support at scale.
This guide breaks down GPT-OSS-120B, DeepSeek-R1, Qwen3-235B, LLaMA 4, and Mixtral-8x22B with specific details on capabilities, costs, and deployment requirements.
Quick Comparison
| Tool | Best For | Starting Price | Key Feature |
|---|---|---|---|
| GPT-OSS-120B | Single-GPU deployment | Free (Apache 2.0) | Runs on 80GB GPU with 120B parameters |
| DeepSeek-R1 | Complex reasoning tasks | Free (MIT) | 671B parameters with transparent thinking |
| Qwen3-235B | Multilingual applications | Free (Apache 2.0) | Supports 119+ languages with hybrid thinking |
| LLaMA 4 | Multimodal processing | Free (custom license) | 10M token context window |
| Mixtral-8x22B | Cost-efficient production | Free (Apache 2.0) | 75% compute savings vs dense models |
1. GPT-OSS-120B
In August 2025, OpenAI released its first open-weight models since GPT-2. GPT-OSS-120B uses a mixture-of-experts architecture with 117 billion total parameters but only 5.1 billion active per token. This sparse design means you can run it on a single 80GB GPU instead of requiring multi-GPU clusters.
The model matches o4-mini performance on core benchmarks. It hits 90% accuracy on MMLU tests and around 80% on GPQA reasoning tasks. Code generation sits at 62% pass@1, competitive with closed-source alternatives. The 128,000-token context window handles comprehensive document analysis without chunking.
OpenAI trained these models using techniques from o3 and other frontier systems. The focus was practical deployment over raw scale. They open-sourced the o200k_harmony tokenizer alongside the models, standardizing how inputs get processed across implementations.
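A minimal deployment sketch, assuming the weights are published on the Hugging Face Hub under an ID like openai/gpt-oss-120b (verify the exact repo name and hardware guidance on the model card before running). It uses the standard transformers text-generation pipeline and expects roughly 80GB of VRAM.

```python
# Minimal sketch: loading GPT-OSS-120B through the transformers text-generation
# pipeline. The repo ID below is an assumption; check the model card for the
# exact name, license notes, and memory requirements (~80GB VRAM, A100/H100 class).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-120b",  # assumed Hub repo ID
    torch_dtype="auto",           # keep the checkpoint's native precision
    device_map="auto",            # place weights on the available GPU(s)
)

messages = [
    {"role": "user", "content": "Summarize the tradeoffs of mixture-of-experts models."},
]
result = generator(messages, max_new_tokens=256)
# With chat-style input, recent transformers versions return the conversation
# with the assistant reply appended as the last message.
print(result[0]["generated_text"][-1]["content"])
```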
Pros and Cons
- Single 80GB GPU deployment eliminates multi-GPU infrastructure costs
- Native 128K context window processes entire codebases or long documents
- Apache 2.0 license allows unrestricted commercial use and modification
- Reference implementations in PyTorch, Triton, and Metal simplify integration
- 90% MMLU accuracy matches proprietary models on reasoning benchmarks
- English-focused training limits multilingual capabilities compared to alternatives
- 5.1B active parameters may underperform dense models on specialized tasks
- 80GB minimum VRAM requirement rules out consumer-grade GPU deployment
- No distilled variants available yet for resource-constrained environments
- Limited domain specialization compared to fine-tuned alternatives
Pricing: GPT-OSS-120B operates under Apache 2.0 licensing with zero recurring costs. You need a GPU with at least 80GB of VRAM (NVIDIA A100 or H100). Cloud deployment on AWS, Azure, or GCP costs approximately $3-5 per hour for appropriate instance types. Self-hosted deployment requires a one-time GPU purchase (~$10,000-15,000 for a used A100).
No subscription fees. No API limits. No vendor lock-in.
2. DeepSeek-R1
DeepSeek built R1 specifically for transparent reasoning. The architecture uses 671 billion total parameters with 37 billion activated per forward pass. Training emphasized reinforcement learning without traditional supervised fine-tuning first, letting reasoning patterns emerge naturally from the RL process.
The model achieves 97% accuracy on MATH-500 evaluations and matches OpenAI’s o1 on complex reasoning tasks. What separates DeepSeek-R1 is that you can observe its thinking process. The model shows step-by-step logic instead of just final answers. This transparency matters for applications where you need to verify reasoning, like financial analysis or engineering verification.
DeepSeek released six distilled versions alongside the main model. These range from 1.5B to 70B parameters, running on hardware from high-end consumer GPUs to edge devices. The Qwen-32B distill outperforms o1-mini across benchmarks while requiring a fraction of the compute.
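A minimal sketch of running one of the distilled variants and separating the visible reasoning trace from the final answer. The repo ID is an assumption to verify on the Hugging Face Hub; R1-style models emit their chain of thought between <think> tags before answering.

```python
# Minimal sketch: a distilled DeepSeek-R1 model with the reasoning trace split
# out from the final answer. The repo ID is an assumption; verify it on the Hub.
# The 7B distill fits on a single 24GB consumer GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "A train covers 180 km in 2.5 hours. What is its average speed?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(inputs, max_new_tokens=1024)
text = tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

# R1-style models emit step-by-step reasoning between <think> tags before the answer.
thinking, _, answer = text.partition("</think>")
print("Reasoning trace:\n", thinking.replace("<think>", "").strip())
print("Final answer:\n", answer.strip())
```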
Pros and Cons
- 97% MATH-500 accuracy leads open-source models on mathematical reasoning
- Transparent thinking process enables verification and debugging
- 671B parameter scale provides deep analytical capabilities
- Six distilled variants enable deployment across hardware configurations
- MIT license permits unrestricted commercial use
- 671B parameters require substantial infrastructure for full model deployment
- Reasoning mode increases latency compared to direct answer generation
- English-optimized training limits performance in other languages
- Reinforcement learning approach can produce verbose explanations
- Community tooling still maturing compared to more established models
Pricing: DeepSeek-R1 is released under the MIT license with no usage fees. The full 671B model requires at least 8x A100 GPUs (cloud cost: ~$25-30/hour). Distilled models run significantly cheaper: the 32B variant needs a single A100 (~$3-5/hour cloud, ~$10,000 hardware purchase). The 7B version runs on consumer RTX 4090 GPUs.
DeepSeek provides free API access with rate limits for testing. Production deployment requires self-hosting or cloud infrastructure.
3. Qwen3-235B
Alibaba’s Qwen3-235B brings hybrid thinking to open-source models. Users toggle the model’s reasoning mode per request based on task complexity. Need quick customer service responses? Standard mode delivers fast answers. Running complex data analysis? Thinking mode applies methodical, step-by-step reasoning.
The architecture uses 235 billion total parameters with 22 billion activated across 94 layers. Each layer contains 128 experts with 8 activated per token. This expert selection enables efficient processing while maintaining capability. The model was trained on roughly 36 trillion tokens spanning 119 languages and dialects, representing 10x more multilingual data than previous Qwen versions.
Performance sits at 87-88% MMLU accuracy with strong multilingual benchmarks. The model excels on C-Eval and region-specific assessments across Asia, Europe, and other markets. Code generation hits 37% zero-shot but improves significantly when activating thinking mode for complex programming tasks.
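A minimal sketch of toggling thinking mode per request, following the usage pattern shown in Qwen's model cards. The repo ID and the enable_thinking template flag are assumptions to confirm against current documentation, and a smaller Qwen3 variant is used so the example fits on one GPU.

```python
# Minimal sketch: switching Qwen3 between fast and deliberate responses per request.
# The repo ID and the `enable_thinking` chat-template flag follow Qwen's published
# examples but should be treated as assumptions and verified against the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-8B"  # assumed repo ID; the 235B MoE variant follows the same interface
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

def ask(question: str, thinking: bool) -> str:
    """Render the chat with or without reasoning mode, then generate a reply."""
    messages = [{"role": "user", "content": question}]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=thinking,  # assumed flag: False = fast answers, True = step-by-step reasoning
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512)
    return tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)

print(ask("What are your store hours?", thinking=False))                              # low-latency path
print(ask("Outline a migration plan for a 2TB PostgreSQL database.", thinking=True))  # deliberate path
```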
Pros and Cons
- 119+ language support enables global deployment without language barriers
- Hybrid thinking control optimizes cost-performance tradeoffs per request
- 128K token context handles extensive document analysis
- Apache 2.0 license permits commercial modification
- 87% MMLU performance competes with leading proprietary systems
- 235B parameters require multi-GPU setup for production deployment
- 37% code generation baseline trails specialized coding models
- Thinking mode selection adds complexity to application logic
- Performance skews stronger on Chinese than on other languages, reflecting training-data bias
- Limited community tooling compared to LLaMA ecosystem
Pricing: Qwen3-235B uses Apache 2.0 licensing without fees. Full model requires 4-8 A100 GPUs depending on quantization (cloud: ~$15-30/hour). Alibaba Cloud offers managed endpoints with pay-per-token pricing starting at $0.002/1K tokens for thinking mode, $0.0003/1K for standard mode.
Smaller Qwen3 variants (7B, 14B, 72B) run on consumer hardware. The 7B model works on 24GB consumer GPUs.
4. LLaMA 4
Meta’s LLaMA 4 introduces native multimodal capabilities across text, images, and short video. The Scout variant packs 109 billion total parameters with 17 billion active, while Maverick uses a larger expert pool for specialized tasks. Both process multiple content types through early fusion techniques that integrate modalities into unified representations.
Context handling reached new levels. LLaMA 4 Scout supports up to 10 million tokens for extensive document analysis applications. Standard context sits at 128K tokens, already substantial for most use cases. The models were pre-trained on 30+ trillion tokens, double the LLaMA 3 training mixture.
Meta’s published benchmarks show LLaMA 4 outperforming GPT-4o and Gemini 2.0 Flash across coding, reasoning, and multilingual tests. Meta also developed MetaP, a technique for reliably setting hyperparameters such as per-layer learning rates and initialization scales, with choices that transfer across model sizes and training budgets.
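A hedged sketch of multimodal inference through the transformers image-text-to-text pipeline. The repo ID, the message schema, and the output structure are assumptions; the checkpoint is license-gated, so confirm access and exact identifiers on Meta's model card before running.

```python
# Hedged sketch: text + image input to a LLaMA 4 variant via the transformers
# image-text-to-text pipeline. Repo ID, message format, and output structure are
# assumptions -- verify against the model card; the weights are license-gated.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # assumed repo ID
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/chart.png"},  # placeholder image URL
            {"type": "text", "text": "Summarize the trend shown in this chart."},
        ],
    }
]
result = pipe(text=messages, max_new_tokens=256)
print(result[0]["generated_text"])  # output structure can vary slightly across transformers versions
```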
Pros and Cons
- 10M token context window enables processing entire codebases or datasets
- Native multimodal processing handles text, image, and video inputs
- 30T token training provides comprehensive knowledge coverage
- Multiple size variants from edge deployment to datacenter scale
- Outperforms GPT-4o on coding and reasoning benchmarks
- Custom commercial license requires review for large-scale deployments
- Multimodal fusion adds complexity to deployment pipelines
- 10M context requires substantial memory even with optimizations
- Model size variations create confusion about which variant to use
- Documentation still emerging for newest features
Pricing: LLaMA 4 uses Meta’s custom commercial license (free for most uses, restrictions on services with 700M+ users). Scout variant requires 2-4 H100 GPUs (cloud: ~$10-20/hour). Maverick needs 4-8 H100s (~$20-40/hour). Meta provides free API access through their platform with rate limits.
Smaller models from earlier LLaMA generations run on consumer hardware; LLaMA 3.1 8B works on 16GB GPUs. Enterprise deployments can negotiate direct licensing with Meta.
5. Mixtral-8x22B
Mistral AI’s Mixtral-8x22B achieves 75% computational savings versus equivalent dense models. The mixture-of-experts design contains eight 22-billion-parameter experts totaling 141 billion parameters, but only 39 billion activate during inference. This sparse activation delivers superior performance while running faster than dense 70B models.
The model supports native function calling for sophisticated application development. You can connect natural language interfaces directly to APIs and software systems without custom integration layers. The 64,000-token context window handles extended conversations and comprehensive document analysis.
Multilingual performance stands out across English, French, Italian, German, and Spanish. Mistral trained specifically on European languages, resulting in stronger performance than models with broader but shallower language coverage. Mathematical reasoning hits 90.8% on GSM8K and coding achieves strong results on HumanEval and MBPP benchmarks.
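A minimal sketch of the function-calling flow using the transformers chat-template tool support: you pass a typed, documented Python function and the template embeds its JSON schema in the prompt. The repo ID and the presence of tool formatting in the hosted template are assumptions to check against the model card; Mistral's managed API is the other common route.

```python
# Minimal sketch: exposing a tool schema to Mixtral-8x22B via the transformers
# chat-template tool support. The repo ID is assumed, and whether the hosted chat
# template includes tool formatting should be verified against the model card.
from transformers import AutoTokenizer

def get_exchange_rate(base: str, quote: str) -> float:
    """Return the current exchange rate between two currencies.

    Args:
        base: ISO code of the base currency, e.g. "EUR".
        quote: ISO code of the quote currency, e.g. "USD".
    """
    ...  # hypothetical tool body; the model only ever sees the schema above

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x22B-Instruct-v0.1")  # assumed repo ID

messages = [{"role": "user", "content": "How many US dollars is 250 euros?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_exchange_rate],   # transformers converts the signature + docstring into a JSON schema
    add_generation_prompt=True,
    tokenize=False,
)
# The rendered prompt now embeds the tool definition; generating from it should
# yield a structured tool call to parse, execute, and feed back as a tool message.
print(prompt)
```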
Pros and Cons
- 75% compute reduction versus dense models lowers infrastructure costs
- Native function calling simplifies API integration
- Strong European language support for multilingual applications
- 90.8% GSM8K accuracy delivers solid mathematical reasoning
- Apache 2.0 license permits unrestricted commercial use
- 64K context shorter than competitors offering 128K+ windows
- European language focus means weaker performance on Asian languages
- 39B active parameters may limit capability on complex reasoning tasks
- Expert routing logic adds deployment complexity
- Smaller community compared to LLaMA ecosystem
Pricing: Mixtral-8x22B operates under Apache 2.0 licensing with no fees. Requires 2-4 A100 GPUs for production (cloud: ~$10-15/hour). Mistral offers managed API access at $2 per million tokens for input, $6 per million for output. Self-hosting eliminates per-token costs after initial hardware investment.
Quantized versions run on single A100 with acceptable performance degradation. The model’s efficiency makes it cost-effective for high-volume production workloads.
Which Model Should You Choose?
Your hardware dictates immediate options. GPT-OSS-120B fits single 80GB GPUs, making it accessible if you’re already running A100 infrastructure. DeepSeek-R1’s distilled variants handle resource constraints—the 7B model runs on consumer hardware while maintaining strong reasoning.
Multilingual requirements point toward Qwen3-235B for broad language coverage or Mixtral-8x22B for European languages specifically. LLaMA 4 makes sense when you need multimodal capabilities or extended context windows beyond 128K tokens.
Cost-conscious deployments favor Mixtral-8x22B for production workloads. The 75% compute savings compound quickly at scale. Research and development benefit from DeepSeek-R1’s transparent reasoning, especially when you need to verify decision logic.
All five models operate under permissive licenses. No recurring API costs. No vendor dependencies. You control deployment, data privacy, and model modifications. The open-source AI landscape has reached parity with closed systems. These tools deliver enterprise capabilities without enterprise restrictions.
FAQs
What hardware do I need to run these open source LLMs?
Minimum requirements vary by model. GPT-OSS-120B needs a single 80GB GPU (A100 or H100). DeepSeek-R1’s full version requires 8x A100s, but distilled variants run on consumer RTX 4090s. Qwen3-235B and LLaMA 4 require 2-8 GPUs depending on quantization. Mixtral-8x22B runs efficiently on 2-4 A100s. Cloud deployment costs $3-40/hour based on model size.
Can these models match GPT-4 or Claude performance?
Yes, on specific benchmarks. DeepSeek-R1 matches OpenAI o1 on reasoning tasks with 97% MATH-500 accuracy. LLaMA 4 outperforms GPT-4o on coding benchmarks. GPT-OSS-120B achieves 90% MMLU accuracy, comparable to proprietary systems. However, closed-source models may excel in specialized areas like creative writing or nuanced conversation.
Which model handles multiple languages best?
Qwen3-235B supports 119+ languages with roughly 10x more multilingual training data than earlier Qwen releases. It excels on Asian language benchmarks and cultural knowledge tests. Mixtral-8x22B leads for European languages (French, German, Spanish, Italian) with specialized training. Other models provide varying multilingual support but optimize primarily for English.
Are there usage costs beyond hardware?
No recurring fees for self-hosted deployments under Apache 2.0 or MIT licenses. LLaMA 4 uses a custom commercial license that’s free for most uses (restrictions apply to services with 700M+ users). Cloud hosting costs vary by provider and instance type. Managed API access from providers like Mistral starts at $2 per million input tokens.
What’s the difference between mixture-of-experts and dense models?
Mixture-of-experts architectures activate only a subset of parameters per input, achieving efficiency without sacrificing capability. GPT-OSS-120B uses 5.1B of 117B parameters per token. Dense models activate all parameters for every input. MoE models deliver 70-75% compute savings while matching or exceeding dense model performance at similar scales.
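A rough, self-contained calculation of the sparsity behind those numbers, using the parameter counts quoted in this article. Per-token compute scales roughly with active parameters; real-world savings are lower once memory footprint and expert routing overhead are factored in.

```python
# Back-of-the-envelope sparsity check using the parameter counts cited above.
# Per-token compute tracks active parameters, but all experts still occupy memory.
models = {
    "GPT-OSS-120B": (117e9, 5.1e9),
    "DeepSeek-R1": (671e9, 37e9),
    "Qwen3-235B": (235e9, 22e9),
    "Mixtral-8x22B": (141e9, 39e9),
}

for name, (total, active) in models.items():
    share = active / total
    print(f"{name}: {active / 1e9:.1f}B of {total / 1e9:.0f}B parameters active per token ({share:.0%})")
```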













