Complete Beginner's Guide to Hugging Face LLM Tools - Unite.AI

Complete Beginner’s Guide to Hugging Face LLM Tools


Hugging Face is an AI research lab and hub that has built a community of scholars, researchers, and enthusiasts. In a short time, it has become a major presence in the AI space. Tech giants including Google, Amazon, and Nvidia have invested in the startup, bringing its valuation to $4.5 billion.

In this guide, we'll introduce transformers, LLMs, and how the Hugging Face library plays an important role in fostering an open-source AI community. We'll also walk through the essential features of Hugging Face, including pipelines, datasets, models, and more, with hands-on Python examples.

Transformers in NLP

In 2017, researchers at Google published an influential paper, "Attention Is All You Need," that introduced transformers. These are deep learning models used in NLP. This architecture fueled the development of large language models like ChatGPT.

Large language models, or LLMs, are AI systems that use transformers to understand and create human-like text. However, creating these models is expensive, often requiring millions of dollars, which puts them out of reach for all but large companies.

Hugging Face, started in 2016, aims to make NLP models accessible to everyone. Despite being a commercial company, it offers a range of open-source resources that help people and organizations build and use transformer models affordably. Machine learning is about teaching computers to perform tasks by recognizing patterns, while deep learning, a subset of machine learning, uses multi-layer neural networks that learn useful representations directly from data. Transformers are a deep learning architecture that processes all positions of an input sequence in parallel using attention. This shortens training time compared with earlier sequential models and makes transformers a popular choice for building large language models.

How Hugging Face Facilitates NLP and LLM Projects


Hugging Face has made working with LLMs simpler by offering:

  1. A range of pre-trained models to choose from.
  2. Tools and examples to fine-tune these models to your specific needs.
  3. Easy deployment options for various environments.

A great resource available through Hugging Face is the Open LLM Leaderboard. It tracks, ranks, and evaluates open large language models (LLMs) and chatbots, giving a clear view of progress in the open-source domain.

The leaderboard evaluates models on four benchmarks:

  • AI2 Reasoning Challenge (25-shot): a series of questions based on an elementary science syllabus.
  • HellaSwag (10-shot): a commonsense inference test that is simple for humans but a significant challenge for cutting-edge models.
  • MMLU (5-shot): a multifaceted evaluation of a text model's proficiency across 57 diverse domains, including basic math, law, and computer science.
  • TruthfulQA (0-shot): a test of a model's tendency to echo frequently encountered online misinformation.

The benchmarks, which are described using terms such as “25-shot”, “10-shot”, “5-shot”, and “0-shot”, indicate the number of prompt examples that a model is given during the evaluation process to gauge its performance and reasoning abilities in various domains. In “few-shot” paradigms, models are provided with a small number of examples to help guide their responses, whereas in a “0-shot” setting, models receive no examples and must rely solely on their pre-existing knowledge to respond appropriately.
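
To make the "n-shot" idea concrete, here is a minimal sketch of how a few-shot prompt might be assembled. The function name, the prompt layout, and the example questions are all illustrative assumptions, not the leaderboard's actual evaluation harness.

```python
def build_few_shot_prompt(examples, question, n_shots):
    """Build a prompt that shows `n_shots` solved examples before the real question."""
    blocks = []
    for example_question, example_answer in examples[:n_shots]:
        blocks.append(f"Question: {example_question}\nAnswer: {example_answer}")
    # The final block leaves the answer blank for the model to complete.
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


# Hypothetical demonstration pairs, not taken from any real benchmark.
examples = [
    ("What gas do plants absorb from the air?", "Carbon dioxide"),
    ("What force pulls objects toward the ground?", "Gravity"),
]
target = "What is the boiling point of water in Celsius?"

zero_shot = build_few_shot_prompt(examples, target, n_shots=0)
two_shot = build_few_shot_prompt(examples, target, n_shots=2)
print(two_shot)
```

With `n_shots=0` the prompt contains only the target question, which corresponds to the 0-shot setting described above.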

Components of Hugging Face


Pipelines are a feature of Hugging Face's transformers library that makes it easy to use the pre-trained models available in the Hugging Face repository. They provide an intuitive API for an array of tasks, including sentiment analysis, question answering, masked language modeling, named entity recognition, and summarization.

Pipelines integrate three central Hugging Face components:

  1. Tokenizer: Prepares your text for the model by converting it into a format the model can understand.
  2. Model: This is the heart of the pipeline where the actual predictions are made based on the preprocessed input.
  3. Post-processor: Transforms the model’s raw predictions into a human-readable form.

These pipelines not only reduce extensive coding but also offer a user-friendly interface to accomplish various NLP tasks.
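
The three stages can be illustrated with a toy sketch in plain Python. Everything here (the vocabulary, the weights, and the function names) is a made-up stand-in meant only to show how the stages hand data to one another. It is not how the transformers library is implemented.

```python
import math

# Toy stand-ins for the three stages; none of this is the transformers implementation.
VOCAB = {"great": 0, "love": 1, "bad": 2, "terrible": 3}
# Hypothetical per-token contributions to the [NEGATIVE, POSITIVE] logits.
WEIGHTS = [(-1.0, 2.0), (-1.0, 2.0), (2.0, -1.0), (2.0, -1.0)]
LABELS = ["NEGATIVE", "POSITIVE"]


def tokenize(text):
    """Stage 1: turn raw text into the integer IDs the model expects."""
    words = text.lower().replace(".", " ").replace(",", " ").split()
    return [VOCAB[word] for word in words if word in VOCAB]


def model(token_ids):
    """Stage 2: produce raw scores (logits), one per label."""
    logits = [0.0, 0.0]
    for token_id in token_ids:
        negative, positive = WEIGHTS[token_id]
        logits[0] += negative
        logits[1] += positive
    return logits


def postprocess(logits):
    """Stage 3: convert logits to probabilities and pick a readable label."""
    exps = [math.exp(value - max(logits)) for value in logits]
    probs = [value / sum(exps) for value in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": round(probs[best], 3)}


def toy_pipeline(text):
    return postprocess(model(tokenize(text)))


print(toy_pipeline("I love this, it is great."))
```

The returned dictionary has the same label and score shape that the real text-classification pipeline produces, which is why the stages compose so cleanly.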

Transformer Applications using the Hugging Face library

The Transformers library simplifies NLP tasks by connecting a model with its necessary pre- and post-processing stages. To get started, install the library from a shell and then import the pipeline function in Python:

pip install -q transformers  # run in a shell, or prefix with "!" in a notebook cell
from transformers import pipeline  # run in Python

Having done that, you can execute NLP tasks starting with sentiment analysis, which categorizes text into positive or negative sentiments. The library's powerful pipeline() function serves as a hub encompassing other pipelines and facilitating task-specific applications in audio, vision, and multimodal domains.

Practical Applications

Text Classification

Text classification becomes a breeze with Hugging Face's pipeline() function. Here's how you can initiate a text classification pipeline:

classifier = pipeline("text-classification")

For a hands-on experience, feed a string or list of strings into your pipeline to obtain predictions. The Python snippet below prints the predicted label and confidence score for each sentence:

sentences = ["I am thrilled to introduce you to the wonderful world of AI.",
             "Hopefully, it won't disappoint you."]
# Get classification results for each sentence in the list
results = classifier(sentences)
# Loop through each result and print the label and score
for i, result in enumerate(results):
    print(f"Result {i + 1}:")
    print(f" Label: {result['label']}")
    print(f" Score: {round(result['score'], 3)}\n")


Result 1: 
Score: 1.0 
Result 2: 
Score: 0.996 

Named Entity Recognition (NER)

NER extracts real-world objects, called named entities, from text, such as people, organizations, and locations. Use the NER pipeline to identify these entities:

ner_tagger = pipeline("ner", aggregation_strategy="simple")
text = "Elon Musk is the CEO of SpaceX."
outputs = ner_tagger(text)


 Elon Musk: PER, SpaceX: ORG 
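
With aggregation_strategy="simple", the NER pipeline returns a list of dictionaries with keys such as entity_group, word, score, start, and end. The sketch below shows one way the summary line above could be produced from that list. The sample list is hand-written to mirror that structure (the scores are placeholders, not real model output), and summarize_entities is a hypothetical helper.

```python
def summarize_entities(outputs):
    """Render aggregated NER results as 'word: TYPE' pairs."""
    return ", ".join(f"{entity['word']}: {entity['entity_group']}" for entity in outputs)


text = "Elon Musk is the CEO of SpaceX."

# Hand-written stand-in for `ner_tagger(text)`; the scores are placeholders,
# not real model output.
sample_outputs = [
    {"entity_group": "PER", "score": 0.99, "word": "Elon Musk", "start": 0, "end": 9},
    {"entity_group": "ORG", "score": 0.99, "word": "SpaceX", "start": 24, "end": 30},
]

# The character offsets point back into the original text.
for entity in sample_outputs:
    assert text[entity["start"]:entity["end"]] == entity["word"]

print(summarize_entities(sample_outputs))
# prints: Elon Musk: PER, SpaceX: ORG
```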

Question Answering

Question answering involves extracting precise answers to specific questions from a given context. Initialize a question-answering pipeline and input your question and context to get the desired answer:

reader = pipeline("question-answering")
text = "Hugging Face is a company creating tools for NLP. It is based in New York and was founded in 2016."
question = "Where is Hugging Face based?"
outputs = reader(question=question, context=text)


 {'score': 0.998, 'start': 65, 'end': 73, 'answer': 'New York'}

Hugging Face's pipeline function offers an array of pre-built pipelines for different tasks, aside from text classification, NER, and question answering. Below are details on a subset of available tasks:

Table: Hugging Face Pipeline Tasks

Task | Description | Pipeline Identifier
--- | --- | ---
Text Generation | Generate text based on a given prompt | pipeline(task="text-generation")
Summarization | Summarize a lengthy text or document | pipeline(task="summarization")
Image Classification | Label an input image | pipeline(task="image-classification")
Audio Classification | Categorize audio data | pipeline(task="audio-classification")
Visual Question Answering | Answer a query using both an image and a question | pipeline(task="vqa")


For detailed descriptions and more tasks, refer to the pipeline documentation on Hugging Face's website.

Why Hugging Face is shifting its focus to Rust


The Hugging Face (HF) ecosystem has started using Rust in libraries such as safetensors and tokenizers.

Hugging Face has very recently also released a new machine-learning framework called Candle. Unlike traditional frameworks that use Python, Candle is built with Rust. The goal behind using Rust is to enhance performance and simplify the user experience while supporting GPU operations.

The key objective of Candle is to facilitate serverless inference, making the deployment of lightweight binaries possible and removing Python from the production workloads, which can sometimes slow down processes due to its overheads. This framework comes as a solution to overcome the issues encountered with full machine learning frameworks like PyTorch that are large and slow when creating instances on a cluster.

Let's explore why Rust is becoming a favored choice over Python for these components.

  1. Speed and Performance: Rust compiles to native code and is known for its speed, outperforming Python, which is traditionally used in machine learning frameworks. Python's Global Interpreter Lock (GIL) can also limit multi-threaded execution, whereas Rust does not face this issue, allowing faster execution in projects where it is used.
  2. Safety: Rust provides memory safety guarantees without a garbage collector, which is essential for the safety of concurrent systems. This plays a crucial role in areas like safetensors, where safe handling of data structures is a priority.


Safetensors benefits from Rust's speed and safety features. It is a format and library for storing and loading tensors, the multi-dimensional arrays that hold model weights. Building it on Rust keeps these operations fast and secure, avoiding common bugs and security issues that could arise from memory mishandling.
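
As a rough illustration of why such a format can be loaded safely, the sketch below writes and reads a simplified safetensors-style layout in pure Python: an 8-byte little-endian header length, a JSON header describing each tensor, and then the raw bytes. This reflects our understanding of the published layout, but it is a simplification (float32 only, no metadata or alignment handling), and pack_tensors and unpack_tensors are hypothetical helpers, not the library's API.

```python
import json
import struct


def pack_tensors(tensors):
    """Serialize {name: (shape, [float, ...])} into a safetensors-style byte string."""
    header, payload = {}, b""
    for name, (shape, values) in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)  # little-endian float32
        header[name] = {
            "dtype": "F32",
            "shape": shape,
            "data_offsets": [len(payload), len(payload) + len(raw)],
        }
        payload += raw
    header_bytes = json.dumps(header).encode("utf-8")
    # 8-byte little-endian header length, then the JSON header, then raw data.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + payload


def unpack_tensors(blob):
    """Read the layout back using only bounds checks and JSON parsing."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    data = blob[8 + header_len:]
    tensors = {}
    for name, info in header.items():
        begin, end = info["data_offsets"]
        if not 0 <= begin <= end <= len(data):
            raise ValueError(f"offsets out of range for {name}")
        count = (end - begin) // 4
        tensors[name] = (info["shape"], list(struct.unpack(f"<{count}f", data[begin:end])))
    return tensors


blob = pack_tensors({"weight": ([2, 2], [1.0, 2.0, 3.0, 4.0])})
print(unpack_tensors(blob))
```

Reading the file involves no execution of stored code, only parsing a header and slicing bytes, which is the property the format is designed around.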


Tokenizers break sentences or phrases into smaller units, such as words or subwords. Rust speeds up this process, so that tokenization is not just accurate but also swift, improving the efficiency of natural language processing tasks.

At the core of Hugging Face's tokenizer is the concept of subword tokenization, striking a delicate balance between word and character-level tokenization to optimize information retention and vocabulary size. It functions through the creation of subtokens, such as “##ing” and “##ed”, retaining semantic richness while avoiding a bloated vocabulary.

Subword tokenization involves a training phase to identify the most efficacious balance between character and word-level tokenization. It goes beyond mere prefix and suffix rules, requiring a comprehensive analysis of language patterns in extensive text corpora to design an efficient subword tokenizer. The generated tokenizer is adept at handling novel words by breaking them down into known subwords, maintaining a high level of semantic understanding.
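
The way a trained subword tokenizer handles a word can be sketched with greedy longest-match-first splitting, the approach commonly associated with WordPiece-style tokenizers such as BERT's. The vocabulary below is hand-picked for illustration, and the training phase that would normally learn it is not shown.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into the longest subwords found in `vocab`, left to right."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it matches a vocabulary entry.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a continuation of the word
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk_token]  # no known subword covers this position
        tokens.append(piece)
        start = end
    return tokens


# Tiny hand-picked vocabulary; a real one is learned from a large corpus.
vocab = {"token", "play", "jump", "##ing", "##ed", "##ization", "##s"}

print(wordpiece_tokenize("playing", vocab))       # ['play', '##ing']
print(wordpiece_tokenize("tokenization", vocab))  # ['token', '##ization']
```

A word never seen as a whole, such as "tokenization" here, is still represented by known pieces, which is how subword tokenizers cope with novel words.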

Tokenization Components

The tokenizers library divides the tokenization process into several steps, each addressing a distinct facet of tokenization. Let's delve into these components:

  • Normalizer: Applies initial transformations to the input string, such as lowercase conversion, Unicode normalization, and stripping.
  • PreTokenizer: Splits the input string into pre-segments based on predefined rules, such as splitting on spaces.
  • Model: Oversees the discovery and creation of subtokens, adapting to the specifics of your input data and offering training capabilities.
  • Post-Processor: Adds the special tokens, such as [CLS] and [SEP], that many transformer-based models like BERT expect.
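
The four steps above can be sketched as a chain of small functions. This is a plain-Python illustration with a hypothetical whole-word vocabulary. It does not use the tokenizers library, whose components are more capable than these stand-ins.

```python
import unicodedata

# Hypothetical whole-word vocabulary; real models learn subword vocabularies.
VOCAB = {"[UNK]": 0, "[CLS]": 1, "[SEP]": 2, "hello": 3, "world": 4, "cafe": 5}


def normalize(text):
    """Normalizer: lowercase, strip accents via Unicode NFD, trim whitespace."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    without_accents = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return without_accents.strip()


def pre_tokenize(text):
    """PreTokenizer: split on whitespace."""
    return text.split()


def model(pieces):
    """Model: map each piece to a vocabulary entry, falling back to [UNK]."""
    return [piece if piece in VOCAB else "[UNK]" for piece in pieces]


def post_process(tokens):
    """Post-Processor: wrap the sequence in BERT-style special tokens."""
    return ["[CLS]"] + tokens + ["[SEP]"]


def encode(text):
    tokens = post_process(model(pre_tokenize(normalize(text))))
    return tokens, [VOCAB[token] for token in tokens]


tokens, ids = encode("  Hello  Café world ")
print(tokens)  # ['[CLS]', 'hello', 'cafe', 'world', '[SEP]']
print(ids)     # [1, 3, 5, 4, 2]
```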

To get started with Hugging Face tokenizers, install the library using the command pip install tokenizers and import it into your Python environment. The library can tokenize large amounts of text in very little time, thereby saving precious computational resources for more intensive tasks like model training.

The tokenizers library is written in Rust, a language whose syntax resembles C++ while introducing novel concepts in programming language design. Coupled with Python bindings, it lets you enjoy the performance of a lower-level language while working in a Python environment.


Datasets are the bedrock of AI projects. Hugging Face offers a wide variety of datasets, suitable for a range of NLP tasks, and more. To utilize them efficiently, understanding the process of loading and analyzing them is essential. Below is a well-commented Python script demonstrating how to explore datasets available on Hugging Face:

from datasets import load_dataset
# Load the SQuAD question-answering dataset
dataset = load_dataset('squad')
# Display the first entry of the training split
print(dataset['train'][0])
This script uses the load_dataset function to load the SQuAD dataset, which is a popular choice for question-answering tasks.
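
Each SQuAD example is a dictionary with id, title, context, question, and answers fields, where answers holds parallel lists of answer text and character offsets. The sketch below uses a hand-written record in that shape (the values are illustrative, not copied from the dataset) to show how the offsets relate to the context. answer_spans is a hypothetical helper.

```python
# Hand-written record in the shape of a SQuAD entry; the field values
# (including the id) are illustrative, not copied from the dataset.
record = {
    "id": "example-0001",
    "title": "Eiffel_Tower",
    "context": "The Eiffel Tower was designed by Gustave Eiffel and completed in 1889.",
    "question": "Who designed the Eiffel Tower?",
    "answers": {"text": ["Gustave Eiffel"], "answer_start": [33]},
}


def answer_spans(example):
    """Return (start, end, text) for each annotated answer in one example."""
    spans = []
    for text, start in zip(example["answers"]["text"], example["answers"]["answer_start"]):
        end = start + len(text)
        # The stored offset should point at the answer inside the context.
        assert example["context"][start:end] == text
        spans.append((start, end, text))
    return spans


print(answer_spans(record))
# prints: [(33, 47, 'Gustave Eiffel')]
```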

Leveraging Pre-trained Models and bringing it all together

Pre-trained models form the backbone of many deep learning projects, enabling researchers and developers to jumpstart their initiatives without starting from scratch. Hugging Face facilitates the exploration of a diverse range of pre-trained models, as shown in the code below:

from transformers import AutoModelForQuestionAnswering, AutoTokenizer
# Load the pre-trained model and tokenizer
model = AutoModelForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
tokenizer = AutoTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
# Display the model's architecture
print(model)

With the model and tokenizer loaded, we can now proceed to create a function that takes a piece of text and a question as inputs and returns the answer extracted from the text. We will utilize the tokenizer to process the input text and question into a format that is compatible with the model, and then we will feed this processed input into the model to get the answer:

import torch

def get_answer(text, question):
    # Tokenize the question and context together as one model input
    inputs = tokenizer(question, text, return_tensors='pt', max_length=512, truncation=True)
    # Inference only, so skip gradient tracking
    with torch.no_grad():
        outputs = model(**inputs)
    # Pick the most likely start and end token positions (end is exclusive)
    answer_start = torch.argmax(outputs.start_logits)
    answer_end = torch.argmax(outputs.end_logits) + 1
    # Convert the selected token IDs back into a string
    answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(inputs['input_ids'][0][answer_start:answer_end]))
    return answer

In the code snippet, we import necessary modules from the transformers package, then load a pre-trained model and its corresponding tokenizer using the from_pretrained method. We choose a BERT model fine-tuned on the SQuAD dataset.
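
The span selection inside get_answer can be illustrated without loading a model. The sketch below applies the same argmax-then-slice logic to made-up token scores. The tokens and logits are toy values, and best_span is a hypothetical helper.

```python
def best_span(start_logits, end_logits):
    """Return (start, end) with an exclusive end, mirroring argmax(...) + 1."""
    start = max(range(len(start_logits)), key=start_logits.__getitem__)
    end = max(range(len(end_logits)), key=end_logits.__getitem__) + 1
    return start, end


# Toy tokens and scores; real logits come from the question-answering model.
tokens = ["[CLS]", "who", "?", "[SEP]", "designed", "by", "gustave", "eiffel", "[SEP]"]
start_logits = [0.1, 0.0, 0.0, 0.0, 0.3, 0.2, 4.5, 1.0, 0.0]
end_logits = [0.1, 0.0, 0.0, 0.0, 0.1, 0.2, 1.2, 5.1, 0.0]

start, end = best_span(start_logits, end_logits)
print(" ".join(tokens[start:end]))
# prints: gustave eiffel
```

Because the start and end positions are chosen independently, nothing here guarantees that the end falls after the start. A more careful implementation would score valid (start, end) pairs jointly.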

Let's see an example use case of this function where we have a paragraph of text and we want to extract a specific answer to a question from it:

text = """
The Eiffel Tower, located in Paris, France, is one of the most iconic landmarks in the world. It was designed by Gustave Eiffel and completed in 1889. The tower stands at a height of 324 meters and was the tallest man-made structure in the world at the time of its completion.
question = "Who designed the Eiffel Tower?"
# Get the answer to the question
answer = get_answer(text, question)
print(f"The answer to the question is: {answer}")
# Output: The answer to the question is: gustave eiffel (lowercased by the uncased tokenizer)

In this script, we build a get_answer function that takes a text and a question, tokenizes them, and uses the pre-trained BERT model to extract the answer from the text. It demonstrates a practical application of Hugging Face's transformers library to build a simple yet powerful question-answering system. To grasp the concepts well, try the code hands-on, for example in a Google Colab notebook.


Through its extensive range of open-source tools, pre-trained models, and user-friendly pipelines, Hugging Face enables both seasoned professionals and newcomers to explore AI with ease and understanding. Its move to integrate Rust, chosen for speed and safety, underscores a commitment to fostering innovation while ensuring efficiency and security in AI applications. This work not only democratizes access to high-level AI tools but also nurtures a collaborative environment for learning and development in the AI space, working toward a future where AI is accessible to everyone.

I have spent the past five years immersing myself in the fascinating world of Machine Learning and Deep Learning. My passion and expertise have led me to contribute to over 50 diverse software engineering projects, with a particular focus on AI/ML. My ongoing curiosity has also drawn me toward Natural Language Processing, a field I am eager to explore further.