Generative AI, and particularly its language flavor popularized by ChatGPT, is everywhere. Large Language Model (LLM) technology will play a significant role in the development of future applications. LLMs are very good at understanding language because foundation models are pre-trained on trillions of tokens of publicly available text, including code. Methods such as supervised fine-tuning and reinforcement learning from human feedback (RLHF) make these LLMs even more effective at answering specific questions and conversing with users. As we enter the next phase of AI apps powered by LLMs, the following key components will be crucial for these next-gen applications. The figure below shows this progression; as you move up the chain, you build more intelligence and autonomy into your applications. Let’s look at these levels.
The first level is a direct call to a completion or chat model hosted by an LLM provider such as Azure OpenAI, Google PaLM, or Amazon Bedrock. These calls use a very basic prompt and rely mostly on the internal memory of the LLM to produce the output.
Example: asking a basic model like “text-davinci” to “tell a joke”. You give very little context, and the model relies on its internal pre-trained memory to come up with an answer (highlighted in green in the figure below, using Azure OpenAI).
The next level of intelligence comes from adding more context to prompts. Prompt engineering techniques can steer an LLM toward customized responses. For example, when generating an email to a user, context about the user, their past purchases, and their behavior patterns can be placed in the prompt to better tailor the email. Users familiar with ChatGPT will know other prompting methods, such as giving examples that the LLM uses to shape its response. Prompts augment the internal memory of the LLM with additional context. An example is shown below.
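To make the email example concrete, here is a minimal sketch of how user context might be assembled into a prompt. The function name, field names, and sample customer are all hypothetical, and the resulting string would still need to be sent to whichever LLM provider you use.

```python
def build_email_prompt(name, past_purchases, behavior_note):
    """Assemble a prompt that places user context ahead of the instruction.

    All field names here are illustrative assumptions, not a fixed schema.
    """
    purchases = ", ".join(past_purchases) if past_purchases else "none on record"
    return (
        "You are writing a short marketing email.\n"
        f"Customer name: {name}\n"
        f"Past purchases: {purchases}\n"
        f"Behavior notes: {behavior_note}\n"
        # A single in-prompt example (few-shot style) to set the tone.
        "Example of the desired tone:\n"
        "  'Hi Sam, since you enjoyed the trail shoes, here is something new.'\n"
        "Write the email now."
    )


# Hypothetical customer used only for illustration.
prompt = build_email_prompt(
    "Priya",
    ["running shoes", "water bottle"],
    "shops mostly on weekends",
)
print(prompt)
```

The model itself is unchanged; only the prompt carries the extra context, which is why this level is cheap to adopt.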
Embeddings take prompts to the next level by searching a knowledge store for relevant context and appending that context to the prompt. The first step is to make a large store of unstructured documents searchable by splitting the text into chunks, indexing them, and populating a vector database. For this, an embedding model such as ‘ada’ by OpenAI takes a chunk of text and converts it into an n-dimensional vector. These embeddings capture the meaning of the text, so similar sentences have embeddings that are close to each other in vector space. When a user enters a query, the query is also converted into an embedding, and that vector is matched against the vectors in the database. This yields the top 5 or 10 matching text chunks for the query, which form the context. The query and context are then passed to the LLM, which answers the question in a human-like manner.
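The retrieval flow described above can be sketched in a few lines. This is a toy illustration only: `toy_embed` is a hypothetical stand-in that counts words instead of calling a real embedding model, and a plain Python list plays the role of the vector database. The sample chunks and query are made up.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in for a real embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the k chunks whose vectors are most similar to the query."""
    q = toy_embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Index step: embed every chunk once and keep (text, vector) pairs.
chunks = [
    "Refunds are processed within five business days.",
    "Our office is closed on public holidays.",
    "To request a refund, contact support with your order number.",
]
index = [(chunk, toy_embed(chunk)) for chunk in chunks]

# Query step: embed the query, retrieve context, and build the final prompt.
query = "How do I get a refund"
context = top_k(query, index, k=2)
prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"
print(prompt)
```

A real embedding model would also place "refunds" near "refund", which this word-count stand-in cannot do; that semantic closeness is the main reason to use learned embeddings.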
Today, chains are the most mature technique in wide use for building LLM applications. Chains are deterministic: a fixed sequence of LLM calls is joined together, with the output of one call flowing into one or more subsequent calls. For example, one LLM call could query a SQL database for a list of customer emails and pass that list to another LLM call that generates personalized emails to those customers. These LLM chains can be integrated into existing application flows to generate more valuable outcomes. Using chains, we can augment LLM calls with external inputs, such as API calls and knowledge graphs, to provide context. Moreover, with multiple LLM providers now available (OpenAI, AWS Bedrock, Google PaLM, MosaicML, and others), we can mix and match LLM calls within a chain. For chain elements that need limited intelligence, a less capable LLM like ‘gpt-3.5-turbo’ could be used, while more advanced tasks could use ‘gpt-4’. Chains provide an abstraction over data, applications, and LLM calls.
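As a rough sketch of the customer email chain, the steps can be written as ordinary functions wired together in a fixed order. Everything here is hypothetical: `fake_llm` returns a canned string in place of a real model call, and `fetch_customer_emails` returns hard-coded addresses instead of running a SQL query.

```python
def fake_llm(prompt):
    """Hypothetical stand-in for a real LLM call; returns a canned reply."""
    return f"[model reply to: {prompt}]"


def fetch_customer_emails():
    """Step 1 (stubbed): the text describes an LLM call that queries a SQL database."""
    return ["ana@example.com", "ben@example.com"]


def draft_email(address, llm=fake_llm):
    """Step 2: a second LLM call that writes one email per customer."""
    return llm(f"Write a personalized email for {address}")


def run_chain(llm=fake_llm):
    """The chain itself: a fixed order of steps, output of one feeding the next."""
    addresses = fetch_customer_emails()
    return {address: draft_email(address, llm) for address in addresses}


drafts = run_chain()
for address, body in drafts.items():
    print(address, "->", body)
```

Because the `llm` argument is just a callable, a cheaper model could be passed to one step and a stronger model to another, which is the mix-and-match idea in the paragraph above.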
Agents are the subject of many online debates, particularly over how they relate to artificial general intelligence (AGI). Agents use an advanced LLM like ‘gpt-4’ or ‘PaLM 2’ to plan tasks rather than following pre-defined chains. When a user request arrives, the agent decides, based on the query, which tasks to call and dynamically builds a chain. For example, suppose we configure an agent with a command like “notify customers when loan APR changes due to government regulation update”. The agent framework makes an LLM call to decide which steps to take or which chains to build. Here, that involves invoking an app that scrapes regulatory websites and extracts the latest APR, then an LLM call that searches a database and extracts the emails of affected customers, and finally generating an email to notify everyone.
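The difference from a chain is that the order of steps comes from a planning call at run time. The sketch below illustrates that loop under heavy assumptions: `fake_planner` is a hypothetical stand-in for the LLM planning call, the three tools are stubs, and the APR value and email addresses are made up.

```python
def scrape_apr(state):
    """Stub tool: would scrape regulatory websites for the latest APR."""
    state["apr"] = 7.25  # made-up value for illustration
    return state


def find_affected_customers(state):
    """Stub tool: would query a database for affected customers."""
    state["emails"] = ["ana@example.com", "ben@example.com"]
    return state


def send_notifications(state):
    """Stub tool: builds one notification per affected customer."""
    state["sent"] = [
        f"To {email}: the loan APR is now {state['apr']}%"
        for email in state["emails"]
    ]
    return state


TOOLS = {
    "scrape_apr": scrape_apr,
    "find_affected_customers": find_affected_customers,
    "send_notifications": send_notifications,
}


def fake_planner(goal, tool_names):
    """Stand-in for an LLM planning call that picks tools for the goal."""
    if "apr" in goal.lower():
        return ["scrape_apr", "find_affected_customers", "send_notifications"]
    return []


def run_agent(goal, planner=fake_planner):
    """Ask the planner for a plan, then execute each chosen tool in order."""
    plan = planner(goal, list(TOOLS))
    state = {}
    for step in plan:
        if step not in TOOLS:
            raise ValueError(f"planner chose an unknown tool: {step}")
        state = TOOLS[step](state)
    return plan, state


plan, state = run_agent(
    "notify customers when loan APR changes due to government regulation update"
)
print(plan)
print(state["sent"])
```

With a real model as the planner, the plan can differ from one request to the next, so validating each chosen step against the known tools, as `run_agent` does, is a sensible safeguard.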
LLM technology is evolving rapidly, and better models and applications are being launched every week. The path from direct LLM calls to agents is the intelligence ladder, and as we move up it, we build more complex and autonomous applications. Better models will mean more effective agents, and next-gen applications will be powered by them. Time will tell how advanced these applications will be and which patterns will power them.