
Using the Agent Framework to scale and expand the capabilities of Large Language Models (LLMs)

Dawid Leszczyński
December 2024

Prompt engineering may not seem like much work, nor like something that requires a lot of knowledge or skill. After all, you’re only writing a few instructions in plain, natural language and expecting the model to do the rest of the work for you. It turns out, however, that these models not only have a lot of limitations, but also run into a lot of problems when given slightly more complex and rigid instructions.

These problems make it difficult to make LLM-driven features reliable enough to be advantageous to your users when compared to other, more standard solutions.

Suppose you decide to build the “one assistant to rule them all”. In that case, you’ll quickly find yourself in a never-ending story of prompt adjustments to cover all the new features that are being requested and all the edge cases that come with them. Very soon, you’ll start missing the good old days when you could apply well-defined and tested design patterns to make things easier to maintain. Why don’t we apply those tried and true principles to prompt engineering and building assistants? It turns out we can.

Prompt optimization

Before we play our best hand, what can we do about the prompt itself? 

Should you write shorter prompts or longer ones? Give the assistant more freedom to do its task, or more rigid, more specific instructions? Is there a sweet spot? Probably, but it won’t be consistent across use cases.

In my experience, if you can get away with writing shorter prompts and adding less complexity, you’ll generally get better results: these models do better at tasks that allow for more creativity and autonomy. But when you have to add complexity, make the instructions more rigid, or require more specific output, longer prompts will usually be necessary to achieve a semblance of consistency and to cover all your bases.

What about structure? When the assistant is responsible for a more creative task, a few sentences in natural language will usually be your best bet. When building out a more complex prompt, what we found on Form2Agent is that it’s easier to debug and maintain when written as a step-by-step list of actionable tasks for the assistant. Including examples is also good practice; just make sure the chat doesn’t paste them verbatim as output (pay attention to your instructions around examples and to their formatting, then test multiple times and iterate).
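To make this concrete, here is a hypothetical step-by-step system prompt for a form-filling assistant in that style. The tasks, the field names, and the delimited example are all illustrative, not the actual Form2Agent prompt.

```python
# A hypothetical step-by-step system prompt for a form-filling assistant.
# The numbered tasks and the delimited example are illustrative only.
SYSTEM_PROMPT = """You are a form-filling assistant.
Follow these steps in order:
1. Greet the user on their first message only.
2. Extract any field values the user provides.
3. Return the extracted values as JSON.
4. Ask for the next missing field.

Example (do not repeat this example in your output):
---
User: The invoice number is 123
Assistant: {"invoice_number": "123"}
---
"""

def count_steps(prompt: str) -> int:
    """Count the numbered task lines in a prompt, for a quick sanity check."""
    return sum(1 for line in prompt.splitlines() if line[:2].rstrip(".").isdigit())
```

Delimiting the example (here with `---`) and explicitly telling the model not to repeat it are the kinds of small formatting decisions that are worth testing over several iterations.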

The struggles of a single agent

Large language models can handle very long strings of instructions, and usually quite well. At the same time, the advent of LLMs has sparked a lot of creativity in the tech community: these models are often used for tasks they weren’t really designed for, tasks better suited to other machine learning algorithms that generally require somewhat, or significantly, more time and know-how to set up. That flexibility is all well and good, but it’s hardly as reliable as you’d hope or expect.

Speaking from personal experience, when a single LLM assistant is given too many instructions to follow and is required to produce very rigid output to pass the acceptance criteria, it will often show a high failure rate when faced with various edge cases.

You can give the chat more specific instructions that explicitly tell it to avoid these scenarios, but if it struggled to complete the task properly before, those usually won’t help. If an instruction is understandable to a person but not to a virtual assistant, then even when you prompt the model for its reasoning or for better instructions (e.g. by typing “suggest improved instructions for the next assistant”), it will usually take a lot of iteration to make any improvement, and that might not be enough to fix all of your problems.

What’s more, rewording sentences or using abbreviations and synonyms to shorten the prompt can change the success-to-failure ratio and have unintended side effects. Adding new instructions can also affect any of the tasks the chat is asked to do before or after that point, even when they are seemingly unrelated.

A monolithic LLM agent can be problematic to test and maintain when new features or bug fixes are required. It might seem like a standard and reasonable approach at first, but when building a more complex system, the flaws begin to show. That being said, it’s the simplest and most performant approach to integrating LLMs, and often the cheapest, as it tends to generate fewer tokens overall.

Programming principles in LLM integration

Good practices and design patterns have long been what separates a novice from an expert, for better or worse; you can definitely have too much of a good thing when you only need to do something quick and simple.

Some of the more popular rules include SOLID, which is an acronym that stands for five principles: single-responsibility, open-closed, Liskov substitution, interface segregation and dependency inversion.

I spent more than one good night wondering: why are we not trying to apply these to our prompt engineering and to our LLM assistants? We might not be able to copy them 1:1, but these principles are good guiding rules meant to make it easier to build and maintain good, reliable software. I like it easier.

The originator of the single-responsibility principle, Robert C. Martin, states that “A module should be responsible to one, and only one, actor” or “A class should have only one reason to change”. If the agent does more than one job (and in the case of Form2Agent it does: greeting the user, filling out the form, interrogating for information, and other, smaller tasks), we are mixing concerns and creating more than one reason to change the agent, failing the principle.

It also makes it difficult to stick to the open-closed principle. Classes are meant to be open for extension but closed for modification, and it’s difficult to avoid changes when there are sudden failures associated with all the edge cases that pop up.

A monolithic agent with complex instructions can also pose an obstacle to following the Liskov substitution principle. A small change to the prompt can affect the interpretation of any of the tasks given to the assistant, creating a butterfly effect that makes it challenging to avoid breaking the program when trying to substitute the prompt for such an agent.

The other two principles would be more specific to the way the tasks are planned and the way the prompt is structured and built out in code.

If we had multiple assistants, we could split a more complex job into smaller tasks and have them completed by other assistants or different entities and systems altogether.

This would make it possible to get better results by default, but also to do more targeted fixes while containing side effects, and to split the work between multiple developers in a team, or even multiple teams. Overall, it would make it easier to maintain.

Because LLMs are designed to do more creative work and struggle to give reliable results in other instances, even when given a lower temperature, having multiple assistants should prove to be an incredibly beneficial framework to follow if we want to try to improve their reliability.

What is a Multi-Agent System?

A multi-agent system (MAS) is a computerized system in which multiple intelligent agents attempt to solve problems based on a set of instructions and various inputs. These systems can make it easier to solve more complex problems where a single monolithic agent might struggle to do so. You can think of a multi-agent system as a team of specialists, each focused on their own task. 

These systems excel at complex tasks that require iterative action. You can find a good explanation of complex tasks, along with an example, in the Microsoft Research video by Adam Fourney, “AutoGen Update: Complex Tasks and Agents”. It also explains how you can use agents alongside a ledger to reduce large language model hallucinations.

On the other hand, the video “Unlock AI Agent real power?! Long term memory & Self improving” shows how agents can be augmented with long-term memory. This kind of memory is optimized: it does not involve including all of the previous conversations in the context of the API call, and it can help you work around context window limitations.
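To give a rough idea of the mechanism, here is a minimal sketch of such a memory: instead of resending the whole conversation, the agent stores short summaries and retrieves only the most relevant ones for the next call. Real systems typically use embeddings and a vector database; the plain keyword overlap below is only a stand-in for that retrieval step.

```python
# A minimal sketch of long-term memory for an agent: store short summaries,
# retrieve only the most relevant ones for the next LLM call. Keyword overlap
# stands in for embedding-based similarity search used in real systems.
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    entries: list[str] = field(default_factory=list)

    def remember(self, summary: str) -> None:
        self.entries.append(summary)

    def recall(self, query: str, k: int = 2) -> list[str]:
        """Return the k stored summaries sharing the most words with the query."""
        words = set(query.lower().split())
        scored = sorted(self.entries,
                        key=lambda e: len(words & set(e.lower().split())),
                        reverse=True)
        return scored[:k]

memory = MemoryStore()
memory.remember("user prefers invoices in EUR")
memory.remember("user's company is Acme Ltd")
memory.remember("form has fields: invoice number, amount, currency")

# Only the relevant memories are added to the prompt, keeping the context small.
context = memory.recall("what currency should the invoice amount use?", k=1)
```

The point of the pattern is that the prompt grows with the number of *relevant* memories, not with the length of the conversation.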

The MAS framework is applicable to the field of artificial intelligence in general, and agents can also be simple programs or tools that do not use any AI and know how to parse the needed information from a specifically formatted text. 

Multi-agent systems with LLM assistants are often implemented through libraries referred to as agent frameworks, which come with additional features such as built-in assistant tools that can execute Python code, along with pre-existing coding abstractions. The idea of dividing a complex task between multiple agents is known as the agentic approach.

Not all implementations of the agentic approach need to use existing libraries. In many cases, it will make more sense to build out a small implementation of your own that will open up more possibilities for optimization and avoid creating dependencies.
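To illustrate how small such an implementation can be, here is a hand-rolled sketch: each agent pairs a system prompt with a completion function, and a pipeline chains them in sequence. The `complete` callables below are deterministic stand-ins for real LLM API calls (e.g. a chat completions endpoint), used for illustration only.

```python
# A minimal hand-rolled "agent framework": an agent is a system prompt plus a
# completion function, and a pipeline chains agents sequentially.
from typing import Callable

class Agent:
    def __init__(self, name: str, system_prompt: str,
                 complete: Callable[[str, str], str]):
        self.name = name
        self.system_prompt = system_prompt
        self.complete = complete  # stand-in for a real LLM API call

    def run(self, user_input: str) -> str:
        return self.complete(self.system_prompt, user_input)

def pipeline(agents: list[Agent], user_input: str) -> str:
    """Feed each agent's output into the next one, in order."""
    result = user_input
    for agent in agents:
        result = agent.run(result)
    return result

# Deterministic stand-ins for LLM calls, so the sketch runs without an API key.
extractor = Agent("extractor", "Extract the number.",
                  lambda sys, text: "".join(c for c in text if c.isdigit()))
formatter = Agent("formatter", "Wrap the value as JSON.",
                  lambda sys, value: f'{{"invoice_number": "{value}"}}')

output = pipeline([extractor, formatter], "The invoice number is 123")
```

An abstraction this small already gives you the main maintenance benefit: each agent’s prompt can be tested and fixed in isolation.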

Comparing microservice principles to MAS 

A multi-agent system is not too dissimilar to systems built within the microservice architecture. In fact, these two can be implemented together as multi-agent microservices (MAMS). I won’t focus on that in this article, but here’s a brief comparison from (PDF) MAMS: Multi-Agent MicroServices.

Table 1: Comparison of microservices to MAS from (PDF) MAMS: Multi-Agent MicroServices.
| Principle | Microservices | MAS |
| --- | --- | --- |
| Bounded Context | A microservice represents a single piece of business functionality. | An agent can play a single or multiple roles in a system. |
| Size | Microservices should be small enough to ensure maintainability and extensibility. | Size/complexity is not an issue in MAS research and often depends on the target domain. |
| Isolated State | Sharing of state information is minimized across services. | The state is local and private to an agent. This is often viewed as essential for an agent's autonomy. |
| Distribution | Services are spread across multiple nodes. | Agents are logically distributed, but it is also expected that they will be spread over multiple nodes. |
| Elasticity | The application is designed to allow the addition and removal of required resources at runtime. | The ability to add/remove agents at runtime is a central feature of MAS. |
| Automated Management | Management operations like failure handling and scaling are automated. | Management operations are not central to agents but are sometimes considered. |
| Loose Coupling | Systems are decomposed into loosely coupled sets of highly cohesive colocated services. | Agents are autonomous and loosely coupled problem solvers. |
| Autonomy | Microservices operate without the direct intervention of humans and have some kind of control over their internal state (and actions). | Agents operate without the direct intervention of humans and have some kind of control over their actions and internal state. |
| Social Ability | Interaction between microservices is typically achieved using messages based on RESTful APIs and HTTP. | Agents interact with other agents using some kind of Agent Communication Language. |
| Reactivity | Microservices respond to incoming HTTP requests in a timely fashion. | Agents perceive their environment and respond in a timely fashion to changes that occur in it. |
| Proactivity | Microservices do not take the initiative. | Agents don't just respond, they take the initiative. |

Agent example (Service distinguisher)

We are planning an integration of another LLM assistant of ours, ToOne, into the Form2Agent chat widget. ToOne is a system using RAG (Retrieval-Augmented Generation) that allows you to upload documents and have the chat answer questions based on the knowledge from those documents. The idea is to make requests with questions for ToOne from the target web app instead of using its separate client; thanks to Form2Agent, we already have a chat widget with many useful features, such as a full hands-off mode.

One of the difficulties lies in distinguishing between queries for Form2Agent and ToOne. We either need the user to explicitly change a tab in the GUI, mark a checkbox, or we need to do more natural language processing to figure that out. One way to solve this would be with an agent that makes this choice based on the user input. To get the response faster, we can ask it to return a string containing only one character, e.g. a number representing the app. This would be our distinguisher agent.

Let’s view an example prompt we could use for this agent.

“The result of this prompt should be only a single digit representing the service, answer with 1 or 2, see descriptions below, you need to determine which one should take action against the provided prompt.
1. F2A - a tool that helps users fill forms; answer 1 if there is anything that indicates the user may want to fill something or provides a value for a given parameter.
2. ToOne - the knowledge base of a given system and general Internet knowledge; answer 2 if the user requires assistance with determining a value or asks questions regarding the form, system, or business domain.
User prompt: ‘The invoice number is 123’”
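The single-digit answer keeps the routing call cheap and fast to parse. Here is a sketch of how that answer could drive the dispatch; `ask_distinguisher` stands in for the real LLM call with the prompt above, and its keyword heuristic only mimics the behavior we would expect from the model.

```python
# A sketch of routing on the distinguisher agent's one-character answer.
# ask_distinguisher is a stand-in for the real LLM call with the routing prompt.
def ask_distinguisher(user_prompt: str) -> str:
    fill_hints = ("is", "fill", "set", "enter")
    question_hints = ("what", "how", "why", "?")
    if any(h in user_prompt.lower() for h in question_hints):
        return "2"  # ToOne: a question about the form, system, or domain
    if any(h in user_prompt.lower().split() for h in fill_hints):
        return "1"  # F2A: the user is providing a value for the form
    return "2"

def route(user_prompt: str) -> str:
    """Dispatch to the right service based on the single-digit answer."""
    return {"1": "F2A", "2": "ToOne"}[ask_distinguisher(user_prompt)]
```

Because the contract is just “1” or “2”, the distinguisher can run on a small, cheap model, and a fallback branch can handle any answer outside that set.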

Benefits of the agentic framework

There are a lot of benefits to using the agentic framework. Let me name a few of the main ones:

  • Better split of responsibility means agents are easier to test, expand upon, and maintain;
  • Having multiple agents means that we can use different LLM models for different tasks, balancing between speed, costs, reasoning, and knowledge;
  • Within the agentic framework, agents can work together to solve more complex problems through iteration, doing a lot of back-and-forth before returning to the user;
  • Some agents can work asynchronously, doing additional work in the background;
  • The agentic approach and many agentic frameworks can enable long-term memory, web scraping, code execution, and other interesting features;
  • A ledger with facts and educated guesses can be created, used, and maintained by the agents to minimize large language model hallucination.

Pitfalls of the agentic framework

Since we are splitting one job into many tasks, if they depend on each other and have to be run sequentially, each agent in the sequence adds communication overhead. When making requests to OpenAI’s web API, as in the service distinguisher example, this overhead can be significant.

We may also find that we are duplicating some information between assistants, which will increase the number of generated tokens and the cost of using the APIs. Smart use of the agentic framework can sometimes keep costs the same or even lower them, by reducing the number of tokens in the average assistant call (depending on how often certain paths are chosen by the users) or by allowing cheaper models for some tasks.
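A back-of-the-envelope calculation shows how this can go either way. All numbers below are hypothetical: a single monolithic prompt of 2,000 tokens versus a 150-token distinguisher that routes 70% of messages to a short specialist prompt and 30% to a long one, at an assumed price per 1K prompt tokens.

```python
# Hypothetical token counts and price; the point is the shape of the math,
# not the specific numbers.
PRICE_PER_1K = 0.01  # assumed $ per 1K prompt tokens

monolithic_tokens = 2000                         # one big prompt on every call
distinguisher_tokens = 150                       # tiny routing prompt
short_path_tokens = distinguisher_tokens + 500   # common, simple requests
long_path_tokens = distinguisher_tokens + 1800   # rarer, complex requests

monolithic_cost = monolithic_tokens / 1000 * PRICE_PER_1K
agentic_cost = (0.7 * short_path_tokens + 0.3 * long_path_tokens) / 1000 * PRICE_PER_1K
```

With these assumptions the agentic setup averages about half the prompt cost per message, but if most traffic took the long path, the extra distinguisher call would make it more expensive instead.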

Agent framework implementations

Let’s do a general overview of a couple of existing implementations of this framework. The tools here have some really exciting features. Below, you’ll find a few interesting links.

AutoGen (Python)

“AutoGen is an open-source programming framework for building AI agents and facilitating cooperation among multiple agents to solve tasks. AutoGen aims to streamline the development and research of agentic AI, much like PyTorch does for Deep Learning. It offers features such as agents capable of interacting with each other, facilitates the use of various large language models (LLMs) and tool use support, autonomous and human-in-the-loop workflows, and multi-agent conversation patterns.”

GitHub - microsoft/autogen: A programming framework for agentic AI 🤖 

With AutoGen, one of the more interesting feats we can achieve is a back-and-forth between a user proxy capable of executing Python code and an LLM capable of generating it. AutoGen executes that code in a Docker container, which makes it safer for the host machine. This is especially great for one-off tasks for people who don’t know a lot about programming in Python, but it can definitely be utilized for more creative solutions.

It’s well trusted and can also handle live data streams. Generally, AutoGen takes a bit more time and code to set up and get started compared to some other frameworks.

Semantic Kernel (C# / Python / Java)

“Semantic Kernel is an SDK that integrates Large Language Models (LLMs) like OpenAI, Azure OpenAI, and Hugging Face with conventional programming languages like C#, Python, and Java. Semantic Kernel achieves this by allowing you to define plugins that can be chained together in just a few lines of code.

What makes Semantic Kernel special, however, is its ability to automatically orchestrate plugins with AI. With Semantic Kernel planners, you can ask an LLM to generate a plan that achieves a user's unique goal. Afterwards, Semantic Kernel will execute the plan for the user.

It provides:

  • abstractions for AI services (such as chat, text to images, audio to text, etc.) and memory stores
  • implementations of those abstractions for services from OpenAI, Azure OpenAI, Hugging Face, local models, and more, and for a multitude of vector databases, such as those from Chroma, Qdrant, Milvus, and Azure
  • a common representation for plugins, which can then be orchestrated automatically by AI
  • the ability to create such plugins from a multitude of sources, including from OpenAPI specifications, prompts, and arbitrary code written in the target language
  • extensible support for prompt management and rendering, including built-in handling of common formats like Handlebars and Liquid
  • and a wealth of functionality layered on top of these abstractions, such as filters for responsible AI, dependency injection integration, and more.”

GitHub - microsoft/semantic-kernel: Integrate cutting-edge LLM technology quickly and easily 

Semantic Kernel also supports output streaming; see “19 - Streaming In Microsoft Semantic Kernel”.

Langchain (Python / JavaScript)

“LangChain is a framework for developing applications powered by large language models (LLMs).

For these applications, LangChain simplifies the entire application lifecycle:

  • Open-source libraries: Build your applications using LangChain's open-source building blocks, components, and third-party integrations. Use LangGraph to build stateful agents with first-class streaming and human-in-the-loop support.
  • Productionization: Inspect, monitor, and evaluate your apps with LangSmith so that you can constantly optimize and deploy with confidence.
  • Deployment: Turn your LangGraph applications into production-ready APIs and Assistants with LangGraph Cloud.”

GitHub - langchain-ai/langchain: 🦜🔗 Build context-aware reasoning applications 

Langchain also supports text output streaming from LLMs; see “Streaming | 🦜️🔗 LangChain”.

CrewAI (Python)

crewAI is a Python library for orchestrating role-playing, autonomous AI agents, helping them tackle more complex tasks and helping developers write cleaner code around AI agents. It is one of the more popular choices for implementing this framework, with an open-source repository.

As a developer, you are expected to define your agents, their roles and goals. Then, you define the tasks, including the agent that should be responsible for each one of them, and you instantiate your “crew” and kick off the work. Optionally, you can create a manager agent. You can see more about how it all works here: Managing Processes in CrewAI

CrewAI supports the creation of custom tools to expand the capabilities of AI agents, but also includes its own built-in tools (like website search) and, being built on top of LangChain, supports all of the LangChain tools.

Unfortunately, it’s unclear whether you can handle text output streaming (it does not seem like it), making it less than ideal for chatbot implementations.

MemGPT (Python)

“MemGPT makes it easy to build and deploy stateful LLM agents. […]

You can also use MemGPT to deploy agents as a service. You can use a MemGPT server to run a multi-user, multi-agent application on top of supported LLM providers.”

GitHub - cpacker/MemGPT: Create LLM agents with long-term memory and custom tools 📚🦙 

As far as I can tell, streaming is not supported, though the feature is on their development roadmap. In a GitHub discussion, it was mentioned that implementing text streaming would require significant changes due to the structured output format used by MemGPT (see “Streaming support? · Issue #345 · cpacker/MemGPT · GitHub”), which is something I actually had a bit of experience with on Form2Agent.

Summary

Prompting and LLMs…

  • Prompt engineering is a deceptively simple skill: easy to start learning but difficult to master.
  • Large language models have a lot of potential flexibility, but they are not well suited for tasks that have more rigid rules or more complex structure; complex tasks cannot be reliably handled out-of-the-box by only using prompt engineering. 
  • If you try to handle more complex tasks with a monolithic agent, you might end up with a frustrating development experience because of unreliable results (even at lower model temperatures) and the need for smoke or regression tests on the entire assistant after every prompt change.

Multi-agent system and the agent framework…

  • The agent framework can be used to create a multi-agent system. It can help you overcome the pitfalls of having a single agent and apply good programming practices to LLM integration.
  • The primary job of existing agent framework implementations is to provide developers with ready-made abstractions for building such systems.
  • Not all agents need to be running LLMs. Agents can be used with custom tools to run code, scrape websites, store and access long-term memory inside a vector database, and more. Existing frameworks have some of these tools built into them.
  • Some of the more popular agent frameworks include AutoGen (Python), Semantic Kernel (C#, Python, Java), Langchain (Python, JavaScript), CrewAI (Python) and MemGPT (Python).
  • You don’t need a framework to develop the agentic approach. You may want to build your own abstractions to be able to better understand what’s going on under the hood and optimize as you see fit. This is especially true if you don’t need to leverage the custom tools.