I recently received an email from a talented scholar asking how Botpress interfaces with LLMs.
He was writing a paper on avoiding vendor lock-in and wanted to know whether we used a framework like LangChain or Haystack.
I was more than pleased to share with him that we created our own abstractions that allow Botpress builders to interface with LLMs.
Given the wider interest in the subject, I wanted to make this information public. It may be useful to other devs and to our platform’s users, and I hope you find it as interesting to read as I found it to build.
Two ways Botpress interfaces with LLMs
Botpress has created its own abstractions that work in two ways:
1. Integrations
Integrations are built around actions, each with specific input and output types.
The platform’s components are open source, so the community can build their own integrations and keep them private or publish them for public use.
So LLM providers – OpenAI, Anthropic, Groq, etc. – each have an integration. That’s one way our users can interface with them.
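To make that concrete, here’s a rough sketch of what an action with typed input and output could look like. The names and schemas below are illustrative, not the actual Botpress SDK API:

```typescript
import { z } from "zod";

// Hypothetical action for an LLM provider integration: a named
// operation with explicit input and output schemas plus a handler.
const generateText = {
  name: "generateText",
  input: z.object({
    prompt: z.string(),
    temperature: z.number().optional(),
  }),
  output: z.object({
    text: z.string(),
    tokensUsed: z.number(),
  }),
  handler: async (input: { prompt: string; temperature?: number }) => {
    // Call the provider's API here (OpenAI, Anthropic, Groq, ...)
    // and map its response to the declared output shape.
    return { text: `echo: ${input.prompt}`, tokensUsed: 0 };
  },
};
```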
2. LLM integration interfaces
On top of the concept of integrations, we added “interfaces.”
These are simply standard schema definitions that integrations can extend. We created a standard schema for LLMs.
As long as an integration extends this schema, it is considered an LLM provider and works out of the box in Botpress.
Our OpenAI, Anthropic, and Groq integrations, among others, each extend this LLM interface.
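Conceptually, the interface is just a shared schema contract. Here’s a hedged sketch of the idea (names and shapes are illustrative, not the real Botpress interface definition):

```typescript
import { z } from "zod";

// Hypothetical "llm" interface: a schema contract that any provider
// integration can extend to be recognized as an LLM provider.
const llmInterface = {
  name: "llm",
  actions: {
    generateContent: {
      input: z.object({
        model: z.string(),
        messages: z.array(
          z.object({
            role: z.enum(["system", "user", "assistant"]),
            content: z.string(),
          })
        ),
      }),
      output: z.object({
        choices: z.array(z.object({ content: z.string() })),
        usage: z.object({
          inputTokens: z.number(),
          outputTokens: z.number(),
        }),
      }),
    },
  },
};

// An OpenAI, Anthropic, or Groq integration that implements this
// contract is usable anywhere the platform expects an LLM.
```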
We have similar interfaces for text2image, image2text, voice2text, text2voice, etc.
Model configurations
Inside the Botpress Studio, we have two general configurations: the "Best Model" and the "Fast Model". We found that most tasks easily fit into one of these two modes.
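As a rough sketch (field names are illustrative, not the Studio’s actual configuration format), a task points at one of these two tiers, and the bot maps each tier to a concrete model:

```typescript
// Hypothetical model-tier configuration: tasks reference a tier,
// and the bot maps each tier to a concrete provider model.
type ModelTier = "best" | "fast";

interface ModelConfig {
  best: string;
  fast: string;
}

const models: ModelConfig = {
  best: "openai:gpt-4o", // heavier, more capable model
  fast: "openai:gpt-4o-mini", // cheaper, lower-latency model
};

// A task only declares which tier it needs; switching providers
// means changing the mapping above, not every task.
function resolveModel(tier: ModelTier, config: ModelConfig): string {
  return config[tier];
}

console.log(resolveModel("fast", models)); // "openai:gpt-4o-mini"
```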
But beyond pure model selection, we found that providers diverge too much in tool calling and message formats to swap one model for another and expect good performance.
The Botpress inference engine
Because of that, we created our own inference engine, LLMz, which works with any model with no (or minimal) prompt changes required. It also provides much better tool calling, and often much better performance in terms of token cost and LLM roundtrips.
Behind the scenes, the engine uses TypeScript types for tool definitions, markdown for message and code output, and an LLM-native execution sandbox for inference.
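To illustrate the shape of this (a simplified sketch, not the actual LLMz API): tools are presented to the model as TypeScript declarations, and the model answers with a block of code that the sandbox executes.

```typescript
// Sketch of the idea: the model sees tools as TypeScript type
// declarations rather than JSON schemas.
const toolDeclarations = `
declare function searchOrders(input: { email: string }): Promise<{ id: string; status: string }[]>;
declare function sendMessage(input: { text: string }): Promise<void>;
`;

// Instead of emitting a structured "tool call", the model replies
// with code, which the sandbox then executes against the real tools.
const modelOutput = `
const orders = await searchOrders({ email: "jane@example.com" });
await sendMessage({ text: "You have " + orders.length + " open orders." });
`;
```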
LLMz provides many optimizations and debugging features that are required for advanced use cases, such as:
- Input tokens compression
- Smart token truncation
- Token-optimized memory-to-context
- Parallel & composite tool calling
- Mix of multiple messages + tool calls in a single LLM call
- Fully type safe tools (input & output)
- Long-lived sessions through sandbox serialization
- Tool mocking, wrapping and tracing
- Full execution isolation in lightweight V8 isolates (which lets us run thousands of concurrent executions quickly and very cheaply)
- Automatic iterations and error recovery
All of these were necessary for our use cases, but they were either impossible or very hard to achieve with regular tool calling.
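For example, composite and parallel tool calling falls out naturally when the model writes code: a single roundtrip can call several typed tools, combine their results, and reply. Here’s a hedged sketch (tool names are hypothetical and stubbed so the example runs standalone):

```typescript
// Hypothetical tools, stubbed so the sketch is self-contained. In a
// code-execution approach, real implementations are injected into the
// sandbox and the model only sees their type signatures.
const getWeather = async (_: { city: string }) => ({ tempC: 21 });
const getCalendar = async (_: { day: string }) => ({ events: ["standup", "demo"] });
const sendMessage = async (input: { text: string }) => console.log(input.text);

// Code of this shape can come back from a single LLM roundtrip:
// two tool calls run in parallel and their results are composed
// in the same block, instead of one roundtrip per call.
async function modelGeneratedTurn() {
  const [weather, calendar] = await Promise.all([
    getWeather({ city: "Montreal" }),
    getCalendar({ day: "today" }),
  ]);
  await sendMessage({
    text: `It's ${weather.tempC}°C and you have ${calendar.events.length} meetings today.`,
  });
}

modelGeneratedTurn();
```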
The case against lightweight router models
We thought long and hard about building a lightweight router model that would sit on top of existing models and automatically pick the right model for the task at hand.
But we decided not to do so for multiple reasons:
1. Predictability
Most of our clients – understandably – want reliable and predictable results.
So the idea of a dynamic model router is a bit scary for high-level agents. It brings another layer of unpredictability to LLMs.
2. Speed
Latency is very important for our use cases. For the router to be fast, it has to be much smaller (and arguably dumber) than the models it routes to – probably a traditional classifier.
While these classifiers generally perform okay when trained on specific tasks, a) their short context sizes are an issue for long prompts, and b) they fail to generalize to prompts outside what they were trained on.
3. Model supremacy or model equality
While benchmarks may say otherwise, in the wild, we've rarely seen models outperform GPT-4o (so far).
It's still unclear whether, over time, LLMs will genuinely perform better on task X than on task Y, or whether all LLMs will end up being extremely good at most things. If it's the latter, model-picking won't be worth the effort.
Future-proofing LLMs with feedback
LLMs will be a commodity in a few years and model selection won't really be a thing.
For those reasons, we decided to invest our effort in a good mechanism for supplying LLMs with examples.
So we've built a system that captures feedback and stores it as "learnings" for future executions. At prompt time, it dynamically surfaces the most relevant learnings, ensuring reliable and continuous improvement over time.
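Here’s a hedged sketch of that loop. The data structures and the naive keyword-overlap scoring are illustrative only, not how the Botpress system is actually implemented:

```typescript
// Store "learnings" from past executions, then pull the most relevant
// ones back into the prompt for future executions.
interface Learning {
  context: string; // the situation the feedback came from
  lesson: string; // e.g. "Always confirm the order ID before refunding."
}

const learnings: Learning[] = [];

function recordFeedback(context: string, lesson: string): void {
  learnings.push({ context, lesson });
}

// Rank stored learnings by naive keyword overlap with the current task
// and return the top few lessons to inject into the prompt.
function relevantLearnings(task: string, limit = 3): string[] {
  const taskWords = new Set(task.toLowerCase().split(/\W+/));
  return learnings
    .map((l) => ({
      lesson: l.lesson,
      score: l.context
        .toLowerCase()
        .split(/\W+/)
        .filter((w) => taskWords.has(w)).length,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((x) => x.lesson);
}

// Example: one stored learning gets surfaced when a similar task comes in.
recordFeedback(
  "refund request for an online order",
  "Always confirm the order ID before issuing a refund."
);

const prompt = [
  ...relevantLearnings("the user wants a refund for their order"),
  "User: I'd like a refund for my last order.",
].join("\n");

console.log(prompt);
```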
As LLMs keep climbing toward higher and higher performance, we’re ready and excited to make the most of them for our platform’s users.