I recently received an email from a talented scholar asking how Botpress interfaces with LLMs.
He was writing a paper on avoiding vendor lock-in, and wanted to know whether we perhaps used a framework like LangChain or Haystack.
I was more than pleased to share with him that we created our own abstractions that allow Botpress builders to interface with LLMs.
Given the wider interest in the subject, I wanted to make this information public. It may be of use to other devs or to our platform's users. I hope you find it as interesting to read as I found it to build.
Two ways Botpress interfaces with LLMs
Botpress has created its own abstractions that work in two ways:
1. Integrations
Integrations are built around the concept of actions, each with specific input and output types.
The components on our platform are open source, so the community can create their own integrations, which can be either private or publicly available.
So LLM providers – OpenAI, Anthropic, Groq, etc. – each have an integration. That’s one way our users can interface with them.
2. LLM integration interfaces
On top of the concept of integrations, we added “interfaces.”
These are simply standard schema definitions that integrations can extend. We created a standard schema for LLMs.
As long as an integration extends this schema, it is treated as an LLM provider and works out of the box in Botpress.
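To make that concrete, here is a minimal sketch of the pattern. The type and action names below (GenerateContentInput, generateContent, and so on) are illustrative assumptions, not the actual Botpress SDK definitions:

```typescript
// Hypothetical sketch (not the actual Botpress SDK types): an LLM "interface"
// is just a standard schema that an integration's actions must satisfy.

// Standard input/output schema every LLM provider integration agrees on.
interface GenerateContentInput {
  model: string;
  messages: { role: "user" | "assistant" | "system"; content: string }[];
  temperature?: number;
}

interface GenerateContentOutput {
  choices: { content: string; stopReason: "stop" | "max_tokens" }[];
  usage: { inputTokens: number; outputTokens: number };
}

// An integration "extends" the interface by implementing its actions
// with these exact input and output types.
interface LlmProviderIntegration {
  name: string;
  actions: {
    generateContent: (input: GenerateContentInput) => Promise<GenerateContentOutput>;
  };
}

// Any integration that satisfies the interface is usable wherever
// Botpress expects an LLM provider.
const myProvider: LlmProviderIntegration = {
  name: "my-llm-provider",
  actions: {
    generateContent: async (input) => ({
      choices: [{ content: `echo: ${input.messages.at(-1)?.content}`, stopReason: "stop" }],
      usage: { inputTokens: 0, outputTokens: 0 },
    }),
  },
};
```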
For example, our integrations for OpenAI, Anthropic, and Groq each extend this LLM interface.
We have similar interfaces for text2image, image2text, voice2text, text2voice, etc.
Model configurations
Inside the Botpress Studio, we have two general configs: the "Best Model" and the "Fast Model". We found that, in general, most tasks fit easily into one of these two modes.
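As a rough sketch of the idea (the keys and model IDs below are hypothetical, not the Studio's actual configuration format), a step references a tier rather than a hard-coded model, and the tier resolves to whatever model is currently configured:

```typescript
// Illustrative sketch only – not the Studio's real config schema.
type ModelTier = "best" | "fast";

const modelConfig: Record<ModelTier, string> = {
  best: "openai:gpt-4o",      // highest quality, used for complex reasoning steps
  fast: "openai:gpt-4o-mini", // low latency / low cost, used for routine steps
};

// A bot step asks for a tier; swapping providers only means editing the config.
function resolveModel(tier: ModelTier): string {
  return modelConfig[tier];
}
```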
But beyond pure model selection, we found that providers diverged too much on tool calling and message formats to swap one model for another and expect good performance.
The Botpress inference engine
Because of that, we created our own inference engine, LLMz, which works with any model with little or no prompt change required. It also provides much better tool calling and often much better performance in terms of token cost and LLM round trips.
Behind the scenes, the engine uses TypeScript types for tool definitions, markdown for the message and code output format, and an LLM-native execution sandbox for inference.
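Here is a hypothetical sketch of that flow, not LLMz's actual API: the tool signatures stand in for what the model is shown, and the generated function stands in for the code block the model writes back, which the engine then executes in its sandbox.

```typescript
// 1. Tools are described to the model as plain TypeScript signatures,
//    so there is no provider-specific JSON tool-calling schema involved.
//    (Stub implementations here just keep the sketch runnable.)
async function searchOrders(input: { customerId: string; status?: "open" | "shipped" }) {
  return [{ id: "A-1", total: 49.9 }]; // a real tool would call an API
}
async function sendMessage(input: { text: string }) {
  console.log(input.text);
}

// 2. The model replies with a markdown code block like the body of this
//    function, which the engine runs in an isolated sandbox. One LLM round
//    trip can mix several tool calls with the user-facing message.
async function modelGeneratedCode() {
  const orders = await searchOrders({ customerId: "123", status: "open" });
  await sendMessage({ text: `You have ${orders.length} open order(s).` });
}

void modelGeneratedCode();
```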
LLMz provides many optimizations and debugging features that are required for advanced use cases, such as:
- Input tokens compression
- Smart token truncation
- Token-optimized memory-to-context
- Parallel & composite tool calling
- Mix of multiple messages + tool calls in a single LLM call
- Fully type safe tools (input & output)
- Long-lived sessions through sandbox serialization
- Tool mocking, wrapping and tracing
- Full execution isolation in lightweight V8 isolates (allowing thousands of concurrent executions to run quickly and very cheaply)
- Automatic iterations and error recovery
All of these were necessary for our use cases, but they were either impossible or very hard to achieve with regular tool calling.
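To illustrate what parallel and composite tool calling look like in this code-first approach (again a sketch with made-up tools, not LLMz's real output), independent calls can run concurrently and their results can be combined before replying, all within a single LLM round trip:

```typescript
// Hypothetical tools, stubbed so the sketch runs on its own.
async function getWeather(input: { city: string }) {
  return { tempC: 21 };
}
async function getCalendar(input: { day: string }) {
  return { events: ["standup", "1:1"] };
}
async function sendMessage(input: { text: string }) {
  console.log(input.text);
}

async function modelGeneratedCode() {
  // Parallel: two independent tools run concurrently instead of costing
  // one LLM round trip each.
  const [weather, calendar] = await Promise.all([
    getWeather({ city: "Montreal" }),
    getCalendar({ day: "today" }),
  ]);

  // Composite: results are combined in code, so they never need to be fed
  // back through the model before replying to the user.
  await sendMessage({
    text: `It's ${weather.tempC}°C and you have ${calendar.events.length} events today.`,
  });
}

void modelGeneratedCode();
```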
The case against lightweight router models
For a long time, we considered building a lightweight router model that would sit on top of existing models and automatically pick the right model for the task at hand.
But we decided not to do so for multiple reasons:
1. Predictability
Most of our clients – understandably – want reliable and predictable results.
So the idea of a dynamic model router is a bit scary for high-level agents: it adds another layer of unpredictability on top of the LLMs themselves.
2. Speed
Latency is very important for our use cases. For the router to be fast, the model has to be much smaller (and arguably dumber) than the models it routes to – probably a traditional classifier.
While these generally perform okay when trained on specific tasks, a) their short context windows are an issue for long prompts, and b) they fail to generalize to prompts outside of what they were trained on.
3. Model supremacy or model equality
While benchmarks may say otherwise, in the wild, we've rarely seen models outperform GPT-4o (so far).
It's still unclear whether, over time, different LLMs will really be better at some tasks than others, or whether all LLMs will end up being extremely good at most things. If it's the latter, model picking won't be worth the effort.
Future-proofing LLMs with feedback
LLMs will be a commodity in a few years and model selection won't really be a thing.
For those reasons, we decided to invest our effort in a good mechanism for supplying LLMs with examples.
So we've built a system that captures feedback and stores it as "learnings". At prompt time, it dynamically injects the most relevant learnings into future executions, in order to ensure reliable and continuous improvement over time.
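A minimal sketch of that idea, with the storage and relevance scoring invented purely for illustration (a real system would more likely use embeddings rather than keyword overlap):

```typescript
interface Learning {
  lesson: string;  // e.g. "Always ask for the order number before issuing a refund"
  context: string; // the situation the feedback came from
}

const learnings: Learning[] = [];

// Capture feedback from a past execution as a reusable learning.
function recordLearning(lesson: string, context: string): void {
  learnings.push({ lesson, context });
}

// At prompt time, select only the learnings most relevant to the current task.
function relevantLearnings(task: string, limit = 3): string[] {
  const words = new Set(task.toLowerCase().split(/\W+/));
  return learnings
    .map((l) => ({
      l,
      score: l.context.toLowerCase().split(/\W+/).filter((w) => words.has(w)).length,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((x) => `- ${x.l.lesson}`);
}

// The selected learnings are prepended to the prompt of the next execution.
recordLearning("Always ask for the order number before issuing a refund", "user asked for a refund");
const prompt = [
  "Relevant past learnings:",
  ...relevantLearnings("refund an order"),
  "User: I want a refund",
].join("\n");
console.log(prompt);
```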
As LLMs keep pushing toward higher and higher performance, we're ready and excited to make the most of them for our platform's users.