Twice the speed and half the price – what does GPT-4o mean for AI chatbots?
Following their mysterious announcement, OpenAI launched the latest version of their flagship model: GPT-4o.
The latest model didn’t just receive a flashy glow-up in multimodal capabilities. It’s faster and cheaper than GPT-4 Turbo. While mainstream media coverage is enamored with the new flagship model’s video and voice capabilities, the new cost and speed are just as impactful for those using GPT to power their apps.
“The availability of 4o has the power to significantly improve both the builder and the user experience,” said Patrick Hamelin, a software engineer lead at Botpress. “The impact is further-reaching than we think.”
So let’s dive into how the new model will shake up AI chatbots.
Model Capabilities
The new flagship model comes with an exciting list of updates and new features: enhanced voice and video capabilities, real-time translation, and more natural language abilities. It can analyze images, understand a wider variety of audio inputs, summarize content, and create charts. Users can upload files and hold voice-to-voice conversations. It even comes with a desktop app.
In their series of launch videos, OpenAI employees (and associates like Sal Khan of Khan Academy) demonstrate the latest version of GPT prepping a user for a job interview, singing, identifying human emotions through facial expressions, solving written math equations, and even interacting with another GPT-4o.
The launch illustrated a new reality in which an AI model can analyze the writing in your kid’s notebook and respond to it. It could explain the concept of adding fractions for the first time, changing tone and tactics based on your child’s understanding – it could cross the line from chatbot to personal tutor.
What does GPT-4o mean for LLM Chatbots?
AI chatbots that run on LLMs are gifted an update every time companies like OpenAI update their models. If a chatbot is connected to a bot-building platform like Botpress, it receives all the benefits of the latest GPT model automatically.
With the release of GPT-4o, AI chatbots can now opt to run on the advanced model, changing their capabilities, price, and speed. The new model has 5x higher rate limits than GPT-4 Turbo, with the ability to process up to 10 million tokens per minute.
For bots using audio integrations like Twilio on Botpress, a new world of voice-powered interaction has emerged. Instead of being confined to the audio processing of yesteryear, chatbots are a step closer to mimicking human interaction.
Perhaps most important is the lower cost for paid users. Running a similarly capable chatbot for half the cost can drastically increase access and affordability worldwide. And Botpress users pay no markup on AI spend for their bots – so these savings go directly to builders.
And on the user side of the equation, GPT-4o means a far better user experience. No one likes waiting, and shorter response times mean higher satisfaction for AI chatbot users.
Users Love Speed
A key tenet of chatbot adoption is improving user experience. And what improves user experience more than cutting down on wait times?
“It’ll be a better experience for sure,” said Hamelin. “The last thing you want to do is wait on someone.”
Humans hate waiting. Even back in 2003, a study found that people were only willing to wait approximately 2 seconds for a web page to load. Our patience certainly hasn’t increased since then.
There are plenty of UX tips out there to cut down on perceived waiting time. Often we can’t improve the speed of events themselves, so we focus on making users feel like time is passing faster. Visual feedback, like a loading bar, exists to shorten perceived wait time.
In a famous story of elevator wait times, an old New York building was fielding a barrage of complaints. Residents had to wait 1-2 minutes for the elevator to arrive. The building wasn’t able to upgrade the elevator to a newer model and residents were threatening to break their leases.
A new hire, trained in psychology, figured out that the real problem wasn’t the two minutes of lost time – it was boredom. He suggested installing mirrors so residents could look at themselves or others while waiting. Complaints about the elevator ceased, and now, it’s commonplace to see mirrors in elevator lobbies.
Instead of taking shortcuts to enhance the user experience – like visual feedback – OpenAI has improved on the experience at its source. Speed is central to user experience, and there’s no trick that matches the satisfaction of an efficient interaction.
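For chatbot builders, the most direct way to pass GPT-4o’s speed through to users is to stream the model’s reply token by token, so text appears the moment generation starts. Here’s a minimal sketch using the OpenAI Python SDK – the prompt is illustrative, and it assumes an OPENAI_API_KEY environment variable is set:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Request a streamed response so text can be shown as it's generated,
# rather than leaving the user staring at a blank screen.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's your return policy?"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # some chunks carry no text (e.g. the final one)
        print(delta, end="", flush=True)
print()
```

Combined with GPT-4o’s faster generation, streaming means users see a reply begin almost immediately – no mirrors required.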
Savings for Everyone
Using this new AI model to run applications suddenly got cheaper. A lot cheaper.
Running an AI chatbot at scale can get pricey. The LLM your bot is powered by determines how much you’ll pay for each user interaction (at least at Botpress, where we match AI spend 1:1 with LLM costs).
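To put rough numbers on it: at launch, OpenAI priced GPT-4o at $5 per million input tokens and $15 per million output tokens, versus $10 and $30 for GPT-4 Turbo (check current pricing before relying on these figures). A quick back-of-envelope sketch of what that means per interaction:

```python
# Launch-time prices in USD per million tokens (verify against current pricing).
PRICES = {
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
    "gpt-4o": {"input": 5.00, "output": 15.00},
}

def interaction_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a single user interaction in USD."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A hypothetical support exchange: ~500 tokens of prompt and context, ~300 of reply.
for model in PRICES:
    print(f"{model}: ${interaction_cost(model, 500, 300):.4f} per interaction")
# gpt-4-turbo: $0.0140 per interaction
# gpt-4o:      $0.0070 per interaction – exactly half
```

Multiply that difference across thousands of daily conversations and the savings add up fast.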
And these savings aren’t just for developers using the API. GPT-4o is available to free ChatGPT users alongside GPT-3.5, meaning anyone can use OpenAI’s flagship model at no cost.
Better tokenization
If you interact with the model in a language that doesn’t use the Roman alphabet, GPT-4o decreases your API costs even further.
The new model provides a significant leap in tokenization efficiency, largely concentrated in certain non-English languages. Its new tokenizer requires fewer tokens to process input text, and the gains are biggest for languages written in non-Latin scripts – from logographic systems like Chinese to Indic scripts like Devanagari.
The estimated token reductions include the following (the sketch after this list shows a quick way to check them yourself):
- Indian languages, like Hindi, Tamil, or Gujarati, have a 2.9 – 4.4x reduction in tokens
- Arabic has a ~2x reduction in tokens
- East Asian languages, like Chinese, Japanese, and Vietnamese, have a 1.4 – 1.7x reduction in tokens
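You can verify these numbers yourself with OpenAI’s open-source tiktoken library: GPT-4o uses the new o200k_base encoding, while GPT-4 and GPT-4 Turbo used cl100k_base. A quick sketch (the Hindi sentence is an arbitrary example; requires a tiktoken version recent enough to include o200k_base):

```python
import tiktoken  # pip install tiktoken

old_enc = tiktoken.get_encoding("cl100k_base")  # GPT-4 / GPT-4 Turbo
new_enc = tiktoken.get_encoding("o200k_base")   # GPT-4o

text = "नमस्ते, आप कैसे हैं?"  # "Hello, how are you?" in Hindi

old_tokens = len(old_enc.encode(text))
new_tokens = len(new_enc.encode(text))
print(f"cl100k_base: {old_tokens} tokens | o200k_base: {new_tokens} tokens")
print(f"reduction: {old_tokens / new_tokens:.1f}x")
```

Since API pricing is per token, every x-fold reduction in tokens is an x-fold reduction in cost for that text – on top of GPT-4o’s already-halved per-token price.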
Closing the AI digital divide
The digital era has brought with it an extension of the age-old, well-documented wealth gap – the digital divide. Just as access to wealth and strong infrastructure is exclusive to certain populations, so is access to AI and the opportunities and benefits that accompany it.
Robert Opp, the Chief Digital Officer at the United Nations Development Programme (UNDP), has explained that the presence of AI platforms can make or break an entire country’s development metrics.
By halving the cost of GPT-4o and introducing a free tier, OpenAI is taking a crucial step towards neutralizing one of the biggest problems in AI – and directly addressing the inequality on the minds of policymakers and economists.
A positive PR move for big AI is more necessary than enthusiasts might think. As AI becomes ever more present in our day-to-day lives, advocates and skeptics alike have asked how we might use AI ‘for good’.
According to AI PhD and educator Louis Bouchard, distributing wider access to AI is how we do exactly that: “Making AI accessible is one way, if not the best, to use AI ‘for good.’” His reasoning? If we’re unable to fully control the positive and negative impacts of AI technology – at least in its early days – we can instead ensure equal access to its potential benefits.
Expanded Multimodal Potential
The most popular way to interact with a business’s chatbot is via text, but the enhanced multimodal capabilities of OpenAI’s new model suggest that this might change going forward.
In the coming year, we’ll likely see a tide of developers rolling out new applications that make the most of the newly accessible audio, vision, and video capabilities.
For example, GPT-powered chatbots could have the ability to:
- Ask customers for an image of the item they’re returning to identify the product and ensure it isn’t damaged (see the sketch after this list)
- Provide audio translation in real time conversation that accounts for region-specific dialects
- Tell whether your steak is cooked from an image of it in the pan
- Function as a no-cost personal tour guide, providing historical context based on an image of an old cathedral, giving translation in real time, and giving a customized voice tour that allows for back-and-forth communication and questions
- Power a language learning application that listens to audio input, can provide feedback on pronunciation based on a video of your mouth movements, or teach sign language through images and video
- Provide non-urgent mental wellness support by combining its ability to interpret audio and videos, allowing for low-cost talk therapy
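The first item on that list, for instance, maps directly onto the API’s image input. Here’s a minimal sketch using the OpenAI Python SDK – the image URL and prompt are hypothetical, and it assumes an OPENAI_API_KEY environment variable is set:

```python
from openai import OpenAI

client = OpenAI()

# Send the customer's photo alongside a text instruction in a single message.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Identify this product and note any visible damage."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/return-photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

The same message structure extends to the other use cases – swap the image for audio or video inputs as those modalities become available through the API.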
With AI models that can interpret images and audio, our understanding of how LLMs can serve us is rapidly expanding.
Multimodality means accessibility
We’ve already seen the enhanced multimodal features put to use for social good. A perfect example is OpenAI’s partnership with Be My Eyes.
Be My Eyes is a Danish start-up that connects vision-impaired users with sighted volunteers. When a user needs assistance – like picking the right canned goods at the supermarket or identifying the color of a t-shirt – the app connects them with a sighted volunteer somewhere in the world over smartphone video.
OpenAI’s new vision capability can provide an even more helpful experience for Be My Eyes users. Instead of relying on a human volunteer to decipher an image or video in real time, blind users can point their device at a scene and receive spoken information from the model.
OpenAI and Be My Eyes, now trusted partners, are paving the way to more independence for legally blind individuals around the world, and Be My Eyes CEO Michael Buckley has spoken about its impact.
The new service will roll out for the first time in the summer of 2024. Early-access users have been beta testing the new vision, video, and audio features to rave reviews. While the impacts of AI can cause concern for skeptics, this partnership is a clear sign of the good it can bring. Understanding the social good that comes with advanced AI is a crucial step for its public image.
How will we judge future LLMs?
As competitors continue in a race to the bottom – to create the cheapest, fastest LLM – the question arises: how will we judge the AI models of tomorrow?
At some point in the future, the major LLM creators (likely OpenAI and Google) will plateau in how fast their models can run and how cheaply they can provide access. Once we reach stability on cost and speed, how will we crown the market-leading model?
What will become the new sign of the times? Whether it’s the available personalities of your AI model, its video capabilities, the features available to free users, or brand-new metrics beyond our current understanding, the next generation of LLMs is at our doorstep.
AI Chatbots Made Easy
What if your AI chatbot automatically synchronized with every GPT update?
Botpress has offered customizable AI chatbot solutions since 2017, giving developers the tools they need to easily build chatbots with the power of the latest LLMs. Botpress chatbots can be trained on custom knowledge sources – like your website or product catalog – and seamlessly integrate with business systems.
The only platform that ranges from no-code setup to endless customizability and extensibility, Botpress lets you automatically harness the power of the latest GPT version in your chatbot – no effort required.
Start building today. It’s free.