Twice the speed and half the price – what does GPT-4o mean for AI chatbots?
Following their mysterious announcement, OpenAI launched the latest version of their flagship model: GPT-4o.
The latest model didn’t just receive a flashy glow-up in multimodal capabilities. It’s faster and cheaper than GPT-4 Turbo. While mainstream media coverage is enamored with the new flagship model’s video and voice capabilities, the new cost and speed are just as impactful for those using GPT to power their apps.
“The availability of 4o has the power to significantly improve both the builder and the user experience,” said Patrick Hamelin, a software engineer lead at Botpress. “The impact is further-reaching than we think.”
So let’s dive into how the new model will shake up AI chatbots.
Model Capabilities
The new flagship model comes with an exciting list of updates and new features: enhanced voice and video capabilities, real-time translation, and more natural language abilities. It can analyze images, understand a wider variety of audio inputs, summarize content, and create charts. Users can upload files and hold voice-to-voice conversations. It even comes with a desktop app.
In their series of launch videos, OpenAI employees (and associates like Sal Khan of Khan Academy) demonstrate the latest version of GPT prepping a user for a job interview, singing, identifying human emotions through facial expressions, solving written math equations, and even interacting with another GPT-4o.
The launch illustrated a new reality in which an AI model can analyze the writing in your kid’s notebook and respond to it. It could explain the concept of adding fractions for the first time, changing tone and tactics based on your child’s understanding – it could cross the line from chatbot to personal tutor.
What does GPT-4o mean for LLM Chatbots?
AI chatbots that run on LLMs are gifted an update every time companies like OpenAI update their models. If a chatbot is connected to a bot-building platform like Botpress, it receives all the benefits of the latest GPT model automatically.
With the release of GPT-4o, AI chatbots can now opt to run on the advanced model, changing their capabilities, price, and speed. The new model has 5x higher rate limits than GPT-4 Turbo, with the ability to process up to 10 million tokens per minute.
For bots using audio integrations like Twilio on Botpress, a new world of voice-powered interaction has emerged. Instead of being confined to the audio processing of yesteryear, chatbots are a step closer to mimicking human interaction.
Perhaps most important is the lower cost for paid users. Running a similarly capable chatbot for half the cost can drastically increase access and affordability worldwide. And Botpress users pay no markup on AI spend for their bots – so these savings go directly to builders.
And on the user side of the equation, GPT-4o means a far better user experience. No one likes waiting, and shorter response times mean higher satisfaction for AI chatbot users.
Users Love Speed
A key tenet of chatbot adoption is improving user experience. And what improves user experience more than cutting down on wait times?
“It’ll be a better experience for sure,” said Hamelin. “The last thing you want to do is wait on someone.”
Humans hate waiting. Even back in 2003, a study found that people were only willing to wait approximately 2 seconds for a web page to load. Our patience certainly hasn’t increased since then.
There are plenty of UX tips out there to cut down on perceived waiting time. Often we can’t improve the speed of events themselves, so we focus on making users feel like time is passing faster. Visual feedback, like a loading bar, exists to shorten perceived wait time.
In a famous story of elevator wait times, an old New York building was fielding a barrage of complaints. Residents had to wait 1-2 minutes for the elevator to arrive. The building wasn’t able to upgrade the elevator to a newer model and residents were threatening to break their leases.
A new hire, trained in psychology, figured out that the real problem wasn’t the two minutes of lost time – it was boredom. He suggested installing mirrors so residents could look at themselves or others while waiting. Complaints about the elevator ceased, and now, it’s commonplace to see mirrors in elevator lobbies.
Instead of taking shortcuts to enhance the user experience – like visual feedback – OpenAI has improved on the experience at its source. Speed is central to user experience, and there’s no trick that matches the satisfaction of an efficient interaction.
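For chatbot builders, the most direct way to pass GPT-4o’s speed through to users is to stream the model’s reply token by token, so text appears the moment generation starts. Here’s a minimal sketch using the OpenAI Python SDK – the prompt is illustrative, and it assumes an OPENAI_API_KEY environment variable is set:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Request a streamed response so text can be shown as it's generated,
# rather than leaving the user staring at a blank screen.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's your return policy?"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # some chunks carry no text (e.g. the final one)
        print(delta, end="", flush=True)
print()
```

Combined with GPT-4o’s faster generation, streaming means users see a reply begin almost immediately – no mirrors required.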
Savings for Everyone
Using this new AI model to run applications suddenly got cheaper. A lot cheaper.
Running an AI chatbot at scale can get pricey. The LLM your bot is powered by determines how much you’ll pay for each user interaction (at least at Botpress, where we match AI spend 1:1 with LLM costs).
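To put rough numbers on it: at launch, OpenAI priced GPT-4o at $5 per million input tokens and $15 per million output tokens, versus $10 and $30 for GPT-4 Turbo (check current pricing before relying on these figures). A quick back-of-envelope sketch of what that means per interaction:

```python
# Launch-time prices in USD per million tokens (verify against current pricing).
PRICES = {
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
    "gpt-4o": {"input": 5.00, "output": 15.00},
}

def interaction_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a single user interaction in USD."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A hypothetical support exchange: ~500 tokens of prompt and context, ~300 of reply.
for model in PRICES:
    print(f"{model}: ${interaction_cost(model, 500, 300):.4f} per interaction")
# gpt-4-turbo: $0.0140 per interaction
# gpt-4o:      $0.0070 per interaction – exactly half
```

Multiply that difference across thousands of daily conversations and the savings add up fast.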
And these savings aren’t just for developers using the API. GPT-4o is available to free ChatGPT users alongside GPT-3.5, meaning anyone can use OpenAI’s flagship model at no cost.
Better tokenization
If you interact with the model in a language that doesn’t use the Roman alphabet, GPT-4o decreases your API costs even further.
The new model provides a significant leap in tokenization efficiency, largely concentrated in certain non-English languages. Its new tokenizer requires fewer tokens to process input text, and the gains are biggest for languages written in non-Latin scripts – from logographic systems like Chinese to Indic scripts like Devanagari.
The estimated token reductions include the following (the sketch after this list shows a quick way to check them yourself):
- Indian languages, like Hindi, Tamil, or Gujarati, have a 2.9 – 4.4x reduction in tokens
- Arabic has a ~2x reduction in tokens
- East Asian languages, like Chinese, Japanese, and Vietnamese, have a 1.4 – 1.7x reduction in tokens
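You can verify these numbers yourself with OpenAI’s open-source tiktoken library: GPT-4o uses the new o200k_base encoding, while GPT-4 and GPT-4 Turbo used cl100k_base. A quick sketch (the Hindi sentence is an arbitrary example; requires a tiktoken version recent enough to include o200k_base):

```python
import tiktoken  # pip install tiktoken

old_enc = tiktoken.get_encoding("cl100k_base")  # GPT-4 / GPT-4 Turbo
new_enc = tiktoken.get_encoding("o200k_base")   # GPT-4o

text = "नमस्ते, आप कैसे हैं?"  # "Hello, how are you?" in Hindi

old_tokens = len(old_enc.encode(text))
new_tokens = len(new_enc.encode(text))
print(f"cl100k_base: {old_tokens} tokens | o200k_base: {new_tokens} tokens")
print(f"reduction: {old_tokens / new_tokens:.1f}x")
```

Since API pricing is per token, every x-fold reduction in tokens is an x-fold reduction in cost for that text – on top of GPT-4o’s already-halved per-token price.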
Closing the AI digital divide
The digital era has brought with it an extension of the age-old, well-documented wealth gap – the digital divide. Just as access to wealth and strong infrastructure is exclusive to certain populations, so is access to AI and the opportunities and benefits that accompany it.
Robert Opp, the Chief Digital Officer at the United Nations Development Programme (UNDP), has explained that the presence of AI platforms can make or break an entire country’s development metrics.
By halving the cost of GPT-4o and introducing a free tier, OpenAI is taking a crucial step towards neutralizing one of the biggest problems in AI – and directly addressing the inequality on the minds of policymakers and economists.
A positive PR move for big AI is more necessary than enthusiasts might think. As AI becomes ever more present in our day-to-day lives, advocates and skeptics alike have asked how we might use AI ‘for good’.
According to AI PhD and educator Louis Bouchard, distributing wider access to AI is how we do exactly that: “Making AI accessible is one way, if not the best, to use AI ‘for good.’” His reasoning? If we’re unable to fully control the positive and negative impacts of AI technology – at least in its early days – we can instead ensure equal access to its potential benefits.
Expanded Multimodal Potential
The most popular way to interact with a business’s chatbot is via text, but the enhanced multimodal capabilities of OpenAI’s new model suggest that this might change going forward.
In the coming year, we’ll likely see a tide of developers rolling out new applications that make the most of the newly accessible audio, vision, and video capabilities.
For example, GPT-powered chatbots could have the ability to:
- Ask customers for an image of the item they’re returning to identify the product and ensure it isn’t damaged (see the sketch after this list)
- Provide audio translation in real time conversation that accounts for region-specific dialects
- Tell whether your steak is cooked from an image of it in the pan
- Function as a no-cost personal tour guide, providing historical context based on an image of an old cathedral, giving translation in real time, and giving a customized voice tour that allows for back-and-forth communication and questions
- Power a language learning application that listens to audio input, can provide feedback on pronunciation based on a video of your mouth movements, or teach sign language through images and video
- Provide non-urgent mental wellness support by combining its ability to interpret audio and videos, allowing for low-cost talk therapy
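The first item on that list, for instance, maps directly onto the API’s image input. Here’s a minimal sketch using the OpenAI Python SDK – the image URL and prompt are hypothetical, and it assumes an OPENAI_API_KEY environment variable is set:

```python
from openai import OpenAI

client = OpenAI()

# Send the customer's photo alongside a text instruction in a single message.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Identify this product and note any visible damage."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/return-photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

The same message structure extends to the other use cases – swap the image for audio or video inputs as those modalities become available through the API.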
With AI models that can interpret images and audio, our understanding of how LLMs can serve us is rapidly expanding.
Multimodality means accessibility
We’ve already seen the enhanced multimodal features put to use for social good. A perfect example is OpenAI’s partnership with Be My Eyes.
Be My Eyes is a Danish start-up that connects vision-impaired users with sighted volunteers. When a user needs assistance – like picking the right canned goods at the supermarket or identifying the color of a t-shirt – the app connects them with a sighted volunteer somewhere in the world over smartphone video.
OpenAI’s new vision capability can provide an even more helpful experience for Be My Eyes users. Instead of relying on a human volunteer to decipher an image or video in real time, blind users can point their device at a scene and receive spoken information from the model.
OpenAI and Be My Eyes, now trusted partners, are paving the way to more independence for legally blind individuals around the world, and Be My Eyes CEO Michael Buckley has spoken about its impact.
The new service will roll out for the first time in the summer of 2024. Early-access users have been beta testing the new vision, video, and audio features to rave reviews. While the impacts of AI can cause concern for skeptics, this partnership is a clear sign of the good it can bring. Understanding the social good that comes with advanced AI is a crucial step for its public image.
How will we judge future LLMs?
As competitors continue in a race to the bottom – to create the cheapest, fastest LLM – the question arises: how will we judge the AI models of tomorrow?
At some point in the future, the major LLM creators (likely OpenAI and Google) will plateau in how fast their models can run and how cheaply they can provide access. Once we reach stability on cost and speed, how will we crown the market-leading model?
What will become the new sign of the times? Whether it’s the available personalities of your AI model, its video capabilities, the features available to free users, or brand-new metrics beyond our current understanding, the next generation of LLMs is at our doorstep.
AI Chatbots Made Easy
What if your AI chatbot automatically synchronized with every GPT update?
Botpress has offered customizable AI chatbot solutions since 2017, giving developers the tools they need to easily build chatbots with the power of the latest LLMs. Botpress chatbots can be trained on custom knowledge sources – like your website or product catalog – and seamlessly integrate with business systems.
The only platform that ranges from no-code setup to endless customizability and extensibility, Botpress lets you automatically harness the power of the latest GPT version in your chatbot – no effort required.
Start building today. It’s free.