Even if you use it on a daily basis, you might have questions about how ChatGPT works.
Let’s dive into the behind-the-scenes of the world’s most popular AI chatbot.
ChatGPT 101
If you’ve only got 20 seconds to spare, here’s how ChatGPT works:
- You send a request, e.g. ‘Please write an email.’
- ChatGPT breaks down the input into tokens for processing.
- It uses NLP to analyze the input and understand context.
- It predicts the next word using patterns it learned from training data.
- It focuses on the most relevant parts of your input (using the attention mechanism).
- ChatGPT generates the full response, word by word, and sends it back to you.
These are the basic steps of how ChatGPT receives and responds to queries.
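Here’s a minimal, runnable sketch of that loop in Python. The ‘model’ is a made-up lookup table of next-word probabilities rather than a real neural network, but the generation loop has the same shape: predict one token at a time until the reply is done.

```python
# A toy illustration of the request-to-response loop - not OpenAI's actual code.
# The "model" here is just a hypothetical lookup table of next-word probabilities.

TOY_MODEL = {
    "please": {"write": 0.9, "sing": 0.1},
    "write": {"an": 0.8, "the": 0.2},
    "an": {"email": 0.7, "essay": 0.3},
    "email": {"<end>": 1.0},
}

def respond(prompt: str, max_tokens: int = 20) -> str:
    tokens = prompt.lower().split()          # 1. break the input into tokens
    output = []
    current = tokens[-1]
    for _ in range(max_tokens):
        candidates = TOY_MODEL.get(current, {"<end>": 1.0})
        next_token = max(candidates, key=candidates.get)  # 2-4. pick the most likely next token
        if next_token == "<end>":
            break
        output.append(next_token)            # 5. build the reply token by token
        current = next_token
    return " ".join(output)

print(respond("Please"))   # -> "write an email"
```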
Natural Language Processing
Part of what makes ChatGPT seem like magic is that it uses natural language processing. It can chat back and forth with us because it can process and then understand natural human language.
What is natural language processing?
Natural language processing (NLP) is a branch of artificial intelligence that focuses on the interaction between computers and humans through natural language.
It enables machines to understand, interpret, and generate human language in a way that is both meaningful and useful.
NLP vs NLU vs NLG
NLP is a broad field that encompasses various sub-disciplines, including natural language understanding (NLU) and natural language generation (NLG).
NLP is the overarching domain, while NLU and NLG are specialized areas within it. A back-and-forth conversation needs both: the system first has to understand your message (NLU), then generate a reply (NLG).
How does NLP work?
NLU – the understanding half of NLP – breaks down human language to interpret its meaning and intent. Here’s how it works, step by step (a toy sketch follows the list):
- The text is pre-processed to remove unnecessary elements (like punctuation and stop words).
- The system identifies key components such as entities, keywords, and phrases from the text.
- It analyzes sentence structure to understand relationships between words and concepts.
- The NLU model maps the recognized elements to specific intents or goals.
- The NLU engine refines its understanding based on context and user interaction history.
- The system provides a structured output that can trigger appropriate actions or responses.
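Here’s a toy version of that pipeline in Python. The stop words and intent keywords are made up for illustration – a real NLU engine learns these patterns from data rather than hard-coding them:

```python
import re

STOP_WORDS = {"the", "a", "an", "to", "for", "please"}

# Hypothetical intent patterns - a real NLU engine learns these from data.
INTENTS = {
    "write_email": {"write", "email", "draft"},
    "book_meeting": {"book", "schedule", "meeting"},
}

def understand(text: str) -> dict:
    # 1. Pre-process: lowercase, strip punctuation and stop words
    words = re.findall(r"[a-z']+", text.lower())
    keywords = [w for w in words if w not in STOP_WORDS]

    # 2-3. Identify key components and score each intent by keyword overlap
    scores = {intent: len(set(keywords) & vocab) for intent, vocab in INTENTS.items()}

    # 4. Map the recognized elements to the most likely intent
    intent = max(scores, key=scores.get)

    # 5-6. Return a structured output a downstream system can act on
    return {"intent": intent, "keywords": keywords, "confidence": scores[intent]}

print(understand("Please write an email to the team"))
# {'intent': 'write_email', 'keywords': ['write', 'email', 'team'], 'confidence': 2}
```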
The GPT of ChatGPT
The GPT of ChatGPT stands for ‘generative pre-trained transformer’. Each of these 3 elements is key to understanding how ChatGPT works.
Generative
ChatGPT is a generative AI model – it can generate text, code, images, and sound. Other examples of generative AI are image generation tools like DALL-E or audio generators.
Pre-Trained
The ‘pre-trained’ aspect of ChatGPT is why it seems to know everything on the internet. The GPT model was trained on large swathes of data in a process called ‘unsupervised learning.’
Before GPT-style models, AI language models were typically built with supervised learning – they were given clearly labeled inputs and outputs and taught to map one to the other. This process was slow, since the labeled datasets had to be compiled by humans.
When the early GPT models were exposed to the large datasets they were trained on, they absorbed language patterns and contextual meaning from a wide variety of sources.
This is why ChatGPT is a general knowledge chatbot – it was already trained on a huge dataset before being released to the public.
Users who want to further train the GPT engine – so it specializes in certain tasks, like writing reports for their unique organization – can use techniques to customize LLMs.
Transformer
Transformers are a type of neural network architecture introduced in a 2017 paper titled "Attention is All You Need" by Vaswani et al. Before transformers, models like recurrent neural networks (RNNs) and long short-term memory (LSTM) networks were commonly used for processing sequences of text.
RNNs and LSTM networks read text input sequentially, the same way a human would. The transformer architecture, by contrast, processes and evaluates every word in a sentence at the same time, allowing it to score some words as more relevant than others – even if they sit in the middle or at the end of a sentence. This is known as the self-attention mechanism.
Take the sentence: “The mouse couldn’t fit in the cage because it was too big.”
A transformer could score the word ‘mouse’ as more important than ‘cage’, and correctly identify that ‘it’ in the sentence refers to the mouse.
But a model like an RNN might interpret ‘it’ as being the cage, since it was the noun most recently processed.
The ‘transformer’ aspect allows ChatGPT to better understand context and produce more intelligent responses than its predecessors.
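For the curious, here’s a stripped-down version of scaled dot-product self-attention in NumPy. It’s a sketch of the mechanism from the 2017 paper, not ChatGPT’s actual implementation, and it skips the learned query/key/value projections that real transformers use:

```python
import numpy as np

def self_attention(X: np.ndarray) -> np.ndarray:
    """Simplified scaled dot-product self-attention.

    X has shape (sequence_length, d_model). In a real transformer, queries,
    keys, and values come from learned projection matrices; here they are
    just X itself to keep the sketch short.
    """
    d_k = X.shape[-1]
    scores = X @ X.T / np.sqrt(d_k)                  # how much each token attends to every other token
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ X                               # each output is a weighted mix of all tokens

# Every token "sees" every other token at once - no left-to-right reading order -
# which is how "it" can attend strongly to "mouse" even though "cage" came later.
tokens = np.random.rand(10, 8)           # e.g. 10 tokens, 8-dimensional embeddings
print(self_attention(tokens).shape)      # (10, 8)
```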
Training Process
ChatGPT is trained through a two-step process: pre-training and fine-tuning.
Pre-training
First, the AI model is exposed to a vast amount of text data – from books, websites, and other files.
During pre-training, the model learns to predict the next word in a sentence, which helps it understand patterns in language. It essentially builds a statistical understanding of language, which enables it to generate text that sounds coherent.
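Under some simplifying assumptions, the pre-training objective boils down to cross-entropy on next-token prediction. Here’s a toy version in Python, where the probabilities are random stand-ins for what the neural network would actually output:

```python
import numpy as np

# Toy vocabulary and one "training" sentence as token IDs
vocab = ["<pad>", "the", "cat", "sat", "on", "mat"]
sentence = [1, 2, 3, 4, 1, 5]   # "the cat sat on the mat"

def next_token_loss(predicted_probs: np.ndarray, sentence: list[int]) -> float:
    """Cross-entropy of predicting each token from the ones before it.

    predicted_probs[i] is the model's probability distribution over the
    vocabulary after seeing sentence[:i+1]; here it is faked with random
    numbers instead of coming from a neural network.
    """
    loss = 0.0
    for i in range(len(sentence) - 1):
        target = sentence[i + 1]                    # the true next token
        loss -= np.log(predicted_probs[i][target])  # penalize low probability on it
    return loss / (len(sentence) - 1)

fake_probs = np.random.dirichlet(np.ones(len(vocab)), size=len(sentence))
print(next_token_loss(fake_probs, sentence))
```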
Fine-tuning
After pre-training, the model is fine-tuned on more specific datasets. For ChatGPT, this includes datasets curated for conversations.
A key part of this step involves Reinforcement Learning from Human Feedback (RLHF), where human trainers rank the model’s responses. This feedback loop helps ChatGPT improve its ability to generate appropriate, helpful, and contextually accurate responses.
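In heavily simplified terms, those human rankings become (preferred, rejected) pairs, and a reward model is trained to score the preferred reply higher. The sketch below shows only the pairwise ranking idea – real RLHF also involves a learned reward model and reinforcement learning algorithms such as PPO:

```python
import math

# Heavily simplified: human trainers rank two candidate replies, and those
# rankings become (preferred, rejected) pairs used to train a reward model.
preference_data = [
    {
        "prompt": "Explain photosynthesis to a 10-year-old.",
        "preferred": "Plants use sunlight to turn water and air into food...",
        "rejected": "Photosynthesis is the conversion of photons via chlorophyll...",
    },
]

def reward_model_loss(score_preferred: float, score_rejected: float) -> float:
    """Pairwise ranking loss: the reward model should score the preferred
    response higher than the rejected one (a Bradley-Terry style objective)."""
    return -math.log(1 / (1 + math.exp(-(score_preferred - score_rejected))))

print(reward_model_loss(2.0, 0.5))   # small loss: the model already prefers the better reply
```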
Key Terms
Tokens
The units of text (words or parts of words) that the model processes. ChatGPT’s inputs and outputs are tokenized for efficient computation.
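You can see tokenization in action with OpenAI’s open-source tiktoken library (the encoding name below is the one used by GPT-3.5/GPT-4-era models at the time of writing):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # encoding used by GPT-3.5/GPT-4-era models
tokens = enc.encode("ChatGPT breaks text into tokens.")
print(tokens)                                # a list of integer token IDs
print([enc.decode([t]) for t in tokens])     # the text fragment behind each ID
```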
Zero-shot learning
The ability of the model to perform tasks it has not been specifically trained for by relying on its general knowledge.
One-shot learning involves giving the model a single example, while few-shot (or n-shot) learning involves giving it several examples to learn from.
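The difference is easiest to see in the prompts themselves. The examples below are purely illustrative:

```python
# Zero-shot: no examples, just the task
zero_shot = "Classify the sentiment of this review: 'The battery dies in an hour.'"

# One-shot: a single worked example before the task
one_shot = (
    "Review: 'Great screen, fast shipping.' -> Positive\n"
    "Review: 'The battery dies in an hour.' ->"
)

# Few-shot (n-shot): several examples so the model can infer the pattern
few_shot = (
    "Review: 'Great screen, fast shipping.' -> Positive\n"
    "Review: 'Arrived broken.' -> Negative\n"
    "Review: 'Does exactly what it says.' -> Positive\n"
    "Review: 'The battery dies in an hour.' ->"
)
```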
Attention mechanism
A component of the transformer model that allows it to focus on different parts of the input text when generating responses.
Hallucination
An AI model ‘hallucinates’ when it generates incorrect or nonsensical information. Hallucinations can be mitigated with strategies like retrieval-augmented generation (RAG).
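Here’s a minimal sketch of the RAG idea. The document store and keyword search are deliberately naive – real systems use vector databases and embeddings – and `call_llm` is a hypothetical placeholder for a model API:

```python
# Toy retrieval-augmented generation (RAG). The documents and retriever are
# illustrative only, and call_llm is a hypothetical placeholder for a model API.

DOCUMENTS = [
    "Acme's refund window is 30 days from the delivery date.",
    "Acme support is available Monday to Friday, 9am-5pm EST.",
]

def retrieve(question: str, top_k: int = 1) -> list[str]:
    # Rank documents by how many words they share with the question
    words = set(question.lower().split())
    ranked = sorted(DOCUMENTS, key=lambda d: len(words & set(d.lower().split())), reverse=True)
    return ranked[:top_k]

def answer_with_rag(question: str) -> str:
    context = "\n".join(retrieve(question))
    # Grounding the model in retrieved text reduces made-up answers
    return call_llm(
        "Answer using ONLY the context below. If the answer isn't there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```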
Chain of thought reasoning
A method that helps the model think step by step, improving its ability to handle complex prompts or tasks.
Some ChatGPT models are automatically equipped with this strategy – like the latest OpenAI o1 models. But you can prompt any version to use chain-of-thought reasoning: just ask it to explain its reasoning step by step.
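In practice, that can be as simple as adding one instruction to your prompt. An illustrative example:

```python
question = "A shirt costs $20 after a 20% discount. What was the original price?"

# Plain prompt - the model may jump straight to an answer
direct_prompt = question

# Chain-of-thought prompt - ask for the intermediate steps explicitly
cot_prompt = (
    f"{question}\n"
    "Explain your reasoning step by step before giving the final answer."
)
```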
Pre-training
The initial phase where the model is trained on a massive dataset to learn language patterns before being fine-tuned for specific tasks.
Fine-tuning
The process of refining the model on a narrower dataset or task to enhance its performance in specific use cases.
Context window
The limit on the amount of input text the model can consider when generating a response.
A small context window means you can’t send a long report and ask for a summary – by the time the model reaches your request, it will have ‘forgotten’ the beginning of the document.
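A rough way to check whether a document will fit, again using tiktoken – the 8,192-token limit and the file name are assumptions for the example, since limits vary by model:

```python
import tiktoken  # pip install tiktoken

CONTEXT_WINDOW = 8_192   # assumed token limit - check your model's actual limit

enc = tiktoken.get_encoding("cl100k_base")
report = open("quarterly_report.txt").read()   # hypothetical document
tokens = enc.encode(report)

if len(tokens) > CONTEXT_WINDOW:
    # Too long: the model would "forget" the start, so split and summarize in chunks
    print(f"Report is {len(tokens)} tokens - split it before summarizing.")
else:
    print("Report fits in a single request.")
```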
How to customize ChatGPT
There are a few different ways to customize powerful LLMs, like the GPT engine that powers ChatGPT:
Custom GPTs
OpenAI allows its users to customize GPTs to their liking. You can instruct a custom GPT to help you learn the rules of a particular board game, design posters for your rock band, or teach you AI concepts.
Custom AI agents
With advances in AI technology, it’s easy (and free) to create your own LLM-powered AI agents.
From low-code drag-and-drop builders to advanced coding ecosystems, there are great AI building platforms for every use case and skill level.
Building your own LLM-powered agent means you can design a bespoke AI assistant that schedules your meetings and generates your weekly metrics reports. Or you can build a customer support AI agent that you deploy on WhatsApp. There’s no shortage of possibilities.
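For lighter-weight customization, a system prompt sent through OpenAI’s Python SDK is often enough. The sketch below assumes you have an API key configured and uses an example model name and a made-up company:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",   # example model name - swap in whichever model you use
    messages=[
        # The system prompt "customizes" the assistant's role and tone
        {"role": "system", "content": "You are a support agent for Acme Inc. "
                                      "Answer briefly and always link to the help center."},
        {"role": "user", "content": "How do I reset my password?"},
    ],
)
print(response.choices[0].message.content)
```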
Build a GPT-powered chatbot today
ChatGPT is a generalist chatbot, but you can use the powerful GPT engine from OpenAI to build your own custom AI chatbot.
Harness the power of the latest LLMs with your own custom chatbot.
Botpress is a flexible and endlessly extendable AI chatbot platform. It allows users to build any type of AI agent or chatbot for any use case.
Integrate your chatbot with any platform or channel, or choose from our pre-built integration library. Get started with tutorials from the Botpress YouTube channel or with free courses from Botpress Academy.
Start building today. It’s free.