
In this blog, I’ll walk you through how one of today’s most intriguing technologies, large language models (LLMs), actually works. Under the hood there are many sophisticated algorithms at play, but I’ll keep it simple enough that even someone with zero background in machine learning can follow. So the only thing you really need here is curiosity.
Let’s dive in.
Step 1: You type something
When you type a query like this:
Why is the sky blue?
The LLM doesn’t see your question as a simple string of words. Instead, it goes through a few key steps to process it.
Step 2: The chat template
First, your input is inserted into a chat template, which looks something like this:
<system-start>
System: You are a helpful assistant.
<system-end>
<user-start>
User: Why is the sky blue?
<user-end>
<assistant-start>
...
This template acts like the script of a play: it gives structure to the conversation using special tokens such as <user-start>, <assistant-start>, or <eos> (end of sequence). These tokens aren’t real words but simple markers that tell the model where your message begins, where its reply should start, and when everything ends. They are crucial for helping the model understand the conversation flow.
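The tags above are simplified placeholders; every model family defines its own markers. If you want to see the real template a model uses, the Hugging Face transformers library exposes it through apply_chat_template. Here is a minimal sketch (the model name is just an example; any chat model that ships a template works):

from transformers import AutoTokenizer

# Any chat-tuned model works here; TinyLlama is only used because it is small.
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Why is the sky blue?"},
]

# tokenize=False returns the raw templated string so you can inspect the markers;
# add_generation_prompt=True appends the tag where the assistant's reply should start.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)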
Step 3: Tokenization
Computers can’t actually understand words. To make sense of your question, the LLM breaks it into small pieces (often whole words, sometimes fragments of words or symbols) and turns each piece into a number; this process is called tokenization. These numbers, called tokens, come from a fixed dictionary: the tokenizer’s vocabulary. A part of that dictionary might look like this:
{
"why": 121,
"the": 12,
"sky": 4414,
...
"<system-start>": 2
"<eos>": 1
}
Each model has its own unique dictionary.
At this point, your question has been transformed into a string of numbers the computer can actually handle. For example, your input might look like:
[13903, 382, 290, 17307, 9861, 30, ...]
This step is key, because turning words into tokens is the only way the LLM’s internal “brain” can understand your message.
You can play around with this tool from OpenAI to see how LLMs tokenize input.
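If you’d rather do it in code, here is a minimal sketch using OpenAI’s tiktoken library (the exact numbers you get depend on the encoding; other model families ship their own vocabularies):

import tiktoken

# Load one of OpenAI's public encodings; other models use different dictionaries.
enc = tiktoken.get_encoding("cl100k_base")

ids = enc.encode("Why is the sky blue?")
print(ids)       # a list of token ids; the exact numbers depend on the encoding
print(len(ids))  # how many tokens your question "costs"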
Step 4: The Prediction Game
Now these tokens go into the core of the LLM. Think of it as a black box that takes tokens as input and predicts what token (word) should come next, based on the patterns it learned from a massive amount of text.
For example, if your input is:
The capital of India is
The model might predict probabilities like:
{
31: 0.8, # Delhi
1311: 0.1, # hot
771: 0.05, # very
...
}
In the simplest setting, the model then picks the token with the highest probability, in this case 31 (Delhi). (In practice, it often samples from these probabilities instead, which is part of why you can get slightly different answers to the same question.)
As you might have noticed, the model’s output is still just numbers and probabilities. These numbers need to be turned back into words for us to be able to read them; we’ll explore that process in a later step.
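To make this concrete, here is a minimal sketch of the prediction step using Hugging Face transformers (GPT-2 is used only because it is small; the probabilities and token ids will differ from the toy numbers above):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# GPT-2 is small enough to run locally; any causal language model works the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of India is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # one score per vocabulary entry, per position

# Turn the scores at the last position into probabilities and show the top guesses.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, token_id in zip(top.values, top.indices):
    print(repr(tokenizer.decode(int(token_id))), round(float(p), 3))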
Step 5: One word at a time
The LLM can only predict one token, roughly one word, at a time. So how does it generate a whole sentence or paragraph?
It’s like a game of building blocks.
Once it predicts the first token, we feed the original input plus that new token back into the model to predict the next one in the sequence.
So, for example, if the input is
[21, 411, 7811]
and the model predicts 31 as the output, we append that token to the sequence and feed it to the model again. So the next input to the model becomes:
[21, 411, 7811, 31]
This process repeats until we have the final answer.
That’s why LLMs are so resource-intensive and why usage is often billed by tokens.
But, how does the model know when to stop?
Remember those special tags from the second step? This is where another one comes in: LLMs are trained to know when their output should end.
The prediction game repeats until the model outputs a special token that may look something like <eos> (end of sequence).
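Put together, the whole generation loop is just a few lines. Here is a toy sketch; predict_next_token is a hypothetical stand-in for the real model, and eos_token_id for the real end-of-sequence token:

# A toy version of the generation loop. predict_next_token stands in for the
# real model: it takes all tokens so far and returns the next token id.
def generate(input_ids, predict_next_token, eos_token_id, max_new_tokens=50):
    output_ids = list(input_ids)
    for _ in range(max_new_tokens):
        next_id = predict_next_token(output_ids)  # the model sees everything so far
        output_ids.append(next_id)                # append the prediction...
        if next_id == eos_token_id:               # ...and stop at the end-of-sequence token
            break
    return output_ids                             # input tokens + generated tokens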
Step 6: Mapping back to words
Now, what do we do with all those numbers the model spits out?
We simply flip the process and map these tokens back to human-readable text using the same dictionary we used in step 3, just in reverse.
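In code, decoding is just the reverse lookup. Continuing the tiktoken sketch from step 3 (the token ids themselves depend on the encoding):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

ids = enc.encode("Why is the sky blue?")  # words -> token ids
text = enc.decode(ids)                    # token ids -> words (the reverse mapping)
print(text)                               # prints: Why is the sky blue?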
And voilà! The numbers are transformed back into words, appearing as a complete answer on your screen.