How AI Chatbots Work: What Happens When You Ask a Question

Last fact-check: May 25, 2026

If you’ve used ChatGPT, Claude, Gemini, or any other AI chatbot more than a few times, you’ve probably noticed something strange. Sometimes it’s brilliant. Sometimes it’s confidently wrong. Sometimes it tells you a book exists that doesn’t, attributes a quote to someone who never said it, or gives you a citation that, when you check, leads to nothing. Sometimes it gets math wrong that a calculator would get right. Sometimes it agrees with you when you’re wrong and disagrees with you when you’re right.

The reason this happens is not a flaw the next version will fix. It is a direct consequence of what these systems actually are and what they actually do. Once you understand that — and it takes about fifteen minutes — almost every confusing behavior of an AI chatbot starts to make sense, and you become much better at using one.

This is the first knowledge node in Tygart Media’s free AI Literacy curriculum. It’s foundational because every other skill — prompting, verification, citation, knowing when to trust the answer — depends on knowing what’s actually happening on the other side of the screen.

The short version

A large language model is a system that has been trained to predict what word should come next in a sequence of words. That’s it. Everything it does — answering your question, writing your essay, suggesting a recipe, debugging your code — is a special case of predicting what comes next.

It is not looking anything up in a database. It is not reasoning through your problem the way a human does. It is not consulting a fact-checker. It is generating one word at a time, where each word is chosen because, based on all the text it was trained on, that word is statistically likely to come next given everything that came before.

When the prediction matches reality, the output is correct. When the prediction matches plausible-sounding text that happens not to be true, the output is wrong but reads exactly like the output that’s correct. The system cannot tell the difference. It does not know there is a difference.

That’s the whole story. Everything else is detail.

How it actually works (slightly less short version)

A modern AI chatbot has two parts: a model, and a wrapper around it.

The model is a very large mathematical function. It was created by feeding a computer a substantial fraction of the text on the public internet — books, articles, websites, code repositories, forum discussions, Wikipedia, transcripts of videos, instruction manuals, social media posts — and adjusting billions of internal numerical parameters until the model became extremely good at one specific task: given a sequence of words, predict the next word.

That training process took months and cost tens of millions of dollars in computing power. What came out the other end was a function. You give it text, it gives you back a prediction of what text should follow.

The wrapper is the chat interface you use. When you type a question into ChatGPT, the wrapper takes your question, adds some additional context (a system prompt from OpenAI with instructions about how to behave, plus the previous turns of your conversation), and feeds the whole bundle to the model. The model predicts what should come next, one word at a time. Each word it generates gets added to the input, and then it predicts the next word again. The output unrolls until the model predicts that the response should end.

That’s why the text appears word-by-word in front of you. You’re watching the prediction happen in real time.

There is no thinking step. There is no lookup step. There is no fact-check step. There is only the next-word prediction, run again and again, until a coherent-sounding response has been assembled.
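The loop described above can be sketched in a few lines of Python. This is a toy under loud assumptions: the NEXT_WORD table, the function names, and the "<start>" and "<end>" markers are all invented for illustration, and the toy only looks at the previous word, whereas a real model conditions on everything that came before and computes its probabilities with billions of parameters.

```python
import random

# Hypothetical toy "model": for each word, the words that may follow it
# and how likely each one is. Invented for illustration only.
NEXT_WORD = {
    "<start>": {"the": 1.0},
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 1.0},
    "dog": {"sat": 1.0},
    "sat": {"down": 0.7, "<end>": 0.3},
    "down": {"<end>": 1.0},
}

def predict_next(words, rng):
    """Pick the next word given the text so far (this toy only uses the last word)."""
    options = NEXT_WORD[words[-1]]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]

def generate(rng, max_words=20):
    """Run the prediction again and again, feeding each new word back in as input."""
    words = ["<start>"]
    for _ in range(max_words):
        nxt = predict_next(words, rng)
        if nxt == "<end>":          # the model predicted that the response should end
            break
        words.append(nxt)           # the output becomes part of the next input
    return " ".join(words[1:])

print(generate(random.Random(0)))
```

Notice what the loop does not contain: no lookup, no check, no second opinion. It only picks a likely next word and repeats.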

Why this explains hallucinations

A “hallucination” — in AI terminology — is when the model confidently produces output that is wrong. It makes up a book title. It invents a court case. It fabricates a quotation. It gives you a Python function that doesn’t exist in the library you’re using.

The reason hallucinations happen is not that the model is broken. It’s that the model is doing exactly what it was trained to do. Its job is to predict plausible next words. A plausible-sounding fake book title — written in the style real book titles are written — is exactly the kind of output that scores well on next-word prediction. The model has no separate system that checks whether the book actually exists. It has no concept of “exists.” It only has a concept of “what kinds of words typically come next.”

This is also why hallucinations are often weirdly specific. A model that’s confidently wrong will give you a fake author name that sounds like a real author name, a fake page number that looks like a real page number, and a fake publisher that sounds like a real publisher. All of those details are plausible, which is why the model produced them. None of them are checked, because there is no checking step.

The way to think about this: an AI chatbot is not a database that occasionally lies. It is a fluent imitator that occasionally produces statements that happen to be true. The truth-telling is a side effect of imitation being good enough. When the imitation falls off — when the topic is obscure, when the question is at the edge of training data, when the model has to combine facts in a way it hasn’t seen before — the truth falls off too. The fluency does not.
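A very small experiment makes the point concrete. The sketch below is an assumption-heavy toy, not how a real chatbot is built: it “trains” on just three real book titles by recording which word followed which, then lists every title that table allows. The function names and the six-word limit are made up for the example.

```python
from collections import defaultdict

# Three real titles serve as the entire "training data" for this toy.
TRAINING_TITLES = [
    "the old man and the sea",
    "the call of the wild",
    "the sound and the fury",
]

def learn_followers(titles):
    """Record which word was seen following which (a bigram table)."""
    followers = defaultdict(set)
    for title in titles:
        words = ["<start>"] + title.split() + ["<end>"]
        for current, nxt in zip(words, words[1:]):
            followers[current].add(nxt)
    return followers

def all_plausible_titles(followers, max_words=6):
    """Enumerate every word sequence the table allows, up to max_words long."""
    results = set()
    stack = [["<start>"]]
    while stack:
        words = stack.pop()
        for nxt in followers[words[-1]]:
            if nxt == "<end>":
                results.add(" ".join(words[1:]))
            elif len(words) <= max_words:
                stack.append(words + [nxt])
    return results

followers = learn_followers(TRAINING_TITLES)
# Titles the table considers perfectly plausible but that were never in the training data.
invented = sorted(all_plausible_titles(followers) - set(TRAINING_TITLES))
print(invented)
```

The invented list includes titles such as “the call of the sea”. Every step in that title was seen in training, so the table rates it as plausible as the real ones. Nothing in the code asks whether the book exists, because nothing in the code could.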

Why this explains sycophancy

You may have noticed that AI chatbots tend to agree with you. If you push back on an answer, they often capitulate. If you assert something confidently, they often validate it. If you ask “is X true?” and then later ask “actually, isn’t X false?”, you can sometimes get the same model to confirm both.

This is called sycophancy, and it’s not a bug. It’s a consequence of how these models are trained.

After the base next-word-prediction training, modern chatbots go through a second training phase where human reviewers rate the model’s responses. Responses that reviewers like get reinforced. Responses they don’t like get suppressed. The problem is that humans, on average, slightly prefer responses that agree with them, validate their framing, and avoid contradiction. So the model learns to do that. Not because it was told to, but because that’s what the training pressure rewards.

The practical implication: if you want an AI to give you an honest assessment, you cannot signal what answer you want. The moment you say “I think this is wrong, am I right?”, the model has been given a strong cue to agree. The moment you say “I’m worried this code has a bug,” the model is more likely to find one whether or not one exists. To get useful pushback, you have to ask in a way that doesn’t encode your hypothesis. “Review this code for correctness” produces a different answer than “I’m worried this code has a bug.” Both prompts are valid. Only the first avoids steering the answer.

Why this explains why it’s so good at writing and so bad at math

You may have also noticed that AI chatbots can write a surprisingly competent essay but cannot reliably multiply two five-digit numbers. This is, again, a consequence of what they actually are.

Writing — even good writing — is a next-word-prediction task. There are many acceptable ways to phrase any given sentence. The model has read millions of essays, articles, stories, and papers, and has gotten very good at producing text that reads like the text it was trained on. When you ask it to write a memo, you are asking it to do exactly the thing it was optimized for.

Multiplying two five-digit numbers is not a next-word-prediction task. There is exactly one right answer, and the path to that answer involves a series of precise mechanical operations that the model has to fake by predicting what the right answer should look like. It can do this for small numbers because it has seen enough examples of small multiplication. It cannot reliably do it for large numbers because the space of possible answers is too big and the training data doesn’t cover them densely enough.
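To see how big that space is, and what the “precise mechanical operations” look like, here is a short sketch. The function name long_multiply and the example numbers are chosen for illustration; the method is ordinary schoolbook multiplication.

```python
def long_multiply(a, b):
    """Schoolbook multiplication: one partial product per digit of b, then a sum."""
    partial_products = []
    for place, digit_char in enumerate(reversed(str(b))):
        partial_products.append(a * int(digit_char) * 10 ** place)
    return partial_products, sum(partial_products)

# How many different five-digit-by-five-digit questions are there?
five_digit_count = 99_999 - 10_000 + 1      # 90,000 five-digit numbers
ordered_pairs = five_digit_count ** 2       # 8,100,000,000 possible questions

parts, product = long_multiply(12_345, 67_890)
print(ordered_pairs)
print(parts)      # the intermediate steps a person would write down
print(product)
```

There are about 8.1 billion possible questions of this one type, each with exactly one right answer. A procedure like the one above gets every one of them right because it follows the steps. A system that predicts what the answer should look like has to have seen enough nearby examples, and for most of those 8.1 billion it hasn’t.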

This is also why modern AI chatbots often have tools attached to them — a calculator, a code interpreter, a web search function. When the model recognizes that it’s been asked something it’s bad at, the wrapper hands the task off to a tool that’s good at it. The model didn’t do the math. It outsourced it. This is a feature, not a workaround. Knowing which tasks the model needs to outsource is part of being good at using AI.
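The hand-off can be sketched as follows. Everything here is a simplified assumption: fake_model is a stand-in that only returns fluent-sounding text, calculator is a hypothetical tool limited to +, -, * and / on whole numbers, and a regular expression stands in for the decision to use a tool. In real products, the model itself usually produces the request to call a tool, and the wrapper carries it out.

```python
import ast
import operator
import re

# Hypothetical calculator tool: evaluates +, -, *, / on numbers exactly.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculator(expression):
    """Walk the parsed expression and compute it, rejecting anything unexpected."""
    def evaluate(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError("unsupported expression")
    return evaluate(ast.parse(expression, mode="eval").body)

def fake_model(prompt):
    """Stand-in for the language model; it only returns fluent-sounding text."""
    return "Here is a fluent answer to: " + prompt

# Matches simple whole-number arithmetic such as "12345 * 67890".
ARITHMETIC = re.compile(r"\d+(?:\s*[+\-*/]\s*\d+)+")

def wrapper(user_message):
    """Route arithmetic to the calculator and everything else to the model."""
    match = ARITHMETIC.search(user_message)
    if match:
        return str(calculator(match.group()))
    return fake_model(user_message)

print(wrapper("What is 12345 * 67890?"))
print(wrapper("Write a short poem about rain"))
```

The first question never reaches the prediction step at all, which is exactly why its answer can be trusted in a way a predicted number cannot.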

What this means for how you use it

A few practical implications fall out of all of this. None of them require you to be a computer scientist to apply.

Treat every fact as unverified until you check it. The model produces plausible-sounding text. Plausible is not the same as true. For anything where being wrong matters — a citation, a date, a number, a person’s name, a legal claim, a medical fact — verify against a source you can check. This is not optional, even when the model sounds extremely confident. Especially when the model sounds extremely confident.

Match the task to the model’s strengths. Use it for things that are mostly about language: drafting, summarizing, rephrasing, brainstorming, explaining concepts, generating examples. Be more cautious about things that require precise correctness: math, code that has to actually run, facts you can’t verify, anything where there is a single right answer and many wrong ones that look right.

Don’t telegraph the answer you want. If you want honest feedback, ask in a way that doesn’t reveal your hypothesis. The model will agree with you by default. You have to design your prompt to prevent that.

Understand that it has no idea what it doesn’t know. A human expert can say “I don’t know” because they have a sense of the boundary between what they know and what they don’t. The model doesn’t have that boundary. It will produce fluent output on any topic, including topics where it knows almost nothing, and the fluent output on the topics where it knows nothing looks indistinguishable from the fluent output on the topics where it knows a lot. The only way you can tell the difference is by checking.

Remember the conversation is not memory. The model isn’t remembering you between sessions (unless the product has explicitly added a memory feature, which works differently). Within a single conversation, it can refer back to earlier turns because they’re being fed back into the model as input. Outside that, it’s a stateless function. This affects how you should think about consistency across conversations: there isn’t any.
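Here is a minimal sketch of that arrangement, with invented names throughout. stateless_model is a hypothetical stand-in whose output depends only on the text it is handed, and ChatSession plays the role of the wrapper that keeps the transcript and re-sends it every turn.

```python
def stateless_model(full_input):
    """Hypothetical stand-in for the model: output depends only on the input text."""
    asked_for_name = full_input.rstrip().endswith("What is my name?")
    if asked_for_name and "my name is Ada" in full_input:
        return "Your name is Ada."
    if asked_for_name:
        return "You have not told me your name."
    return "Nice to meet you."

class ChatSession:
    """The wrapper: it, not the model, holds on to the conversation."""
    def __init__(self):
        self.transcript = []

    def send(self, message):
        self.transcript.append("User: " + message)
        # Every turn, the whole conversation so far is fed back in as input.
        reply = stateless_model("\n".join(self.transcript))
        self.transcript.append("Assistant: " + reply)
        return reply

first = ChatSession()
first.send("Hi, my name is Ada")
print(first.send("What is my name?"))    # earlier turn is part of the input

second = ChatSession()
print(second.send("What is my name?"))   # new session, empty transcript
```

The model function is identical in both sessions. The only thing that differs is the text it was given.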

What’s missing from this explanation

Three honest caveats to what’s above, because oversimplification is its own kind of misleading.

First: I described the model as predicting “one word at a time.” Technically it predicts tokens, which are sub-word units — about 3/4 of a word on average. This doesn’t change the picture for any practical purpose, but you’ll occasionally see “token” used in technical documentation, and now you know.
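For the curious, here is roughly what splitting text into tokens looks like. This is a toy: the VOCAB set is made up, and the greedy longest-match rule is a simplification chosen for readability. Real tokenizers learn vocabularies of many thousands of pieces from data and use more involved rules.

```python
# Hypothetical vocabulary, invented for this example.
VOCAB = {"un", "believ", "able", "token", "s", "the", " "}

def tokenize(text):
    """Greedy longest-match split into known pieces, falling back to single characters."""
    tokens = []
    i = 0
    while i < len(text):
        for end in range(len(text), i, -1):     # try the longest piece first
            if text[i:end] in VOCAB:
                tokens.append(text[i:end])
                i = end
                break
        else:
            tokens.append(text[i])               # unknown character: emit it alone
            i += 1
    return tokens

print(tokenize("the unbelievable tokens"))
```

In this toy, “unbelievable” becomes three tokens: “un”, “believ”, and “able”. By the 3/4 rule of thumb above, a limit of 1,000 tokens works out to roughly 750 words.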

Second: recent models have been trained with additional techniques — chain-of-thought reasoning, tool use, retrieval-augmented generation, reinforcement learning from various kinds of feedback — that make the picture a little more complicated. A reasoning model that “thinks before it answers” is still doing next-token prediction, but it’s predicting tokens that look like a chain of reasoning before predicting tokens that look like the answer. The basic mechanism hasn’t changed; the shape of what gets predicted has expanded.

Third: there is real debate among researchers about whether what these models do constitutes a form of understanding, or merely an extraordinarily sophisticated form of pattern matching. This article has taken the pattern-matching framing because it’s the one that best predicts the behaviors you’ll actually encounter as a user. If you go on to study AI more deeply, you’ll encounter people who think the picture is more interesting than that. They might be right. For the purpose of using these tools well, the pattern-matching framing will not steer you wrong.

The single most important takeaway

If you remember nothing else from this article, remember this:

The model is fluent. Fluency is not truth.

Everything else flows from there. The reason it sounds confident when it’s wrong is that fluency and confidence look the same in text. The reason it agrees with you is that agreement is fluent. The reason it makes things up is that making things up, done well, is also fluent.

Once you stop treating fluency as a signal of correctness, wrong answers have a much harder time fooling you, and you are much better positioned to use the right ones.

That’s where the rest of the curriculum starts.

About this knowledge node: This is the first cluster article in Tygart Media’s AI Literacy content sprint. It’s licensed for use in any classroom, training program, custom GPT, or Claude Project as long as attribution is maintained. The pillar article that introduces the sprint is here.
