Large language models (LLMs), on which ChatGPT and its growing list of competitors are based, are not only a human creation but also an object of study. Computer scientists treat them as if they were a natural phenomenon, despite knowing better than anyone what they are made of: their logical units, their basic operations, their mechanical guts. And they have a reason for this: ChatGPT's responses are not always predictable. The machine's behavior cannot be fully deduced from first principles. It is a textbook example of an emergent system, a whole that seems greater than the sum of its parts, or at least cannot be inferred from them. A paradoxical minefield.
The structure of large language models holds very little mystery. At heart they are neural networks, a now-classic kind of software inspired by biological neurons, which receive many inputs through their dendrites, combine them, and produce a single output through the axon. The great innovation that got us all talking about artificial intelligence ten years ago was not so much a groundbreaking idea as an increase in brute force. Where the primitive neural network had three layers of neurons (input, processing, and output), the new ones began stacking processing layers by the dozens. Also mimicking the brain, these layers abstract the information in progressive steps before issuing a response. This is the deep learning that has revolutionized the field. Understand "deep" in the mere sense of having many layers. It's just a name.
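To make the stacked-layers idea concrete, here is a minimal sketch in Python with NumPy. The layer sizes, the ReLU activation, and the random weights are illustrative assumptions, not the architecture of any real model; the point is only that each neuron combines many inputs into one output, and that "deep" just means the stack of processing layers is long.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(inputs, weights, biases):
    # Each artificial neuron sums its weighted inputs (its "dendrites"),
    # adds a bias, and fires a single output (its "axon") through a
    # nonlinearity, here a ReLU.
    return np.maximum(0.0, inputs @ weights + biases)

# Illustrative sizes: an input layer of 8, four stacked processing
# layers of 32 neurons each, and an output layer of 4. "Deep" just
# means the middle of this list is long.
sizes = [8, 32, 32, 32, 32, 4]

weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

x = rng.normal(size=sizes[0])      # a dummy input vector
for w, b in zip(weights, biases):  # information abstracted step by step
    x = layer(x, w, b)

print(x)  # the untrained network's (meaningless) response
```

Training would adjust the weights to make the response useful; the architecture itself is just this loop, repeated over many layers.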
The way ChatGPT works is disappointingly simple. Large language models are text-eaters that can gobble up the National Library, Wikipedia, and all the world's newspapers before breakfast. From this glut of material they extract some very powerful and refined statistics, although focused on crude questions such as which words tend to appear together, or two positions apart in either direction. The previous version of ChatGPT, GPT-3, processed 2,000 words at a time. The new one, GPT-4, processes 32,000. Once again, the advance is quantitative.
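As a toy illustration of those "crude statistics," here is a minimal word-successor counter in Python. It only tallies which word tends to follow which in a tiny made-up corpus and then picks the most frequent successor; real models learn vastly richer statistics over thousands of words of context, but the underlying question, what tends to come next, is the same.

```python
from collections import Counter, defaultdict

# A tiny made-up corpus; a real model ingests libraries' worth of text.
corpus = (
    "the cat sat on the mat . "
    "the cat sat on the rug . "
    "the dog chased the cat ."
).split()

# For every word, tally which words tend to appear right after it.
successors = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    successors[current][nxt] += 1

def most_likely_next(word):
    # The crude statistical question: given this word, what usually follows?
    return successors[word].most_common(1)[0][0]

print(most_likely_next("the"))  # -> 'cat' (its most frequent successor here)
print(most_likely_next("sat"))  # -> 'on'
```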
But it turns out that large systems start to do things that their smaller predecessors don't. Researchers at OpenAI, the creator of ChatGPT, have catalogued 137 "emergent abilities" of large language models, including their own. One example is writing gender-inclusive sentences in German. Another is passing a bar exam before even graduating: GPT-3 fails at it, but GPT-4 passes, and the difference between the two is only one of computational power. Turn up the volume on the remote and a complex system suddenly emerges.
Humans do not learn to speak by swallowing the Encyclopaedia Britannica, although Aldous Huxley boasted of having read it in full. It is true that neural networks are inspired by biological neurons, but only at a very elementary level, which is the only level at which we understand how the brain works. This does not mean that large language models are stupid, but that their intelligence is different from ours. Let's keep adding layers and we shall see.