Emergence in Large Language Models: What It Actually Means

The word "emergence" gets used loosely in ML research. It often means "this capability appeared and we don't fully know why." That's honest but not satisfying.

Let's be more precise.

What Emergence Means in Complex Systems

In the scientific sense, emergence describes properties that appear at a higher level of organization but are not present in, or predictable from, the lower-level components alone.

A classic example: wetness is a property of water at the bulk level. Individual H₂O molecules are not wet.

In LLMs: Scale Changes Capability Qualitatively

The landmark Wei et al. (2022) paper documented abilities that appeared to emerge sharply at certain model scales, including arithmetic, multi-step reasoning, and gains from chain-of-thought prompting. Below a threshold, performance sits near zero or near random chance. Above it, performance climbs well past that baseline.

This is not just a gradual improvement. The curve looks like a phase transition.
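One way to make "threshold" concrete is to ask for the smallest scale at which accuracy clears the chance baseline by some margin. The sketch below assumes you already have (scale, accuracy) pairs for a task. The function name `first_scale_above_chance`, the 0.05 margin, and the toy numbers are all hypothetical and are not taken from Wei et al.

```python
from typing import Optional, Sequence, Tuple

def first_scale_above_chance(
    curve: Sequence[Tuple[float, float]],
    chance: float,
    margin: float = 0.05,
) -> Optional[float]:
    """Return the smallest scale whose accuracy is at least chance + margin.

    `curve` holds (scale, accuracy) pairs. Returns None if no point clears
    the bar. The margin is an arbitrary illustrative choice.
    """
    for scale, accuracy in sorted(curve):
        if accuracy >= chance + margin:
            return scale
    return None

# Made-up numbers for a 4-way multiple-choice task (chance = 0.25).
toy_curve = [(1e8, 0.25), (1e9, 0.26), (1e10, 0.27), (1e11, 0.55)]
threshold = first_scale_above_chance(toy_curve, chance=0.25)
print(threshold)
```

A single threshold number hides a lot: it depends on the margin, on how densely the scales were sampled, and on the metric used to compute accuracy.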

The Measurement Problem

A significant caveat from Schaeffer et al. (2023): some apparent emergent capabilities are artifacts of nonlinear or discontinuous evaluation metrics, such as exact match. When you score the same outputs with a continuous metric, such as token edit distance, the "sudden" appearance often becomes a smooth improvement.
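A toy calculation shows how the metric alone can create a sharp-looking curve. The sketch assumes per-token accuracy improves smoothly with scale (a made-up logistic curve) and that token errors are independent, so exact match on a 10-token answer is the per-token accuracy raised to the 10th power. The curve shape, `ANSWER_LEN`, and the scales are illustrative assumptions, not measured results.

```python
import math

def per_token_accuracy(log10_params: float) -> float:
    """Hypothetical smooth curve: per-token accuracy vs. log10(parameter count)."""
    return 1.0 / (1.0 + math.exp(-(log10_params - 9.0)))

def exact_match(p_token: float, answer_len: int) -> float:
    """Chance that every token is right, assuming independent token errors."""
    return p_token ** answer_len

ANSWER_LEN = 10                  # illustrative answer length in tokens
scales = [7, 8, 9, 10, 11, 12]   # log10 of parameter count

rows = []
for s in scales:
    p = per_token_accuracy(s)
    rows.append((s, p, exact_match(p, ANSWER_LEN)))

for s, p, em in rows:
    print(f"1e{s} params  per-token={p:.3f}  exact-match={em:.4f}")
```

The per-token column rises steadily across every scale, while the exact-match column stays near zero for the smaller models and only then climbs. Same underlying model behavior, two very different-looking curves.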

This doesn't dismiss emergence — it demands more careful measurement.

Implications for Building

If you're building applications on LLMs, the practical takeaways are:

Capability floors matter — some tasks require model scale to work at all
Prompting strategy is not model-independent — chain-of-thought only helps above certain scales
Benchmarks lie — measure on your actual task distribution, not proxies
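For the last point, a small harness over your own examples is often enough to get started. The sketch below scores a model with both exact match and a continuous similarity score, so that a near miss is not indistinguishable from a total failure. `evaluate`, `stub_model`, and the example prompts are hypothetical stand-ins. In practice `model_fn` would wrap whatever model call you actually use, and the similarity measure (Python's `difflib.SequenceMatcher`) is just one convenient choice.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Iterable, Tuple

def evaluate(
    model_fn: Callable[[str], str],
    examples: Iterable[Tuple[str, str]],
) -> Dict[str, float]:
    """Score model_fn on (prompt, expected) pairs with a strict and a soft metric."""
    exact_hits = 0
    similarity_sum = 0.0
    count = 0
    for prompt, expected in examples:
        output = model_fn(prompt).strip()
        exact_hits += int(output == expected)
        # Character-level similarity in [0, 1]; gives partial credit for near misses.
        similarity_sum += SequenceMatcher(None, output, expected).ratio()
        count += 1
    if count == 0:
        raise ValueError("examples must not be empty")
    return {"exact_match": exact_hits / count, "similarity": similarity_sum / count}

# Hypothetical stand-in for a real model call.
def stub_model(prompt: str) -> str:
    canned = {"2+2=": "4", "17+25=": "42", "123+456=": "578"}
    return canned.get(prompt, "")

examples = [("2+2=", "4"), ("17+25=", "42"), ("123+456=", "579")]
result = evaluate(stub_model, examples)
print(result)
```

Running the same harness against two model sizes, with and without a chain-of-thought prompt, is a cheap way to check whether a prompting strategy carries over to the model you plan to deploy.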

Emergence is real and interesting. It also shouldn't become magical thinking that excuses us from understanding our systems.