The word "emergence" gets used loosely in ML research. It often means "this capability appeared and we don't fully know why." That's honest but not satisfying.
Let's be more precise.
What Emergence Means in Complex Systems
In the scientific sense, emergence describes properties that appear at a higher level of organization and that aren't present in (or predictable from) the lower-level components alone.
A classic example: wetness is a property of water at the bulk level. Individual H₂O molecules are not wet.
In LLMs: Scale Changes Capability Qualitatively
The landmark Wei et al. 2022 paper documented capabilities that appeared to emerge sharply at certain model sizes, including multi-digit arithmetic, multi-step reasoning, and the benefit of chain-of-thought prompting. Below a threshold, performance sits near zero or near chance. Above it, the capability appears.
This is not just a gradual improvement. The curve looks like a phase transition.
The Measurement Problem
A significant caveat from Schaeffer et al. 2023: some apparent emergent capabilities are artifacts of nonlinear or discontinuous evaluation metrics, such as exact match. When you score the same model outputs with a continuous metric, such as token edit distance, the "sudden" appearance becomes a smooth improvement.
This doesn't dismiss emergence — it demands more careful measurement.
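The metric effect is easy to reproduce with a toy calculation. The sketch below is an illustration only, using made-up numbers rather than data from either paper. It assumes each token of an answer is correct independently with the same probability, so the chance of an exact match on a whole answer is that probability raised to the answer length. The function name and the answer length of 10 are assumptions chosen for the example.

```python
# Toy illustration with made-up numbers (not data from Wei et al. or Schaeffer et al.).
# Assumption: every token is correct independently with the same probability.


def exact_match_rate(per_token_accuracy: float, answer_length: int) -> float:
    """Probability that all tokens of an answer are correct at once."""
    return per_token_accuracy ** answer_length


ANSWER_LENGTH = 10  # hypothetical answer length in tokens

# Per-token accuracy improves smoothly across these hypothetical model sizes.
per_token_accuracies = [0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99]

print(f"{'per-token':>10} {'exact match':>12}")
for p in per_token_accuracies:
    print(f"{p:>10.2f} {exact_match_rate(p, ANSWER_LENGTH):>12.3f}")
```

Under these assumptions, a per-token accuracy of 0.50 gives an exact-match rate of about 0.001, while 0.99 gives about 0.90. The underlying quantity climbs steadily, but the all-or-nothing metric stays near zero for most of the range and then rises steeply.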
Implications for Building
If you're building applications on LLMs, the practical takeaways are:
- Capability floors matter — some tasks require model scale to work at all
- Prompting strategy is not model-independent — chain-of-thought only helps above certain scales
- Benchmarks lie — measure on your actual task distribution, not proxies
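One way to act on the last point is to score a model on your own examples with both a strict metric and a graded one, so that a low exact-match number doesn't hide partial progress. The following is a minimal sketch, not a full evaluation harness. `evaluate`, `fake_model`, and the two example prompts are hypothetical, and the character-level similarity from Python's standard `difflib` module is a stand-in for whatever graded metric suits your task.

```python
from difflib import SequenceMatcher
from typing import Callable


def evaluate(
    generate: Callable[[str], str],
    examples: list[tuple[str, str]],
) -> dict[str, float]:
    """Score (prompt, expected) pairs with a strict and a graded metric."""
    if not examples:
        raise ValueError("examples must not be empty")
    exact = 0
    similarity = 0.0
    for prompt, expected in examples:
        output = generate(prompt).strip()
        exact += int(output == expected)  # all-or-nothing score
        # Graded score in [0, 1] based on matching character runs.
        similarity += SequenceMatcher(None, output, expected).ratio()
    n = len(examples)
    return {"exact_match": exact / n, "mean_similarity": similarity / n}


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    canned = {"2+2=": "4", "Capital of France?": "Paris, France"}
    return canned.get(prompt, "")


# Replace these with samples drawn from your actual task distribution.
examples = [("2+2=", "4"), ("Capital of France?", "Paris")]
scores = evaluate(fake_model, examples)
print(scores)
```

Here the second answer fails exact match but still earns partial credit on the graded metric, which is the kind of gap worth inspecting before concluding that a model cannot do the task.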
Emergence is real and interesting. But invoking it shouldn't become magical thinking that excuses us from understanding our systems.