The phenomenon that allows reasoning to arise in modern LLMs when they are scaled up.

  • Older, smaller LLMs were largely limited to narrow tasks such as translation
  • Adding more attention layers allows LLMs to reason