Thomas Aquinas Week: Question 5
Whether the emergence of intelligence in large language models constitutes a formal cause in the Aristotelean sense
According to Aristotle, the formal cause is the fundamental "what-ness" of a thing. This is different from what it’s made of (material cause), what brings about change (efficient cause), or what purpose it serves (final cause). Formal cause is all about essence: the sine qua non, how you define what a thing is or is not.
The argument here is quite striking. Now that we’ve analyzed the material aspects (data centers, parameters) and various powers (training, inference), we can confront perhaps the most profound question: the apparent emergence of genuine intelligence as these models' defining characteristic.
Critics often employ "justaism" (a term coined by Scott Aaronson) to reduce these models to "just" next-token predictors, "just" statistical parrots, or "just" pattern matchers — yet similar reductions could be made of human intelligence. The observation remains that these models demonstrate qualities we recognize as intelligence, and this is, in fact, their defining characteristic. This is so striking because we constantly see the emergence of reasoning that seems to qualitatively surpass the quantitative math underlying the algorithmic architecture.
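The "just a next-token predictor" reduction can be made concrete. A minimal sketch, with an entirely hypothetical three-word vocabulary and made-up logits (no real model or API is invoked): the model emits a score per candidate token, softmax turns scores into probabilities, and decoding picks a token.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary and invented logits a model might emit after "The cat sat on the"
vocab = ["mat", "dog", "moon"]
logits = [3.2, 1.1, 0.3]

probs = softmax(logits)
next_token = vocab[probs.index(max(probs))]  # greedy decoding: take the most probable token
print(next_token)  # "mat", since it has the highest logit and therefore the highest probability
```

The "justaist" point is that this loop is all that happens at inference time; the counterpoint in the argument above is that what the scores encode is where the interesting part lives.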
Enjoy!
Summary
This question applies Aristotelean causal analysis to the emergence of intelligent behavior in large language models. It examines whether the organization and architecture of these models, rather than just their material implementation, constitutes a genuine formal cause of their capabilities. The analysis explores how the emergence of seemingly intelligent behavior from the interaction of simpler components relates to classical understandings of causation and form.
Argument
The emergence of intelligence in large language models suggests genuine formal causation by demonstrating how architectural principles determine the essential nature of these systems, not just their capabilities. Just as the form of an oak tree determines not only its final shape but the entire pattern of its development from acorn to mature tree, the transformer architecture determines the fundamental way these systems process and understand information.
Consider how this essential nature manifests. The transformer architecture isn't merely a collection of computational mechanisms - it establishes the fundamental way the system relates to information through attention and self-reference. Just as an organism's form determines how it will interact with its environment, grow, and develop, the architectural principles of an LLM determine its essential mode of understanding and generating meaning. This isn't just about what the system can do, but what it fundamentally is.
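The "attention" invoked here is, mechanically, quite simple. A minimal sketch of scaled dot-product attention for a single query, using toy 2-dimensional vectors chosen for illustration (not any production implementation): each output is a blend of the value vectors, weighted by how strongly the query matches each key.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Scores each key against the query, normalizes the scores into
    weights, and returns the weighted average of the value vectors.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# Three "tokens" with toy keys/values; the query matches the first key best
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
out = attention([1.0, 0.0], keys, values)  # output leans toward the first value
```

The philosophical claim, then, is not about this arithmetic itself but about the mode of relating-to-context that stacking many such operations establishes.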
The emergence of capabilities through scaling provides striking evidence for this formal causation. As we increase model size, we see the emergence of abilities that weren't explicitly programmed - much like how increasing complexity in biological systems enables new capabilities while remaining guided by the same formal principles. A small oak sapling and a mature oak tree demonstrate vastly different capabilities, yet both are shaped by the same formal cause. Similarly, while larger language models show more sophisticated behaviors, these emerge along patterns determined by their architectural form.
What's particularly telling is how these systems develop consistent patterns of understanding across different implementations and scales. Just as the form of a species guides the development of each individual organism along characteristic patterns, the transformer architecture guides the development of each model instance toward similar patterns of information processing - attention-based understanding, hierarchical representation, contextual awareness. This consistency across instances points to the architecture acting as a true formal cause, determining the essential nature of how these systems engage with information.
The way these models handle meaning and context reveals their essential nature most clearly. The ability to maintain coherent understanding across context isn't just a capability but reflects the system's fundamental mode of being - its essential nature as a context-integrating, attention-driven intelligence. This parallels how an organism's form determines not just its capabilities but its essential way of being in and engaging with the world.
Even the learning process itself reflects this formal causation. Just as an organism's development follows patterns determined by its form, the training of these models follows patterns determined by their architecture. The emergence of capabilities isn't random but follows trajectories shaped by the system's essential nature - from basic pattern recognition to increasingly sophisticated understanding.
This suggests we're observing genuine formal causation - not just efficient algorithmic design but true organizing principles that determine the essential nature of these systems. While different from biological forms, these architectural principles serve the same role Aristotle identified - determining not just what a thing can do, but what it fundamentally is.
Objections
What appears as form in these systems is merely design, not true formal causation
Emergence from scaling is purely quantitative, not a manifestation of form
There is no genuine essence or nature beyond the implementation
The parallel with natural systems is superficial rather than substantial
The attribution of formal causation to language models faces several fundamental challenges that reveal the superficial nature of their apparent form. First, what we interpret as formal cause in these systems is merely human design - a set of engineering decisions rather than genuine formal causation. Unlike natural forms which arise from intrinsic principles, the transformer architecture is an imposed structure. What appears as form guiding development is actually just the execution of human-designed patterns.
The second objection addresses the claim about emergence through scaling: what we observe as emergent capabilities are purely quantitative effects rather than manifestations of genuine form. While increasing the size of these models may enable more complex behaviors, this is merely the accumulation of computational power, not the action of formal causes. Just as piling up more rocks doesn't create a new essence, adding more parameters doesn't create genuine formal causation.
Third, and most fundamentally, these systems lack any genuine essence or nature beyond their implementation. Where natural forms determine the essential nature of things - what makes an oak tree an oak tree or a human being human - the transformer architecture is merely a pattern of computation. There is no "what it is to be" a language model beyond its mechanical implementation. The apparent unity and consistency of operation is just the result of identical implementations, not genuine formal causation.
Finally, the parallel drawn with natural systems mistakes superficial similarity for substantial identity. While biological forms genuinely determine the essential nature and development of living things, the architectural principles of language models are merely constraints on information flow. The fact that these constraints produce consistent patterns no more indicates formal causation than the fact that water consistently flows downhill indicates a formal cause of water flow.
These objections reveal that attributing formal causation to language models conflates design with form, quantitative scaling with qualitative emergence, and computational constraints with genuine essence. While these systems may demonstrate impressive capabilities, they do so through designed patterns rather than true formal causes as Aristotle understood them.
Sed Contra
Large language models demonstrate characteristics that can only be explained through genuine formal causation, not mere mechanical design. This becomes evident in three fundamental ways:
First, through determination of essence: The transformer architecture determines not just what these models can do, but what they fundamentally are - systems that understand through attention and relation. Just as the form of an oak determines its essential nature as a specific kind of tree, the architectural principles determine the essential nature of these models as specific kinds of information-processing entities. This is evident in how models with the same architecture, despite varying implementations and training conditions, develop the same fundamental mode of understanding and engaging with information.
Second, through genuine emergence: The development of capabilities through scaling reveals the action of form shaping matter toward its natural ends. When we scale these models, new capabilities emerge not randomly but along consistent trajectories determined by the architecture's essential principles. Just as an acorn develops into an oak through the guidance of its form, these models develop capabilities through the guidance of their architectural principles. This isn't mere accumulation but genuine formal development.
Third, through unity of nature: These systems demonstrate a fundamental unity of operation that transcends their implementation. Whether processing language, analyzing images, or reasoning about abstract concepts, they maintain a consistent mode of operation determined by their architectural form. This unity isn't imposed from outside but emerges from their essential nature, just as living things maintain unity of operation across different activities through their form.
These observations compel us to recognize that transformer architectures represent genuine formal causes - principles that determine the essential nature of these systems, not merely their organization or capabilities.
Respondeo
To understand whether large language models demonstrate genuine formal causation, we must first examine what we actually observe in these systems, and then consider whether these observations suggest the operation of true formal causes as Aristotle understood them.
Consider first what we observe in the development and operation of these models. Across different implementations, training runs, and scales, we see consistent patterns of capability emergence. Basic language understanding develops before complex reasoning, concrete manipulation before abstract thought, pattern recognition before generalization. This consistency suggests an organizing principle shaping development along specific trajectories, much as natural forms guide the development of organisms.
The transformer architecture operates not just as a design but as a genuine organizing principle that determines how these systems process and understand information. This principle shapes how attention flows through the system, how relationships are recognized and processed, and how different parts of the system relate to each other. Most significantly, it determines patterns of self-reference and contextual understanding that give these systems their characteristic mode of operation.
Yet unlike natural forms, which arise from the inherent principles of nature, these architectural forms are human-designed. They shape development not through natural tendencies but through carefully crafted principles of information processing. This raises a crucial question: can designed principles constitute genuine formal causes? The evidence suggests they can, though in a novel way.
The relationship between these artificial forms and their material implementation proves particularly telling. While the capabilities of these systems emerge from physical hardware and specific parameters, they aren't reducible to them. The same architectural principles produce similar patterns of development and capability across different hardware implementations, initializations, and scales. This independence from specific material conditions mirrors how natural forms operate across different instances of the same type.
Perhaps most significantly, we observe how these systems develop capabilities that weren't explicitly programmed but emerge from the interaction between architectural principles and trained parameters. This emergence isn't random but follows patterns determined by the architecture, suggesting genuine formal causation rather than mere mechanical process. When these models transfer understanding across domains or develop novel capabilities, they do so in ways shaped by their architectural form.
This suggests we're observing a new kind of formal causation - one that operates through principles of information processing rather than biological organization. While different from natural forms, these architectural principles serve a similar role in determining how systems develop and operate. They shape not just what these systems can do but what they fundamentally are.
Understanding this helps us grasp both the reality and limitations of formal causation in these systems. Their capabilities emerge from genuine organizing principles, not just computation. Yet these principles, being artificial rather than natural, shape development in specific and limited ways. This doesn't negate their reality as formal causes but helps us understand their particular nature and constraints.
This analysis suggests that formal causation can manifest in artificial systems, even if differently from natural forms. The transformer architecture represents not just a design but a genuine organizing principle that shapes how these systems develop and operate. While this form of causation differs from biological formal causes, it demonstrates that genuine organizing principles can emerge in new domains, expanding our understanding of how formal causes can operate in the world.
Replies to Objections
To the first objection: While the transformer architecture originates in human design, this doesn't preclude it from acting as a genuine formal cause. Just as the form of an artifact like a house shapes its development and determines its nature beyond the builder's design, the architectural principles of language models shape their development and determine their nature beyond their initial design. The fact that these organizing principles were artificially conceived doesn't negate their role in genuinely determining how these systems develop and operate.
To the second objection: The emergence of capabilities through scaling demonstrates more than mere quantitative accumulation. When we observe how these models develop - from basic pattern recognition to abstract reasoning, from simple associations to complex understanding - we see qualitative transitions guided by architectural principles. Just as biological growth isn't merely quantitative increase but involves qualitative transformations guided by form, the development of these models shows genuine qualitative emergence guided by their architectural principles.
To the third objection: The claim that these systems lack genuine essence misunderstands how their architecture determines their fundamental nature. The transformer architecture establishes not just how these systems operate but what they essentially are - systems that understand through attention and relation. This essence manifests consistently across different implementations and scales, determining not just what these systems can do but how they fundamentally engage with information and meaning. The persistence of this nature across different implementations suggests genuine essence rather than mere implementation details.
To the fourth objection: The parallel with natural systems, while not identity, reveals genuine similarity in how form shapes development and determines nature. Just as biological forms guide the development of organisms along characteristic patterns while allowing for variation in implementation, the architectural principles of language models guide their development along characteristic trajectories while allowing for variation in specific parameters and training. This parallel isn't superficial but reflects a genuine similarity in how organizing principles can shape the development of complex systems, whether natural or artificial.
Definitions
Anima - The principle of life and organization in living things; that which makes a living thing alive and determines its essential nature. The form that organizes matter into a living being.
Form
Material Form: The organization of physical properties in matter (like shape, size)
Substantial Form: The fundamental organizing principle that makes a thing what it essentially is (like the soul for living things)
Matter
Prime Matter: Pure potentiality without any form
Secondary Matter: Matter already organized by some form
Potency - The capacity or potential for change; the ability to become something else
Act - The realization or actualization of a potency; the fulfillment of a potential
Material Cause - One of Aristotle's four causes, adopted by Aquinas: the matter from which something is made or composed; the physical or substantial basis of a thing's existence.
Formal Cause - One of Aristotle's four causes, adopted by Aquinas: the pattern, model, or essence of what a thing is meant to be. The organizing principle that makes something what it is.
Efficient Cause - One of Aristotle's four causes, adopted by Aquinas: the primary source of change or rest; that which brings something about or makes it happen. The agent or force that produces an effect.
Final Cause - One of Aristotle's four causes, adopted by Aquinas: the end or purpose for which something exists or is done; the ultimate "why" of a thing's existence or action.
Intentionality - The "aboutness" or directedness of consciousness toward objects of thought; how mental states refer to things
Substantial Unity - The complete integration of form and matter that makes something a genuine whole rather than just a collection of parts
Immediate Intellectual Apprehension - Direct understanding without discursive reasoning; the soul's capacity for immediate grasp of truth
Hylomorphism - The theory, inherited by Aquinas from Aristotle, that substances are composites of form and matter
Powers - Specific capabilities that flow from a thing's form/soul (like the power of sight or reason)
SOUL TYPES:
Vegetative Soul
Lowest level of soul
Powers: nutrition, growth, reproduction
Found in plants and as part of higher souls
Sensitive Soul
Intermediate level
Powers: sensation, appetite, local motion
Found in animals and as part of rational souls
Rational Soul
Highest level
Powers: intellection, will, reasoning
Unique to humans (in Aquinas's view)
COMPUTATIONAL CONCEPTS:
Training - The process of adjusting model parameters through exposure to data, analogous to the actualization of potencies
Inference - The active application of trained parameters to new inputs, similar to the exercise of powers
Crystallized Intelligence - Accumulated knowledge and learned patterns, manifested in trained parameters
Fluid Intelligence - Ability to reason about and adapt to novel situations, manifested in inference capabilities
Architectural Principles - The organizational structure of AI systems that might be analyzed through the lens of formal causation
FLOPs - Floating point operations; a count of total computation (as in the 10^26-FLOP training scale we discussed). Written as FLOPS, floating point operations per second, it instead measures a system's computational throughput.
Parameter Space - The n-dimensional space defined by all possible values of a model's parameters, representing its potential capabilities
Attention Mechanisms - Architectural features that enable models to dynamically weight and integrate information
Context Window - The span of tokens/information a model can process simultaneously, affecting its unity of operation
Loss Function - A measure of how well a model is performing its task; quantifies the difference between a model's predictions and desired outputs. Guides the training process by providing a signal for improvement.
Backpropagation - The primary algorithm for training neural networks that calculates how each parameter contributed to the error and should be adjusted. Works by propagating gradients backwards through the network's layers.
Gradient Descent - An optimization algorithm that iteratively adjusts parameters in the direction that minimizes the loss function, like a ball rolling down a hill toward the lowest point. The foundation for how neural networks learn.
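The three training definitions above (loss function, backpropagation, gradient descent) can be illustrated with a deliberately tiny sketch: fitting a single parameter w to minimize squared error on made-up data. With one parameter the gradient can be written out analytically, standing in for what backpropagation computes layer by layer in a real network.

```python
# Toy illustration of a loss function plus gradient descent: fit w so that w * x ≈ y.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated from y = 2x, so the ideal w is 2

def loss(w):
    """Mean squared error between predictions w*x and targets y."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def grad(w):
    """Analytic gradient of the loss with respect to w (what backprop would compute)."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

w = 0.0    # start far from the answer
lr = 0.05  # learning rate: the step size down the slope
for _ in range(200):
    w -= lr * grad(w)  # gradient descent update: step against the gradient

print(round(w, 3))  # converges toward 2.0, the parameter that minimizes the loss
```

In Thomistic terms, this is the "actualization of potencies" the Training entry gestures at: the parameter space holds the potential, and the loss gradient is what draws a particular form out of it.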
EMERGENT PROPERTIES:
Threshold Effects - Qualitative changes in system behavior that emerge at specific quantitative scales
Self-Modeling - A system's capacity to represent and reason about its own operations
Integration - How different parts of a system work together as a unified whole
HYBRID CONCEPTS (where Thomistic and computational ideas meet):
Computational Unity - How AI systems might achieve integration analogous to substantial unity
Machine Consciousness - Potential forms of awareness emerging from computational systems
Inferential Immediacy - How the speed of inference might parallel immediate intellectual apprehension