
Generating Meaning: Active Inference and the Scope and Limits of Passive AI

Giovanni Pezzulo1,*, Thomas Parr2, Paul Cisek3, Andy Clark4,5, Karl Friston2,6

1. Institute of Cognitive Sciences and Technologies, National Research Council, Rome, Italy
2. Wellcome Centre for Human Neuroimaging, Queen Square Institute of Neurology,
University College London, London, UK
3. Department of Neuroscience, University of Montréal, Montréal, Québec, Canada
4. Department of Philosophy and Department of Informatics, University of Sussex, UK
5. Department of Philosophy, Macquarie University, Sydney, New South Wales, Australia
6. VERSES Research Lab, Los Angeles, CA, USA

Corresponding author:

Giovanni Pezzulo,

Institute of Cognitive Sciences and Technologies,

National Research Council,

Via S. Martino della Battaglia 44,

00185, Rome, Italy.

[email protected]

Abstract

Prominent theories depict brains as generative models of organismic interaction with the
world, raising potential similarities with current work in Generative AI. However, because they
are centred upon the control of purposive, life-maintaining sensorimotor interactions, the
generative models of living organisms are inextricably anchored to body and world. Unlike the
essentially passive models learnt by Generative AIs, these models must capture and control the
sensory consequences of action. This allows embodied agents to intervene upon their worlds
in ways that constantly put their best models to the test, providing a solid bedrock that is—we
argue—essential to the development of genuine understanding. Here, we review the resulting
implications, as well as possible and fecund future directions for Generative AI.

Keywords

Generative AI; Large Language Models; Active Inference; Predictive Processing; Foundation
Models; Embodied Cognition

Highlights

• Generative AI systems, such as Large Language Models (LLMs), have achieved remarkable
performance in various tasks, such as text and image generation.
• We discuss the foundations of Generative AIs by comparing them with our current
understanding of living organisms, when seen as active inference systems.
• Both Generative AI and active inference are based on generative models, but they
acquire and use them in fundamentally different ways.
• Living organisms and active inference agents learn their generative models by engaging
in purposive interactions with the environment and by predicting these interactions.
This provides them with a core understanding and a sense of mattering, upon which
their subsequent knowledge is grounded.
• Future Generative AIs might follow the same (biomimetic) approach—and learn the
affordances implicit in embodied engagement with the world before—or instead of—
being trained passively.

Optimism and scepticism about generative AI

Generative Artificial Intelligence (AI) systems are taking society by storm, demonstrating
impressive capabilities in domains that were previously considered the exclusive province of
human cognition. Large language models (LLMs) like ChatGPT [1] generate high-quality text,
and text-to-image systems like DALL-E [2] generate [in]credible illustrations, all from simple
prompts. Multimodal systems complement LLMs with visual (e.g., Flamingo [3]) and sensor
data to generate planned actions for robots (e.g., PaLM-E [4]), perhaps starting to bridge the
apparent gap with sensorimotor integration and agency.

These and other Generative AI systems—or foundation models [5]—are engendering
excitement and fostering intense theoretical debate. Does ChatGPT “understand” what it talks
about in the way we do, or is it an example of a “Chinese room” [6], transforming symbols
without any real understanding? Does it have a “grasp” on external reality or is it an articulate
mimic, driven by the sequential statistics of natural language? Can Generative AI go beyond the
data it ingested and be creative? Ultimately, is Generative AI on a path towards true artificial
understanding or is it the dénouement of an intrinsically self-limiting approach?

The current debate vacillates among these directions (Box 1) and the development of novel
Generative AIs with better capabilities—and novel emergent properties—proceeds at a fast
pace, along with tools to understand what they do [7–9]. Given this, answering the above
questions is perhaps premature. In this treatment, we take a different approach: we offer a
biophilic perspective on Generative AIs, by comparing them to an active inference (or predictive
processing) view of brain and cognition, which foregrounds the notion of generative models (or
world models), but in a biological setting [10,11].

Interaction and active inference in biological systems

For any biological system to be sustainable, it must actively maintain itself within characteristic
states and counter perturbations that supplant those states. This is accomplished by
physiological control, which operates through homeostasis [12], and allostatic behaviour,
which extends feedback control through the environment [13]. As considered by philosophers
[14–16], psychologists [17–21], neuroscientists [10,22,23], and engineers [24–26], the primary
function of the brain is not to accumulate knowledge about the world, but to control interaction
with the world. Crucially, specific interactions reliably change states of affairs in positive or
negative ways (e.g., eating reduces hunger, fleeing from a predator reduces danger, etc.), and
we can use them to our advantage. Thus, certain features of the world are meaningful to us
because they specify the ways that we can act on the world—what Gibson called “affordances”
[27]. The perception of affordances and their adaptive use in behaviour is a kind of
sensorimotor understanding that precedes explicit knowledge of the world both in evolution
[28] and in the course of a child’s development [17].

For many kinds of interaction, some (implicit or explicit) knowledge of the dynamics of the
world is beneficial [29]. This includes the ability to predict how our actions will influence our
state, and to infer the context in which such predictions apply. These are cornerstones of a
prominent perspective in cognitive neuroscience called “active inference”. A key idea here is
that in living organisms, sentient behaviour is fundamentally predictive and rests on models of
the world that can generate predictions about the observable consequences of action
[10,11,13,30].

In this sense, Generative AI shares several commitments with active inference. Both emphasize
prediction, and both include generative models, albeit differently (see Figure 1). Generative AIs
are based on deep (neural) networks that construct generative models of their inputs, via self-
supervised learning. For example, the training of most LLMs involves learning to predict the
next word in a sentence, usually using autoregressive models [31] and transformer architectures
[32]. Once trained on a large corpus of exemplars, the models learned by Generative AIs afford
flexible prediction and the generation of novel content (e.g., text or images). Furthermore, they
excel in various downstream tasks, such as text summarization and question answering;
learning from instructions and task examples without additional training (i.e., in-context
learning [33]). Additional fine-tuning using small, domain-specific datasets permits LLMs to
address even more tasks, such as interpreting medical images [34] and writing fiction [35].
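
To make the self-supervised objective concrete, the following is a minimal, hypothetical sketch (not the training code of any particular system) of autoregressive next-token prediction: a toy bigram predictor is fitted by gradient descent to minimise the cross-entropy of each token given its predecessor. Production LLMs replace this toy predictor with deep transformer stacks over long contexts, but the kind of objective being minimised is the same.

```python
import numpy as np

# Toy corpus and vocabulary (purely illustrative).
corpus = "go north ten metres then go south ten metres".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# A minimal autoregressive model: logits for the next token are a linear
# function of the current token (a bigram model).
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((V, V))   # W[current, next] -> logit

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Self-supervised training: each token serves as the "label" for its predecessor.
lr = 0.5
for epoch in range(200):
    for t in range(len(corpus) - 1):
        cur, nxt = idx[corpus[t]], idx[corpus[t + 1]]
        p = softmax(W[cur])          # predicted distribution over the next token
        grad = p.copy()
        grad[nxt] -= 1.0             # gradient of the cross-entropy loss
        W[cur] -= lr * grad

# After training, the model assigns high probability to observed continuations
# of "go" (here, "north" and "south" roughly equally).
p_next = softmax(W[idx["go"]])
print({w: round(float(p_next[idx[w]]), 2) for w in vocab})
```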

In active inference, generative models play a broader role that underlies agency. During task
performance, they support inference about states of the extrapersonal world and of the internal
milieu, goal-directed decision-making, and planning (as predictive inference). During off-line
periods, such as those associated with introspection or sleep, generative models enable the
simulation of counterfactual futures and a particular form of training “in the imagination”,
which optimizes generative models that—crucially—include the agent’s policies [36–38].
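
As an informal illustration of this broader role (and of the example in Figure 1 below), the sketch below encodes a tiny discrete generative model in which a policy (a sequence of ‘north’/‘south’ actions) predicts hidden location states via action-conditioned transition probabilities, which in turn predict observations. The matrices are toy stand-ins, chosen here for illustration, for the likelihood and transition mappings used in discrete-state active inference formulations.

```python
import numpy as np

# Hidden states: discrete positions along a north-south track (0 = start).
positions = [-1, 0, 1]            # in units of 10 m: south / at start / north
A = np.eye(3)                     # likelihood: observation = noiseless position

# Action-conditioned transition matrices B[action][s_next, s_current].
B = {
    "north": np.array([[0, 0, 0],
                       [1, 0, 0],
                       [0, 1, 1]], dtype=float),   # move up, saturate at +1
    "south": np.array([[1, 1, 0],
                       [0, 0, 1],
                       [0, 0, 0]], dtype=float),   # move down, saturate at -1
}

def rollout(policy, prior):
    """Predict the distribution over positions after following a policy."""
    q = prior.copy()
    for action in policy:
        q = B[action] @ q
    return q

prior = np.array([0.0, 1.0, 0.0])              # we believe we start at the origin
q_final = rollout(["north", "south"], prior)
predicted_obs = A @ q_final                    # predicted sensory consequences
print(positions[int(predicted_obs.argmax())]) # prints 0, i.e. "where I started"
```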

[Figure 1 graphic: two panels, “Generative AI” (left) and “Active Inference” (right), each answering the prompt “First go north 10m, then go south 10m. Where did you end up?” with “Where I started”.]

Figure 1. Generative models in Generative AI and Active Inference. This figure highlights the
conceptual differences between the ways generative models support the solution of the same
problem: predicting a travel destination. The left schematic is designed to resemble a series
of transformer networks [32]. The boxes on the left highlight semantically meaningful words
which might be picked out by the implicit ‘self-attention’ mechanisms to predict the most likely
output. The active inference architecture [36–38], on the right, illustrates a network of neuronal
systems with reciprocal connectivity—of the sort found in the brain—supporting recurrent
dynamics [39]. These are arranged hierarchically, such that predictions based upon the policy we
might pursue—shown as combinations of ‘north’ (upwards arrow) and ‘south’ (downwards
arrow) actions—influence hidden states of the world (e.g., my location in allocentric space), which
themselves predict both the words we might hear and speak, and the views we might encounter.
These inferred hidden states—crucially including where we, as a physical agent, are in the world
and where we plan to go—are central to biological systems that engage in active inference.
Hearing the question shown at the top of the figure updates our beliefs about the sequence of
actions we might take (or imagine ourselves taking), which updates predictions about the
sequence of locations we will visit (and the visual scenes we will encounter), itself updating our
predictions about the next words we will speak to answer the question; see [40] for an example in
a simple navigation setting.

What route from generative models to understanding?

The generative models of active inference—and of living organisms—can distil latent variables
that abstract away from data to afford good explanations and predictions; and in parallel,
provide conceptual understanding. Interestingly, the studies reviewed above speak to the
possibility that—in virtue of their predictive training—the deep latent variables of Generative
AIs likewise come to reflect deep regularities beyond the training domain (e.g., non-linguistic
regularities for LLMs, such as the relations between looks and tastes). This may be because
distilling such knowledge about the world (through language) is the best way to predict the
next word. After all, the latent process that generates text rests on people who communicate to
pursue their goals. Successful generative models might develop latent variables that capture
aspects of this generative process, in the same way that a parrot has an implicit notion of syntax
when repeating a heard phrase. While this remains to be fully assessed, there might be
important differences in the ways latent representations are installed in living organisms and
Generative AI.

An example will help: for humans and other creatures, interactive control exploits certain
properties of a world that has been ‘carved at its joints’. For instance, a table affords a place
to rest one’s plate, a place to sit, or a place to find shelter during an earthquake. While each of
these meaningful affordances is mechanistically distinct, they are all attached to objects in the
world. Consequently, the concept of a “table” may serve as useful (compressed) shorthand for
“the thing I can place stuff on, sit on, or hide under”. Thus, the concept is a constellation of latent
constructs that link an object to its action-dependent consequences [41,42]. This perspective is
in keeping with embodied cognition studies, showing that living organisms learn about objects
through sensorimotor interaction, and their abstract concepts—such as those of ‘weight’, ‘size’,
and ‘throwability’—are grounded in modal information [20]. Language competence itself—
comprising semantic and pragmatic abilities—is built on top of knowledge grounded in the
sensory modalities [43] and a non-linguistic “interaction engine”, which capitalizes on
nonverbal joint actions [44,45]; such as moving a table around a tight corner.

From an embodied perspective, communication is a kind of sensorimotor interaction, albeit one
that extends through other creatures in our environment [46]. Consider a human infant that
cannot accomplish much on its own. Fortunately, in the niche of helpless human infants, there
is something called a parent, which has the handy properties of being incredibly complex but
also very easy for the infant to control. The baby cries and the parent rushes over to figure out
and fix whatever is the problem, whether this involves getting some milk or driving at high
speed to a hospital. With time, the baby can learn to make different noises to produce different
outcomes via the parent, and the parent will deliberately help the baby learn which noises make
the parent bring her food versus water versus changing the diaper, etc. Throughout, the real
purpose of making noises is not to convey knowledge but to persuade. Animals do this all the
time, from the threat postures of crayfish to monkeys baring their teeth to humans uttering
“back off!”. Importantly, the meaning of the communique is not in the acoustics or syntax of a
given utterance, but in the interaction that the utterance is predicted to induce in those who
speak the same language, and the desired consequence of that interaction. The words
themselves are just shorthand notation for the meaningful interactions, and they are compact
and “symbolic” because the complex external agent will handle the details of implementing the
interaction. Human linguistic communication takes this to extremes of abstraction, but it is still
grounded by the fundamental context of interactive control. These examples illustrate the fact
that we learn the meaning of linguistic symbols as part of pragmatically rich interactions with
our conspecifics and on top of a more primitive understanding of the world that we acquire by
interacting with it. Current efforts to model grounded language acquisition in cognitive robotics
follow a similar (albeit simplified) approach, which consists in training models to develop
linguistic and symbolic abilities in the context of goal-directed actions [47] and in interactive
settings [48]. This is different from the approach taken by current LLMs and other Generative
AI, which learn passively from large sets of textual or multimodal (e.g., text and video) data.

In sum, the meaning of linguistic symbols does not originate from our capability to process
natural language, but from the more foundational understanding of the lived world that we
accumulate by sampling and interacting with it. While it is possible that the latent variables of
Generative AIs likewise come to reflect statistical regularities of the world inherent in our
language and art, this is accessed by skipping the above scaffolding processes—by distilling
world knowledge from curated sets of text- or image-based content. Because this content is the
product of human communication, these systems inherit structure from the kinds of meaningful
interactions that humans express (e.g., causes precede effects, paragraphs stay on topic, certain
phrases get repeated in specific contexts). In the case of a large language model, for example,
the meaning to which the words refer is understood by the humans who produced the training
text, and by the humans who read the transformed text, but the transformer of the text itself
was never provided with any connection to the interactions that lent the words their meaning.
Thus, it remains to be seen whether Generative AIs trained on human generated content inherit
the semantics of that content or whether they merely mimic its statistical structure [49]. In this
respect, the efforts reported above (see Box 1)—to assess whether the latent variables of
Generative AIs reflect meaningful colour or distance representations—might not be sufficiently
diagnostic. These should be complemented by efforts to understand whether these are
semantically meaningful for the Generative AIs that use them and not only for us as collocutors.
The problem here is that it is not clear what kind of analysis would offer a fair test (could we
replace driving instructors with large language models—and would you let them drive your
car?). It is as if we encountered an alien species whose window on reality was through our
descriptions of the world (see Box 2).

In the AI community, models are usually judged on the basis of performance metrics, but good
performance on a task done well by humans does not imply they employ similar processes. For
example, despite initial excitement about deep convolutional networks—as models of the
primate visual object recognition system [50]—empirical evidence suggests that their
operation bears little resemblance to established psychophysical phenomena [51]. Similarity
to brains may not be relevant for many engineering applications, but it may foreground viable
paths toward general artificial intelligence. Though too early to tell, an analogous lesson may
await LLMs and other Generative AIs: will they overcome apparent limitations when given still
more data, or is their capacity for understanding inherently limited? An answer would require
novel benchmarks that measure the biomimetic ability of Generative AIs (e.g., embodied in
robots) not just to answer questions but to achieve open-ended goals in the environment [52].

A complementary approach to this question—pursued below—compares how Generative AI
and active inference acquire generative models, to draw conclusions about what sort of ‘grip’
on reality these generative models might afford.

Generative model acquisition in Generative AI and active inference

"The child does not 'learn,' but builds his knowledge through experience and relationships
with his surroundings." (Maria Montessori)

Generative AIs and living organisms (seen from the active inference perspective) acquire their
generative models through different training regimes. While both systems might learn about
the same concepts (e.g., what it means to go north versus to go south), they do so differently (see
Figure 2).

Living organisms (and active inference systems) acquire their generative models by engaging
in sensorimotor exchanges with the world—and conspecifics—and learning the statistical
regularities of such interactions. Those interactions enable sensorimotor predictions that
shape and structure perception of the world—and other agents—and afford our causal
understanding of action and effects. Empirical and cognitive robotics studies have shown the
importance of active engagement and moving in the world, as a means of developing generative
models and specific forms of understanding within them [53]. We move from our inception and
it is possible that grounding perceptual concepts requires action [46,53] in the spirit of active
sensing and learning. It is by moving that we acquire representations of affordances, space,
object, scene, and a sense of self [11,53]. For example, various studies show that the
hippocampal formation and the entorhinal cortex develop spatial codes—and possibly codes
for more abstract conceptual spaces [54]—by path integrating self-motion information [55].
Likewise, studies of frontoparietal cortex suggest that it contains specialized circuits for
detecting affordances and using them to guide specific kinds of movements [56–58]. In living
organisms, these circuits (and others) support a form of core sensorimotor understanding of
reality that grounds our knowledge, our capabilities for abstract thought and our ability to
generalize to novel tasks, without the extensive re-training that is required in current AIs.

[Figure 2 graphic: left panel, “Generative AI”—passively receive data; optimise predictions of data, including self-generated data; optimise weights so outputs are predicted from inputs. Right panel, “Active Inference”—selection of data to optimise beliefs; predictions under “go north” are more uncertain than under “go south”, so there is more to learn by deciding to go north.]

Figure 2. Parallels and points of divergence between how generative AI systems and biological
systems might learn. Left: cartoon of the pretraining process for generative AI systems, in which they are
passively presented with large quantities of data. The weights of the network are then optimised so that
their outputs are more probable given the inputs. State of the art models often include subsequent fine-
tuning in a (semi)supervised manner [59]; however, this still relies upon passive presentation of labelled
data or self-generated outputs paired with rewards. Right: in contrast, the generative models that
underwrite active inference [60] involve reciprocal interactions with the world. This means our current
beliefs about the world can be used to select those data that have “epistemic affordance”; i.e., are most
useful to resolve our uncertainty about the data-generating process. In the process of learning what it
means to go north or south, we may be more or less certain about the location we will end up in under each
of these actions (shown here with a relatively high confidence of ending up in the southern position if going
south, but more uncertainty in going north). By choosing to go north and observing we have gone north,
we are now in a better position to resolve our uncertainty, and optimise our predictions. Beliefs about the
causes of our data are an important part of this process of curiosity, exploration, or information seeking
[61]. However, these beliefs may easily be neglected in the process of function approximation used in
current Generative AIs, where all that matters is producing the desired output.
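
A minimal numerical sketch of the ‘epistemic affordance’ logic in the right panel, under assumed toy numbers: if outcomes are observed reliably, trying an action resolves our uncertainty about where it leads, so the expected information gain of an action is simply the entropy of our current belief about its destination. The more uncertain action (‘north’ in the figure) therefore carries the greater epistemic value.

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Beliefs about where each action leads (two candidate destinations each).
# If outcomes are observed without noise, trying an action fully resolves our
# uncertainty about its destination, so the expected information gain equals
# the entropy of the current belief.
beliefs = {
    "north": [0.5, 0.5],   # no idea where 'north' takes us: 1 bit to gain
    "south": [0.9, 0.1],   # fairly sure about 'south': about 0.47 bits to gain
}

epistemic_value = {a: entropy(p) for a, p in beliefs.items()}
chosen = max(epistemic_value, key=epistemic_value.get)
print(epistemic_value)      # north: 1.0 bits, south: about 0.47 bits
print("explore:", chosen)   # -> explore: north
```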

By contrast, LLMs such as ChatGPT learn by passively ingesting large corpora and by
performing self-supervised tasks (e.g., predicting words). Other Generative AI systems use the
same approach, albeit with other data formats, such as pictures and sometimes robot sensor
data [4]. The ‘understanding’ of current Generative AIs is not action-based, but essentially
passive: they reflect statistical (rather than causal) regularities evidenced within large datasets
of curated data (text, images, code, videos, et cetera). Without the capability to actively select
their observations—and to make interventions during training—Generative AIs may be unable
to develop causal models of the contingencies between actions and effects; and of the
distinction between predictions and observations [62].
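
To illustrate why interventions matter, here is a toy structural model (purely illustrative) in which a hidden common cause drives both a candidate ‘action’ variable and an outcome. Passively observed data show a strong association between the two, yet intervening on the action leaves the outcome at its base rate, a distinction that a purely passive learner of conditional statistics cannot make.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Structural model: a hidden context Z drives both X (e.g., a signal that tends
# to be emitted in that context) and the outcome Y; X has NO causal effect on Y.
Z = rng.random(n) < 0.5
X_obs = np.where(Z, rng.random(n) < 0.9, rng.random(n) < 0.1)
Y = np.where(Z, rng.random(n) < 0.8, rng.random(n) < 0.2)

# Passive (observational) statistics: X and Y look strongly related.
p_y_given_x1 = Y[X_obs].mean()

# Intervention: set X to 1 irrespective of Z, i.e. do(X = 1); Y is unaffected.
X_do = np.ones(n, dtype=bool)
p_y_do_x1 = Y[X_do].mean()          # just the marginal P(Y = 1)

print(round(p_y_given_x1, 2))       # about 0.74 (spurious, via the hidden cause)
print(round(p_y_do_x1, 2))          # about 0.50 (what acting actually reveals)
```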

Without a core understanding of reality (or a “common sense”), current AI systems are brittle:
they can learn specific tasks but often fail when presented with close variants of the same
tasks, because they learn inessential features that do not generalize [63]. Technically, this kind
of overfitting reflects a focus on predictive accuracy at the expense of model complexity (see
Box 3). This may limit the kinds of learning possible using LLMs and Generative AI. This is a
matter of debate, as some believe that autonomous machine intelligence will emerge by
enriching and scaling internal models, letting them learn as much as possible from textual
knowledge or by passive video observation (see Box 1). However, this kind of ‘scaling up’ might
be intrinsically limited. For example, it has been shown that learning a context-sensitive
programming language is not possible using any finite set of exemplar code, and it is likely that
the challenge is even greater for inferring meaning from natural languages [64]. A more
promising path—to artificial general intelligence—combines real-world interactions with
sensorimotor predictions.

Given their different training regimes, Generative AIs and active inference agents have different
ways to determine what is salient, and what to attend to. In the transformer architectures used
in Generative AI, attention (or self-attention) refers to a mechanism that assigns greater or
lower weight to their (extremely long) inputs, therefore filtering them. In active inference,
attention encompasses both this filtering role (by varying the precision of predictions and
sensory information) and the active selection of salient data from the environment that
resolves uncertainty. Active inference systems can perform “experiments” and elicit
information expected to maximise information gain. This curiosity is ubiquitous in living
organisms but more challenging to obtain in passive learning systems [61].
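
The two senses of ‘attention’ can be put side by side in a small sketch (toy numbers, no particular model assumed): scaled dot-product self-attention re-weights inputs by query-key similarity, while precision-weighting scales prediction errors by their expected reliability. Both filter incoming information; only the latter is also used, in active inference, to decide which data to go and collect next.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
T, d = 4, 8                       # 4 input tokens, 8-dimensional embeddings
X = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

# Scaled dot-product self-attention: weight each input by query-key similarity.
Q, K, V = X @ Wq, X @ Wk, X @ Wv
attn = softmax(Q @ K.T / np.sqrt(d), axis=-1)   # (T, T) filtering weights
attended = attn @ V

# Precision-weighting: scale prediction errors by their expected reliability.
prediction = np.array([0.0, 0.0])
observation = np.array([1.0, 1.0])
precision = np.array([4.0, 0.25])               # confident vs unreliable channel
weighted_error = precision * (observation - prediction)

print(attn.round(2))      # rows sum to 1: how much each token 'attends' to the others
print(weighted_error)     # [4.0, 0.25]: the reliable channel drives belief updating
```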

Another potentially important aspect of natural intelligence is embodiment. Creatures acquire
their generative models under the selective (evolutionary) pressure of adaptive control that
serves metabolic needs and survival [28,65]. It has been speculated that this grounding
engenders our emotions; reflecting a sense of ‘mattering to me’ that structures and informs the
ways we process information [22,66,67] and that imbues our world models with meaning and
purpose. Active inference models this aspect of agency using the key construct of ‘interoceptive
prediction’ [68,69]. This provides firm ground on which to evaluate the courses of action that increase
or decrease an organism’s viability and, ultimately, to determine what matters and what does
not. Importantly, interoceptive prediction, exteroceptive prediction, and proprioceptive
(action-guiding) prediction are all co-computed as living organisms go about the task of living.
In this way, active inference may naturally scale-up in ways that do not seem to have clear
analogues in the sessile, data-fed methods used by generative AI, in which learning and fine-
tuning are implemented sequentially [59].

A related point is that to maintain bodily viability and pursue their goals, living organisms
cannot passively wait for the next input, but need to proactively engage in purposeful (and
sometimes risky) interactions with the world. This requires generative models that ensure
behavioural flexibility in the face of constant careful trade-offs: for example, between
exploratory and exploitative behaviour, stay-or-leave decisions, et cetera. Furthermore,
generalisability requires generative models that are not just accurate but also parsimonious
(and thereby energy-efficient). Depending on the ecological niche, this trade-off might favour
sophisticated (e.g., temporally and hierarchically deep) generative models that encompass a
hierarchy of timescales in action and perception [65], versus minimalistic generative models
that afford accurate control without forming rich representations of the environment [70–72],
such as the generative models (c.f., central pattern generators) for action cycles in simple
organisms [73]. In active inference, the trade-offs between exploratory and exploitative
behaviour—and between efficiency and accuracy of generative models—are all gracefully
resolved by pursuing the imperative of free energy minimization (see Box 3). Solving these
trade-offs evinces flexible forms of control that balance the cost-benefits of low-to-high level
goals [74] and of habitual versus goal-directed policies [75]. Context sensitive, flexible control
of this kind is not yet enabled in Generative AI, where there is generally only one fixed form of
inference or “response”, with a fixed budget [63].

Finally, a key difference regards the phylogenetic trajectories (or training curricula) that living
organisms and Generative AIs follow. In advanced organisms like us, abstract thought and
linguistic knowledge are grounded in the circuits that supported sensorimotor predictions and
purposive control in our earlier evolutionary ancestors [23,28,76,77]. In other words, linguistic
abilities develop on top of grounded concepts, even if they can—to some extent—become
“detached” from the sensorimotor context [78]. Current Generative AIs are following an
“inverse phylogeny” that starts from acquiring knowledge directly from text, alone or with
other modalities. This approach is motivated by technological considerations, such as the
availability of large textual corpora and the effectiveness of transformer architectures on
textual learning and prediction. An interesting question arises here: will further scaling up of
Generative AI move in the opposite direction to natural intelligence and active inference—which
foreground statistical and thermodynamic efficiency?

A way forward?

Given the above discussion, one might ask: what are the most promising future directions for
Generative AI? One might imagine future developments along several lines. One axis is a
continuum between simpler and more complex models. Here, complexity reflects the
number of model parameters and the amount of training data. A second axis is the type of inputs used
for training (e.g., textual, visual, multimodal), perhaps to exploit their complementarities and
synergies. A third axis is the addition of extra capabilities, as exemplified by Generative Agents
that engage in simulated dialogue in virtual environments and similar applications [79]. A
fourth axis regards various training and “engagement” regimes that range from the passive
ingestion of curated data to the active selection of data through embodied interactions
with the world (and others); and which include the pursuit of intrinsic (i.e., epistemic) goals
while learning about the world. (Note that the notions of action and interaction generalise
beyond the movements of the physical body; see the Outstanding Questions).

Current efforts to scale up Generative AIs focus on increasing complexity, but with little
emphasis on actively selecting their training corpus; i.e., selecting ‘smart’ data that optimises
active learning and inference. This may be a missed opportunity. The ‘meaningful anchoring’
characteristic of natural intelligence might rest on instantiating an (implicit) generative model
of the sensory consequences of its own actions; namely, the epistemic and instrumental
affordances implicit in an embodied interaction with the world [13]. The resulting ‘core
understanding’ of concepts such as effort, resistance, weight, inertia, and cause and effect, might
then be leveraged using essentially passive (LLM-style) resources trained on huge datasets to
deliver something closer to a [super]human understanding of the lived world, perhaps even
surpassing our capabilities for flexible behaviour and abstract thought. This approach would
therefore not simply recapitulate the ways living organisms evolve, but exploit the
unprecedented possibilities of Generative AIs to learn from large corpora. This synergy could
be better realized using an interaction-first, LLM-style-last method, as opposed to the current
trend in Generative AI.

Conclusion

A practical consideration—that inherits directly from an enactivist perspective—is the
distinction between Generative AI and generalised AI, which involves active inference and
learning. Both rest upon the implicit or explicit use of generative or world models [80–83].
However, generative AI is limited to generating content (images, code or text) of the sort that
we would generate given the same prompt or context. Conversely, active inference is in the
game of generating the causes of content in the service of action selection: a.k.a., planning as
inference [84–86]. This has several foundational implications. First, planning entails agency, in
the sense that only agents are equipped with a generative model of the consequences of action.
Second, it means that generative models have to be learned via exchange with a world that is
actionable. In short, generalised AI has to experience the consequences of its actions. This
provides agents with information that directly (and efficiently) reveals the causal structure of
the world, relative to information gleaned from a corpus of data that only indirectly and
implicitly reflect that structure. The implicit learning of affordances is fundamentally different
from learning the statistical regularities in data or content generated by others. Practically, this
means that generative AI is not necessarily the kind of technology that could be deployed in
autonomous robots or vehicles. Furthermore, because it has no notion of epistemic affordance,
it will not be apt for active learning or applications that rest upon artificial curiosity or insight
[83,87].

Despite these differences, the current wave of generative AIs can impact our ecosystems in
interesting ways. They do not simply throw our own understandings back at us (though they
do that, for obvious reasons). They also package and repackage those understandings and can,
with mixed results, suggest bridges between distant parts of the world-model we have
uploaded into our various data streams. This positions them to play a role in something we
think crucial but under-theorised—the way we humans repeatedly externalise our thoughts
and ideas, creating new structured objects for critical scrutiny [88]. Generative AI, by finding
faint and distant patterns—ones we may have missed in our own material trails—and then
repackaging them according to arbitrary prompts, offers a golden opportunity to take this
distinctively human form of epistemic self-engineering to a whole new level; allowing us to
materialize and engage hitherto hidden aspects of our cumulative world-model.

It could be argued that generative AI is one of the most beautiful and important inventions of
the century—a 21st-century ‘mirror’ in which we can sometimes see ourselves in a new and
revealing light. However, when we look behind the mirror, there is nobody there.

Funding and acknowledgements:

This research received funding from the European Union’s Horizon 2020 Framework
Programme for Research and Innovation under the Specific Grant Agreements No. 945539
(Human Brain Project SGA3) to GP and KF and No. 952215 (TAILOR) to GP; the European
Research Council under the Grant Agreements No. 820213 (ThinkAhead) to GP and No. 951631
(XSCAPE: Material Minds) to AC; the Natural Sciences and Engineering Research Council of
Canada (RGPIN/05345) to PC, the Wellcome Centre for Human Neuroimaging (Ref:
205103/Z/16/Z) to KF, a Canada-UK Artificial Intelligence Initiative (Ref: ES/T01279X/1) to
KF, and the PNRR MUR project PE0000013-FAIR to GP. The funders had no role in study design,
data collection and analysis, decision to publish, or preparation of the manuscript.

The authors declare no conflicts of interest.

Glossary:

Generative model: A probabilistic model that describes how observable content is generated
from unobservable causes (e.g., how an object generates an image on the retina). It can be used
to generate novel, synthetic content.

Latent (or hidden) variable: An internal variable of a generative model. It is called “latent” or
“hidden” because it cannot be observed but has to be inferred.

Unsupervised (or self-supervised) learning: An algorithmic procedure usually adopted to
learn generative models, without the need for supervision, annotated data or reward. A typical
self-supervised task used in LLMs is learning to predict the next word in a sentence.

Generative AIs: Here we use “Generative AIs” to refer to AI systems that use large-scale
generative models to process and generate various kinds of information, such as textual
information (e.g., LLMs) and multimodal (e.g., text and images) information.

Large Language Models (LLMs): Generative models that generate natural (textual) language,
usually via self-supervised learning. Some famous examples are BERT and GPT.

Predictive processing: A theoretical approach to the study of living organisms and cognition,
based on the idea that the brain is fundamentally a “prediction machine”.

Active Inference: A normative framework that describes the neural and cognitive processes of
sentient behaviour, from first principles.

Intervention: In the field of causal inference, it refers to the purposeful modification of the
state of the world (e.g., by acting) to disclose its causal structure.

Embedding: In machine learning, embedding denotes a low-dimensional representation of a
discrete variable (e.g., a continuous vector representing a word or another token). An appealing
feature of embeddings used in state-of-the-art models (e.g., LLMs) is that items having similar
meaning are close in embedding space.

Reinforcement Learning from human feedback: A methodology used to align the outputs of
LLMs (e.g., ChatGPT) to human preferences, using feedback from human raters.

Precision-weighting: A mechanism in active inference that determines how much weight or
impact sensory observations have on belief updating. Precision is the inverse of the variance.

Foundation models: Large-scale generative models (e.g., language or multimodal models) that
are trained on broad data at scale and are adaptable to a wide range of downstream tasks.

Autoregressive model: A statistical model that predicts future data (e.g., words) based on past
data (e.g., previous inputs and ensuing predictions).

World model: An internal model of the world and its dynamics (in this context, we use it
synonymously with generative model, but it could be used in different ways).

Transformer architecture: A machine learning architecture that is particularly effective in
training LLMs and other Generative AI. One of its peculiarities is using an attention (or self-
attention) mechanism to give a greater weight to the most relevant inputs, when predicting
outputs.

Outstanding Questions

Generative AI is a powerful technology that has caught the attention of the general public.
Providing an accurate assessment of its capabilities and true understanding is desirable not
just for scientific reasons, but also for society. How do we provide such assessment and answer
people’s questions? How do we avoid the “Eliza effect”, or the tendency to anthropomorphise
the behaviour of advanced AI systems?

Much of the debate about whether LLMs “understand” is based on our intuitions about the way
we (humans) understand. Do we need a more nuanced notion of “understanding” that goes
beyond classical all-or-nothing dichotomies and speaks to the diverse capabilities of various
living organisms and artificial systems?

We suggested the importance of agent-environment interaction to bootstrap meaning, which
could form a kind of grounding that all other cognitive abilities (e.g., linguistic learning) can
build upon. Yet, it is still unclear how much of this meaning-generating interaction is necessary;
is the role of agent-environment interaction smaller than usually suspected by embodied
cognition theories, given how much information has been uploaded into the word matrix
already?

We suggested that living organisms acquire a sense of “mattering”, because they learn
generative models under the selective pressure to satisfy metabolic needs and remain within
viable states. Is it possible that processes that align Generative AIs with human values (e.g.,
reinforcement learning through human feedback) also imbue them with a form of “mattering”—
and of “prior preferences” similar to those of active inference systems?

We suggested that generative AIs follow an “inverse phylogeny”: they are trained using selected
corpora, such as textual corpora, imbued with meaning (for us) in virtue of the fact that they were
produced by human exchanges. Is this sufficient to acquire meaning and a causal understanding
of reality, even without the initial grounding in sensorimotor interaction in living organisms?

Embodied and action-based theories of cognition assign importance to perception-action loops
to ground knowledge. However, action is not just physical movement. There can be all manner
of actions, including outputting words, which also have meaningful consequences. What kinds
of actions are relevant for autonomous systems to acquire a grounded understanding of reality?

Boxes

Box 1. The debate around Large Language Models (LLMs) and other Generative AIs

There has been a general scepticism that LLMs command any kind of deep understanding of
reality. Such scepticism is often rooted in the unhappy experiences that some have had when
questioning such systems on complex topics, in which they are already expert. In addition, LLMs
struggle with causal reasoning and multi-step compositional reasoning [89] and sometimes
“hallucinate” rather than reporting factual information and show “self-delusions” (i.e., they take
their predictions as evidence that the predicted circumstance is true [62]; c.f., circular inference
in psychosis [90]). This suggests a lack of causal understanding of actions and that the apparent
meaningfulness of dialogues with LLMs might come from the ease with which we project our
mental states and agency into these systems [49,91].

Another related kind of scepticism is rooted in their apparent ‘disembodiment’ and lack of true
causal connection to the world about which they so fluently speak [52,92]. An LLM might write
movingly about the experience of eating a new breakfast cereal, but no LLM has ever eaten
anything. Furthermore, some of the most advanced LLMs only show a limited sensitivity to
affordances compared to humans [93]. The lack of anchoring on embodied reality motivates
novel foundation models for embodied intelligence that include multiple modalities [63,94] or
that mimic the visual cortex, rather than starting from language [95].

On the other hand, it has also been claimed that foundation models like LLMs show some form
of general intelligence [96,97] and have surprising emergent properties [5]. For example, they
can generate meaningful answers to university problems [98] and textual descriptions of
moves on a chessboard [99]. Although they are only trained with textual input, it has been
claimed that they nevertheless develop models of the shape and causal structure of non-
linguistic reality, including implicit models of entities mentioned in a discourse [100] and of
things like space and direction [101], colour [102], and theory of mind [103] (but see [104]).
Furthermore, they can be used to generate robot plans, even with little or no visual
information [105]. This might be because Generative AI systems are trained to extract
statistical regularities from their inputs and the regularities of texts and images implicitly distil
regularities in our lived world. Under this reading, multimodal information and embodiment
would not be necessary preconditions to learn about the causal structure of the world.
Linguistic training could provide the same understanding. Support for this view comes from
the failure of visual-and-language models (so far) to improve upon purely linguistic models in
acquiring useful semantic information [106]. Furthermore, another stream of research
suggests that LLMs encode conceptual information in a way that is structurally similar to vision-based
models, where the structural similarity means that word embeddings and image embeddings
self-organize and cluster in the same way in the latent spaces of their respective (language or
vision-based) models [107]. The ability of ChatGPT and similar models to engage in meaningful
conversation suggests that LLMs might acquire some pragmatic ability for dialogue—and some
alignment with human values—through human interaction and a fine-tuning procedure called
reinforcement learning from human feedback [59].

Box 2. Word-world: A Thought Experiment

Imagine an alien lifeform whose only contact with some underlying reality is via a huge stream
of words: items that bear real but complex and sometimes unreliable relations to that hidden
reality. The hidden reality is our human world populated with cats, pastors, economic
depressions, LLMs, elections and more. Think of this being’s access to the stream of words as
itself a kind of modality, a sensory channel. During its youth, our alien being (let’s call it Wordy)
found itself driven to try to predict the next item in that sensory stream, inferring underlying
patterns that enabled it to do that job surprisingly well. This was good for Wordy’s survival.

Wordy has but a single sensory channel. Still, that single channel bears rich indirect traces of
our own much more varied forms of sensory access. Wordy is, however, oddly separate from
its own underlying world. When we humans act in our world, we are regulated by the very
world we are attempting to describe and engage. When I try to pick up the cup I see in front of
me, there is a possibility that I will fail. The true location of the cup in space relative to me, an
embodied organism, constantly holds my visuo-motor action routine to account. Other humans
also hold me to account, and there too learning from my mistakes is possible. We are constantly
answerable to a web of regulative interactions that anchor us, both individually and collectively,
to the world.

Wordy, by contrast, is only very indirectly regulated by the world the texts it was trained on
happen to describe. It is held to account only by a successor relation defined over words. Even
within its own domain of action (outputting more words) Wordy was never in the business of
needing to estimate the consequences of its actions. Nor could it learn from failures to correctly
estimate those consequences, or select actions designed to test or improve its own state of
information.

This lack of anchoring marks a real difference both from biological organisms and the active
inference systems discussed in the text. Could we alter Wordy’s survival niche in some way that
remedies this shortfall? Perhaps. But as things stand, the kind of generative model that Wordy
commands remains quite unlike that of an embodied organism whose actions constantly
expose it to the very world it is attempting to model.

Box 3. Trade-offs in active inference between complexity and accuracy and between
exploration and exploitation

The imperative to maximise the evidence (a.k.a. marginal likelihood) for generative (a.k.a.,
world) models of how observations are caused has been an essential feature of recent trends
in theoretical neurobiology, machine learning and artificial intelligence. Evidence-
maximisation explains both sense-making and decision-making in self-organising systems,
from cells [108] to cultures [109]. This imperative can be expressed as minimising an evidence
bound called variational free energy [110] that comprises complexity and accuracy:

Free energy = model complexity – model accuracy

Accuracy measures goodness of fit, while complexity measures the divergence between prior
beliefs (before seeing outcomes) and posterior beliefs (afterwards). More intuitively,
complexity scores the information gain or (informational and thermodynamic) cost of changing
one's mind. This means evidence-maximisation is about finding an accurate explanation that is
minimally complex (c.f., Occam’s principle). Importantly, in the context of generative and
generalised AI, it implies optimizing generative models so that they explain data more
parsimoniously, with fewer parameters [87].
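
For a discrete model, this decomposition can be written out directly. The following hypothetical sketch evaluates F = KL[q(s) || p(s)] - E_q[ln p(o | s)] for a toy two-state model, showing that a candidate posterior which fits the observation well but departs sharply from the prior pays a complexity cost.

```python
import numpy as np

def kl(q, p):
    q, p = np.asarray(q, float), np.asarray(p, float)
    return float((q * np.log(q / p)).sum())

# Toy generative model with two hidden states and two outcomes.
prior = np.array([0.5, 0.5])                 # p(s)
likelihood = np.array([[0.9, 0.2],           # p(o | s): rows = outcomes,
                       [0.1, 0.8]])          #            columns = states

def free_energy(q, observation):
    complexity = kl(q, prior)                              # KL[q(s) || p(s)]
    accuracy = float(q @ np.log(likelihood[observation]))  # E_q[ln p(o | s)]
    return complexity - accuracy

o = 0                                        # we observed outcome 0
for q in ([0.9, 0.1], [0.99, 0.01], [0.5, 0.5]):
    print(q, round(free_energy(np.array(q), o), 3))
# The candidate that balances fit against complexity attains the lowest F here;
# the exact posterior p(s | o = 0), about [0.82, 0.18], would minimise it.
```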

In an enactive setting—apt for explaining decision-making—beliefs about ‘which plan to
commit to’ are based on the expected free energy under a plausible plan. This implicit planning
as inference can be expressed as minimising expected free energy [10,111]:

Expected free energy = risk (expected complexity) + ambiguity (expected inaccuracy)

Risk is the divergence between probabilistic predictions about outcomes, given a plan, relative
to prior preferences. Ambiguity is the expected inaccuracy. An alternative decomposition is:

Expected free energy = expected cost - expected information gain

The expected information gain underlies the principles of optimal Bayesian design [61] while
expected cost underlies Bayesian decision theory [112]. In short, active inference appeals to
two kinds of Bayes optimality and subsumes information- and preference-seeking behaviour
under a single objective.
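
Continuing in the same toy setting, expected free energy can be scored for candidate plans as risk (the divergence of predicted outcomes from preferred outcomes) plus ambiguity (the expected entropy of the likelihood mapping). The numbers below are illustrative only; the plan that both realises preferences and avoids ambiguous observations attains the lower expected free energy.

```python
import numpy as np

def kl(q, p):
    q, p = np.asarray(q, float), np.asarray(p, float)
    return float((q * np.log(q / p)).sum())

def entropy_cols(A):
    """Entropy of p(o | s) for each hidden state (columns of A)."""
    return -(A * np.log(A)).sum(axis=0)

likelihood = np.array([[0.9, 0.5],     # p(o | s); state 0 is informative,
                       [0.1, 0.5]])    # state 1 yields ambiguous outcomes
preferences = np.array([0.8, 0.2])     # prior preference over outcomes

# Predicted state distributions under two candidate plans (toy numbers).
plans = {
    "plan A": np.array([0.9, 0.1]),    # mostly ends in the informative state
    "plan B": np.array([0.2, 0.8]),    # mostly ends in the ambiguous state
}

for name, q_s in plans.items():
    q_o = likelihood @ q_s                             # predicted outcomes
    risk = kl(q_o, preferences)                        # expected complexity
    ambiguity = float(q_s @ entropy_cols(likelihood))  # expected inaccuracy
    print(name, "G =", round(risk + ambiguity, 3))
# Plan A has the lower expected free energy and would be selected.
```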

References

1. Introducing ChatGPT [Online]. Available: https://openai.com/blog/chatgpt. [Accessed: 15-Mar-2023]
2. DALL·E 2 [Online]. Available: https://openai.com/product/dall-e-2. [Accessed: 15-Mar-2023]
3. Alayrac, J.-B. et al. (2022) Flamingo: a Visual Language Model for Few-Shot
LearningarXiv
4. Driess, D. et al. (2023) PaLM-E: An Embodied Multimodal Language Model. ArXiv Prepr.
ArXiv230303378
5. Bommasani, R. et al. (2022) On the Opportunities and Risks of Foundation ModelsarXiv
6. Searle, J.R. (1980) Minds, brains, and programs. Behav. Brain Sci. 3, 417–424
7. Srivastava, A. et al. (2022) Beyond the Imitation Game: Quantifying and extrapolating the
capabilities of language modelsarXiv
8. Binz, M. and Schulz, E. (2023) Using cognitive psychology to understand GPT-3. Proc.
Natl. Acad. Sci. 120, e2218523120
9. Lampinen, A.K. et al. (2022) Can language models learn from explanations in
context?arXiv
10. Parr, T. et al. (2022) Active Inference: The Free Energy Principle in Mind, Brain, and
Behavior, MIT Press
11. Clark, A. (2015) Surfing Uncertainty: Prediction, Action, and the Embodied Mind, Oxford
University Press, Incorporated
12. Cannon, W.B. (1929) Organization for Physiological Homeostasis. Physiol. Rev. 9, 399–431
13. Pezzulo, G. and Cisek, P. (2016) Navigating the Affordance Landscape: Feedback Control
as a Process Model of Behavior and Cognition. Trends Cogn. Sci. 20, 414–424
14. Dewey, J. (1896) The Reflex Arc Concept in Psychology. Psychol. Rev. 3, 357–370
15. Merleau-Ponty, M. (1945) Phénoménologie de la perception, Gallimard
16. Clark, A. (1998) Embodied, situated, and distributed cognition. In (Bechtel, W. and Graham,
G., eds), pp. 506–517, Malden, MA: Blackwell
17. Piaget, J. (1954) The Construction of Reality in the Child, Ballentine
18. Gibson, J.J. (1979) The ecological approach to visual perception, Lawrence Erlbaum
Associates, Inc
19. Powers, W.T. (1973) Behavior: The Control of Perception, Aldine
20. Barsalou, L.W. (2008) Grounded cognition. Annu. Rev. Psychol. 59, 617–645
21. Glenberg, A.M. (2010) Embodiment as a Unifying Perspective for Psychology. Wiley
Interdiscip. Rev. 1, 586–596
22. Quigley, K.S. et al. (2021) Functions of Interoception: From Energy Regulation to
Experience of the Self. Trends Neurosci. 44, 29–38
23. Cisek, P. and Kalaska, J.F. (2010) Neural mechanisms for interacting with a world full of
action choices. Annu Rev Neurosci 33, 269–298
24. Wiener, N. (1948) Cybernetics: or Control and Communication in the Animal and the
Machine, The MIT Press
25. Ashby, W.R. (1952) Design for a brain, ix, Wiley
26. Brooks, R.A. (1991) Intelligence without representation. Artif. Intell. 47, 139–159
27. Gibson, J.J. (1977) Perceiving, acting, and knowing: Toward an ecological psychology.
Theory Affordances
28. Cisek, P. (2019) Resynthesizing behavior through phylogenetic refinement. Atten. Percept.
Psychophys. DOI: 10.3758/s13414-019-01760-1
29. Conant, R.C. and Ashby, W.R. (1970) Every good regulator of a system must be a model of
that system. Intl J Syst. Sci.
30. Hohwy, J. (2013) The predictive mind, Oxford University Press
31. Brown, T. et al. (2020) Language Models are Few-Shot Learners. in Advances in Neural
Information Processing Systems, 33, pp. 1877–1901
32. Vaswani, A. et al. (2017) Attention Is All You NeedarXiv
33. Liu, J. et al. (2021) What Makes Good In-Context Examples for GPT-3? arXiv
34. Chambon, P. et al. (2022) Adapting Pretrained Vision-Language Foundational Models to
Medical Imaging DomainsarXiv
35. Team, M.N. (2023) Introducing MPT-7B: A New Standard for Open-Source, Commercially
Usable LLMs[Online]. Available: www.mosaicml.com/blog/mpt-7b
36. Pezzulo, G. et al. (2021) The secret life of predictive brains: what’s spontaneous activity
for? Trends Cogn. Sci.
37. Pezzulo, G. et al. (2019) Planning at decision time and in the background during spatial
navigation. Curr. Opin. Behav. Sci. 29, 69–76
38. Hinton, G.E. et al. (1995) The “wake-sleep” algorithm for unsupervised neural networks.
Science 268, 1158–1161
39. Singer, W. (2021) Recurrent dynamics in the cerebral cortex: Integration of sensory
evidence with stored knowledge. Proc. Natl. Acad. Sci. 118, e2101043118
40. Parr, T. and Pezzulo, G. (2021) Understanding, explanation, and active inference. Front.
Syst. Neurosci. 15, 772641
41. Bastos, A.M. et al. (2012) Canonical microcircuits for predictive coding. Neuron 76, 695–
711
42. Borghi, A.M. et al. (2019) Words as social tools: Language, sociality and inner grounding in
abstract concepts. Phys. Life Rev. 29, 120–153
43. Harnad, S. (1990) The Symbol Grounding Problem. Phys. Nonlinear Phenom. 42, 335–346
44. Pezzulo, G. (2011) The “Interaction Engine”: a Common Pragmatic Competence across
Linguistic and Non-Linguistic Interactions. IEEE Trans. Auton. Ment. Dev.
45. Levinson, S.C. (2006) On the Human “Interaction Engine.” In Roots of Human Sociality,
Routledge
46. Cisek, P. (1999) Beyond the computer metaphor: Behaviour as interaction. J. Conscious.
Stud. 6, 11–12
47. Sugita, Y. and Tani, J. (2005) Learning Semantic Combinatoriality from the Interaction
between Linguistic and Behavioral Processes. Adapt. Behav. 13, 33–52
48. Steels, L. (1995) A self-organizing spatial vocabulary. Artif. Life 2, 319–332
49. Bender, E.M. et al. (2021) On the Dangers of Stochastic Parrots: Can Language Models Be
Too Big? in Proceedings of the 2021 ACM conference on fairness, accountability, and
transparency, pp. 610–623
50. Yamins, D.L.K. and DiCarlo, J.J. (2016) Using goal-driven deep learning models to
understand sensory cortex. Nat. Neurosci. 19, 356–365
51. Bowers, J.S. et al. (2022) Deep Problems with Neural Network Models of Human Vision.
Behav. Brain Sci. DOI: 10.1017/S0140525X22002813
52. Zador, A. et al. (2023) Catalyzing next-generation Artificial Intelligence through NeuroAI.
Nat. Commun. 14, 1597
53. Buzsaki, G. (2019) The brain from inside out, Oxford University Press, USA
54. Theves, S. et al. (2020) The Hippocampus Maps Concept Space, Not Feature Space. J.
Neurosci. 40, 7318–7325
55. McNaughton, B.L. et al. (2006) Path integration and the neural basis of the “cognitive map.”
Nat. Rev. Neurosci. 7, 663–678
56. Cisek, P. (2007) Cortical mechanisms of action selection: the affordance competition
hypothesis. Phil Trans R Soc B 362, 1585–1599
57. Fadiga, L. et al. (2000) Visuomotor neurons: ambiguity of the discharge or ‘motor’
perception? Int J Psychophysiol 35, 165–177
58. Graziano, M.S. (2016) Ethological Action Maps: A Paradigm Shift for the Motor Cortex.
Trends Cogn. Sci. 20, 121–132
59. Ouyang, L. et al. (2022) Training language models to follow instructions with human
feedbackarXiv
60. Mirza, M.B. et al. (2018) Human visual exploration reduces uncertainty about the sensed
world. PLOS ONE 13, e0190429
61. Lindley, D.V. (1956) On a measure of the information provided by an experiment. Ann.
Math. Stat. 27, 986–1005
62. Ortega, P.A. et al. (2021) Shaking the foundations: delusions in sequence models for
interaction and controlarXiv
63. LeCun, Y. (2022) A path towards autonomous machine intelligence version 0.9. 2, 2022-06-
27. Open Rev. 62
64. Merrill, W. (2021) Sequential Neural Networks as AutomataarXiv
65. Pezzulo, G. et al. (2015) Active Inference, homeostatic regulation and adaptive behavioural
control. Prog. Neurobiol. 136, 17–35
66. Clark, A. (2019) Consciousness as Generative Entanglement. J. Philos. 116, 645–662
67. Seth, A.K. (2013) Interoceptive inference, emotion, and the embodied self. Trends Cogn.
Sci. 17, 565–573
68. Seth, A.K. and Friston, K.J. (2016) Active interoceptive inference and the emotional brain.
Phil Trans R Soc B 371, 20160007
69. Barrett, L.F. and Simmons, W.K. (2015) Interoceptive predictions in the brain. Nat. Rev.
Neurosci. 16, 419–429
70. Tschantz, A. et al. (2020) Learning action-oriented models through active inference. PLOS
Comput. Biol. 16, e1007805
71. Mannella, F. et al. (2021) Active inference through whiskers. Neural Netw. Off. J. Int.
Neural Netw. Soc. 144, 428–437
72. Pezzulo, G. et al. (2017) Model-Based Approaches to Active Perception and Control.
Entropy 19, 266
73. Kato, S. et al. (2015) Global brain dynamics embed the motor command sequence of
Caenorhabditis elegans. Cell 163, 656–669
74. Pezzulo, G. et al. (2018) Hierarchical Active Inference: A Theory of Motivated Control.
Trends Cogn. Sci. 0
75. Parr, T. et al. (2023) Cognitive effort and active inference. Neuropsychologia 184, 108562
76. Pezzulo, G. and Castelfranchi, C. (2009) Thinking as the Control of Imagination: a
Conceptual Framework for Goal-Directed Systems. Psychol. Res. 73, 559–577
77. Pezzulo, G. et al. (2021) The evolution of brain architectures for predictive coding and
Active InferencePhilosophical Transactions of the Royal Society B: Biological Sciences
78. Pezzulo, G. and Castelfranchi, C. (2007) The Symbol Detachment Problem. Cogn. Process.
8, 115–131
79. Park, J.S. et al. (2023) Generative Agents: Interactive Simulacra of Human BehaviorarXiv
80. Dayan, P. et al. (1995) The Helmholtz machine. Neural Comput. 7, 889–904
81. Parr, T. and Friston, K.J. (2018) The Anatomy of Inference: Generative Models and Brain
Structure. Front. Comput. Neurosci. 12
82. Taniguchi, T. et al. (2023) World Models and Predictive Coding for Cognitive and
Developmental Robotics: Frontiers and ChallengesarXiv
83. Schmidhuber, J. (2006) Developmental robotics, optimal artificial curiosity, creativity,
music, and the fine arts. Connect. Sci. 18, 173–187
84. Attias, H. (2003) Planning by Probabilistic Inference. in Proceedings of the Ninth
International Workshop on Artificial Intelligence and Statistics
85. Botvinick, M. and Toussaint, M. (2012) Planning as inference. Trends Cogn. Sci. 16, 485–
488
86. Lanillos, P. et al. (2021) Active Inference in Robotics and Artificial Agents: Survey and
ChallengesarXiv
87. Friston, K.J. et al. (2017) Active Inference, Curiosity and Insight. Neural Comput. 29,
2633–2683
88. Clark, A. (2001) Mindware: An Introduction to the Philosophy of Cognitive Science, Oxford
University Press
89. Dziri, N. et al. (2023) Faith and Fate: Limits of Transformers on CompositionalityarXiv
90. Jardri, R. and Denève, S. (2013) Circular inferences in schizophrenia. Brain J. Neurol. 136,
3227–3241
91. Sejnowski, T.J. (2023) Large language models and the reverse turing test. Neural Comput.
35, 309–342
92. Aru, J. et al. (2023) The feasibility of artificial consciousness through the lens of
neurosciencearXiv
93. Jones, C.R. et al. (2022) Distrubutional Semantics Still Can’t Account for Affordances.
Proc. Annu. Meet. Cogn. Sci. Soc. 44
94. Huang, S. et al. (2023) Language Is Not All You Need: Aligning Perception with Language
ModelsarXiv
95. Majumdar, A. et al. (2023) Where are we in the search for an Artificial Visual Cortex for
Embodied Intelligence? ArXiv Prepr. ArXiv230318240
96. Bubeck, S. et al. (2023) Sparks of Artificial General Intelligence: Early experiments with
GPT-4arXiv
97. Chalmers, D. (2020) Gpt-3 and general intelligenceDaily Nous
98. Katz, D.M. et al. (2023) GPT-4 Passes the Bar Exam
99. Noever, D. et al. (2020) The Chess Transformer: Mastering Play using Generative Language
ModelsarXiv
100. Li, B.Z. et al. (2021) Implicit Representations of Meaning in Neural Language ModelsarXiv
101. Patel, R. and Pavlick, E. (2022) Mapping language models to grounded conceptual spaces.
in International Conference on Learning Representations
102. Abdou, M. et al. (2021) Can language models encode perceptual structure without
grounding? a case study in color. ArXiv Prepr. ArXiv210906129
103. Kosinski, M. (2023) Theory of mind may have spontaneously emerged in large language
models. ArXiv Prepr. ArXiv230202083
104. Shapira, N. et al. (2023) Clever Hans or Neural Theory of Mind? Stress Testing Social
Reasoning in Large Language ModelsarXiv
105. Jansen, P. (2020) Visually-Grounded Planning without Vision: Language Models Infer
Detailed Plans from High-level Instructions. in Findings of the Association for Computational
Linguistics: EMNLP 2020, Online, pp. 4412–4417
106. Yun, T. et al. (2021) Does Vision-and-Language Pretraining Improve Lexical
Grounding?arXiv
107. Merullo, J. et al. (2023) Linearly Mapping from Image to Text SpacearXiv
108. Friston, K. et al. (2015) Knowing one’s place: a free-energy approach to pattern regulation.
J R Soc Interface 12
109. Veissière, S.P. et al. (2020) Thinking through other minds: A variational approach to
cognition and culture. Behav. Brain Sci. 43
110. Winn, J. et al. (2005) Variational message passing. J. Mach. Learn. Res. 6
111. Friston, K. (2010) The free-energy principle: a unified brain theory? Nat. Rev. Neurosci. 11,
127–38
112. Berger, J.O. (2013) Statistical decision theory and Bayesian analysis, Springer Science &
Business Media
