# Embodying the Avatar in Videogames

Videogames are a pervasive part of the lives of children and adults alike, with
73% of Americans older than 2 years engaging with them {% cite npd2019videogames %}.
Playing videogames can be seen as an activity done through our fingertips, with
our visual apparatus focused on a screen and without involvement of the rest of
our body, and it is usually considered as such from a cognitivist point of view
{% cite campbell2012video %} {% cite gee2003video %}
{% cite klimmt2006effectance %}. This raises the question of whether videogames
can alternatively be thought of as an embodied experience and, if so, how we can
formulate them as such and what factors are at play.

Virtual reality videogames are more commonly studied from an embodied
perspective, since they lend themselves to the framework more easily: they
engage the whole body and offer an immersive experience. The same question can,
however, be asked of non-virtual reality games played with a keyboard and mouse
or a controller, and a screen.
  
We will first discuss what we mean by embodiment when we say that playing
videogames is an embodied experience, since this is a very important part of our
discourse. We then discuss what motivates us to think that videogames fit such
notions of embodied experience, and from there we ask further questions about
the factors at play, including, but not limited to, camera control and
perspective, their relationship with peripersonal space, and the social aspect
of videogames.

# Did Somebody Say Embodiment?

The question of whether we can think of playing videogames as an embodied
experience is quite puzzling, and it requires unraveling unanswered questions
about what embodiment means, how we distinguish it from everything else, and
how something like playing videogames fits into this picture. There are
different accounts of embodiment, and they stand in contrast to cognitive
psychological accounts. Cognitive psychology accounts study mental processes,
which are usually associated with the brain, and the body is thought of as an
input and output interface with the world that is controlled by the brain
{% cite neisser2014cognitive %} {% cite anderson1980cognitive %}. Since there
are numerous accounts of embodied experience, we will review some of them and
lay out our own understanding of embodiment, one which allows us to discuss
videogames in its light.

{% cite thelen2000grounded %} gives an account that focuses mostly on the fact
that our experiences arise because we have a particular kind of body with
particular capacities and apparatus that lead to us experiencing the world as we
do. This might be one of the most high-level accounts that shares a considerable
amount with most other embodiment accounts:

> "\[T\]o say that cognition is embodied means that it arises from bodily
> interactions with the world. From this point of view, cognition depends on the
> kinds of experiences that come from having a body with particular perceptual
> and motor capacities that are inseparably linked and that together form the
> matrix within which memory, emotion, language, and all other aspects of life
> are meshed."

With this account, it is necessary to consider the body as a constitutive part
of cognition, not merely an input/output system controlled by the brain.
Questions about cognition only make sense with consideration of the way we
interact with the world with our bodies.

Merleau-Ponty's phenomenological account unifies the body and the mind. Instead
of talking about them separately, he proposes talking about an intentional,
lived body that is continuously adapting to the world through the formation of
habits:

> The body's orientation toward the world is essentially temporal, involving a
> dialectic between the present body (characterized, after Husserl, as an "I
> can") and the habit body, the sedimentations of past activities that take on a
> general, anonymous, and autonomous character. \[..\] it has affective
> experiences that are not merely representations; and its kinesthetic sense of
> its own movements is given directly.
>
> This kinesthetic awareness is made possible by a pre-conscious system of
> bodily movements and spatial equivalences that Merleau-Ponty terms the "body
> schema". In contrast with the "positional spatiality" of things, the body has
> a "situational spatiality" that is oriented toward actual or possible tasks.
> The body's existence as "being-toward-the-world", as a projection toward lived
> goals, is therefore expressed through its spatiality, which forms the
> background against which objective space is constituted. \[..\]
>
> The body's relationship with space is therefore intentional, although as an "I
> can" rather than an "I think"; bodily space is a multi-layered manner of
> relating to things, so that the body is not "in" space but lives or inhabits
> it. {% cite sep-merleau-ponty %}

Merleau-Ponty's account requires substantial consideration when we talk about
embodiment in videogames, since his terminology and framework make it easier to
express what we are trying to affirm in this report. When we talk about
embodiment, we are using Merleau-Ponty's framework, along with anecdotes and
inspiration from other frameworks, which we will mention as we go.

Let's consider one of the most important pillars of this account: our existence
in the world is intentional, and our body, with all of its habits and its
capabilities, shapes our intentional stance towards the world, since it is our
body that narrows our "I can" from an endless list of possibilities down to the
way we live right now. Cognition need not be thought of as perceiving, thinking
(or processing), and then acting. Rather, we live in direct interaction with the
world, and perceiving, thinking and acting are no longer separated and no longer
representational. Through our long-formed habits, our spatial presence and a
body schema that shapes our capabilities towards the world around us, the world
appears to us directly, with meanings and values.

The body schema and our ability to morph this body schema through our
interactions with tools and in different contexts is vital to our discourse.
Merleau-Ponty's account allows for our body schema, which is what shapes our
intentional stance towards the world, to be changed as we incorporate tools and
certain environments into our lives. His famous example of a blind man's stick
is worth mentioning:

> "When the cane becomes a familiar instrument, the world of tactile objects
> expands, it no longer begins at the skin of the hand, but at the tip of the
> cane.
>
> \[..\] the cane is no longer an object that the blind man would perceive, it
> has become an instrument with which he perceives. It is an appendage of the
> body, or an extension of the bodily synthesis."
> {% cite merleau1962phenomenology %}

Andy Clark gives a similar account when talking about our embodied experience of
using virtual reality headsets:

> The infant, like the VR-exploring adult, must learn how to use internally
> unresponsive hands, arms and legs to obtain its goals.
>
> \[..\]
>
> With time and practice, enough bodily fluency is achieved to make the wider
> world itself directly available as a kind of unmediated arena for embodied
> action. At this point, the extrabodily world becomes poised to present itself
> to the user not just as a problem space (though it is clearly that) but as a
> problem-solving resource. For the world, specially when encountered via
> inhabited interaction, is a place in which we can act fluently in ways that
> simplify or transform the problems that we want to solve. At such moments, the
> body has become "transparent equipment": equipment that is not the focus of
> attention in use. Instead the user "sees through" the equipment to the task in
> hand. When you sign your name, the pen is not normally your focus. The pen in
> use is no more the focus of your attention than is the hand that grips it.
> Both are transparent equipment. {% cite clark2008supersizing %}

To summarise what we mean by embodiment as we talk about it here:
	
1. Cognition depends on our body as a whole, and our experiences that arise are
   specifically tailored by our body and its particular features.

2. Our body has an intentional stance towards the world, and this intentional
   stance is dependent on our habits, and is limited by the capacities of our
   body.

3. The "body schema" is what allows for our pre-conscious kinaesthetic awareness
   of our body in a "situational" sense, oriented towards possible tasks.

# Videogaming as an Embodied Activity

Whether videogaming can be treated as an embodied activity is not a simple
question, and our discussion here should not be taken for granted. There are
many complexities involved in attributing something as complex as "embodiment"
to an activity as complex as playing videogames. The notion is puzzling, but it
is nevertheless worth consideration.
	
Given the framework described, we can now formulate videogaming as an embodied
activity. A more familiar example of what we are trying to formulate is driving
a car, which is a common example in discussions of embodiment in cognitive
science. When we drive a car as a proficient driver, we manoeuvre by considering
what we want to do and acting towards that goal directly, without focusing on
how we do this using the gears, the clutch, the brake pedal, the wheel, and so
on. We might be taking three or four actions at the same time, e.g. when
reversing into a turn: brake in, clutch in, wheel to one side, change gear to
reverse, look in the mirrors. Yet we are mostly thinking about where we want to
go, not about the details and specifics of our interactions with the car's
interface. As in the example of the blind man and the cane, the apparatus has
become transparent, and our body schema now includes the car. We decide that we
want to reverse and turn to one side, and given our new intentional stance
towards the world, which is both limited and extended by the car, we consider
ourselves capable of doing so. The way we question and talk about the world
changes, too: we ask "do I fit here?" when wondering whether we can pass through
a narrow passage with the car. We are now embodying a new intentional stance
towards the world, and it is this new body schema to which our actions are
attributed.

Videogames are similar, with the difference that instead of sitting inside a car
that moves spatially in the world, our human body sits in one place, but we
still go places in the game-world. A proficient gamer is not concerned with the
buttons they press or how they move the mouse; they are directly concerned with
what they do in the game-world. A new intentional stance arises towards the
game-world, one that is defined by the avatar that we embody in the videogame.

Our body schema is now extended to include the avatar in the game-world, and
this new body schema limits what our human body does (just as in driving a car,
where some of our body is not actively used towards our goals). *We* now want to
*climb* things with our new intentional body and *shoot* the monsters, and we
feel real anxiety and stress (we may even sweat) when we are hiding in a stealth
game. *We* are afraid of being found out, and when we are hit by enemies or fall
from a height, our human body tenses, and we sometimes even get the sinking
feeling of falling in our stomach (this can depend a lot on the camera of the
videogame, which we will discuss below). When playing a car or motorcycle racing
game, our human body inevitably leans in as we turn in the game-world.
Videogames have structured worlds, with rules that make them predictable enough
to an experienced player, much like the real world. This can lead us to believe
that we have control over the world and can take guided actions towards certain
ends. A high correspondence between our interactions with the interface that
connects us to our extended body in the game-world (e.g. the game controller, or
the keyboard and mouse) and the visual and proprioceptive feedback we receive
might be the key to creating a strong sense of ownership of actions
{% cite martin1995bodily %} {% cite tsakiris2005experimenting %}.

Besides the notion of embodiment that we have been discussing so far, there are
other kinds of embodiment. Social embodiment seems to be a slightly more
ambiguous and challenging notion that must be considered with care, but consider
{% cite barsalou2003social %}'s account of social embodiment effects:

> "First, perceived social stimuli do not just produce cognitive states, they
> produce bodily states as well. Second, perceiving bodily states in others
> produces bodily mimicry in the self. Third, bodily states in the self produce
> affective states. Fourth, the compatibility of bodily states and cognitive
> states modulates performance effectiveness"

Real-time online videogames can exhibit similar effects. I may walk with my
avatar towards a friend's avatar in the game-world and wave my hand, leading
them to wave theirs, and as I start walking away they might follow me, and we
may start an activity together without any need for verbal or text
communication, solely through the bodily states of our avatars. We have learned
the affordances of our new environment and our new extended body, and those of
our fellow players.

# Camera, Avatar and Controller Relations

The camera-avatar relationship and the input interface are important factors to
be considered when asking questions about embodiment of the experience, so it is
necessary to consider these factors more explicitly.

<figure id="fig:first-person" class="row">

  <img alt="First-person view" src="/img/embodying-the-avatar/first-person.jpg"
  width="30%"/>

  <img alt="Third-person view" src="/img/embodying-the-avatar/third-person.jpg"
  width="30%"/>

  <img alt="Isometric view" src="/img/embodying-the-avatar/diablo-view.jpg"
  width="30%"/>
  
  <div class="break"></div>

  <figcaption>Different video-game camera modes. From left to right:
  First-person view, Third-person view and Isometric view.</figcaption>
  
</figure>

<figure id="fig:dota" class="row">

  <img alt="Camera's independence from the avatar in Dota2"
  src="/img/embodying-the-avatar/dota-1.png" width="30%" />

  <img alt="Camera's independence from the avatar in Dota2"
  src="/img/embodying-the-avatar/dota-2.png" width="30%" />

  <img alt="Camera's independence from the avatar in Dota2"
  src="/img/embodying-the-avatar/dota-3.png" width="30%" />

  <div class="break"></div>

  <figcaption>Camera's independence from the avatar in Dota 2.</figcaption>
  
</figure>

Most research around this subject seems to focus on the first-person view, where
the player looks out through the avatar's eyes or head and can usually see only
the avatar's arms (Figure [1](#fig:first-person)). This is the view adopted by
almost all virtual reality games and by many shooter games. The controller used
with this type of view is either a dual-axis controller or a mouse and keyboard,
where the character is moved with keys on the keyboard and the camera (or
rather, the head of the avatar!) is moved using the mouse. This camera-avatar
relation and interface fit the research literature well, since they are the ones
researchers most often consider directly.

Another common view in videogames is the third-person view, where the camera
moves along with the avatar as the avatar moves (Figure
[1](#fig:first-person)). The camera can usually look around the avatar by
rotating in place, but it can never move away from the avatar along any axis.
This view is similarly accompanied by either a dual-axis controller or a
keyboard and mouse, where the keyboard is used to move the avatar (and the
camera with it) while the mouse is used to rotate the camera.

Note that these two camera modes, albeit similar in some aspects, lend us
completely different body schemas and strongly change our intentional stance.
This is well illustrated by the online multiplayer videogame Dead by Daylight,
in which a group of survivors tries to escape from a killer who hunts them, with
every role played by an actual player. What is interesting is that the survivors
and the killer use different camera views, and this is an important distinction
between the two. Survivors have a third-person camera, which allows them to
rotate the camera and look behind them as they run away or as they repair a
generator in order to escape. This also means that the survivors' avatars do not
move their heads as the camera is moved. The killer, by contrast, has a
first-person camera: the killer can only look in the direction in which they are
moving, which lets survivors tell where the killer is currently looking simply
by looking at them. These two roles are played very differently, mostly because
of the camera movement: each player has a different body schema depending on
which camera mode they have.

A less common type of camera-avatar relation, though one still discussed in the
literature, is that of isometric cameras locked on the character, as found in
the Diablo game series. This kind of camera-avatar relation is very similar to a
third-person view, with the difference that the camera takes an isometric angle
and is not controlled by the user at all, merely following the avatar. The
controls used for this kind of game are usually either a dual-axis controller
or, in the case of keyboard and mouse, the mouse rather than the keyboard, which
moves the avatar by issuing commands to go to a certain place. This type of
movement control might seem unintuitive, but {% cite klevjer2012enter %}
proposes that "because the clicking happens so fast, the experience nevertheless
approaches a sense of 'pulling' the avatar through a tangible interface." As
such, the control interface can still create a sense of high correspondence
between the player's actions and the movements of the avatar, reaching real-time
synchrony as the player masters the control interface.

What is common between these three camera-avatar relations is the tight coupling
of the camera with the avatar: the camera always follows the avatar as the
avatar moves around the world. In some cases, the camera can be rotated or moved
around slightly to peek around a box while crouching for example, but almost
always the camera and the avatar are in tight synchrony. {% cite
klevjer2012enter %} considers all of these camera modes to fall under the same
umbrella of camera indirectly controlled by the movements of the avatar, as if
the camera is pulled by the avatar around with an invisible string.

This group of camera-avatar relations can be considered intuitive and similar to
how we as humans almost always have a synchrony between our vision and our body,
with the exception of cases like out-of-body experiences, where a person sees
the world and their own body from a place outside of their physical body
{% cite blanke2004out %}. However, there are videogames where something
analogous to an out-of-body experience happens: those where the camera is not
automatically attached to the avatar and the player instead has manual control
of it. This camera-avatar relation is most characteristic of MOBA (multiplayer
online battle arena) games such as Dota 2, where the camera angle in relation to
the avatar is very similar to that of Diablo, with the difference that the mouse
is used not only to move the avatar but also to pan the camera across the world
(Figure [2](#fig:dota)).
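
The difference between a camera that is "pulled" by the avatar and one that the
player pans independently can be sketched in a few lines. This is only a minimal
illustration under our own assumptions, not how any particular game engine
implements its cameras: the names `Vec2`, `FollowCamera` and `FreeCamera`, the
2D coordinates and the fixed offset are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Vec2:
    """A 2D position or displacement on the game map (hypothetical helper)."""
    x: float
    y: float

    def __add__(self, other: "Vec2") -> "Vec2":
        return Vec2(self.x + other.x, self.y + other.y)


class FollowCamera:
    """Camera 'pulled' by the avatar: its position is derived from the avatar's."""

    def __init__(self, offset: Vec2) -> None:
        self.offset = offset
        self.position = Vec2(0.0, 0.0)

    def update(self, avatar_position: Vec2, pan_input: Vec2) -> None:
        # Pan input is ignored; the camera simply tracks the avatar.
        self.position = avatar_position + self.offset


class FreeCamera:
    """Camera panned by the player, independent of where the avatar is."""

    def __init__(self, position: Vec2) -> None:
        self.position = position

    def update(self, avatar_position: Vec2, pan_input: Vec2) -> None:
        # Avatar movement is ignored; only explicit panning moves the camera.
        self.position = self.position + pan_input


avatar = Vec2(0.0, 0.0)
follow = FollowCamera(offset=Vec2(0.0, -5.0))
free = FreeCamera(position=Vec2(0.0, -5.0))

# The avatar walks 10 units to the right; the player does not pan.
avatar = avatar + Vec2(10.0, 0.0)
follow.update(avatar, pan_input=Vec2(0.0, 0.0))
free.update(avatar, pan_input=Vec2(0.0, 0.0))

print(follow.position)  # Vec2(x=10.0, y=-5.0)
print(free.position)    # Vec2(x=0.0, y=-5.0)
```

In the first case the avatar's movement is enough to carry the player's view
along; in the second, keeping the avatar in view is itself an action the player
has to perform.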

In these videogames you are allowed to look at the world and your avatar from
any place, and given our framework, the camera is now a novel extension to our
body schema. Most embodied activities, such as walking, swimming, driving a car,
and most cases of playing videogames too, exhibit the same synchrony of vision
and body. In this case, however, we have a new range of intentional acts
available to us through movement of the camera around the world. Our body schema
now includes a different apparatus to work with: it is as if our vision were no
longer limited to our body, and there were instead a drone above us that we
could see from.
	
This opens up the possibility of a new kind of visual interaction with the
world. When the stakes are high, as is the case with e-sports, players strive
for the ultimate proficiency with their new body schema, and the result is ways
of using vision that are unusual and can sometimes seem cryptic to us. The
camera movement of professional players tends to be very fast, and sometimes
outright chaotic to an untrained eye, because they want to scout for as much
information as possible while still keeping an eye on their avatar. The avatar
remains the most important part of the game, and the free-form camera movement
is mostly used like a pair of binoculars: to scout for information.

# Further Questions

There is a question to be asked here about how much this technical camera-avatar
independence leads to actual camera-avatar independence: do players actually end
up with their camera away from their avatar much of the time, or is the camera
still in synchrony with the avatar for the majority of the time, only in a
manner directly controlled by the player rather than automatically?
	
Does the camera-avatar independence affect the embodied experience of playing
such a videogame, perhaps by making it more difficult to become proficient in
the game? It is initially harder to extend your body schema with this new form
of vision, but what happens once you are proficient?
	
Competitive e-sports increase the stakes and motivate players to strive for the
utmost proficiency in a videogame. This usually leads to players being very
creative and highly skilled in using the interface available to them (e.g. mouse
and keyboard or a controller). In MOBA games with independent cameras, players
reach very high actions-per-minute numbers: in the last game of the largest
competition for Dota 2, The International 10, the players averaged 303 actions
per minute, which is about 5 actions per second, not including camera movements
(camera movements are fluid and continuous rather than discrete, hence their
exclusion from the count).
{% cite dotabuff-true-sight %}
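
To make the scale of that figure concrete, the conversion can be worked through
as below. This is only a back-of-the-envelope sketch: the 303 actions per minute
is the figure quoted above, while the variable names and the assumption that
actions are spread evenly over time are ours.

```python
ACTIONS_PER_MINUTE = 303  # average quoted in the text above
SECONDS_PER_MINUTE = 60

# Average rate of discrete actions (camera movement excluded, as in the text).
actions_per_second = ACTIONS_PER_MINUTE / SECONDS_PER_MINUTE

# Average gap between two consecutive actions, assuming they are evenly spread.
ms_between_actions = 1000 / actions_per_second

print(f"{actions_per_second:.2f} actions per second")
print(f"{ms_between_actions:.0f} ms between actions on average")
```

Under that even-spacing assumption, a new discrete action is issued roughly
every fifth of a second.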
		
{% cite dolezal2009remote %} considers the question of action-ownership and
stakes with regards to telesurgery and embodiment. She stresses the importance
of a feeling of agency towards the task at hand, and proposes that high-fidelity
technologies could help induce a sense of agency and ownership of action.
	
A similar question can be asked about videogames: when the stakes are high, as
in competitions with millions of dollars on the line, do players think of the
actions they take in the game as their own? Do they feel complete agency over
their actions in the game? What factors are at play here?

# Conclusion

Videogames are usually formulated under information-processing cognitive models
when studied in cognitive science, but on a closer look they can be considered
an embodied activity given the right framework. Here we have used
Merleau-Ponty's notions of the intentional stance and the body schema as a
framework to formulate how playing a videogame might be considered an embodied
activity.
	
Camera-avatar relations are an important factor affecting our intentional stance
in a videogame, and they lend us different body schemas, from first-person and
third-person camera views to an independent isometric camera that is controlled
by the player. Independent cameras in videogames allow for a novel extension to
our body schema: an apparatus for vision that can move independently of the
body.
	
There are still many questions left to be explored on the topic, and rightly so,
as the notion of videogames as an embodied activity is fairly perplexing and
requires a lot more exploration and study until we reach a more holistic
understanding of it.

{% bibliography --cited %}