[MUSIC] When I have been talking about body language, I've mostly been concentrating on how to simulate it for agents, virtual characters that are completely controlled by the computer. But as well as agents, we also have avatars, virtual characters that represent and are controlled by users. Avatars are really important for social VR, and they also need body language, so that people can take part in realistic social interactions via their avatars.

The challenge of creating an avatar with body language is very different from creating an agent, and the good news is that it's potentially easier. With agents, you have to generate body language for a character in real time, and that body language should respond to the user. With an avatar, the user controlling it will already be doing body language; we just have to reflect that body language on the avatar. This means we have to track the user's body language so we can reproduce it on the avatar. This is a harder tracking problem than for agents. For agents we just need enough information to respond to a user, but for avatars we need to completely reproduce that body language. That means we need more information.

The good news is that we can do quite a lot with standard VR kit. If you have a high-end VR setup with hand tracking, you can capture quite a lot: audio from speech, head movements, and hand gestures. This means that avatars can represent all of our verbal communication and a lot of our non-verbal communication. The audio gives us speech, the hand trackers give us gestures, and the head tracker gives us gestures like nods. Because high-end trackers give us position, they can also be useful for proxemics, knowing how far you are from other people, and for getting some idea of posture shifts. Mobile VR systems are a lot more limited, giving us only audio and head rotation. These can still provide important social signals, but we should be aware that mobile VR avatars will have much reduced capabilities.

All current VR systems also have important drawbacks. They can't track facial expression. This isn't just a drawback of current systems, it's a problem with head-mounted displays in general: you can't really track the face if it's covered by a big headset, though some people are trying to use muscle sensors embedded in an HMD to track expressions. Similarly, you can't track the eyes either, though it seems quite likely that HMDs with eye tracking will be released soon; the FOVE headset that's recently been released already supports eye tracking. But for those without eye tracking, all we can represent is whether the user is facing someone else, not where they're actually looking.

Finally, a typical high-end setup will track three points on the body: the head and two hands. These are very important, but very far from capturing all of our body language. There are full-body motion capture systems that can be integrated with VR, but these can be expensive. Body language tracking could actually be improved massively by adding just one point, the chest. If all I have is a head tracker, it's hard to know if I'm moving my body forward or just poking my head forward. A chest tracker would sort this out, and it would also make it much easier to represent posture.

Representing the movement we do have is fairly straightforward. We can map the hand controllers onto the movement of the hands using inverse kinematics.
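As a rough illustration, here is a minimal sketch in Python of how the three tracked points might drive an avatar. The rig, attribute names, and threshold values are all hypothetical assumptions made for the sketch, not the API of any particular engine, and it also includes a simple version of the head-versus-body split discussed in the next paragraph.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Vec3:
    """Minimal 3D vector; a real engine would supply its own maths types."""
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0

    def __sub__(self, other: "Vec3") -> "Vec3":
        return Vec3(self.x - other.x, self.y - other.y, self.z - other.z)

    def length(self) -> float:
        return math.sqrt(self.x ** 2 + self.y ** 2 + self.z ** 2)


@dataclass
class TrackedPose:
    """One tracked point: position in metres plus yaw in degrees."""
    position: Vec3
    yaw: float


@dataclass
class AvatarRig:
    """Hypothetical avatar rig; all of these attribute names are assumptions."""
    body_position: Vec3 = field(default_factory=Vec3)
    body_yaw: float = 0.0
    head_offset: Vec3 = field(default_factory=Vec3)
    head_yaw: float = 0.0
    left_hand_ik_target: Optional[TrackedPose] = None
    right_hand_ik_target: Optional[TrackedPose] = None


# Illustrative thresholds, not calibrated values.
HEAD_POSITION_THRESHOLD = 0.15  # metres: smaller translations stay on the head
HEAD_ROTATION_THRESHOLD = 45.0  # degrees: smaller yaw changes stay on the head


def wrap_degrees(angle: float) -> float:
    """Wrap an angle into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def apply_tracking(avatar: AvatarRig, head: TrackedPose,
                   left_hand: TrackedPose, right_hand: TrackedPose) -> None:
    # Hands: pass the controller poses to the arm IK solver as end-effector
    # targets and let it work out the elbow and shoulder angles.
    avatar.left_hand_ik_target = left_hand
    avatar.right_hand_ik_target = right_hand

    # Head position: a small offset from the body is treated as head/neck
    # movement; a larger one moves the whole body.
    offset = head.position - avatar.body_position
    if offset.length() < HEAD_POSITION_THRESHOLD:
        avatar.head_offset = offset
    else:
        avatar.body_position = head.position
        avatar.head_offset = Vec3()

    # Head rotation: a moderate yaw change is the head turning; a large one
    # is more likely the whole body. With only three tracked points this is
    # a guess, and it will sometimes be wrong.
    yaw_delta = wrap_degrees(head.yaw - avatar.body_yaw)
    if abs(yaw_delta) < HEAD_ROTATION_THRESHOLD:
        avatar.head_yaw = yaw_delta
    else:
        avatar.body_yaw = head.yaw
        avatar.head_yaw = 0.0


# Example: the user leans their head 5 cm forward and turns it 30 degrees.
rig = AvatarRig(body_position=Vec3(0.0, 1.7, 0.0))
apply_tracking(
    rig,
    head=TrackedPose(Vec3(0.0, 1.7, 0.05), yaw=30.0),
    left_hand=TrackedPose(Vec3(-0.3, 1.2, 0.4), yaw=30.0),
    right_hand=TrackedPose(Vec3(0.3, 1.2, 0.4), yaw=30.0),
)
# Both movements are under the thresholds, so they map onto the head
# rather than the whole body.
```

In practice the thresholds would need tuning, and some smoothing or hysteresis would help stop the avatar's body snapping around whenever a value crosses a threshold.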
For the head tracker, we have to decide whether to map the movement onto the head or the whole body. For position tracking, it's normally more realistic to assume that we're moving our whole body, though we might want to map very small movements onto the head. For rotation, it's harder: moderate rotations are likely to be with the head, and large ones with the body. You just have to be aware that with limited information, you will get it wrong sometimes.

What do we do about features we don't have information about? What about eyes or facial expression? We could generate them procedurally, just like we would for an agent. This could produce realistic animation, but there's a big problem: it is not the real movement the user is making. If I'm describing an emotionally intense life experience to another person, I'm naturally, and probably subconsciously, going to read their reaction from their facial expression and gaze. But if these are automatically generated, they have nothing to do with what the real person is actually doing. There might just be some generic looking around and jolly smiling that is completely inappropriate to the situation. Worse still, because I'm interpreting this subconsciously, I'm probably not even aware that I'm making these judgments. I could end up feeling insulted by this person, through no fault of their own, without even knowing why.

Oculus made an interesting decision with their Oculus Avatars. If you look at them, they only visually represent information that can be tracked. They only have hands, heads, and a bit of body, and their eyes and most of their faces are covered with large sunglasses. This means they can represent the available tracking information accurately, but where there's no information, they aren't being misleading. While I think there's potential for using automatically generated behavior or better tracking for avatars, I do think that Oculus's decision was a wise one. [MUSIC]