Reconstructing Reality
When I started this book, I had a plan for where I wanted it to go and what I wanted to cover. Some issues have cropped up along the way, like a Simulator that isn’t quite capable of fully simulating the Vision Pro and some code that just doesn’t quite match the developer documentation. Nonetheless, I have persevered, and you are now in the home stretch! I’m pleased to say that with the technologies covered in this chapter, you’ll have a leg up on many of the other visionOS developers I’ve chatted with.
You’re going to be using the data provider pattern established in Chapter 7, “Anchors and Planes,” with additional data providers to bring more of the real world into your applications. In the Plane Detection hands-on, you may have noticed that the planes weren’t quite as precise as you might hope, and objects placed in your scenes are still visible even if you walk into a different room. This chapter is going to solve those problems using the computing horsepower of the Apple Vision Pro.
This chapter focuses on three useful topics:
Hand-tracking: In Chapter 7, you used a hand AnchorEntity to attach objects to your left and right hands. Using the full ARKit hand-tracking provider, however, you can (and will) monitor each finger joint.
Scene reconstruction: See the world around you? When wearing your Vision Pro, you can literally see whatever is in your environment thanks to the high-resolution displays. However, that world is just an image. Yes, you can use a plane detector to find walls and tabletops, but with scene reconstruction, you can represent all the nooks and crannies as well.
Occlusion: Occlusion means to hide or block, and it’s something you experience in reality all the time. Walls hide the outdoors, closets hide your clothes, and basements hide unspeakable terrors. With the tools you’ve used up to this point, nothing hides your virtual objects (except other virtual objects). Using occlusion magic, you can make objects in the real world cover virtual objects to deliver much more immersive experiences.
Once again, what you’re working on is going to require a real Apple Vision Pro. The simulator just can’t provide the sensor access needed.
Hand-Tracking
Most VR and pseudo-AR headsets require the use of handheld controllers that present themselves as “hands” within your view. This is generally fine for gaming, but it doesn’t take long before your brain registers the disconnect between what you’re seeing on the screen versus what your hands are really doing. The Apple Vision Pro is designed to use your hands as its controllers, and it does so with almost alarming accuracy.
The hand-tracking you used in the last chapter is fun and can certainly create some interesting effects, but it has very little flexibility in terms of interactions. Wouldn’t you like to interact with objects using more than just a fingertip and a thumb? A hand-targeted AnchorEntity is easy to use, but by employing ARKit with a HandTrackingProvider (https://developer.apple.com/documentation/arkit/handtrackingprovider), you can track up to 27 different joints per hand.
Hand-tracking works in the same way as the PlaneDataDetector, as sketched in the code after these steps:
You create an ARKit session with ARKitSession().
A data provider is created. For hand-tracking, this is done with HandTrackingProvider().
The ARKit session is run with the tracking provider.
Updates arrive containing a HandAnchor.
You process the updates however you want!
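Here is a minimal sketch of those five steps in Swift. The names session, handTracking, and startHandTracking() are placeholders of my own, not the chapter’s project code, and your app’s Info.plist needs the NSHandsTrackingUsageDescription key before the session will run.

import ARKit

let session = ARKitSession()                  // 1. Create an ARKit session.
let handTracking = HandTrackingProvider()     // 2. Create the hand-tracking data provider.

func startHandTracking() async {
    do {
        try await session.run([handTracking]) // 3. Run the session with the tracking provider.
    } catch {
        print("Unable to start hand tracking: \(error)")
        return
    }

    for await update in handTracking.anchorUpdates {  // 4. Updates arrive containing a HandAnchor.
        let anchor = update.anchor
        print(anchor.chirality)                        // 5. Process the update however you want!
    }
}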
Hands are different from planes, and so is the data that hand anchors provide. Let’s take a look at ARKit’s HandAnchor and what information it contains.
ARKit’s HandAnchor
An ARKit hand anchor tracks a hand’s position in 3D space and provides three useful properties you’ll access in your upcoming code:
.originFromAnchorTransform: The location and orientation of the base of the hand in world space.
.chirality: The “handedness” of the update. In other words, the .right or .left hand.
.handSkeleton: Access to the individual joints in the hand, along with the location of each joint in relation to the base of the hand.
Of these, I’d like to believe that your interest gravitates toward handSkeleton—because who doesn’t like a skeleton? Read more about HandAnchors at https://developer.apple.com/documentation/arkit/handanchor.
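To make those properties concrete, here’s a quick, hypothetical look at reading them from a HandAnchor, assuming anchor arrived in one of the data provider’s updates:

let wristInWorld = anchor.originFromAnchorTransform   // the hand's base (wrist) pose in world space
let whichHand = anchor.chirality                      // .left or .right
if let skeleton = anchor.handSkeleton {               // optional; nil when the joints aren't available
    print("The \(whichHand) hand reports \(skeleton.allJoints.count) joints.")
}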
Hand Skeletons and Joints
The .handSkeleton property is an instance of a HandSkeleton data structure. Within the skeleton is a collection of joints, with associated names and transformations.
That, unfortunately, is about all the information Apple makes easily available. You can get a list of all the available hand joints at https://developer.apple.com/documentation/arkit/handskeleton/jointname, but the names of the joints don’t necessarily make that much sense (what is the intermediate tip of a finger?!).
For a better sense of where the different joints are located, you can turn to a developer video where Apple displays a few frames with a diagram of hand and joint locations: https://developer.apple.com/videos/play/wwdc2023/10082/?time=935.
Assuming you aren’t interested in playing a video as reference material, I’ve provided a screen capture in FIGURE 8.1. This figure, however, includes the word “hand” in front of each joint, which has been removed from the actual data structure since the video was created.
FIGURE 8.1 The joint locations on a hand—just ignore the “hand” prefix to each joint name
Accessing Individual Joint Locations
To access the current location and orientation (the transform matrix) of an individual joint within a hand anchor, you use this syntax:
<joint transform matrix> = <anchor>.handSkeleton?.joint(<joint name>).anchorFromJointTransform
The transformation matrix you can get from a joint is relative to the base of the hand, so you can’t use it directly. Instead, you must multiply it by the transformation matrix of the base in world space. That value is provided by anchor.originFromAnchorTransform:
<world transform matrix of joint> = <anchor>.originFromAnchorTransform * <joint transform matrix>
The world transform of the joint can subsequently be used to set the position of an entity. This enables you to create an entity that behaves like an AnchorEntity for every single joint on each hand.
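Here’s a small, hypothetical example of that math in Swift, assuming anchor is a HandAnchor and fingertipEntity is an entity you’ve already added to your RealityKit scene:

// Pin fingertipEntity to the index fingertip by converting the joint transform to world space.
if let joint = anchor.handSkeleton?.joint(.indexFingerTip), joint.isTracked {
    let worldTransform = anchor.originFromAnchorTransform * joint.anchorFromJointTransform
    fingertipEntity.setTransformMatrix(worldTransform, relativeTo: nil)
}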
Working with All Joints
When I first started coding the project in this chapter, I began by referring to individual joints explicitly and tracking just a few. After listing out about a dozen of them, I decided: rather than manually coding up a handful of joints, why not track them all?
To access a collection of all the joints in a HandSkeleton, you use the JointName.allCases type property:
HandSkeleton.JointName.allCases
From there, you can iterate over each joint with a loop like this:
for joint in HandSkeleton.JointName.allCases {
    if let fingerJoint = anchor.handSkeleton?.joint(joint) {
        // Do something useful with the fingerJoint here.
    }
}
That’s everything you need to create a tracking class. You’ll be doing this as a hands-on project in a way that is slightly different from past projects. Your primary goal in this hands-on is to create a new HandTracker.swift class, not to build any fancy interfaces or experiences. Nonetheless, you’ll want to create that class within a Mixed Immersive Space project, making it much easier to test the code.
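Before you dive in, here is a rough sketch of how the update loop of such a class might fit together. The processHandUpdates(_:jointEntities:) function and the String-keyed jointEntities dictionary are illustrative choices of mine, not the HandTracker design you’ll build in the hands-on:

import ARKit
import RealityKit

// Hypothetical sketch: move one pre-created entity per joint on every hand update.
// jointEntities maps keys like "left-indexFingerTip" to entities already in the scene.
func processHandUpdates(_ provider: HandTrackingProvider,
                        jointEntities: [String: ModelEntity]) async {
    for await update in provider.anchorUpdates {
        let anchor = update.anchor
        guard let skeleton = anchor.handSkeleton else { continue }
        let side = (anchor.chirality == .left) ? "left" : "right"

        for jointName in HandSkeleton.JointName.allCases {
            let joint = skeleton.joint(jointName)
            guard joint.isTracked,
                  let entity = jointEntities["\(side)-\(jointName.description)"] else { continue }
            // The joint transform is relative to the hand's base; convert it to world space.
            let worldTransform = anchor.originFromAnchorTransform * joint.anchorFromJointTransform
            entity.setTransformMatrix(worldTransform, relativeTo: nil)
        }
    }
}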