
Complex Technologies in Virtual Reality Devices

Author: Adrian, April 24, 2026


Introduction

Most consumer VR devices lack motion-sensing interaction peripherals. Exceptions include HTC Vive, Oculus Rift, and PSVR. The absence of motion-sensing interaction can increase the likelihood of motion sickness during use.

VR devices that support motion-sensing interaction can reduce motion sickness, improve immersion, and enable interaction with virtual scenes. Motion-sensing devices include motion seats, treadmills, haptic suits, spatial positioning systems, and motion-capture technologies.


Five basic principles of motion-sensing interaction

1. Laser positioning technology

The basic principle is to install several laser-emitting units in the space and sweep lasers horizontally and vertically. The tracked object is equipped with multiple light receivers. By calculating the angular differences of the received beams, the system derives the object’s 3D coordinates. As the object moves, its coordinates update, producing motion data for capture.

Representative system: HTC Vive Lighthouse tracking

HTC Vive’s Lighthouse system uses lasers and photosensors to determine the position of moving objects. Two base stations are installed roughly on diagonal corners of the space; each base station emits laser beams and contains two scanning modules that alternately sweep the space in horizontal and vertical directions.

The HMD and controllers include as many as 70 photosensors. The system computes the time and angle of received laser signals to determine each sensor’s position relative to the emitters. Using the multiple photosensors located at different positions on the HMD and controllers, the system calculates the devices’ position and orientation.
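The core geometry can be sketched in a few lines: the time between the sync flash and the laser hitting a sensor gives a sweep angle, and bearings from two base stations intersect at the sensor's position. This is a simplified 2D sketch with an assumed 60 Hz sweep period, not Valve's actual solver, which recovers full 3D pose from many sensors at once.

```python
import math

SWEEP_PERIOD = 1.0 / 60.0  # assumed time for one full rotor revolution (60 Hz)

def angle_from_timing(dt):
    """Convert time elapsed since the sync flash into the sweep angle (radians)."""
    return 2.0 * math.pi * dt / SWEEP_PERIOD

def intersect_bearings(p1, a1, p2, a2):
    """Intersect two rays p + t*(cos a, sin a) from known station positions.

    Returns the (x, y) point where the two bearing lines cross.
    """
    x1, y1 = p1
    x2, y2 = p2
    c1, s1 = math.cos(a1), math.sin(a1)
    c2, s2 = math.cos(a2), math.sin(a2)
    # Solve p1 + t1*(c1, s1) = p2 + t2*(c2, s2) for t1 (2x2 cross-product form).
    denom = c1 * s2 - s1 * c2
    t1 = ((x2 - x1) * s2 - (y2 - y1) * c2) / denom
    return (x1 + t1 * c1, y1 + t1 * s1)
```

With stations at (0, 0) and (4, 0) and a target at (1, 2), the intersection recovers (1, 2); in the real system the same idea runs per sensor, per sweep, in 3D.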

Advantages and disadvantages: Compared with other approaches, laser positioning is relatively low cost, offers high positioning accuracy and low latency, covers a broad tracking range, and supports multiple targets. Its main disadvantage is its reliance on mechanical scanning components, which can affect stability and durability: base-station vibration or mechanical wear over time can impair tracking performance.

2. Infrared optical positioning

This technique installs multiple infrared cameras to cover the space. Tracked objects are fitted with infrared reflective markers. Infrared light emitted by the cameras reflects off the markers and is captured by the cameras. By processing the captured reflections from multiple cameras, the system computes the tracked object’s spatial coordinates.
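Once two calibrated cameras each report a ray toward the same marker, its 3D coordinate can be estimated as the midpoint of the shortest segment between the two rays. This is a generic textbook triangulation sketch, not any vendor's production solver:

```python
import numpy as np

def triangulate_midpoint(o1, d1, o2, d2):
    """Midpoint of the shortest segment between two rays o + t*d.

    o1, o2: camera centers; d1, d2: ray directions toward the marker.
    """
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    w0 = o1 - o2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b          # ~0 only if the rays are parallel
    t1 = (b * e - c * d) / denom   # closest-point parameter on ray 1
    t2 = (a * e - b * d) / denom   # closest-point parameter on ray 2
    return (o1 + t1 * d1 + o2 + t2 * d2) / 2.0
```

With noisy real measurements the rays do not quite intersect, which is exactly why the midpoint (or a least-squares fit over more than two cameras) is used.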

Representative system: Oculus Rift active infrared optical tracking + 9-axis system

Unlike passive infrared systems, Oculus Rift uses active infrared tracking: the headset and controllers carry infrared LEDs rather than passive reflectors. Two cameras, fitted with infrared filters, capture only the infrared emitted by the devices. Software then computes the spatial coordinates of the headset and controllers.

Oculus Rift also includes a built-in 9-axis inertial sensor. When infrared tracking is occluded or blurred, the 9-axis sensor provides pose information to maintain tracking accuracy.
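A standard way to blend a smooth-but-drifting gyro estimate with a noisy-but-absolute reference (an optical fix, or the accelerometer's gravity vector) is a complementary filter. The sketch below is the textbook single-angle version with an assumed blending constant alpha, not Oculus's actual fusion algorithm:

```python
def complementary_filter(gyro_rates, ref_angles, dt, alpha=0.98):
    """Fuse gyro angular rate (rad/s) with an absolute angle reference (rad).

    The gyro path is effectively high-passed (trusted short-term) and the
    reference path low-passed (trusted long-term), so gyro bias cannot
    accumulate into unbounded drift.
    """
    angle = ref_angles[0]
    history = []
    for rate, ref in zip(gyro_rates, ref_angles):
        angle = alpha * (angle + rate * dt) + (1.0 - alpha) * ref
        history.append(angle)
    return history
```

Even with a constant gyro bias, the filtered angle settles close to the reference instead of drifting away, which is why the 9-axis sensor can bridge brief optical occlusions without the estimate running off.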

Advantages and disadvantages: Standard infrared optical tracking provides high precision and low latency, but a full multi-camera setup is costly and cumbersome, so it is mostly confined to commercial installations. Oculus Rift’s active infrared approach with a 9-axis sensor needs only two cameras and no moving scanning parts, improving ease of use and durability. Its limitations include a small interaction area (about 1.5 m by 1.5 m) and limited support for multiple tracked objects, both constrained by the cameras’ field of view.

3. Visible light positioning

Visible light positioning is similar to infrared optical tracking but uses visible light. Tracked objects carry light sources of different colors. Cameras capture these colored light points and distinguish objects and positions based on color and location.

Representative system: PSVR

Sony’s PSVR uses this method. The lights on the headset and controllers are captured by a camera; the headset emits blue light and the controllers emit different colors (for example, light blue and pink), allowing the system to compute the spatial coordinates of the light spheres.
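Telling the glowing spheres apart reduces to nearest-color matching on the detected blobs. The reference RGB values below are illustrative assumptions for this sketch, not Sony's actual calibration:

```python
# Assumed reference colors for each tracked light sphere (illustrative only).
MARKERS = {
    "headset":          (0, 0, 255),      # blue
    "left_controller":  (173, 216, 230),  # light blue
    "right_controller": (255, 105, 180),  # pink
}

def classify_blob(rgb):
    """Assign a detected light blob to the nearest reference color (squared RGB distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(MARKERS, key=lambda name: dist2(MARKERS[name], rgb))
```

The simplicity of this matching is the method's strength, and also its weakness: a blob of ambient light close to one of the reference colors is misclassified just as easily.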

Advantages and disadvantages: Visible light tracking is the lowest cost option, requires no complex algorithms, is easy to implement, and tends to be durable and reliable. Its drawbacks include lower positioning precision, poor occlusion tolerance, sensitivity to ambient lighting (strong ambient light can reduce detectability), potential confusion with similar colors in the environment, limited tracking area due to camera field of view, and limited numbers of trackable targets.

4. Computer vision motion capture

Computer vision motion capture uses multiple high-speed cameras to record a moving target from different angles. After cameras capture the target’s motion, software processing reconstructs the trajectory on a computer, completing motion capture.

Representative system: Leap Motion gesture recognition

Leap Motion’s gesture recognition for VR uses this principle. Two cameras mounted on the front of the HMD apply stereo vision to extract 3D position data and track hand gestures, building hand models and motion trajectories for interaction.
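The core of stereo vision is that depth is inversely proportional to disparity: Z = f·B/d, where f is the focal length in pixels, B the baseline between the cameras, and d the horizontal pixel disparity in a rectified pair. A minimal sketch (the focal length and baseline in the test values are made-up examples, not Leap Motion's specifications):

```python
def stereo_depth(focal_px, baseline_m, x_left_px, x_right_px):
    """Depth of a point from its horizontal disparity in a rectified stereo pair."""
    disparity = x_left_px - x_right_px  # pixels; larger disparity = closer point
    if disparity <= 0:
        raise ValueError("zero or negative disparity: point cannot be triangulated")
    return focal_px * baseline_m / disparity
```

Because depth resolution falls off as disparity shrinks, close-range hand tracking is a natural fit for a short-baseline stereo rig, while distant points become hard to resolve.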

Advantages and disadvantages: This approach can capture multiple targets in a monitored area with relatively few cameras. It provides high accuracy for large-object tracking and does not require users to wear tracking devices, offering a less constrained interaction experience. Disadvantages include high computational demands, increased hardware requirements, and sensitivity to environmental conditions. Low light, cluttered backgrounds, occlusions, or suboptimal camera angles can degrade capture quality, and fine motions may be difficult to capture reliably depending on camera placement and processing.

5. Inertial sensor-based motion capture

This method requires tracked targets to wear inertial sensor units at key points, integrating accelerometers, gyroscopes, and magnetometers. The system consists of inertial modules and data processing units. The processing unit uses kinematic data from the inertial sensors; as the target moves, sensor measurements change and the system reconstructs the motion trajectory using inertial navigation principles.
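The inertial navigation principle amounts to integrating angular rate once into orientation and acceleration twice into position. A deliberately simplified 1D sketch of the position side (real systems work in 3D, subtract gravity, and use the magnetometer to correct heading drift):

```python
def dead_reckon(accel_samples, dt, v0=0.0, x0=0.0):
    """Recover a 1D trajectory by integrating acceleration twice (Euler steps)."""
    v, x, trajectory = v0, x0, []
    for a in accel_samples:
        v += a * dt   # integrate acceleration -> velocity
        x += v * dt   # integrate velocity -> position
        trajectory.append(x)
    return trajectory
```

Because every step also integrates sensor noise and bias, position error grows roughly quadratically with time; this is why purely inertial capture suits relative body-segment motion far better than absolute room-scale positioning.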

Representative system: Noitom Perception Neuron

Perception Neuron is a flexible motion-capture system worn on relevant body parts, such as a glove for hand capture. Its small modules integrate accelerometers, gyroscopes, and magnetometers and can capture detailed motions from single limbs, full body, and fingers, including dynamic movements like running and jumping. Data can be transmitted wirelessly.

Advantages and disadvantages: Inertial sensor systems are less affected by external environmental factors, require no external beacons or cameras in the tracking space, provide rich motion data, high sensitivity, good dynamic performance, and wide movement range, offering a realistic interaction experience. Their drawbacks include the need to wear the equipment, which can be cumbersome, and other limitations related to sensor operation and calibration.


Summary

Each motion-capture approach has trade-offs. For example, HTC Vive’s laser positioning offers high precision and wide range but is less durable; Oculus Rift’s active infrared tracking addresses some durability concerns but has a limited tracking area.

Currently, for consumer VR, HTC Vive’s laser-based tracking is among the most practical solutions because it delivers the largest tracking space and high precision at the consumer level. In ideal conditions, inertial sensor-based systems such as Perception Neuron can capture finer motions over large spaces, but these systems are still mainly used in commercial applications rather than consumer markets.

In the longer term, computer vision motion capture is expected to become dominant. As cameras, algorithms, and processing hardware improve, computer vision approaches could surpass inertial systems by enabling fine-grained motion capture without wearable sensors; remote 3D holographic demonstrations already rely on similar camera-based capture. Computer vision motion capture is not yet fully mature, but its prospects are promising.