Master's Thesis

Integration of Mixed Reality and Touchscreen Interfaces for Humanoid Robot Embodiment in a Virtual Clinical Setting



Link to Thesis

Click to view

Introduction

When I was searching for a thesis topic, I had an idea early on: I wanted to work on something beyond just games. When I first learned about this project, my immediate reaction was, "How cool!" Robotics has always been a field I find incredibly fascinating, and I thought that if I could dive into this area and make a real connection with it, the experience would be incredibly exciting and rewarding.

The main goal of this project was to use AR devices in conjunction with a smartphone to let users remotely control the arm of a humanoid robot (Robody, from Devanthro) in an immersive and intuitive way, completing various tasks. The project began with building a virtual scene and a digital twin of the robot in Unity, followed by the development and integration of a smartphone application, and finally, testing my control methods on the physical robot.


Thesis Abstract

This thesis aims to develop a comprehensive Robody control platform that integrates a smartphone and an augmented reality (AR) head-mounted display (HMD) to explore new modes of achieving remote embodiment through Robody. We created a multifunctional smartphone application allowing users to remotely monitor Robody, assign autonomous tasks, and use the smartphone as an alternative to AR controllers. Additionally, we proposed two smartphone-based methods and one hand tracking-based method for controlling Robody's hands. We successfully integrated our developed control methods into the physical Robody, validating the feasibility of our approach. We evaluated the usability, embodiment, and performance of the various control methods through a user study. The results indicate that our control system's usability is above average and that it induces a certain level of embodiment. In the experiment, the hand tracking control mode performed the best, while the smartphone pointer control mode performed the worst. Our work demonstrates the potential of combining a smartphone and an AR HMD, as well as hand tracking-based control methods, in Robody control, providing a foundation for achieving deeper embodiment and improved control in the future.


Control Methods

In the thesis, I designed and implemented three methods for controlling the robot's arm using an AR HMD and a smartphone. Each method has its own specific focus and approach. Their common goal was to free users from the cumbersome controllers typically paired with AR HMDs and to find a new balance between portability and precision.

Smartphone Pointer Control Mode

This is an original design. Imagine holding not a smartphone, but a laser pointer. As you rotate your wrist, the laser pointer aims in the direction you want to target. If you could further control the length of the laser beam, its tip could accurately reach nearly any position in 3D space, similar to positioning via polar coordinates. This is the theory behind my control method: the user tilts the smartphone to direct a virtual ray towards the desired target for the robot's hand, and then uses a slider on the smartphone screen to control the ray's length, which corresponds to the distance the robot's arm needs to reach to touch the target.
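To make the laser-pointer analogy concrete, here is a minimal Unity C# sketch of the underlying idea (not the thesis code; all names are illustrative): a direction derived from the phone's orientation, scaled by the slider value, gives the target point for the hand.

```csharp
using UnityEngine;

// Illustrative sketch: derive a reach target from a phone-supplied
// orientation plus a slider-controlled ray length.
public class PointerTargetSketch : MonoBehaviour
{
    public Transform rayOrigin;          // e.g. the shoulder of the digital twin
    public Quaternion phoneOrientation;  // orientation received from the smartphone
    [Range(0.1f, 1.0f)]
    public float rayLength = 0.5f;       // value of the on-screen slider, in metres

    // Target = origin + (rotated forward axis) * length,
    // i.e. a direction plus a radius, much like polar coordinates.
    public Vector3 ComputeTarget()
    {
        Vector3 direction = phoneOrientation * Vector3.forward;
        return rayOrigin.position + direction * rayLength;
    }
}
```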


ma-pointer
mt-sp

Why use this method? Ideally, the smartphone would function like a VR/AR controller, precisely tracking the user’s hand movements and rotations. However, the motion sensors in a smartphone alone are insufficient to achieve this. The accelerometer in smartphones lacks the precision needed to track 3D translation: even minor errors, when integrated twice, can result in significant drift. In contrast, the smartphone’s gyroscope is quite reliable and provides accurate orientation data. So, based on the performance of these sensors, I designed a control method that relies solely on gyroscope data, while discarding accelerometer data (although the accelerometer might be used internally for gyroscope calibration).
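For reference, the orientation itself comes from the OS-fused gyroscope attitude. The snippet below is a hedged sketch (assuming a Unity-based phone app, which is not necessarily how the thesis app was built); the frame conversion shown is one commonly used mapping from the device's right-handed frame to Unity's left-handed one and may need adjusting per device.

```csharp
using UnityEngine;

// Sketch: read the device attitude using only the gyroscope-based sensor fusion.
public class GyroReaderSketch : MonoBehaviour
{
    void Start()
    {
        Input.gyro.enabled = true;  // enable the OS attitude estimate
    }

    public Quaternion ReadAttitude()
    {
        Quaternion q = Input.gyro.attitude;
        // Map from the device's right-handed frame to Unity's left-handed frame;
        // the exact correction may vary with device and screen orientation.
        return Quaternion.Euler(90f, 0f, 0f) * new Quaternion(q.x, q.y, -q.z, -q.w);
    }
}
```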

This method is reliable and, compared to the two approaches below, more robust against environmental interference. However, as its rather involved description suggests, it has a steep learning curve. The lack of control over certain degrees of freedom also means that users cannot fully control the entire arm.

Smartphone Motion-Tracking Control Mode

While it's challenging to track hand movements using only the smartphone's motion sensors, are there other options? Absolutely. We can use the smartphone's camera data for translation tracking. Inspired by existing research, I used the AR-related APIs available on smartphones to achieve hand movement tracking. Internally, these APIs use SLAM (Simultaneous Localization and Mapping) algorithms, processing the camera's visual data and integrating various sensor inputs to reconstruct the user's environment and determine the smartphone's position. With the help of the camera, users can control the robot's arm with their smartphone just as they would with an AR controller.
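As an illustration of what such APIs expose, an AR Foundation scene in Unity can simply sample the pose of the SLAM-tracked device camera every frame. The sketch below is hedged: the class is hypothetical, and newer AR Foundation versions replace ARSessionOrigin with XROrigin.

```csharp
using UnityEngine;
using UnityEngine.XR.ARFoundation;

// Sketch: sample the SLAM-tracked pose of the phone camera exposed by AR Foundation.
public class PhonePoseSamplerSketch : MonoBehaviour
{
    public ARSessionOrigin sessionOrigin;  // assigned in the AR scene

    public Pose SamplePhonePose()
    {
        Transform cam = sessionOrigin.camera.transform;
        // Position and rotation are expressed in the AR session's world space;
        // they can then be mapped onto the target pose of the robot's hand.
        return new Pose(cam.position, cam.rotation);
    }
}
```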


ma-motion
mt-smt

This approach is straightforward and intuitive, as confirmed by participants in the experiment. However, the optical data from the camera is fragile: the captured images are easily affected by factors such as lighting, occlusion, and focus, which makes this control method less robust in certain conditions.

Hand-Tracking Control Mode

Speaking of cameras, there is an even more lightweight solution that doesn't rely on a smartphone at all: using a camera to track the user's hand movements and then using the tracked position and pose data to control the robot’s arm. This method is the most intuitive among the three, allowing users to control the robot's arm as if it were their own. It’s a mature technology that has been widely adopted across various fields.
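Regardless of the concrete tracker, the consuming side in Unity usually boils down to reading a palm (or wrist) pose and letting the robot hand's target follow it. The sketch below abstracts the tracker behind a hypothetical interface rather than naming a specific library:

```csharp
using UnityEngine;

// Hypothetical interface for any hand tracker (e.g. an RGB-camera-based one)
// that reports whether a hand is visible and the pose of its palm.
public interface IHandTracker
{
    bool TryGetPalmPose(out Pose palmPose);
}

// Sketch: make the robot hand's target follow the tracked palm pose.
public class HandFollowSketch : MonoBehaviour
{
    public MonoBehaviour trackerBehaviour;  // any component implementing IHandTracker
    public Transform robotHandTarget;       // IK target of the digital twin's hand

    void Update()
    {
        if (trackerBehaviour is IHandTracker tracker &&
            tracker.TryGetPalmPose(out Pose palm))
        {
            // If tracking is lost (hand out of view, occlusion), the target
            // simply stays where it was last updated.
            robotHandTarget.SetPositionAndRotation(palm.position, palm.rotation);
        }
    }
}
```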


ma-hand-tracking
mt-ht

However, this lightweight approach comes with certain trade-offs. By setting aside the smartphone, we also lose access to its motion sensors. Hand tracking based on a monocular RGB camera lacks robustness, much like the previous method that relied on camera-captured images: it is highly vulnerable to external factors. In addition to being affected by lighting, occlusion, and focus, hand tracking can also become unpredictable if the user's hand moves out of the camera's field of view, or if multiple hands (or even another person's hand) appear within the camera's range.

In summary, each of these three methods has its own strengths and weaknesses. However, they all offer a way to move beyond the traditional AR controller.

Validation on the Physical Robot

Even though this came in the final stages of the thesis, I would still say that this part of the work was the most fulfilling for me. To validate my methods, I needed to test the Robody arm control modes I had designed and implemented on the actual Robody.

Obviously, before I could safely apply my control methods to the expensive robot, I needed to make sure everything was safe enough. This required simulating the process in a simulator before moving on to the next stage, a point I discussed extensively in my thesis. To accurately replicate the robot's operating environment and avoid dealing with complex setup configurations, I obtained a Docker container image from Devanthro that contained a simulation program for the robot's upper-body movements and used it to test my implementation.


sim

Once the container was successfully deployed, I used functions provided by the Unity ROS TCP Connector package to send ROS messages from my simulation application to a specified ROS topic within the Docker container. These messages carried the angles of each joint of the robot arm simulated in Unity. The ROS nodes running within the container subscribed to this topic, processed the incoming messages, and applied the changes to the Robody arm within the simulation, displaying the movement in RViz. Everything appeared to work smoothly, which earned me the opportunity to interact with the physical Robody.
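For context, publishing with the Unity ROS TCP Connector looks roughly like the sketch below. The topic name and the choice of a standard sensor_msgs/JointState message are placeholders for illustration; the actual topic and message type expected by the Devanthro container may differ.

```csharp
using UnityEngine;
using Unity.Robotics.ROSTCPConnector;
using RosMessageTypes.Sensor;

// Sketch: publish simulated joint angles from Unity to a ROS topic.
public class JointAnglePublisherSketch : MonoBehaviour
{
    const string Topic = "/robody/joint_targets";  // placeholder topic name

    ROSConnection ros;

    void Start()
    {
        ros = ROSConnection.GetOrCreateInstance();
        ros.RegisterPublisher<JointStateMsg>(Topic);
    }

    public void PublishAngles(string[] jointNames, double[] anglesRad)
    {
        var msg = new JointStateMsg
        {
            name = jointNames,     // one entry per arm joint
            position = anglesRad,  // joint angles in radians
        };
        ros.Publish(Topic, msg);
    }
}
```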


robody

As shown in the video above, my control method was successfully validated on the Robody, just as expected. After months of hard work, there’s nothing more rewarding than seeing this scene come to life!