uSens Linggan Wang Xiaotao: This is the best time for the development of gesture interaction

[51CTO.com original article] On July 21-22, 2017, the WOT2017 Global Innovation Technology Summit, themed "Artificial Intelligence, More Than a Technological Revolution," concluded successfully at the Beijing Renaissance Hotel. Over the two days, more than 30 frontline experts in artificial intelligence, including technology specialists from BAT and other companies, shared their views on cutting-edge topics such as machine learning, human-computer interaction, and engineering practice. Beyond the on-stage talks, hands-on labs and technology experience areas were set up outside the venue for AI enthusiasts, making the conference full of highlights.

Wang Xiaotao, a senior researcher at the uSens China R&D Center, delivered a keynote speech titled "Interaction Technology in VR/AR" at the Human-Computer Interaction track. Afterward, this reporter interviewed him and asked him to share his views on gesture interaction.

Human-computer interaction often goes hand in hand with hardware changes

When it comes to VR and AR, most people's first impression is of the headsets themselves, such as the Oculus Rift, HTC Vive, PSVR, and HoloLens. Anyone who has tried these devices knows how magical they can feel, and they keep getting lighter to wear, with higher resolution and richer content. It is fair to say that VR/AR devices have become more mature, consumer-oriented, and popular. Wang Xiaotao pointed out that while VR and AR development may seem to be slowing down from the outside, from the industry's point of view it has never stopped; it is precisely the key technologies that are advancing. The industry rests not only on hardware, but also on many other supporting technologies that still need to mature.

Wang Xiaotao told reporters that, looking back at history, every major change in hardware products has been accompanied by an innovative human-computer interaction solution, and the trend is toward ever more natural interaction. The first desktop computer, the Programma 101, took keyboard input and produced paper-tape output; the first consumer PC, the Macintosh, paired mouse input with a GUI; and the first smartphone, the familiar iPhone, paired multi-touch input with the iOS interface. "With each generation of hardware, interaction has become more convenient, easier to use, and easier to learn, no longer requiring users to memorize commands. If VR/AR is to become the next generation of revolutionary hardware, it too needs a brand-new way of interacting with humans."

So what are the milestone products in the history of human-computer interaction? Wang Xiaotao listed three. The first was the 1965 desktop computer (which pioneered the popularization of commercial computing), with punched-card output and keyboard input. The second was the 1984 Macintosh (which pioneered the popularization of the consumer PC), which introduced GUI output and mouse input; it was the second computer with a graphical user interface after the LISA, and the first to bring one to a personal computer through its system software, Mac OS. The third was the 2007 iPhone (which pioneered the popularization of the consumer smartphone), which introduced multi-touch and the iOS system.

Promising Gesture Interaction

It is understood that today's VR/AR interaction methods are quite varied, with manufacturers making all kinds of attempts and explorations. Wang Xiaotao noted that some devices rely on handheld controllers and joysticks, such as the Oculus, HTC Vive, and PSVR; the Gear VR relies on a touchpad; and other manufacturers work with magnetoelectric sensor gloves (such as Noitom), eye tracking, voice recognition, and bare-hand gestures. Each method, he believes, has its own characteristics, suitable scenarios, and applicable content; in particular, some suit VR environments while others suit AR.

In a VR environment, since the scene is entirely virtual, handheld devices and sensors do not detract from the experience. Although users cannot see what they are holding, they can use these tools out of habit, much as they would a mouse. Device-based interaction therefore currently dominates in VR. On the one hand, methods based on physical components offer better accuracy and stability and are relatively simple to implement; on the other hand, major manufacturers launch interactive devices together with matching content, selling both hardware and software, which is driven by profit.

He told reporters that another approach to VR interaction is gestures. The reason is that even in a virtual world, people pursue natural, convenient, and flexible solutions. In moving from the PC to the smartphone, people all but discarded the mouse and keyboard; they should not have to pick up extra equipment in the next generation of products. Objectively speaking, natural gesture interaction is still at the research stage, with many problems left to solve. But there is no doubt that gestures are more natural, which is crucial for VR immersion; they are more engaging in VR and have better prospects in the mass market.

Unlike in VR, hands are visible in an AR environment, and there is a real need to use them, which means AR interaction cannot rely on devices such as controllers and leans toward solutions that do not interfere with the hands' real-world function. That leaves more cutting-edge technologies such as gestures, eye tracking, voice, EEG, and electromyography. Voice recognition is already quite mature, but dragging an object around by talking to it feels unnatural. So, in terms of technical maturity and scope of application, gesture interaction has the advantage. HoloLens chose a gesture-based solution, and other methods can serve as auxiliaries to make AR interaction more multi-dimensional and convenient. "Gestures, then, have a future in VR and a place in AR. As the technology matures, they will be used more and more."

Accurately identifying gestures is a technical job!

Gesture interaction is not simply recognizing a few fixed gestures, nor merely tracking the five fingertips. Rather, it means recognizing the current state of the hand in real time whenever it is visible: its shape, angle, distance, and so on.

Wang Xiaotao revealed that the popular approach today is to model the hand through its skeleton, including the joints, the bone lengths, and the angles between bones. Commonly used quantitative models include NYU's 14-joint model, Imperial College's 16-joint model, and MSRA's 21-joint model. Different quantitative representations have different precision; a hand model's precision is usually characterized by its number of degrees of freedom, such as Microsoft's 26-DOF model.
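To make the skeletal representation concrete, here is a minimal sketch of a 21-joint hand model in the spirit of those the article mentions. The joint layout (one wrist joint plus four joints per finger) and the index scheme are illustrative assumptions, not any vendor's actual specification.

```python
import numpy as np

# Hypothetical 21-joint hand skeleton: joint 0 is the wrist,
# joints 1-4 are the thumb chain, 5-8 the index finger, and so on.
NUM_JOINTS = 21

# Bones as (parent, child) index pairs: wrist to each finger base,
# then a chain of 3 bones along each finger (20 bones total).
BONES = []
for finger in range(5):
    base = 1 + finger * 4
    BONES.append((0, base))                # wrist -> finger base
    for k in range(3):
        BONES.append((base + k, base + k + 1))

def bone_lengths(joints_3d):
    """Given a (21, 3) array of joint positions, return the 20 bone lengths."""
    joints_3d = np.asarray(joints_3d, dtype=float)
    return np.array([np.linalg.norm(joints_3d[c] - joints_3d[p])
                     for p, c in BONES])
```

Quantities such as the angles between adjacent bones can be derived from the same structure, which is how a joint-position model relates to a degrees-of-freedom count.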

"The key technologies of gesture interaction basically form an overall pipeline of detection, tracking, regression, and hand-model optimization." Wang Xiaotao gave an example: first the device must find the position of the hand and identify its joints, so that the joints, linked together, form a plausible hand; second, once the hand is found, the position of each joint must be determined. He explained that some methods are based on point-cloud matching, which matches the detected hand's point cloud against a sphere model or a ball-and-stick model; others directly regress the coordinates, where each joint has two or three dimensions and the predicted values of all joints are output in order.
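The direct-regression approach described above can be sketched as a small regression head that maps an image feature vector to one (x, y, z) triple per joint. The feature dimension, the linear form of the head, and the random weights here are illustrative assumptions standing in for a trained deep network.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_JOINTS = 21   # e.g. a 21-joint hand model
FEAT_DIM = 128    # hypothetical feature size from a backbone network

# A minimal linear regression "head": maps a feature vector to
# 21 * 3 values, reshaped into per-joint (x, y, z) predictions.
W = rng.normal(scale=0.01, size=(FEAT_DIM, NUM_JOINTS * 3))
b = np.zeros(NUM_JOINTS * 3)

def regress_joints(feature):
    """Predict joint coordinates in order, one (x, y, z) triple per joint."""
    out = feature @ W + b
    return out.reshape(NUM_JOINTS, 3)

pred = regress_joints(rng.normal(size=FEAT_DIM))
```

In a real system the head would sit on top of a convolutional backbone and be trained on annotated depth or RGB data; the sketch only shows the output ordering the article describes.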

He also emphasized the structural constraints of the hand. There are two main approaches. One is segmented: first obtain each joint point, then use post-processing to ensure that the hand assembled from those joints is not deformed. The other is end-to-end: integrate the hand-model constraints directly into the joint regression, or regress the 3D hand pose directly. He stressed that this step is very important and its effect is very noticeable.

Gesture interaction is at a new development opportunity

Of course, Wang Xiaotao also admitted that many problems in gesture interaction remain unsolved, such as improving the accuracy and stability of recognition, along with the perennial topic of algorithm optimization. For example, in a VR/AR environment the system must run in real time without consuming too many computing resources, possibly on a mobile device, and the algorithms are deep learning models, so optimizing and accelerating them poses a great challenge.

In addition, visual blind spots and feedback need to be addressed. Take feedback as an example: in AR/VR, gesture interaction is harder to give feedback on because the user has no tactile sensation. There are now devices that provide a sense of touch, as well as multiple feedback mechanisms, such as using sound and other cues to create atmosphere. For these reasons, gesture interaction still has a long way to go.

Wang Xiaotao told reporters that gesture recognition has been in development for many years without ever becoming widespread, to the point that many people began to doubt the technology's viability. Why? Because the earlier demands were contrived: people hoped to replace the mouse or the remote control with gestures, but such a replacement was optional and not necessarily easier to use. On the one hand, limits of technology and computing power meant gestures performed worse than the mouse or remote control; on the other, the application scenarios demanded little, so the advantages of gestures could not show through.

"But VR and AR are a great opportunity for gestures. First, as we said, gestures are a high-priority interaction solution in AR/VR. In addition, the computing power of mobile devices has grown much stronger. It is fair to say that combining with VR/AR is a huge opportunity for the development of gesture interaction." Of this, Wang Xiaotao is certain.

[51CTO original article, please indicate the original author and source as 51CTO.com when reprinting on partner sites]
