Five-minute technical talk | AI technology and the governance of "cyber violence"

Part 01

What is "cyber violence"?

"Cyber violence" refers to defaming or slandering others online through text, pictures, videos, and other content, damaging their reputation and privacy and causing mental stress and psychological trauma to those targeted. It is an extension of social violence onto the Internet. It is most commonly seen on Weibo, video platforms, news and information sites, and forums.

Cyber violence has three main causes. First, the anonymity of the Internet protects personal privacy but also lets infringers speak recklessly. Second, some media outlets, chasing traffic and attention, report one-sidedly and deliberately distort facts to make stories more sensational. Third, once public opinion forms, individuals tend to drift toward the group's values and neglect their own capacity for rational thought.

Part 02

Natural Language Processing (NLP) and "Cyber Violence"

On social media, cyber violence spreads mainly through comments and bullet comments. The core AI technology for analyzing this kind of unstructured language data is natural language processing (NLP). NLP builds on machine learning and deep learning methods that let machines learn language features automatically and so gain some ability to understand human language. The technology is now widely used in text classification, automatic summarization, question-answering systems, machine translation, sentiment analysis, and more. Voice assistants and the recently popular ChatGPT are familiar applications. For the governance of "cyber violence", the following directions are relevant:

Text entity extraction:

Cyber violence usually targets a specific person or event, so the first step is to filter the comments about a given incident out of massive comment data. This relies mainly on named entity recognition (NER). NER algorithms fall into three main groups: rule-based methods, statistical methods, and deep learning methods.

Figure 1 Named Entity Recognition Method
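
To make the filtering step concrete, here is a minimal sketch of the rule-based family only. The entity names, aliases, and function names are hypothetical and chosen for illustration. A production system would more likely use a statistical or deep learning NER model.

```python
import re

# Hypothetical gazetteer: canonical entity -> aliases that may appear in comments.
ENTITY_ALIASES = {
    "Person A": ["person a", "mr. a"],
    "Event B": ["event b", "the b incident"],
}

def find_entities(comment, aliases=ENTITY_ALIASES):
    """Return the set of known entities mentioned in a comment (rule-based match)."""
    text = comment.lower()
    found = set()
    for entity, names in aliases.items():
        for name in names:
            # Word boundaries keep "person a" from matching inside "person about".
            if re.search(r"\b" + re.escape(name) + r"\b", text):
                found.add(entity)
                break
    return found

def filter_comments(comments, target):
    """Keep only the comments that mention the target entity."""
    return [c for c in comments if target in find_entities(c)]
```

For example, `filter_comments(comments, "Person A")` keeps only the comments that mention one of that entity's aliases. Such rules are easy to audit but miss nicknames and misspellings, which is one reason learned models are usually preferred at scale.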

Text Sentiment Analysis:

Sentiment analysis can score a comment as positive or negative, identify which finer-grained emotions it expresses, and extract the keywords that most influence its overall sentiment. This makes it possible to understand the emotional distribution across tens of millions of comments, to break down sentiment toward different events by time period, region, and gender, to respond promptly to negative and violent sentiment around an event, and to uncover more potential cyber violence from polarity words.

Figure 2 Different emotion classifications
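
As a rough illustration of polarity scoring, keyword extraction, and grouping by region, the sketch below uses a hand-written polarity lexicon. The lexicon values, the region labels, and the function names are all assumptions made for the example. It is a lexicon lookup, not the learned models discussed in this article.

```python
import re
from collections import defaultdict

# Hypothetical polarity lexicon: word -> score in [-1, 1] (illustrative values).
POLARITY = {
    "love": 0.8, "great": 0.6, "kind": 0.5,
    "stupid": -0.7, "hate": -0.9, "disgusting": -0.8,
}

def score_comment(comment, lexicon=POLARITY):
    """Return (polarity score, most influential polarity word or None)."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    hits = [(t, lexicon[t]) for t in tokens if t in lexicon]
    score = sum(s for _, s in hits)
    # The word with the largest absolute score influences the total the most.
    top = max(hits, key=lambda h: abs(h[1]))[0] if hits else None
    return score, top

def mean_score_by_group(records, lexicon=POLARITY):
    """records: iterable of (group, comment); returns group -> mean polarity."""
    scores = defaultdict(list)
    for group, comment in records:
        scores[group].append(score_comment(comment, lexicon)[0])
    return {g: sum(s) / len(s) for g, s in scores.items()}
```

The same grouping function works for time period or gender by changing what is passed as the group key.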

The main techniques involved are text classification and polarity word mining, using machine learning (such as SVM) or deep learning (such as CNN). The overall process is shown in the figure:

Figure 3 Sentence-level sentiment analysis solution
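
The text classification step can be sketched with a very small linear model. The example below trains a perceptron on bag-of-words features. This is a deliberately simpler stand-in for the SVM or CNN mentioned above, and the training sentences and labels are invented for illustration.

```python
from collections import defaultdict

def tokenize(text):
    return text.lower().split()

def train_perceptron(samples, epochs=10):
    """samples: list of (text, label), label +1 for benign and -1 for abusive."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in samples:
            tokens = tokenize(text)
            activation = bias + sum(weights[t] for t in tokens)
            if label * activation <= 0:
                # Misclassified (or undecided): move the weights toward the label.
                for t in tokens:
                    weights[t] += label
                bias += label
    return weights, bias

def predict(text, weights, bias):
    activation = bias + sum(weights.get(t, 0.0) for t in tokenize(text))
    return 1 if activation > 0 else -1

# Hypothetical toy training set.
TRAIN = [
    ("you are wonderful", 1),
    ("thanks for sharing", 1),
    ("you are worthless", -1),
    ("nobody likes you idiot", -1),
]
```

After training, the sign and size of each word's weight also give a crude view of polarity words: strongly negative weights mark words that pushed comments toward the abusive class in this toy data.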

Text similarity analysis:

Similarity analysis of comments on the same event helps reveal the trend of public opinion about that event. Similarity analysis across different events can find comments that use words or expressions similar to those of users who engage in cyber violence, and surface recent positive or negative public opinion about a given event or person. There are currently two main deep learning paradigms for similarity analysis, as shown in the following figure:

Figure 4 Two paradigms of similarity analysis

The first paradigm extracts a representation vector for each comment with a deep neural network, then computes the similarity between two comments with a simple distance function over those vectors (such as Euclidean distance). The representation vectors are usually produced by a twin (Siamese) network. Common models in this category include DSSM and CNTN.
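
The shape of this representation-based approach can be sketched without a neural network. In the example below, a bag-of-words counter stands in for the learned encoder, which is an assumption made purely for illustration. The structural point is that the same `encode` function is applied to both comments, as in a twin network, and a simple function then compares the two vectors.

```python
import math
from collections import Counter

def encode(comment):
    """Stand-in encoder: word counts. A real system would use a neural network."""
    return Counter(comment.lower().split())

def euclidean(u, v):
    """Euclidean distance between two sparse count vectors."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u[k] - v[k]) ** 2 for k in keys))

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def similarity(a, b):
    # Twin setup: one shared encoder for both inputs, then a simple comparison.
    return cosine(encode(a), encode(b))
```

Because each comment is encoded independently, vectors can be computed once and reused, which is useful when comparing a new comment against a large archive.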

The second paradigm uses a deep model to extract cross-features between the two comments, producing a tensor of matching signals, which is then aggregated into a similarity score.
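
A minimal sketch of this interaction-based idea follows. It assumes a hand-written word-to-word matching signal (character-bigram overlap) and a fixed aggregation (max per row, then mean), where a real model would learn both. All function names are hypothetical.

```python
def char_bigrams(word):
    """Set of character bigrams; a one-letter word is represented by itself."""
    return {word[i:i + 2] for i in range(len(word) - 1)} or {word}

def word_match(a, b):
    """Matching signal between two words: Jaccard overlap of character bigrams."""
    x, y = char_bigrams(a), char_bigrams(b)
    return len(x & y) / len(x | y)

def interaction_matrix(c1, c2):
    """Matrix of matching signals: one row per word of c1, one column per word of c2."""
    t1, t2 = c1.lower().split(), c2.lower().split()
    return [[word_match(a, b) for b in t2] for a in t1]

def aggregate(matrix):
    """Take the best match for each word of c1, then average over words."""
    if not matrix or not matrix[0]:
        return 0.0
    return sum(max(row) for row in matrix) / len(matrix)
```

Unlike the first paradigm, the two comments are compared word by word before any pooling, so near-matches such as "loser" and "losers" still contribute to the score.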

Syntactic/lexical analysis:

Syntactic and lexical analysis can uncover the syntactic and lexical habits shared by large numbers of "positive" comments and of "cyber violence" comments. From these we can summarize the rhetoric and vocabulary that users who engage in cyber violence commonly use in the current online environment, as well as the language features different users rely on when expressing the polarity of their opinions.

Syntactic structure analysis identifies the components of a sentence (subject, predicate, object, attributive, adverbial, and complement) and analyzes the relationships between them. It is generally based on deep learning sequence models such as RNN and LSTM.

Lexical analysis converts the input comment string into a sequence of words and tags each word with its part of speech. It relies mainly on sequence labeling; specific algorithms include conditional random fields (CRF) and RNN+CRF.

Figure 5 Lexical analysis example
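
Sequence labeling can be illustrated with Viterbi decoding, the dynamic-programming search that linear-chain CRFs typically use to pick the best tag sequence. In the sketch below the tag set and all scores are hand-set assumptions, whereas a trained CRF would learn them from labeled data. Whitespace splitting stands in for word segmentation.

```python
TAGS = ["PRON", "VERB", "ADJ", "NOUN"]

# Hypothetical scores: how well each word fits each tag.
EMISSION = {
    "you": {"PRON": 2.0},
    "are": {"VERB": 2.0},
    "kind": {"ADJ": 1.0, "NOUN": 1.0},  # ambiguous on its own
}
# Hypothetical scores: how well one tag follows another.
TRANSITION = {("PRON", "VERB"): 1.0, ("VERB", "ADJ"): 1.0}

def viterbi(words):
    """Return the highest-scoring tag sequence for a list of words."""
    if not words:
        return []
    # best[i][tag] = (score of the best path ending in tag at position i, previous tag)
    best = [{t: (EMISSION.get(words[0], {}).get(t, 0.0), None) for t in TAGS}]
    for word in words[1:]:
        column = {}
        for t in TAGS:
            emit = EMISSION.get(word, {}).get(t, 0.0)
            prev = max(TAGS, key=lambda p: best[-1][p][0] + TRANSITION.get((p, t), 0.0))
            column[t] = (best[-1][prev][0] + TRANSITION.get((prev, t), 0.0) + emit, prev)
        best.append(column)
    # Backtrack from the best final tag to recover the whole path.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))

def tag_comment(comment):
    """Split on whitespace, then pair each word with its decoded tag."""
    words = comment.lower().split()
    return list(zip(words, viterbi(words)))
```

Here "kind" scores equally as an adjective or a noun by itself, and the transition score after a verb is what tips the decision, which is the kind of context a sequence model adds over tagging each word in isolation.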


Part 03

Summary

Cyber violence not only directly harms the rights and interests of victims but also negatively affects network security and social harmony. Drawing on its experience in deep learning, image recognition, natural language processing, OCR, and related fields, China Mobile Smart Home Operation Center has launched content security products that can run security checks on pictures, text, video, and audio across multiple dimensions, including pornography, violence and terrorism, political content, gambling, image OCR, and face recognition.

As AI technology develops, technology-based governance of cyber violence will play an increasingly important role. China Mobile Smart Home Operation Center will continue to explore advanced techniques for this scenario, apply cutting-edge industry technology to building a healthy content ecosystem, actively support the Cyberspace Administration of China's "Clear and Bright" series of special campaigns, and contribute to a clear and bright online environment.
