Detailed explanation of the design points and principles of 6G system data governance solutions

This article is reprinted from the WeChat public account "Big Data DT"; the authors are Tong Wen and Zhu Peiying. To reprint this article, please contact the Big Data DT public account.

The economic and technical meaning of data governance varies with the scope in which data is used. Data governance refers to the management, maintenance and in-depth development of data through relevant processes and technologies to obtain high-quality data that can serve as a key asset for the organization.

Each mobile network operator (MNO) isolates and stores the data generated in the mobile communication system according to technical domains, including the Radio Access Network (RAN), Core Network (CN), Transport Network (TN), and Operation, Administration, and Maintenance (OA&M). The data owned by different network elements and different participants is not open and transparent, and the resulting data islands are the main bottleneck in data collection and sharing.

On the other hand, large OTT (Over-The-Top) companies have accumulated expertise in data governance and monetization strategies (such as data storage, analytics services, and APIs) that is far ahead of companies in the telecommunications field.

The data governance solution of the 6G system will provide strong support for AI and perception services, and will give rise to new business methods and system characteristics.

1. Design points and principles

The scope of data governance goes far beyond traditional data collection and storage. In general, system design needs to consider four aspects, as shown in Figure 1.

Figure 1 Key points of data governance design

1. Data availability and quality

Data availability and quality are among the biggest challenges to the application of AI in various industries. Improving data availability means that data cannot come from just a single system or a single field; it needs to come from multiple systems and different fields at the same time. This raises a fundamental question: how to break the physical boundaries (between multiple manufacturers, multiple operators, and multiple industries) and allow data to enter the heterogeneous data ocean?

Once the originally scattered and isolated data are collected and used, another question arises: how to improve the quality of the data? Acquiring massive amounts of data does not mean that the acquired data is usable or of high quality. Data processing must also become more efficient while its computational complexity and energy consumption are reduced.

2. Data sovereignty

With the full digital transformation of society, the importance of data sovereignty, data security and privacy has never been more prominent. Many countries have formulated or are formulating laws and regulations on privacy protection and data management, and service providers are constantly updating their privacy protection plans.

For example, the European Union's General Data Protection Regulation (GDPR), which took effect in 2018, regulates the use of data at the EU level. In 2019, China released a draft of the Data Security Management Measures for public comment; together with the Cybersecurity Law promulgated in 2016, it has been described as the Chinese version of GDPR. The United States is also implementing privacy-related laws, such as the California Consumer Privacy Act, which officially came into effect in January 2020.

How to fully tap the intrinsic value of data, provide precise support for various businesses, while taking into account privacy protection and respecting data sovereignty has become a hot topic in recent years. The design of 6G systems should take into account regulatory uncertainties, especially the uncertainties caused by regulatory differences between different regions.

3. Knowledge Management

Generally speaking, knowledge can be regarded as processed data with specific uses or values, which can be directly used by physical or virtual entities in different technical and business fields.

Knowledge management includes the generation, updating and exposure of knowledge. When generating and updating knowledge, we need to carefully check the source and quality of data and take measures to intercept low-quality and harmful data generated by unreliable or even malicious data sources. Exposing knowledge as a capability requires a suitable platform and interface design.

4. Legal issues

A variety of sensors and other technologies can generate data in real time, making data collection and use increasingly complex and sensitive. The increase in data generation capabilities not only provides new data streams and content types, but also raises policy and legal concerns about data abuse: malicious institutions or governments may use these capabilities to achieve social control.

At the same time, new technological capabilities also make it difficult for ordinary people to judge the authenticity of content. For example, it is difficult for ordinary people to distinguish between a real video and a "deepfake" video.

There is a fragile balance between protecting the social benefits of technology and preventing its capabilities from being used to exercise social control and deprive people of their freedom. How to protect this balance becomes increasingly important. In order to identify fraudulent behavior and prevent the abuse of advanced technologies, stricter legal and policy measures are needed.

2. Architecture Features

An independent data plane is a key feature in the design of a data governance system (as shown in Figure 2). It will provide 6G systems with common data-related capabilities, thereby providing transparency, efficiency, inherent security, and privacy protection for internal and external functions of the 6G system. The following sections introduce the basic concepts and the related network functions and services.

▲Figure 2 Independent data plane realizes complete data governance

1. Independent data plane

The independent data plane is designed to implement the data governance solution of the 6G system. The data it processes comes from different business entities. Regardless of where the data comes from, the entire life cycle of the data is handled on this plane, including data generation and collection, data processing and analysis, and data service delivery.

Therefore, an independent data plane can provide data services to external business entities (for example, vertical industries such as automotive, manufacturing, and healthcare), and can also provide network automation and optimization services for the 6G system itself (such as the control plane, user plane, and management plane). The data collected includes configuration, status, and logs related to network operation, as well as personal user data, sensor data, and data provided by other parties.

The collected data will form a rich data resource that can be organized in a distributed form. In order to prevent problems caused by directly using raw data for applications such as AI and perception, raw data usually needs to be preprocessed before being used (such as anonymization, data format reshaping, denoising, conversion, feature extraction, etc.).
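As an illustration of the preprocessing steps listed above, the following is a minimal sketch, not a definitive implementation. The record layout, the field names (`user_id`, `rsrp_dbm`) and the salt are hypothetical. Note that salted hashing is pseudonymization, which is weaker than full anonymization.

```python
import hashlib
import statistics


def pseudonymize(user_id: str, salt: str) -> str:
    """Replace an identifier with a truncated salted SHA-256 digest."""
    return hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()[:16]


def moving_average(samples, window=3):
    """Denoise a series with a trailing moving average."""
    smoothed = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def extract_features(samples):
    """Reduce a series to a few summary features."""
    return {
        "mean": statistics.fmean(samples),
        "min": min(samples),
        "max": max(samples),
    }


def preprocess(record, salt):
    """Pseudonymize, denoise, then extract features from one raw record."""
    smoothed = moving_average(record["rsrp_dbm"])
    return {
        "subject": pseudonymize(record["user_id"], salt),
        "features": extract_features(smoothed),
    }


# Hypothetical raw record: a user identifier plus signal-strength samples.
raw = {"user_id": "user-001", "rsrp_dbm": [-90.0, -70.0, -92.0, -91.0]}
clean = preprocess(raw, salt="demo-salt")
```

The output keeps only a pseudonym and derived features, so downstream consumers never see the raw identifier or the raw samples.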

To ensure data integrity and process compliance, policies involved in data processing (such as geographic restrictions, national or regional privacy regulations, etc.), whether or not they come from the regulatory level, must be followed by default. When passing data to the data plane, the data use rights and obligations agreed in the data contract must also be followed. Data desensitization is the key to protecting privacy, and the data plane needs to provide this service.
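One way to picture "followed by default" is a default-deny check against the data contract. The sketch below is only an illustration: the `DataContract` and `AccessRequest` types, their fields, and the example values are assumptions, not part of any 6G specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract: what a data owner permits."""
    owner: str
    allowed_purposes: frozenset
    allowed_regions: frozenset
    requires_desensitization: bool = True


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical request to use data under a contract."""
    purpose: str
    region: str
    desensitized: bool


def check_request(contract, request):
    """Return the list of violated rules; an empty list means permitted."""
    violations = []
    if request.purpose not in contract.allowed_purposes:
        violations.append("purpose not covered by contract")
    if request.region not in contract.allowed_regions:
        violations.append("region not permitted")
    if contract.requires_desensitization and not request.desensitized:
        violations.append("data must be desensitized first")
    return violations


contract = DataContract(
    owner="operator-a",
    allowed_purposes=frozenset({"network-optimization"}),
    allowed_regions=frozenset({"EU"}),
)
permitted = check_request(
    contract, AccessRequest("network-optimization", "EU", desensitized=True)
)
denied = check_request(
    contract, AccessRequest("advertising", "US", desensitized=False)
)
```

Anything not explicitly listed in the contract is refused, which matches the idea that policies apply by default rather than on request.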

All the above services provided by the data plane are operated and managed by a self-contained OA&M system.

Another important function of the data plane is to generate knowledge based on data collection, processing and orchestration. In order to coordinate the processing and transmission of data from different data sources, the production of knowledge also needs to be carried out according to contractual requirements.

As new data sources, data models, and data themes are noticed and used by data customers, the data governance framework can continue to evolve and be enriched. Therefore, the operational management of the data governance framework and the ongoing development of the framework can proceed in parallel.

Since the data plane is a logical concept, it can be implemented through a centralized layered architecture or as a logical function distributed across edge or deep edge nodes. Next, we will explore some key elements of the data plane.

2. Multiple roles in data governance

The data governance ecosystem includes two dimensions of roles: from data customers to data providers, and from data owners to data managers. Different roles can be assumed by different business entities. Therefore, data governance in 6G is a typical multi-party participation scenario, where data customers who use the data or knowledge provided by the 6G system and data providers of the 6G system may all participate.

6G can have its own data governance framework, or it can build a data governance framework with other industry participants based on its own domain knowledge. In other words, data governance frameworks may have different evolution or development routes. Therefore, it is very important to determine data rights between different business entities during the operation stage, and this problem can be solved with the help of decentralized technologies such as blockchain.
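To make the idea of recording data rights tamper-evidently more concrete, here is a heavily simplified sketch: a single-node, append-only hash chain. It omits everything that makes a real blockchain decentralized (consensus, replication, signatures), and the `RightsLedger` class and its fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64


def _digest(body):
    """Hash an entry body deterministically."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class RightsLedger:
    """Append-only log in which every entry commits to the previous one."""

    def __init__(self):
        self.entries = []

    def append(self, dataset, owner, grantee, right):
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {
            "dataset": dataset,
            "owner": owner,
            "grantee": grantee,
            "right": right,
            "prev": prev,
        }
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self):
        """Recompute every hash and check that the chain links are intact."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True


ledger = RightsLedger()
ledger.append("mobility-2024", "terminal-user", "operator-a", "read")
ledger.append("mobility-2024", "terminal-user", "vertical-b", "analyze")
intact = ledger.verify()
```

Because each entry includes the hash of its predecessor, altering an earlier grant invalidates verification of the chain.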

3. Data resources

Data resources are rich and varied, including structured data, unstructured data, pre-processed data, post-processed data, and raw data. Efficiently collecting data from wireless environments (for example, user behavior data such as mobility, and network status data) is a prerequisite for data governance. Then, intelligent methods can be used to analyze the data and transfer the knowledge derived from the data to internal and external customers. Therefore, it is necessary to understand the source of the data.

▲Figure 3 Main data source categories

Figure 3 shows some major categories of data sources in the 6G system.

  • Infrastructure: Infrastructure refers to the communication system, including various physical and virtual resources such as RAN, TN and CN, as well as computing resources such as cloud, edge and deep edge. The data generated within the infrastructure includes computing resource information, communication resource information (such as the status of a certain network function), perception information (such as perception information from RAN), and certain user information (such as mobility information, location and related context).
  • Operation Support System (OSS): This layer of data includes all OA&M-related data, such as physical equipment status, system operation information, and service provisioning information.
  • Business Support System (BSS): This layer of data includes all data related to business logic, such as customer information, partnership management information, and more importantly, the subscription data of consumers and corporate customers, for which they should have full ownership and control.
  • Industry communication system: In 6G industry application scenarios, the collected data may also include industry-related OA&M data information, industry user information (such as traffic patterns and mobility data), and business/service data stored in the cloud. The ownership of such data should belong entirely to the industry customers.
  • Terminal: Data from the terminal side includes computing and communication resources, business usage profiles, perception knowledge, etc. The ownership of such data should fully belong to the terminal user.

4. Data Collection

In 6G, one of the main functions of data governance is to provide a suitable method to build data resources, which requires the support of appropriate architecture and network functions. The first step in building data resources is to collect data, which involves the following key actions:

  • Establish agreements (such as data authorization) and secure connections with data sources.
  • Receive data collection requirements, determine the scope of collection, and determine the location, time and method of collection based on the requirements.
  • Inform the data source of the properties of the data to be collected.
  • Collect data from data sources and store them in a database.
  • Operate and maintain data in the database.
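The five actions above can be sketched as a small workflow. This is an illustrative sketch under stated assumptions: the `DataSource` class, its methods, the requirement format and the table layout are all hypothetical, authorization is reduced to a flag, and an in-memory SQLite database stands in for the data store.

```python
import sqlite3
import time


class DataSource:
    """Hypothetical data source that only answers after authorization."""

    def __init__(self, name, readings):
        self.name = name
        self._readings = readings
        self.authorized = False
        self.requested_fields = ()

    def authorize(self, collector_id):
        # Action 1: stand-in for a real agreement and secure connection.
        self.authorized = True

    def announce_properties(self, fields):
        # Action 3: tell the source which data properties are wanted.
        self.requested_fields = tuple(fields)

    def read(self):
        if not self.authorized:
            raise PermissionError("no agreement with this data source")
        return [{f: r[f] for f in self.requested_fields} for r in self._readings]


def collect(source, requirement, db):
    """Run actions 1 to 4 and return the number of readings collected."""
    source.authorize("collector-1")
    fields = requirement["fields"]  # Action 2: scope comes from the requirement.
    source.announce_properties(fields)
    rows = source.read()
    db.execute(
        "CREATE TABLE IF NOT EXISTS samples "
        "(source TEXT, field TEXT, value REAL, collected_at REAL)"
    )
    now = time.time()
    db.executemany(
        "INSERT INTO samples VALUES (?, ?, ?, ?)",
        [(source.name, f, row[f], now) for row in rows for f in fields],
    )
    db.commit()
    return len(rows)


def purge_older_than(db, max_age_s):
    """Action 5: one maintenance task, deleting samples past a retention age."""
    cutoff = time.time() - max_age_s
    cur = db.execute("DELETE FROM samples WHERE collected_at < ?", (cutoff,))
    db.commit()
    return cur.rowcount


db = sqlite3.connect(":memory:")
cell = DataSource(
    "cell-17",
    [{"load": 0.4, "users": 12, "subscriber": "a"},
     {"load": 0.7, "users": 30, "subscriber": "b"}],
)
collected = collect(cell, {"fields": ["load", "users"]}, db)
stored = db.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
```

Only the fields named in the requirement are read and stored, so the `subscriber` field in the hypothetical readings never reaches the database.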

5. Data Analysis

Based on the management of data resources, it is possible to provide data analysis services for different types of customers. The following four types of data analysis services can be provided:

  • Descriptive analytics mines statistical information from historical data to provide network insights, such as network performance, traffic patterns, channel conditions, users, and more.
  • Diagnostic analytics enables autonomous detection of network faults and service impairments and identifies the root causes of network anomalies, thereby improving network reliability and security.
  • Predictive analytics uses data to predict future events, such as traffic patterns, user locations, user behaviors and preferences, resource availability, and even failures.
  • Prescriptive (recommendation) analytics provides recommendations for resource allocation, content display, and more based on predictive analytics.
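The four analytics types above can be illustrated on a toy series of cell load values. The sketch below is deliberately simple and rests on assumptions: the z-score threshold, the linear trend model, the 80% capacity rule and the sample data are all hypothetical choices, not methods prescribed by the source.

```python
import statistics


def describe(load):
    """Descriptive: summary statistics of historical load."""
    return {"mean": statistics.fmean(load), "peak": max(load)}


def diagnose(load, z_threshold=2.0):
    """Diagnostic: indices of samples far from the mean (simple z-score)."""
    mean = statistics.fmean(load)
    stdev = statistics.pstdev(load)
    if stdev == 0:
        return []
    return [i for i, x in enumerate(load) if abs(x - mean) / stdev > z_threshold]


def predict_next(load):
    """Predictive: least-squares linear trend extrapolated one step ahead.

    Requires at least two samples.
    """
    n = len(load)
    x_mean = (n - 1) / 2
    y_mean = statistics.fmean(load)
    denom = sum((x - x_mean) ** 2 for x in range(n))
    slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(load)) / denom
    return y_mean + slope * (n - x_mean)


def recommend(forecast, capacity):
    """Recommendation: act when the forecast exceeds 80% of capacity."""
    return "add capacity" if forecast > 0.8 * capacity else "no action"


rising = [10.0, 20.0, 30.0, 40.0, 50.0]
spiky = [50.0] * 9 + [200.0]

summary = describe(rising)
anomalies = diagnose(spiky)
forecast = predict_next(rising)
action = recommend(forecast, capacity=70.0)
```

Each function consumes the output of the previous stage only loosely here; in practice the recommendation step would build on the prediction, as the last bullet describes.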

The knowledge provided by the data plane comes from data analysis services. It includes active knowledge (such as action recommendations) and passive knowledge (such as shared information on which customers base their own action decisions).

Data analysis services can be customized to customer requirements. The data plane should open services and data in multiple dimensions on demand. Table 1 lists examples of the types of services that can be provided to customers. In practice, customer types will be more varied than those listed in the table, and customers' needs and usage scenarios for data analysis will differ as well.

▼Table 1 Examples of multidimensional data services provided by the data plane

6. Data desensitization

Collecting and storing sensitive data involves privacy risks and carries privacy protection responsibilities. Data desensitization is an important step to respond to privacy concerns and achieve legal compliance, and is also particularly important for supporting AI and perception services in 6G design.

Especially for AI tasks, cross-domain designs need to be considered. Recently, there has been a lot of research on differential privacy in the field of AI, exploring how to anonymize the training data of a single device.

Data desensitization during model training and AI inference is essential in 6G design. One approach is differential privacy: noise is added to the training data in a way that largely preserves its statistical properties, so that the trained model can still capture the characteristics of the original data set. A second approach uses encryption so that machine learning runs on encrypted (rather than decrypted) data. A third approach has the device send model parameters instead of training data, as in federated learning and split learning.

There is a risk in this process. If an insider who has full control over the learning method has bad intentions, they can exploit the gradual convergence of the model to reconstruct information similar to the training data. For example, in federated learning, information may be leaked to malicious devices.
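Two of the approaches mentioned above can be sketched briefly: adding noise on the device, and sharing model parameters instead of data. This is a minimal sketch under stated assumptions. The noise step uses the Laplace mechanism in a local differential privacy setting, with an assumed value range of [0, 1] and an assumed privacy budget `epsilon`; the aggregation step is a plain sample-weighted average in the style of federated averaging. Function names and values are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(value, lo, hi, epsilon, rng):
    """Clip a value to [lo, hi] and add Laplace noise of scale (hi - lo) / epsilon."""
    clipped = min(max(value, lo), hi)
    return clipped + laplace_noise((hi - lo) / epsilon, rng)


def federated_average(updates):
    """Combine (weights, num_samples) pairs into a sample-weighted mean."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


# Each device perturbs its own value before reporting it.
rng = random.Random(0)
true_values = [0.3] * 20000
noisy = [privatize(v, 0.0, 1.0, epsilon=1.0, rng=rng) for v in true_values]
estimate = sum(noisy) / len(noisy)  # close to 0.3 despite per-record noise

# Devices share only model weights and sample counts, never raw data.
global_weights = federated_average([([1.0, 2.0], 1), ([3.0, 4.0], 3)])
```

Individual noisy reports reveal little about any one device, yet the aggregate stays useful. Sharing parameters alone is not a privacy guarantee by itself, since parameters can still leak information about the training data.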

Regardless of the learning method, data desensitization is an issue that needs to be considered. Therefore, under this premise, we need to think about how to deal with the differences between different learning methods and the limitations of the learning methods themselves.

About the authors: Dr. Tong Wen is CTO of Huawei Wireless, Huawei's 5G chief scientist, a Huawei Fellow, an IEEE Fellow, and a Fellow of the Canadian Academy of Engineering. He has won the IEEE Communications Society's Outstanding Industry Leadership Award and the Fessenden Medal. Dr. Zhu Peiying is Huawei's senior vice president of wireless research, a Huawei Fellow, an IEEE Fellow, and a Fellow of the Canadian Academy of Engineering.

This article is excerpted from "6G Wireless Communication New Journey: Beyond Human Connection, Internet of Things, and Towards Intelligent Connection of All Things", and is authorized by the publisher. (ISBN: 9787111688846)

