This article is reprinted from the WeChat public account "Big Data DT"; the authors are Tong Wen and Zhu Peiying. To reprint this article, please contact the Big Data DT public account.

The economic and technical meaning of data governance varies with the scope in which the data is used. Data governance refers to managing, maintaining, and further developing data through relevant processes and technologies, in order to obtain high-quality data that can serve as a key asset for the organization.

Each mobile network operator (MNO) isolates and stores the data generated in the mobile communication system by technical domain, including the Radio Access Network (RAN), Core Network (CN), Transport Network (TN), and Operation, Administration, and Maintenance (OA&M). Data held by different network elements and different participants is neither open nor transparent, and the resulting data silos are the main bottleneck in data collection and sharing. Meanwhile, large OTT (Over-The-Top) service companies have accumulated expertise in data governance and monetization strategies (such as data storage, analytics services, and APIs) that is far ahead of companies in the telecommunications field.

The data governance solution of the 6G system will provide strong support for AI and sensing services, and will give rise to new business models and system characteristics.

1. Design points and principles

The scope of data governance goes far beyond traditional data collection and storage. In general, system design needs to consider four aspects, as shown in Figure 1.

Figure 1 Key points of data governance design

1. Data availability and quality

Data availability and quality are among the biggest challenges to applying AI across industries. Improving data availability means that data cannot come from a single system or a single field; it must come from multiple systems and different fields at the same time.
This raises a fundamental question: how can the physical boundaries (between multiple vendors, operators, and industries) be broken so that data can flow into a heterogeneous data ocean? Once the originally scattered and isolated data has been collected and put to use, another question arises: how can the quality of the data be improved? Acquiring massive amounts of data does not mean that the acquired data is usable or of high quality. In addition, data processing needs to become more efficient while its computational complexity and energy consumption are reduced.

2. Data sovereignty

With the full digital transformation of society, data sovereignty, data security, and privacy have never been more important. Many countries have enacted privacy protection laws and regulations. Service providers are constantly updating their privacy protection plans, and the governments of major countries are drafting or have already issued regulations on data management. For example, the General Data Protection Regulation (GDPR), which took effect in the European Union in 2018, regulates the use of data at the EU level. In 2019, China published the draft Data Security Management Measures, which, together with the Cybersecurity Law adopted in 2016, can be seen as China's counterpart to the GDPR. The United States is also implementing privacy-related laws, such as the California Consumer Privacy Act, which officially came into effect in January 2020.

How to fully tap the intrinsic value of data and provide precise support for various businesses, while protecting privacy and respecting data sovereignty, has become a hot topic in recent years. The design of 6G systems should take regulatory uncertainty into account, especially the uncertainty caused by regulatory differences between regions.

3. Knowledge management

Generally speaking, knowledge can be regarded as processed data with a specific use or value, which can be used directly by physical or virtual entities in different technical and business fields. Knowledge management covers the generation, updating, and opening of knowledge. For generation and updating, the source and quality of data must be checked carefully, and measures must be taken to intercept low-quality and harmful data produced by unreliable or even malicious data sources. Opening knowledge as a capability requires a suitable platform and interface design.

4. Legal issues

A variety of sensors and other technologies can generate data in real time, making data collection and use increasingly complex and sensitive. The growth in data generation capabilities not only provides new data streams and content types, but also raises policy and legal concerns about data abuse: malicious institutions or governments may use these capabilities to exert social control. At the same time, new technological capabilities make it hard for ordinary people to judge the authenticity of content. For example, it is difficult for most people to tell a real video from a "deepfake" video. There is a fragile balance between preserving the social benefits of technology and preventing its capabilities from being used to exert social control and deprive people of their freedom, and maintaining this balance is becoming increasingly important. Stricter legal and policy measures are needed to identify fraudulent behavior and prevent the abuse of advanced technologies.

2. Architecture Features

An independent data plane is a key feature in the design of a data governance system (as shown in Figure 2). It will provide 6G systems with common data-related capabilities, thereby providing transparency, efficiency, inherent security, and privacy protection for the internal and external functions of the 6G system.
The following sections introduce the basic concepts and the related network functions and services.

▲Figure 2 Independent data plane realizes complete data governance

1. Independent data plane

The independent data plane is designed to implement the data governance solution of the 6G system. The data it processes comes from different business entities. Regardless of where the data comes from, its entire life cycle is handled on this plane, including data generation and collection, data processing and analysis, and the delivery of data services. An independent data plane can therefore provide data services to external business entities (vertical industries such as automotive, manufacturing, and healthcare), and can also provide network automation and optimization services to the 6G system itself (such as the control plane, user plane, and management plane).

Configuration, status, and logs related to network operation are all collected, as are users' personal data, sensor data, and data provided by other parties. The collected data forms a rich data resource that can be organized in a distributed manner. To prevent the problems caused by feeding raw data directly into applications such as AI and sensing, raw data usually needs to be preprocessed before use (for example, anonymization, data format reshaping, denoising, conversion, and feature extraction).

To ensure data integrity and process compliance, the policies that apply to data processing (such as geographic restrictions and national or regional privacy regulations) must be followed by default, whether or not they originate from regulators. When data is passed to the data plane, the data usage rights and obligations agreed in the data contract must also be observed. Data desensitization is the key to protecting privacy, and the data plane needs to provide this service. All of the services above are operated and managed by a self-contained OA&M system.
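The preprocessing and policy checks described above can be illustrated with a minimal sketch. It is not the design proposed by the authors: the record fields, the `ALLOWED_DESTINATIONS` table, and the function names are all hypothetical, and the region names are placeholders rather than real jurisdictions. The sketch replaces a direct identifier with a keyed hash (which is pseudonymization, a weaker guarantee than full anonymization), coarsens location coordinates, and checks a geographic restriction before data leaves its region of origin.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical policy table (an assumption for illustration only):
# for each region of origin, the regions the data may be exported to.
ALLOWED_DESTINATIONS = {
    "region-a": {"region-a", "region-b"},
    "region-b": {"region-b"},
}


@dataclass(frozen=True)
class RawRecord:
    subscriber_id: str      # directly identifying field
    region: str             # region where the data was collected
    latitude: float
    longitude: float
    throughput_mbps: float  # the measurement of interest


@dataclass(frozen=True)
class DesensitizedRecord:
    pseudonym: str
    region: str
    coarse_latitude: float
    coarse_longitude: float
    throughput_mbps: float


def pseudonymize(subscriber_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 digest (HMAC)."""
    digest = hmac.new(secret_key, subscriber_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def desensitize(record: RawRecord, secret_key: bytes, decimals: int = 1) -> DesensitizedRecord:
    """Pseudonymize the identifier and coarsen the location."""
    return DesensitizedRecord(
        pseudonym=pseudonymize(record.subscriber_id, secret_key),
        region=record.region,
        coarse_latitude=round(record.latitude, decimals),
        coarse_longitude=round(record.longitude, decimals),
        throughput_mbps=record.throughput_mbps,
    )


def may_export(record, destination_region: str) -> bool:
    """Apply the geographic restriction; unknown regions are denied by default."""
    return destination_region in ALLOWED_DESTINATIONS.get(record.region, set())
```

In this sketch the same key always maps a subscriber to the same pseudonym, so records can still be joined for analysis, while whoever holds the key remains responsible for protecting it. Denying unknown regions by default mirrors the requirement that policies be followed by default.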
Another important function of the data plane is to generate knowledge on the basis of data collection, processing, and orchestration. Because the processing and transmission of data from different sources must be coordinated, knowledge production also has to follow contractual requirements. As data customers discover and use new data sources, data models, and data topics, the data governance framework can continue to evolve and grow richer. The operational management of the framework and its ongoing development can therefore proceed in parallel. Since the data plane is a logical concept, it can be implemented as a centralized layered architecture or as a logical function distributed across edge or deep-edge nodes. The following sections explore some key elements of the data plane.

2. Multiple roles in data governance

The data governance ecosystem includes roles along two dimensions: from data customers to data providers, and from data owners to data managers. Different roles can be assumed by different business entities. Data governance in 6G is therefore a typical multi-party scenario, in which both the data customers who use the data or knowledge provided by the 6G system and the data providers of the 6G system may take part. 6G can have its own data governance framework, or it can build one together with other industry participants based on its own domain knowledge. In other words, data governance frameworks may follow different evolution or development paths. It is therefore very important to determine data rights between the different business entities during the operation stage, and decentralized technologies such as blockchain can help solve this problem.

3. Data resources

Data resources are very rich, including structured data, unstructured data, pre-processed data, post-processed data, and raw data.
Efficiently collecting data from the wireless environment (for example, user behavior data such as mobility, and network status data) is a prerequisite for data governance. Intelligent methods can then be used to analyze the data and deliver the knowledge derived from it to internal and external customers. It is therefore necessary to understand where the data comes from.

▲Figure 3 Main data source categories

Figure 3 shows some major categories of data sources in the 6G system.
4. Data Collection

In 6G, one of the main functions of data governance is to provide a suitable method to build data resources, which requires the support of appropriate architecture and network functions. The first step to build data resources is to collect data, which has the following key actions:
5. Data Analysis

Based on the management of data resources, it is possible to provide data analysis services for different types of customers. The following four types of data analysis services can be provided:
The knowledge provided by the data plane comes from data analysis services and includes active knowledge (such as action recommendations) and passive knowledge (such as information sharing and customer action decisions). Data analysis services can be customized to customer needs. The data plane should open up services and data in multiple dimensions on demand. Table 1 lists examples of the types of services that can be provided to customers. In practice, the range of customer types will likely be richer than what the table lists, and customers' needs and usage scenarios for data analysis will also differ.

▼Table 1 Examples of multidimensional data services provided by the data plane

6. Data desensitization

Collecting and storing sensitive data carries privacy risks and brings privacy protection responsibilities. Data desensitization is an important step in responding to privacy concerns and achieving legal compliance, and it is particularly important for supporting AI and sensing services in 6G design. For AI tasks especially, cross-domain designs need to be considered. Recently there has been a great deal of research on differential privacy in the AI field, exploring how to anonymize the training data of a single device. Data desensitization during model training and AI inference is essential in 6G design.

Approaches in this area include adding noise to the training data without materially affecting its statistical properties, as in differential privacy, so that the trained model can still capture the characteristics of the original data set; and using encryption techniques so that machine learning can operate on encrypted (rather than decrypted) data. Another approach is for devices to send model parameters instead of training data, as in federated learning and split learning. This process also carries a risk.
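The parameter-sharing idea mentioned above can be sketched as one round of federated averaging. This is a minimal illustration under stated assumptions, not the scheme used in any particular 6G design: the model is a plain linear regressor, the function names are hypothetical, and no noise or secure aggregation is applied, so the shared updates themselves can still leak information about the local data.

```python
from typing import List, Sequence, Tuple

Weights = List[float]
Sample = Tuple[Sequence[float], float]  # (features, target)


def local_update(weights: Weights, samples: Sequence[Sample],
                 learning_rate: float = 0.1, epochs: int = 5) -> Weights:
    """Run a few SGD passes of linear regression on one device's private data.

    Only the resulting weights leave the device; the samples do not.
    """
    w = list(weights)
    for _ in range(epochs):
        for features, target in samples:
            prediction = sum(wi * xi for wi, xi in zip(w, features))
            error = prediction - target
            for i, xi in enumerate(features):
                w[i] -= learning_rate * error * xi
    return w


def federated_average(updates: Sequence[Weights],
                      sample_counts: Sequence[int]) -> Weights:
    """Average device weights, weighting each device by its sample count."""
    total = sum(sample_counts)
    dimension = len(updates[0])
    return [
        sum(update[i] * count for update, count in zip(updates, sample_counts)) / total
        for i in range(dimension)
    ]


def federated_round(global_weights: Weights,
                    device_datasets: Sequence[Sequence[Sample]]) -> Weights:
    """One round: each device trains locally, then the server averages."""
    updates = [local_update(global_weights, data) for data in device_datasets]
    counts = [len(data) for data in device_datasets]
    return federated_average(updates, counts)
```

A caller would start from some initial weights and apply `federated_round` repeatedly. The server only ever sees the per-device weight vectors, which is exactly the information an adversary would have to work from.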
If an insider with full control over the learning method has bad intentions, they can exploit the gradual convergence of the model to reconstruct information resembling the training data. In federated learning, for example, information may be leaked to malicious devices. Regardless of the learning method, data desensitization is an issue that must be considered. On that premise, we need to think about how to handle the differences between learning methods and the limitations of the learning methods themselves.

About the authors: Dr. Tong Wen is Huawei's wireless CTO, Huawei's 5G chief scientist, a Huawei Fellow, an IEEE Fellow, and a Fellow of the Canadian Academy of Engineering. He has won the IEEE Communications Society's Outstanding Industry Leadership Award and the Fessenden Medal. Dr. Zhu Peiying is Huawei's senior vice president of wireless research, a Huawei Fellow, an IEEE Fellow, and a Fellow of the Canadian Academy of Engineering.

This article is excerpted from "6G Wireless Communication New Journey: Beyond Human Connection, Internet of Things, and Towards Intelligent Connection of All Things" and is published with the publisher's authorization. (ISBN: 9787111688846)