In the ever-changing information age, companies that successfully extract valuable information from data will maintain a unique competitive edge in an increasingly crowded market. Data-driven companies can build a more comprehensive understanding of the business and the customers hidden in their massive data stores. This is also why intelligent data virtualization is committed to eliminating data silos.
Are data lakes the way of the future? Data will only become more diverse, dynamic, and distributed. Many companies try to make all of their data accessible by throwing it into a data lake, which holds the data in its original format until it is needed for analysis. The approach sounds convincing, but few companies can afford to hire enough data scientists to collect, translate, and analyze every type of data in the lake, even as the demand for instant data storage and retrieval grows ever stronger.

As companies race to collect and analyze as much data as possible, hoping to gain even the slightest competitive advantage over their peers, traditional data lakes cannot keep up with the new data sources that keep emerging and the new on-premises databases that keep being created. Queries must match the specific database you are using, so the more databases you have, the more query languages you need. Just as importantly, integrating disparate data in a data lake still requires manual processing to make it accessible and readable, which is very time-consuming for data engineers and data scientists. Data lakes lack flexibility and are losing relevance in a data-driven economy.

Many enterprises are therefore turning to data virtualization to optimize their analytics and BI, connecting all of their data and making it readable and accessible from one place. But not all data virtualization is created equal. Data virtualization creates a software virtualization layer that integrates all data across the enterprise. Regardless of the data's format or which silo, server, or cloud it resides in, the data is translated into a common business language and can be accessed from a single portal. In theory, this gives the organization a shared data lake where all the different business units and business users can instantly access the data they need.
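The query-language mismatch described above is easy to see side by side. The sketch below, with invented table and column names, writes the same logical request ("top 5 customers by total spend") for three different backends; a virtualization layer exists to hide exactly this divergence behind one interface:

```python
# The same logical query expressed for three backends.
# Table/column names ("orders", "customer_id", "amount") are illustrative.
DIALECTS = {
    # PostgreSQL / standard SQL: LIMIT clause
    "postgresql": "SELECT customer_id, SUM(amount) AS total "
                  "FROM orders GROUP BY customer_id "
                  "ORDER BY total DESC LIMIT 5",
    # SQL Server: TOP keyword instead of LIMIT
    "sqlserver":  "SELECT TOP 5 customer_id, SUM(amount) AS total "
                  "FROM orders GROUP BY customer_id "
                  "ORDER BY total DESC",
    # MongoDB: no SQL at all -- an aggregation pipeline
    "mongodb":    '[{"$group": {"_id": "$customer_id", '
                  '"total": {"$sum": "$amount"}}}, '
                  '{"$sort": {"total": -1}}, {"$limit": 5}]',
}

def query_for(backend: str) -> str:
    """A virtualization layer would hide this lookup behind one interface."""
    return DIALECTS[backend]

for name in DIALECTS:
    print(f"{name}: {query_for(name)}")
```

Without a translation layer, every new database adds another dialect that analysts and tools must learn.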
Having rapid access enables the business to make data-driven decisions for shared purposes. However, many data virtualization solutions do not achieve the desired results for analytics, for several key reasons:

1. Proprietary formats
Many data virtualization vendors merge and convert all data into a proprietary format. While the merge integrates the data into a single location for a single view, the vendor's proprietary format often reduces the data to a lowest-common-denominator state. In that state, some data may be skewed, lose specific functionality, or even be lost during conversion. Some data may also require the context of its original database to be meaningful. As a result, users may draw conclusions from incorrect data and make counterproductive business decisions.

2. Incompatibility with BI tools
BI tools are a sizeable investment for an enterprise. Most enterprise-level companies run several different BI tools across departments; one department might use Tableau, while another uses Microsoft Power BI or Excel. For big data analytics to work across an enterprise, data must be discoverable and accessible to all users, regardless of the tools they prefer. Many vendors use proprietary data formats that may not interoperate with the technology a company has already invested in. Different tools use different query languages and display data in different ways, and when inconsistently defined data is integrated, costly mistakes can occur during analysis. Selecting the right BI tool is critical to minimizing business disruption and maximizing user productivity.

3. Query restrictions
As data continues to grow and technology advances, queries become increasingly complex, which is not ideal for analytical workloads that process data at scale. The more data sources you manage, the more data engineering you need to support fast, interactive queries.
Distributed joins that move large amounts of data are unsuitable for interactive queries: they put unpredictable and unacceptable pressure on enterprise infrastructure, and simple data caching is not enough for dynamic query environments at today's data sizes. When BI and AI workloads are added to the mix, performance degrades quickly, prompting end users to seek other, direct paths to the data and negating the benefits of data virtualization. Beyond these scaling shortcomings, traditional virtualization products have performed poorly on analytical use cases. Scaling large and complex data services requires a deep understanding of the details: statistics about the data, the databases involved, the load on shared resources, the use cases and intentions of data consumers, security constraints, and more. A virtualization solution needs to give users a business-wide view of their data, including hierarchies, measures, dimensions, attributes, and time series.

What should data virtualization provide?
Most data virtualization solutions have not evolved at the pace of today's datasets and data science practices; they still rely on traditional data federation approaches and simple caching techniques. However, next-generation intelligent data virtualization solutions are designed for today's complex and time-sensitive BI needs. If your data virtualization solution doesn't offer the following capabilities, it isn't smart enough.

1. Autonomous data engineering
Humans can never be perfect; fortunately, computers can come much closer. Given the complexity of modern data architectures, humans alone cannot solve this problem, at least not at the speed needed to stay competitive today. That is why data virtualization solutions need to provide autonomous data engineering, which can automatically infer optimized outcomes from connections and calculations far beyond the reach of the human brain.
Machine learning (ML) is used to dissect all of a company's data and examine how it is queried and integrated into the data models built by users across the organization. Automating data engineering can save significant money and resources while freeing data engineers for more complex tasks that deliver more value to the organization.

2. Acceleration structures
Intelligent data virtualization can also automatically place data on the specific database where it will perform best. Data comes in many types, and different platforms suit different formats; intelligent data virtualization decides where to put data based on where the best performance is achieved. For example, if your data model and queries process time-series data, intelligent data virtualization will place an acceleration structure optimized for time series in the appropriate database. It automatically learns which database has which strengths and then exploits them, turning the variability of database types into an advantage.

Acceleration structures can also save substantial cloud operating costs. Depending on the platform, you may be charged for the storage size of the database, the number of queries you run, the data a query moves, the number of rows involved, the complexity of the query, or other variables. With Google BigQuery, for example, on-demand charges are proportional to the amount of data your queries scan. When acceleration structures are used automatically for performance and cost optimization, users are charged only for the query data in the accelerated aggregations, not for scans of the entire database.
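A minimal sketch of both ideas together, with an invented query log and fact table: a crude "autonomous engineering" step mines the log for groupings queried often enough to be worth pre-aggregating, then builds that aggregate as an acceleration structure so later queries read one row instead of scanning the whole table:

```python
from collections import Counter

# Hypothetical query log: the grouping column each BI query aggregated on.
query_log = ["region", "region", "month", "region", "month", "product_id"]

def recommend_aggregates(log, min_hits=2):
    """Autonomous engineering, crudely: pre-aggregate any grouping queried often."""
    return {col for col, n in Counter(log).items() if n >= min_hits}

# Toy fact table standing in for a large, pay-per-scan cloud database.
orders = [{"region": r, "amount": a}
          for r, a in [("east", 10), ("west", 20), ("east", 5), ("west", 1)]]

# Build the acceleration structure (an aggregate) only where recommended.
aggregates = {}
if "region" in recommend_aggregates(query_log):
    for row in orders:
        aggregates[row["region"]] = aggregates.get(row["region"], 0) + row["amount"]

# A "total for east" query now reads one aggregate row instead of scanning orders,
# and both paths agree on the answer.
assert aggregates["east"] == sum(r["amount"] for r in orders if r["region"] == "east")
print(aggregates)  # -> {'east': 15, 'west': 21}
```

A real system would weigh query frequency against storage and refresh costs; the point is only that the decision can be derived from observed workload rather than hand-built by engineers.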
3. Automatic data modeling
Next-generation data virtualization not only provides transformation of and access to data; intelligent data virtualization also automatically learns the capabilities and limitations of each data platform. It identifies what information is available and how to merge and integrate it with other data when building models. Intelligent data virtualization can reverse-engineer the data models and queries used to create legacy reports, so users can keep using the same reports without rebuilding the models or queries. For example, if a user created a TPS report in the old system, they can still retrieve it in the new system, and queries written against the old data will still run on the new system without any rewrite.

4. Self-service support
In recent years, many aspects of IT have become "democratized": advances in technology, particularly cloud computing, have made them accessible to laypeople without a deep technical background. While analytics and business intelligence have lagged behind this trend, BI tools are now becoming more accessible to the general public, giving rise to a new "self-service" analytics culture in which business users directly access and analyze data with their favorite BI tools, without relying on data engineers or data analysts. Self-service analytics is quickly becoming a necessity for optimizing big data analysis in the enterprise. Suppose the sales department has data on spending from the previous year but wants to supplement it with data on customer behavior patterns across multiple areas. Or the marketing department needs to launch an account-based marketing campaign targeting companies deemed most likely to switch suppliers. With self-service analytics, business users in sales or marketing can access this data themselves and call upon it with the right tools.
They do so instead of relying on trained data engineers to feed data into BI tools and on data scientists to build models and predictions. With self-service, every department in the organization can apply its own experience and expertise to BI, enabling a whole new level of convenience. Intelligent data virtualization provides a business logic layer that converts all data into a common business language, independent of both source and tool. With this logic layer, business users can use whatever BI tools they like and need not succumb to a single standard for BI software. All data is accessible no matter which tool, or how many tools, a user works with, and all queries return consistent answers. Standard, logical interpretations give the enterprise shared data intelligence and the self-service culture that today's data-driven business environment increasingly demands.

5. Security
In the pursuit of customized data, security and compliance must not be sacrificed for convenience or cost-effectiveness. Virtualization layers are known to introduce security risks. With next-generation intelligent data virtualization, however, data inherits the security and governance policies of every underlying database. Management is transparent, so user permissions and policies remain unchanged. By tracking the origin and identity of the data, all existing security and privacy information is preserved for each user. Even when multiple databases have different security policies, those policies can be merged seamlessly and applied automatically as global security and compliance protocols. After adopting intelligent data virtualization, no additional steps are required to ensure security and compliance.
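The policy inheritance and merging described above might be sketched as follows. The classification levels, column names, and source names are all invented for illustration; the rule is simply that when sources disagree, the strictest level wins, so the global policy never weakens any database's own policy:

```python
# Ordered classification levels: higher rank = more restrictive.
RANK = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}

# Per-source policies inherited from each underlying database.
policies = {
    "warehouse": {"email": "confidential", "order_total": "internal"},
    "crm":       {"email": "restricted",   "order_total": "public"},
}

def merge_policies(per_source):
    """Merge inherited column policies; the most restrictive level wins globally."""
    merged = {}
    for rules in per_source.values():
        for column, level in rules.items():
            current = merged.get(column, "public")
            merged[column] = max(current, level, key=RANK.__getitem__)
    return merged

print(merge_policies(policies))
# -> {'email': 'restricted', 'order_total': 'internal'}
```

The virtualization layer would then enforce the merged policy at query time, so a user who may not see "restricted" columns in one source cannot see them through any federated view either.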
Data virtualization must evolve as the rest of IT evolves
For enterprises, having customized data is as important as having readable, accessible, and reliable data, yet today many companies are mired in a quagmire of massive data. More and more distributed models keep being added to data in dynamic and diverse formats and use cases. If users cannot quickly find and analyze the data they need, confident in its accuracy and freshness, the quality of BI declines and suboptimal data-based decisions follow. Data virtualization therefore needs to evolve to meet these new challenges and complexities so that it can truly serve big data analytics. If a data virtualization solution does not provide autonomous data engineering, acceleration structures, automatic data modeling, self-service analytics, worry-free security and compliance, and a platform-independent multidimensional semantic layer, it is simply not smart enough.