Integration of Big Data + Cloud + Artificial Intelligence
Big data and cloud computing technologies have entered their second decade of rapid development. With the rise of artificial intelligence, more and more enterprises are combining these three technologies to create the next generation of intelligent big data cloud platforms. By integrating applications, data assets and AI models, such a platform aims to provide a state-of-the-art intelligent data service infrastructure and thereby upgrade the enterprise's information architecture.
At present, most of the Fortune 50 enterprises have announced their cloud plans. A large number of startups have built their core IT architecture on the cloud and develop cloud-native applications. At the same time, world-renowned public cloud service providers have explored intelligent scenarios in depth. Treating big data and AI as the weapons of the coming business model revolution, industries use AI to improve how data is processed and use the cloud as the carrier.
As a leading big data technology company, Transwarp has continuously invested in cloud computing research and development as data moves to the cloud. We build on our strength in big data technology and use cloud technologies to containerize our mature big data services, which has led to the release of our latest big data platform, Transwarp Data Cloud. It integrates big data, cloud and artificial intelligence: enterprises can migrate their business to it and then use big data and AI to maximize the value extracted from their data.
Data Services Evolution Route for Large Enterprises
With the advent of DT (data technology), IT companies such as Google, Facebook and Amazon are turning into DT giants. Drawing on their advantages in big data, cloud computing and artificial intelligence, these companies have rapidly turned their business into data, their data into assets, and their operations into data-driven processes, accelerating the discovery of business value. They have achieved huge business success while leading the technology trend. These large enterprises evolved step by step along the route shown in the figure below and completed the conversion within a few years.
Data Unification
In this initial phase, enterprises build a flexible technology platform that supports data sets of huge size, high dimensionality and diverse types. Data is integrated and unified on this platform, which involves building a unified computing output platform and unifying metadata management.
Data Assetization
After unification comes data assetization. To turn data into assets, enterprises need to analyze their data to ensure its quality and validity. As more high-quality data accumulates, more developers are attracted to build their business on the platform, which generates still more valuable data and in turn encourages enterprises to establish data assets. Data assetization includes mapping data to business dictionaries, standardizing data management processes and so forth, so as to convert raw data into valuable assets.
Data Business
Having completed the previous two phases, enterprises now have powerful computing capacity and abundant data assets, and it is time to develop data businesses. At present, the data services that generate the most value are found in data-intensive areas such as digital operations, intelligent applications and online data services, where data plays a decisive role in profitability. These businesses combine big data and artificial intelligence technologies effectively, so value can be discovered quickly from massive data sets.
Data Ecosystem
At this stage, the enterprise has created a unified platform for data, computing and business, so more developers can build businesses on it in a self-service manner. New businesses generate new data and assets, which attract new developers, creating a healthy closed loop. Data, business and developers form a positive feedback cycle that shapes a complete data ecosystem.
Not all enterprises strictly follow the four stages above, and the stages may overlap and iterate. Nevertheless, with the rapid development of big data, cloud and artificial intelligence technology, the evolution across these four stages will become more mature in both technology and business, and better aligned with enterprises' business strategies.
Introduction to Transwarp Data Cloud
To help enterprises complete this evolution of their data business, Transwarp draws on its strengths in big data, container cloud platforms and artificial intelligence, embodied respectively in the big data platform TDH, the cloud operating system TOS and the artificial intelligence platform Sophon, to develop a new-generation intelligent big data cloud platform: Transwarp Data Cloud (TDC).
TDC can be delivered as a private cloud, a public cloud or a hybrid cloud. As a private cloud, TDC is deployed inside a large enterprise and offers customized implementations, centralized resources and data analysis services to business units and branches, meeting the enterprise's need for a data analysis cloud. As a public cloud, TDC provides a variety of built-in products and services with which customers can quickly build clusters, create applications and launch big data businesses. TDC can also be used in a hybrid cloud environment: when an enterprise serves external customers from a private cloud deployment and the number of requests increases dramatically, TDC switches access to public cloud IaaS services to keep response times low.
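The hybrid cloud behavior described above can be pictured as a simple routing rule. The following is only an illustrative sketch, not TDC's actual mechanism: the `BurstPolicy` class, the `choose_backend` function, the requests-per-second threshold and the backend labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurstPolicy:
    """Hypothetical policy: how much load the private cloud may absorb."""
    private_capacity_rps: int  # assumed requests-per-second limit of the private cloud


def choose_backend(current_rps: int, policy: BurstPolicy) -> str:
    """Return which environment should serve new requests."""
    # Stay on the private cloud while demand is within its capacity.
    if current_rps <= policy.private_capacity_rps:
        return "private"
    # Demand exceeds private capacity: overflow to public cloud IaaS.
    return "public"


policy = BurstPolicy(private_capacity_rps=1000)
print(choose_backend(400, policy))   # within capacity
print(choose_backend(2500, policy))  # demand spike
```

A real deployment would presumably base the decision on richer signals than a single request rate, but the threshold form captures the idea of switching only when demand spikes.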
Native Cloud Platform
The bottom layer is the cloud-native Transwarp Operating System (TOS), which handles big data application deployment and resource management in the cloud.
Service Management Framework
The second layer is the universal service management framework, which provides infrastructure cloud services such as security protection, multi-tenancy management, microservice management and billing.
Application Layer
The third layer is the service application layer that contains all the components of TDH and Sophon, and provides big data and AI services in the form of cloud products.
Management Platform
The top layer is a unified user management portal, through which tenants and users can deploy big data products in a very short time and manage projects and permissions.
Empower Data Ecosystem
TDC coordinates with three advanced data service frameworks, Data Asset Directory, AI Model Factory and Application Service Governance, to accelerate data business innovation and empower the formation of the data ecosystem within the system.
AI Model Factory
Delivering data analysis as models, with AI models encapsulated as API services that can be called online, is becoming a trend. TDC packages trained machine learning and artificial intelligence models as services that cloud users can install and use on the platform. With AI Model Factory, models can easily be created and shared within the enterprise, making machine learning and artificial intelligence widely accessible and maximizing business value.
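To make the idea of sharing models as callable services concrete, here is a minimal sketch under stated assumptions. It is not the AI Model Factory API: `ModelRegistry`, `publish`, `predict` and the toy `mean_score` model are hypothetical names, and an in-memory dictionary stands in for a shared model catalog behind an API gateway.

```python
from typing import Callable, Dict, List

# Hypothetical signature for a published model: features in, score out.
ModelFn = Callable[[List[float]], float]


class ModelRegistry:
    """In-memory stand-in for a catalog of shared models (an assumption)."""

    def __init__(self) -> None:
        self._models: Dict[str, ModelFn] = {}

    def publish(self, name: str, model: ModelFn) -> None:
        """Make a trained model available under a stable name."""
        if name in self._models:
            raise ValueError(f"model {name!r} is already published")
        self._models[name] = model

    def predict(self, name: str, features: List[float]) -> float:
        """Invoke a published model by name, as a service endpoint might."""
        if name not in self._models:
            raise KeyError(f"no published model named {name!r}")
        return self._models[name](features)


def mean_score(features: List[float]) -> float:
    """Toy stand-in for a trained model: averages its inputs."""
    return sum(features) / len(features)


registry = ModelRegistry()
registry.publish("mean_score", mean_score)
result = registry.predict("mean_score", [1.0, 2.0, 3.0])
print(result)
```

Callers depend only on the model's name and input format, which is what lets a model built by one team be reused by another.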
Data Governance
A variety of applications and their management services are made available to customers, so users can develop, test and deploy applications on a single platform. These applications produce value from data assets, and effective data services in turn create new data assets.
Asset Service
It provides a range of data asset functions that are rarely available on the market: it analyzes data automatically through machine learning and offers data table management to help enterprises with data normalization and assetization. With the data asset service, managing data assets becomes easy and straightforward, which is a prerequisite for extracting value.
Advantages of TDC PaaS
TDC is designed as a PaaS (Platform as a Service). In implementing big data technology, it draws on three core cloud computing technologies: virtualization, data center and multi-tenancy. As a result, TDC inherits auto-scaling, auto-healing and multi-tenant serving from general-purpose cloud platforms. TDC's characteristic strengths are as follows:
Low Cost
No duplicated infrastructure construction; clusters can be destroyed as soon as they are no longer needed; computing and storage resources are used efficiently.
Ease of Use
Big data application components are installed and managed as big data services, which automatically establish their logical dependencies on one another. In addition, TDC provides unified, secure maintenance tools on the operation platform to reduce maintenance costs.
Services can be launched or stopped on demand and are charged by the byte, and a big data service can be deployed within 10 minutes. Capacity expands and shrinks on demand, resources are supplied on demand, and batch and real-time tasks can share a cluster.
It provides graphical services for almost every stage of the big data development lifecycle, such as ETL, data warehousing, reporting, search, data mining and databases.
TDC incorporates the innovations we have made over years of big data product development, so the stability and reliability of each cloud product are proven and guaranteed. Moreover, the performance of the big data products has been officially tested against the TPCx-HS, TPC-DS and TPC-H benchmarks with strong results.
A distinctive multi-tenancy model and security controls, with permission management that is fine-grained down to the data unit, enhance data security and resource utilization.
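The automatic dependency handling mentioned under Ease of Use amounts to ordering services so that each one is installed after the services it relies on. The sketch below shows one common way to do this, a topological sort using Python's standard `graphlib` module (Python 3.9 or later). The service names and their dependencies are hypothetical, and nothing here is claimed to be how TDC resolves dependencies internally.

```python
from graphlib import TopologicalSorter

# Hypothetical services mapped to the services they depend on.
dependencies = {
    "storage": set(),
    "sql_engine": {"storage"},
    "scheduler": {"sql_engine"},
    "reporting": {"sql_engine", "scheduler"},
}

# static_order() yields each service only after all of its dependencies.
install_order = list(TopologicalSorter(dependencies).static_order())
print(install_order)
```

If the dependency graph contained a cycle, `static_order()` would raise `graphlib.CycleError`, which is a useful early check before any installation starts.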
Big Data Cloud Products
Data Warehouse
It is used to build one-stop data warehouse services, providing a full set of data warehousing services and tools for data integration, processing and analysis, with the aim of building the enterprise's data core.
ETL, batch processing, data lake, data warehouse
Data Mart
It serves department-level data analysis. Interactive analysis, an OLAP cube engine, reporting tools and scheduling tools are provided to build automated reporting applications.
self-service interactive reporting, operational dashboards
Real-time Computing
It is a cloud stream processing platform that specializes in real-time data collection and processing. It helps enterprises build real-time data warehouses and applications and fully discover the value of streaming data.
real-time data analysis, online anti-fraud, sensor network analysis
Search Engine
It supports high-speed full-text search at petabyte scale and provides functions such as high-concurrency support, hot/cold data isolation, exact and fuzzy retrieval, and fast statistics.
industry search engine, knowledge sharing platform, information retrieval service
Data Analysis Platform
This platform gives data scientists rich data mining capabilities. More than 60 distributed machine learning algorithms and industry models are built in, facilitating the conversion of data into valuable information.
data modeling and mining, user portraits, predictive data analysis
Analysis Data
It is a data processing platform used for deep learning and artificial intelligence development, which is designed to help enterprises develop deep learning and AI applications so as to achieve highly intelligent information processing.
image and video recognition, data mining, graphical modeling and feature engineering
Relational Database
It is used to build relational database services for enterprises and is aimed at OLTP workloads with less than 500 GB of data. It supports complex SQL queries and provides highly stable, scalable and consistent data processing.
online trading system
Multi-tenancy Management
The TDC management platform is organized around projects, tenants and users, dividing and managing permissions and resources sensibly, and serves multiple tenants from a single management platform. The tenant administrator holds the highest management permission within the tenant and is responsible for permission management. Product instances are managed and organized per project, which keeps the granularity of permission management clear and rational.
Because it adopts container technology, TDC provides application isolation, data isolation, resource isolation and operational isolation between tenants. Tenants running on the same platform are completely invisible to each other, as if they were independent groups operating on separate infrastructures.
In addition, for an enterprise private cloud, TDC can provide unified data management for internal tenants: commonly shared data is placed in the public data area, while high-value and sensitive data is stored in the sensitive data area, ensuring unified enterprise metadata management and data quality control. TDC can also maintain unified data lifecycle management and promote data assetization.
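The tenant, project and user hierarchy described in this section can be sketched as a small data model. This is an illustration only: the `Tenant` and `Project` classes, the `grant` and `can_access` methods, and the sample names are hypothetical and do not reflect TDC's real permission system.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Project:
    """A project groups product instances and the users allowed to use them."""
    name: str
    instances: Set[str] = field(default_factory=set)
    members: Set[str] = field(default_factory=set)


@dataclass
class Tenant:
    name: str
    admin: str  # the tenant administrator holds the highest permission
    projects: Dict[str, Project] = field(default_factory=dict)

    def grant(self, actor: str, user: str, project: str) -> None:
        """Only the tenant administrator may change project membership."""
        if actor != self.admin:
            raise PermissionError(f"{actor} is not the administrator of {self.name}")
        self.projects[project].members.add(user)

    def can_access(self, user: str, project: str, instance: str) -> bool:
        """A user reaches an instance only through a project they belong to."""
        proj = self.projects.get(project)
        if proj is None:
            return False
        is_member = user == self.admin or user in proj.members
        return is_member and instance in proj.instances


tenant = Tenant(name="retail", admin="alice")
tenant.projects["reporting"] = Project(name="reporting", instances={"data-mart-1"})
tenant.grant(actor="alice", user="bob", project="reporting")

print(tenant.can_access("bob", "reporting", "data-mart-1"))    # project member
print(tenant.can_access("carol", "reporting", "data-mart-1"))  # not a member
```

Scoping instances to projects means a permission check never has to look beyond one tenant, which mirrors the isolation between tenants described above.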
Accurate Billing Functionality
TDC uses a dependable, robust architecture for billing. It is highly available and scalable and can perform near-real-time calculations.
The TDC platform gives users reasonable, clear billing rules. Pricing items fall into four categories: hardware resources, big data software, data services and third-party applications. A variety of billing units are also supported, so that the billing model is fair and convenient.
In addition, TDC provides powerful fee management operations, including detailed tenant bills, unified operation analysis reports, account write-offs, tenant quota settings, charge items, price settings and customizable discount rules, giving platform managers full control over financial management.
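As a rough illustration of how the four pricing categories and a discount rule might combine into a bill, consider the sketch below. The category names come from the text; everything else (the `UsageLine` structure, the `bill_by_category` function, the prices, quantities and the flat 10% discount) is a hypothetical assumption rather than TDC's billing logic. `Decimal` is used so that currency amounts are not subject to binary floating-point error.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum
from typing import Dict, Iterable


class Category(Enum):
    HARDWARE = "hardware resource"
    SOFTWARE = "big data software"
    DATA_SERVICE = "data service"
    THIRD_PARTY = "third-party application"


@dataclass(frozen=True)
class UsageLine:
    category: Category
    quantity: Decimal    # amount consumed, in the item's billing unit
    unit_price: Decimal  # hypothetical price per billing unit


def bill_by_category(
    lines: Iterable[UsageLine], discount: Decimal = Decimal("0")
) -> Dict[Category, Decimal]:
    """Sum charges per category, then apply a flat fractional discount."""
    totals = {category: Decimal("0") for category in Category}
    for line in lines:
        totals[line.category] += line.quantity * line.unit_price
    cent = Decimal("0.01")
    return {c: (amount * (1 - discount)).quantize(cent) for c, amount in totals.items()}


usage = [
    UsageLine(Category.HARDWARE, Decimal("10"), Decimal("0.50")),  # made-up usage
    UsageLine(Category.SOFTWARE, Decimal("2"), Decimal("3.00")),
    UsageLine(Category.HARDWARE, Decimal("4"), Decimal("0.25")),
]
bill = bill_by_category(usage, discount=Decimal("0.10"))
total = sum(bill.values(), Decimal("0"))
print(bill[Category.HARDWARE], bill[Category.SOFTWARE], total)
```

Keeping per-category subtotals makes it straightforward to produce the detailed tenant bills and operation reports mentioned above.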
Unified Graphical Maintenance Monitor
The log management service Milano has been deployed inside TDC to provide a unified integrated log management system for tenants and platforms, including the following six functions.
TDC's maintenance management system delivers outstanding throughput, security and usability, providing high-quality maintenance services for the cloud platform.
High Throughput
Throughput can reach several terabytes; a single node can collect tens of thousands of logs per second; a 3-node cluster can collect up to 2 billion logs in one day.
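As a quick sanity check, the daily figure can be converted to an average rate. The short calculation below uses only the numbers stated above; reading the per-node figure as a peak rate and the daily figure as a sustained total is our interpretation, not something the text states.

```python
SECONDS_PER_DAY = 24 * 60 * 60

logs_per_day = 2_000_000_000  # stated daily ceiling for a 3-node cluster
nodes = 3

cluster_rate = logs_per_day / SECONDS_PER_DAY  # average logs per second, whole cluster
per_node_rate = cluster_rate / nodes           # average logs per second, per node

print(round(cluster_rate), round(per_node_rate))
```

The result is roughly 23,000 logs per second across the cluster, or about 7,700 per node on average, which sits below the per-node figure of tens of thousands per second and is therefore consistent with it under that reading.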
Full-chain Security
Log data is secured with Kerberos. Logs of different tenants are isolated from each other. Users need to be securely authenticated before accessing logs.
High Availability
Status monitoring is performed within the system to ensure that the data is highly available and safely stored.
TDC-based Data Business Solutions
TDC can provide targeted services for enterprises of different sizes, business types and operation modes in the form of public cloud, private cloud and data service cloud, to meet the diverse needs of big data cloud platforms.