Architecture of Transwarp Data Hub

Transwarp Data Hub consists of Apache Hadoop, five Transwarp-branded system software products, the big data development tool set Transwarp Studio, and several underlying services for security and management.

Transwarp Data Hub (TDH)
One-Stop Enterprise Big Data Platform

Big data technology is rapidly changing the IT industry, and Hadoop has grown into the most popular big data technology since its inception. Transwarp Data Hub (TDH) has been one of the mainstream Hadoop distributions recognized by Gartner since 2016. It contains five major products that help customers capture value from big data: Transwarp Inceptor for big data analytics, Transwarp Stream for real-time computing, Transwarp Discover for extracting value from data through machine learning, Transwarp Hyperbase for unstructured data processing, and Transwarp Search for building enterprise search engines.

As a one-stop big data platform, TDH incorporates many cutting-edge technologies that substantially improve usability, performance, and stability, so that enterprises can build core business systems and create new applications at lower cost and with less effort. Some key advantages include:

Major Technological Advantages Of TDH Product

Extreme Performance and Scalability

Batch processing is 10x to 100x faster than open source Hadoop and 5x to 10x faster than an RDBMS. TDH supports complex analytic workloads and queries on big data from GB to PB scale.

Full SQL and ACID support

Transwarp Inceptor is the first Hadoop distribution to support SQL 2003, Oracle PL/SQL, and DB2 SQL PL (from 2014) and ACID and CRUD functionality (from 2015). TDH also offers JDBC and ODBC connectors, so third-party tools can run on TDH easily.

Low-latency Streaming Processing

Transwarp Stream contains the first event-driven streaming engine based on Spark, and latency can be as low as 5 ms. HA and exactly-once functionality are also in production.

Strong Data Mining Capability

Transwarp Discover offers machine learning capabilities to data analysts and data scientists through R and Python.

Varieties of data processing capability

Transwarp Hyperbase can be used to store and compute on both structured and unstructured data, including logs, JSON/XML, and binary data such as images and videos.

Full text searching on big data

Transwarp Search provides full-text search on big data through SQL, with innovations that make search on big data stable and secure enough to build an internal search engine.

Easy to operate and manage

Transwarp Manager is the component that deploys clusters while keeping operating costs low. One-click installation has been supported since 2014, and the alert and health-checking features reduce the overhead of managing clusters.

Unified Security & Multi-tenant Management

Transwarp Guardian is the centralized service for security control and resource management, and it provides fine-grained tenant management by leveraging Docker.

Transwarp Brand Products

Product brand and Major features and usage scenarios

Transwarp Inceptor

Inceptor is a SQL-on-Hadoop product for batch processing and analysis. It offers full SQL 2003 compatibility as well as Oracle PL/SQL and DB2 SQL PL support, which are unique features in the Hadoop industry. ACID support is another major advantage, ensuring consistent data processing. Inceptor provides extreme performance for big data analytics (more than 10x faster than Apache Hadoop and more than 5x faster than an RDBMS at many data sizes), and performs much better on both the TPC-DS and TPC-H benchmarks than other Hadoop and MPP products. Inceptor is widely used to build data warehouses and data marts. More than 500 customers in China run applications on it.

Transwarp Stream

Stream targets real-time computing and is widely used in the transportation and IoT industries. It has several major technical advantages over other solutions: full SQL support, which makes development easier; a unified event-driven and batch-processing streaming engine based on Spark that reduces latency to 5 ms, which is 100x faster than the Apache Spark Streaming engine; and support for HA and exactly-once semantics, which are essential for stream computing.

Transwarp Discover

Discover is a data mining product that offers a distributed machine learning platform and algorithms. It provides industry modules such as Financial Anti-Fraud and Text Mining. In addition, R, Python, and SQL interfaces are available for data scientists to develop their own data mining algorithms.

Transwarp Hyperbase

Hyperbase is based on Apache HBase and adds several innovative technologies: it uses the same SQL engine as Inceptor, so developers can use SQL to build complex applications; global and secondary indexes make non-row-key queries fast; native JSON/BSON data format support and Object Store technology reduce the complexity of processing unstructured data; and distributed transactions and full CRUD functionality are also offered.
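To illustrate why a secondary index speeds up non-row-key queries, here is a minimal Python sketch. It is not Hyperbase's implementation; the `ToyTable` class, its methods, and the sample data are hypothetical. Rows live in a dictionary keyed by row key, and an index maps each value of one column to the row keys that hold it, so a lookup by that column does not have to scan every row.

```python
from collections import defaultdict


class ToyTable:
    """Toy key-value table with one secondary index (illustrative only)."""

    def __init__(self, indexed_column):
        self.rows = {}                      # row key -> row dict
        self.indexed_column = indexed_column
        self.index = defaultdict(set)       # column value -> row keys

    def put(self, row_key, row):
        # Remove the stale index entry if the row is being overwritten.
        old = self.rows.get(row_key)
        if old is not None:
            self.index[old[self.indexed_column]].discard(row_key)
        self.rows[row_key] = row
        self.index[row[self.indexed_column]].add(row_key)

    def scan_by_value(self, value):
        # Without an index: check every row.
        return sorted(k for k, r in self.rows.items()
                      if r[self.indexed_column] == value)

    def lookup_by_value(self, value):
        # With the index: jump straight to the matching row keys.
        return sorted(self.index.get(value, ()))


table = ToyTable(indexed_column="city")
table.put("u1", {"name": "Ann", "city": "Shanghai"})
table.put("u2", {"name": "Bo", "city": "Beijing"})
table.put("u3", {"name": "Cy", "city": "Shanghai"})
table.put("u2", {"name": "Bo", "city": "Shanghai"})  # update moves u2 in the index
```

Both `scan_by_value` and `lookup_by_value` return the same row keys; the difference is that the indexed lookup touches only the matching entries, at the cost of maintaining the index on every write.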

Transwarp Search

Search helps users build an internal search engine for big data. It offers low-latency search over PB-scale data; it provides SQL support and full-text search syntax for programming; off-heap memory management improves system robustness; and hybrid storage is supported, so hot data can be placed on SSD to improve performance.

Platform components and Manageability product

Product And Major features

Apache Hadoop

Based on Apache Hadoop 2.7.2, it is equipped with HDFS as the file system and YARN as the resource management platform. Transwarp has also made performance optimizations and security and stability improvements to these components so that they can provide 24x7 service.

Transwarp Operating System

TOS is built on Docker and Kubernetes and designed as a cloud operating system dedicated to big data applications. To improve resource usage, TOS supports one-click application deployment and auto-scaling, and allows big data services to share resources across the cluster. By applying a preemptive scheduling model that makes full use of idle resources, TOS enables batch processing and real-time processing to run at the same time without interfering with each other.
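The idea of preemptive scheduling can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the TOS scheduler: the `ToyScheduler` class, the numeric priorities, and the job names are all hypothetical. Batch jobs fill idle slots; when a higher-priority real-time job arrives and the cluster is full, the lowest-priority running job is preempted and resumes once a slot frees up.

```python
class ToyScheduler:
    """Toy preemptive scheduler over a fixed number of slots (illustrative only)."""

    def __init__(self, slots):
        self.slots = slots
        self.running = []   # list of (priority, name)
        self.waiting = []   # tasks that were preempted or could not start

    def submit(self, name, priority):
        if len(self.running) < self.slots:
            self.running.append((priority, name))
            return
        # Cluster is full: find the lowest-priority running task.
        victim = min(self.running)
        if victim[0] < priority:
            # Preempt it so the more urgent task can start immediately.
            self.running.remove(victim)
            self.waiting.append(victim)
            self.running.append((priority, name))
        else:
            self.waiting.append((priority, name))

    def finish(self, name):
        self.running = [t for t in self.running if t[1] != name]
        # A freed slot goes to the highest-priority waiting task.
        if self.waiting and len(self.running) < self.slots:
            nxt = max(self.waiting)
            self.waiting.remove(nxt)
            self.running.append(nxt)


sched = ToyScheduler(slots=2)
sched.submit("batch-etl-1", priority=1)     # batch jobs fill idle slots
sched.submit("batch-etl-2", priority=1)
sched.submit("stream-fraud", priority=10)   # real-time job preempts a batch job
running_after_preempt = sorted(n for _, n in sched.running)
sched.finish("stream-fraud")                # preempted batch job resumes
running_after_finish = sorted(n for _, n in sched.running)
```

The design trade-off is that batch work may be paused, but idle capacity is never left unused just to keep room for latency-sensitive jobs.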

Transwarp Manager

Manager is a graphical management tool for deploying and operating TDH clusters. Its deployment module can deploy a TDH cluster on x86 servers or a Docker-based cloud in a few manual steps. The operating module contains Alert, Health Checker, Controller, and Metrics services; the web pages show the status of all services clearly, so users can take proper action as soon as an alert is raised. Many operating plug-ins are also offered, such as disk management, software upgrades, and service migration.

Transwarp Guardian

Guardian is a centralized security and resource management service in TDH. Its security module supports both LDAP and Kerberos to protect Hadoop clusters against attacks and security threats. The multi-tenant resource management module controls all user and resource data, with a graphical tool for setting user privileges and resource quotas. All TDH services rely on Guardian for SLA control.

Apache Kafka

Transwarp Kafka is based on Apache Kafka 1.0, with many security-related features added: Kerberos is employed to protect data; producers and consumers can use different KDCs for cross authentication; and authentication command-line tools are added to simplify operation.

Developer tools

Component and Major features

Transwarp Transporter

Transporter is a visual tool for designing and building ETL jobs for TDH. It offers near-real-time data synchronization from an RDBMS to TDH, so users can migrate data analysis work from the RDBMS to Hadoop for analytics and mining. Various data sources are supported, including CSV, JDBC, XML, JSON, and RDBMS, and Transporter is equipped with frequently used transformers such as joins, aggregation, and data cleansing. These features help users build ETL jobs on the web easily. All data processing jobs generated by Transporter run in Inceptor with full ACID support, so there is no need to set up a standalone ETL cluster or to worry about data consistency.

Transwarp Rubik

Rubik is a visual tool for designing OLAP cubes, which are materialized in HDFS or Holodesk. Both snowflake and star schemas are allowed when designing models, and various data sources, including HDFS and remote RDBMSs, can be used for cube building. With cube acceleration, analytical queries can run 10x faster, giving data analysts a better interactive experience.
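Why a materialized cube accelerates analytical queries can be shown with a small Python sketch. It is only an illustration of pre-aggregation, not Rubik's storage format; `build_cube`, `query_cube`, and the sample sales rows are hypothetical. The cube stores the summed measure for every combination of dimensions ahead of time, so a GROUP BY query becomes a dictionary lookup instead of a scan over the raw rows.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Pre-aggregate SUM(measure) for every subset of the dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            cube[dims] = dict(totals)
    return cube


def query_cube(cube, group_by):
    """Answer a GROUP BY query from the precomputed cuboid, without rescanning rows.

    `group_by` must list dimensions in the same order used to build the cube.
    """
    return cube[tuple(group_by)]


sales = [
    {"region": "east", "product": "A", "amount": 10},
    {"region": "east", "product": "B", "amount": 5},
    {"region": "west", "product": "A", "amount": 7},
]
cube = build_cube(sales, dimensions=("region", "product"), measure="amount")
by_region = query_cube(cube, ["region"])
grand_total = query_cube(cube, [])
```

The cost is paid at build time and in storage: the number of cuboids grows exponentially with the number of dimensions, which is why cubes suit fixed-pattern reports best.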

Transwarp Governor

Governor is a metadata management and data governance tool in TDH. It helps users manage metadata, including tables and SQL procedures, tracks the change history of all data and programs, analyzes data lineage, and performs impact analysis. Developers can use it to debug data problems and find root causes, and to analyze what will be impacted before any metadata change occurs so that stakeholders can be notified early. Governor aims to help users improve data quality for big data.
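Impact analysis over data lineage is essentially a graph traversal. The following Python sketch assumes lineage is available as a mapping from each object to the objects derived from it; the `impacted_by` function and the table names are hypothetical and do not reflect Governor's actual data model.

```python
from collections import deque


def impacted_by(lineage, changed):
    """Return every downstream object reachable from `changed` (breadth-first)."""
    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical lineage: each key feeds the objects listed as its value.
lineage = {
    "raw_orders": ["clean_orders"],
    "clean_orders": ["daily_sales", "customer_stats"],
    "daily_sales": ["sales_report"],
}
impact = impacted_by(lineage, "raw_orders")
```

Running the same traversal on the reversed graph would answer the root-cause question instead: which upstream objects could have produced a bad value.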

Transwarp Waterdrop

Waterdrop is a developer IDE for TDH. It contains SQL editor, metadata explorer, SQL execution, and data import/export sub-modules. Grammar checking, SQL formatting, and a development assistant help improve development efficiency.

Target Markets

Why use TDH?

SQL 2003, PL/SQL & SQL PL, and SQL extension support

TDH is the first SQL-on-Hadoop engine in the industry to provide full SQL 2003 and Oracle PL/SQL support, which it introduced in 2014, and it began to support DB2 SQL PL in 2015. With this functionality, traditional applications built on Oracle or DB2 can be conveniently migrated to TDH to leverage the horsepower of big data. To accommodate differences between databases, TDH lets users set the database dialect; the Oracle, DB2, and Teradata dialects are well supported so far.

To make stream applications easier to develop, TDH provides a Stream SQL standard that comprises SQL 99 plus stream extensions. With TDH, developers can use SQL instead of various APIs to write programs, do not need to do packaging or deployment work, and need not worry about future API changes. For full-text search, TDH delivers a SQL extension for Search that is compatible with the Oracle specification. TDH provides full JDBC and ODBC compatibility and is therefore compatible with software and tools that run on an RDBMS.

Unified event-driven and batch processing engine for streaming computation

Transwarp Stream is the first Spark-based event-driven streaming engine. It contains both a micro-batching engine and an event-driven engine, so it can fit different usage scenarios. In micro-batching mode, data is processed in micro-batches and the SQL program is executed once per batch.

Transwarp Stream provides very high throughput, which makes it well suited to workloads such as video detection in the transportation industry. With the event-driven engine, every data event triggers computation and the processing latency can be as low as 5 milliseconds. As a result, users can build latency-sensitive applications such as online anti-fraud.
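The difference between the two modes can be sketched in plain Python. This is a conceptual illustration only, not the Transwarp Stream API; the function names, the batch size, and the fraud threshold of 100 are hypothetical. In micro-batch mode the handler runs once per buffered batch, whereas in event-driven mode it runs as soon as each event arrives.

```python
def micro_batch(events, batch_size, handler):
    """Buffer events and call the handler once per full (or final partial) batch."""
    outputs = []
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            outputs.append(handler(batch))
            batch = []
    if batch:
        outputs.append(handler(batch))
    return outputs


def event_driven(events, handler):
    """Call the handler as soon as each event arrives."""
    return [handler([event]) for event in events]


amounts = [5, 20, 7, 300, 12]
batch_sums = micro_batch(amounts, batch_size=2, handler=sum)
# Flag a suspicious amount immediately instead of waiting for a batch to fill.
alerts = event_driven(amounts, handler=lambda batch: batch[0] > 100)
```

Micro-batching amortizes per-call overhead across many events, which favors throughput; event-driven processing pays that overhead per event in exchange for lower latency.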

High availability and exactly-once semantics are supported in both modes, so developers do not need to write logic to handle data loss or duplication. Machine learning applications can also run on the event-driven streaming engine with low latency.
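To show the kind of logic that exactly-once support spares developers from writing, here is a minimal Python sketch of one common technique: skipping redelivered events by tracking the highest offset already applied. The source does not say how Transwarp Stream achieves exactly-once, so this is an assumption for illustration; the `ExactlyOnceCounter` class is hypothetical and assumes events arrive in offset order within a partition.

```python
class ExactlyOnceCounter:
    """Apply each event once even if it is redelivered (illustrative only)."""

    def __init__(self):
        self.total = 0
        self.last_offset = -1   # highest offset already applied

    def apply(self, offset, value):
        # A redelivered event carries an offset we have already applied: skip it.
        if offset <= self.last_offset:
            return False
        # In a real system the state and the offset must be committed atomically.
        self.total += value
        self.last_offset = offset
        return True


counter = ExactlyOnceCounter()
deliveries = [(0, 10), (1, 20), (1, 20), (2, 5), (0, 10)]  # offsets 1 and 0 are replayed
applied = [counter.apply(offset, value) for offset, value in deliveries]
```

Without the offset check, a replay after a failure would count the same event twice; with it, the final total reflects each event exactly once.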

Superior Performance

Transwarp Inceptor has superior performance and scalability for big data analysis. Its well-optimized distributed execution engine scales with cluster size, and its data shuffling and broadcasting logic is tuned for better performance. Inceptor Holodesk is a columnar store that can be accelerated by SSD or memory, so data reads are extremely fast because I/O bottlenecks are avoided. A cost-based optimizer and a rule-based optimizer generate the best execution plan for the engine. All these features make batch processing efficient and scalable, which enables Inceptor to pass the TPC-DS benchmark at scales from small data sizes up to 100 TB with very good scaling results.

Inceptor also fits interactive data analysis and OLAP scenarios well. Index support for Holodesk helps considerably in interactive analysis. For fixed-pattern reporting workloads, the OLAP Cube technique can improve analysis performance by 10x to 100x. On the TPC-H benchmark at the 1 TB scale, Inceptor with OLAP Cube runs up to 100 times faster than SparkSQL and Greenplum.

User friendly for Development

The technical barrier for big data is quite high and spans cluster deployment, operation, and application development. Developers need to learn the underlying architecture well before using it, which is a real scalability problem for software engineering. Compared with other Hadoop vendors, TDH is much more user-friendly for development and management.

TDH supports SQL, which is developers' first choice for OLAP, batch processing, streaming, search, and other kinds of applications. Consequently, developers do not need to learn the details of the target engine before coding, master its APIs, or maintain applications when APIs change. Transaction support frees developers from writing complex logic for handling transactional exceptions in their applications, which greatly improves their efficiency. In general, TDH delivers high developer efficiency while keeping the technical barrier to entry low, which is a key advantage for building successful big data applications.

ACID/Transaction support

ACID is important for data processing and cleansing in databases, and the same is true for big data. Without ACID, end users need to reason about what kinds of failures may occur during an operation and work out solutions to them, which makes user applications complex or even infeasible. Worse still, data can be corrupted when two applications write to the same block without ACID guarantees.

TDH is the first commercial Hadoop product that offers good ACID support. Transwarp Inceptor offers serializable isolation, and consistency is ensured by two-phase locking (2PL) and the MVCC protocol. In terms of performance, we optimized for analytical workloads to minimize transactional overhead.
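The MVCC half of this design can be illustrated with a toy Python store. It is a sketch of the general technique, not Inceptor's implementation, and it omits locking entirely; the `MVCCStore` class and its logical clock are hypothetical. Each write appends a new version rather than overwriting, and a reader sees only the versions committed before its snapshot, so a long analytical read is not disturbed by concurrent writers.

```python
class MVCCStore:
    """Toy multi-version store: readers see the snapshot taken when they began."""

    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value), oldest first
        self.clock = 0       # logical commit timestamp

    def begin(self):
        # A reader's snapshot is the latest commit timestamp at start time.
        return self.clock

    def write(self, key, value):
        # Each committed write appends a new version instead of overwriting.
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        visible = [v for ts, v in self.versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None


store = MVCCStore()
store.write("balance", 100)
snapshot = store.begin()          # a long analytical query starts here
store.write("balance", 40)        # a concurrent writer commits afterwards
old_view = store.read("balance", snapshot)
new_view = store.read("balance", store.begin())
```

Because readers never block writers in this scheme, it fits the analytical workloads the text describes, where reads are long and writes are comparatively rare.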

Rich Data Mining Framework

Transwarp Discover offers R interfaces for data mining and implements more than 60 distributed algorithms for end users. Discover also delivers several industry models, including anti-fraud and text mining for the financial industry; these features accelerate the adoption of machine learning by end users.

Unified Security & Multi-tenant Management Capability

Hadoop security is a big challenge in real-world production. There have been quite a few attacks against Hadoop since the beginning of 2017, and some HDFS intrusion incidents were also reported in the US earlier this year. Better authentication and authorization services are required to make Hadoop safe, so we offer a unified security and resource management service named Guardian to protect TDH. All services can use Kerberos to encrypt data and LDAP for identity authentication, and tenant resources are managed in Guardian, which also offers fine-grained access control for HDFS and all database objects.

SQL on Search Engine

Transwarp Search can be used to build an enterprise search engine. It offers SQL for building applications on top of it, and a full-text SQL extension is supported as well. The SQL engine has many optimizations for data searching, such as pushing aggregations down to the search layer for better performance. Compared with API programming, SQL programming not only performs better but also offers better compatibility, since API changes never block developers.
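Aggregation pushdown can be sketched as follows. This Python example is a conceptual illustration, not Transwarp Search code; the function names and the two sample shards are hypothetical. Each shard computes a small partial count locally, and the SQL layer only merges those partial results instead of pulling every matching document across the network.

```python
def shard_count_by(docs, field):
    """Runs on each shard: return a small partial aggregate, not the raw documents."""
    counts = {}
    for doc in docs:
        counts[doc[field]] = counts.get(doc[field], 0) + 1
    return counts


def merge_counts(partials):
    """Runs on the SQL engine: combine the per-shard partial aggregates."""
    merged = {}
    for partial in partials:
        for key, n in partial.items():
            merged[key] = merged.get(key, 0) + n
    return merged


# Two hypothetical shards of a log index.
shard_a = [{"level": "ERROR"}, {"level": "INFO"}, {"level": "ERROR"}]
shard_b = [{"level": "INFO"}, {"level": "WARN"}]
result = merge_counts([shard_count_by(s, "level") for s in (shard_a, shard_b)])
```

The data moved between layers is proportional to the number of distinct groups rather than the number of documents, which is where the performance benefit comes from.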

Easy for Management and Cloud

Transwarp Manager is a one-stop solution for cluster deployment, operation, and alerting, with which even some hardware disk management can be done in a few clicks. It also has a metrics collection and display framework that is especially useful for cluster performance management. All components of TDH are well optimized for Docker, and the computing engines can also use Kubernetes for resource management, so TDH can be deployed on a public or private cloud at very low cost. We also leverage Docker and Kubernetes for better multi-tenant management through better resource isolation and QoS support in resource scheduling.

About Price