Transwarp Data Hub (TDH)

Apache Hadoop is an open-source software framework for distributed storage and distributed processing of Big Data, and it has become a pillar of enterprise Big Data management. However, open-source Hadoop faces several challenges. First, although Apache Hadoop performs far better than traditional data processing technology on data sets over 100 terabytes, it is not efficient for data sets between gigabytes and terabytes in size. Second, information hidden in data can be transformed into real business value only when the data is thoroughly analyzed, so Apache Hadoop needs to be compatible with decision analysis tools. Finally, the market needs comprehensive enterprise solutions to accelerate widespread deployment of Big Data applications.

Learning from the challenges faced by Hadoop and keeping the application needs of enterprises in mind, Transwarp has developed a series of Hadoop-based technologies that together form the Transwarp Data Hub platform for enterprise applications available today. These efforts adapt Hadoop to the needs of all kinds of enterprises.

Superior Execution Speed

Transwarp Inceptor uses its own high-efficiency columnar storage format together with the Apache Spark computation engine, which is optimized for in-memory computation and avoids the frequent hard drive reads and writes of the MapReduce framework. Moreover, Spark uses a lightweight scheduling framework and a multithreaded computation model. As a result, compared to MapReduce, Spark has very little scheduling and start-up overhead, higher execution speed and a shorter mean time to recovery (MTTR). For real-time online applications, Transwarp Hyperbase provides a global index, an auxiliary index and a full-text index. It extends SQL syntax and can satisfy the low-latency needs of online storage and OLAP. With optimizations made to the execution engine and the data storage layer, Transwarp Data Hub outperforms open-source Apache Hadoop 2.5. TDH has much stronger performance and much more comprehensive SQL support than Cloudera Impala, and it is 1.5 to 10 times faster than mainstream MPP databases.
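To make the benefit of columnar storage concrete, the following minimal Python sketch contrasts a row-oriented layout with a column-oriented one for a single-column aggregate. It does not show Inceptor's actual storage format, which is not described here; the table, its column names and the helper functions are all hypothetical and serve only to illustrate why an analytic query that reads one column touches less data in a columnar layout.

```python
from array import array

# Hypothetical sample table: (order_id, region, amount).
rows = [
    (1, "north", 120.0),
    (2, "south", 75.5),
    (3, "north", 30.0),
    (4, "east", 210.25),
]


def to_columns(records):
    """Pivot row-oriented records into one contiguous sequence per column."""
    order_ids, regions, amounts = zip(*records)
    return {
        "order_id": array("q", order_ids),  # packed 64-bit integers
        "region": list(regions),
        "amount": array("d", amounts),      # packed 64-bit floats
    }


def sum_amount_row_store(records):
    """Row layout: every full record is visited to read one field."""
    return sum(record[2] for record in records)


def sum_amount_column_store(columns):
    """Column layout: only the 'amount' column is scanned."""
    return sum(columns["amount"])


columns = to_columns(rows)
row_total = sum_amount_row_store(rows)
column_total = sum_amount_column_store(columns)
```

Both functions return the same total; the difference is that the columnar version never reads `order_id` or `region`, which is the property that matters when a table has many columns and resides in memory or on disk.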

Comprehensive SQL Support

Currently, Transwarp Data Hub supports SQL:2003 and is in the course of implementing more complex PL/SQL syntax, including functionality such as stored procedures, functions and cursors. TDH also supports and extends the complete HiveQL syntax and has optimized large portions of the execution plans. Most data warehouse and data mart applications use complex SQL:2003 syntax. Without support for the SQL syntax they need, it is impossible to migrate applications from traditional databases to Hadoop. Therefore, comprehensive SQL support is even more important than the performance of particular functions. TDH not only manages much larger volumes of data, but also makes it easy for users to migrate data analysis applications from traditional databases.
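As an illustration of the kind of SQL:2003 construct that warehouse applications depend on, the sketch below runs a window function (a feature introduced in SQL:2003) that computes a running total per region. It uses Python's built-in sqlite3 module purely as a stand-in engine and assumes an SQLite library of version 3.25 or later; it is not TDH's API, and the `sales` table and `running_totals` helper are hypothetical.

```python
import sqlite3


def running_totals(sales):
    """Return (region, day, amount, running_total) rows using a window function."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, day INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", sales)
    query = """
        SELECT region, day, amount,
               SUM(amount) OVER (PARTITION BY region ORDER BY day) AS running_total
        FROM sales
        ORDER BY region, day
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Hypothetical sample data.
sample_sales = [
    ("north", 1, 10.0),
    ("north", 2, 5.0),
    ("south", 1, 7.0),
]
totals = running_totals(sample_sales)
```

An application written against queries like this one can only be migrated to a platform that accepts the same syntax, which is the point the paragraph above makes about SQL coverage.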

Superior Data Analysis Ability

It is increasingly important to put Big Data in the hands of data analysts by allowing them to interactively explore data, gain insight and discover trends. TDH uses distributed columnar in-memory storage and an optimized high-speed execution engine to support interactive SQL queries, making real-time interactive analysis possible. TDH also supports the R statistics engine. In the new version, in addition to accessing data in HDFS and Hyperbase via R, TDH supports accessing data stored in Inceptor's distributed memory. Inceptor also has built-in parallel implementations of common machine learning algorithms, which can be used together with the thousands of algorithms available in the R language. The new version of TDH supports accessing data in TDH via both the R command line tool and the RStudio GUI, making TDH the best tool for Big Data mining and visualization applications. TDH has highly optimized graph algorithms of its own, which can analyze graph data such as association and relationship networks. In addition, TDH integrates Mahout, a machine learning library that includes algorithms for clustering, classification, frequent pattern mining and recommendation systems.

Integration Into the Data Analysis Ecosystem

Transwarp Data Hub places a high value on integration with the data analysis ecosystem to improve its usability. TDH combines seamlessly with current mature systems that specialize in data retrieval, data analysis and data visualization. Data stored in traditional relational databases can be used directly as data sources for analysis in TDH clusters; TDH currently supports Oracle, DB2 and MySQL databases. In addition, integration of the data analysis layer with the R language makes available thousands of statistical algorithms in R, as well as R's graphing tools for creating professional statistical reports. Data visualization not only presents the final results of analysis to users, it also helps data analysts explore data to discover and solve new problems. TDH supports many visualization and reporting tools, including Tableau, SAP BusinessObjects and Oracle OBIEE, which makes business decisions based on Big Data analysis easier to understand and accept and thereby helps realize the potential value of Big Data. Although some other tools also support Apache Hadoop, only TDH with its superior performance can make interactive exploration of Big Data a reality.

Complete Enterprise Solutions

By providing full support for data storage, distributed computation, data analysis and data mining, Transwarp Data Hub has solved many problems that enterprises encounter when analyzing data sets ranging from gigabytes to petabytes. As an enterprise solution, TDH counts manageability among its significant advantages. A user-friendly management interface supports system installation, system and cluster configuration, monitoring, alerting and many other tasks. TDH can recover quickly from failures. HDFS 2.5, the underlying storage system, ensures the persistence and redundancy of data and can automatically detect and correct errors in data. All services built on HDFS are optimized for the HA features of HDFS 2.5 to ensure high availability of the entire Big Data system. With respect to security, TDH integrates Kerberos/LDAP to support fine-grained control over data access, application security, data encryption and decryption, and more.
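The idea behind automatic error detection and correction in a replicated store can be sketched as follows. This is a simplified illustration, not HDFS code: it assumes each block is stored as several replicas alongside a checksum recorded at write time, and the function names (`checksum`, `read_with_repair`) are hypothetical. A read verifies each replica against the recorded checksum and overwrites any corrupt replica with a healthy copy.

```python
import zlib


def checksum(block: bytes) -> int:
    """CRC32 checksum of a block, recorded when the block is written."""
    return zlib.crc32(block)


def read_with_repair(replicas, expected_crc):
    """Return a verified copy of the block and repair corrupt replicas in place.

    replicas: mutable list of bytes objects, one per stored replica.
    expected_crc: checksum recorded when the block was originally written.
    """
    good = next((r for r in replicas if checksum(r) == expected_crc), None)
    if good is None:
        raise IOError("all replicas failed checksum verification")
    for index, replica in enumerate(replicas):
        if checksum(replica) != expected_crc:
            replicas[index] = good  # replace the corrupt copy with a healthy one
    return good


# Hypothetical example: one of three replicas has been corrupted.
original = b"transaction log block"
recorded_crc = checksum(original)
stored = [b"transaction l0g block", original, original]
data = read_with_repair(stored, recorded_crc)
```

After the call, `data` holds the verified block and all three entries in `stored` match the original, which mirrors the combination of redundancy and integrity checking described above.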