One-Stop Data Storage Platform
With its in-memory computing scheme, efficient indexing, optimized execution strategy and high fault tolerance, TDH handles data sizes ranging from gigabytes to petabytes, and at each order of magnitude it delivers better performance than current technologies. Businesses no longer need hybrid architectures or multiple independent clusters. TDH grows with its clients' data through dynamic expansion while in operation, avoiding the migration problems associated with MPP and traditional architectures.
One-Stop Resource Management Platform
TDH builds a resource management layer on top of a central storage system to give clients centralized management of computing resources, dynamic resource allocation, interdepartmental resource distribution and dynamic sharing.
One-Stop Data Analysis Platform
TDH supports batch processing statistical analysis, interactive SQL analysis, online data search, data mining with R, machine learning, real-time streaming processing, full text search and graph computing. Its versatile computation abilities make it possible to complete various complex tasks without switching platforms or architectures.
One-Stop Management Platform
As an enterprise solution with strong manageability, TDH provides a user-friendly management interface as well as support for system installation, cluster configuration, secure access control, monitoring and alerting.
Transwarp's Distribution for Apache Hadoop
Transwarp's Distribution for Apache Hadoop for Enterprise has a five-layer architecture. Its components can be flexibly combined and efficiently coordinated to provide customized support for different applications.
Data Storage Layer: Based on HDFS 2.5, supports Erasure Code.
Resource Management Layer: Based on YARN, supports multiple computation frameworks running simultaneously.
Computing Engine Layer: Uses Map/Reduce2 for offline computation tasks.
Data Analysis and Mining Layer: Supports SQL processing in batches, the R language and Mahout.
Data Integration Layer: Uses Sqoop and Flume for data migration and collection.
Transwarp Inceptor in-memory analysis engine provides high-speed interactive SQL queries.
Higher performance: 10 to 100 times faster than Apache Hadoop, 2 to 10 times faster than MPP.
More comprehensive SQL support: Compatible with Oracle PL/SQL and HiveQL.
BI and reporting tools: Supports Tableau, SAP BO, Oracle OBIEE.
Superior scalability: Linearly scalable. Supports fast processing of data ranging from gigabytes to petabytes.
Highly stable: Repeatedly tested for stability. Runs uninterrupted 24/7.
Transwarp Hyperbase real-time data processing engine, based on Apache HBase, is optimal for highly concurrent online business systems for enterprises.
Supports various types of data: Supports structured, semi-structured and unstructured data.
High speed processing: Latency is on the order of milliseconds. Capable of millions of concurrent tasks.
OLAP and batch processing: Supports high-speed OLAP processing and SQL offline batch processing.
Transwarp Stream real-time streaming processing engine, based on Apache Spark, possesses powerful stream processing abilities.
More expressive: Supports DAG computing model.
Versatile output methods: HBase, warning page, real-time display page.
Diverse application scenarios: sensor network processing, service monitoring, anti-cheating.
Transwarp Discover machine learning engine provides data mining via R.
Apache Hadoop is an open-source software framework for distributed storage and distributed processing of Big Data. It has become a pillar of enterprise Big Data management. However, open-source Hadoop faces several challenges. First, although Apache Hadoop performs far better than traditional data processing technology on data sets over 100 terabytes, it is not efficient for data sets between gigabytes and terabytes in size. Second, information hidden in data can be transformed into real business value only when the data is thoroughly analyzed. To this end, Apache Hadoop needs to be compatible with decision analysis tools. Finally, the market needs comprehensive enterprise solutions to accelerate widespread deployment of Big Data applications.
Learning from the challenges faced by Hadoop and keeping enterprises' application needs in mind, Transwarp has developed a series of Hadoop-based technologies that together form today's Transwarp Data Hub platform for enterprise applications. This work adapts Hadoop to the needs of all kinds of enterprises.
Superior Execution Speed
Transwarp Inceptor uses its own high-efficiency columnar storage format and an Apache Spark computation engine optimized for in-memory computation, avoiding the frequent hard drive reads and writes of the Map/Reduce framework. Moreover, Spark uses a lightweight dispatch framework and a multithreaded computation model. As a result, compared to Map/Reduce, Spark has very little dispatch and start-up overhead, higher execution speed and a shorter mean time to restoration (MTTR). For real-time online applications, Transwarp Hyperbase provides a global index, an auxiliary index and a full-text index. It extends SQL syntax and satisfies the low-latency needs of online storage and OLAP. With optimizations made to the execution engine and data storage layer, Transwarp Data Hub outperforms open-source Apache Hadoop 2.5. TDH has much stronger performance and much more comprehensive SQL support than Cloudera Impala. TDH is 1.5 to 10 times faster than mainstream MPP databases.
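The benefit of a columnar layout for analytical queries can be illustrated with a toy Python sketch. This is not Inceptor's actual storage format; the table contents and all function names below are hypothetical and serve only to show why an aggregate over one field touches less data when values are stored column by column.

```python
# Illustrative only: a toy comparison of row and columnar layouts.
# Not Transwarp Inceptor's actual format; all names and data are hypothetical.

rows = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 80.0},
    {"order_id": 3, "region": "east", "amount": 50.0},
]


def to_columnar(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_amount_row_layout(records):
    """Row layout: every full record is visited to read one field."""
    return sum(record["amount"] for record in records)


def sum_amount_columnar(columns):
    """Columnar layout: only the 'amount' column is scanned."""
    return sum(columns["amount"])


columns = to_columnar(rows)
total_from_rows = sum_amount_row_layout(rows)
total_from_columns = sum_amount_columnar(columns)
```

Both functions return the same total, but the columnar version reads a single contiguous list, which is the property that analytical engines exploit when a query references only a few columns of a wide table.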
Comprehensive SQL Support
Currently, Transwarp Data Hub supports SQL:2003 and is in the process of implementing more complex PL/SQL syntax, including features such as stored procedures, functions and cursors. TDH also supports and extends the complete HiveQL syntax and has optimized large portions of the execution plans. Most data warehouse and data mart applications use complex SQL:2003 syntax. Without support for the SQL syntax they need, it is impossible to migrate applications from traditional databases to Hadoop. Therefore, more comprehensive SQL support is even more important than the performance of particular functions. TDH not only manages much larger volumes of data, but also makes it easy for users to migrate data analysis applications from traditional databases.
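As an example of the kind of SQL:2003 syntax that data warehouse applications rely on, the sketch below runs a window function that computes a running total per region. It uses Python's built-in sqlite3 module purely as a stand-in engine (it assumes a bundled SQLite of version 3.25 or later) and is not TDH itself; the `sales` table and its data are hypothetical.

```python
import sqlite3

# Stand-in engine for illustration; the table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", 1, 100.0), ("east", 2, 150.0), ("west", 1, 80.0), ("west", 2, 60.0)],
)

# Window function: running total of amount within each region, ordered by month.
query = """
SELECT region, month, amount,
       SUM(amount) OVER (PARTITION BY region ORDER BY month) AS running_total
FROM sales
ORDER BY region, month
"""
result = conn.execute(query).fetchall()
conn.close()
```

An engine without window function support would need a self-join or application-side code for the same report, which is why gaps in SQL coverage tend to block migration of existing applications.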
Superior Data Analysis Ability
It is increasingly important to put Big Data in the hands of data analysts by allowing them to interactively explore data, gain insight and discover trends. TDH uses distributed columnar in-memory storage and an optimized high-speed execution engine to support interactive SQL queries, making real-time interactive analysis possible. TDH also supports the R statistics engine. In the new version, apart from accessing data in HDFS and Hyperbase via R, TDH also supports accessing data stored in Inceptor's distributed memory. Inceptor also has built-in parallel implementations of common machine learning algorithms, which can be used together with the thousands of algorithms available in the R language. The new version of TDH also supports accessing data in TDH via both the R command line tool and the RStudio GUI, making TDH the best tool for Big Data mining and visualization applications. TDH has its own highly optimized graph algorithms, which can analyze graph data such as association and relationship networks. In addition, TDH integrates Mahout, a machine learning library whose algorithms include clustering, classification, frequent pattern mining and recommendation systems.
Integration Into the Data Analysis Ecosystem
Transwarp Data Hub places a high value on integration with the data analysis ecosystem to improve its usability. TDH combines seamlessly with current mature systems that specialize in data retrieval, data analysis and data visualization. Data stored in traditional relational databases can be used directly as data sources in TDH clusters for analysis. TDH currently supports Oracle, DB2 and MySQL databases. In addition, integration of the data analysis layer with the R language makes use of thousands of statistical algorithms in R as well as its graphing tools for creating professional statistical reports. Data visualization not only presents the final results of analysis to users, it also helps data analysts explore data to discover and solve new problems. TDH supports many visualization and report generation tools, including Tableau, SAP Business Objects and Oracle OBIEE, making it easier to understand and accept business decisions based on Big Data analysis and thereby maximizing the potential value of Big Data. Although some other tools also support Apache Hadoop, only TDH with its superior performance can make interactive exploration of Big Data a reality.
Complete Enterprise Solutions
By providing full support for data storage, distributed computation, data analysis and data mining, Transwarp Data Hub has solved many problems that enterprises encounter when analyzing data sets ranging from gigabytes to petabytes. Manageability is a significant advantage of TDH as an enterprise solution. Its user-friendly management interface provides support for system installation, system and cluster configuration, monitoring, alerting and many other tasks. TDH recovers quickly from malfunctions. HDFS 2.5, its underlying storage system, ensures persistence and redundancy of data and can automatically detect and correct errors in data. All services based on HDFS are optimized for the HA functionality of HDFS 2.5 to ensure high availability of the entire Big Data system. With respect to security, TDH integrates Kerberos/LDAP to support fine-grained data access control, application security, encryption and decryption of data, etc.
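The general idea behind automatic error detection and correction in a replicated store can be sketched as follows. This is a simplified Python illustration of checksum verification combined with repair from a healthy replica, not HDFS code; the function names and the sample block are hypothetical.

```python
import zlib

# Simplified illustration of checksum-plus-replica repair; not HDFS code.
# All names and data here are hypothetical.


def checksum(data: bytes) -> int:
    """CRC32 checksum of a data block."""
    return zlib.crc32(data)


def read_with_repair(replicas, expected_crc):
    """Return a verified copy of the block and a repaired replica list.

    The first replica whose checksum matches is treated as healthy, and
    every corrupt replica is replaced with that healthy copy.
    """
    good = next((r for r in replicas if checksum(r) == expected_crc), None)
    if good is None:
        raise IOError("all replicas are corrupt")
    repaired = [r if checksum(r) == expected_crc else good for r in replicas]
    return good, repaired


block = b"transaction log segment"
expected = checksum(block)                 # stored when the block was written
replicas = [b"transaction l0g segment", block, block]  # first copy is corrupt
data, repaired = read_with_repair(replicas, expected)
```

Keeping several replicas is what turns detection into correction: a checksum alone can only report that a copy is bad, while a second healthy copy allows the bad one to be overwritten.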