Transwarp Distributed File Storage Engine

With a basic distributed file system as its storage engine and YARN as its resource management framework, Transwarp Hadoop combines a series of Apache projects to enable data collection, storage, synchronization, batch processing, streaming analysis and full-text search. Transwarp has improved the Apache YARN resource management framework so that Inceptor interactive analysis clusters, Map/Reduce batch processing clusters and real-time streaming clusters can all be created dynamically on the same HDFS dataset. This setup supports resource management, dynamic resource allocation and resource sharing among different departments within a company.

Features of Transwarp's Distribution for Apache Hadoop
Functionality   Description
Erasure Code    An advanced fault-tolerant encoding scheme. It allows Transwarp's Distribution for Apache Hadoop to keep the equivalent of only 1.5 copies of the data (compared with the 3 replicas kept by the traditional HDFS strategy).
YARN            A next-generation resource management framework. By allowing multiple application clusters to run efficiently on the same physical cluster, it can serve the entire business as a true multi-application platform.
Map/Reduce      A distributed batch processing computing model. It decomposes large input data sets (on the order of petabytes) into blocks that are processed in parallel.
Pig             A data processing language that turns SQL-like data analysis requests into Map/Reduce tasks.
Oozie           A workflow scheduler system whose jobs are triggered by time and data availability.
Flume           A distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming event data.
HUE             A web-based graphical development tool.
Sqoop/Sqoop2    A tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
Elasticsearch   A distributed real-time search and analysis engine capable of in-depth search.
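
The 1.5 versus 3 comparison in the Erasure Code entry can be checked with a little arithmetic. The sketch below assumes a Reed-Solomon style layout with 6 data units and 3 parity units, a common choice (for example the RS-6-3 policy in Apache Hadoop 3). The source does not state which parameters Transwarp uses, and the function names are illustrative only.

```python
from fractions import Fraction


def replication_overhead(replicas: int) -> Fraction:
    """Raw bytes stored per byte of user data under n-way replication."""
    return Fraction(replicas)


def erasure_overhead(data_units: int, parity_units: int) -> Fraction:
    """Raw bytes stored per byte of user data under a (data, parity) erasure code."""
    return Fraction(data_units + parity_units, data_units)


def raw_storage(user_data_tb: int, overhead: Fraction) -> Fraction:
    """Raw capacity (TB) needed to hold user_data_tb of user data."""
    return user_data_tb * overhead


# Assumed parameters: 6 data units + 3 parity units (not confirmed by the source).
DATA_UNITS, PARITY_UNITS = 6, 3

ec = erasure_overhead(DATA_UNITS, PARITY_UNITS)  # (6 + 3) / 6 = 3/2
rep = replication_overhead(3)                    # 3 full replicas

print("erasure coding overhead:", float(ec))
print("3x replication overhead:", float(rep))
print("raw TB for 100 TB of data (EC):", raw_storage(100, ec))
print("raw TB for 100 TB of data (3x):", raw_storage(100, rep))

# Fault tolerance: an erasure-coded stripe survives the loss of up to
# PARITY_UNITS units; 3-way replication survives the loss of 2 replicas.
ec_tolerated_losses = PARITY_UNITS
rep_tolerated_losses = 3 - 1
```

Under these assumed parameters, 100 TB of user data needs 150 TB of raw capacity with erasure coding and 300 TB with triple replication, while the erasure-coded stripe still tolerates the loss of three units.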