COMPUTATIONAL RESEARCH in BOSTON and BEYOND (CRIBB)

Date: Aug. 2, 2013
Speaker: Nobuaki Tounaka, Developer of the Unicage Shell-Based Data Analytics Framework; President and CEO (Universal Shell Programming Laboratory, Ltd.)
Topic: How to Analyze 50 Billion Records in Less than a Second without Hadoop or Big Iron
Abstract:

Today, if you need to perform complex analytics on datasets of tens to hundreds of billions of records in a reasonable amount of time, you must either set up a very large Hadoop cluster or use expensive big iron. This not only increases the cost of your project but also slows you down significantly, both in the programming phase and in the execution phase.

Nobuaki Tounaka is visiting us from Japan to explain the Unicage framework and give live demonstrations. Unicage is a complete high-performance data analytics package implemented entirely in a Unix shell. It consists of a customized shell called the Unicage Shell (ush), which is based on the Bourne shell but has much more robust error handling and better pipelining performance. It also includes over 200 Unicage Commands that implement the database and analytics functionality. Alongside traditional SQL-equivalent commands, it provides import/export, data formatting, and a complete set of statistical tools based on "R". All of these commands have been optimized for high performance and compiled to take maximum advantage of system resources.

Unicage is fully consistent with the Unix philosophy: all programs are written as shell scripts, so they are extremely easy and fast to develop, making ad-hoc analysis from the command line a reality. Unicage's clustering technology (BOA - BigData Oriented Architecture) scales linearly using the "Bubun File System" developed by USP Lab, and therefore does not suffer from the overhead-induced loss of performance at scale seen in other cluster technologies such as Hadoop. Unicage has native support for parallel processing through the optimized pipelining included in the Unicage Shell, making it ideal for applications such as real-time ETL of huge amounts of spatio-temporal datapoints or real-time mathematical analysis of billions of datapoints using the "R" toolkit.

Mr. Tounaka will share benchmarks comparing the processing speed of Unicage with Hadoop and other technologies. He will also show live demonstrations of several large-scale projects he has conducted jointly with Japanese universities, including genomics research and traffic engineering projects involving enormous datasets. Free evaluation copies of Unicage will be offered to all CRIBB participants starting on July 2, 2013. Simply visit en.usp-lab.com and click on "Get Unicage Now".

Acknowledgements

We thank MIT IS&T, CSAIL, and the Department of Mathematics for their generous support of this series.
