Date: Feb. 1, 2013
Speaker: Jeremy Kepner (MIT-Lincoln Laboratory)
Topic: Transforming Big Data with D4M
Abstract: The growth of bioinformatics, social analysis, and network science is forcing data scientists to handle unstructured data in the form of genetic sequences, text, and graphs. Triple store databases are a key enabling technology for this data and are used by many large Internet companies (e.g., Google Bigtable, Amazon Dynamo, Apache HBase, and Apache Accumulo). Triple stores are highly scalable and run on commodity clusters, but they lack interfaces that support efficient development of the mathematical algorithms used by many data scientists. D4M (Dynamic Distributed Dimensional Data Model) provides a parallel linear algebraic interface to triple stores. Using D4M, it is possible to create composable analytics with significantly less effort than with traditional approaches. The central mathematical concept of D4M is the "associative array," which combines spreadsheets, triple stores, and sparse linear algebra. Associative arrays are group theoretic constructs that use fuzzy algebra to extend linear algebra to words and strings. This talk describes the D4M technology, its mathematical foundations, application, and performance.
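
To make the "associative array" idea concrete, here is a minimal sketch in Python. It is not D4M's API: the class name `Assoc`, its methods, and the sample data are all hypothetical and chosen only for illustration. It assumes an associative array can be modeled as a sparse matrix whose row and column keys are strings, built from (row, column, value) triples, so that addition and matrix multiplication work on words rather than integer indices.

```python
from collections import defaultdict


class Assoc:
    """Toy associative array (hypothetical, not the D4M API):
    a sparse matrix whose row and column keys are strings."""

    def __init__(self, triples=()):
        # Store only nonzero entries, keyed by (row, col); duplicates are summed.
        self.data = {}
        for row, col, val in triples:
            self.data[(row, col)] = self.data.get((row, col), 0) + val

    def triples(self):
        # Export back to the triple-store view, sorted for stable output.
        return sorted((r, c, v) for (r, c), v in self.data.items())

    def __add__(self, other):
        # Element-wise addition over the union of keys.
        return Assoc(self.triples() + other.triples())

    def transpose(self):
        return Assoc((c, r, v) for r, c, v in self.triples())

    def __matmul__(self, other):
        # Sparse matrix multiply: match self's column keys to other's row keys.
        by_row = defaultdict(list)
        for (k, c), v in other.data.items():
            by_row[k].append((c, v))
        out = []
        for (r, k), v in self.data.items():
            for c, w in by_row.get(k, ()):
                out.append((r, c, v * w))
        return Assoc(out)


# Hypothetical data: which person appears in which document.
edges = Assoc([("alice", "doc1", 1), ("alice", "doc2", 1), ("bob", "doc2", 1)])

# Multiplying by the transpose counts the documents each pair of people shares.
shared = edges @ edges.transpose()
print(shared.triples())
```

In this sketch, the same object can be read as a spreadsheet (rows and columns with labels), a triple store (the `triples()` view), or a sparse matrix (the `@` operator), which is the combination the abstract describes.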