Date  Feb. 1, 2013 
Speaker  Jeremy Kepner (MIT Lincoln Laboratory) 
Topic  Transforming Big Data with D4M 
Abstract:  The growth of bioinformatics, social analysis, and network science is forcing data
scientists to handle unstructured data in the form of genetic sequences, text, and
graphs. Triple store databases are a key enabling technology for this data and are
used by many large Internet companies (e.g., Google Bigtable, Amazon Dynamo, Apache
HBase, and Apache Accumulo). Triple stores are highly scalable and run on commodity
clusters, but lack interfaces to support efficient development of the mathematical
algorithms used by many data scientists. D4M (Dynamic Distributed Dimensional Data
Model) provides a parallel linear algebraic interface to triple stores. Using D4M,
it is possible to create composable analytics with significantly less effort than
using traditional approaches. The central mathematical concept of D4M is the
"associative array" that combines spreadsheets, triple stores, and sparse linear
algebra. Associative arrays are group-theoretic constructs that use fuzzy algebra to
extend linear algebra to words and strings. This talk describes the D4M technology,
its mathematical foundations, application, and performance.
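
As a rough illustration of the associative-array idea described in the abstract, the sketch below stores (row, column, value) triples keyed by strings and multiplies two such arrays the way sparse matrices are multiplied. It is a minimal Python sketch, not the D4M API: the class name Assoc, its methods, and the sample document/word keys are all hypothetical and chosen only for this example.

```python
from collections import defaultdict


class Assoc:
    """Hypothetical toy associative array: a sparse matrix with string keys."""

    def __init__(self, triples=()):
        # Store values by (row, column); duplicate keys are summed.
        self.data = {}
        for r, c, v in triples:
            self.data[(r, c)] = self.data.get((r, c), 0) + v

    def transpose(self):
        # Swap row and column keys.
        return Assoc((c, r, v) for (r, c), v in self.data.items())

    def __matmul__(self, other):
        # Sparse matrix product: match this array's column keys
        # against the other array's row keys.
        by_row = defaultdict(list)
        for (k, c), v in other.data.items():
            by_row[k].append((c, v))
        out = []
        for (r, k), v in self.data.items():
            for c, w in by_row.get(k, ()):
                out.append((r, c, v * w))
        return Assoc(out)

    def __getitem__(self, key):
        # Missing entries behave like zeros, as in a sparse matrix.
        return self.data.get(key, 0)


# Illustrative data (assumed): which words appear in which documents.
A = Assoc([
    ("doc1", "word|gene", 1),
    ("doc1", "word|graph", 1),
    ("doc2", "word|graph", 1),
])

# A times its transpose counts the words shared by each pair of documents.
C = A @ A.transpose()
print(C[("doc1", "doc2")])
```

Here the row and column labels play the role of spreadsheet headers, the triples mirror what a triple store holds, and the product is ordinary sparse linear algebra carried out over string keys.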
