Date: Feb. 5, 2010
Speaker: Constantinos Evangelinos (Massachusetts Institute of Technology)
Topic: Scientific Computing on the Cloud: Many Task Computing and other opportunities
Abstract: Over the past few years, Cloud Computing has been applied to scientific, rather than purely business, uses mainly in bioinformatics. The typical Cloud science application is essentially embarrassingly parallel (parameter studies etc.) and in many cases is expressed in the map-reduce paradigm that the Cloud has made so popular.

We set out to explore a wider class of applications that can benefit from the type of resources a commercial Cloud provider such as Amazon EC2 offers, starting with the low-hanging fruit of loosely coupled applications.

Error Subspace Statistical Estimation (ESSE), an uncertainty prediction and data assimilation methodology employed for real-time ocean forecasts, is based on a characterization and prediction of the largest uncertainties. This is carried out by evolving an error subspace of variable size. We use an ensemble of stochastic model simulations, initialized based on an estimate of the dominant initial uncertainties, to predict the error subspace of the model fields. The ESSE procedure is a classic case of Many Task Computing: its codes are managed by dynamic workflows for (i) the perturbation of the initial mean state, (ii) the subsequent ensemble of stochastic PE model runs, (iii) the continuous generation of the covariance matrix, (iv) the successive computations of the SVD of the ensemble spread until a convergence criterion is satisfied, and (v) the data assimilation. Its ensemble nature makes it a many-task, data-intensive application, and its dynamic workflow gives it heterogeneity. Subsequent acoustics propagation modeling involves a very large ensemble of very short acoustics runs.
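The shape of steps (i) to (iv) above can be illustrated with a minimal sketch. It is not the ESSE implementation: the names run_member, leading_variance and esse_like_loop, the toy damped dynamics, and all parameter values are hypothetical. A local thread pool stands in for job submission to a cluster or Cloud, and a power iteration for the dominant variance stands in for the full SVD of the ensemble spread.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_member(seed, mean_state, noise=0.1, steps=20):
    """Toy stand-in for one stochastic model run (hypothetical dynamics)."""
    rng = random.Random(seed)  # per-member seed keeps the run reproducible
    # (i) perturb the initial mean state
    state = [x + rng.gauss(0.0, noise) for x in mean_state]
    # (ii) evolve with a damped toy model plus stochastic forcing
    for _ in range(steps):
        state = [0.95 * x + rng.gauss(0.0, noise) for x in state]
    return state


def leading_variance(members, iters=50):
    """Largest eigenvalue of the sample covariance, by power iteration."""
    n, dim = len(members), len(members[0])
    mean = [sum(m[j] for m in members) / n for j in range(dim)]
    # (iii) ensemble spread: each member minus the ensemble mean
    spread = [[m[j] - mean[j] for j in range(dim)] for m in members]
    v = [dim ** -0.5] * dim
    lam = 0.0
    for _ in range(iters):
        # w = S^T (S v) / (n - 1), without forming the covariance matrix
        proj = [sum(s[j] * v[j] for j in range(dim)) for s in spread]
        w = [sum(proj[i] * spread[i][j] for i in range(n)) / (n - 1)
             for j in range(dim)]
        lam = sum(x * x for x in w) ** 0.5
        if lam == 0.0:
            break
        v = [x / lam for x in w]
    return lam


def esse_like_loop(mean_state, batch=8, max_members=256, tol=0.05):
    """Add batches of members until the dominant variance stops changing."""
    members, previous = [], None
    with ThreadPoolExecutor(max_workers=batch) as pool:
        while len(members) < max_members:
            seeds = range(len(members), len(members) + batch)
            members.extend(pool.map(lambda s: run_member(s, mean_state), seeds))
            # (iv) convergence test on successive estimates
            current = leading_variance(members)
            if previous is not None and abs(current - previous) <= tol * current:
                return members, current, True
            previous = current
    return members, previous, False


if __name__ == "__main__":
    members, variance, converged = esse_like_loop([1.0, 0.5, -0.5, 2.0])
    print(len(members), variance, converged)
```

The ensemble size is not fixed in advance in this sketch: members are added in batches and the loop stops once the estimate settles or the member budget runs out, which is one simple way to picture the dynamic, many-task character of the workflow.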

We study the execution characteristics and challenges of a distributed ESSE workflow on a large dedicated cluster, the feasibility of augmenting it with runs on Amazon EC2 and the TeraGrid, and the I/O challenges faced.

We then proceed to look into more closely coupled applications and the issues they face on Amazon.