In general, scientific applications require different types of computing resources depending on their behavior and needs. For example, page indexing in an Arabic search engine requires sufficient network bandwidth to process millions of web pages, whereas seismic modeling is CPU- and graphics-intensive because it performs real-time fluid analysis and 3D visualization. Cloud computing, with its elastic, on-demand, pay-as-you-go model, is a potential solution: it can offer a variety of virtualized compute resources to satisfy the demands of different scientific applications. Currently, however, scientific applications are deployed onto large-scale virtualized cloud platforms using random mapping or rules of thumb developed through past experience. Such provisioning and scheduling techniques can overload or underutilize the shared underlying computing resources, and they offer few, if any, performance guarantees. Virtualization, a core enabling technology of cloud computing, provides the desired flexibility and elasticity, yet it also makes it harder to map scientific applications onto resources.

To enable informed provisioning, scheduling, and optimization on cloud infrastructures that run scientific workloads, we propose a profiling technique that characterizes the resource needs and behavior of such applications. Our approach provides a framework for characterizing scientific applications by their resource capacity needs, communication patterns, bandwidth needs, sensitivity to latency, and degree of parallelism. Although the programming model can significantly affect these parameters, this initial work focuses on applications developed with the MapReduce and Dryad programming models. We profile several applications while varying the cloud configuration and the scale of resources, in order to study each application's resource needs and behavior and to identify the resources that may limit its performance. Reaching informative conclusions about the major characteristics of an application's resource needs and behavior requires a manual, iterative process that uses a variety of representative input data sets. Using this information, we provision and configure a cloud infrastructure from the available resources to best suit the target application. In this preliminary work, we present experimental results for a variety of applications and highlight the value of precise application characterization for using the available resources efficiently across different applications.
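
As an illustration of the characterization step, the Python sketch below summarizes profiled runs along two of the dimensions listed above: resource utilization and degree of parallelism. It is a minimal sketch under stated assumptions, not the authors' implementation. The `RunSample` record, its field names, and both heuristics are hypothetical. The first heuristic treats the resource with the highest mean utilization across runs as the likely performance limiter. The second computes parallel efficiency as speedup divided by the increase in worker count. Communication patterns and latency sensitivity are not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


# Hypothetical record of one profiled run; field names and units are
# illustrative assumptions, not taken from the original work.
@dataclass
class RunSample:
    cpu_util: float   # fraction of CPU capacity used (0.0-1.0)
    mem_util: float   # fraction of memory capacity used (0.0-1.0)
    net_util: float   # fraction of network bandwidth used (0.0-1.0)
    disk_util: float  # fraction of disk I/O bandwidth used (0.0-1.0)
    runtime_s: float  # wall-clock runtime of the run, in seconds
    workers: int      # number of parallel workers (degree of parallelism)


def limiting_resource(samples: list[RunSample]) -> str:
    """Return the resource with the highest mean utilization across runs.

    Assumed heuristic: the resource closest to saturation on average is
    treated as the likely performance limiter.
    """
    if not samples:
        raise ValueError("need at least one profiled run")
    averages = {
        "cpu": mean(s.cpu_util for s in samples),
        "memory": mean(s.mem_util for s in samples),
        "network": mean(s.net_util for s in samples),
        "disk": mean(s.disk_util for s in samples),
    }
    return max(averages, key=averages.get)


def parallel_efficiency(base: RunSample, scaled: RunSample) -> float:
    """Speedup divided by the growth in worker count from base to scaled.

    Values near 1.0 suggest the application scales well with more workers.
    """
    speedup = base.runtime_s / scaled.runtime_s
    return speedup / (scaled.workers / base.workers)


if __name__ == "__main__":
    # Made-up measurements for a network-heavy job run at two scales.
    runs = [
        RunSample(0.35, 0.40, 0.92, 0.20, 600.0, 4),
        RunSample(0.30, 0.45, 0.95, 0.25, 420.0, 8),
    ]
    print(limiting_resource(runs))  # network
    print(round(parallel_efficiency(runs[0], runs[1]), 3))  # 0.714
```

In this made-up example, doubling the workers yields only about 71% parallel efficiency while network utilization stays near saturation. A profile like this would point toward provisioning for bandwidth rather than adding compute nodes.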

