To analyze and extract valuable information from big image data, we have developed a framework for distributed image processing in Hadoop MapReduce. A vast amount of scientific data is now represented as images from sources such as medical tomography, and applying algorithms to these images has long been limited by the processing capacity of a single machine. MapReduce, created by Google, presents a potential solution: it efficiently parallelizes computation by distributing tasks and data across multiple machines. Hadoop, an open-source implementation of MapReduce, is gaining widespread popularity due to features such as scalability and fault tolerance. However, Hadoop is primarily used with text-based input data; its ability to process image data and its performance behavior on image processing workloads have not been fully explored. We propose a framework that efficiently enables image processing on Hadoop and characterizes its behavior using a widely used image processing algorithm, edge detection. Existing approaches to distributed image processing suffer from two main problems: (1) input images must be converted to a custom file format, and (2) image processing algorithms must adhere to a specific API, which may prevent some algorithms from being applied on Hadoop. Our framework avoids these problems by (1) bundling all small images into one large file that Hadoop can parse seamlessly and (2) imposing no API restrictions, allowing any image processing algorithm to be ported directly to Hadoop. A Reduce-less job is then launched, in which the Mappers contain both the image processing code and a mechanism to write the processed images back individually to HDFS. We have tested the framework using edge detection on a dataset of 3,760 biomedical images. In addition, we characterized edge detection along several dimensions, such as degree of parallelism and network traffic patterns.
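The bundling step described above can be sketched as follows. This is a minimal, hypothetical illustration in Python of packing many small images into one length-prefixed container file (in practice, Hadoop deployments often use a SequenceFile keyed by filename for this purpose); it is not the authors' actual implementation:

```python
import struct

# Pack many small "images" (name -> bytes) into one large container so a
# distributed filesystem sees a single big file instead of many small ones.
# Record layout: [name_len:u32][name][data_len:u32][data], repeated.
def bundle(images):
    out = bytearray()
    for name, data in images.items():
        encoded = name.encode("utf-8")
        out += struct.pack(">I", len(encoded)) + encoded
        out += struct.pack(">I", len(data)) + data
    return bytes(out)

# Recover the individual images from the container; a Mapper could iterate
# records this way, process each image, and write the result back under
# its original name.
def unbundle(blob):
    images, pos = {}, 0
    while pos < len(blob):
        (name_len,) = struct.unpack_from(">I", blob, pos); pos += 4
        name = blob[pos:pos + name_len].decode("utf-8"); pos += name_len
        (data_len,) = struct.unpack_from(">I", blob, pos); pos += 4
        images[name] = blob[pos:pos + data_len]; pos += data_len
    return images
```

Because each record carries its own name and length, a bundle round-trips losslessly and can be scanned sequentially without a separate index.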
We observed that varying the number of map tasks has a significant impact on Hadoop's performance. The best performance was obtained when the number of map tasks equaled the number of available map slots, provided the application's resource demands were satisfied. Compared to the default Hadoop configuration, a speedup of 2.1X was achieved.
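One common way to realize this tuning is to choose the input split size so that the dataset yields one map task per available slot. The helper below is a hypothetical sketch of that arithmetic (the function names and the one-task-per-slot target are illustrative assumptions, not parameters taken from the paper):

```python
import math

# Hypothetical helper: pick a split size so that a dataset of total_bytes
# is divided into roughly num_slots input splits, i.e. one map task per
# available map slot, mirroring the tuning observation above.
def split_size_for_slots(total_bytes, num_slots):
    return math.ceil(total_bytes / num_slots)

# Number of map tasks Hadoop would launch for a given split size
# (one task per input split).
def expected_map_tasks(total_bytes, split_size):
    return math.ceil(total_bytes / split_size)
```

For example, a 10 MB dataset on a cluster with 8 map slots would use a split size of 1,250,000 bytes, producing exactly 8 map tasks.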