
Simplifying the Development and Deployment of MapReduce Algorithms

Ferosh Jacob, Amber Wagner, Prateek Bahri, Susan Vrbsky, Jeff Gray


MapReduce algorithms can be difficult to write and test because of the accidental complexities of existing MapReduce implementations. Furthermore, the configuration details involved in running MapReduce algorithms within a cloud present a new set of challenges. Our research reveals that many details of cloud configuration can be hidden from programmers in an automated and transparent manner. Using concepts from software engineering, we have made MapReduce algorithms easier to implement by creating a lightweight domain-specific language (DSL). Additionally, we created a plug-in for the Eclipse integrated development environment (IDE), based on this DSL, that automates and hides many cloud configuration details. Together, the IDE plug-in and the DSL aim to help programmers develop MapReduce algorithms for cloud computing more efficiently and effectively.


This paper describes the existing challenges of creating MapReduce algorithms and how our approach minimizes them. MapRedoop is a framework that transforms a program written in the DSL into a MapReduce implementation, which can be deployed and executed on a cloud platform such as Eucalyptus or Amazon’s Elastic Compute Cloud (EC2). Assorted examples from various domains have been rewritten in the MapRedoop framework to demonstrate its expressiveness and usefulness. Our performance analysis reveals that the advantages of our approach can be attained with execution times comparable to those of the methodologies currently in practice.
