A Classification of MapReduce Schedulers in Heterogeneous Environments
Abstract
MapReduce is an essential framework, proposed in recent years, for distributed storage and parallel processing of large-scale data-intensive jobs. Intelligent scheduling decisions can significantly reduce the overall runtime of jobs. Hadoop's default scheduler assumes a homogeneous environment. This assumption does not always hold in practice and limits MapReduce performance. In heterogeneous environments, job completion times across nodes are not synchronized. Data locality essentially means moving computation closer to the input data for faster access. Fundamentally, MapReduce does not always account for heterogeneity from a data locality perspective. Improving data locality in the MapReduce framework is therefore an important issue for the performance of large-scale Hadoop clusters. This paper provides an overview of the evolution of Hadoop and introduces the MapReduce framework in detail. It also reviews relevant literature on recent developments in MapReduce scheduling algorithms for heterogeneous environments.
Copyright (c) 2017 CVR Journal of Science & Technology. CVR Journal of Science & Technology by CVR College of Engineering is licensed under a Creative Commons Attribution 4.0 International License.