A Classification of MapReduce Schedulers in Heterogeneous Environments

  • Nenavath Srinivas Naik Asst. Professor, CVR College of Engineering/CSE Department, Hyderabad, India.
  • M. Badrinarayana Professor, CVR College of Engineering/CSE Department, Hyderabad, India.

Abstract

MapReduce is an essential framework, proposed in recent years, for distributed storage and parallel processing of large-scale data-intensive jobs. Intelligent scheduling decisions can significantly reduce the overall runtime of jobs. Hadoop's default scheduler assumes a homogeneous environment. In practice, this assumption does not always hold, and it limits MapReduce performance. In heterogeneous environments, job completion times vary across nodes and do not align. Data locality means moving computation close to its input data so that the data can be accessed faster. However, MapReduce does not always take heterogeneity into account from a data-locality perspective. Improving data locality in the MapReduce framework is therefore an important step toward improving the performance of large-scale Hadoop clusters. This paper provides an overview of the evaluation of Hadoop and introduces the MapReduce framework in detail. It also reviews relevant literature on recent developments in MapReduce scheduling algorithms for heterogeneous environments.
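The interplay between data locality and node heterogeneity can be illustrated with a toy simulation. The following is a minimal sketch under stated assumptions. It is not Hadoop's default scheduler or any algorithm surveyed in this paper. Each node has a hypothetical relative `speed`, a single task slot, and a set of locally stored input blocks. A node that frees up takes a pending map task whose block is local if one exists. Otherwise it takes the next pending task and pays a hypothetical `REMOTE_PENALTY` for reading the block over the network.

```python
from dataclasses import dataclass

# Hypothetical cost (seconds) of reading a non-local input block; an assumption for illustration.
REMOTE_PENALTY = 0.5


@dataclass
class Node:
    name: str
    speed: float              # hypothetical relative speed: blocks processed per second
    local_blocks: set         # input blocks stored on this node's own disk
    busy_until: float = 0.0   # simulated time at which the node's single slot is free


@dataclass
class Task:
    task_id: str
    block: str                # input block this map task reads


def schedule(nodes, tasks):
    """Greedy, locality-aware assignment on heterogeneous nodes (illustrative sketch).

    Repeatedly selects the node that becomes free earliest (ties broken by name),
    gives it a pending task whose block is local if one exists, and otherwise
    gives it the first pending task as a remote read.
    Returns a list of (task_id, node_name, is_local, finish_time) tuples.
    """
    pending = list(tasks)
    plan = []
    while pending:
        node = min(nodes, key=lambda n: (n.busy_until, n.name))
        local = [t for t in pending if t.block in node.local_blocks]
        is_local = bool(local)
        task = local[0] if is_local else pending[0]
        pending.remove(task)
        # Faster nodes finish a block sooner; remote reads add a fixed penalty.
        duration = 1.0 / node.speed + (0.0 if is_local else REMOTE_PENALTY)
        node.busy_until += duration
        plan.append((task.task_id, node.name, is_local, node.busy_until))
    return plan


if __name__ == "__main__":
    cluster = [Node("fast", 2.0, {"b1", "b2"}), Node("slow", 1.0, {"b3"})]
    jobs = [Task("m1", "b1"), Task("m2", "b2"), Task("m3", "b3"), Task("m4", "b1")]
    for entry in schedule(cluster, jobs):
        print(entry)
```

In this example the faster node frees up sooner and therefore runs three of the four tasks, all on local data. A scheduler that assumed homogeneous nodes and split work evenly would not make that choice.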

Published
2019-08-29