MapReduce is a popular distributed programming framework for large-scale data processing. To prevent MapReduce jobs from being interrupted by node failures, which occur frequently in a MapReduce cluster built from commodity machines/nodes, the best-known MapReduce implementation, Hadoop, adopts a task re-execution policy (TR policy). When a map/reduce task of a job crashes, the TR policy assigns another node to re-execute the task. However, the impact of the TR policy on MapReduce jobs in terms of reliability, job turnaround time (JTT), and energy consumption is not clear, particularly when jobs have different features, e.g., different filtering percentages, input-data sizes, and numbers of reduce tasks. In this paper, we formally analyze the job completion reliability (JCR) of a job based on Poisson distributions, and then derive the expected JTT and job energy consumption (JEC) based on the universal generating function. Extensive analyses are further conducted to explore the impact of the TR policy on the JCR, JTT, and JEC of jobs with different features. The results show that employing the TR policy can dramatically improve the JCR of a large MapReduce job. Moreover, when the TR policy substantially improves a job's JCR, the expected JTT and JEC are not significantly prolonged or increased, respectively.
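The Poisson-based JCR analysis can be illustrated with a minimal sketch. The failure rate, task durations, task counts, and retry bound below are hypothetical, and the model here (independent tasks, a fixed per-node Poisson failure rate, a bounded number of re-executions) is a simplification of the paper's analysis, not its actual derivation:

```python
import math

def task_reliability(rate, duration, retries):
    # Under an assumed Poisson failure process with the given rate, a single
    # task attempt of the given duration finishes failure-free with
    # probability exp(-rate * duration). With a TR policy allowing `retries`
    # re-executions, the task succeeds if any one attempt survives.
    p_attempt = math.exp(-rate * duration)
    return 1.0 - (1.0 - p_attempt) ** (retries + 1)

def job_completion_reliability(rate, map_time, n_map,
                               reduce_time, n_reduce, retries):
    # Simplified JCR: the job completes only if every map task and every
    # reduce task eventually completes (tasks assumed independent).
    return (task_reliability(rate, map_time, retries) ** n_map
            * task_reliability(rate, reduce_time, retries) ** n_reduce)

# Hypothetical job: 100 map tasks of 60 s, 10 reduce tasks of 120 s,
# node failure rate 0.001 per second.
jcr_no_tr = job_completion_reliability(0.001, 60, 100, 120, 10, retries=0)
jcr_tr = job_completion_reliability(0.001, 60, 100, 120, 10, retries=3)
```

Even this toy model reproduces the abstract's qualitative finding: without re-execution the JCR of a large job collapses (here below 1%), while a few allowed re-executions per task push it above 99%.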
- Poisson distribution
- job completion reliability
- job energy consumption
- job turnaround time
- universal generating function