Problem Description
Today I found that a routine daily job had suddenly failed, and the cause was not obvious: the Hive SQL generates several MapReduce jobs, all of the earlier MapReduce jobs succeeded, and the last MR job failed before it was even launched, so the problem could not be located through the MapReduce job monitoring page. Going through the Hive session log instead, I found the following error:
submitJob failed java.io.IOException: Max block location exceeded for split:
StorageEngineClient.CombineFormatStorageFileInputFormat:Paths:
/user/.../default/attempt_1542378275159_154173895_r_001706_0.1552539467243:0+37738
/user/.../default/attempt_1542378275159_154173895_r_001845_0.1552539513594:0+38046
/user/.../default/attempt_1542378275159_154173895_r_000491_0.1552539063059:0+38698
...(tens of thousands of similar lines omitted)
splitsize: 108807 maxsize: 100000
at org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:610)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:568)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:417)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1279)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1276)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1743)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1276)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:564)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:559)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1743)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:559)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:550)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:1241)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:75)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:64)
Problem Analysis
Looking at the error message above, the line
submitJob failed java.io.IOException: Max block location exceeded for split
tells us that the job failed with an IOException, and that the failure is related to input splits.
StorageEngineClient.CombineFormatStorageFileInputFormat:Paths:
/user/.../default/attempt_1542378275159_154173895_r_001706_0.1552539467243:0+37738
/user/.../default/attempt_1542378275159_154173895_r_001845_0.1552539513594:0+38046
/user/.../default/attempt_1542378275159_154173895_r_000491_0.1552539063059:0+38698
From these paths we can tell that the error occurred between the MapReduce jobs of the Hive SQL: the SQL executes 3 MR jobs in total, yet execution stopped right after the 2nd MR job finished. The printed paths confirm this as well, because Hive stores the intermediate data of a SQL in the default folder under the Hive table's path.
splitsize: 108807 maxsize: 100000
This line tells us that the intermediate data consisted of a huge number of files, causing the split's block-location count to exceed maxsize and ultimately raising the IOException.
To summarize: the Hive job's intermediate data produced a large number of small files, the combined split's block-location count exceeded maxsize, and the job failed.
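To get a feel for how the location list grows that large, here is a toy model, not the actual CombineFormatStorageFileInputFormat logic: assuming the combined split keeps one location entry per block replica of every small file it covers, without de-duplication, then 108807 / 3 ≈ 36269 intermediate files are already enough to blow past the 100000 limit. The file count, replication factor, and DataNode count below are hypothetical.

import java.util.ArrayList;
import java.util.List;

public class SplitLocationEstimate {
    public static void main(String[] args) {
        int smallFiles = 36269;   // hypothetical: 108807 locations / 3 replicas
        int replicasPerBlock = 3; // typical HDFS replication factor
        int dataNodes = 500;      // hypothetical cluster size

        List<String> locations = new ArrayList<>();
        for (int i = 0; i < smallFiles; i++) {
            for (int r = 0; r < replicasPerBlock; r++) {
                // assumption: one entry per replica, no de-duplication across files
                locations.add("datanode-" + ((i + r) % dataNodes));
            }
        }
        System.out.println("split location count = " + locations.size()); // 108807 > 100000
    }
}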
Problem Resolution
Now that the cause of the failure is clear, it is time to fix it. In the source code (JobSplitWriter, per the stack trace above) we can find:
String[] locations = split.getLocations();
if (locations.length > maxBlockLocations) {
  LOG.warn("Max block location exceeded for split: "
      + split + " splitsize: " + locations.length +
      " maxsize: " + maxBlockLocations);
  locations = Arrays.copyOf(locations, maxBlockLocations);
}
Clearly maxBlockLocations here is the 100000 from the log, and maxBlockLocations itself is determined by:
int maxBlockLocations = conf.getInt(MRConfig.MAX_BLOCK_LOCATIONS_KEY,
    MRConfig.MAX_BLOCK_LOCATIONS_DEFAULT);
In other words, the value is controlled by the parameter mapreduce.job.max.split.locations, which defaults to 10 when not configured. The final fix:
set mapreduce.job.max.split.locations=200000;
Increasing maxBlockLocations this way resolved the problem.
In the end, all 3 MR jobs generated from the Hive SQL ran successfully.
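For jobs submitted outside of a Hive session, the same knob can be raised on the job Configuration before submission. Below is a minimal sketch, assuming a generic MapReduce driver; the class name, job name, and input/output paths are placeholders, not taken from the original job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SubmitWithMoreSplitLocations {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Raise the per-split block-location limit before the splits are written;
        // 200000 matches the value used in the Hive session above.
        conf.setInt("mapreduce.job.max.split.locations", 200000);

        Job job = Job.getInstance(conf, "job-with-many-small-input-files");
        job.setJarByClass(SubmitWithMoreSplitLocations.class);
        // mapper/reducer/output-type setup omitted -- fill in for the real job
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

If the driver goes through ToolRunner/GenericOptionsParser, the same value can also be passed on the command line as -Dmapreduce.job.max.split.locations=200000 without touching the code.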
Why the Limit Exists
The problem is solved, but it left me with a question: why does Hadoop impose a maxBlockLocations limit in the first place?
Searching online, I found this question:
what's the recommended value of mapreduce.job.max.split.locations ?
It has a rather good answer:
This configuration is involved since MR v1. It serves as an up limit for DN locations of job split which intend to protect the JobTracker from overloaded by jobs with huge numbers of split locations. For YARN in Hadoop 2, this concern is lessened as we have per job AM instead of JT. However, it will still impact RM as RM will potentially see heavy request from the AM which tries to obtain many localities for the split. With hitting this limit, it will truncate location number to given limit with sacrifice a bit data locality but get rid of the risk to hit bottleneck of RM.
Depends on your job's priority (I believer it is a per job configuration now), you can leave it as a default (for lower or normal priority job) or increase to a larger number. Increase this value to larger than DN number will be the same impact as set it to DN's number.
The comments on the Hadoop issue MAPREDUCE-5186 also give a general idea of why this parameter was introduced.