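YARN Background
YARN stands for Yet Another Resource Negotiator. It is a resource manager responsible for managing and scheduling cluster resources: it allocates the cluster's CPU, memory, file system, disk, and other resources.
YARN is the second generation of Hadoop MapReduce (MapReduce v2) and was designed to address some of the problems in version 1.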
Terminology
Application Master (AM):
Resource Manager (RM):
Node Manager (NM):
The idea is to have a global ResourceManager (RM) and per-application ApplicationMaster (AM). An application is either a single job or a DAG of jobs.
The ResourceManager and the NodeManager form the data-computation framework. The ResourceManager is the ultimate authority that arbitrates resources among all the applications in the system. The NodeManager is the per-machine framework agent who is responsible for containers, monitoring their resource usage (cpu, memory, disk, network) and reporting the same to the ResourceManager/Scheduler.
The per-application ApplicationMaster is, in effect, a framework specific library and is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.
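In other words, the RM and the NM together form the data-computation framework. The RM arbitrates all resources in the system, while the NM is the per-machine agent that manages containers (tracking the machine's CPU, memory, disk, and network usage) and reports to the RM/Scheduler. The per-application AM is a framework-specific library that negotiates resources with the RM and works with the NMs to execute and monitor tasks.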
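The division of labor above can be sketched as a toy model. This is not the YARN API: the class names mirror the roles only, and the memory-only accounting, the first-fit placement, and names such as `run_tasks` and `report` are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Per-machine agent: tracks the resources its containers use."""
    name: str
    total_mem_mb: int
    used_mem_mb: int = 0

    def launch_container(self, mem_mb: int) -> bool:
        # Refuse the launch if the machine lacks enough free memory.
        if self.used_mem_mb + mem_mb > self.total_mem_mb:
            return False
        self.used_mem_mb += mem_mb
        return True

    def report(self) -> dict:
        # Usage report for the ResourceManager (hypothetical shape).
        return {"node": self.name,
                "free_mem_mb": self.total_mem_mb - self.used_mem_mb}


@dataclass
class ResourceManager:
    """Global authority that arbitrates resources among applications."""
    nodes: list

    def allocate(self, mem_mb: int):
        # First-fit: pick the first node reporting enough free memory.
        for node in self.nodes:
            if node.report()["free_mem_mb"] >= mem_mb:
                return node
        return None


@dataclass
class ApplicationMaster:
    """Per-application library: negotiates with the RM, runs tasks on NMs."""
    app_id: str
    rm: ResourceManager
    placements: list = field(default_factory=list)

    def run_tasks(self, n_tasks: int, mem_mb: int) -> int:
        started = 0
        for _ in range(n_tasks):
            node = self.rm.allocate(mem_mb)
            if node is None or not node.launch_container(mem_mb):
                break  # the cluster has no room for another container
            self.placements.append(node.name)
            started += 1
        return started


nodes = [NodeManager("nm1", 2048), NodeManager("nm2", 1024)]
rm = ResourceManager(nodes)
am = ApplicationMaster("app_0001", rm)
started = am.run_tasks(n_tasks=4, mem_mb=1024)
print(started, am.placements)  # 3 ['nm1', 'nm1', 'nm2']
```

The fourth task is not started because both machines are full, which is the kind of decision the RM arbitrates on behalf of every application.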
The RM has two main components: the Scheduler and the ApplicationsManager (ASM).
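A minimal sketch of how these two components might divide the work, following the description in the YARN architecture documentation: the Scheduler only allocates resources and does no application tracking, while the ApplicationsManager accepts submissions and obtains the first container, which hosts the application's AM. The classes, the memory-only model, the ID format, and the two state names used here are simplifications, not YARN's actual interfaces.

```python
class Scheduler:
    """Pure allocator: hands out memory, tracks no application status."""

    def __init__(self, total_mem_mb: int):
        self.free_mem_mb = total_mem_mb

    def allocate(self, mem_mb: int) -> bool:
        if mem_mb > self.free_mem_mb:
            return False
        self.free_mem_mb -= mem_mb
        return True


class ApplicationsManager:
    """Accepts submissions and requests the first container for each AM."""

    def __init__(self, scheduler: Scheduler):
        self.scheduler = scheduler
        self.apps = {}      # app_id -> {"name": ..., "state": ...}
        self._next_id = 1

    def submit(self, name: str, am_mem_mb: int) -> str:
        app_id = f"application_{self._next_id:04d}"  # hypothetical ID format
        self._next_id += 1
        # The first container of an application hosts its ApplicationMaster.
        if self.scheduler.allocate(am_mem_mb):
            state = "RUNNING"
        else:
            state = "ACCEPTED"  # submitted, still waiting for resources
        self.apps[app_id] = {"name": name, "state": state}
        return app_id


scheduler = Scheduler(total_mem_mb=1024)
asm = ApplicationsManager(scheduler)
first = asm.submit("wordcount", am_mem_mb=768)
second = asm.submit("sort", am_mem_mb=512)
print(asm.apps[first]["state"], asm.apps[second]["state"])  # RUNNING ACCEPTED
```

Keeping allocation separate from submission handling means the allocation policy can change without touching the code that tracks applications.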
YARN cannot be installed on its own; it is installed as part of a Hadoop deployment.
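Once Hadoop is deployed, one way to check that the RM is up is to query its REST endpoint `/ws/v1/cluster/info`. The sketch below assumes the RM web interface listens on the default port 8088; the host name in the usage note is a placeholder, and the helper names are hypothetical.

```python
import json
from urllib.request import Request, urlopen


def cluster_info_url(rm_host: str, port: int = 8088) -> str:
    # 8088 is the default RM web port; a cluster may configure another one.
    return f"http://{rm_host}:{port}/ws/v1/cluster/info"


def parse_rm_state(body: str) -> str:
    # The JSON response wraps its fields in a "clusterInfo" object.
    return json.loads(body)["clusterInfo"]["state"]


def fetch_rm_state(rm_host: str, port: int = 8088, timeout: float = 5.0) -> str:
    # Ask for JSON explicitly, since the service can also return XML.
    request = Request(cluster_info_url(rm_host, port),
                      headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return parse_rm_state(response.read().decode("utf-8"))
```

Calling `fetch_rm_state("rm.example.com")` against a reachable RM returns its reported state string; a connection error suggests the RM is not running or the host and port are wrong.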
References
Apache Hadoop YARN
Architecture of Next Generation Apache Hadoop MapReduce Framework
Hadoop Notes: Why Is There a MapReduce v2 (YARN)?
Deploying MapReduce v2 (YARN) on a Cluster