Overview
This article walks through the source code of the distributed task scheduling platform XXL-JOB (version V2.0.2). My understanding is limited, so please bear with any mistakes and get in touch so they can be corrected.
Basics
The overall architecture of XXL-JOB
The database design of XXL-JOB
Running the SQL file /xxl-job/doc/db/tables_xxl_job.sql creates 16 tables in total:
- XXL_JOB_QRTZ_TRIGGER_GROUP: executor group table; maintains the information of job executors;
- XXL_JOB_QRTZ_TRIGGER_REGISTRY: executor registry table; maintains the addresses of online executors and scheduling-center nodes;
- XXL_JOB_QRTZ_TRIGGER_INFO: job info table; stores the extended information of XXL-JOB scheduled jobs, such as job group, job name, machine address, executor, execution parameters, alarm emails and so on;
- XXL_JOB_QRTZ_TRIGGER_LOG: job log table; stores the scheduling history of XXL-JOB jobs, such as trigger result, handle result, trigger parameters, scheduling machine, executor and so on;
- XXL_JOB_QRTZ_TRIGGER_LOGGLUE: job GLUE log table; stores the GLUE update history to support GLUE version rollback;
The 5 tables above are the ones added by XXL-JOB itself; the rest are Quartz's own tables.
Among these 16 tables, the following deserve a closer look, as they are helpful for reading the source code:
- XXL_JOB_QRTZ_TRIGGER_GROUP
This table stores executor (group) information; it corresponds to the [Executor Management] section of the admin UI.
The corresponding PO in the code:
public class XxlJobGroup {
    private int id;                // primary key
    private String appName;        // executor AppName
    private String title;          // executor title
    private int order;             // sort order
    private int addressType;       // executor address type: 0 = auto-registered, 1 = manually entered
    private String addressList;    // executor address list, comma separated (manual entry)

    // ############## getter & setter ##############
}
- XXL_JOB_QRTZ_TRIGGER_INFO
This table stores job information; it corresponds to the [Job Management] section of the admin UI.
The corresponding PO in the code:
public class XxlJobInfo {
    private int id;                        // primary key (JobKey.name)
    private int jobGroup;                  // executor group primary key (JobKey.group)
    private String jobCron;                // job CRON expression (based on Quartz)
    private String jobDesc;                // job description
    private Date addTime;                  // create time
    private Date updateTime;               // update time
    private String author;                 // owner
    private String alarmEmail;             // alarm email
    private String executorRouteStrategy;  // executor route strategy
    private String executorHandler;        // executor job handler name
    private String executorParam;          // executor job parameters
    private String executorBlockStrategy;  // block handling strategy
    private int executorTimeout;           // job execution timeout, in seconds
    private int executorFailRetryCount;    // fail retry count
    private String glueType;               // GLUE type  #com.xxl.job.core.glue.GlueTypeEnum
    private String glueSource;             // GLUE source code
    private String glueRemark;             // GLUE remark
    private Date glueUpdatetime;           // GLUE update time
    private String childJobId;             // child job ids, comma separated
    private String jobStatus;              // job status (based on Quartz), copied from quartz

    // ############## getter & setter ##############
}
- XXL_JOB_QRTZ_TRIGGER_LOG
This table stores the scheduling logs; it corresponds to the [Scheduling Log] section of the admin UI.
The corresponding PO in the code:
public class XxlJobLog {
    private int id;

    // job info
    private int jobGroup;
    private int jobId;

    // execute info
    private String executorAddress;
    private String executorHandler;
    private String executorParam;
    private String executorShardingParam;
    private int executorFailRetryCount;

    // trigger info
    private Date triggerTime;
    private int triggerCode;
    private String triggerMsg;

    // handle info
    private Date handleTime;
    private int handleCode;
    private String handleMsg;

    // alarm info
    private int alarmStatus;

    // ############## getter & setter ##############
}
- XXL_JOB_QRTZ_TRIGGER_REGISTRY
This table stores the information reported by executors when they auto-register.
The corresponding PO in the code:
public class XxlJobRegistry {
    private int id;                 // primary key
    private String registryGroup;   // registry type (EXECUTOR: auto-registered, ADMIN: manually entered)
    private String registryKey;     // executor identifier (the appName defined in the executor project)
    private String registryValue;   // executor address (the address of the executor project)
    private Date updateTime;

    // ############## getter & setter ##############
}
Source code analysis of the scheduling center
Let's start with the configuration of the scheduling center. I have written comments directly in the source code, so some of the logic has no separate prose explanation; please make sure to read the comments.
package com.xxl.job.admin.core.conf;
import com.xxl.job.admin.core.schedule.XxlJobDynamicScheduler;
import org.quartz.Scheduler;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.scheduling.quartz.SchedulerFactoryBean;
import javax.sql.DataSource;
/**
* @author xuxueli 2018-10-28 00:18:17
*/
@Configuration
public class XxlJobDynamicSchedulerConfig {
@Bean
public SchedulerFactoryBean getSchedulerFactoryBean(DataSource dataSource) {
SchedulerFactoryBean schedulerFactory = new SchedulerFactoryBean();
schedulerFactory.setDataSource(dataSource);
schedulerFactory.setAutoStartup(true); // start automatically
schedulerFactory.setStartupDelay(20); // delayed startup: start only after the application itself has started successfully
schedulerFactory.setOverwriteExistingJobs(true); // true: overwrite the jobs already in the DB; false: the jobs already in the DB take precedence
schedulerFactory.setApplicationContextSchedulerContextKey("applicationContext");
schedulerFactory.setConfigLocation(new ClassPathResource("quartz.properties"));
return schedulerFactory;
}
@Bean(initMethod = "start", destroyMethod = "destroy")
public XxlJobDynamicScheduler getXxlJobDynamicScheduler(SchedulerFactoryBean schedulerFactory) {
Scheduler scheduler = schedulerFactory.getScheduler();
//create the xxl-job dynamic scheduler; after the container initializes the bean it calls start (initMethod = "start" above)
XxlJobDynamicScheduler xxlJobDynamicScheduler = new XxlJobDynamicScheduler();
//the xxl-job dynamic scheduler is built on the Quartz scheduler underneath; jobs are fired through the RemoteHttpJobBean class (see the addJob method in XxlJobDynamicScheduler)
xxlJobDynamicScheduler.setScheduler(scheduler);
return xxlJobDynamicScheduler;
}
}
This is a piece of JavaConfig that creates two beans: SchedulerFactoryBean and XxlJobDynamicScheduler. From this we can see that XXL-JOB is essentially a layer on top of Quartz; Quartz does the actual scheduling underneath. Once the container finishes initializing XxlJobDynamicScheduler it calls its start method, and destroy is called when the bean is destroyed. So what does start actually do? Here is the code:
public void start() throws Exception {
// valid
Assert.notNull(scheduler, "quartz scheduler is null");
// init i18n
initI18n();
// admin registry monitor run
JobRegistryMonitorHelper.getInstance().start();
// admin monitor run
JobFailMonitorHelper.getInstance().start();
// admin-server
initRpcProvider();
logger.info(">>>>>>>>> init xxl-job admin success.");
}
Let's step into JobRegistryMonitorHelper.getInstance().start() and see what it does:
public class JobRegistryMonitorHelper {
private static Logger logger = LoggerFactory.getLogger(JobRegistryMonitorHelper.class);
private static JobRegistryMonitorHelper instance = new JobRegistryMonitorHelper();
public static JobRegistryMonitorHelper getInstance() {
return instance;
}
private Thread registryThread;
private volatile boolean toStop = false;
public void start() {
//create a thread
registryThread = new Thread(new Runnable() {
@Override
public void run() {
//loop while toStop is false (note: toStop is declared volatile)
while (!toStop) {
try {
//query the executor groups (XxlJobGroup) whose address type is auto-registration
List<XxlJobGroup> groupList = XxlJobAdminConfig.getAdminConfig().getXxlJobGroupDao().findByAddressType(0);
if (groupList != null && !groupList.isEmpty()) {
//remove registry records that have not been updated within 90 seconds; no heartbeat for 90 seconds means the machine is considered dead, so it is removed
XxlJobAdminConfig.getAdminConfig().getXxlJobRegistryDao().removeDead(RegistryConfig.DEAD_TIMEOUT);
// fresh online address (admin/executor)
HashMap<String, List<String>> appAddressMap = new HashMap<String, List<String>>();
//query the registry records that were updated within the last 90 seconds
List<XxlJobRegistry> list = XxlJobAdminConfig.getAdminConfig().getXxlJobRegistryDao().findAll(RegistryConfig.DEAD_TIMEOUT);
if (list != null) {
//iterate over the registry records and build a map from auto-registered executor appName to its address list
for (XxlJobRegistry item : list) {
if (RegistryConfig.RegistType.EXECUTOR.name().equals(item.getRegistryGroup())) {
String appName = item.getRegistryKey();
List<String> registryList = appAddressMap.get(appName);
if (registryList == null) {
registryList = new ArrayList<String>();
}
if (!registryList.contains(item.getRegistryValue())) {
registryList.add(item.getRegistryValue());
}
//group the registrations by appName: the key of appAddressMap is the appName, the value is that executor's list of registered addresses (a cluster has several)
appAddressMap.put(appName, registryList);
}
}
}
//iterate over all auto-registered executor groups
for (XxlJobGroup group : groupList) {
//look up the cluster addresses registered for this executor by its appName in the map built above
List<String> registryList = appAddressMap.get(group.getAppName());
String addressListStr = null;
if (registryList != null && !registryList.isEmpty()) {
Collections.sort(registryList);
addressListStr = "";
//join the cluster's registered addresses into a comma-separated string
for (String item : registryList) {
addressListStr += item + ",";
}
addressListStr = addressListStr.substring(0, addressListStr.length() - 1);
}
//write the comma-separated address string into the group's addressList field and persist the group to the DB
group.setAddressList(addressListStr);
XxlJobAdminConfig.getAdminConfig().getXxlJobGroupDao().update(group);
}
}
} catch (Exception e) {
if (!toStop) {
logger.error(">>>>>>>>>>> xxl-job, job registry monitor thread error:{}", e);
}
}
try {
//sleep for 30 seconds
TimeUnit.SECONDS.sleep(RegistryConfig.BEAT_TIMEOUT);
} catch (InterruptedException e) {
if (!toStop) {
logger.error(">>>>>>>>>>> xxl-job, job registry monitor thread error:{}", e);
}
}
}
logger.info(">>>>>>>>>>> xxl-job, job registry monitor thread stop");
}
});
//mark this thread as a daemon thread
registryThread.setDaemon(true);
registryThread.setName("xxl-job, admin JobRegistryMonitorHelper");
//start the thread
registryThread.start();
}
public void toStop() {
toStop = true;
// interrupt and wait
registryThread.interrupt();
try {
registryThread.join();
} catch (InterruptedException e) {
logger.error(e.getMessage(), e);
}
}
}
So start essentially spins up a daemon thread that scans the executor registry table every 30 seconds and:
* removes executors that have not reported a heartbeat within the last 90 seconds;
* folds the auto-registration records (XxlJobRegistry) into the executor group info (XxlJobGroup), refreshing each group's address list (see the sketch below).
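To make the second bullet concrete, here is a minimal, self-contained sketch of that address-list refresh (the XxlJobGroup type and the DAO call are stubbed out; the real logic is the loop shown above):
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class AddressListRefreshSketch {

    // stand-in for XxlJobGroup; only the field we care about here
    static class Group {
        String addressList;
    }

    public static void main(String[] args) {
        // addresses collected from XXL_JOB_QRTZ_TRIGGER_REGISTRY for one appName
        List<String> registryList = new ArrayList<>(Arrays.asList("192.168.0.107:9999", "192.168.0.106:9999"));

        Group group = new Group();
        if (!registryList.isEmpty()) {
            Collections.sort(registryList);                      // stable order across admin nodes
            group.addressList = String.join(",", registryList);  // same comma-joined string as addressListStr above
        } else {
            group.addressList = null;                            // no live executor left: the address list is cleared
        }

        // the real code then persists the group via XxlJobGroupDao.update(group)
        System.out.println(group.addressList);                   // 192.168.0.106:9999,192.168.0.107:9999
    }
}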
Next, what does JobFailMonitorHelper.getInstance().start() do?
public class JobFailMonitorHelper {
private static Logger logger = LoggerFactory.getLogger(JobFailMonitorHelper.class);
private static JobFailMonitorHelper instance = new JobFailMonitorHelper();
public static JobFailMonitorHelper getInstance() {
return instance;
}
// ---------------------- monitor ----------------------
private Thread monitorThread;
private volatile boolean toStop = false;
public void start() {
//create a monitor thread
monitorThread = new Thread(new Runnable() {
@Override
public void run() {
// monitor
while (!toStop) {
try {
//fetch from the DB the IDs of job logs that failed and whose alarm status is still 0 (default); 1000 is passed as the page size for one scan
List<Integer> failLogIds = XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().findFailJobLogIds(1000);
if (failLogIds != null && !failLogIds.isEmpty()) {
for (int failLogId : failLogIds) {
//lock the log record (alarm status 0 -> -1)
int lockRet = XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().updateAlarmStatus(failLogId, 0, -1);
if (lockRet < 1) {
continue;
}
//load the full failed log record
XxlJobLog log = XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().load(failLogId);
//load the job info this failed log belongs to
XxlJobInfo info = XxlJobAdminConfig.getAdminConfig().getXxlJobInfoDao().loadById(log.getJobId());
// if the remaining fail-retry count is > 0 (note: what the log stores is the remaining retry count, not the configured total)
if (log.getExecutorFailRetryCount() > 0) {
//trigger the job again
JobTriggerPoolHelper.trigger(log.getJobId(), TriggerTypeEnum.RETRY, (log.getExecutorFailRetryCount() - 1), log.getExecutorShardingParam(), null);
String retryMsg = "<br><br><span style=\"color:#F39C12;\" > >>>>>>>>>>>" + I18nUtil.getString("jobconf_trigger_type_retry") + "<<<<<<<<<<< </span><br>";
log.setTriggerMsg(log.getTriggerMsg() + retryMsg);
XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().updateTriggerInfo(log);
}
// 2、fail alarm monitor
//fail alarm
int newAlarmStatus = 0; // alarm status: 0 = default, -1 = locked, 1 = no alarm needed, 2 = alarm sent, 3 = alarm failed
if (info != null && info.getAlarmEmail() != null && info.getAlarmEmail().trim().length() > 0) {
boolean alarmResult = true;
try {
//send the fail alarm: email notification by default
alarmResult = failAlarm(info, log);
} catch (Exception e) {
alarmResult = false;
logger.error(e.getMessage(), e);
}
newAlarmStatus = alarmResult ? 2 : 3;
} else {
newAlarmStatus = 1;
}
//update the alarm status (-1 -> newAlarmStatus)
XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().updateAlarmStatus(failLogId, -1, newAlarmStatus);
}
}
TimeUnit.SECONDS.sleep(10);
} catch (Exception e) {
if (!toStop) {
logger.error(">>>>>>>>>>> xxl-job, job fail monitor thread error:{}", e);
}
}
}
logger.info(">>>>>>>>>>> xxl-job, job fail monitor thread stop");
}
});
monitorThread.setDaemon(true);
monitorThread.setName("xxl-job, admin JobFailMonitorHelper");
monitorThread.start();
}
public void toStop() {
toStop = true;
// interrupt and wait
monitorThread.interrupt();
try {
monitorThread.join();
} catch (InterruptedException e) {
logger.error(e.getMessage(), e);
}
}
// ---------------------- alarm ----------------------
// email alarm template
private static final String mailBodyTemplate = "<h5>" + I18nUtil.getString("jobconf_monitor_detail") + ":</span>" +
"<table border=\"1\" cellpadding=\"3\" style=\"border-collapse:collapse; width:80%;\" >\n" +
" <thead style=\"font-weight: bold;color: #ffffff;background-color: #ff8c00;\" >" +
" <tr>\n" +
" <td width=\"20%\" >" + I18nUtil.getString("jobinfo_field_jobgroup") + "</td>\n" +
" <td width=\"10%\" >" + I18nUtil.getString("jobinfo_field_id") + "</td>\n" +
" <td width=\"20%\" >" + I18nUtil.getString("jobinfo_field_jobdesc") + "</td>\n" +
" <td width=\"10%\" >" + I18nUtil.getString("jobconf_monitor_alarm_title") + "</td>\n" +
" <td width=\"40%\" >" + I18nUtil.getString("jobconf_monitor_alarm_content") + "</td>\n" +
" </tr>\n" +
" </thead>\n" +
" <tbody>\n" +
" <tr>\n" +
" <td>{0}</td>\n" +
" <td>{1}</td>\n" +
" <td>{2}</td>\n" +
" <td>" + I18nUtil.getString("jobconf_monitor_alarm_type") + "</td>\n" +
" <td>{3}</td>\n" +
" </tr>\n" +
" </tbody>\n" +
"</table>";
/**
* fail alarm
*
* @param jobLog
*/
private boolean failAlarm(XxlJobInfo info, XxlJobLog jobLog) {
boolean alarmResult = true;
// send monitor email
if (info != null && info.getAlarmEmail() != null && info.getAlarmEmail().trim().length() > 0) {
// alarmContent
String alarmContent = "Alarm Job LogId=" + jobLog.getId();
if (jobLog.getTriggerCode() != ReturnT.SUCCESS_CODE) {
alarmContent += "<br>TriggerMsg=<br>" + jobLog.getTriggerMsg();
}
if (jobLog.getHandleCode() > 0 && jobLog.getHandleCode() != ReturnT.SUCCESS_CODE) {
alarmContent += "<br>HandleCode=" + jobLog.getHandleMsg();
}
// email info
XxlJobGroup group = XxlJobAdminConfig.getAdminConfig().getXxlJobGroupDao().load(Integer.valueOf(info.getJobGroup()));
String personal = I18nUtil.getString("admin_name_full");
String title = I18nUtil.getString("jobconf_monitor");
String content = MessageFormat.format(mailBodyTemplate,
group != null ? group.getTitle() : "null",
info.getId(),
info.getJobDesc(),
alarmContent);
Set<String> emailSet = new HashSet<String>(Arrays.asList(info.getAlarmEmail().split(",")));
for (String email : emailSet) {
// make mail
try {
MimeMessage mimeMessage = XxlJobAdminConfig.getAdminConfig().getMailSender().createMimeMessage();
MimeMessageHelper helper = new MimeMessageHelper(mimeMessage, true);
helper.setFrom(XxlJobAdminConfig.getAdminConfig().getEmailUserName(), personal);
helper.setTo(email);
helper.setSubject(title);
helper.setText(content, true);
XxlJobAdminConfig.getAdminConfig().getMailSender().send(mimeMessage);
} catch (Exception e) {
logger.error(">>>>>>>>>>> xxl-job, job fail alarm email send error, JobLogId:{}", jobLog.getId(), e);
alarmResult = false;
}
}
}
// TODO, custom alarm strategy, such as sms
return alarmResult;
}
}
In short, this also spins up a daemon thread, which scans the failure logs every 10 seconds:
* if a job still has retries left (remaining retry count > 0), it triggers the job again;
* if the job execution failed, it raises an alarm, by email by default (see the alarm-status sketch below).
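The alarm handling above is easiest to read as a small state machine over alarm_status (0 = default, -1 = locked, 1 = no alarm needed, 2 = alarm sent, 3 = alarm failed). The sketch below is illustrative only; updateAlarmStatus stands in for the DAO method, which only updates the row when the current status matches oldStatus and therefore doubles as a lock when several admin nodes scan the same table:
public class AlarmStatusSketch {

    // pretend DAO: "UPDATE ... SET alarm_status = #{new} WHERE id = #{logId} AND alarm_status = #{old}",
    // returning the number of affected rows, so it behaves like a compare-and-set
    static int updateAlarmStatus(int logId, int oldStatus, int newStatus) { return 1; }

    static boolean hasAlarmEmail(int logId) { return true; }  // stubbed: is jobInfo.alarmEmail configured?
    static boolean sendAlarm(int logId)     { return true; }  // stubbed: failAlarm(info, log)

    static void processFailLog(int failLogId) {
        if (updateAlarmStatus(failLogId, 0, -1) < 1) {        // 0 -> -1: try to lock; another node got there first
            return;
        }
        int newStatus = 1;                                    // -1 -> 1: nothing to send
        if (hasAlarmEmail(failLogId)) {
            newStatus = sendAlarm(failLogId) ? 2 : 3;         // -1 -> 2 on success, -1 -> 3 on failure
        }
        updateAlarmStatus(failLogId, -1, newStatus);
    }

    public static void main(String[] args) {
        processFailLog(42);
    }
}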
Now let's look at what initRpcProvider() does:
private void initRpcProvider() {
// init
//initialize the RPC provider configuration
XxlRpcProviderFactory xxlRpcProviderFactory = new XxlRpcProviderFactory();
xxlRpcProviderFactory.initConfig(
NetEnum.NETTY_HTTP,
Serializer.SerializeEnum.HESSIAN.getSerializer(),
null,
0,
XxlJobAdminConfig.getAdminConfig().getAccessToken(),
null,
null);
// add services
//register services with xxlRpcProviderFactory
xxlRpcProviderFactory.addService(AdminBiz.class.getName(), null, XxlJobAdminConfig.getAdminConfig().getAdminBiz());
// servlet handler
servletServerHandler = new ServletServerHandler(xxlRpcProviderFactory);
}
Step into xxlRpcProviderFactory.addService(AdminBiz.class.getName(), null, XxlJobAdminConfig.getAdminConfig().getAdminBiz()):
public void addService(String iface, String version, Object serviceBean) {
//build the service key; when an executor registration request comes in later, the service is looked up from the factory by this key
String serviceKey = makeServiceKey(iface, version);
//put the serviceBean into serviceData (the serviceBean here is the AdminBiz implementation)
this.serviceData.put(serviceKey, serviceBean);
logger.info(">>>>>>>>>>> xxl-rpc, provider factory add service success. serviceKey = {}, serviceBean = {}", serviceKey, serviceBean.getClass());
}
So the whole purpose of this method is to start the scheduling center's RPC provider, so that executor projects can register and send heartbeats over RPC.
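As a simplified illustration of what addService and the later invokeService rely on (not the exact xxl-rpc code; the real key format may differ), a provider factory is essentially a map from service key to bean plus reflective dispatch:
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

public class ProviderDispatchSketch {

    interface AdminBiz { String registry(String param); }     // stands in for com.xxl.job.core.biz.AdminBiz

    static class AdminBizImpl implements AdminBiz {
        public String registry(String param) { return "registered: " + param; }
    }

    private final Map<String, Object> serviceData = new HashMap<>();

    void addService(String iface, String version, Object serviceBean) {
        String serviceKey = (version != null) ? iface + "#" + version : iface;   // assumed key format
        serviceData.put(serviceKey, serviceBean);
    }

    Object invokeService(String className, String methodName, Class<?>[] paramTypes, Object[] params) throws Exception {
        Object serviceBean = serviceData.get(className);      // version omitted here for brevity
        Method method = serviceBean.getClass().getMethod(methodName, paramTypes);
        return method.invoke(serviceBean, params);            // same reflective dispatch we will see in invokeService later
    }

    public static void main(String[] args) throws Exception {
        ProviderDispatchSketch factory = new ProviderDispatchSketch();
        factory.addService(AdminBiz.class.getName(), null, new AdminBizImpl());
        System.out.println(factory.invokeService(AdminBiz.class.getName(), "registry",
                new Class<?>[]{String.class}, new Object[]{"192.168.0.107:9999"}));
    }
}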
Now back to the start() method of JobFailMonitorHelper: when the remaining retry count is > 0 it calls JobTriggerPoolHelper.trigger(...). Stepping into that method, we see the following code. (Tag: remember this spot; we will come back to it near the end of the article.)
/**
* @param jobId
* @param triggerType
* @param failRetryCount >=0: use this param
* <0: use param from job info config
* @param executorShardingParam
* @param executorParam null: use job param
* not null: cover job param
*/
public static void trigger(int jobId, TriggerTypeEnum triggerType, int failRetryCount, String executorShardingParam, String executorParam) {
helper.addTrigger(jobId, triggerType, failRetryCount, executorShardingParam, executorParam);
}
Let's follow helper.addTrigger(...):
/**
* job trigger thread pool helper
*
* @author xuxueli 2018-07-03 21:08:07
*/
public class JobTriggerPoolHelper {
private static Logger logger = LoggerFactory.getLogger(JobTriggerPoolHelper.class);
// ---------------------- trigger pool ----------------------
// fast/slow thread pool
private ThreadPoolExecutor fastTriggerPool = new ThreadPoolExecutor(
8,
200,
60L,
TimeUnit.SECONDS,
new LinkedBlockingQueue<Runnable>(1000),
new ThreadFactory() {
@Override
public Thread newThread(Runnable r) {
return new Thread(r, "xxl-job, admin JobTriggerPoolHelper-fastTriggerPool-" + r.hashCode());
}
}
);
private ThreadPoolExecutor slowTriggerPool = new ThreadPoolExecutor(
0,
100,
60L,
TimeUnit.SECONDS,
new LinkedBlockingQueue<Runnable>(2000),
new ThreadFactory() {
@Override
public Thread newThread(Runnable r) {
return new Thread(r, "xxl-job, admin JobTriggerPoolHelper-slowTriggerPool-" + r.hashCode());
}
}
);
// job timeout count
private volatile long minTim = System.currentTimeMillis() / 60000; // ms > min
private volatile Map<Integer, AtomicInteger> jobTimeoutCountMap = new ConcurrentHashMap<>();
/**
* add trigger
*/
public void addTrigger(final int jobId, final TriggerTypeEnum triggerType, final int failRetryCount, final String executorShardingParam, final String executorParam) {
// choose thread pool
//choose which thread pool to use
ThreadPoolExecutor triggerPool_ = fastTriggerPool;
AtomicInteger jobTimeoutCount = jobTimeoutCountMap.get(jobId);
// if a job takes more than 500ms over 10 times within a 1-minute window it is treated as a slow job; slow jobs are demoted to the "slow" pool so they cannot exhaust the trigger threads, which keeps the scheduler stable
if (jobTimeoutCount != null && jobTimeoutCount.get() > 10) { // job-timeout 10 times in 1 min
triggerPool_ = slowTriggerPool;
}
// trigger
triggerPool_.execute(new Runnable() {
@Override
public void run() {
long start = System.currentTimeMillis();
try {
// do trigger
//this is the key point: actually trigger the job
XxlJobTrigger.trigger(jobId, triggerType, failRetryCount, executorShardingParam, executorParam);
} catch (Exception e) {
logger.error(e.getMessage(), e);
} finally {
//in the finally block: if the trigger took more than 500ms, record the jobId and its timeout count in the timeout-count map
// check timeout-count-map
long minTim_now = System.currentTimeMillis() / 60000;
if (minTim != minTim_now) {
//the current minute differs from the one recorded above, i.e. we have crossed into a new minute
//this keeps the timeout-count map scoped to 1-minute windows: counts accumulate within a minute, and when the next minute starts the map is cleared and counting starts over
minTim = minTim_now;
jobTimeoutCountMap.clear();
}
// incr timeout-count-map
//if the trigger took more than 500ms, record the jobId and its count in jobTimeoutCountMap
long cost = System.currentTimeMillis() - start;
if (cost > 500) { // job-timeout threshold 500ms
//AtomicInteger keeps the count correct under concurrent updates
AtomicInteger timeoutCount = jobTimeoutCountMap.put(jobId, new AtomicInteger(1));
if (timeoutCount != null) {
timeoutCount.incrementAndGet();
}
}
}
}
});
}
public void stop() {
//triggerPool.shutdown();
fastTriggerPool.shutdownNow();
slowTriggerPool.shutdownNow();
logger.info(">>>>>>>>> xxl-job trigger thread pool shutdown success.");
}
// ---------------------- helper ----------------------
private static JobTriggerPoolHelper helper = new JobTriggerPoolHelper();
/**
* @param jobId
* @param triggerType
* @param failRetryCount >=0: use this param
* <0: use param from job info config
* @param executorShardingParam
* @param executorParam null: use job param
* not null: cover job param
*/
public static void trigger(int jobId, TriggerTypeEnum triggerType, int failRetryCount, String executorShardingParam, String executorParam) {
helper.addTrigger(jobId, triggerType, failRetryCount, executorShardingParam, executorParam);
}
public static void toStop() {
helper.stop();
}
}
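Before moving on, the fast/slow demotion above boils down to a per-job counter scoped to 1-minute windows. Here is a self-contained toy version of that idea (the real fields are minTim and jobTimeoutCountMap in JobTriggerPoolHelper; this sketch is illustrative only):
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SlowJobCounterSketch {

    private volatile long minute = System.currentTimeMillis() / 60000;
    private final ConcurrentHashMap<Integer, AtomicInteger> slowCount = new ConcurrentHashMap<>();

    // called after every trigger with the time it took
    void record(int jobId, long costMillis) {
        long now = System.currentTimeMillis() / 60000;
        if (now != minute) {                 // a new minute started: reset the window
            minute = now;
            slowCount.clear();
        }
        if (costMillis > 500) {              // the same 500ms "slow" threshold as above
            slowCount.computeIfAbsent(jobId, k -> new AtomicInteger(0)).incrementAndGet();
        }
    }

    // consulted before submitting a trigger: slow jobs get the slow pool
    boolean useSlowPool(int jobId) {
        AtomicInteger c = slowCount.get(jobId);
        return c != null && c.get() > 10;    // slow more than 10 times within the current minute
    }

    public static void main(String[] args) {
        SlowJobCounterSketch counter = new SlowJobCounterSketch();
        for (int i = 0; i < 11; i++) {
            counter.record(7, 800);          // job 7 keeps taking 800ms
        }
        System.out.println(counter.useSlowPool(7));   // true -> would be demoted to slowTriggerPool
    }
}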
Keep following into XxlJobTrigger.trigger(...). This method loads the job info and the executor group, handles the sharding parameter and, if the job's route strategy is sharding broadcast, loops over the executor cluster addresses and triggers the job once per executor:
/**
* trigger job
*
* @param jobId
* @param triggerType
* @param failRetryCount >=0: use this param
* <0: use param from job info config
* @param executorShardingParam
* @param executorParam null: use job param
* not null: cover job param
*/
public static void trigger(int jobId, TriggerTypeEnum triggerType, int failRetryCount, String executorShardingParam, String executorParam) {
// load data
//load the job info by jobId
XxlJobInfo jobInfo = XxlJobAdminConfig.getAdminConfig().getXxlJobInfoDao().loadById(jobId);
if (jobInfo == null) {
logger.warn(">>>>>>>>>>>> trigger fail, jobId invalid,jobId={}", jobId);
return;
}
if (executorParam != null) {
jobInfo.setExecutorParam(executorParam);
}
//work out the fail-retry count
int finalFailRetryCount = failRetryCount >= 0 ? failRetryCount : jobInfo.getExecutorFailRetryCount();
//load the executor group this job belongs to
XxlJobGroup group = XxlJobAdminConfig.getAdminConfig().getXxlJobGroupDao().load(jobInfo.getJobGroup());
// sharding param
//parse the sharding parameter
int[] shardingParam = null;
if (executorShardingParam != null) {
String[] shardingArr = executorShardingParam.split("/");
if (shardingArr.length == 2 && isNumeric(shardingArr[0]) && isNumeric(shardingArr[1])) {
shardingParam = new int[2];
shardingParam[0] = Integer.valueOf(shardingArr[0]);
shardingParam[1] = Integer.valueOf(shardingArr[1]);
}
}
//handle the route strategy
if (ExecutorRouteStrategyEnum.SHARDING_BROADCAST == ExecutorRouteStrategyEnum.match(jobInfo.getExecutorRouteStrategy(), null)
&& group.getRegistryList() != null && !group.getRegistryList().isEmpty()
&& shardingParam == null) {
//if the route strategy is sharding broadcast, the executor address list is not empty and no sharding parameter was passed in, broadcast: trigger the job once for every executor address
for (int i = 0; i < group.getRegistryList().size(); i++) {
processTrigger(group, jobInfo, finalFailRetryCount, triggerType, i, group.getRegistryList().size());
}
} else {
//otherwise fall back to the default sharding parameter
if (shardingParam == null) {
shardingParam = new int[]{0, 1};
}
processTrigger(group, jobInfo, finalFailRetryCount, triggerType, shardingParam[0], shardingParam[1]);
}
}
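As an aside on the sharding parameters handled above: on the executor side, a job handler reads the broadcast index/total that the scheduler passes along. A rough usage sketch (ShardingUtil and the annotations come from xxl-job-core 2.0.x; the handler name and the row loop here are made up for illustration):
import com.xxl.job.core.biz.model.ReturnT;
import com.xxl.job.core.handler.IJobHandler;
import com.xxl.job.core.handler.annotation.JobHandler;
import com.xxl.job.core.log.XxlJobLogger;
import com.xxl.job.core.util.ShardingUtil;
import org.springframework.stereotype.Component;

@JobHandler(value = "shardingJobHandlerDemo")   // hypothetical handler name
@Component
public class ShardingJobHandlerDemo extends IJobHandler {

    @Override
    public ReturnT<String> execute(String param) throws Exception {
        // the index/total set by the admin via broadcastIndex/broadcastTotal end up here
        ShardingUtil.ShardingVO shardingVO = ShardingUtil.getShardingVo();
        XxlJobLogger.log("sharding: index = {}, total = {}", shardingVO.getIndex(), shardingVO.getTotal());

        // a typical pattern: each shard only processes the rows it is responsible for
        for (int rowId = 0; rowId < 10; rowId++) {
            if (rowId % shardingVO.getTotal() == shardingVO.getIndex()) {
                XxlJobLogger.log("shard {} handles row {}", shardingVO.getIndex(), rowId);
            }
        }
        return ReturnT.SUCCESS;
    }
}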
The key part is processTrigger(...):
/**
* @param group job group, registry list may be empty
* @param jobInfo
* @param finalFailRetryCount
* @param triggerType
* @param index sharding index
* @param total sharding total
*/
private static void processTrigger(XxlJobGroup group, XxlJobInfo jobInfo, int finalFailRetryCount, TriggerTypeEnum triggerType, int index, int total) {
// param
//block strategy of the executor: what to do when triggers arrive faster than the executor can process them; options: serial execution (default), discard later triggers, cover earlier triggers
ExecutorBlockStrategyEnum blockStrategy = ExecutorBlockStrategyEnum.match(jobInfo.getExecutorBlockStrategy(), ExecutorBlockStrategyEnum.SERIAL_EXECUTION); // block strategy
//route strategy of the executor: for clustered executors, options include first, last, round, random, consistent hash, least frequently used, least recently used, failover, busyover, etc.
ExecutorRouteStrategyEnum executorRouteStrategyEnum = ExecutorRouteStrategyEnum.match(jobInfo.getExecutorRouteStrategy(), null); // route strategy
//sharding parameter: for sharding broadcast, join index and total with a "/"
String shardingParam = (ExecutorRouteStrategyEnum.SHARDING_BROADCAST == executorRouteStrategyEnum) ? String.valueOf(index).concat("/").concat(String.valueOf(total)) : null;
// 1、save log-id
//save the job log
XxlJobLog jobLog = new XxlJobLog();
jobLog.setJobGroup(jobInfo.getJobGroup());
jobLog.setJobId(jobInfo.getId());
jobLog.setTriggerTime(new Date());
XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().save(jobLog);
logger.debug(">>>>>>>>>>> xxl-job trigger start, jobId:{}", jobLog.getId());
// 2、init trigger-param
//build the trigger parameters
TriggerParam triggerParam = new TriggerParam();
triggerParam.setJobId(jobInfo.getId());
//the job handler name configured for the executor
triggerParam.setExecutorHandler(jobInfo.getExecutorHandler());
triggerParam.setExecutorParams(jobInfo.getExecutorParam());
triggerParam.setExecutorBlockStrategy(jobInfo.getExecutorBlockStrategy());
triggerParam.setExecutorTimeout(jobInfo.getExecutorTimeout());
triggerParam.setLogId(jobLog.getId());
triggerParam.setLogDateTim(jobLog.getTriggerTime().getTime());
triggerParam.setGlueType(jobInfo.getGlueType());
triggerParam.setGlueSource(jobInfo.getGlueSource());
triggerParam.setGlueUpdatetime(jobInfo.getGlueUpdatetime().getTime());
triggerParam.setBroadcastIndex(index);
triggerParam.setBroadcastTotal(total);
// 3、init address
//resolve the executor address
String address = null;
ReturnT<String> routeAddressResult = null;
if (group.getRegistryList() != null && !group.getRegistryList().isEmpty()) {
//the executor has registered addresses
if (ExecutorRouteStrategyEnum.SHARDING_BROADCAST == executorRouteStrategyEnum) {
//for sharding broadcast, take the address at position index
if (index < group.getRegistryList().size()) {
address = group.getRegistryList().get(index);
} else {
address = group.getRegistryList().get(0);
}
} else {
//pick the executor address according to the route strategy
//⚠️⚠️️⚠️ very important: this is the strategy pattern; each strategy has its own implementation class, which picks one address out of the executor cluster's address list. This is where xxl-job implements job routing
routeAddressResult = executorRouteStrategyEnum.getRouter().route(triggerParam, group.getRegistryList());
if (routeAddressResult.getCode() == ReturnT.SUCCESS_CODE) {
address = routeAddressResult.getContent();
}
}
} else {
routeAddressResult = new ReturnT<String>(ReturnT.FAIL_CODE, I18nUtil.getString("jobconf_trigger_address_empty"));
}
// 4、trigger remote executor
//trigger the remote executor to run the job (the handler on the executor side)
ReturnT<String> triggerResult = null;
if (address != null) {
//key point ⚠️⚠️⚠️: this is the method through which commands are actually sent to the executor
triggerResult = runExecutor(triggerParam, address);
} else {
triggerResult = new ReturnT<String>(ReturnT.FAIL_CODE, null);
}
// 5、collection trigger info
StringBuffer triggerMsgSb = new StringBuffer();
triggerMsgSb.append(I18nUtil.getString("jobconf_trigger_type")).append(":").append(triggerType.getTitle());
triggerMsgSb.append("<br>").append(I18nUtil.getString("jobconf_trigger_admin_adress")).append(":").append(IpUtil.getIp());
triggerMsgSb.append("<br>").append(I18nUtil.getString("jobconf_trigger_exe_regtype")).append(":")
.append((group.getAddressType() == 0) ? I18nUtil.getString("jobgroup_field_addressType_0") : I18nUtil.getString("jobgroup_field_addressType_1"));
triggerMsgSb.append("<br>").append(I18nUtil.getString("jobconf_trigger_exe_regaddress")).append(":").append(group.getRegistryList());
triggerMsgSb.append("<br>").append(I18nUtil.getString("jobinfo_field_executorRouteStrategy")).append(":").append(executorRouteStrategyEnum.getTitle());
if (shardingParam != null) {
triggerMsgSb.append("(" + shardingParam + ")");
}
triggerMsgSb.append("<br>").append(I18nUtil.getString("jobinfo_field_executorBlockStrategy")).append(":").append(blockStrategy.getTitle());
triggerMsgSb.append("<br>").append(I18nUtil.getString("jobinfo_field_timeout")).append(":").append(jobInfo.getExecutorTimeout());
triggerMsgSb.append("<br>").append(I18nUtil.getString("jobinfo_field_executorFailRetryCount")).append(":").append(finalFailRetryCount);
triggerMsgSb.append("<br><br><span style=\"color:#00c0ef;\" > >>>>>>>>>>>" + I18nUtil.getString("jobconf_trigger_run") + "<<<<<<<<<<< </span><br>")
.append((routeAddressResult != null && routeAddressResult.getMsg() != null) ? routeAddressResult.getMsg() + "<br><br>" : "").append(triggerResult.getMsg() != null ? triggerResult.getMsg() : "");
// 6、save log trigger-info
//update the job log with the trigger info
jobLog.setExecutorAddress(address);
jobLog.setExecutorHandler(jobInfo.getExecutorHandler());
jobLog.setExecutorParam(jobInfo.getExecutorParam());
jobLog.setExecutorShardingParam(shardingParam);
jobLog.setExecutorFailRetryCount(finalFailRetryCount);
//jobLog.setTriggerTime();
jobLog.setTriggerCode(triggerResult.getCode());
jobLog.setTriggerMsg(triggerMsgSb.toString());
XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().updateTriggerInfo(jobLog);
logger.debug(">>>>>>>>>>> xxl-job trigger end, jobId:{}", jobLog.getId());
}
There are two key points in this function: when the strategy is not sharding broadcast, deciding which executor address in the cluster this trigger goes to (the route strategy); and actually invoking the executor via runExecutor(triggerParam, address).
Let's first look at how the route strategies are implemented.
Executor routing uses the strategy pattern: routeAddressResult = executorRouteStrategyEnum.getRouter().route(triggerParam, group.getRegistryList());
executorRouteStrategyEnum is obtained by matching the strategy configured on jobInfo (set in the admin UI) against the enum; the matched strategy supplies a router, and its route(triggerParam, group.getRegistryList()) method is invoked.
For reference, the route strategy enum looks roughly like this:
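An abridged sketch of ExecutorRouteStrategyEnum (the constant names follow the real enum, while the i18n titles and some details are elided, and the imports assume the admin module's package layout), meant as a reading aid rather than verbatim source:
import com.xxl.job.admin.core.route.ExecutorRouter;
import com.xxl.job.admin.core.route.strategy.*;

public enum ExecutorRouteStrategyEnum {
    FIRST(new ExecutorRouteFirst()),
    LAST(new ExecutorRouteLast()),
    ROUND(new ExecutorRouteRound()),
    RANDOM(new ExecutorRouteRandom()),
    CONSISTENT_HASH(new ExecutorRouteConsistentHash()),
    LEAST_FREQUENTLY_USED(new ExecutorRouteLFU()),
    LEAST_RECENTLY_USED(new ExecutorRouteLRU()),
    FAILOVER(new ExecutorRouteFailover()),
    BUSYOVER(new ExecutorRouteBusyover()),
    SHARDING_BROADCAST(null);               // broadcast is handled before routing, so it carries no router

    private final ExecutorRouter router;

    ExecutorRouteStrategyEnum(ExecutorRouter router) { this.router = router; }

    public ExecutorRouter getRouter() { return router; }

    // falls back to the given default when the configured name does not match any constant
    public static ExecutorRouteStrategyEnum match(String name, ExecutorRouteStrategyEnum defaultItem) {
        if (name != null) {
            for (ExecutorRouteStrategyEnum item : values()) {
                if (item.name().equals(name)) {
                    return item;
                }
            }
        }
        return defaultItem;
    }
}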
So how is each route strategy implemented? Let's open them one by one:
- First: simply take the first address in the executor cluster address list
/**
 * Created by xuxueli on 17/3/10.
 */
public class ExecutorRouteFirst extends ExecutorRouter {
    @Override
    public ReturnT<String> route(TriggerParam triggerParam, List<String> addressList) {
        // take the first address in the executor cluster address list
        return new ReturnT<String>(addressList.get(0));
    }
}
- Last: simply take the last address in the executor address list
/**
 * Created by xuxueli on 17/3/10.
 */
public class ExecutorRouteLast extends ExecutorRouter {
    @Override
    public ReturnT<String> route(TriggerParam triggerParam, List<String> addressList) {
        // take the last address in the executor address list
        return new ReturnT<String>(addressList.get(addressList.size() - 1));
    }
}
- Failover: walk through the cluster address list, heartbeating each machine to see whether it is healthy; if a heartbeat fails, move on to the next machine; on the first success, stop and return that address
/**
 * Created by xuxueli on 17/3/10.
 */
public class ExecutorRouteFailover extends ExecutorRouter {
    @Override
    public ReturnT<String> route(TriggerParam triggerParam, List<String> addressList) {
        StringBuffer beatResultSB = new StringBuffer();
        // walk through the cluster addresses
        for (String address : addressList) {
            // beat
            ReturnT<String> beatResult = null;
            try {
                // send a beat request to the executor to check whether the machine is working properly
                ExecutorBiz executorBiz = XxlJobDynamicScheduler.getExecutorBiz(address);
                beatResult = executorBiz.beat();
            } catch (Exception e) {
                logger.error(e.getMessage(), e);
                beatResult = new ReturnT<String>(ReturnT.FAIL_CODE, "" + e);
            }
            beatResultSB.append((beatResultSB.length() > 0) ? "<br><br>" : "")
                    .append(I18nUtil.getString("jobconf_beat") + ":")
                    .append("<br>address:").append(address)
                    .append("<br>code:").append(beatResult.getCode())
                    .append("<br>msg:").append(beatResult.getMsg());
            // beat success
            // if the heartbeat succeeds, use this address
            if (beatResult.getCode() == ReturnT.SUCCESS_CODE) {
                beatResult.setMsg(beatResultSB.toString());
                beatResult.setContent(address);
                return beatResult;
            }
        }
        return new ReturnT<String>(ReturnT.FAIL_CODE, beatResultSB.toString());
    }
}
- Busyover: same idea as failover; the difference is that failover checks whether the machine is alive, whereas busyover asks the executor whether the thread for this job is currently busy (executing)
/**
 * Created by xuxueli on 17/3/10.
 */
public class ExecutorRouteBusyover extends ExecutorRouter {
    @Override
    public ReturnT<String> route(TriggerParam triggerParam, List<String> addressList) {
        StringBuffer idleBeatResultSB = new StringBuffer();
        // walk through the cluster addresses
        for (String address : addressList) {
            // beat
            ReturnT<String> idleBeatResult = null;
            try {
                // ask the executor whether the thread for this jobId is busy (currently executing)
                ExecutorBiz executorBiz = XxlJobDynamicScheduler.getExecutorBiz(address);
                idleBeatResult = executorBiz.idleBeat(triggerParam.getJobId());
            } catch (Exception e) {
                logger.error(e.getMessage(), e);
                idleBeatResult = new ReturnT<String>(ReturnT.FAIL_CODE, "" + e);
            }
            idleBeatResultSB.append((idleBeatResultSB.length() > 0) ? "<br><br>" : "")
                    .append(I18nUtil.getString("jobconf_idleBeat") + ":")
                    .append("<br>address:").append(address)
                    .append("<br>code:").append(idleBeatResult.getCode())
                    .append("<br>msg:").append(idleBeatResult.getMsg());
            // beat success
            // if the idle-beat succeeds, use this address
            if (idleBeatResult.getCode() == ReturnT.SUCCESS_CODE) {
                idleBeatResult.setMsg(idleBeatResultSB.toString());
                idleBeatResult.setContent(address);
                return idleBeatResult;
            }
        }
        return new ReturnT<String>(ReturnT.FAIL_CODE, idleBeatResultSB.toString());
    }
}
- Round robin
/**
 * Created by xuxueli on 17/3/10.
 */
public class ExecutorRouteRound extends ExecutorRouter {

    // cache map: jobId -> counter
    private static ConcurrentHashMap<Integer, Integer> routeCountEachJob = new ConcurrentHashMap<Integer, Integer>();
    // cache expiry timestamp
    private static long CACHE_VALID_TIME = 0;

    private static int count(int jobId) {
        // cache clear
        if (System.currentTimeMillis() > CACHE_VALID_TIME) {
            // the cache has expired: clear the map
            routeCountEachJob.clear();
            // the cache is valid for one day
            CACHE_VALID_TIME = System.currentTimeMillis() + 1000 * 60 * 60 * 24;
        }
        // count++
        Integer count = routeCountEachJob.get(jobId);
        // the first time this strategy runs for a job, the map has no entry yet, so count == null
        // when count == null or count exceeds one million, a random number below 100 is generated, stored in the map and returned
        // on later calls, count != null and is below one million, so it is incremented and returned
        // why start from a random value instead of the first machine?
        // because if every job started from the first machine, all first triggers would hit executor #1, putting heavy pressure on it right after startup
        count = (count == null || count > 1000000) ? (new Random().nextInt(100)) : ++count;  // random once at init to spread the initial load
        routeCountEachJob.put(jobId, count);
        return count;
    }

    @Override
    public ReturnT<String> route(TriggerParam triggerParam, List<String> addressList) {
        // take count(jobId) modulo the address list size to pick the address
        // e.g. count = 2, addressList.size() = 3
        // 2 % 3 = 2, so the address at index 2 is used
        String address = addressList.get(count(triggerParam.getJobId()) % addressList.size());
        return new ReturnT<String>(address);
    }
}
- Random
/**
 * Created by xuxueli on 17/3/10.
 */
public class ExecutorRouteRandom extends ExecutorRouter {

    private static Random localRandom = new Random();

    @Override
    public ReturnT<String> route(TriggerParam triggerParam, List<String> addressList) {
        // pick a random machine within the bounds of the cluster address list
        String address = addressList.get(localRandom.nextInt(addressList.size()));
        return new ReturnT<String>(address);
    }
}
- Least frequently used (LFU)
/**
 * For each job, the executor that has been used least frequently is elected first.
 * a(*)、LFU (Least Frequently Used): lowest frequency / usage count
 * b、LRU (Least Recently Used): longest time since last use
 * <p>
 * Created by xuxueli on 17/3/10.
 */
public class ExecutorRouteLFU extends ExecutorRouter {

    // static cache map: jobId -> (executor address -> usage count)
    private static ConcurrentHashMap<Integer, HashMap<String, Integer>> jobLfuMap = new ConcurrentHashMap<Integer, HashMap<String, Integer>>();
    // cache expiry timestamp
    private static long CACHE_VALID_TIME = 0;

    public String route(int jobId, List<String> addressList) {
        // cache clear
        if (System.currentTimeMillis() > CACHE_VALID_TIME) {
            jobLfuMap.clear();
            // reset the cache expiry time; the cache is valid for one day
            CACHE_VALID_TIME = System.currentTimeMillis() + 1000 * 60 * 60 * 24;
        }

        // lfu item init
        HashMap<String, Integer> lfuItemMap = jobLfuMap.get(jobId);   // keys could be sorted with a TreeMap plus a comparator; sorting by value currently has to go through an ArrayList
        if (lfuItemMap == null) {
            lfuItemMap = new HashMap<String, Integer>();
            jobLfuMap.putIfAbsent(jobId, lfuItemMap);   // avoid overwriting an existing entry
        }

        // put new
        // handle newly added executor addresses
        for (String address : addressList) {
            // when the map does not contain the address, or its count exceeds one million, the count is re-initialized
            // initialization picks a random number within the size of the address list
            // when a new machine joins after the system has been running for a while, its initial count is small, so it takes more load at first and things balance out over time
            if (!lfuItemMap.containsKey(address) || lfuItemMap.get(address) > 1000000) {
                lfuItemMap.put(address, new Random().nextInt(addressList.size()));  // random once at init to spread the initial load
            }
        }

        // remove old
        // drop addresses that are no longer in the cluster
        List<String> delKeys = new ArrayList<>();
        for (String existKey : lfuItemMap.keySet()) {
            if (!addressList.contains(existKey)) {
                delKeys.add(existKey);
            }
        }
        if (delKeys.size() > 0) {
            for (String delKey : delKeys) {
                lfuItemMap.remove(delKey);
            }
        }

        // load least used count address
        // sort the executors by usage count, smallest first
        List<Map.Entry<String, Integer>> lfuItemList = new ArrayList<Map.Entry<String, Integer>>(lfuItemMap.entrySet());
        Collections.sort(lfuItemList, new Comparator<Map.Entry<String, Integer>>() {
            @Override
            public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
                return o1.getValue().compareTo(o2.getValue());
            }
        });

        // take the first, i.e. smallest, entry; return its address and increase its count by 1
        Map.Entry<String, Integer> addressItem = lfuItemList.get(0);
        String minAddress = addressItem.getKey();
        addressItem.setValue(addressItem.getValue() + 1);

        return addressItem.getKey();
    }

    @Override
    public ReturnT<String> route(TriggerParam triggerParam, List<String> addressList) {
        String address = route(triggerParam.getJobId(), addressList);
        return new ReturnT<String>(address);
    }
}
- Least recently used (LRU)
/**
 * For each job, the executor that has not been used for the longest time is elected first.
 * a、LFU (Least Frequently Used): lowest frequency / usage count
 * b(*)、LRU (Least Recently Used): longest time since last use
 * <p>
 * Created by xuxueli on 17/3/10.
 */
public class ExecutorRouteLRU extends ExecutorRouter {

    /**
     * LRU is implemented with a LinkedHashMap
     */
    // static map: jobId -> access-ordered address map
    private static ConcurrentHashMap<Integer, LinkedHashMap<String, String>> jobLRUMap = new ConcurrentHashMap<Integer, LinkedHashMap<String, String>>();
    // cache expiry timestamp
    private static long CACHE_VALID_TIME = 0;

    public String route(int jobId, List<String> addressList) {
        // cache clear
        if (System.currentTimeMillis() > CACHE_VALID_TIME) {
            jobLRUMap.clear();
            CACHE_VALID_TIME = System.currentTimeMillis() + 1000 * 60 * 60 * 24;
        }

        // init lru
        LinkedHashMap<String, String> lruItem = jobLRUMap.get(jobId);
        if (lruItem == null) {
            /**
             * LinkedHashMap
             * a、accessOrder: true = order by access (reordered on get/put); false = order by insertion;
             * b、removeEldestEntry: called when a new element is added; returning true removes the eldest element.
             *    Wrapping LinkedHashMap and overriding it (e.g. with a max capacity) gives a fixed-size LRU.
             */
            lruItem = new LinkedHashMap<String, String>(16, 0.75f, true);
            jobLRUMap.putIfAbsent(jobId, lruItem);
        }

        // put new
        // handle newly added executor addresses
        for (String address : addressList) {
            if (!lruItem.containsKey(address)) {
                lruItem.put(address, address);
            }
        }

        // remove old
        // drop addresses that are no longer in the cluster
        List<String> delKeys = new ArrayList<>();
        for (String existKey : lruItem.keySet()) {
            if (!addressList.contains(existKey)) {
                delKeys.add(existKey);
            }
        }
        if (delKeys.size() > 0) {
            for (String delKey : delKeys) {
                lruItem.remove(delKey);
            }
        }

        // load
        // take the head entry, i.e. the address that was used longest ago
        String eldestKey = lruItem.entrySet().iterator().next().getKey();
        String eldestValue = lruItem.get(eldestKey);
        return eldestValue;
    }

    @Override
    public ReturnT<String> route(TriggerParam triggerParam, List<String> addressList) {
        String address = route(triggerParam.getJobId(), addressList);
        return new ReturnT<String>(address);
    }
}
- Consistent hash
/**
 * Within a group the machine addresses are the same; different jobs are hashed evenly across the machines,
 * so jobs are spread evenly over the group, and each job is always scheduled to the same machine.
 * a、virtual node: solves the imbalance problem
 * b、hash method replace hashCode: String.hashCode may collide, so the hash value range is widened
 * Created by xuxueli on 17/3/10.
 */
public class ExecutorRouteConsistentHash extends ExecutorRouter {

    /**
     * Build a ring of 2^32 integers (the consistent-hash ring), place the server nodes on the ring by the
     * hash of their names (range [0, 2^32-1]), then hash the data key (same range) and walk the ring
     * clockwise to the nearest server node, which maps the key to a server.
     */
    private static int VIRTUAL_NODE_NUM = 5;

    /**
     * get hash code on 2^32 ring (hash computed from an MD5 digest)
     *
     * @param key
     * @return
     */
    private static long hash(String key) {
        // md5 byte
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException("MD5 not supported", e);
        }
        md5.reset();
        byte[] keyBytes = null;
        try {
            keyBytes = key.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException("Unknown string :" + key, e);
        }
        md5.update(keyBytes);
        byte[] digest = md5.digest();

        // hash code, Truncate to 32-bits
        long hashCode = ((long) (digest[3] & 0xFF) << 24)
                | ((long) (digest[2] & 0xFF) << 16)
                | ((long) (digest[1] & 0xFF) << 8)
                | (digest[0] & 0xFF);
        long truncateHashCode = hashCode & 0xffffffffL;
        return truncateHashCode;
    }

    public String hashJob(int jobId, List<String> addressList) {
        // ------A1------A2-------A3------
        // -----------J1------------------
        TreeMap<Long, String> addressRing = new TreeMap<Long, String>();
        for (String address : addressList) {
            for (int i = 0; i < VIRTUAL_NODE_NUM; i++) {
                // hash each (virtual) server node with the custom hash method and place it on the ring (TreeMap)
                long addressHash = hash("SHARD-" + address + "-NODE-" + i);
                addressRing.put(addressHash, address);
            }
        }

        // hash the jobId
        long jobHash = hash(String.valueOf(jobId));
        // tailMap returns the part of the ring whose keys are >= jobHash
        SortedMap<Long, String> lastRing = addressRing.tailMap(jobHash);
        // if there are nodes beyond jobHash, take the first of them
        if (!lastRing.isEmpty()) {
            return lastRing.get(lastRing.firstKey());
        }
        // otherwise wrap around and take the first node of the ring;
        // either way the effect is: walk the ring clockwise and take the node closest to jobHash
        return addressRing.firstEntry().getValue();
    }

    @Override
    public ReturnT<String> route(TriggerParam triggerParam, List<String> addressList) {
        String address = hashJob(triggerParam.getJobId(), addressList);
        return new ReturnT<String>(address);
    }
}
With the route strategies sorted out, let's look at the other key point: triggerResult = runExecutor(triggerParam, address).
Stepping into this function:
/**
* run executor
*
* @param triggerParam
* @param address
* @return
*/
public static ReturnT<String> runExecutor(TriggerParam triggerParam, String address) {
ReturnT<String> runResult = null;
try {
//key point: get the ExecutorBiz proxy, first from the cache (an in-memory map); if it is not there, create an XxlRpcReferenceBean and put its proxy into the cache map
ExecutorBiz executorBiz = XxlJobDynamicScheduler.getExecutorBiz(address);
// this run method is never executed locally; it only exists to trigger the proxy's invoke method and to carry the target type to the server side, because invoke does not call the target method on a local object
runResult = executorBiz.run(triggerParam);
} catch (Exception e) {
logger.error(">>>>>>>>>>> xxl-job trigger error, please check if the executor[{}] is running.", address, e);
runResult = new ReturnT<String>(ReturnT.FAIL_CODE, ThrowableUtil.toString(e));
}
StringBuffer runResultSB = new StringBuffer(I18nUtil.getString("jobconf_trigger_run") + ":");
runResultSB.append("<br>address:").append(address);
runResultSB.append("<br>code:").append(runResult.getCode());
runResultSB.append("<br>msg:").append(runResult.getMsg());
runResult.setMsg(runResultSB.toString());
return runResult;
}
Continue into XxlJobDynamicScheduler.getExecutorBiz(address):
// ---------------------- executor-client ----------------------
private static ConcurrentHashMap<String, ExecutorBiz> executorBizRepository = new ConcurrentHashMap<String, ExecutorBiz>();
public static ExecutorBiz getExecutorBiz(String address) throws Exception {
// valid
if (address == null || address.trim().length() == 0) {
return null;
}
// load-cache
address = address.trim();
ExecutorBiz executorBiz = executorBizRepository.get(address);
if (executorBiz != null) {
return executorBiz;
}
// set-cache
// create the ExecutorBiz proxy; the key part is the getObject method
executorBiz = (ExecutorBiz) new XxlRpcReferenceBean(
// in v2.0.2 the transport was switched from "JETTY" to "NETTY_HTTP": the executor embeds a netty-http server, while the scheduling center reuses its servlet container port
NetEnum.NETTY_HTTP,
Serializer.SerializeEnum.HESSIAN.getSerializer(),
CallType.SYNC,
LoadBalance.ROUND,
ExecutorBiz.class,
null,
5000,
address,
XxlJobAdminConfig.getAdminConfig().getAccessToken(),
null,
null).getObject();//step into getObject to see what really happens
executorBizRepository.put(address, executorBiz);
return executorBiz;
}
Following getObject leads into the xxl-rpc-core jar: this method creates a dynamic proxy whose invoke builds an XxlRpcRequest and finally makes the RPC call that asks the executor to run the job.
public Object getObject() {
return Proxy.newProxyInstance(Thread.currentThread().getContextClassLoader(), new Class[]{this.iface}, new InvocationHandler() {
public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
String className = method.getDeclaringClass().getName();
String varsion_ = XxlRpcReferenceBean.this.version;
String methodName = method.getName();
Class<?>[] parameterTypes = method.getParameterTypes();
Object[] parameters = args;
if (className.equals(XxlRpcGenericService.class.getName()) && methodName.equals("invoke")) {
Class<?>[] paramTypes = null;
if (args[3] != null) {
String[] paramTypes_str = (String[])((String[])args[3]);
if (paramTypes_str.length > 0) {
paramTypes = new Class[paramTypes_str.length];
for(int i = 0; i < paramTypes_str.length; ++i) {
paramTypes[i] = ClassUtil.resolveClass(paramTypes_str[i]);
}
}
}
className = (String)args[0];
varsion_ = (String)args[1];
methodName = (String)args[2];
parameterTypes = paramTypes;
parameters = (Object[])((Object[])args[4]);
}
if (className.equals(Object.class.getName())) {
XxlRpcReferenceBean.logger.info(">>>>>>>>>>> xxl-rpc proxy class-method not support [{}#{}]", className, methodName);
throw new XxlRpcException("xxl-rpc proxy class-method not support");
} else {
String finalAddress = XxlRpcReferenceBean.this.address;
if ((finalAddress == null || finalAddress.trim().length() == 0) && XxlRpcReferenceBean.this.invokerFactory != null && XxlRpcReferenceBean.this.invokerFactory.getServiceRegistry() != null) {
String serviceKey = XxlRpcProviderFactory.makeServiceKey(className, varsion_);
TreeSet<String> addressSet = XxlRpcReferenceBean.this.invokerFactory.getServiceRegistry().discovery(serviceKey);
if (addressSet != null && addressSet.size() != 0) {
if (addressSet.size() == 1) {
finalAddress = (String)addressSet.first();
} else {
finalAddress = XxlRpcReferenceBean.this.loadBalance.xxlRpcInvokerRouter.route(serviceKey, addressSet);
}
}
}
if (finalAddress != null && finalAddress.trim().length() != 0) {
XxlRpcRequest xxlRpcRequest = new XxlRpcRequest();
xxlRpcRequest.setRequestId(UUID.randomUUID().toString());
xxlRpcRequest.setCreateMillisTime(System.currentTimeMillis());
xxlRpcRequest.setAccessToken(XxlRpcReferenceBean.this.accessToken);
xxlRpcRequest.setClassName(className);
xxlRpcRequest.setMethodName(methodName);
xxlRpcRequest.setParameterTypes(parameterTypes);
xxlRpcRequest.setParameters(parameters);
XxlRpcFutureResponse futureResponse;
if (CallType.SYNC == XxlRpcReferenceBean.this.callType) {
futureResponse = new XxlRpcFutureResponse(XxlRpcReferenceBean.this.invokerFactory, xxlRpcRequest, (XxlRpcInvokeCallback)null);
Object var31;
try {
XxlRpcReferenceBean.this.client.asyncSend(finalAddress, xxlRpcRequest);
XxlRpcResponse xxlRpcResponse = futureResponse.get(XxlRpcReferenceBean.this.timeout, TimeUnit.MILLISECONDS);
if (xxlRpcResponse.getErrorMsg() != null) {
throw new XxlRpcException(xxlRpcResponse.getErrorMsg());
}
var31 = xxlRpcResponse.getResult();
} catch (Exception var21) {
XxlRpcReferenceBean.logger.info(">>>>>>>>>>> xxl-rpc, invoke error, address:{}, XxlRpcRequest{}", finalAddress, xxlRpcRequest);
throw (Throwable)(var21 instanceof XxlRpcException ? var21 : new XxlRpcException(var21));
} finally {
futureResponse.removeInvokerFuture();
}
return var31;
} else if (CallType.FUTURE == XxlRpcReferenceBean.this.callType) {
futureResponse = new XxlRpcFutureResponse(XxlRpcReferenceBean.this.invokerFactory, xxlRpcRequest, (XxlRpcInvokeCallback)null);
try {
XxlRpcInvokeFuture invokeFuture = new XxlRpcInvokeFuture(futureResponse);
XxlRpcInvokeFuture.setFuture(invokeFuture);
XxlRpcReferenceBean.this.client.asyncSend(finalAddress, xxlRpcRequest);
return null;
} catch (Exception var19) {
XxlRpcReferenceBean.logger.info(">>>>>>>>>>> xxl-rpc, invoke error, address:{}, XxlRpcRequest{}", finalAddress, xxlRpcRequest);
futureResponse.removeInvokerFuture();
throw (Throwable)(var19 instanceof XxlRpcException ? var19 : new XxlRpcException(var19));
}
} else if (CallType.CALLBACK == XxlRpcReferenceBean.this.callType) {
XxlRpcInvokeCallback finalInvokeCallback = XxlRpcReferenceBean.this.invokeCallback;
XxlRpcInvokeCallback threadInvokeCallback = XxlRpcInvokeCallback.getCallback();
if (threadInvokeCallback != null) {
finalInvokeCallback = threadInvokeCallback;
}
if (finalInvokeCallback == null) {
throw new XxlRpcException("xxl-rpc XxlRpcInvokeCallback(CallType=" + CallType.CALLBACK.name() + ") cannot be null.");
} else {
XxlRpcFutureResponse futureResponsex = new XxlRpcFutureResponse(XxlRpcReferenceBean.this.invokerFactory, xxlRpcRequest, finalInvokeCallback);
try {
XxlRpcReferenceBean.this.client.asyncSend(finalAddress, xxlRpcRequest);
return null;
} catch (Exception var20) {
XxlRpcReferenceBean.logger.info(">>>>>>>>>>> xxl-rpc, invoke error, address:{}, XxlRpcRequest{}", finalAddress, xxlRpcRequest);
futureResponsex.removeInvokerFuture();
throw (Throwable)(var20 instanceof XxlRpcException ? var20 : new XxlRpcException(var20));
}
}
} else if (CallType.ONEWAY == XxlRpcReferenceBean.this.callType) {
XxlRpcReferenceBean.this.client.asyncSend(finalAddress, xxlRpcRequest);
return null;
} else {
throw new XxlRpcException("xxl-rpc callType[" + XxlRpcReferenceBean.this.callType + "] invalid");
}
} else {
throw new XxlRpcException("xxl-rpc reference bean[" + className + "] address empty");
}
}
}
});
}
That completes the scheduling center's dispatch of a job to an executor.
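The proxy trick commented in runExecutor is worth isolating: the interface method body never runs locally; the InvocationHandler only captures the class, method and arguments into a request object and ships it off. A tiny self-contained demo of the pattern (plain JDK dynamic proxy, not xxl-rpc itself; all names here are made up):
import java.lang.reflect.Proxy;

public class RpcProxyDemo {

    interface RemoteService { String run(String param); }      // stands in for ExecutorBiz

    static class RpcRequest {                                  // stands in for XxlRpcRequest
        String className;
        String methodName;
        Object[] args;
    }

    // pretend transport: xxl-rpc would serialize the request and send it over netty-http
    static Object sendAndWait(RpcRequest req) {
        return "handled remotely: " + req.className + "#" + req.methodName;
    }

    public static void main(String[] args) {
        RemoteService stub = (RemoteService) Proxy.newProxyInstance(
                RemoteService.class.getClassLoader(),
                new Class[]{RemoteService.class},
                (proxy, method, methodArgs) -> {
                    RpcRequest req = new RpcRequest();          // the method body never runs locally
                    req.className = method.getDeclaringClass().getName();
                    req.methodName = method.getName();
                    req.args = methodArgs;
                    return sendAndWait(req);
                });
        System.out.println(stub.run("triggerParam"));           // "run" is only a carrier for the request
    }
}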
Of course, the analysis above entered the trigger path from the fail-monitor daemon thread. What does the normal trigger flow look like? For that we need to look at this class:
RemoteHttpJobBean
Every job trigger in xxl-job ultimately goes through this class.
Its inheritance chain is: RemoteHttpJobBean > QuartzJobBean > Job.
When Quartz decides a trigger is due, it calls JobRunShell's run method, which in turn calls the execute method of the job's JOB_CLASS,
and that call chain eventually lands in RemoteHttpJobBean's executeInternal().
executeInternal(...) also ends up calling JobTriggerPoolHelper.trigger(...), which is exactly the flow we analysed above (the Tag marked earlier).
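For completeness, executeInternal in 2.0.x roughly looks like the following (paraphrased rather than copied verbatim; Quartz stores the xxl-job id as the JobKey name, so parsing the key is all it takes):
import com.xxl.job.admin.core.thread.JobTriggerPoolHelper;
import com.xxl.job.admin.core.trigger.TriggerTypeEnum;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.JobKey;
import org.springframework.scheduling.quartz.QuartzJobBean;

public class RemoteHttpJobBean extends QuartzJobBean {

    @Override
    protected void executeInternal(JobExecutionContext context) throws JobExecutionException {
        // the xxl-job id was stored as the Quartz JobKey name when the job was created
        JobKey jobKey = context.getTrigger().getJobKey();
        Integer jobId = Integer.valueOf(jobKey.getName());

        // hand off to the trigger pool: the same entry point as the retry path analysed earlier
        JobTriggerPoolHelper.trigger(jobId, TriggerTypeEnum.CRON, -1, null, null);
    }
}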
Executor registration at the scheduling center
After an executor project starts up successfully, it periodically registers itself with the scheduling center. A registration request shows up in the log like this:
2019-05-29 22:17:06.559 INFO 610 --- [ Thread-21] c.x.r.r.i.reference.XxlRpcReferenceBean : >>>>>>>>>>> xxl-job, invoke error, address:http://127.0.0.1:8080/xxl-job-admin/api, XxlRpcRequestXxlRpcRequest{requestId='44e64cca-8d1c-4661-b108-b0e5391b076f', createMillisTime=1559139426245, accessToken='', className='com.xxl.job.core.biz.AdminBiz', methodName='registry', parameterTypes=[class com.xxl.job.core.biz.model.RegistryParam], parameters=[RegistryParam{registGroup='EXECUTOR', registryKey='xxl-job-executor-athena', registryValue='192.168.0.107:9999'}], version='null'}
Broken out, the request information is:
address:http://127.0.0.1:8080/xxl-job-admin/api,
XxlRpcRequest{
requestId='44e64cca-8d1c-4661-b108-b0e5391b076f',
createMillisTime=1559139426245,
accessToken='',
className='com.xxl.job.core.biz.AdminBiz',
methodName='registry',
parameterTypes=[class com.xxl.job.core.biz.model.RegistryParam],
parameters=[
RegistryParam{registGroup='EXECUTOR',
registryKey='xxl-job-executor-athena',
registryValue='192.168.0.107:9999'}
],
version='null'
}
The API on the scheduling center side that receives registration requests is JobApiController:
@Controller
public class JobApiController implements InitializingBean {
@Override
public void afterPropertiesSet() throws Exception {
}
@RequestMapping(AdminBiz.MAPPING)
@PermessionLimit(limit = false)
public void api(HttpServletRequest request, HttpServletResponse response) throws IOException, ServletException {
XxlJobDynamicScheduler.invokeAdminService(request, response);
}
}
Step into XxlJobDynamicScheduler.invokeAdminService(request, response):
public static void invokeAdminService(HttpServletRequest request, HttpServletResponse response) throws IOException, ServletException {
//handle the registration request
servletServerHandler.handle(null, request, response);
}
Continue into servletServerHandler.handle(null, request, response):
public void handle(String target, HttpServletRequest request, HttpServletResponse response) throws IOException, ServletException {
StringBuffer stringBuffer;
if (!"/services".equals(target)) {
stringBuffer = null;
XxlRpcRequest xxlRpcRequest;
try {
//parse the request into an XxlRpcRequest
xxlRpcRequest = this.parseRequest(request);
} catch (Exception var7) {
this.writeResponse(response, ThrowableUtil.toString(var7).getBytes());
return;
}
//process the XxlRpcRequest
XxlRpcResponse xxlRpcResponse = this.xxlRpcProviderFactory.invokeService(xxlRpcRequest);
byte[] responseBytes = this.xxlRpcProviderFactory.getSerializer().serialize(xxlRpcResponse);
this.writeResponse(response, responseBytes);
} else {
stringBuffer = new StringBuffer("<ui>");
Iterator i$ = this.xxlRpcProviderFactory.getServiceData().keySet().iterator();
while(i$.hasNext()) {
String serviceKey = (String)i$.next();
stringBuffer.append("<li>").append(serviceKey).append(": ").append(this.xxlRpcProviderFactory.getServiceData().get(serviceKey)).append("</li>");
}
stringBuffer.append("</ui>");
this.writeResponse(response, stringBuffer.toString().getBytes());
}
}
Note that xxlRpcProviderFactory already had its serviceData populated: the initRpcProvider() function called from XxlJobDynamicScheduler's start() method (covered earlier) registered AdminBiz via xxlRpcProviderFactory.addService.
Step into this.xxlRpcProviderFactory.invokeService(xxlRpcRequest) to see what it does:
public XxlRpcResponse invokeService(XxlRpcRequest xxlRpcRequest) {
XxlRpcResponse xxlRpcResponse = new XxlRpcResponse();
xxlRpcResponse.setRequestId(xxlRpcRequest.getRequestId());
//build the service key from the request's className and version
String serviceKey = makeServiceKey(xxlRpcRequest.getClassName(), xxlRpcRequest.getVersion());
//look up the service bean; as we saw earlier, serviceData was initialized in the initRpcProvider method of com.xxl.job.admin.core.schedule.XxlJobDynamicScheduler
Object serviceBean = this.serviceData.get(serviceKey);
if (serviceBean == null) {
//no service bean found
xxlRpcResponse.setErrorMsg("The serviceKey[" + serviceKey + "] not found.");
return xxlRpcResponse;
} else if (System.currentTimeMillis() - xxlRpcRequest.getCreateMillisTime() > 180000L) {
//every registration request carries the executor's timestamp; if the current time is more than 180s past the request's timestamp, the request is rejected as expired
xxlRpcResponse.setErrorMsg("The timestamp difference between admin and executor exceeds the limit.");
return xxlRpcResponse;
} else if (this.accessToken != null && this.accessToken.trim().length() > 0 && !this.accessToken.trim().equals(xxlRpcRequest.getAccessToken())) {
//if an accessToken is configured, verify it; if it does not match, return immediately
xxlRpcResponse.setErrorMsg("The access token[" + xxlRpcRequest.getAccessToken() + "] is wrong.");
return xxlRpcResponse;
} else {
try {
Class<?> serviceClass = serviceBean.getClass();
String methodName = xxlRpcRequest.getMethodName();
Class<?>[] parameterTypes = xxlRpcRequest.getParameterTypes();
Object[] parameters = xxlRpcRequest.getParameters();
Method method = serviceClass.getMethod(methodName, parameterTypes);
method.setAccessible(true);
//obtain the service bean and method via reflection and invoke the method
//from the request parameters we know serviceClass is com.xxl.job.core.biz.AdminBiz and the method to invoke is registry
Object result = method.invoke(serviceBean, parameters);
xxlRpcResponse.setResult(result);
} catch (Throwable var11) {
logger.error("xxl-rpc provider invokeService error.", var11);
xxlRpcResponse.setErrorMsg(ThrowableUtil.toString(var11));
}
return xxlRpcResponse;
}
}
So what does AdminBiz's registry method do? AdminBizImpl is the concrete implementation of AdminBiz:
@Override
public ReturnT<String> registry(RegistryParam registryParam) {
int ret = xxlJobRegistryDao.registryUpdate(registryParam.getRegistGroup(), registryParam.getRegistryKey(), registryParam.getRegistryValue());
if (ret < 1) {
xxlJobRegistryDao.registrySave(registryParam.getRegistGroup(), registryParam.getRegistryKey(), registryParam.getRegistryValue());
}
return ReturnT.SUCCESS;
}
So a registration request, after being processed, ends up in the XXL_JOB_QRTZ_TRIGGER_REGISTRY table: if a matching record already exists, its updateTime is refreshed; otherwise this is the executor's first registration and a new record is inserted.
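The mapper SQL itself is not shown here, but the "update first, insert if nothing was updated" semantics are roughly equivalent to the following sketch (a hypothetical JdbcTemplate stand-in for the MyBatis mapper; column names follow the table definition):
import org.springframework.jdbc.core.JdbcTemplate;

public class RegistryUpsertSketch {

    private final JdbcTemplate jdbcTemplate;

    public RegistryUpsertSketch(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public void registry(String group, String key, String value) {
        // registryUpdate: refresh update_time if this (group, key, value) is already registered
        int updated = jdbcTemplate.update(
                "UPDATE XXL_JOB_QRTZ_TRIGGER_REGISTRY SET update_time = NOW() "
              + "WHERE registry_group = ? AND registry_key = ? AND registry_value = ?",
                group, key, value);
        if (updated < 1) {
            // registrySave: first registration of this executor address
            jdbcTemplate.update(
                    "INSERT INTO XXL_JOB_QRTZ_TRIGGER_REGISTRY(registry_group, registry_key, registry_value, update_time) "
                  + "VALUES(?, ?, ?, NOW())",
                    group, key, value);
        }
    }
}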
And that is the core scheduling logic of the XXL-JOB scheduling center.