Dynamic configuration support
This feature is disabled by default. SkyWalking currently supports two kinds of dynamic configuration: Single and Group.
- Single: {configKey}:{configValue}
- Group
{configKey}: |{subItemkey1}:{subItemValue1}
|{subItemkey2}:{subItemValue2}
|{subItemkey3}:{subItemValue3}
...
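The configurations supported in Single mode include alarm-settings. To override the contents of alarm-settings.yml, use Single mode with alarm.default.alarm-settings as the key; for the content format, see alarm-settings.yml.
For Nacos, the provider is NacosConfigurationProvider and the core class is NacosConfigWatcherRegister. The corresponding dataId is generated by the following rule: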
// org.apache.skywalking.oap.server.configuration.api.ConfigWatcherRegister
@Getter
protected class WatcherHolder {
    private ConfigChangeWatcher watcher;
    private final String key;
    public WatcherHolder(ConfigChangeWatcher watcher) {
        this.watcher = watcher;
        // The key name is generated here: module.provider.itemName
        this.key = String.join(
            ".", watcher.getModule(), watcher.getProvider().name(),
            watcher.getItemName()
        );
    }
}
For AlarmRulesWatcher, the registered key is alarm.default.alarm-settings:
// org.apache.skywalking.oap.server.core.alarm.provider.AlarmRulesWatcher
// moduleName: alarm, provider: default, itemName: alarm-settings
super(AlarmModule.NAME, provider, "alarm-settings");
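The naming rule can be tried in isolation. The following is a minimal sketch, not SkyWalking code: WatcherKey and its of method are hypothetical names, and the three arguments are the values quoted in the comment above.

```java
// Hypothetical stand-alone helper; it only mirrors the String.join call in WatcherHolder.
final class WatcherKey {
    private WatcherKey() {
    }

    /** Joins module, provider and item name with dots, in that order. */
    static String of(String module, String provider, String itemName) {
        return String.join(".", module, provider, itemName);
    }

    public static void main(String[] args) {
        // moduleName: alarm, provider: default, itemName: alarm-settings
        System.out.println(of("alarm", "default", "alarm-settings")); // prints alarm.default.alarm-settings
    }
}
```

The resulting string is the key that has to exist in the configuration center for the watcher to pick up changes.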
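The other watchers registered for dynamic configuration can be listed by setting a breakpoint in NacosConfigWatcherRegister#readConfig.
Configuration
- Note the difference between the cluster module and the configuration module: both have a nacos section, but they serve completely different purposes.
Cluster management configuration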
nacos:
  serviceName: ${SW_SERVICE_NAME:"SkyWalking_OAP_Cluster"}
  hostPort: ${SW_CLUSTER_NACOS_HOST_PORT:nacos.soa.dev.test.com:8848}
  # Nacos Configuration namespace; enter the namespace ID here, not its name
  namespace: ${SW_CLUSTER_NACOS_NAMESPACE:"xxxxxx"}
  # Nacos auth username
  username: ${SW_CLUSTER_NACOS_USERNAME:"nacos"}
  password: ${SW_CLUSTER_NACOS_PASSWORD:"nacos"}
  # Nacos auth accessKey
  accessKey: ${SW_CLUSTER_NACOS_ACCESSKEY:""}
  secretKey: ${SW_CLUSTER_NACOS_SECRETKEY:""}
Once this is configured, the registration can be seen in the Nacos service list.
Configuration center configuration
nacos:
  # Nacos Server Host
  serverAddr: ${SW_CONFIG_NACOS_SERVER_ADDR:nacos.soa.dev.abc.com}
  # Nacos Server Port
  port: ${SW_CONFIG_NACOS_SERVER_PORT:8848}
  # Nacos Configuration Group
  group: ${SW_CONFIG_NACOS_SERVER_GROUP:DEFAULT_GROUP}
  # Nacos Configuration namespace; enter the namespace ID here, not its name
  namespace: ${SW_CONFIG_NACOS_SERVER_NAMESPACE:xxxx}
  # Unit seconds, sync period. Default fetch every 60 seconds.
  period: ${SW_CONFIG_NACOS_PERIOD:60}
  # Nacos auth username
  username: ${SW_CONFIG_NACOS_USERNAME:"nacos"}
  password: ${SW_CONFIG_NACOS_PASSWORD:"nacos"}
  # Nacos auth accessKey
  accessKey: ${SW_CONFIG_NACOS_ACCESSKEY:""}
  secretKey: ${SW_CONFIG_NACOS_SECRETKEY:""}
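The registered listeners can be found in Nacos, which shows that the configuration has taken effect.
After the configuration is changed in Nacos, the change is reported in the logs.
Check logic
The core classes are AlarmCore and RunningRule.
RunningRule.Window: the metrics window, which computes its value from the most recent period buckets that it keeps.
- Message checking and sending logic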
public void start(List<AlarmCallback> allCallbacks) {
    LocalDateTime now = LocalDateTime.now();
    lastExecuteTime = now;
    Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
        try {
            final List<AlarmMessage> alarmMessageList = new ArrayList<>(30);
            LocalDateTime checkTime = LocalDateTime.now();
            // Whole minutes elapsed between the last execution time and now
            int minutes = Minutes.minutesBetween(lastExecuteTime, checkTime).getMinutes();
            boolean[] hasExecute = new boolean[]{false};
            alarmRulesWatcher.getRunningContext().values().forEach(ruleList -> ruleList.forEach(runningRule -> {
                // The timer fires every 10s, but the check only runs after at least one minute has passed
                if (minutes > 0) {
                    // Slide the time window forward: drop the oldest bucket and append a new bucket set to null
                    runningRule.moveTo(checkTime);
                    /*
                     * Don't run in the first quarter per min, avoid to trigger false alarm.
                     */
                    // Skip the first 15 seconds of each minute (not sure why); check whether the stored
                    // Metrics satisfy the rule, and add those that do to the list of messages to send
                    if (checkTime.getSecondOfMinute() > 15) {
                        hasExecute[0] = true;
                        alarmMessageList.addAll(runningRule.check());
                    }
                }
            }));
            // Set the last execute time, and make sure the second is `00`, such as: 18:30:00
            // Save the last execution time (truncated to the minute)
            if (hasExecute[0]) {
                lastExecuteTime = checkTime.minusSeconds(checkTime.getSecondOfMinute());
            }
            if (alarmMessageList.size() > 0) {
                if (alarmRulesWatcher.getCompositeRules().size() > 0) {
                    List<AlarmMessage> messages = alarmRulesWatcher.getCompositeRuleEvaluator().evaluate(alarmRulesWatcher.getCompositeRules(), alarmMessageList);
                    alarmMessageList.addAll(messages);
                }
                List<AlarmMessage> filteredMessages = alarmMessageList.stream().filter(msg -> !msg.isOnlyAsCondition()).collect(Collectors.toList());
                if (filteredMessages.size() > 0) {
                    // Actually send the messages
                    allCallbacks.forEach(callback -> callback.doAlarm(filteredMessages));
                }
            }
        } catch (Exception e) {
            LOGGER.error(e.getMessage(), e);
        }
    }, 10, 10, TimeUnit.SECONDS);
}
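The two time gates in start (at least one whole minute since the last run, and past second 15 of the current minute) are easier to see in isolation. The following is a minimal sketch under stated assumptions: AlarmTickGate and its method names are hypothetical, and java.time is used in place of the Joda-Time calls of the original.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helper that isolates the timing decisions of the 10-second timer.
final class AlarmTickGate {
    private AlarmTickGate() {
    }

    /** Mirrors "minutes > 0": at least one whole minute has passed since the last execution. */
    static boolean shouldMoveWindow(LocalDateTime lastExecuteTime, LocalDateTime checkTime) {
        return ChronoUnit.MINUTES.between(lastExecuteTime, checkTime) > 0;
    }

    /** Mirrors the inner test: rules are checked only after second 15 of the minute. */
    static boolean shouldCheck(LocalDateTime lastExecuteTime, LocalDateTime checkTime) {
        return shouldMoveWindow(lastExecuteTime, checkTime) && checkTime.getSecond() > 15;
    }

    /** Mirrors the update of lastExecuteTime: the seconds are set back to 00. */
    static LocalDateTime nextLastExecuteTime(LocalDateTime checkTime) {
        return checkTime.minusSeconds(checkTime.getSecond());
    }
}
```

With lastExecuteTime at 18:30:00, a tick at 18:30:50 does nothing, a tick at 18:31:10 only moves the window, and a tick at 18:31:20 moves the window, runs the check and resets lastExecuteTime to 18:31:00.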
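- Message collection and checking
RunningRule#in: the values list in RunningRule.Window holds the most recent metrics. Metrics are added as follows:
- First move the window so that its last bucket is the newest one. The metrics being added are usually newer than the time the timer has advanced to, or fall into the same bucket.
- If the timer time is ahead of the metrics collection time by the whole window or more (minutes >= values.size()), the client clock is probably wrong, so return immediately.
- Store the current metrics in the bucket that corresponds to its time.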
public class Window {
    private LocalDateTime endTime;
    private int period;
    private int silenceCountdown;
    private LinkedList<Metrics> values;
    // Initialization
    public Window(int period) {
        this.period = period;
        // -1 means silence countdown is not running.
        silenceCountdown = -1;
        values = new LinkedList<>();
        for (int i = 0; i < period; i++) {
            values.add(null);
        }
    }
    public void add(Metrics metrics) {
        long bucket = metrics.getTimeBucket();
        LocalDateTime timeBucket = TIME_BUCKET_FORMATTER.parseLocalDateTime(bucket + "");
        this.lock.lock();
        try {
            if (this.endTime == null) {
                init();
                this.endTime = timeBucket;
            }
            int minutes = Minutes.minutesBetween(timeBucket, this.endTime).getMinutes();
            if (minutes < 0) {
                this.moveTo(timeBucket);
                minutes = 0;
            }
            if (minutes >= values.size()) {
                // too old data
                // also should happen, but maybe if agent/probe mechanism time is not right.
                if (log.isTraceEnabled()) {
                    log.trace(
                        "Timebucket is {}, endTime is {} and value size is {}", timeBucket, this.endTime,
                        values.size()
                    );
                }
                return;
            }
            this.values.set(values.size() - minutes - 1, metrics);
        } finally {
            this.lock.unlock();
        }
        if (log.isTraceEnabled()) {
            log.trace("Add metric {} to window {}", metrics, transformValues(this.values));
        }
    }
}
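The index arithmetic in add (values.size() - minutes - 1) is the part that is easy to misread. The following is a simplified sketch, not the SkyWalking class: MinuteWindow is a hypothetical name, time is a plain minute counter instead of a LocalDateTime, values are Long instead of Metrics, and locking is left out.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical, simplified model of RunningRule.Window: one slot per minute, newest slot last.
final class MinuteWindow {
    private final LinkedList<Long> values = new LinkedList<>();
    private Long endMinute; // null until the first sample arrives

    MinuteWindow(int period) {
        for (int i = 0; i < period; i++) {
            values.add(null);
        }
    }

    /** Slides the window forward: one oldest slot dropped and one empty slot appended per elapsed minute. */
    void moveTo(long minute) {
        if (endMinute != null) {
            for (long i = 0; i < minute - endMinute; i++) {
                values.removeFirst();
                values.addLast(null);
            }
        }
        endMinute = minute;
    }

    /** Stores a sample; returns false when the sample is older than the whole window. */
    boolean add(long minute, long value) {
        if (endMinute == null) {
            endMinute = minute;
        }
        long age = endMinute - minute; // minutes between the sample and the window end
        if (age < 0) {
            moveTo(minute); // the sample is newer than the window end
            age = 0;
        }
        if (age >= values.size()) {
            return false; // too old, dropped
        }
        values.set((int) (values.size() - age - 1), value);
        return true;
    }

    List<Long> snapshot() {
        return new ArrayList<>(values);
    }
}
```

With a period of 3, adding samples for minutes 100, 99 and 101 in that order leaves the three slots holding the samples for 99, 100 and 101, and a later sample for minute 98 is rejected.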
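- Silence handling: on each per-minute check, look at silenceCountdown. If it has not yet reached 0, the silence period has not elapsed; once it has reached 0, the silence period is over and the message is returned.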
public Optional<AlarmMessage> checkAlarm() {
    if (isMatch()) {
        /*
         * When
         * 1. Alarm trigger conditions are satisfied.
         * 2. Isn't in silence stage, judged by SilenceCountdown(!=0).
         */
        if (silenceCountdown < 1) {
            silenceCountdown = silencePeriod;
            return Optional.of(new AlarmMessage());
        } else {
            silenceCountdown--;
        }
    } else {
        silenceCountdown--;
    }
    return Optional.empty();
}
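The effect of the countdown is clearest when it is stepped through minute by minute. The following is a minimal sketch: SilenceCountdown is a hypothetical class that keeps only the counter logic of checkAlarm, with the result of isMatch() passed in as a boolean and the alarm message reduced to a true return value.

```java
// Hypothetical model of the silence logic in checkAlarm; no metrics or messages involved.
final class SilenceCountdown {
    private final int silencePeriod;
    private int countdown = -1; // -1 means the countdown is not running

    SilenceCountdown(int silencePeriod) {
        this.silencePeriod = silencePeriod;
    }

    /** One per-minute check; returns true when an alarm message would be emitted. */
    boolean check(boolean matched) {
        if (matched) {
            if (countdown < 1) {
                countdown = silencePeriod; // fire and start a new silence period
                return true;
            }
            countdown--;
        } else {
            countdown--;
        }
        return false;
    }

    public static void main(String[] args) {
        SilenceCountdown silence = new SilenceCountdown(2);
        for (int minute = 1; minute <= 4; minute++) {
            System.out.println("minute " + minute + ": " + silence.check(true));
        }
    }
}
```

With a silence period of 2 and a rule that matches every minute, the alarm fires at minute 1, stays silent at minutes 2 and 3, and fires again at minute 4. The counter also decreases in minutes where the rule does not match, so the silence period keeps running between matches.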