Oozie Monitoring System: Instrumentation

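Oozie's first monitoring implementation was custom-built; later versions brought in Codahale Metrics, the mainstream monitoring framework at the time. This article starts with Oozie's own Instrumentation and then looks at how it later hooks up to Codahale Metrics.

Oozie's own Instrumentation framework defines four types of metrics: Timers, Counters, Variables, and Samplers. Every metric is classified by two keys, group and name.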

public class Instrumentation {
    private ScheduledExecutorService scheduler;
    private Lock counterLock;
    private Lock timerLock;
    private Lock variableLock;
    private Lock samplerLock;
    private Map<String, Map<String, Map<String, Object>>> all;
    private Map<String, Map<String, Element<Long>>> counters;
    private Map<String, Map<String, Element<Timer>>> timers;
    private Map<String, Map<String, Element<Variable>>> variables;
    private Map<String, Map<String, Element<Double>>> samplers;
    // ... remaining members omitted in this excerpt
}

Definition of the Cron stopwatch:

/**
 * Cron is a stopwatch that can be started/stopped several times. <p/> This class is not thread safe, it does not
 * need to be. <p/> It keeps track of the total time (first start to last stop) and the running time (total time
 * minus the stopped intervals). <p/> Once a Cron is complete it must be added to the corresponding group/name in a
 * Instrumentation instance.
 */
public static class Cron {
    private long start;
    private long end;
    private long lapStart;
    private long own;
    private long total;
    private boolean running;
    /**
     * Creates new Cron, stopped, in zero.
     */
    public Cron() {
        running = false;
    }
    /**
     * Start the cron. It cannot be already started.
     */
    public void start() {
        if (!running) {
            if (lapStart == 0) {
                lapStart = System.currentTimeMillis();
                if (start == 0) {
                    start = lapStart;
                    end = start;
                }
            }
            running = true;
        }
    }
    /**
     * Stops the cron. It cannot be already stopped.
     */
    public void stop() {
        if (running) {
            end = System.currentTimeMillis();
            if (start == 0) {
                start = end;
            }
            total = end - start;
            if (lapStart > 0) {
                own += end - lapStart;
                lapStart = 0;
            }
            running = false;
        }
    }
    // ... getters such as getOwn() and getTotal() omitted in this excerpt
}
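To make the difference between total time and own time concrete, here is a minimal sketch that mirrors the start/stop logic of the excerpt above. The class name CronSketch and the injectable clock are assumptions added only so the demo is deterministic; the real Cron reads System.currentTimeMillis() directly.

```java
import java.util.function.LongSupplier;

/** Simplified stand-in for Cron. The injectable clock is hypothetical, added for the demo. */
final class CronSketch {
    private final LongSupplier clock;
    private long start, end, lapStart, own, total;
    private boolean running;

    CronSketch(LongSupplier clock) {
        this.clock = clock;
    }

    void start() {
        if (!running) {
            if (lapStart == 0) {
                lapStart = clock.getAsLong();
                if (start == 0) {      // first start ever: remember the overall start
                    start = lapStart;
                    end = start;
                }
            }
            running = true;
        }
    }

    void stop() {
        if (running) {
            end = clock.getAsLong();
            if (start == 0) {
                start = end;
            }
            total = end - start;       // first start to last stop, pauses included
            if (lapStart > 0) {
                own += end - lapStart; // only the time spent running
                lapStart = 0;
            }
            running = false;
        }
    }

    long getOwn() { return own; }
    long getTotal() { return total; }

    public static void main(String[] args) {
        long[] ticks = {100, 150, 200, 230}; // fake clock readings in milliseconds
        int[] next = {0};
        CronSketch cron = new CronSketch(() -> ticks[next[0]++]);
        cron.start(); // t=100
        cron.stop();  // t=150, first lap ran 50 ms
        cron.start(); // t=200, 50 ms pause
        cron.stop();  // t=230, second lap ran 30 ms
        System.out.println("total=" + cron.getTotal() + " own=" + cron.getOwn()); // total=130 own=80
    }
}
```

The total covers the 50 ms pause between the two laps, while own only sums the two running intervals.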

Counter definitions:

  • action.executors - Counters related to actions.
      • [action_type]#action.[operation_performed] (start, end, check, kill)
      • [action_type]#ex.[exception_type] (transient, non-transient, error, failed)
      • e.g.
  • callablequeue - Count of events in various execution queues.
      • delayed.queued: Number of commands queued with a delay.
      • executed: Number of executions from the queue.
      • failed: Number of queue attempts which failed.
      • queued: Number of queued commands.
  • commands: Execution counts for various commands. This data is generated for all commands.
      • action.end
      • action.notification
      • action.start
      • callback
      • job.info
      • job.notification
      • purge
      • signal
      • start
      • submit
  • jobs: Job statistics.
      • start: Number of started jobs.
      • submit: Number of submitted jobs.
      • succeeded: Number of jobs which succeeded.
      • kill: Number of killed jobs.
  • authorization
      • failed: Number of failed authorization attempts.
  • webservices: Number of requests to various web services along with the request type.
      • failed: Total number of failed requests.
      • requests: Total number of requests.
      • admin
      • admin-GET
      • callback
      • callback-GET
      • jobs
      • jobs-GET
      • jobs-POST
      • version
      • version-GET


private static class Counter extends AtomicLong implements Element<Long> {
    /**
     * Return the counter snapshot.
     *
     * @return the counter snapshot.
     */
    public Long getValue() {
        return get();
    }
    /**
     * Return the String representation of the counter value.
     *
     * @return the String representation of the counter value.
     */
    public String toString() {
        return Long.toString(get());
    }
}
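The Instrumentation fields shown earlier store counters in a map keyed first by group and then by name. The sketch below shows how such a lookup and increment could work. It is an illustration, not Oozie's code: the class name and the incr/get methods are hypothetical, and it swaps the explicit counterLock for ConcurrentHashMap to keep the example short.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical group/name counter registry; not the Oozie implementation. */
final class CounterRegistrySketch {
    // group -> name -> counter, the same two-level layout as the counters field
    private final Map<String, Map<String, AtomicLong>> counters = new ConcurrentHashMap<>();

    /** Adds count to the counter identified by group and name, creating it on first use. */
    void incr(String group, String name, long count) {
        counters.computeIfAbsent(group, g -> new ConcurrentHashMap<>())
                .computeIfAbsent(name, n -> new AtomicLong())
                .addAndGet(count);
    }

    /** Returns the current value, or 0 if the counter was never incremented. */
    long get(String group, String name) {
        Map<String, AtomicLong> byName = counters.get(group);
        AtomicLong counter = (byName == null) ? null : byName.get(name);
        return (counter == null) ? 0L : counter.get();
    }

    public static void main(String[] args) {
        CounterRegistrySketch registry = new CounterRegistrySketch();
        registry.incr("jobs", "submit", 1);
        registry.incr("jobs", "submit", 1);
        registry.incr("webservices", "jobs-GET", 5);
        System.out.println(registry.get("jobs", "submit"));          // 2
        System.out.println(registry.get("webservices", "jobs-GET")); // 5
    }
}
```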

Timer definitions:

  • action.executors - Timers related to actions.
      • [action_type]#action.[operation_performed] (start, end, check, kill)
  • callablequeue
      • time.in.queue: Time a callable spent in the queue before being processed.
  • commands: Generated for all commands.
      • action.end
      • action.notification
      • action.start
      • callback
      • job.info
      • job.notification
      • purge
      • signal
      • start
      • submit
  • Timers related to various database operations.
      • create-workflow
      • load-action
      • load-pending-actions
      • load-running-actions
      • load-workflow
      • load-workflows
      • purge-old-workflows
      • save-action
      • update-action
      • update-workflow
  • webservices
      • admin
      • admin-GET
      • callback
      • callback-GET
      • jobs
      • jobs-GET
      • jobs-POST
      • version
      • version-GET

public static class Timer implements Element<Timer> {
    Lock lock = new ReentrantLock();
    private long ownTime;
    private long totalTime;
    private long ticks;
    private long ownSquareTime;
    private long totalSquareTime;
    private long ownMinTime;
    private long ownMaxTime;
    private long totalMinTime;
    private long totalMaxTime;
    /**
     * Timer constructor. <p/> It is project private for test purposes.
     */
    Timer() {    }
    /**
     * Return the String representation of the timer value.
     *
     * @return the String representation of the timer value.
     */
    public String toString() {
        return XLog.format("ticks[{0}] totalAvg[{1}] ownAvg[{2}]", ticks, getTotalAvg(), getOwnAvg());
    }
    /**
     * Return the timer snapshot.
     *
     * @return the timer snapshot.
     */
    public Timer getValue() {
        try {
            lock.lock();
            Timer timer = new Timer();
            timer.ownTime = ownTime;
            timer.totalTime = totalTime;
            timer.ticks = ticks;
            timer.ownSquareTime = ownSquareTime;
            timer.totalSquareTime = totalSquareTime;
            timer.ownMinTime = ownMinTime;
            timer.ownMaxTime = ownMaxTime;
            timer.totalMinTime = totalMinTime;
            timer.totalMaxTime = totalMaxTime;
            return timer;
        }
        finally {
            lock.unlock();
        }
    }
    /**
     * Add a cron to a timer. <p/> It is project private for test purposes.
     *
     * @param cron Cron to add.
     */
    void addCron(Cron cron) {
        try {
            lock.lock();
            long own = cron.getOwn();
            long total = cron.getTotal();
            ownTime += own;
            totalTime += total;
            ticks++;
            ownSquareTime += own * own;
            totalSquareTime += total * total;
            if (ticks == 1) {
                ownMinTime = own;
                ownMaxTime = own;
                totalMinTime = total;
                totalMaxTime = total;
            }
            else {
                ownMinTime = Math.min(ownMinTime, own);
                ownMaxTime = Math.max(ownMaxTime, own);
                totalMinTime = Math.min(totalMinTime, total);
                totalMaxTime = Math.max(totalMaxTime, total);
            }
        }
        finally {
            lock.unlock();
        }
    }
    // ... getters such as getTotalAvg() and getOwnAvg() omitted in this excerpt
}
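The Timer keeps running sums and sums of squares instead of every individual measurement. One common reason for storing those two values is that an average and a standard deviation can be derived from them later without keeping the samples. The sketch below illustrates that idea; the class and method names are hypothetical, and the sample standard deviation formula is an assumption, since the excerpt does not show how Oozie itself evaluates these fields.

```java
/** Hypothetical illustration of deriving statistics from a sum and a sum of squares. */
final class TimerStatsSketch {
    private long sum;        // plays the role of ownTime / totalTime
    private long sumSquares; // plays the role of ownSquareTime / totalSquareTime
    private long ticks;

    void add(long elapsedMillis) {
        sum += elapsedMillis;
        sumSquares += elapsedMillis * elapsedMillis;
        ticks++;
    }

    double avg() {
        return ticks == 0 ? 0.0 : (double) sum / ticks;
    }

    /** Sample standard deviation; returns 0 when fewer than two measurements exist. */
    double stdDev() {
        if (ticks < 2) {
            return 0.0;
        }
        double variance = (sumSquares - ((double) sum * sum) / ticks) / (ticks - 1);
        return Math.sqrt(Math.max(variance, 0.0)); // guard against tiny negative rounding
    }

    public static void main(String[] args) {
        TimerStatsSketch stats = new TimerStatsSketch();
        stats.add(10);
        stats.add(20);
        stats.add(30);
        System.out.println("avg=" + stats.avg() + " stdDev=" + stats.stdDev()); // avg=20.0 stdDev=10.0
    }
}
```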

Variable definitions:

  • oozie
      • version: Oozie build version.
  • configuration
      • config.dir: Directory from which the configuration files are loaded. If null, all configuration files are loaded from the classpath.
      • config.file: The Oozie custom configuration for the instance.
  • jvm
      • free.memory
      • max.memory
      • total.memory
  • locks
      • locks: Locks are used by Oozie to synchronize access to workflow and action entries when the database being used does not support 'select for update' queries. (MySQL supports 'select for update'.)
  • logging
      • config.file: Log4j '.properties' configuration file.
      • from.classpath: Whether the config file has been read from the classpath or from the config directory.
      • reload.interval: Interval at which the config file will be reloaded. 0 if the config file will never be reloaded; a file loaded from the classpath is never reloaded.

public interface Variable<T> extends Element<T> {}
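Because a Variable only has to report its current value on demand, the jvm entries listed above can be pictured as thin wrappers around Runtime. The sketch below redefines Element and Variable locally so that it compiles on its own; the single getValue() method is inferred from the Counter excerpt, and the JvmVariablesSketch class with its map layout is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Local stand-ins for the Oozie interfaces, assumed to expose only getValue().
interface Element<T> {
    T getValue();
}

interface Variable<T> extends Element<T> {
}

/** Hypothetical holder for the three jvm variables named in the list above. */
final class JvmVariablesSketch {
    static Map<String, Variable<Long>> jvmVariables() {
        Runtime runtime = Runtime.getRuntime();
        Map<String, Variable<Long>> vars = new LinkedHashMap<>();
        // Each variable is evaluated lazily, every time getValue() is called.
        vars.put("free.memory", runtime::freeMemory);
        vars.put("max.memory", runtime::maxMemory);
        vars.put("total.memory", runtime::totalMemory);
        return vars;
    }

    public static void main(String[] args) {
        jvmVariables().forEach((name, variable) ->
                System.out.println("jvm." + name + " = " + variable.getValue()));
    }
}
```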

Sampler definitions:

  • callablequeue
      • delayed.queue.size: The size of the delayed command queue.
      • queue.size: The size of the command queue.
      • threads.active: The number of threads processing callables.
  • jdbc
      • connections.active: Active connections over the past minute.
  • webservices: Requests to the Oozie HTTP endpoints over the last minute.
      • admin
      • callback
      • job
      • jobs
      • requests
      • version

private static class Sampler implements Element<Double>, Runnable {
    private Lock lock = new ReentrantLock();
    private int samplingInterval;
    private Variable<Long> variable;
    private long[] values;
    private int current;
    private long valuesSum;
    private double rate;
    public Sampler(int samplingPeriod, int samplingInterval, Variable<Long> variable) {
        this.samplingInterval = samplingInterval;
        this.variable = variable;
        values = new long[samplingPeriod / samplingInterval];
        valuesSum = 0;
        current = -1;
    }
    public int getSamplingInterval() {
        return samplingInterval;
    }
    public void run() {
        try {
            lock.lock();
            long newValue = variable.getValue();
            if (current == -1) {
                valuesSum = newValue;
                current = 0;
                values[current] = newValue;
            }
            else {
                current = (current + 1) % values.length;
                valuesSum = valuesSum - values[current] + newValue;
                values[current] = newValue;
            }
            rate = ((double) valuesSum) / values.length;
        }
        finally {
            lock.unlock();
        }
    }
    public Double getValue() {
        return rate;
    }
}
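The Sampler is a fixed-size ring buffer: each run overwrites the oldest slot, adjusts the running sum, and divides by the number of slots. The sketch below replays that arithmetic with a hand-driven source so the numbers are easy to follow. SamplerSketch and its LongSupplier source are assumptions for the demo, and sample() is called directly here rather than being scheduled.

```java
import java.util.function.LongSupplier;

/** Hypothetical single-threaded replay of the Sampler arithmetic shown above. */
final class SamplerSketch {
    private final long[] values;
    private final LongSupplier source;
    private int current = -1;
    private long valuesSum;
    private double rate;

    SamplerSketch(int slots, LongSupplier source) {
        this.values = new long[slots];
        this.source = source;
    }

    void sample() {
        long newValue = source.getAsLong();
        if (current == -1) {
            valuesSum = newValue;
            current = 0;
            values[current] = newValue;
        } else {
            current = (current + 1) % values.length;             // move to the oldest slot
            valuesSum = valuesSum - values[current] + newValue;  // swap old value for new
            values[current] = newValue;
        }
        // Always divides by the slot count, so early readings count empty slots as zero.
        rate = ((double) valuesSum) / values.length;
    }

    double getValue() {
        return rate;
    }

    public static void main(String[] args) {
        long[] readings = {4, 8, 12, 16, 20};
        int[] next = {0};
        SamplerSketch sampler = new SamplerSketch(4, () -> readings[next[0]++]);
        for (int i = 0; i < readings.length; i++) {
            sampler.sample();
            System.out.println("after sample " + (i + 1) + ": " + sampler.getValue());
        }
        // prints 1.0, 3.0, 6.0, 10.0, 14.0
    }
}
```

Until the buffer has filled once, the reported value underestimates the true average because the unused slots still hold zero.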