Collecting System Metrics with Python

Preface

This week a new project at my company needed load testing, and for each test scenario we had to capture different system resource metrics from the Linux servers.

After some thought, Python felt like the most lightweight option, and the easiest for a third-party agent to invoke later, so I wrote this metrics collection tool.

Metric Collection

The metrics cover CPU, memory, disk IO, network interfaces, and other common performance indicators. For the exact metric definitions and formulas, you can also refer to tsar, Taobao's open-source project on GitHub.

The overall collection approach is very simple and falls into two patterns (a sketch of both follows the list):

  • read a specific file, parse it, and format the data;
  • run a given command, capture its output, and format the data
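
A minimal sketch of the two patterns, using /proc/uptime and the uptime command as stand-in examples (they are not metrics the tool itself collects):

import subprocess

# Pattern 1: read a /proc file, parse it, format the data
with open("/proc/uptime") as uptime_file:
    uptime_seconds = float(uptime_file.read().split()[0])

# Pattern 2: run a command, capture its output, format the data
status_code, output = subprocess.getstatusoutput("uptime")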

Every metric is multiplied by a scale factor. I was in a hurry, so I hard-coded 10000 everywhere :(

To see how each file is parsed, cat the file yourself and compare it against the output of the corresponding command.

1. Load average

Read from /proc/loadavg

# Collect load averages
def collector_load():
    # /proc/loadavg starts with the 1-, 5- and 15-minute load averages
    with open("/proc/loadavg") as load_file:
        content = load_file.read().split()
    load_avg = {
        "load1": int(float(content[0]) * 10000),
        "load5": int(float(content[1]) * 10000),
        "load15": int(float(content[2]) * 10000)
    }
    return load_avg
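
A quick way to sanity-check the scale factor (a hypothetical driver, not part of the collector itself):

if __name__ == "__main__":
    load = collector_load()
    # Divide by the 10000 scale factor to recover the familiar values
    print("load1=%.2f load5=%.2f load15=%.2f" % (
        load["load1"] / 10000.0, load["load5"] / 10000.0, load["load15"] / 10000.0))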

2. Memory

Read from /proc/meminfo

# Collect memory information
def collect_memory_info():
    # Parse /proc/meminfo into a name -> kB mapping
    memory_buffer = {}
    with open("/proc/meminfo") as mem_file:
        for line in mem_file:
            memory_buffer[line.split(':')[0]] = int(line.split(':')[1].split()[0])
    # Keep only the metrics we care about
    mem_total = memory_buffer["MemTotal"]
    mem_free = memory_buffer["MemFree"] + memory_buffer["Buffers"] + memory_buffer["Cached"]
    mem_util = int(float(mem_total - mem_free) / float(mem_total) * 10000)
    mem_buff = int(float(memory_buffer["Buffers"]) / float(mem_total) * 10000)
    mem_cache = int(float(memory_buffer["Cached"]) / float(mem_total) * 10000)
    mem_info = {
        "mem_buff": mem_buff,
        "mem_util": mem_util,
        "mem_cache": mem_cache,
    }
    return mem_info
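
On kernels 3.14 and later, /proc/meminfo also exposes MemAvailable, which is a better estimate of usable memory than MemFree + Buffers + Cached. A hedged variant for the mem_free computation above:

def available_memory(memory_buffer):
    # Prefer MemAvailable (kernel 3.14+); fall back to the classic estimate
    return memory_buffer.get(
        "MemAvailable",
        memory_buffer["MemFree"] + memory_buffer["Buffers"] + memory_buffer["Cached"])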

3. CPU

Read from /proc/stat

# Collect CPU counters
def collect_cpu_info():
    cpu_buffer = {}
    with open("/proc/stat") as cpu_file:
        for line in cpu_file:
            line_fields = line.split()
            if line_fields[0] == "cpu":
                # Sum every jiffy counter on the aggregate "cpu" line
                total = 0
                for field in line_fields:
                    if field == "cpu":
                        continue
                    total += int(field)

                # Field order: user nice system idle iowait irq softirq steal
                cpu_buffer = {
                    "User": int(line_fields[1]),
                    "Sys": int(line_fields[3]),
                    "Idle": int(line_fields[4]),
                    "Steal": int(line_fields[8]),
                    "Wait": int(line_fields[5]),
                    "Total": total
                }
                break
    return cpu_buffer

These counters are cumulative since boot, so a second step is needed: the value for an interval is the difference between the current sample and the previous one:

last_cpu_info = None

# Compute CPU utilization from the delta of two samples
def calculate_cpu_info():
    global last_cpu_info
    cpu_info = collect_cpu_info()
    if last_cpu_info is None:
        # The first sample only establishes the baseline
        last_cpu_info = cpu_info
        return {}
    else:
        delta_total = cpu_info["Total"] - last_cpu_info["Total"]
        delta_user = cpu_info["User"] - last_cpu_info["User"]
        delta_sys = cpu_info["Sys"] - last_cpu_info["Sys"]
        delta_idle = cpu_info["Idle"] - last_cpu_info["Idle"]
        delta_wait = cpu_info["Wait"] - last_cpu_info["Wait"]
        delta_steal = cpu_info["Steal"] - last_cpu_info["Steal"]
        last_cpu_info = cpu_info
        if delta_total == 0:
            # Sampled too quickly; no jiffies elapsed
            return {}
        return {
            "cpu_user": int(float(delta_user) / float(delta_total) * 10000),
            "cpu_sys": int(float(delta_sys) / float(delta_total) * 10000),
            "cpu_wait": int(float(delta_wait) / float(delta_total) * 10000),
            "cpu_steal": int(float(delta_steal) / float(delta_total) * 10000),
            "cpu_idle": int(float(delta_idle) / float(delta_total) * 10000),
            "cpu_util": int(float(delta_total - delta_idle - delta_wait - delta_steal) / float(delta_total) * 10000)
        }
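
Because the first call only primes last_cpu_info, a usage sketch needs two samples; the one-second sleep is an arbitrary interval chosen for illustration:

import time

calculate_cpu_info()          # first call stores the baseline and returns {}
time.sleep(1)
print(calculate_cpu_info())   # e.g. {"cpu_user": 523, "cpu_sys": 187, ...}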

4. IO

Read from /proc/diskstats

import time

# Collect disk IO counters
def collect_io_info():
    io_buffer = {}
    with open("/proc/diskstats") as io_file:
        for line in io_file:
            line_fields = line.split()
            device_name = line_fields[2]
            # Skip devices that have never completed a read
            if line_fields[3] == "0":
                continue
            if should_handle_device(device_name):
                io_buffer[device_name] = {
                    "ReadRequest": int(line_fields[3]),
                    "WriteRequest": int(line_fields[7]),
                    "MsecRead": int(line_fields[6]),
                    "MsecWrite": int(line_fields[10]),
                    "MsecTotal": int(line_fields[12]),
                    "Timestamp": int(time.time())
                }
    return io_buffer

# Should this disk device be sampled?
def should_handle_device(device):
    # Parenthesized so each length check applies to both name prefixes
    normal = len(device) == 3 and (device.startswith("sd") or device.startswith("vd"))
    aws = len(device) >= 4 and (device.startswith("xvd") or device.startswith("sda"))
    return normal or aws

These counters are also cumulative, so we take deltas. io_util is the fraction of wall-clock time the device was busy: for example, if MsecTotal advances by 2,500 ms over a 10-second window, io_util = 2500 / (10 * 1000) * 10000 = 2500, i.e. 25% busy:

last_io_info = None

# Compute IO rates from the delta of two samples
def calculate_io_info():
    global last_io_info
    io_info = collect_io_info()
    result = []
    if last_io_info is not None:
        for key in io_info.keys():
            # A device may appear between samples; skip it until the next round
            if key not in last_io_info:
                continue
            total_duration = io_info[key]["Timestamp"] - last_io_info[key]["Timestamp"]
            if total_duration <= 0:
                continue
            read_use_io = io_info[key]["MsecRead"] - last_io_info[key]["MsecRead"]
            write_use_io = io_info[key]["MsecWrite"] - last_io_info[key]["MsecWrite"]
            read_io = io_info[key]["ReadRequest"] - last_io_info[key]["ReadRequest"]
            write_io = io_info[key]["WriteRequest"] - last_io_info[key]["WriteRequest"]
            read_write_io = io_info[key]["MsecTotal"] - last_io_info[key]["MsecTotal"]
            readwrite_io = read_io + write_io
            io_await = 0
            if readwrite_io > 0:
                io_await = int(float(read_use_io + write_use_io) / float(readwrite_io) * 10000)
            result.append({
                "io_rs": int(float(read_io) / total_duration * 10000),
                "io_ws": int(float(write_io) / total_duration * 10000),
                "io_await": io_await,
                "io_util": int(float(read_write_io) / (total_duration * 1000) * 10000),
            })

    last_io_info = io_info
    return result

5. Network interfaces

NIC counters are read from /proc/net/dev

# Collect NIC traffic counters
def collect_net_info():
    net_buffer = {}
    with open("/proc/net/dev") as net_file:
        for line in net_file:
            # Skip header lines; data lines look like "eth0: 123 456 ..."
            if line.find(":") < 0:
                continue
            card_name = line.split(":")[0].strip()
            if should_collect_card(card_name):
                line_fields = line.split(":")[1].lstrip().split()
                net_buffer[card_name] = {
                    "InBytes": int(line_fields[0]),
                    "InPackets": int(line_fields[1]),
                    "InErrors": int(line_fields[2]),
                    "InDrops": int(line_fields[3]),
                    "OutBytes": int(line_fields[8]),
                    "OutPackets": int(line_fields[9]),
                    "OutErrors": int(line_fields[10]),
                    "OutDrops": int(line_fields[11])
                }
    return net_buffer

# Should this network card be sampled?
def should_collect_card(card_name):
    return card_name.startswith("eth") or card_name.startswith("em")
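
On distributions using systemd's predictable interface names (ens5, eno1, enp0s3, ...), the eth/em prefixes above will match nothing; a broader variant, assuming that naming scheme, might be:

# Also accept systemd-style predictable names (en*) alongside eth*/em*
def should_collect_card(card_name):
    return card_name.startswith(("eth", "em", "en"))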

NIC counters are cumulative too, so again we take deltas. Note that the results below are raw per-interval counts (scaled by 10000); dividing by the sampling interval would turn them into per-second rates:

last_net_info = None

# Compute NIC metrics from the delta of two samples
def calculate_net_info():
    global last_net_info
    net_info = collect_net_info()
    result = []
    if last_net_info is not None:
        for key in net_info.keys():
            # An interface may appear between samples; skip it until next round
            if key not in last_net_info:
                continue
            result.append({
                "in_bytes": (net_info[key]["InBytes"] - last_net_info[key]["InBytes"]) * 10000,
                "in_packets": (net_info[key]["InPackets"] - last_net_info[key]["InPackets"]) * 10000,
                "in_errors": (net_info[key]["InErrors"] - last_net_info[key]["InErrors"]) * 10000,
                "in_drops": (net_info[key]["InDrops"] - last_net_info[key]["InDrops"]) * 10000,
                "out_bytes": (net_info[key]["OutBytes"] - last_net_info[key]["OutBytes"]) * 10000,
                "out_packets": (net_info[key]["OutPackets"] - last_net_info[key]["OutPackets"]) * 10000,
                "out_errors": (net_info[key]["OutErrors"] - last_net_info[key]["OutErrors"]) * 10000,
                "out_drops": (net_info[key]["OutDrops"] - last_net_info[key]["OutDrops"]) * 10000
            })
    last_net_info = net_info
    return result

6. TCP metrics

Both TCP and UDP metrics can be read from /proc/net/snmp

# Collect TCP counters
def collect_tcp_info():
    tcp_buffer = {}
    is_title = True
    with open("/proc/net/snmp") as tcp_file:
        for line in tcp_file:
            protocol_name = line.split(":")[0].strip()
            if protocol_name == "Tcp":
                # The first "Tcp:" line holds column names, the second the values
                if is_title:
                    is_title = False
                    continue
                else:
                    line_fields = line.split(":")[1].lstrip().split()
                    tcp_buffer = {
                        "ActiveOpens": int(line_fields[4]),
                        "PassiveOpens": int(line_fields[5]),
                        "InSegs": int(line_fields[9]),
                        "OutSegs": int(line_fields[10]),
                        "RetransSegs": int(line_fields[11]),
                        "CurrEstab": int(line_fields[8]),
                    }
                    break
    return tcp_buffer

The file mixes cumulative counters with point-in-time values; the current connection count (CurrEstab) is a point-in-time value. The retransmission rate is the interval's retransmitted segments divided by the segments sent: for example, 50 retransmissions against 10,000 sent segments gives tcp_retran = 50:

last_tcp_info = None

# Compute TCP metrics from the delta of two samples
def calculate_tcp_info():
    global last_tcp_info
    tcp_info = collect_tcp_info()
    result = {}
    if last_tcp_info is not None:
        out_segs = tcp_info["OutSegs"] - last_tcp_info["OutSegs"]
        retrans_rate = 0.0
        if out_segs > 0:
            retrans_rate = float(tcp_info["RetransSegs"] - last_tcp_info["RetransSegs"]) / float(out_segs)
        result = {
            "tcp_active": (tcp_info["ActiveOpens"] - last_tcp_info["ActiveOpens"]) * 10000,
            "tcp_passive": (tcp_info["PassiveOpens"] - last_tcp_info["PassiveOpens"]) * 10000,
            "tcp_inseg": (tcp_info["InSegs"] - last_tcp_info["InSegs"]) * 10000,
            "tcp_outseg": out_segs * 10000,
            "tcp_established": tcp_info["CurrEstab"] * 10000,
            "tcp_retran": int(retrans_rate * 10000)
        }
    last_tcp_info = tcp_info
    return result

7. Per-process CPU and memory

There are two approaches: run ps, which reports the average CPU and memory usage since the process started, or read from the files under /proc/<pid>. The first approach is used here.

The target processes are looked up by name with ps auxc | grep "name1|name2|..."

import subprocess

# Comma-separated process names to watch, e.g. "nginx,mysqld"
processes = ""

# Collect metrics for the watched processes
def collect_process_info():
    global processes
    process_info = {}
    if processes == "":
        return process_info
    # Build a grep alternation: 'name1\|name2\|...'
    process_filter = "'" + processes.replace(",", "\\|") + "'"
    commandline = "ps auxc | grep " + process_filter
    status_code, result = subprocess.getstatusoutput(commandline)
    if status_code == 0:
        # One output line per matching process
        for item in result.split("\n"):
            item_fields = item.split()
            # Field 2 is %CPU, field 3 is %MEM, field 10 is the command name
            process_info[item_fields[10]] = {
                "process_cpu_util": int(float(item_fields[2]) * 10000),
                "process_mem_util": int(float(item_fields[3]) * 10000)
            }
    return process_info

If you need real-time figures, you should instead read the files under /proc/<pid>; obtaining the pid works the same way as above.
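
A minimal sketch of that second approach, assuming the field layout documented in proc(5) (the helper names here are mine, not part of the original tool): sample utime + stime from /proc/<pid>/stat twice and divide by the machine-wide jiffy delta, reusing collect_cpu_info() from above.

import time

def read_pid_jiffies(pid):
    # utime and stime are fields 14 and 15 (1-indexed) in /proc/<pid>/stat;
    # the command name in parentheses may contain spaces, so split after ')'
    with open("/proc/%d/stat" % pid) as stat_file:
        fields = stat_file.read().rsplit(")", 1)[1].split()
    return int(fields[11]) + int(fields[12])

def process_cpu_util(pid, interval=1.0):
    # Share of total machine CPU time used by the process, scaled by 10000
    proc_before, total_before = read_pid_jiffies(pid), collect_cpu_info()["Total"]
    time.sleep(interval)
    proc_after, total_after = read_pid_jiffies(pid), collect_cpu_info()["Total"]
    return int(float(proc_after - proc_before) /
               float(total_after - total_before) * 10000)

def process_mem_util(pid):
    # Resident set size as a share of MemTotal, scaled by 10000
    with open("/proc/%d/status" % pid) as status_file:
        rss_kb = next(int(line.split()[1]) for line in status_file
                      if line.startswith("VmRSS:"))
    with open("/proc/meminfo") as mem_file:
        total_kb = int(mem_file.readline().split()[1])  # MemTotal is the first line
    return int(float(rss_kb) / float(total_kb) * 10000)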
