
telegraf - exec plugin - aws ec2 ebs volume info - metric parsing error, reason: [missing fields] or Errors encountered: [invalid number]


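Machine - CentOS 7.2 or Ubuntu 14.04 / 16.xx

Telegraf version: 1.0.1

Python version: 2.7.5

Telegraf supports an INPUT plugin named exec. First, please see EXAMPLE 2 in its README doc. I cannot use the JSON format, because it only accepts numeric values for metrics. According to the docs: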

If using JSON, only numeric values are parsed and turned into floats. Booleans and strings will be ignored.

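So the idea is simple: in the exec plugin section you specify a script, and it should emit some meaningful information (in JSON or in InfluxDB line protocol format, the latter in my case, since some of my metrics contain non-numeric values) that you want to capture and display in a nice dashboard, for example a Wavefront Dashboard.

Basically, one can use these metrics, their tags, and the sources they come from to find all kinds of information about memory, CPU, disk, networking, and other meaningful data, and use that information to create alerts when something unwanted happens.

OK, I came up with this python script: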

#!/usr/bin/python

# sudo pip install boto3 if you don't have it on your machine.
import boto3


def generate(key, value):
    """
    Creates a nicely formatted Key(Value) item for output
    """
    return '{}="{}"'.format(key, value)
    #return '{}={}'.format(key, value)


def main():
    ec2 = boto3.resource('ec2', region_name="us-west-2")
    volumes = ec2.volumes.all()

    for vol in volumes:
        # You don't need to wrap everything in `str` unless it is not a string
        # By default most things will come back as a string 
        # unless they are very obviously not (complex, date time, etc)
        # but since we are printing these (and formatting them into strings)
        # the cast to string will be implicit and we don't need to make it 
        # explicit


        # vol is already a fully returned volume you are essentially DOUBLING
        # your API calls when you do this
        #iv = ec2.Volume(vol.id)
        output_parts = [
            # Volume level details
            generate('create_time', vol.create_time),
            generate('availability_zone', vol.availability_zone),
            generate('volume_id', vol.volume_id),
            generate('volume_type', vol.volume_type),
            generate('state', vol.state),
            generate('size', vol.size),
            generate('iops', vol.iops),
            generate('encrypted', vol.encrypted),
            generate('snapshot_id', vol.snapshot_id),
            generate('kms_key_id', vol.kms_key_id),
        ]

        for _ in vol.attachments:
            # Will get any attachments and since it is a list
            # we should write this to handle MULTIPLE attachments
            output_parts.extend([
                generate('InstanceId', _.get('InstanceId')),
                generate('InstanceVolumeState', _.get('State')),
                generate('DeleteOnTermination', _.get('DeleteOnTermination')),
                generate('Device', _.get('Device')),
            ])

        # only process when there are tags to process        
        if vol.tags:
            for _ in vol.tags:
                # Get all of the tags
                output_parts.extend([
                    generate(_.get('Key'), _.get('Value')),
                ])

        # output everything at once.. 
        print ','.join(output_parts)


if __name__ == '__main__':
    main()

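This script talks to AWS EC2 EBS volumes and outputs all the values it can find (usually what you see in the AWS EC2 EBS volume console), formatting that information into a meaningful CSV format, which I redirect to a .csv log file. We don't want to run the python script all the time (AWS API limits / cost factor).

So, once the .csv file is created, I created this small shell script, which I'll set in Telegraf's exec plugin section.

The shell script /tmp/aws-vol-info.sh set in the Telegraf exec plugin is: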

#!/bin/bash

cat /tmp/aws-vol-info.csv

Telegraf configuration file created for the exec plugin ( /etc/telegraf/telegraf.d/exec-plugin-aws-info.conf ):

#--- https://github.com/influxdata/telegraf/tree/master/plugins/inputs/exec

[[inputs.exec]]
  commands = ["/tmp/aws-vol-info.sh"]

  ## Timeout for each command to complete.
  timeout = "5s"

  # Data format to consume.
  # NOTE json only reads numerical measurements, strings and booleans are ignored.
  data_format = "influx"

  name_suffix = "_telegraf_execplugin"

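I tweaked the .py (the generate function in the Python script) to produce the following three output formats (.csv file), because I wanted to test how telegraf would handle this data before enabling the config file (/etc/telegraf/telegraf.d/catch-aws-ebs-info.conf) and restarting the telegraf service.

Format 1: (every value is wrapped in double quotes ")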

create_time="2017-01-09 23:24:29.428000+00:00",availability_zone="us-east-2b",volume_id="vol-058e1d47dgh721121",volume_type="gp2",state="in-use",size="8",iops="100",encrypted="False",snapshot_id="snap-06h1h1b91bh662avn",kms_key_id="None",InstanceId="i-0jjb1boop26f42f50",InstanceVolumeState="attached",DeleteOnTermination="True",Device="/dev/sda1",Name="[company-2b-app90] secondary",hostname="company-2b-app90-i-0jjb1boop26f42f50",high_availability="1",mirror="secondary",cluster="company",autoscale="true",role="app"

Testing the telegraf configuration against the telegraf config directory gives me the following error.

Command$ telegraf --config-directory=/etc/telegraf --test --input-filter=exec

[vagrant@myvagrant ~] $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
2017/03/10 00:37:48 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T00:37:48Z E! Errors encountered: [ metric parsing error, reason: [invalid field format], buffer: [create_time="2017-01-09 23:24:29.428000+00:00",availability_zone="us-east-2b",volume_id="vol-058e1d47dgh721121",volume_type="gp2",state="in-use",size="8",iops="100",encrypted="False",snapshot_id="snap-06h1h1b91bh662avn",kms_key_id="None",InstanceId="i-0jjb1boop26f42f50",InstanceVolumeState="attached",DeleteOnTermination="True",Device="/dev/sda1",Name="[company-2b-app90] secondary",hostname="company-2b-app90-i-0jjb1boop26f42f50",high_availability="1",mirror="secondary",cluster="company",autoscale="true",role="app"], index: [372]]
[vagrant@myvagrant ~] $

Format 2: (no double quotes " at all)

create_time=2017-01-09 23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1,Name=[company-2b-app90] secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app

Getting the same kind of parsing error (this time with reason [invalid value]) when testing Telegraf's exec plugin configuration:

2017/03/10 00:45:01 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T00:45:01Z E! Errors encountered: [ metric parsing error, reason: [invalid value], buffer: [create_time=2017-01-09 23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1,Name=[company-2b-app90] secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app], index: [63]]

Format 3: (no double quotes " and no space characters in the values; spaces are replaced with the _ character)

create_time=2017-01-09_23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1,Name=[company-2b-app90]_secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app

Still didn't work; I get the same kind of error (now with reason [missing fields]):

[vagrant@myvagrant ~] $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
2017/03/10 00:50:30 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T00:50:30Z E! Errors encountered: [ metric parsing error, reason: [missing fields], buffer: [create_time=2017-01-09_23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1,Name=[company-2b-app90]_secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app], index: [476]]

Format 4: If I follow the influx line protocol as described on this page: https://docs.influxdata.com/influxdb/v1.2/write_protocols/line_protocol_tutorial/

awsebs,Name=[company-2b-app90]_secondary,hostname=company-2b-app90-i-0jjb1boop26f42f50,high_availability=1,mirror=secondary,cluster=company,autoscale=true,role=app create_time=2017-01-09_23:24:29.428000+00:00,availability_zone=us-east-2b,volume_id=vol-058e1d47dgh721121,volume_type=gp2,state=in-use,size=8,iops=100,encrypted=False,snapshot_id=snap-06h1h1b91bh662avn,kms_key_id=None,InstanceId=i-0jjb1boop26f42f50,InstanceVolumeState=attached,DeleteOnTermination=True,Device=/dev/sda1

I'm getting this error

[vagrant@myvagrant ~] $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
2017/03/10 02:34:30 I! Using config file: /etc/telegraf/telegraf.conf
* Plugin: inputs.exec, Collection 1
2017-03-10T02:34:30Z E! Errors encountered: [ invalid number]

HOW can I get rid of this error and get telegraf working with the exec plugin (running the .sh script)?


Other Info

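The Python script runs once or twice a day (via cron), and telegraf runs every 1 minute (it runs the exec plugin, which runs the .sh script, which cats the .csv file so that telegraf can consume it in InfluxDB line protocol format).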

https://galaxy.ansible.com/wavefrontHQ/wavefront-ansible/

https://github.com/influxdata/telegraf/issues/2525

1 Answer


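    It seems the rules are very strict; I should have looked more closely.

    The output of any program you use must match the INFLUX LINE PROTOCOL format shown below and follow all the accompanying RULES.

    For example: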

    weather,location=us-midwest temperature=82 1465839830100400200
      |    -------------------- --------------  |
      |             |             |             |
      |             |             |             |
    +-----------+--------+-+---------+-+---------+
    |measurement|,tag_set| |field_set| |timestamp|
    +-----------+--------+-+---------+-+---------+
    

    You can read more about measurements, tags, fields and the optional timestamp here: https://docs.influxdata.com/influxdb/v1.2/write_protocols/line_protocol_tutorial/
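    To make the layout in the diagram concrete, here is a minimal Python sketch that assembles one line from its four parts. The helper name build_line is made up for illustration, and it assumes the keys and values are already escaped and quoted correctly; it only shows where the comma and the spaces go.

```python
def build_line(measurement, tags, fields, timestamp=None):
    """Assemble one line-protocol line.

    tags and fields are lists of (key, already_formatted_value) pairs.
    This hypothetical helper does no escaping or quoting on its own.
    """
    if not fields:
        raise ValueError("line protocol needs at least one field")
    # measurement and tag set are joined by commas, with no spaces
    head = ",".join([measurement] + ["{}={}".format(k, v) for k, v in tags])
    field_set = ",".join("{}={}".format(k, v) for k, v in fields)
    parts = [head, field_set]
    # the timestamp is optional and always comes last
    if timestamp is not None:
        parts.append(str(timestamp))
    # a single space separates tag set, field set and timestamp
    return " ".join(parts)


print(build_line("weather",
                 [("location", "us-midwest")],
                 [("temperature", 82)],
                 1465839830100400200))
```

    With the inputs above this prints the same example line as in the diagram.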

    Important rules are

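    1) There must be a , and NO whitespace between the measurement and the tag set.

    2) There must be a whitespace between the tag set and the field set.

    3) For tag keys, tag values and field keys, always use the backslash character \ if you need to escape any character in the measurement name, in tag or field names, or in their values!

    4) You can't escape the backslash \ itself.

    5) Line protocol handles emojis with no problem :)

    6) The TAG / TAG set (tags, comma separated) is OPTIONAL.

    7) The FIELD / FIELD set (fields, comma separated): At least ONE is required per line.

    8) The TIMESTAMP (the last value shown in the format) is OPTIONAL.

    9) The VERY IMPORTANT QUOTING rules are below:

    a) Never double or single quote the timestamp. It's not valid line protocol. Even if the number itself is valid, '123123131312313' or "1231313213131" won't work.

    b) Never single quote field values (even if they are strings!). It's also not valid line protocol, i.e. fieldname='giga' won't work.

    c) Do not double or single quote measurement names, tag keys, tag values and field keys. NOTE: this does include tag values, so be careful.

    d) Do not double quote field values that are floats, integers or booleans, otherwise InfluxDB will assume that those values are strings.

    e) Do double quote field values that are strings.

    f) And the MOST IMPORTANT one (this will save you from going BALD): if you set a FIELD value without double quotes on one line (i.e. you think it is an integer or a float, as anyone would say of the fields size and iops), and on some other line (anywhere in the file that telegraf reads/parses using the exec plugin) the same field has a non-integer value (i.e. a string), then you'll get the following error message: Errors encountered: [ invalid number.

    So to fix it, the RULE is: if any possible FIELD value for a FIELD key is a string, then you MUST wrap it with " on every line. It doesn't matter whether it has the value 1, 200 or 1.5 on some lines (e.g. iops can be 15) and a string on some other lines (iops can be None).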
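    The consistency rule can be sketched in Python as a two-pass approach: first find every field key that holds a non-numeric value in at least one row, then double quote that key on all rows. The function names and the sample rows below are made up for illustration; this is a rough sketch of the idea, not the script I actually ran.

```python
def is_number(value):
    """True for ints and floats, but not for booleans or None."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def string_keys(rows):
    """Collect every field key that is non-numeric in at least one row."""
    keys = set()
    for row in rows:
        for key, value in row.items():
            if not is_number(value):
                keys.add(key)
    return keys


def format_rows(rows):
    """Render each row as a field set, quoting a key on ALL rows
    as soon as it needs quoting on ANY row."""
    quoted = string_keys(rows)
    lines = []
    for row in rows:
        parts = []
        for key in sorted(row):
            if key in quoted:
                parts.append('{}="{}"'.format(key, row[key]))
            else:
                parts.append('{}={}'.format(key, row[key]))
        lines.append(",".join(parts))
    return lines


# Hypothetical sample: iops is a number on one volume and None on another.
rows = [
    {"size": 8, "iops": 100},
    {"size": 500, "iops": None},
]
for line in format_rows(rows):
    print(line)
```

    Here iops ends up quoted on both rows because one row has None, while size stays unquoted because it is numeric everywhere.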

    Error message: Errors encountered: [ invalid number

    [vagrant@myvagrant ~] $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
    2017/03/10 11:13:18 I! Using config file: /etc/telegraf/telegraf.conf
    * Plugin: inputs.exec, Collection 1
    2017-03-10T11:13:18Z E! Errors encountered: [ invalid number metric parsing error, reason: [invalid field format], buffer: [awsebsvol,host=myvagrant ], index: [25]]
    

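    So, after all this learning, it's clear that at first I had missed the Influx line protocol format, and the RULES as well!

    Now I wanted the output generated by my python script to look like this (per the INFLUX LINE protocol). You only need to change the .sh file and use sed "s/^/awsec2ebs,/", or alternatively sed "s/^/awsec2ebs,sourcehost=$(hostname) /" (NOTE: the space before the closing sed / character), and then you can use " around the values of the key=value pairs. I did change the .py file to not use " for the size and iops fields.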
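    If you would rather not rely on sed, the same prefixing step can be sketched in Python. The function name prefix_lines and the sample input are made up for illustration; the tag key sourcehost mirrors the sed example, and whitespace in the host name is replaced with _ because unescaped spaces are not allowed in a tag value.

```python
import re
import socket


def prefix_lines(lines, measurement, tag_key, host):
    """Prepend 'measurement,tag_key=host ' to every non-empty line."""
    # collapse runs of spaces/tabs in the host name into a single underscore
    safe_host = re.sub(r"[ \t]+", "_", host)
    # note the trailing space: it separates the tag set from the field set
    prefix = "{},{}={} ".format(measurement, tag_key, safe_host)
    return [prefix + line.rstrip("\n") for line in lines if line.strip()]


# Hypothetical input standing in for the lines of the .csv file.
sample = ['size=8,state="in-use"\n', "\n"]
for line in prefix_lines(sample, "awsec2ebs", "sourcehost", socket.gethostname()):
    print(line)
```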

    Anyway, if the output looks like this:

    awsec2ebs,volume_id=vol-058e1d47dgh721121 create_time="2017-01-09 23:24:29.428000+00:00",availability_zone="us-east-2b",volume_type="gp2",state="in-use",size="8",iops="100",encrypted="False",snapshot_id="snap-06h1h1b91bh662avn",kms_key_id="None",InstanceId="i-0jjb1boop26f42f50",InstanceVolumeState="attached",DeleteOnTermination="True",Device="/dev/sda1",Name="[company-2b-app90] secondary",hostname="company-2b-app90-i-0jjb1boop26f42f50",high_availability="1",mirror="secondary",cluster="company",autoscale="true",role="app"
    

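    In the final working solution above, I created a measurement named awsec2ebs, then put a , between this measurement and the tag key volume_id. For the tag value I did not use any ' or " quotes. Then I put a single space character between the tag set and the field set (I only wanted one tag for now; otherwise you can add more tags, comma separated, as long as you follow the rules).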
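    If you do add more tags, values such as the Name tag contain a space, which has to be backslash-escaped rather than quoted. A minimal sketch, assuming the line protocol rule that commas, equals signs and spaces are the characters to escape in tag keys and tag values; the helper names escape_tag and tag_set are made up for illustration.

```python
def escape_tag(text):
    """Backslash-escape commas, equals signs and spaces in a tag key or value."""
    for ch in (",", "=", " "):
        text = text.replace(ch, "\\" + ch)
    return text


def tag_set(tags):
    """Render a dict of tags as a comma-separated, unquoted tag set."""
    return ",".join(
        "{}={}".format(escape_tag(key), escape_tag(value))
        for key, value in sorted(tags.items())
    )


prefix = "awsec2ebs," + tag_set({
    "volume_id": "vol-058e1d47dgh721121",
    "Name": "[company-2b-app90] secondary",
})
print(prefix)
```

    The space inside the Name value comes out as a backslash followed by a space, so the parser does not mistake it for the separator before the field set.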

    Finally, I ran the command

    $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec which worked like a charm!

    2017/03/10 03:33:54 I! Using config file: /etc/telegraf/telegraf.conf
    * Plugin: inputs.exec, Collection 1
    > awsec2ebs_telegraf_execplugin,volume_id=vol-058e1d47dgh721121,host=myvagrant volume_type="gp2",iops="100",kms_key_id="None",role="app",size="8",encrypted="False",InstanceId="i-0jjb1boop26f42f50",InstanceVolumeState="attached",Name="[company-2b-app90] secondary",snapshot_id="snap-06h1h1b91bh662avn",DeleteOnTermination="True",mirror="secondary",cluster="company",autoscale="true",high_availability="1",create_time="2017-01-09 23:24:29.428000+00:00",availability_zone="us-east-2b",state="in-use",Device="/dev/sda1",hostname="company-2b-app90-i-0jjb1boop26f42f50" 1489116835000000000
    [vagrant@myvagrant ~] $ echo $?
    0
    

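    In the example above, size is the only field that will always be a number, so we don't need to wrap it in ", but that's up to you. Recall the MOST IMPORTANT rule above and the error it produces.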
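    For reference, here is a rough sketch of how a single Python value could be turned into a typed field value under the quoting rules: booleans and numbers unquoted, everything else double quoted. The helper name format_field_value is made up, the trailing i for integers follows the line protocol documentation, and escaping backslashes inside strings is my own assumption. It is not what the script below does, and it does not solve the consistency problem: a key such as iops that is sometimes None still has to be quoted on every line, and switching an existing field from float to integer can cause a type conflict in InfluxDB.

```python
def format_field_value(value):
    """Format one Python value as a line-protocol field value."""
    # bool must be checked before int, because bool is a subclass of int
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return "{}i".format(value)   # trailing i marks an integer field
    if isinstance(value, float):
        return repr(value)
    # everything else (including None) is written as a double-quoted string;
    # escaping backslashes as well as double quotes is an assumption here
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"{}"'.format(text)


for sample in (8, 1.5, True, "in-use", None):
    print(format_field_value(sample))
```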

    So final python file is:

    #!/usr/bin/python
    
    #Do `sudo pip install boto3` first
    import boto3
    
    def generate(key, value, qs, qe):
        """
        Creates a nicely formatted Key(Value) item for output
        """
        return '{}={}{}{}'.format(key, qs, value, qe)
    
    def main():
        ec2 = boto3.resource('ec2', region_name="us-west-2")
        volumes = ec2.volumes.all()
    
        for vol in volumes:
            # You don't need to wrap everything in `str` unless it is not a string
            # By default most things will come back as a string
            # unless they are very obviously not (complex, date time, etc)
            # but since we are printing these (and formatting them into strings)
            # the cast to string will be implicit and we don't need to make it
            # explicit
    
            # vol is already a fully returned volume you are essentially DOUBLING
            # your API calls when you do this
            #iv = ec2.Volume(vol.id)
            output_parts = [
                # Volume level details
                generate('volume_id', vol.volume_id, '"', '"'),
                generate('create_time', vol.create_time, '"', '"'),
                generate('availability_zone', vol.availability_zone, '"', '"'),
                generate('volume_type', vol.volume_type, '"', '"'),
                generate('state', vol.state, '"', '"'),
                generate('size', vol.size, '', ''),
                #The following vol.iops variable can be a number or None so you must wrap it with double quotes otherwise "invalid number" error will come.
                generate('iops', vol.iops, '"', '"'),
                generate('encrypted', vol.encrypted, '"', '"'),
                generate('snapshot_id', vol.snapshot_id, '"', '"'),
                generate('kms_key_id', vol.kms_key_id, '"', '"'),
            ]
    
            for _ in vol.attachments:
                # Will get any attachments and since it is a list
                # we should write this to handle MULTIPLE attachments
                output_parts.extend([
                    generate('InstanceId', _.get('InstanceId'), '"', '"'),
                    generate('InstanceVolumeState', _.get('State'), '"', '"'),
                    generate('DeleteOnTermination', _.get('DeleteOnTermination'), '"', '"'),
                    generate('Device', _.get('Device'), '"', '"'),
                ])
    
            # only process when there are tags to process
            if vol.tags:
                for _ in vol.tags:
                    # Get all of the tags
                    output_parts.extend([
                        generate(_.get('Key'), _.get('Value'), '"', '"'),
                    ])
    
            # output everything at once..
            print ','.join(output_parts)
    
    if __name__ == '__main__':
        main()
    

    Final aws-vol-info.sh is:

    #!/bin/bash
    
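    # Prefix every line with the measurement name and a host tag (whitespace in the host name becomes _).
    sed "s/^/awsebsvol,host=$(hostname | head -1 | sed 's/[ \t][ \t]*/_/g') /" aws-vol-info.csv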
    

    Final telegraf exec plugin config file is ( /etc/telegraf/telegraf.d/exec-plugin-aws-info.conf ); you can give the .conf file any name:

    #--- https://github.com/influxdata/telegraf/tree/master/plugins/inputs/exec
    
    [[inputs.exec]]
      commands = ["/some/valid/path/where/csvfileexists/aws-vol-info.sh"]
    
      ## Timeout for each command to complete.
      timeout = "5s"
    
      # Data format to consume.
      # NOTE json only reads numerical measurements, strings and booleans are ignored.
      data_format = "influx"
    
      name_suffix = "_telegraf_exec"
    

    Run this, and everything should work now!

    $ telegraf --config-directory=/etc/telegraf --test --input-filter=exec
    
