Reposted

Building an application statistics collection and display system with dropwizard/metrics, Kafka and Zabbix

Goals

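- Applications can collect statistics on a given kind of data with only a few lines of code
- The collected statistics can be displayed easily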

metrics

Literally, "metrics" means measurements or indicators.

Concretely, for a web server:

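- How many requests arrive per minute?

- What is the average request latency?

- What is the longest request time?

- How many times is a given method called, and how long does each call take?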

For a cache:

- What is the average cache lookup time?

- How many lookups miss the cache, and what is the miss ratio?

For the JVM:

- How many garbage collections have occurred?

- How large is the Old Space (old generation)?

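The metrics an application needs to collect are diverse, and the requirements differ from case to case. What is needed is a single, unified platform for collecting, aggregating and displaying metrics.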

Popular metrics libraries

https://github.com/dropwizard/metrics

Implemented in Java and used by many open-source projects, such as Hadoop and Kafka. Referred to below as dropwizard/metrics.

https://github.com/tumblr/colossus

Implemented in Scala; stores its data in OpenTSDB.

Metrics in Spring Boot projects:

http://docs.spring.io/spring-boot/docs/current/reference/html/production-ready-metrics.html

Much of Spring Boot's metrics support is modeled on dropwizard/metrics.

Types of metrics

dropwizard/metrics groups metrics into the following main types:

https://dropwizard.github.io/metrics/3.1.0/getting-started/

Gauges

A gauge measures an instantaneous value, for example the length of a queue:

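import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class QueueManager {
    // java.util.Queue is an interface, so a concrete thread-safe implementation is used
    private final Queue<Object> queue = new ConcurrentLinkedQueue<>();

    public QueueManager(MetricRegistry metrics, String name) {
        // The gauge is evaluated on demand and reports the queue length at that moment
        metrics.register(MetricRegistry.name(QueueManager.class, name, "size"),
                         new Gauge<Integer>() {
                             @Override
                             public Integer getValue() {
                                 return queue.size();
                             }
                         });
    }
}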
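Because Gauge declares a single method, getValue(), on Java 8+ the anonymous class can also be written as a method reference. The sketch below is a minimal example under that assumption, using a ConcurrentLinkedQueue as the queue; the class name GaugeDemo and the metric name jobs.size are hypothetical. It registers the gauge and reads it back through MetricRegistry.getGauges().

```java
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class GaugeDemo {
    // Registers a queue-size gauge, adds the given number of items, then reads the gauge back
    public static Object sizeAfterAdding(int items) {
        MetricRegistry registry = new MetricRegistry();
        Queue<String> queue = new ConcurrentLinkedQueue<>();
        // Gauge has one abstract method, so a method reference can replace the anonymous class
        registry.register(MetricRegistry.name("jobs", "size"), (Gauge<Integer>) queue::size);
        for (int i = 0; i < items; i++) {
            queue.offer("job-" + i);
        }
        // The gauge is evaluated at read time, so it sees items added after registration
        return registry.getGauges().get("jobs.size").getValue();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterAdding(3));
    }
}
```

Since the value is computed each time the gauge is read (for example by a reporter), nothing has to be updated when the queue changes.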

Counters

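A counter is essentially a gauge for an AtomicLong value that can be incremented and decremented. For example, it can track the number of jobs waiting in a queue: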

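// Fragment of QueueManager: assumes "import static com.codahale.metrics.MetricRegistry.name;",
// a MetricRegistry named metrics, and queue declared as a java.util.concurrent.BlockingQueue<Job>
private final Counter pendingJobs = metrics.counter(name(QueueManager.class, "pending-jobs"));

public void addJob(Job job) {
    pendingJobs.inc();
    queue.offer(job);
}

public Job takeJob() throws InterruptedException {
    Job job = queue.take();   // take() blocks until a job is available
    pendingJobs.dec();        // decrement only once a job has actually been removed
    return job;
}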
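A runnable sketch of a counter on its own, assuming only the metrics-core module (the class name CounterDemo is hypothetical). It shows that MetricRegistry.counter(name) creates the counter on first use and returns the same instance for later lookups, so a producer and a consumer can update one shared count without passing the object around.

```java
import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;

public class CounterDemo {
    static final String KEY = MetricRegistry.name(CounterDemo.class, "pending-jobs");

    // Simulates a producer and a consumer that look up the counter independently by name
    public static long pendingAfter(int added, int taken) {
        MetricRegistry metrics = new MetricRegistry();
        Counter producerSide = metrics.counter(KEY); // created on first lookup
        Counter consumerSide = metrics.counter(KEY); // same instance returned for the same name
        for (int i = 0; i < added; i++) {
            producerSide.inc();
        }
        for (int i = 0; i < taken; i++) {
            consumerSide.dec();
        }
        return metrics.counter(KEY).getCount();
    }

    // True if two lookups with the same name share one counter
    public static boolean sameInstance() {
        MetricRegistry metrics = new MetricRegistry();
        return metrics.counter(KEY) == metrics.counter(KEY);
    }

    public static void main(String[] args) {
        System.out.println(pendingAfter(5, 2));
        System.out.println(sameInstance());
    }
}
```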

Histograms

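A histogram measures the statistical distribution of a stream of values. Besides the minimum, maximum and mean, it also reports the median and the 75th, 90th, 95th, 98th, 99th and 99.9th percentiles.

For example, the distribution of response sizes: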

private final Histogram responseSizes = metrics.histogram(name(RequestHandler.class, "response-sizes"));

public void handleRequest(Request request, Response response) {
    // etc
    responseSizes.update(response.getContent().length);
}
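The percentiles listed above are read from a Snapshot of the histogram. A minimal sketch (HistogramDemo is a hypothetical name) that records the sizes 1 to 100 and prints a summary. Snapshot has dedicated getters for the median and the 75th, 95th, 98th, 99th and 99.9th percentiles; other quantiles, such as the 90th, come from getValue(0.90).

```java
import com.codahale.metrics.Histogram;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Snapshot;

public class HistogramDemo {
    // Records response sizes 1..n and returns the snapshot a reporter would read
    public static Snapshot recordSizes(int n) {
        MetricRegistry metrics = new MetricRegistry();
        // metrics.histogram(...) uses an exponentially decaying reservoir by default,
        // i.e. a sample biased towards recent values rather than every value ever seen
        Histogram responseSizes = metrics.histogram(MetricRegistry.name(HistogramDemo.class, "response-sizes"));
        for (int size = 1; size <= n; size++) {
            responseSizes.update(size);
        }
        return responseSizes.getSnapshot();
    }

    public static void main(String[] args) {
        Snapshot s = recordSizes(100);
        System.out.printf("min=%d max=%d mean=%.1f median=%.1f p75=%.1f p90=%.1f p99=%.1f p99.9=%.1f%n",
                s.getMin(), s.getMax(), s.getMean(), s.getMedian(),
                s.get75thPercentile(), s.getValue(0.90), s.get99thPercentile(), s.get999thPercentile());
    }
}
```

Because the default reservoir keeps a weighted sample, the percentiles are estimates once many values have been recorded, not exact values over the full history.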

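Timers

As the name suggests, a timer measures how long a piece of code or a call takes to run, and also how often it is called. For example, the time spent producing a response: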

private final Timer responses = metrics.timer(name(RequestHandler.class, "responses"));

public String handleRequest(Request request, Response response) {
    final Timer.Context context = responses.time();
    try {
        // etc;
        return "OK";
    } finally {
        context.stop();
    }
}
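Timer.Context is closeable, and closing it stops the timing, so the try/finally above can also be written as try-with-resources. A small sketch, with the hypothetical class TimerDemo and a sleep standing in for real work, that also reads back the call count and the slowest call. Note that timer snapshots store durations in nanoseconds.

```java
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;
import java.util.concurrent.TimeUnit;

public class TimerDemo {
    private final Timer responses;

    public TimerDemo(MetricRegistry metrics) {
        this.responses = metrics.timer(MetricRegistry.name(TimerDemo.class, "responses"));
    }

    // Equivalent to the try/finally version: closing the context stops the timing
    public String handleRequest() throws InterruptedException {
        try (Timer.Context ignored = responses.time()) {
            TimeUnit.MILLISECONDS.sleep(5); // placeholder for real request handling
            return "OK";
        }
    }

    public long calls() {
        return responses.getCount();
    }

    // Timer snapshots hold durations in nanoseconds; convert to milliseconds
    public double maxMillis() {
        return responses.getSnapshot().getMax() / 1_000_000.0;
    }

    public static void main(String[] args) throws InterruptedException {
        TimerDemo demo = new TimerDemo(new MetricRegistry());
        for (int i = 0; i < 3; i++) {
            demo.handleRequest();
        }
        System.out.printf("calls=%d max=%.1f ms%n", demo.calls(), demo.maxMillis());
    }
}
```

A Timer also contains a meter, so getMeanRate() and getOneMinuteRate() give the call rate per second.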

Health Checks

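This is not really statistical data. It is an interface that lets you decide for yourself whether the system is healthy, for example whether the database connection is working: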

final HealthCheckRegistry healthChecks = new HealthCheckRegistry();

public class DatabaseHealthCheck extends HealthCheck {
    private final Database database;

    public DatabaseHealthCheck(Database database) {
        this.database = database;
    }

    @Override
    public HealthCheck.Result check() throws Exception {
        if (database.isConnected()) {
            return HealthCheck.Result.healthy();
        } else {
            return HealthCheck.Result.unhealthy("Cannot connect to " + database.getUrl());
        }
    }
}
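In practice a check is registered with the HealthCheckRegistry under a name and run on demand. A self-contained sketch, in which FakeDatabase is a hypothetical stand-in for the Database class above and the JDBC URL is made up:

```java
import com.codahale.metrics.health.HealthCheck;
import com.codahale.metrics.health.HealthCheckRegistry;
import java.util.Map;
import java.util.SortedMap;

public class HealthCheckDemo {
    // Hypothetical stand-in for the application's Database class
    static class FakeDatabase {
        final boolean connected;
        FakeDatabase(boolean connected) { this.connected = connected; }
        boolean isConnected() { return connected; }
        String getUrl() { return "jdbc:example://db"; } // made-up URL
    }

    static class DatabaseHealthCheck extends HealthCheck {
        private final FakeDatabase database;

        DatabaseHealthCheck(FakeDatabase database) { this.database = database; }

        @Override
        protected Result check() {
            return database.isConnected()
                    ? Result.healthy()
                    : Result.unhealthy("Cannot connect to " + database.getUrl());
        }
    }

    // Registers the check by name and runs every registered check
    public static SortedMap<String, HealthCheck.Result> runAll(boolean dbUp) {
        HealthCheckRegistry healthChecks = new HealthCheckRegistry();
        healthChecks.register("database", new DatabaseHealthCheck(new FakeDatabase(dbUp)));
        return healthChecks.runHealthChecks();
    }

    public static void main(String[] args) {
        for (Map.Entry<String, HealthCheck.Result> e : runAll(false).entrySet()) {
            HealthCheck.Result r = e.getValue();
            System.out.println(e.getKey() + ": " + (r.isHealthy() ? "OK" : r.getMessage()));
        }
    }
}
```

runHealthChecks() returns the results keyed by the registered name. A check that throws an exception is reported as unhealthy instead of propagating the exception.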

Metrics Annotation

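The annotations in dropwizard/metrics make it easy to collect statistics for a method or a value. The annotations only mark what to measure; they take effect only when a framework or AOP integration intercepts the annotated methods.

For example: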

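// @Timed and @Counted come from the com.codahale.metrics.annotation package

/**
 * Records how many times the method is called and how long each call takes
 */
@Timed
public void call() {
}

/**
 * Counts the number of logins. monotonic = true keeps a running total; by default
 * the counter is decremented when the method returns, i.e. it tracks calls in progress
 */
@Counted(monotonic = true)
public void userLogin() {
}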

To see what each kind of metric actually produces, run some simple test code and print the results with ConsoleReporter.
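A minimal sketch of that approach, assuming the metrics-core module; ConsoleReporterDemo and the metric names are hypothetical. The output is captured in a string here so it can be inspected; dropping outputTo(...) prints to stdout instead.

```java
import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.MetricRegistry;
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.concurrent.TimeUnit;

public class ConsoleReporterDemo {
    // Runs one report of the registry and returns what ConsoleReporter printed
    public static String reportOnce(MetricRegistry registry) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        ConsoleReporter reporter = ConsoleReporter.forRegistry(registry)
                .outputTo(new PrintStream(buffer, true)) // omit outputTo(...) to print to System.out
                .convertRatesTo(TimeUnit.SECONDS)
                .convertDurationsTo(TimeUnit.MILLISECONDS)
                .build();
        reporter.report(); // a single report; start(period, unit) would report periodically
        return buffer.toString();
    }

    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();
        registry.counter("user-logins").inc(3);
        registry.histogram("response-sizes").update(512);
        System.out.print(reportOnce(registry));
    }
}
```

In a long-running application the reporter would instead be started with something like reporter.start(10, TimeUnit.SECONDS), which prints a report on a background thread at that interval.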

Transmitting and displaying metrics data

dropwizard/metrics provides a reporter interface, so you can decide for yourself how metrics data is handled.
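The built-in periodic reporters extend the abstract class ScheduledReporter, whose report(...) method receives all metrics grouped by type. The sketch below is a hypothetical KeyValueReporter that flattens them into name/value pairs and hands them to a Consumer; what that consumer does (log them, send them to a message queue, push them to a monitoring system) is left open and is an assumption of the sketch. Key suffixes such as .count and .p99_ms are illustrative, not a library convention.

```java
import com.codahale.metrics.Counter;
import com.codahale.metrics.Gauge;
import com.codahale.metrics.Histogram;
import com.codahale.metrics.Meter;
import com.codahale.metrics.MetricFilter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.ScheduledReporter;
import com.codahale.metrics.Timer;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class KeyValueReporter extends ScheduledReporter {
    private final Consumer<Map<String, String>> sink; // hypothetical destination for each report

    public KeyValueReporter(MetricRegistry registry, Consumer<Map<String, String>> sink) {
        super(registry, "key-value-reporter", MetricFilter.ALL, TimeUnit.SECONDS, TimeUnit.MILLISECONDS);
        this.sink = sink;
    }

    @Override
    @SuppressWarnings("rawtypes")
    public void report(SortedMap<String, Gauge> gauges,
                       SortedMap<String, Counter> counters,
                       SortedMap<String, Histogram> histograms,
                       SortedMap<String, Meter> meters,
                       SortedMap<String, Timer> timers) {
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, Gauge> e : gauges.entrySet()) {
            out.put(e.getKey() + ".value", String.valueOf(e.getValue().getValue()));
        }
        for (Map.Entry<String, Counter> e : counters.entrySet()) {
            out.put(e.getKey() + ".count", String.valueOf(e.getValue().getCount()));
        }
        for (Map.Entry<String, Histogram> e : histograms.entrySet()) {
            out.put(e.getKey() + ".count", String.valueOf(e.getValue().getCount()));
            out.put(e.getKey() + ".p99", String.valueOf(e.getValue().getSnapshot().get99thPercentile()));
        }
        for (Map.Entry<String, Meter> e : meters.entrySet()) {
            // convertRate applies the reporter's rate unit (per second here)
            out.put(e.getKey() + ".m1_rate", String.valueOf(convertRate(e.getValue().getOneMinuteRate())));
        }
        for (Map.Entry<String, Timer> e : timers.entrySet()) {
            out.put(e.getKey() + ".count", String.valueOf(e.getValue().getCount()));
            // Timer snapshots are in nanoseconds; convertDuration converts to the duration unit (ms)
            out.put(e.getKey() + ".p99_ms",
                    String.valueOf(convertDuration(e.getValue().getSnapshot().get99thPercentile())));
        }
        sink.accept(out);
    }

    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();
        registry.counter("jobs.pending").inc(3);
        KeyValueReporter reporter = new KeyValueReporter(registry, System.out::println);
        reporter.report(); // one report now; reporter.start(1, TimeUnit.MINUTES) would repeat it
    }
}
```

Extending ScheduledReporter rather than writing a scheduler by hand reuses the library's scheduling (start/stop), filtering (MetricFilter) and unit conversion (convertRate, convertDuration).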

dropwizard/metrics ships with quite a few ready-made reporters:

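- ConsoleReporter: prints to stdout
- JmxReporter: exposes metrics as JMX MBeans
- metrics-servlets: provides HTTP endpoints for querying metrics
- CsvReporter: writes CSV files
- Slf4jReporter: writes metrics to a log
- GangliaReporter: reports to Ganglia
- GraphiteReporter: reports to Graphite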

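Among the reporters above, Ganglia has been open source for many years but lacks some monitoring features, and its graphs are rudimentary; Graphite is no longer being developed.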

The monitoring system our company uses is Zabbix, and dropwizard/metrics has no ready-made Zabbix reporter.
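One possible way to close that gap, sketched here under assumptions, is to push values to Zabbix items of type "Zabbix trapper" using the same wire format as the zabbix_sender tool: the ASCII header ZBXD, a 0x01 flag byte, the payload length as 8 bytes in little-endian order, then a JSON body whose request is "sender data". The class name ZabbixSenderPacket, the host name and the item key are hypothetical, and this is not necessarily how the system in this article delivers data to Zabbix. The sketch only builds the packet; it does not open a connection.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

public class ZabbixSenderPacket {
    // Minimal JSON string quoting: escapes backslashes and double quotes only
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds the sender JSON body: one {host, key, value} entry per trapper item
    public static String body(String host, Map<String, String> values) {
        StringBuilder sb = new StringBuilder("{\"request\":\"sender data\",\"data\":[");
        boolean first = true;
        for (Map.Entry<String, String> e : values.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append("{\"host\":").append(quote(host))
              .append(",\"key\":").append(quote(e.getKey()))
              .append(",\"value\":").append(quote(e.getValue()))
              .append('}');
        }
        return sb.append("]}").toString();
    }

    // Frames the body: "ZBXD", flag 0x01, 8-byte little-endian length, then the JSON bytes
    public static byte[] packet(String host, Map<String, String> values) {
        byte[] json = body(host, values).getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 1 + 8 + json.length).order(ByteOrder.LITTLE_ENDIAN);
        buf.put("ZBXD".getBytes(StandardCharsets.US_ASCII));
        buf.put((byte) 0x01);
        buf.putLong(json.length);
        buf.put(json);
        return buf.array();
    }

    public static void main(String[] args) {
        Map<String, String> values = new TreeMap<>();
        values.put("jobs.pending.count", "3");
        System.out.println(body("app-server-01", values));
        System.out.println("packet bytes: " + packet("app-server-01", values).length);
    }
}
```

To deliver the data, these bytes would be written to a TCP connection to the Zabbix server or proxy on its trapper port (10051 by default), and each key must match a trapper item configured on that host. Item keys may contain letters, digits, underscores, hyphens and dots, so dotted metric names can generally be used as keys directly.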