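The installation and testing described in this article were done mainly on CentOS 6.5. First, a screenshot of the result: on the left is our data source, Graphite, and on the right is the Grafana view: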
The most direct and simple way to install Diamond is to build an RPM or DEB package yourself; Diamond supports this well.
# cd /root
# yum install -y git rpm-build python-configobj python-setuptools
# git clone https://github.com/python-diamond/Diamond
# cd Diamond
# make rpm
# cd dist
# rpm -ivh diamond-*.noarch.rpm
# cp -f /etc/diamond/diamond.conf.example /etc/diamond/diamond.conf
# cat << EOF | tee -a /etc/diamond/diamond.conf
[configs]
path = "/etc/diamond/collectors/"
extension = ".conf"
EOF
# cat << EOF | tee /etc/diamond/collectors/net.conf
[collectors]
[[NetworkCollector]]
enabled = True
EOF
# yum install -y graphite-web graphite-web-selinux
# yum install -y mysql mysql-server MySQL-python
# yum install -y python-carbon python-whisper
# /etc/init.d/mysqld start
# mysql -e "CREATE DATABASE graphite;" -u root
# mysql -e "GRANT ALL PRIVILEGES ON graphite.* TO 'graphite'@'localhost' IDENTIFIED BY 'sysadmin';" -u root
# mysql -e 'FLUSH PRIVILEGES;' -u root
# SECRET_KEY=$(md5sum /etc/passwd | awk '{print $1}')
# echo "SECRET_KEY = '$SECRET_KEY'" | tee -a /etc/graphite-web/local_settings.py
# echo "TIME_ZONE = 'Asia/Shanghai'" | tee -a /etc/graphite-web/local_settings.py
# cat << EOF | tee -a /etc/graphite-web/local_settings.py
DATABASES = {
    'default': {
        'NAME': 'graphite',
        'ENGINE': 'django.db.backends.mysql',
        'USER': 'graphite',
        'PASSWORD': 'sysadmin',
    }
}
EOF
# cd /usr/lib/python2.6/site-packages/graphite
# ./manage.py syncdb --noinput
# echo "from django.contrib.auth.models import User; User.objects.create_superuser('admin', 'admin@hihuron.com', 'sysadmin')" | ./manage.py shell
Listen 10000
<VirtualHost *:10000>
    ServerName graphite-web
    DocumentRoot "/usr/share/graphite/webapp"
    ErrorLog /var/log/httpd/graphite-web-error.log
    CustomLog /var/log/httpd/graphite-web-access.log common
    Alias /media/ "/usr/lib/python2.6/site-packages/django/contrib/admin/media/"
    WSGIScriptAlias / /usr/share/graphite/graphite-web.wsgi
    WSGIImportScript /usr/share/graphite/graphite-web.wsgi process-group=%{GLOBAL} application-group=%{GLOBAL}
    <Location "/content/">
        SetHandler None
    </Location>
    <Location "/media/">
        SetHandler None
    </Location>
</VirtualHost>
# HOST_IP=$(ifconfig | sed -En 's/127.0.0.1//;s/.*inet (addr:)?(([0-9]*\.){3}[0-9]*).*/\2/p' | head -1)
# sed -i "/^\[\[GraphiteHandler\]\]$/,/^\[.*\]/s/^host =$/host = $HOST_IP/" /etc/diamond/diamond.conf
# sed -i "/^\[\[GraphitePickleHandler\]\]$/,/^\[.*\]/s/^host =$/host = $HOST_IP/" /etc/diamond/diamond.conf
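The HOST_IP line above pulls an IPv4 address out of the ifconfig output. As a rough illustration of the same idea, here is a minimal Python sketch. The function name, the regular expression, and the sample output are assumptions made for this example, and unlike the sed one-liner it skips any address in 127.0.0.0/8 rather than only 127.0.0.1.

```python
import re

# Matches both "inet addr:10.0.0.5" (older net-tools) and "inet 10.0.0.5".
# "inet6 ..." lines do not match because of the space required after "inet".
INET_RE = re.compile(r"inet (?:addr:)?((?:[0-9]{1,3}\.){3}[0-9]{1,3})")


def first_non_loopback_ip(ifconfig_output):
    """Return the first IPv4 address outside 127.0.0.0/8, or None."""
    for match in INET_RE.finditer(ifconfig_output):
        ip = match.group(1)
        if not ip.startswith("127."):
            return ip
    return None


# Hypothetical ifconfig output, trimmed to the relevant lines.
SAMPLE = """\
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
eth0      Link encap:Ethernet  HWaddr 00:11:22:33:44:55
          inet addr:192.168.1.20  Bcast:192.168.1.255  Mask:255.255.255.0
"""

print(first_non_loopback_ip(SAMPLE))
```

On a host with several interfaces it is worth checking that the address picked is the one carbon actually listens on before writing it into diamond.conf.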
# service carbon-cache restart
# service httpd restart
# service diamond restart
Grafana's main job is presenting data, and it can sit on top of any backend service that provides time series. Here we use Graphite to supply Grafana with data.
# yum install -y nodejs
# rpm -ivh https://grafanarel.s3.amazonaws.com/builds/grafana-2.5.0-1.x86_64.rpm
# sudo /sbin/chkconfig --add grafana-server
# sed -i 's/^;http_port = 3000$/http_port = 10001/g' /etc/grafana/grafana.ini
# sudo service grafana-server start
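Grafana provides a rich REST API. Besides using Grafana directly as the presentation layer, we can also use the REST API to embed Grafana graphs in our own applications. Below we use the REST API to add a datasource to Grafana.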
# curl -i 'http://admin:admin@localhost:10001/api/datasources' -X POST -H "Accept: application/json" -H "Content-Type: application/json" -d '{"name": "graphite", "type": "graphite", "url": "http://localhost:10000", "access": "proxy", "basicAuth": false}'
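For scripts that prefer not to shell out to curl, the same request can be built with Python's standard library. This is only a sketch under the assumptions of this article (Grafana on port 10001 with the default admin:admin credentials, Graphite on port 10000); the function name build_datasource_request is made up for the example, and the code targets Python 3 rather than the Python 2.6 that ships with CentOS 6.5.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url, user, password, graphite_url):
    """Build (but do not send) the POST request that registers Graphite."""
    payload = {
        "name": "graphite",
        "type": "graphite",
        "url": graphite_url,
        "access": "proxy",
        "basicAuth": False,
    }
    # HTTP basic auth, equivalent to the user:password@ part of the curl URL.
    credentials = ("%s:%s" % (user, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        grafana_url + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_datasource_request(
        "http://localhost:10001", "admin", "admin", "http://localhost:10000"
    )
    print(req.get_method(), req.full_url)
    # Sending it requires a running Grafana instance:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status, resp.read().decode("utf-8"))
```

Separating request construction from sending makes it easy to inspect the payload before touching a live Grafana server.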
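Diamond is written in Python, but CentOS 6.5 ships an old Python version (2.6), so the community Ceph collector fails when used as-is: it calls subprocess.check_output, which was only added in Python 2.7. A small change fixes this.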
    def _get_stats_from_socket(self, name):
        """Return the parsed JSON data returned when ceph is told to
        dump the stats from the named socket.

        In the event of an error, the exception is logged, and
        an empty result set is returned.
        """
        try:
            # Original call, not available on Python 2.6:
            #json_blob = subprocess.check_output(
            #    [self.config['ceph_binary'],
            #     '--admin-daemon',
            #     name,
            #     'perf',
            #     'dump',
            #     ])
            cmd = [
                self.config['ceph_binary'],
                '--admin-daemon',
                name,
                'perf',
                'dump',
            ]
            # Popen + communicate() works on Python 2.6.
            process = subprocess.Popen(cmd, stdout=subprocess.PIPE)
            json_blob = process.communicate()[0]
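The Popen replacement above can also be wrapped in a small helper. The following is a minimal sketch, not part of Diamond: the name check_output_compat is hypothetical, and unlike the patch above it also raises CalledProcessError on a non-zero exit code, which is what subprocess.check_output does on Python 2.7 and later.

```python
import subprocess
import sys


def check_output_compat(cmd):
    """Run cmd and return its stdout, roughly like subprocess.check_output."""
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    output = process.communicate()[0]
    if process.returncode != 0:
        # Mirror check_output: a failing command becomes an exception.
        raise subprocess.CalledProcessError(process.returncode, cmd)
    return output


# Usage: run a trivial command with the current interpreter.
print(check_output_compat([sys.executable, "-c", "print('ok')"]))
```

With such a helper, an existing except subprocess.CalledProcessError clause keeps its meaning, because a failing ceph command is reported the same way it would be with check_output.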
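When operating Ceph in production, ceph osd perf is a very important command. It shows latency information for the disks in the cluster, and watching how the values change helps identify disks with performance problems. By design, each Diamond agent collects only the metrics of its own host, but ceph osd perf already reports every OSD in the cluster, so it is enough to enable this collector on a single node. Add a new class at the end of ceph.py.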
class CephOsdCollector(CephCollector):

    def _get_stats(self):
        """Return the parsed JSON data returned by `ceph osd perf`.

        In the event of an error, the exception is logged, and
        an empty result set is returned.
        """
        cmd = [
            self.config['ceph_binary'],
            'osd',
            'perf',
            '--format=json',
        ]
        try:
            # Popen instead of check_output, which Python 2.6 lacks.
            process = subprocess.Popen(cmd, stdout=subprocess.PIPE)
            json_blob = process.communicate()[0]
        except OSError as err:
            # Popen raises OSError when the ceph binary cannot be executed.
            self.log.info('Could not get stats from ceph osd perf: %s', err)
            self.log.exception('Could not get stats from ceph osd perf')
            return {}

        try:
            json_data = json.loads(json_blob)
        except Exception as err:
            self.log.info('Could not parse stats from ceph osd perf: %s', err)
            self.log.exception('Could not parse stats from ceph osd perf')
            return {}

        return json_data

    def _publish_stats(self, stats):
        """Given a stats dictionary from _get_stats, publish the
        individual values.
        """
        # .get() tolerates the empty dict returned on errors.
        for perf in stats.get('osd_perf_infos', []):
            counter_prefix = 'osd.' + str(perf['id'])
            for stat_name, stat_value in flatten_dictionary(
                perf['perf_stats'],
                prefix=counter_prefix,
            ):
                self.log.info('stat_name is %s', stat_name)
                self.log.info('stat_value is %s', stat_value)
                self.publish_gauge(stat_name, stat_value)

    def collect(self):
        """
        Collect stats
        """
        self.log.info('in ceph osd collector')
        stats = self._get_stats()
        self._publish_stats(stats)
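To see which metric names the collector above ends up publishing, the following standalone sketch runs the same flattening logic on a sample payload. The flatten function is a simplified stand-in for the flatten_dictionary helper in ceph.py, the sample is modeled on the osd_perf_infos structure that the collector reads, and the latency values are invented for illustration.

```python
import json


def flatten(d, prefix=None, sep="."):
    """Yield (dotted_name, value) pairs for a possibly nested dict."""
    for key, value in sorted(d.items()):
        name = key if prefix is None else prefix + sep + key
        if isinstance(value, dict):
            for item in flatten(value, name, sep):
                yield item
        else:
            yield name, value


def osd_perf_metrics(json_blob):
    """Map 'osd.<id>.<stat>' names to values from `ceph osd perf` JSON."""
    data = json.loads(json_blob)
    metrics = {}
    for perf in data.get("osd_perf_infos", []):
        prefix = "osd." + str(perf["id"])
        for name, value in flatten(perf["perf_stats"], prefix):
            metrics[name] = value
    return metrics


# Hypothetical sample; real output comes from `ceph osd perf --format=json`.
SAMPLE = """{"osd_perf_infos": [
  {"id": 0, "perf_stats": {"commit_latency_ms": 12, "apply_latency_ms": 3}},
  {"id": 1, "perf_stats": {"commit_latency_ms": 40, "apply_latency_ms": 9}}
]}"""

for name, value in sorted(osd_perf_metrics(SAMPLE).items()):
    print(name, value)
```

Each OSD therefore contributes one gauge per latency field, which is convenient for graphing a single suspicious disk against the rest of the cluster in Grafana.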
# cat << EOF | tee /etc/diamond/collectors/ceph.conf
[collectors]
[[CephCollector]]
enabled = True
[[CephOsdCollector]]
enabled = True
EOF
# service diamond restart