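0 Test Environment
Cluster software environment for this test:
Hadoop-1.2.1, Zookeeper-3.4.6, HBase-0.94.8, JDK 7, CentOS 7.
Cluster hardware environment for this test:
Two physical PCs, each with two quad-core CPUs, 32 GB of memory, and a 10 TB hard disk;
Three physical PCs, each with two quad-core CPUs, 8 GB of memory, and a 10 TB hard disk.
1 Test Plans and Result Analysis
Note: in each test below, every thread reads its own data file, so the number of threads equals the number of data files.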
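The one-thread-per-file arrangement in the note can be sketched roughly as follows. This is only an illustration under assumptions: the class name PerFileLoader and its helpers are hypothetical, and the worker merely counts lines where a real loader would build and send HBase writes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical loader skeleton: one worker thread per input file. */
final class PerFileLoader {

    /** Counts the records in one file; a real loader would turn each line into a write here. */
    static long loadFile(Path file) throws IOException {
        long records = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            while (reader.readLine() != null) {
                records++;
            }
        }
        return records;
    }

    /** Starts one thread per file (the list must not be empty) and returns the total record count. */
    static long loadAll(List<Path> files) {
        ExecutorService pool = Executors.newFixedThreadPool(files.size());
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (final Path file : files) {
                results.add(pool.submit(new Callable<Long>() {
                    @Override
                    public Long call() throws IOException {
                        return loadFile(file);
                    }
                }));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // waits for that file's worker to finish
            }
            return total;
        } catch (Exception e) {
            throw new RuntimeException("loading failed", e);
        } finally {
            pool.shutdown();
        }
    }

    /** Demo helper only: writes the given lines to a temporary file. */
    static Path tempFile(String... lines) {
        try {
            Path file = Files.createTempFile("records", ".txt");
            Files.write(file, Arrays.asList(lines), StandardCharsets.UTF_8);
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        List<Path> files = Arrays.asList(tempFile("r1", "r2"), tempFile("r3"));
        System.out.println("records read: " + loadAll(files));
    }
}
```

Sizing the pool to the number of files keeps the thread count and the file count equal, which is the property the note relies on.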
1.1 WAL
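Test goal: measure how much the WAL affects write performance.
Test rationale: disabling the WAL is generally not recommended, although doing so does improve performance. Before writing data, HBase first persists the operation to the WAL so that, after a failure, it can replay the WAL records to recover data that had not yet been persisted.
Test conditions: 4 threads + 12 MB Write Buffer Size + 120 RPC Handlers
Test data sample: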
京EA31276 1640751143 CAM14357485 2 2 5 0 1 0 2016-01-2309:25:01 20160123092501023 JGJ59034 0 0
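Test results:
WAL state | Time (s) | Throughput (MB/s) |
WAL enabled | 408 | 0.56 |
WAL disabled | 41 | 5.6 |
Test conclusion: writes with the WAL disabled are roughly 10 times faster than with it enabled (41 s versus 408 s), so disabling the WAL is recommended only when partial data loss is tolerable.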
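The trade-off behind this conclusion can be illustrated with a toy model of write-ahead logging. This is a sketch, not HBase code: MiniWalStore and all of its members are hypothetical names, and an in-memory list stands in for the durable log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of write-ahead logging; it is not HBase code. */
final class MiniWalStore {
    private final List<String[]> wal = new ArrayList<>();          // stands in for the durable log
    private final Map<String, String> memstore = new HashMap<>();  // volatile in-memory data

    /** Applies a write; when writeToWal is true the edit is logged first. */
    void put(String row, String value, boolean writeToWal) {
        if (writeToWal) {
            wal.add(new String[] {row, value});
        }
        memstore.put(row, value);
    }

    /** Simulates a crash: everything held only in memory is lost. */
    void crash() {
        memstore.clear();
    }

    /** Replays the log in order to rebuild the in-memory state. */
    void recover() {
        for (String[] edit : wal) {
            memstore.put(edit[0], edit[1]);
        }
    }

    String get(String row) {
        return memstore.get(row);
    }

    public static void main(String[] args) {
        MiniWalStore store = new MiniWalStore();
        store.put("row1", "logged", true);
        store.put("row2", "unlogged", false);
        store.crash();
        store.recover();
        System.out.println(store.get("row1")); // prints "logged"
        System.out.println(store.get("row2")); // prints "null"
    }
}
```

The write that skipped the log is cheaper because it does no logging work, but it is also the one that disappears after the simulated crash.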
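1.2 Optimal HBase RPC Handler Parameter Test
Test goal: find the most suitable number of RPC handlers for this cluster.
Test rationale: this setting defines the number of RPC handlers on each Region Server. The Region Server receives and processes external requests through its RPC handlers, so raising the handler count can, to some extent, increase HBase's capacity to accept requests. More handlers are not always better, however; the right number depends on the node's hardware.
Test conditions: auto flush and WAL disabled, Write Buffer Size 12 MB
Parameter configuration:
Edit the hbase.regionserver.handler.count setting in hbase-site.xml, for example:
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>100</value>
</property>
Test data sample: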
京EA31276 1640751143 CAM14357485 2 2 5 0 1 0 2016-01-2309:25:01 20160123092501023 JGJ59034 0 0
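Test results:
Threads | Time (s) | Records written | RPC Handler value | Throughput (MB/s) |
4 | 49 | 2.3 million | 10 | 4.7 |
4 | 46 | 2.3 million | 50 | 5 |
4 | 44 | 2.3 million | 100 | 5.2 |
4 | 41 | 2.3 million | 120 | 5.6 |
4 | 45 | 2.3 million | 150 | 5.1 |
4 | 45 | 2.3 million | 200 | 5.1 |
Test conclusion: in this test environment, adding RPC handlers improves write performance while the count is below 120; above 120, performance starts to decline. The optimal RPC handler count for HBase on this cluster is therefore about 120.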
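The table above reports MB/s, while later sections report records per second. A small sketch like the following converts the measured run times into records per second and picks the fastest run. The class and method names are hypothetical; the figures are copied from the results table.

```java
/** Compares the measured runs; the figures are copied from the results table. */
final class HandlerCountResults {
    static final long RECORDS = 2_300_000L; // records written in every run

    // {RPC handler count, elapsed seconds} for each run.
    static final int[][] RUNS = {
        {10, 49}, {50, 46}, {100, 44}, {120, 41}, {150, 45}, {200, 45}
    };

    /** Records written per second for a run that took the given number of seconds. */
    static double recordsPerSecond(int seconds) {
        return (double) RECORDS / seconds;
    }

    /** Returns the handler count of the fastest run. */
    static int bestHandlerCount() {
        int[] best = RUNS[0];
        for (int[] run : RUNS) {
            if (run[1] < best[1]) {
                best = run;
            }
        }
        return best[0];
    }

    public static void main(String[] args) {
        for (int[] run : RUNS) {
            System.out.printf("handlers=%d  %.0f records/s%n", run[0], recordsPerSecond(run[1]));
        }
        System.out.println("best handler count: " + bestHandlerCount());
    }
}
```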
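1.3 Optimal HBase Write Buffer Size Parameter Test
Test goal: find the most suitable Write Buffer Size for this cluster.
Test rationale: the HBase client submits data to the Region Server only after the buffered data reaches the configured threshold. The benefit is fewer RPC round trips.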
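The effect of client-side buffering on the number of RPCs can be sketched with a toy model. This is an illustration under assumptions, not the HBase client: WriteBufferSketch and its members are hypothetical names, and flushing once the buffered bytes exceed the threshold is the sketch's own choice.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy client-side write buffer; a stand-in for the idea, not the HBase client. */
final class WriteBufferSketch {
    private final long bufferSizeBytes;
    private final List<byte[]> pending = new ArrayList<>();
    private long pendingBytes = 0;
    private int rpcCalls = 0;

    WriteBufferSketch(long bufferSizeBytes) {
        this.bufferSizeBytes = bufferSizeBytes;
    }

    /** Buffers one record and flushes once the buffered bytes exceed the threshold. */
    void put(byte[] record) {
        pending.add(record);
        pendingBytes += record.length;
        if (pendingBytes > bufferSizeBytes) {
            flush();
        }
    }

    /** Sends everything buffered so far as a single simulated RPC. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        rpcCalls++;
        pending.clear();
        pendingBytes = 0;
    }

    int rpcCalls() {
        return rpcCalls;
    }

    /** Writes `records` records of `recordSize` bytes and returns the number of simulated RPCs. */
    static int simulate(long bufferSizeBytes, int records, int recordSize) {
        WriteBufferSketch buffer = new WriteBufferSketch(bufferSizeBytes);
        for (int i = 0; i < records; i++) {
            buffer.put(new byte[recordSize]);
        }
        buffer.flush(); // send whatever is left at the end
        return buffer.rpcCalls();
    }

    public static void main(String[] args) {
        System.out.println(simulate(10_000, 1000, 100)); // prints 10
        System.out.println(simulate(0, 1000, 100));      // prints 1000
    }
}
```

With a zero-byte buffer every record triggers its own call, which mimics auto flush; a 10,000-byte buffer cuts the same 1,000 writes down to 10 calls.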
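Test conditions: auto flush and WAL disabled, using the following statements respectively
table.setAutoFlush(false); // disable auto flush
put.setWriteToWAL(false); // disable the WAL
Test command: java -jar SendData_60L.jar <thread count> <BufferSize> <flushSize> <table name>
Test data sample: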
京EA31276 2016-01-23 09:25;0 116.7739757 39.9181911 ;1 116.1156204 39.3061622 ;2 116.2767457 39.6845126 ;3 116.68283 39.785515 ;4 116.5265639 39.1413711 ;5 116.5716852 39.7667081 ;6 116.265372 39.4464143 ;7 116.4854542 39.2365814 ;8 116.8428374 39.2240418 ;9 116.2548630 39.6447603 ;10 116.118823 39.964609 ;11 116.4857220 39.3285316 ;12 116.6228221 39.2873718 ;13 116.3668650 39.1239610 ;14 116.1466091 39.7767223 ;15 116.3834822 39.5707098 ;16 116.4568834 39.7347751 ;17 116.1185462 39.44076 ;18 116.9274841 39.3211428 ;19 116.9648843 39.6537633 ;20 116.2809645 39.6609038 ;21 116.6651957 39.4583201 ;22 116.4475937 39.6767574 ;23 116.6229150 39.6681265 ;24 116.8879526 39.94014 ;25 116.9352276 39.166974 ;26 116.3062773 39.4250674 ;27 116.5362078 39.5918600 ;28 116.655801 39.3478595 ;29 116.7911087 39.3105966 ;30 116.8989259 39.3485116 ;31 116.3451064 39.6591314 ;32 116.3620885 39.8758627 ;33 116.6254488 39.262504 ;34 116.6791192 39.5431246 ;35 116.706863 39.8259232 ;36 116.5606280 39.6052318 ;37 116.8849616 39.9964014 ;38 116.400404 39.6563677 ;39 116.5915861 39.3331378 ;40 116.2414020 39.5078832 ;41 116.1466572 39.2394249 ;42 116.5861500 39.7719353 ;43 116.4466894 39.7355696 ;44 116.1568285 39.5417675 ;45 116.5938800 39.4523919 ;46 116.9459531 39.9213514 ;47 116.9611731 39.7998314 ;48 116.9873481 39.2377567 ;49 116.8488279 39.8495391 ;50 116.4287103 39.4727894 ;51 116.7807852 39.2478749 ;52 116.4720670 39.189716 ;53 116.5181192 39.6537954 ;54 116.5004762 39.4856760 ;55 116.5962549 39.1443630 ;56 116.2558550 39.4980812 ;57 116.675138 39.6755529 ;58 116.289822 39.2409512 ;59 116.906777 39.7238894
Test results:
Threads | Buffer Size (MB) | Flush Size (records) | Throughput (records/s) |
4 | 5 | 10000 | 240240 |
4 | 6 | 10000 | 205920 |
4 | 7 | 10000 | 240240 |
4 | 8 | 10000 | 205920 |
4 | 9 | 10000 | 240240 |
4 | 10 | 10000 | 205920 |
4 | 11 | 10000 | 205920 |
4 | 12 | 10000 | 240240 |
4 | 15 | 10000 | 205920 |
4 | 20 | 10000 | 205920 |
5 | 5 | 10000 | 300300 |
5 | 6 | 10000 | 257400 |
5 | 7 | 10000 | 257400 |
5 | 8 | 10000 | 225225 |
5 | 9 | 10000 | 257400 |
5 | 10 | 10000 | 257400 |
5 | 11 | 10000 | 225225 |
5 | 12 | 10000 | 225225 |
5 | 15 | 10000 | 257400 |
5 | 20 | 10000 | 257400 |
8 | 5 | 10000 | 360360 |
8 | 6 | 10000 | 320320 |
8 | 7 | 10000 | 360360 |
8 | 8 | 10000 | 320320 |
8 | 9 | 10000 | 320320 |
8 | 10 | 10000 | 320320 |
8 | 11 | 10000 | 360360 |
8 | 12 | 10000 | 320320 |
8 | 15 | 10000 | 320320 |
8 | 20 | 10000 | 320320 |
9 | 5 | 10000 | 324324 |
9 | 6 | 10000 | 324324 |
9 | 7 | 10000 | 324324 |
9 | 8 | 10000 | 324324 |
9 | 9 | 10000 | 360360 |
9 | 10 | 10000 | 360360 |
9 | 11 | 10000 | 324324 |
9 | 12 | 10000 | 360360 |
9 | 15 | 10000 | 360360 |
9 | 20 | 10000 | 294840 |
10 | 5 | 10000 | 360360 |
10 | 6 | 10000 | 360360 |
10 | 7 | 10000 | 360360 |
10 | 8 | 10000 | 360360 |
10 | 9 | 10000 | 360360 |
10 | 10 | 10000 | 400400 |
10 | 11 | 10000 | 400400 |
10 | 12 | 10000 | 400400 |
10 | 15 | 10000 | 327600 |
10 | 20 | 10000 | 360360 |
11 | 5 | 10000 | 360360 |
11 | 6 | 10000 | 396396 |
11 | 7 | 10000 | 396396 |
11 | 8 | 10000 | 360360 |
11 | 9 | 10000 | 360360 |
11 | 10 | 10000 | 396396 |
12 | 5 | 10000 | 432432 |
12 | 6 | 10000 | 393120 |
12 | 7 | 10000 | 393120 |
12 | 8 | 10000 | 432432 |
12 | 9 | 10000 | 432432 |
12 | 10 | 10000 | 393120 |
14 | 10 | 10000 | 388080 |
14 | 9 | 10000 | 420420 |
14 | 8 | 10000 | 388080 |
14 | 7 | 10000 | 420420 |
14 | 6 | 10000 | 420420 |
14 | 5 | 10000 | 420420 |
20 | 5 | 10000 | 400400 |
20 | 6 | 10000 | 423952 |
20 | 7 | 10000 | 379326 |
20 | 8 | 10000 | 379326 |
20 | 9 | 10000 | 379326 |
20 | 10 | 10000 | 423952 |
20 | 11 | 10000 | 379326 |
19 | 5 | 10000 | 380380 |
19 | 6 | 10000 | 380380 |
19 | 7 | 10000 | 402755 |
19 | 8 | 10000 | 402755 |
19 | 9 | 10000 | 402755 |
19 | 10 | 10000 | 402755 |
19 | 11 | 10000 | 456456 |
19 | 15 | 10000 | 402755 |
18 | 5 | 10000 | 432432 |
18 | 6 | 10000 | 432432 |
18 | 7 | 10000 | 432432 |
18 | 8 | 10000 | 405405 |
18 | 9 | 10000 | 405405 |
18 | 10 | 10000 | 405405 |
18 | 11 | 10000 | 405405 |
17 | 5 | 10000 | 437580 |
17 | 6 | 10000 | 408408 |
17 | 7 | 10000 | 437580 |
17 | 8 | 10000 | 437580 |
17 | 9 | 10000 | 408408 |
17 | 10 | 10000 | 408408 |
16 | 5 | 10000 | 443520 |
16 | 6 | 10000 | 411840 |
16 | 7 | 10000 | 411840 |
16 | 8 | 10000 | 443520 |
16 | 9 | 10000 | 384384 |
16 | 10 | 10000 | 411840 |
15 | 5 | 10000 | 415800 |
15 | 6 | 10000 | 450450 |
15 | 7 | 10000 | 443520 |
15 | 8 | 10000 | 450450 |
15 | 9 | 10000 | 491400 |
15 | 10 | 10000 | 415800 |
13 | 5 | 10000 | 425880 |
13 | 6 | 10000 | 425880 |
13 | 7 | 10000 | 390390 |
13 | 8 | 10000 | 360360 |
13 | 9 | 10000 | 425880 |
13 | 10 | 10000 | 390390 |
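Test conclusion: the results show that, on this cluster, 15 threads with bufferSize set to 9 MB gives the best result (491,400 records/s).
1.4 HBase Cluster RegionServer Count Test
Test goal: measure how the number of RegionServers affects write performance.
Test rationale: write pressure generally concentrates on the RegionServers, so for a fixed write volume, adding RegionServers reduces the load on each server.
Test conditions: 4 threads + 10 MB Write Buffer Size + 120 RPC Handlers.
Test data sample: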
京EA31276 2016-01-23 09:25;0 116.7739757 39.9181911 ;1 116.1156204 39.3061622 ;2 116.2767457 39.6845126 ;3 116.68283 39.785515 ;4 116.5265639 39.1413711 ;5 116.5716852 39.7667081 ;6 116.265372 39.4464143 ;7 116.4854542 39.2365814 ;8 116.8428374 39.2240418 ;9 116.2548630 39.6447603 ;10 116.118823 39.964609 ;11 116.4857220 39.3285316 ;12 116.6228221 39.2873718 ;13 116.3668650 39.1239610 ;14 116.1466091 39.7767223 ;15 116.3834822 39.5707098 ;16 116.4568834 39.7347751 ;17 116.1185462 39.44076 ;18 116.9274841 39.3211428 ;19 116.9648843 39.6537633 ;20 116.2809645 39.6609038 ;21 116.6651957 39.4583201 ;22 116.4475937 39.6767574 ;23 116.6229150 39.6681265 ;24 116.8879526 39.94014 ;25 116.9352276 39.166974 ;26 116.3062773 39.4250674 ;27 116.5362078 39.5918600 ;28 116.655801 39.3478595 ;29 116.7911087 39.3105966 ;30 116.8989259 39.3485116 ;31 116.3451064 39.6591314 ;32 116.3620885 39.8758627 ;33 116.6254488 39.262504 ;34 116.6791192 39.5431246 ;35 116.706863 39.8259232 ;36 116.5606280 39.6052318 ;37 116.8849616 39.9964014 ;38 116.400404 39.6563677 ;39 116.5915861 39.3331378 ;40 116.2414020 39.5078832 ;41 116.1466572 39.2394249 ;42 116.5861500 39.7719353 ;43 116.4466894 39.7355696 ;44 116.1568285 39.5417675 ;45 116.5938800 39.4523919 ;46 116.9459531 39.9213514 ;47 116.9611731 39.7998314 ;48 116.9873481 39.2377567 ;49 116.8488279 39.8495391 ;50 116.4287103 39.4727894 ;51 116.7807852 39.2478749 ;52 116.4720670 39.189716 ;53 116.5181192 39.6537954 ;54 116.5004762 39.4856760 ;55 116.5962549 39.1443630 ;56 116.2558550 39.4980812 ;57 116.675138 39.6755529 ;58 116.289822 39.2409512 ;59 116.906777 39.7238894
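Test results:
RegionServer count | Throughput (10,000 records/s) |
2 | 13.3 |
3 | 23.2 |
4 | 26.5 |
Test conclusion: adding RegionServers increases write speed, although the gain from 3 to 4 servers (about 14%) is smaller than the gain from 2 to 3 (about 74%).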
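The scaling trend can be quantified with a little arithmetic on the table above. The sketch below computes the speedup relative to the two-server run and the throughput per server. The class and method names are hypothetical; the figures are copied from the results table.

```java
/** Scaling arithmetic on the measured RegionServer results. */
final class RegionServerScaling {
    // {RegionServer count, throughput in units of 10,000 records/s}.
    static final double[][] RESULTS = {{2, 13.3}, {3, 23.2}, {4, 26.5}};

    /** Throughput of the given run divided by the throughput of the baseline run. */
    static double speedup(double[] baseline, double[] run) {
        return run[1] / baseline[1];
    }

    /** Average throughput contributed by each RegionServer in the given run. */
    static double perServer(double[] run) {
        return run[1] / run[0];
    }

    public static void main(String[] args) {
        double[] baseline = RESULTS[0];
        for (double[] run : RESULTS) {
            System.out.printf("servers=%.0f  speedup=%.2f  per-server=%.2f%n",
                    run[0], speedup(baseline, run), perServer(run));
        }
    }
}
```

Doubling the servers from 2 to 4 roughly doubles throughput in this data, so per-server throughput stays close to its two-server value.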
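1.5 Reducing HBase Column Name Size
Test goal: measure how the space taken by column names affects write performance.
Test rationale: shortening column names reduces the total amount of information in each cell.
Test conditions: 8 threads + 11 MB Write Buffer Size + 120 RPC Handlers + 60 columns per row in the HBase table.
Test data sample: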
京EA31276 2016-01-23 09:25;0 116.7739757 39.9181911 ;1 116.1156204 39.3061622 ;2 116.2767457 39.6845126 ;3 116.68283 39.785515 ;4 116.5265639 39.1413711 ;5 116.5716852 39.7667081 ;6 116.265372 39.4464143 ;7 116.4854542 39.2365814 ;8 116.8428374 39.2240418 ;9 116.2548630 39.6447603 ;10 116.118823 39.964609 ;11 116.4857220 39.3285316 ;12 116.6228221 39.2873718 ;13 116.3668650 39.1239610 ;14 116.1466091 39.7767223 ;15 116.3834822 39.5707098 ;16 116.4568834 39.7347751 ;17 116.1185462 39.44076 ;18 116.9274841 39.3211428 ;19 116.9648843 39.6537633 ;20 116.2809645 39.6609038 ;21 116.6651957 39.4583201 ;22 116.4475937 39.6767574 ;23 116.6229150 39.6681265 ;24 116.8879526 39.94014 ;25 116.9352276 39.166974 ;26 116.3062773 39.4250674 ;27 116.5362078 39.5918600 ;28 116.655801 39.3478595 ;29 116.7911087 39.3105966 ;30 116.8989259 39.3485116 ;31 116.3451064 39.6591314 ;32 116.3620885 39.8758627 ;33 116.6254488 39.262504 ;34 116.6791192 39.5431246 ;35 116.706863 39.8259232 ;36 116.5606280 39.6052318 ;37 116.8849616 39.9964014 ;38 116.400404 39.6563677 ;39 116.5915861 39.3331378 ;40 116.2414020 39.5078832 ;41 116.1466572 39.2394249 ;42 116.5861500 39.7719353 ;43 116.4466894 39.7355696 ;44 116.1568285 39.5417675 ;45 116.5938800 39.4523919 ;46 116.9459531 39.9213514 ;47 116.9611731 39.7998314 ;48 116.9873481 39.2377567 ;49 116.8488279 39.8495391 ;50 116.4287103 39.4727894 ;51 116.7807852 39.2478749 ;52 116.4720670 39.189716 ;53 116.5181192 39.6537954 ;54 116.5004762 39.4856760 ;55 116.5962549 39.1443630 ;56 116.2558550 39.4980812 ;57 116.675138 39.6755529 ;58 116.289822 39.2409512 ;59 116.906777 39.7238894
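Test results:
Total column name size | Throughput (records/s) |
Before shortening column names (480 B) | 320320 |
After shortening column names (240 B) | 320320 |
Test conclusion: changing the space taken by column names has almost no effect on performance. A likely reason is that the two bytes saved per column are too small relative to the total size of a cell.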
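1.6 Value Size per Column
Test goal: measure how the size of each column value affects write performance.
Test conditions: 4 threads + 11 MB Write Buffer Size + 120 RPC Handlers + 60 columns per row in the HBase table.
Test data sample:
Size (0.026 KB)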
京EA31276 2016-01-2309:25:00 116.7739757 39.9181911 ;
Size (0.126 KB)
京EA31276 2016-01-2309:25:00 116.7739757QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ 39.9181911QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ ;
Size (0.526 KB)
京EA31276 2016-01-2309:25:00 116.7739757QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ39.9181911QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ;
Size (1.026 KB)
京EA31276 2016-01-2309:25:00 116.7739757QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ39.9181911QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ;
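Test results:
Column value size | Throughput (records/s) | Throughput (MB/s) |
0.026 KB | 240240 | 6.24 |
0.126 KB | 160160 | 20.6 |
0.526 KB | 80080 | 40.2 |
1.026 KB | 43680 | 43.7 |
Test conclusion: with the same number of columns, the larger each column value, the higher the write throughput in MB/s, even though the number of records written per second falls.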
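1.7 Number of Columns
Test goal: measure how the number of columns affects write performance.
Test rationale: for the same amount of stored value data per row, more columns require more additional data, such as timestamps and column identifiers.
Test conditions: 10 threads + 11 MB Write Buffer Size + 120 RPC Handlers + the same total size for every row in the HBase table.
Test data sample: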
京EA31276 2016-01-23 09:25;0 116.7739757 39.9181911 ;1 116.1156204 39.3061622 ;2 116.2767457 39.6845126 ;3 116.68283 39.785515 ;4 116.5265639 39.1413711 ;5 116.5716852 39.7667081 ;6 116.265372 39.4464143 ;7 116.4854542 39.2365814 ;8 116.8428374 39.2240418 ;9 116.2548630 39.6447603 ;10 116.118823 39.964609 ;11 116.4857220 39.3285316 ;12 116.6228221 39.2873718 ;13 116.3668650 39.1239610 ;14 116.1466091 39.7767223 ;15 116.3834822 39.5707098 ;16 116.4568834 39.7347751 ;17 116.1185462 39.44076 ;18 116.9274841 39.3211428 ;19 116.9648843 39.6537633 ;20 116.2809645 39.6609038 ;21 116.6651957 39.4583201 ;22 116.4475937 39.6767574 ;23 116.6229150 39.6681265 ;24 116.8879526 39.94014 ;25 116.9352276 39.166974 ;26 116.3062773 39.4250674 ;27 116.5362078 39.5918600 ;28 116.655801 39.3478595 ;29 116.7911087 39.3105966 ;30 116.8989259 39.3485116 ;31 116.3451064 39.6591314 ;32 116.3620885 39.8758627 ;33 116.6254488 39.262504 ;34 116.6791192 39.5431246 ;35 116.706863 39.8259232 ;36 116.5606280 39.6052318 ;37 116.8849616 39.9964014 ;38 116.400404 39.6563677 ;39 116.5915861 39.3331378 ;40 116.2414020 39.5078832 ;41 116.1466572 39.2394249 ;42 116.5861500 39.7719353 ;43 116.4466894 39.7355696 ;44 116.1568285 39.5417675 ;45 116.5938800 39.4523919 ;46 116.9459531 39.9213514 ;47 116.9611731 39.7998314 ;48 116.9873481 39.2377567 ;49 116.8488279 39.8495391 ;50 116.4287103 39.4727894 ;51 116.7807852 39.2478749 ;52 116.4720670 39.189716 ;53 116.5181192 39.6537954 ;54 116.5004762 39.4856760 ;55 116.5962549 39.1443630 ;56 116.2558550 39.4980812 ;57 116.675138 39.6755529 ;58 116.289822 39.2409512 ;59 116.906777 39.7238894
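Test results:
Columns | Stored column value size | Throughput (records/s) | Throughput (MB/s) |
60 | 0.026 KB | 360360 | 9.36 |
1 | 1.55 KB | 1201200 | 31.2 |
Test conclusion: for the same stored value size per row, the fewer the columns, the faster the write.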
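The per-column overhead mentioned in the rationale can be estimated with a rough calculation. The sketch below assumes the KeyValue serialization layout of HBase 0.9x, in which every cell carries its own key length, value length, row key, column family, qualifier, timestamp, and key type. The class name and the row key, family, and qualifier lengths are hypothetical values chosen only for illustration.

```java
/** Estimates serialized cell sizes, assuming the KeyValue layout of HBase 0.9x. */
final class CellSizeEstimate {
    // Fixed bytes per cell: key length (4) + value length (4) + row length (2)
    // + family length (1) + timestamp (8) + key type (1).
    static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

    /** Serialized size in bytes of a single cell. */
    static long cellSize(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return FIXED_OVERHEAD + rowLen + familyLen + qualifierLen + valueLen;
    }

    /** Serialized size in bytes of a row whose columns all have the same shape. */
    static long rowSize(int columns, int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return columns * cellSize(rowLen, familyLen, qualifierLen, valueLen);
    }

    public static void main(String[] args) {
        // Hypothetical shape: 20-byte row key, 1-byte family, 4-byte qualifier.
        System.out.println(rowSize(60, 20, 1, 4, 26));   // 60 columns of 26 bytes: prints 4260
        System.out.println(rowSize(1, 20, 1, 4, 1560));  // 1 column of 1560 bytes: prints 1605
    }
}
```

Under these assumed lengths, the 60-column row serializes to more than twice the size of the single-column row holding the same 1,560 bytes of values, because the row key and the fixed fields are repeated in every cell. The same arithmetic suggests why shortening each qualifier by a few bytes changes the total only slightly.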
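2 Test Summary
This round of testing relied mainly on tuning HBase cluster parameters, optimizing the table schema design, and making the most of the available hardware. Based on the results above, the best write performance of this cluster for data without special preprocessing is about 300,000 records/s; for specially preprocessed data, it reached 1.2 million records/s. Encouraging as this result is, it does not account for the time spent preprocessing the data. The current preprocessing method is still very inefficient, so improving data-processing efficiency is a key step toward truly achieving writes of a million records per second on this cluster.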