While deploying Druid, I ran into an exception that made startup fail. The affected nodes were historical and broker. The exception was:
2018-07-12T09:01:30,523 ERROR [main] io.druid.cli.CliHistorical - Error when starting up. Failing. com.google.inject.ProvisionException: Unable to provision, see the following errors: 1) Not enough direct memory. Please adjust -XX:MaxDirectMemorySize, druid.processing.buffer.sizeBytes, druid.processing.numThreads, or druid.processing.numMergeBuffers: maxDirectMemory[4,294,967,296], memoryNeeded[5,368,709,120] = druid.processing.buffer.sizeBytes[536,870,912] * (druid.processing.numMergeBuffers[2] + druid.processing.numThreads[7] + 1) at io.druid.guice.DruidProcessingModule.getIntermediateResultsPool(DruidProcessingModule.java:110) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> io.druid.guice.DruidProcessingModule) at io.druid.guice.DruidProcessingModule.getIntermediateResultsPool(DruidProcessingModule.java:110) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> io.druid.guice.DruidProcessingModule) while locating io.druid.collections.NonBlockingPool<java.nio.ByteBuffer> annotated with @io.druid.guice.annotations.Global() for the 2nd parameter of io.druid.query.groupby.GroupByQueryEngine.<init>(GroupByQueryEngine.java:81) at io.druid.guice.QueryRunnerFactoryModule.configure(QueryRunnerFactoryModule.java:88) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> io.druid.guice.QueryRunnerFactoryModule) while locating io.druid.query.groupby.GroupByQueryEngine for the 2nd parameter of io.druid.query.groupby.strategy.GroupByStrategyV1.<init>(GroupByStrategyV1.java:77) while locating io.druid.query.groupby.strategy.GroupByStrategyV1 for the 2nd parameter of io.druid.query.groupby.strategy.GroupByStrategySelector.<init>(GroupByStrategySelector.java:43) while locating io.druid.query.groupby.strategy.GroupByStrategySelector for the 1st parameter of 
io.druid.query.groupby.GroupByQueryQueryToolChest.<init>(GroupByQueryQueryToolChest.java:104) at io.druid.guice.QueryToolChestModule.configure(QueryToolChestModule.java:101) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> io.druid.guice.QueryRunnerFactoryModule) while locating io.druid.query.groupby.GroupByQueryQueryToolChest while locating io.druid.query.QueryToolChest annotated with @com.google.inject.multibindings.Element(setName=,uniqueId=80, type=MAPBINDER, keyType=java.lang.Class<? extends io.druid.query.Query>) at io.druid.guice.DruidBinders.queryToolChestBinder(DruidBinders.java:45) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> io.druid.guice.QueryRunnerFactoryModule -> com.google.inject.multibindings.MapBinder$RealMapBinder) while locating java.util.Map<java.lang.Class<? extends io.druid.query.Query>, io.druid.query.QueryToolChest> for the 1st parameter of io.druid.query.MapQueryToolChestWarehouse.<init>(MapQueryToolChestWarehouse.java:36) while locating io.druid.query.MapQueryToolChestWarehouse while locating io.druid.query.QueryToolChestWarehouse for the 1st parameter of io.druid.server.QueryLifecycleFactory.<init>(QueryLifecycleFactory.java:52) at io.druid.server.QueryLifecycleFactory.class(QueryLifecycleFactory.java:52) while locating io.druid.server.QueryLifecycleFactory for the 1st parameter of io.druid.server.QueryResource.<init>(QueryResource.java:113) at io.druid.server.QueryResource.class(QueryResource.java:78) while locating io.druid.server.QueryResource ...... 12 errors at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1028) ~[guice-4.1.0.jar:?] at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1050) ~[guice-4.1.0.jar:?] 
at io.druid.guice.LifecycleModule$2.start(LifecycleModule.java:132) ~[druid-api-0.12.1.jar:0.12.1] at io.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:101) [druid-services-0.12.1.jar:0.12.1] at io.druid.cli.ServerRunnable.run(ServerRunnable.java:50) [druid-services-0.12.1.jar:0.12.1] at io.druid.cli.Main.main(Main.java:116) [druid-services-0.12.1.jar:0.12.1]
The key part of the exception is this sentence:
Not enough direct memory. Please adjust -XX:MaxDirectMemorySize, druid.processing.buffer.sizeBytes, druid.processing.numThreads, or druid.processing.numMergeBuffers: maxDirectMemory[4,294,967,296], memoryNeeded[5,368,709,120] = druid.processing.buffer.sizeBytes[536,870,912] * (druid.processing.numMergeBuffers[2] + druid.processing.numThreads[7] + 1)
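It states why startup failed: not enough direct memory was allocated. The process needs 5,368,709,120 bytes of direct memory, but only 4,294,967,296 bytes are available.
It also points to the fix: set the -XX:MaxDirectMemorySize option in the JVM arguments (the message lists the druid.processing.* settings as alternative knobs).
Think that's all? The Druid developers actually went a bit further: the message also explains why this much direct memory is needed. Look at the formula it gives: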
memoryNeeded[5,368,709,120] =
    druid.processing.buffer.sizeBytes[536,870,912] * (
        druid.processing.numMergeBuffers[2]
        + druid.processing.numThreads[7]
        + 1)
I added the line breaks by hand to make it easier to read.
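To make the arithmetic concrete, here is a minimal sketch that plugs the numbers from the exception message into that formula. The function name direct_memory_needed is my own and is not part of Druid.

```python
def direct_memory_needed(buffer_size_bytes: int, num_merge_buffers: int, num_threads: int) -> int:
    """Direct memory required at startup, following the formula in the exception message."""
    return buffer_size_bytes * (num_merge_buffers + num_threads + 1)


# Values copied from the exception message.
needed = direct_memory_needed(536_870_912, 2, 7)
available = 4_294_967_296  # maxDirectMemory reported by the JVM
shortfall = needed - available

print(needed)     # 5368709120, i.e. 5 GiB
print(shortfall)  # 1073741824, i.e. 1 GiB short
```

So with these settings the node is exactly 1 GiB short of direct memory.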
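The configuration parameters used in the formula, such as druid.processing.buffer.sizeBytes and druid.processing.numMergeBuffers, can be found in the runtime.properties files of historical and middleManager. The values in square brackets are the ones currently configured by the user.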
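If you would rather check the numbers before starting a node, something like the following could read the three settings out of a runtime.properties file. This is only a sketch: parse_properties is a hypothetical helper that handles plain key=value lines, and SAMPLE is made-up file content whose numbers match the exception message.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


# Hypothetical file content; the numbers match the exception message.
SAMPLE = """
# Processing threads and buffers
druid.processing.buffer.sizeBytes=536870912
druid.processing.numMergeBuffers=2
druid.processing.numThreads=7
"""

props = parse_properties(SAMPLE)
size_bytes = int(props["druid.processing.buffer.sizeBytes"])
merge_buffers = int(props["druid.processing.numMergeBuffers"])
threads = int(props["druid.processing.numThreads"])

# Same formula as in the exception message.
needed = size_bytes * (merge_buffers + threads + 1)
print(needed)
```

The sketch assumes all three keys are set explicitly; if one is missing from the file, Druid falls back to its own defaults, which this code does not model.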
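As for how to fix it, most of what I found online suggests setting this in the druid.indexer.runner.javaOpts option of the middleManager's runtime.properties. I tried that and it did not work, presumably because that option only applies to the task JVMs launched by the middleManager, not to historical or broker.
I forgot to mention: the Druid version used here is 0.12.1.
What did work was adding the -XX:MaxDirectMemorySize option to both conf/druid/historical/jvm.config and conf/druid/broker/jvm.config. Pick the value based on the numbers in the exception message. Here is my historical/jvm.config: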
-server
-Xms8g
-Xmx8g
-XX:MaxDirectMemorySize=8G
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
After making the change I restarted, and the nodes started successfully, which shows the setting took effect. That's it.
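As a sanity check, a small script along these lines could confirm that the value in jvm.config covers what the formula asks for. It is a sketch under assumptions: max_direct_memory_bytes is a hypothetical helper, it only understands the k/m/g size suffixes, and JVM_CONFIG is an excerpt of the options from my historical/jvm.config.

```python
import re

# Multipliers for the size suffixes handled by this sketch.
UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def max_direct_memory_bytes(jvm_config_text: str):
    """Return the -XX:MaxDirectMemorySize value in bytes, or None if the option is absent."""
    match = re.search(r"-XX:MaxDirectMemorySize=(\d+)([kKmMgG]?)", jvm_config_text)
    if match is None:
        return None
    return int(match.group(1)) * UNITS[match.group(2).lower()]


# Excerpt of the options from historical/jvm.config.
JVM_CONFIG = """-server
-Xms8g
-Xmx8g
-XX:MaxDirectMemorySize=8G
"""

needed = 536_870_912 * (2 + 7 + 1)  # formula from the exception message
configured = max_direct_memory_bytes(JVM_CONFIG)

print(configured)            # 8589934592
print(configured >= needed)  # True
```

With 8G configured there is room to spare above the 5 GiB the formula requires.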