Reposted

How to Handle Dubbo Call Timeouts?

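Dubbo is an RPC framework open-sourced by Alibaba. Because it is built around interface-based development and supports load balancing, cluster fault tolerance, version control, and similar features, many internet companies use it today.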

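This article looks at how to configure timeouts and how to handle them once they occur. Dubbo lets you set a timeout at three levels:

Per method
On the provider side
On the consumer side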
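
As a rough illustration of how these levels interact: in Dubbo's documented override order, a method-level setting beats an interface-level one, and at the same level the consumer's value beats the provider's. The sketch below models only that lookup in plain Java. `TimeoutResolver` and its parameters are hypothetical names rather than Dubbo API, and the 1000 ms fallback is assumed to match Dubbo's default timeout.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical model of the override order; none of these names are Dubbo API.
class TimeoutResolver {
    // Assumed fallback when no level is configured.
    static final int DEFAULT_TIMEOUT_MS = 1000;

    /** Each argument is a timeout in ms, or null when that level is not configured. */
    static int resolve(Integer consumerMethod, Integer providerMethod,
                       Integer consumerInterface, Integer providerInterface) {
        // Highest priority first: method level before interface level,
        // and consumer before provider within the same level.
        List<Integer> byPriority = Arrays.asList(
                consumerMethod, providerMethod, consumerInterface, providerInterface);
        return byPriority.stream()
                .filter(Objects::nonNull)
                .findFirst()
                .orElse(DEFAULT_TIMEOUT_MS);
    }
}
```

For example, `TimeoutResolver.resolve(null, 500, 2000, 3000)` picks the provider's method-level value even though the consumer configured an interface-level timeout.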

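See the official Dubbo documentation for the exact configuration. What happens after a Dubbo call times out? As far as I know, there are two outcomes: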

TimeoutException
The timeout response finally returned at xxx

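This looks reasonable enough, but there is a catch: after the call times out, the provider keeps executing. How should that be handled? To demonstrate a timeout, start with a simple service: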

@Service(version = "1.0")
@Slf4j
public class DubboDemoServiceImpl implements DubboDemoService {
    public String sayHello(String name) {
        try {
            Thread.sleep(3000);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }

        String result = "hello: " + name;
        log.info("Result: {}" , result);
        
        return result;
    }
}

The service is trivial: it returns a string after three seconds. Next, a controller that calls it:

@RestController
@RequestMapping
public class DubboDemoController {

    @Reference(url = "dubbo://127.0.0.1:22888?timeout=2000", version = "1.0")
    private DubboDemoService demoService;


    @GetMapping
    public ResponseEntity<String> sayHello(@RequestParam("name") String name){
        return ResponseEntity.ok(demoService.sayHello(name));
    }
}

The controller connects to DubboDemoService directly (dubbo://127.0.0.1:22888?timeout=2000). Throughout the demo, the timeout is set by the timeout parameter in that URL.

Consumer Timeout Handling

As mentioned above, when a call times out the client receives a TimeoutException. The provider's sayHello implementation sleeps for 3 seconds:

public String sayHello(String name) {
        try {
            Thread.sleep(3000);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        ...
}

The timeout specified on the connection is 2000 ms, so the call is bound to fail with a TimeoutException:

There was an unexpected error (type=Internal Server Error, status=500).
Invoke remote method timeout. method: sayHello, provider: dubbo://127.0.0.1:22888/com.example.dubbo.dubbodemo.service.DubboDemoService?application=dubbo-demo&default.check=false&default.lazy=false&default.sticky=false&dubbo=2.0.2&interface=com.example.dubbo.dubbodemo.service.DubboDemoService&lazy=false&methods=sayHello&pid=28662&qos.enable=false&register.ip=192.168.0.103&remote.application=&revision=1.0&side=consumer&sticky=false&timeout=2000&timestamp=1571800026289&version=1.0, cause: Waiting server-side response timeout. start time: 2019-10-23 11:13:00.745, end time: 2019-10-23 11:13:02.751, client elapsed: 5 ms, server elapsed: 2000 ms, timeout: 2000 ms, request: Request [id=4, version=2.0.2, twoway=true, event=false, broken=false, data=RpcInvocation [methodName=sayHello, parameterTypes=[class java.lang.String], arguments=[name], attachments={path=com.example.dubbo.dubbodemo.service.DubboDemoService, interface=com.example.dubbo.dubbodemo.service.DubboDemoService, version=1.0, timeout=2000}]], channel: /192.168.0.103:56446 -> /192.168.0.103:22888

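Handling the timeout on the client is fairly simple. The exception is raised and can be caught, so whether to roll back or do nothing is entirely up to the developer.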

try{
    return ResponseEntity.ok(demoService.sayHello(name));
}catch (RpcException te){
     //do something...
    log.error("consumer", te);
    return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR.value()).body("");
}

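The real difficulty is the timeout on the provider side.

Provider Timeout Handling

Handling the provider is not as simple as handling the client. The provider receives no exception and its thread is not interrupted. As a result, the consumer times out and rolls back its data while the provider keeps running and eventually completes its insert, leaving the data inconsistent.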

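In the demo project the provider method sleeps for 3000 ms and the consumer's timeout parameter is 2000 ms. The timeout fires 2000 ms after the call starts, but the provider's sayHello method is not interrupted and logs Result: hello: xx another 1000 ms later.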
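
The same effect can be reproduced without Dubbo. The following is a minimal sketch in plain Java, assuming a single-thread executor stands in for the provider and a timed `Future.get` stands in for the consumer's wait. `TimeoutDemo` and its method are hypothetical names used only for this illustration.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the consumer/provider pair; no Dubbo involved.
class TimeoutDemo {
    /** Returns true if the slow task still completed after the caller timed out. */
    static boolean workFinishesAfterCallerTimeout(long workMs, long timeoutMs) {
        ExecutorService provider = Executors.newSingleThreadExecutor();
        AtomicBoolean finished = new AtomicBoolean(false);
        Future<String> reply = provider.submit(() -> {
            Thread.sleep(workMs);   // stands in for the slow business logic
            finished.set(true);     // stands in for the committed insert
            return "hello";
        });
        // Accept no new tasks, but let the submitted one run to completion.
        provider.shutdown();
        try {
            try {
                reply.get(timeoutMs, TimeUnit.MILLISECONDS);
                return false;       // the reply arrived in time, so no timeout occurred
            } catch (TimeoutException e) {
                // The caller gives up here, but nothing told the worker to stop.
            }
            provider.awaitTermination(workMs + 1000, TimeUnit.MILLISECONDS);
            return finished.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `TimeoutDemo.workFinishesAfterCallerTimeout(3000, 2000)` mirrors the demo's numbers: the caller sees a timeout, yet the task's side effect still happens.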

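Clearly, keeping the data consistent requires the provider's work to be aborted or rolled back after a timeout. How can that be achieved?

Retry Mechanism

Dubbo has a built-in retry mechanism: after a call times out, it retries the call, so the provider must be idempotent.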
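
One common way to make a provider method idempotent is to key each logical request by an ID and run the business logic at most once per ID. The sketch below assumes the consumer supplies such an ID with every attempt. `IdempotentHandler` and `requestId` are hypothetical names, and the in-memory map is only for illustration; a real deployment would more likely rely on shared storage such as a unique database constraint.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical provider-side guard; not part of Dubbo.
class IdempotentHandler {
    // Results of requests that have already been processed, keyed by request ID.
    private final Map<String, String> results = new ConcurrentHashMap<>();
    // Counts how many times the business logic actually ran.
    private final AtomicInteger executions = new AtomicInteger();

    /** Runs the business logic once per requestId; retries get the stored result. */
    String handle(String requestId, String name) {
        return results.computeIfAbsent(requestId, id -> {
            executions.incrementAndGet();
            return "hello: " + name;
        });
    }

    int executions() {
        return executions.get();
    }
}
```

With this guard, a retried call carrying the same `requestId` returns the earlier result instead of inserting the data a second time.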

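Eventual Consistency

Use compensating transactions or asynchronous MQ messages to reach eventual consistency. This means writing some code unrelated to the business logic. For example, add a check method on the provider that reports whether the operation succeeded; the actual implementation depends on your own business requirements.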

@GetMapping
public ResponseEntity<String> sayHello(@RequestParam("name") String name){
    try{
        return ResponseEntity.ok(demoService.sayHello(name));
    }catch (RpcException te){
         //do something...
        try{
            demoService.check(name);
        }catch (RpcException ignore){

        }
        log.error("consumer", te);
        return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR.value()).body("");
    }
}

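A check like this can verify the business state, but there is no way to know precisely how long the original call will take, so a simple one-off check is of limited use. It is better to drive the check through MQ.

Time-Based Rollback

The idea is fairly simple. On each call the consumer sets two parameters, ctime and ttime, holding the call time and the timeout, and sends them along to the provider. The provider reads both and does its work; if the execution time exceeds ttime it rolls the data back, otherwise it completes normally. Here is the modified code: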

public ResponseEntity<String> sayHello(@RequestParam("name") String name){
        try{
            RpcContext context = RpcContext.getContext();
            context.setAttachment("ctime", System.currentTimeMillis() + "");
            context.setAttachment("ttime", 2000 + "");

            return ResponseEntity.ok(demoService.sayHello(name));
        }catch (RpcException te){
             //do something...
            log.error("consumer", te);
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR.value()).body("");
        }
    }

The ctime and ttime parameters are passed to the provider, which handles them as follows:

public String sayHello(String name) {
        long curTime = System.currentTimeMillis();
        String ctime = RpcContext.getContext().getAttachment("ctime");
        String ttime = RpcContext.getContext().getAttachment("ttime");

        long ctimeAsLong = Long.parseLong(ctime);
        long ttimeAsLong = Long.parseLong(ttime);


        try {
            Thread.sleep(3000);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }

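        long spent = System.currentTimeMillis() - curTime;
        // Remaining budget = timeout minus the time already spent in transit (curTime - ctime).
        // Assumes the consumer and provider clocks are roughly in sync.
        if(spent >= (ttimeAsLong - (curTime - ctimeAsLong))){
            throw new RpcException("Server-side timeout.");
        }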

        String result = "hello: " + name;
        log.info("Result: {}" , result);
        return result;
    }
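
The intent of the provider-side check can be written as arithmetic on three timestamps: the provider's remaining budget is ttime minus the time the request already spent in transit, and the work must be rolled back once the time spent executing reaches that budget. The sketch below is one reading of that rule, assuming the consumer and provider clocks are roughly synchronized. `DeadlineBudget` and its method names are hypothetical.

```java
// Hypothetical helper; ctime and ttime follow the attachment names used in the article.
class DeadlineBudget {
    /** Time left for the provider, in ms, after subtracting transit time from the timeout. */
    static long remainingMs(long ctime, long ttime, long arrivedAt) {
        return ttime - (arrivedAt - ctime);
    }

    /** True when the provider's execution time used up the remaining budget. */
    static boolean shouldRollback(long ctime, long ttime, long arrivedAt, long finishedAt) {
        long spent = finishedAt - arrivedAt;
        return spent >= remainingMs(ctime, ttime, arrivedAt);
    }
}
```

With the demo's numbers (a 2000 ms timeout, 5 ms in transit, 3000 ms of work) the budget is 1995 ms and the call must be rolled back, whereas 1000 ms of work would fit within it.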

Here is a diagram of the execution timeline: